Modeling Computer Viruses
Luite Menno Pieter van Zelst

Abstract:
Computer viruses tap into a primordial human fear: that of loosing
control. The public fascination with viruses exceeds our fascination
with other ways in which we lose control over our computers, for
example random hardware faults. A probable cause is the perceived
prevalence of computer viruses: one expert website indicates that in
March 2008 the ‘NetSky’ virus was at 22 percent (!), although without
enlightening the reader what that means [45]. Another reason may be
the way that the virus disaster strikes: through infection. We feel
that we ought to be able to prevent it.

What simple rules should you and I follow to prevent our computers
from being infected? Here is a common view: buy and install a virus
scanner, adware scanner, email scanner, firewall, etc. On top of that,
we might want to tell a commercial company what websites we visit, in
order to prevent visiting ‘suspect sites’. In short, we are willing to
hand over our money and our privacy to feel more secure.

Still disaster may strike. And perhaps we have good reason to be
pessimistic, for it has long since been established that “virus
detection is undecidable”. However, we would like to impress upon you
that such a claim is very imprecise and based on a mathematical
abstraction. In its entirety, the claim should be read as: “given any
machine of infinite capacity and any program, can we decide whether
the program is a virus for that machine?” The claim then is that we
cannot. This might lead us to be overly pessimistic when we consider
the question whether for a specific type of computer we can detect
viruses within a specific class of programs.

The claim that virus detection is undecidable is based on one seminal
work. In 1985 Fred Cohen wrote his doctoral thesis entitled “Computer
viruses”. He was the first to use the words “computer viruses” and
he proved the undecidability of virus detection. Cohen based his
‘undecidability’ result on a formal model of computer viruses. In his
model, viruses are represented as ‘sequences of symbols’ and computers
are represented as ‘Turing machines’. Are these representations
appropriate? For if they are not so, Cohen’s undecidability result is
on loose ground.

The purpose of this thesis is to examine the following question: How
can we adequately model computers and computer viruses? To that end we
ask ourselves: Are computers adequately represented by Turing
machines? What ingredients are essential for modeling computers? What
are computer programs? What are computer viruses?