Modeling Computer Viruses Luite Menno Pieter van Zelst Abstract: Computer viruses tap into a primordial human fear: that of loosing control. The public fascination with viruses exceeds our fascination with other ways in which we lose control over our computers, for example random hardware faults. A probable cause is the perceived prevalence of computer viruses: one expert website indicates that in March 2008 the ‘NetSky’ virus was at 22 percent (!), although without enlightening the reader what that means [45]. Another reason may be the way that the virus disaster strikes: through infection. We feel that we ought to be able to prevent it. What simple rules should you and I follow to prevent our computers from being infected? Here is a common view: buy and install a virus scanner, adware scanner, email scanner, firewall, etc. On top of that, we might want to tell a commercial company what websites we visit, in order to prevent visiting ‘suspect sites’. In short, we are willing to hand over our money and our privacy to feel more secure. Still disaster may strike. And perhaps we have good reason to be pessimistic, for it has long since been established that “virus detection is undecidable”. However, we would like to impress upon you that such a claim is very imprecise and based on a mathematical abstraction. In its entirety, the claim should be read as: “given any machine of infinite capacity and any program, can we decide whether the program is a virus for that machine?” The claim then is that we cannot. This might lead us to be overly pessimistic when we consider the question whether for a specific type of computer we can detect viruses within a specific class of programs. The claim that virus detection is undecidable is based on one seminal work. In 1985 Fred Cohen wrote his doctoral thesis entitled “Computer viruses”. He was the first to use the words “computer viruses” and he proved the undecidability of virus detection. Cohen based his ‘undecidability’ result on a formal model of computer viruses. In his model, viruses are represented as ‘sequences of symbols’ and computers are represented as ‘Turing machines’. Are these representations appropriate? For if they are not so, Cohen’s undecidability result is on loose ground. The purpose of this thesis is to examine the following question: How can we adequately model computers and computer viruses? To that end we ask ourselves: Are computers adequately represented by Turing machines? What ingredients are essential for modeling computers? What are computer programs? What are computer viruses?