Title: Deep Inside an AntiVirus Engine Network Associates, Inc. Jimmy Kuo Director, AV Research cjkuo@alumni.caltech.edu
1 Deep Inside an AntiVirus Engine
Network Associates, Inc.
Jimmy Kuo, Director, AV Research
cjkuo@alumni.caltech.edu
Stanford, 16MAR99
2 Agenda
- Short description of viruses
- Environments
- Purposes of an antivirus engine
- Detection technologies
- Virus removal technologies
- Wrap-up
4 Virus Types
- File Viruses
  - Com, Exe (DLL, VxD), Bat, Sys, mIRC, Html
- Boot Viruses
  - Boot Sector
  - Master Boot Records
- Macro Viruses
  - Word, Excel, PowerPoint, Access
- Multipartite
5 Virus growth through the years
- Chart of Dr Solomon's count of viruses and trojans (axis tick: 40000)
6 Environment Determination
- PC
- OLE2 files
- Compressed files
- Self-extracting files
- .BAT files, mIRC script, VB Script
- UNIX filesystems
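
A scanner has to decide what kind of object it is looking at before it can pick a detection path. A minimal sketch of that first step, assuming identification by leading magic bytes only; the function name `classify` and the returned labels are illustrative and not taken from any product:

```python
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # compound document header
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header
GZIP_MAGIC = b"\x1f\x8b"                        # gzip member header


def classify(header: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(b"MZ"):
        # Plain executables and self-extracting archives both start with MZ;
        # telling them apart needs a deeper look than this sketch takes.
        return "exe"
    # Text formats such as .BAT or script files have no magic bytes.
    return "unknown"
```

Each label would then select which unpacker or parser runs before any virus search begins.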
7 Protocols
- FTP
- HTTP
- SNMP
- SMTP
- NNTP
- TCP/IP
- Mime, uuencode, SSL, PGP
8 Purpose
- We deal with users who have problems on their computers, problems they do not know how to handle.
- 1. Relieve the panic.
- 2. Understand the problem.
- 3. Resolve what the user understands to be the problem.
9 McAfee (NAI) Mantra
- 1. Detect all viruses.
- 2. The program is running on a clean machine.
- 3. Don't give them a reason not to use your product.
10 The Technology
11 Signature Search
- Data organization
  - In memory
  - On disk
- Only things essential to detection are stored in memory. Names, repair information, and virus information are all stored on disk.
12 Signature Organization, Case 0
- All strings kept in memory.
- All strings of the same type.
- Method died out when viruses neared 1000.
13 Signature Organization, Case 1
- Split into virus types: boot viruses, file viruses (algorithmic detection, CRC detection), macro viruses.
- Boot virus strings swapped to disk. Pull them in only if the target file looks like a boot image (55AA signature).
- CRCs used for those viruses that don't change. Keep verification information on disk.
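
The two checks on this slide can be sketched as follows. This is a minimal illustration, assuming the 55AA marker sits in the last two bytes of a 512-byte sector and using CRC-32 from `zlib` as a stand-in, since the slides do not say which CRC was used. The names `looks_like_boot_image`, `crc_detect`, and `Demo.Static` are hypothetical.

```python
import zlib

BOOT_SIGNATURE = b"\x55\xaa"  # marker in the last two bytes of a boot sector


def looks_like_boot_image(sector: bytes) -> bool:
    """Cheap test deciding whether boot virus strings are worth loading."""
    return len(sector) >= 512 and sector[510:512] == BOOT_SIGNATURE


def crc_detect(data: bytes, offset: int, length: int, table: dict):
    """Look up the CRC of a fixed region; returns a name or None."""
    region = data[offset:offset + length]
    if len(region) < length:
        return None  # region runs past the end of the file
    return table.get(zlib.crc32(region))


# Made-up sample: a constant 17-byte body embedded at offset 16.
body = b"STATIC-VIRUS-BODY"
sample = b"\x90" * 16 + body + b"\x90" * 16
crc_table = {zlib.crc32(body): "Demo.Static"}
detected = crc_detect(sample, 16, len(body), crc_table)
```

A CRC hit like this only works for viruses whose bytes never change, which is why the slide pairs it with verification information kept on disk.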
14 Signature Organization, Case 2
- All detection strings kept in memory.
- Sorted into separate bins.
- Only the particular bin that could contain the virus string is stored in low memory.
- All else stored in EMS or XMS.
- Virus removal information and names stored on disk.
15 Signature Organization, Case 3
- All detection strings stored in memory. Some are classed as not necessary for the average user and are not used unless specifically requested.
- All verification information stored on disk (or EMS or XMS if available).
- Strings sorted into groups which have common start characteristics.
16 Signature Search Algorithms
- Needs to be "front end" fast. If there's a virus, it can take longer. But most things are not viruses, so it should be as quick as possible to determine that the target is not there.
- No time allowed for front end setup.
- So, quick and simple wins out.
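
One simple way to get a fast negative answer is to group signatures by their first byte, so that most buffer positions are rejected with a single table lookup. The sketch below assumes that grouping rule; the slides only say the strings share "common start characteristics". The signature bytes and names are made up for illustration.

```python
from collections import defaultdict


def build_index(signatures: dict) -> dict:
    """Group signature byte strings by their first byte."""
    index = defaultdict(list)
    for name, pattern in signatures.items():
        index[pattern[0]].append((name, pattern))
    return index


def scan(buffer: bytes, index: dict) -> list:
    """Return (name, offset) for every signature found in the buffer."""
    hits = []
    for pos, byte in enumerate(buffer):
        bucket = index.get(byte)
        if not bucket:
            continue  # common case: no signature starts with this byte
        for name, pattern in bucket:
            if buffer.startswith(pattern, pos):
                hits.append((name, pos))
    return hits


signatures = {
    "Demo.A": bytes.fromhex("b8004ccd21"),
    "Demo.B": bytes.fromhex("b440cd21c3"),
}
index = build_index(signatures)
hits = scan(b"\x90\x90" + signatures["Demo.A"] + b"\x90", index)
```

Building the index here happens at run time for brevity. A shipping scanner would more plausibly store the table already grouped, in keeping with the "no front end setup" constraint.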
17 Code Tracers
Simplified emulation, but faster. Static
emulation. Only have to know instruction length
and flow transfer statements.
18 Code Tracing, Case 1
Given a target COM file:
- For specific cases of known flow transfers (jmp, call, push/ret, minor variations of such),
- Get to a fixed location, start searching for viruses there.
Problem cases: polymorphic entry code.
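
A minimal sketch of this kind of tracing, assuming only the two plain jump forms (`E9 rel16` and `EB rel8`) are followed; call and push/ret handling are left out. The function name and the hop limit are illustrative choices.

```python
import struct

COM_LOAD_OFFSET = 0x100  # a COM image is loaded at offset 100h in its segment


def follow_entry_jumps(image: bytes, max_hops: int = 8) -> int:
    """Follow leading jmp instructions; return the file offset reached."""
    pos = 0
    for _ in range(max_hops):  # bounded, so a jmp-to-self cannot hang us
        if pos + 3 <= len(image) and image[pos] == 0xE9:    # jmp rel16
            (rel,) = struct.unpack_from("<h", image, pos + 1)
            size = 3
        elif pos + 2 <= len(image) and image[pos] == 0xEB:  # jmp rel8
            (rel,) = struct.unpack_from("<b", image, pos + 1)
            size = 2
        else:
            break  # not a flow transfer this sketch knows about
        ip = (COM_LOAD_OFFSET + pos + size + rel) & 0xFFFF  # 16-bit wrap
        target = ip - COM_LOAD_OFFSET
        if not 0 <= target < len(image):
            break  # jump leaves the file image; stop where we are
        pos = target
    return pos


# jmp +5 lands on a short jmp +2, which lands on the "real" code at offset 12.
image = (b"\xe9\x05\x00" + b"\x90" * 5 + b"\xeb\x02" + b"\x90" * 2
         + b"\xb4\x4c\xcd\x21")
start = follow_entry_jumps(image)
```

The signature search would then begin at `start` instead of at the top of the file.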
19 Code Tracing, Case 2
Given a target COM file:
- Trace code path through all available paths, until out of buffer.
- Remember opcodes. Use in opcode string matching.
- First time out of buffer, trace again. Remember opcodes again.
Problem cases: appending virus, appended to a small host.
20 Code Tracing, Case 3
Given a target COM file:
- Organize your virus database according to the different types of entry code.
- Search against only those viruses that use that type of entry code.
This is the current technique we're using.
21 Emulators
- Intel 80x86, primarily 8086.
- Now 80386 also needed.
- Portable.
- Apple emulation.
22 Emulator Problems
- Prefetch queue length.
- How much of the environment do you include in the emulation?
- The perfect emulator takes too much time and memory.
- Result: Emulate situations required for known viruses. Needs upgrading to match reality.
23 Code Matrix
- Matrix of opcode digraphs.
- Map the set of opcodes gathered from code trace onto the opcode digraphs of known viruses. If it does not match, it cannot be that virus.
- Add digraph matrices together to save memory space.
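
One reading of the digraph matrix is a 256 x 256 bit table per virus, with a bit set for every pair of consecutive opcodes the virus can produce. The sketch below assumes that reading and treats "adding" matrices as a bitwise OR; both are interpretations of the slide, and the opcode lists are made up.

```python
MATRIX_BYTES = 256 * 256 // 8  # one bit per opcode pair, 8 KB in total


def digraph_matrix(opcodes: list) -> bytearray:
    """Set one bit for each consecutive opcode pair in the sequence."""
    matrix = bytearray(MATRIX_BYTES)
    for a, b in zip(opcodes, opcodes[1:]):
        bit = a * 256 + b
        matrix[bit >> 3] |= 1 << (bit & 7)
    return matrix


def could_match(trace: list, matrix: bytearray) -> bool:
    """False as soon as the trace holds a pair the matrix never allows."""
    for a, b in zip(trace, trace[1:]):
        bit = a * 256 + b
        if not matrix[bit >> 3] & (1 << (bit & 7)):
            return False
    return True


def combine(matrices: list) -> bytearray:
    """OR several matrices into one shared table to save memory."""
    combined = bytearray(MATRIX_BYTES)
    for matrix in matrices:
        for i, value in enumerate(matrix):
            combined[i] |= value
    return combined


virus_a = digraph_matrix([0xB8, 0x50, 0xC3])
virus_b = digraph_matrix([0x33, 0xC0, 0xCD])
shared = combine([virus_a, virus_b])
```

A trace that fails `could_match` against `shared` cannot be any of the combined viruses, while a trace that passes is only a candidate and still needs exact identification.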
24 Special Case Code
- Loop detection.
- Likely to need decrypting (emulate).
- Probability distribution (a particular virus uses rotates much too often).
- Polymorphic viruses too difficult to handle otherwise.
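
The probability-distribution idea can be illustrated with a rotate counter. This sketch assumes the tracer yields `(opcode, modrm)` pairs, with `modrm` set to 0 for instructions that have no ModRM byte. It relies on the x86 encoding in which opcodes C0, C1, D0, D1, D2, and D3 select rotates when the ModRM reg field is 0 to 3. Any threshold applied to the ratio would be a tuning choice not given in the slides.

```python
GROUP2_OPCODES = {0xC0, 0xC1, 0xD0, 0xD1, 0xD2, 0xD3}  # shift/rotate group


def is_rotate(opcode: int, modrm: int) -> bool:
    """True for ROL, ROR, RCL, RCR (reg field 0..3 of the ModRM byte)."""
    return opcode in GROUP2_OPCODES and ((modrm >> 3) & 7) <= 3


def rotate_ratio(instructions: list) -> float:
    """Fraction of traced instructions that are rotates."""
    if not instructions:
        return 0.0
    rotates = sum(1 for opcode, modrm in instructions
                  if is_rotate(opcode, modrm))
    return rotates / len(instructions)


trace = [
    (0xD0, 0xC0),  # rol al, 1
    (0xD0, 0xC8),  # ror al, 1
    (0xD0, 0xE0),  # shl al, 1 (a shift, not a rotate)
    (0x90, 0x00),  # nop
]
ratio = rotate_ratio(trace)
```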
25 OLE2 Files (Macro Viruses)
- An OLE2 file is a filesystem in a file.
- It's a proprietary format belonging to Microsoft.
- Cracking the OLE2 format was easy. Next comes
the Word document stream, the Excel spreadsheet
stream, WordBasic, Visual Basic, ...
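
To make the "filesystem in a file" idea concrete, here is a minimal sketch that reads a few fields from an OLE2 (compound file) header. The field offsets follow the compound file header layout as later documented by Microsoft; the function name and returned keys are illustrative, and walking the sector chains and directory is left out.

```python
import struct

OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def parse_ole2_header(header: bytes):
    """Return basic layout facts from an OLE2 header, or None if not OLE2."""
    if len(header) < 76 or header[:8] != OLE2_MAGIC:
        return None
    # Offsets 30 and 32: sector sizes stored as powers of two.
    sector_shift, mini_shift = struct.unpack_from("<HH", header, 30)
    # Offsets 44 and 48: FAT sector count, first directory sector.
    fat_sectors, first_dir = struct.unpack_from("<II", header, 44)
    return {
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "fat_sectors": fat_sectors,
        "first_directory_sector": first_dir,
    }


# Hand-built header for illustration: 512-byte sectors, 64-byte mini sectors.
demo = bytearray(512)
demo[:8] = OLE2_MAGIC
struct.pack_into("<HH", demo, 30, 9, 6)
struct.pack_into("<II", demo, 44, 1, 2)
info = parse_ole2_header(bytes(demo))
```

The directory found through `first_directory_sector` is what names the streams, such as the Word document stream, that a macro scanner must parse next.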
26 Other Macro Virus Issues
- Word6 macro encryption/protection is a single-byte XOR. Key is available in document.
- Office97 macro protection is GUI only. Actual code is not encrypted at all.
- Excel95 password protection is almost trivial. Uses 16-byte XOR key with minor on-the-fly calculations.
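
A single-byte XOR of the kind described for Word6 macros can be undone in one line once the key is known. The sketch below takes the key as a parameter, because where the key sits inside the document is not covered here; the key value and sample text are made up.

```python
def xor_decode(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR; the same call encodes and decodes."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


plain = b"Sub AutoOpen"
scrambled = xor_decode(plain, 0x5A)   # what a scanner would see on disk
recovered = xor_decode(scrambled, 0x5A)
```

Because XOR is its own inverse, a scanner can decode the macro body in place before searching it for signatures.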
27 Other Macro Virus Issues...
- Office97 password protection against Open uses MD5. Yuk!
- PowerPoint97 streams stored as GZIP compressed data streams.
- WordBasic is a tokenized language.
- VisualBasic is p-code. But there's a separate compressed code body for Edit.
28 Still More Macro Virus Issues
- VisualBasic5 now supported across other applications! Soon, we'll have to crack other file formats, not just OLE2.
- VisualBasic6 coming out in next few months.
- Things can up-convert, some can down-convert.
- Emulators for all these languages!
29 Virus Removal
- Must have sufficient variant determination.
- Bytes to cut.
- Where from.
- Where to retrieve original information.
- Where did the virus replace/take that original info from?
- Virus removal database does not need to stay in memory.
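
The items above map naturally onto a small repair record. The following is a sketch for one common case only: an appending COM infector that overwrites the first bytes of the host and keeps a copy of them inside its own body. The `RepairRecord` fields and the demo layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RepairRecord:
    virus_length: int  # bytes to cut, counted from the end of the file
    saved_offset: int  # where, inside the virus body, the host bytes sit
    saved_length: int  # how many bytes at the file start were replaced


def repair(infected: bytes, rec: RepairRecord) -> bytes:
    """Restore the original host from an appending infection."""
    body_start = len(infected) - rec.virus_length
    if body_start < rec.saved_length:
        raise ValueError("file too small for this repair record")
    begin = body_start + rec.saved_offset
    saved = infected[begin:begin + rec.saved_length]
    if len(saved) != rec.saved_length:
        raise ValueError("saved host bytes fall outside the file")
    # Put the original start back and cut the virus body off the end.
    return saved + infected[rec.saved_length:body_start]


host = b"\xb4\x4c\xcd\x21\x90\x90"
virus_body = b"VIR" + host[:3] + b"END"           # keeps 3 host bytes
infected = b"\xe9\x03\x00" + host[3:] + virus_body
cleaned = repair(infected, RepairRecord(len(virus_body), 3, 3))
```

A wrong record destroys the file, which is why sufficient variant determination comes first on the slide.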
30 Variant Determination
- Variant determination.
- Different sizes.
- Different CRC values over different ranges.
- String found at different position.
- The specific variant does something unique. Need
to know this for user information.
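
Size and ranged CRC checks can be combined into a small variant table. A minimal sketch, assuming CRC-32 from `zlib` as the checksum (the slides do not name one); the variant names, sizes, and ranges are made up.

```python
import zlib


def identify_variant(body: bytes, variants: list):
    """Return the first variant whose size and ranged CRCs all match."""
    for name, size, checks in variants:
        if len(body) != size:
            continue  # different sizes separate most variants cheaply
        if all(zlib.crc32(body[start:end]) == crc
               for start, end, crc in checks):
            return name
    return None


# Two made-up variants that share a 2-byte start but differ afterwards.
body_a = b"\xeb\x10" + b"A" * 30
body_b = b"\xeb\x10" + b"B" * 30
variants = [
    ("Demo.A", 32, [(0, 2, zlib.crc32(body_a[0:2])),
                    (2, 32, zlib.crc32(body_a[2:32]))]),
    ("Demo.B", 32, [(0, 2, zlib.crc32(body_b[0:2])),
                    (2, 32, zlib.crc32(body_b[2:32]))]),
]
found = identify_variant(body_b, variants)
```

Returning `None` for a body that matches no known variant is the safe default, since repair should only run on an exact identification.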
31 Side Effect Removal
- Virus payloads affect:
- Registry
- Added instructions in files. (AUTOEXEC.BAT)
- Additional files dropped.
- Things added to WIN.INI.
- Bad sector repair.
- Anything software can do.
32 Speed Issues
Memory management. Memory, hard disk,
floppy. 640K memory, XMS, EMS, 32-bit memory,
memory swapping. Clean machine.
33 Final Thought
Compare what was covered in this presentation against an access control package. Project: "The following files are allowed to be executed by this set of people. AND NOTHING ELSE!"
34 Questions & Answers
35 Your Partner Against the Virus Problem