Title: Data-Driven Malware Detection within Microsoft Word Documents
1 Data-Driven Malware Detection within Microsoft Word Documents
- Wei-Jen Li,
- Salvatore J. Stolfo,
- Angelos Stavrou,
- Elli Androulaki
- Department of Computer Science,
- Columbia University
2 Motivation
- Threat
  - Modern document formats (Word, PowerPoint, PDF) are code-injection platforms
  - Malcode is EASY TO EMBED in documents and HARD TO DETECT
  - Many forms of malicious code: macros, JavaScript, arbitrary object code
  - Can be placed anywhere in a document: tabular data structures, padding areas
  - Unlike viruses and worms, malicious documents don't propagate and spread by themselves; they spread passively, in a drive-by fashion, via:
    - Websites (e.g., Wikipedia) or search engine results offering attractive information
    - Email, IM
    - CD-ROMs and USB drives
- Current Solutions
  - General AV scanners
    - Look for a specific string as the signature of each attack
    - No zero-day detection
  - Firewall
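To illustrate why string-signature scanning offers no zero-day detection, here is a toy Python sketch (not any real AV product's engine): a file is flagged only if it contains one of a fixed list of byte strings, so a one-byte variant slips through. The signature list and the `signature_scan` function are hypothetical.

```python
# Toy signature scanner: a file is flagged only if it contains one of a
# fixed set of byte strings. Both signatures below are made up.
KNOWN_SIGNATURES = [
    b"\x90\x90\x90\x90\xeb\x10",  # hypothetical shellcode prefix
    b"AutoOpen_EvilMacro",        # hypothetical malicious macro name
]


def signature_scan(data: bytes) -> bool:
    """Return True if any known signature occurs verbatim in data."""
    return any(sig in data for sig in KNOWN_SIGNATURES)


known = b"...header...AutoOpen_EvilMacro...payload..."
variant = b"...header...AutoOpen_EvilMakro...payload..."  # one byte changed

print(signature_scan(known))    # True
print(signature_scan(variant))  # False: the new variant evades the signature
```

Every new variant needs a new signature, which is exactly what a never-before-seen (zero-day) attack does not have.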
3 Embedding Malicious Content
4 Goals
- Project goals
  - Detect malicious documents
  - Locate the malcode embedded in documents
  - Achieve zero-day detection
5 Approach 1: Static Analysis
- File binary content analysis
- N-gram-based statistical modeling technique
- SPARSEGui program
  - Anagram (Bloom filter): 1-class anomaly detection
  - 2-class mutual-information analysis
[Figure: Bloom filter bit array]
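The Anagram item above (1-class anomaly detection with a Bloom filter of n-grams) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the n-gram size (5), filter size (2^16 bits), number of hash functions (3), SHA-256-based hashing, and the names `BloomFilter`, `train`, and `anomaly_score` are all hypothetical choices. Training records every byte n-gram from known-benign documents; a document's score is the fraction of its n-grams never seen in training.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array queried with k salted hashes (no deletions)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Derive k bit positions by prefixing the hash index as a salt.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def ngrams(data: bytes, n: int):
    """Yield every overlapping byte n-gram of data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def train(benign_docs, n: int = 5) -> BloomFilter:
    """1-class training: record every n-gram seen in benign documents."""
    model = BloomFilter()
    for doc in benign_docs:
        for gram in ngrams(doc, n):
            model.add(gram)
    return model


def anomaly_score(model: BloomFilter, data: bytes, n: int = 5) -> float:
    """Fraction of the document's n-grams that never appeared in training."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in model)
    return unseen / len(grams)


model = train([b"normal document text " * 20, b"another normal paragraph " * 20])
score_benign = anomaly_score(model, b"normal document text")
score_odd = anomaly_score(model, b"\x90\x90\x90\x90\xeb\x10\xcc\xcc")
print(score_benign)  # 0.0: every 5-gram was seen in training
print(score_odd)
```

A Bloom filter has no false negatives, so n-grams seen in training always count as seen; its occasional false positives can only lower a score slightly. Storing only bits, rather than the n-grams themselves, keeps the model compact.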
Current status: performs as well as COTS AV scanners
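The slides do not detail the 2-class mutual-information analysis, so the following is only a generic sketch of one common use of mutual information with two labeled classes: ranking byte n-grams by how much their presence tells us about the label (benign vs. malicious). The n-gram size, the toy data, and the names `mutual_information` and `top_ngrams` are assumptions for illustration.

```python
import math
from collections import Counter


def ngram_set(data: bytes, n: int = 3) -> set:
    """All distinct overlapping byte n-grams of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def mutual_information(gram: bytes, docs, labels) -> float:
    """I(X;Y) in bits, where X = "gram occurs in doc" and Y = class label."""
    total = len(docs)
    joint = Counter((gram in doc, label) for doc, label in zip(docs, labels))
    px = Counter()
    py = Counter()
    for (x, y), count in joint.items():
        px[x] += count
        py[y] += count
    mi = 0.0
    for (x, y), count in joint.items():  # zero-count cells contribute 0
        pxy = count / total
        mi += pxy * math.log2(pxy / ((px[x] / total) * (py[y] / total)))
    return mi


def top_ngrams(docs, labels, n: int = 3, k: int = 5):
    """Rank candidate n-grams by mutual information with the label."""
    candidates = set().union(*(ngram_set(d, n) for d in docs))
    return sorted(candidates, key=lambda g: (-mutual_information(g, docs, labels), g))[:k]


docs = [b"aaaa", b"aaab", b"zzzz", b"zzzb"]
labels = [0, 0, 1, 1]  # 0 = benign, 1 = malicious (toy data)
print(top_ngrams(docs, labels, k=2))  # [b'aaa', b'zzz']: each predicts the label perfectly
```

An n-gram that appears in exactly one class scores the maximum of 1 bit here; one that appears in only some documents of a class scores lower.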
6 Approach 2: Dynamic Run-Time Tests (Sandbox)
- Open/execute documents in multiple, diverse virtual environments
- Coarse-grained: detect anomalous behavior
- Fine-grained: detect attacks on a program's vulnerabilities
- No complex programming; easy to duplicate
Current status: 100% detection accuracy
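The coarse-grained test can be sketched, under heavy assumptions, as a baseline comparison: record which behaviors (for example, file or process events) occur when benign documents are opened in each virtual environment, then flag a document that triggers any behavior outside that baseline in any environment. The slides do not say how behavior is captured or compared, so the event strings, the environment names, and the functions `build_baseline`, `anomalous_events`, and `is_suspicious` are hypothetical.

```python
from typing import Dict, List, Set

# Events are plain strings such as "create_process:cmd.exe"; the event
# names and the environment names ("vm_a", "vm_b") are hypothetical.
Events = Set[str]


def build_baseline(benign_runs: Dict[str, List[Events]]) -> Dict[str, Events]:
    """Per environment, the union of all events seen while opening benign docs."""
    return {env: set().union(*runs) for env, runs in benign_runs.items()}


def anomalous_events(baseline: Dict[str, Events],
                     observed: Dict[str, Events]) -> Dict[str, Events]:
    """Per environment, events the document triggered that no benign doc did."""
    result = {}
    for env, events in observed.items():
        unexpected = events - baseline.get(env, set())
        if unexpected:
            result[env] = unexpected
    return result


def is_suspicious(baseline: Dict[str, Events], observed: Dict[str, Events]) -> bool:
    """Coarse-grained verdict: any unexpected event in any environment."""
    return bool(anomalous_events(baseline, observed))


baseline = build_baseline({
    "vm_a": [{"open:doc", "read:normal.dot"}, {"open:doc", "read:font.ttf"}],
    "vm_b": [{"open:doc", "read:normal.dot"}],
})
suspect = {"vm_a": {"open:doc", "create_process:cmd.exe"}, "vm_b": {"open:doc"}}
clean = {"vm_a": {"open:doc", "read:font.ttf"}, "vm_b": {"open:doc", "read:normal.dot"}}

print(anomalous_events(baseline, suspect))  # {'vm_a': {'create_process:cmd.exe'}}
print(is_suspicious(baseline, clean))       # False
```

Keeping a separate baseline per environment is what makes running several diverse environments useful: an exploit that only fires against one platform's vulnerability still produces unexpected behavior in that environment.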
7 Future Work
- More static and dynamic analysis
- Hybrid
  - Combine the static and dynamic approaches
- Collaborate with other mechanisms