Intelligent Detection of Malicious Script Code - PowerPoint PPT Presentation

Slides: 18
Provided by: kam9
Learn more at: http://web.cs.ucla.edu

1
Intelligent Detection of Malicious Script Code
  • CS194, 2007-08
  • Benson Luk
  • Eyal Reuveni
  • Kamron Farrokh
  • Advisor: Adnan Darwiche
  • Sponsored by Symantec

2
Outline for Project
  • Phase I: Setup
  • Set up machine for testing environment
  • Ensure that whitelist is clean
  • Phase II: Crawling
  • Modify crawler to output only necessary data.
    This means:
  • Grab only necessary information from web crawling
    results
  • Listen in on Internet Explorer's JavaScript
    interpreter and output relevant behavior
  • Phase III: Database
  • Research and develop an effective structure for
    storing data and link it to the web crawler
  • Phase IV: Analysis
  • Research and develop an effective algorithm for
    learning from massive amounts of data

3
Completed Tasks First Quarter
  • Phase I
  • Configured machine with Norton Antivirus and
    Heritrix web crawler
  • The web crawler will be used to grab additional URLs,
    and Norton Antivirus will be used to verify that
    a URL has not launched an attack
  • Created a Python script to ensure that visited
    sites are clean
  • Captures Norton's web attack logs before and
    after loading a site in Internet Explorer, then
    compares the logs for new entries and signals
    whether or not a site's data should be discarded
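
The before/after log comparison described above could be sketched roughly as follows. This is only an illustration, not the project's actual script: the function names are hypothetical, and the log entries are treated as opaque lines because the slides do not show the Norton log format.

```python
from collections import Counter


def new_log_entries(before_lines, after_lines):
    """Return log lines that appear after the page load but not before it.

    Counter subtraction is used (rather than a set difference) so that a
    repeated, identical entry still counts as new.
    """
    extra = Counter(after_lines) - Counter(before_lines)
    return list(extra.elements())


def should_discard(before_lines, after_lines):
    """Signal that a site's data should be discarded if any new entry appeared."""
    return len(new_log_entries(before_lines, after_lines)) > 0


# Hypothetical placeholder entries; real log lines would be read from disk.
before = ["entry-1"]
after = ["entry-1", "entry-2"]
print(should_discard(before, after))
```

In this sketch a site is kept only when the two snapshots contain exactly the same entries.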
  • Phase II
  • Configured Heritrix to run specific crawls that
    target a set of domains, and output minimal
    information
  • The purpose is to gather as many URLs with
    scripts as possible for a large sample base
  • Created a parser for Heritrix logs to filter out
    irrelevant websites
  • For example, we are omitting URLs that point to
    images since they will not contain scripts
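
A filter of the kind described for the Heritrix log parser might look like the sketch below. The extension list and function names are assumptions made for illustration; the slides only say that image URLs are omitted.

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of image extensions; the slides do not list the ones actually used.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".bmp", ".ico"}


def is_image_url(url):
    """Return True if the URL path ends in a known image extension."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


def filter_urls(urls):
    """Keep only URLs that could plausibly contain scripts."""
    return [url for url in urls if not is_image_url(url)]


sample = ["http://example.com/index.html", "http://example.com/logo.PNG?x=1"]
print(filter_urls(sample))
```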

4
Completed Tasks Second Quarter
  • Phase I
  • Whitelist: integrated a Symantec component to check
    whether a visited site is malicious, so all of the
    data we gather is from clean sources
  • Hard drive: installed a 750 GB hard drive

5
Completed Tasks Second Quarter
  • Phase II
  • Crawling: We ran a shallow crawl with 200 domains
    as seeds, and that is the current base of our
    data. The result was 18,500 URLs, which we run
    through our Script Listening component

6
Completed Tasks Second Quarter
  • Phase II
  • Script Listening: received a customizable tool
    from Symantec that listens to the JavaScript
    interpreter in Internet Explorer
  • We modified it to output the information we need
  • GUID -> DISPID -> ArgType -> ArgVal

7
Completed Tasks Second Quarter
  • Example of data

DISPID (function)   GUID (object)                          # of Args   Arg Type   Arg Value
1030                3050f55f-98b5-11cf-bb82-00aa00bdce0b   1           BSTR       130
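
As a rough illustration, a record like the one above could be read into a small typed structure. The field order follows the example row, but the whitespace-separated layout, the single-argument shape, and the names `CallRecord` and `parse_record` are assumptions rather than the listener's documented output format.

```python
from typing import NamedTuple


class CallRecord(NamedTuple):
    dispid: int     # function identifier
    guid: str       # object identifier
    num_args: int   # number of arguments
    arg_type: str   # e.g. "BSTR"
    arg_value: str  # kept as text; convert later if the type calls for it


def parse_record(line):
    """Parse one single-argument record (assumed whitespace-separated)."""
    # Split into at most 5 fields so an argument value containing spaces stays intact.
    dispid, guid, num_args, arg_type, arg_value = line.split(None, 4)
    return CallRecord(int(dispid), guid, int(num_args), arg_type, arg_value)


record = parse_record("1030 3050f55f-98b5-11cf-bb82-00aa00bdce0b 1 BSTR 130")
print(record.guid, record.dispid, record.arg_type, record.arg_value)
```
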
8
Completed Tasks Second Quarter
  • Phase III
  • The amount of data we have gathered is too large to
    use in a database. The plain text file is 4 GB (50
    million function calls), and querying a database
    of that size is too slow on the computer we have.
  • Instead, we are storing the data as a text file
    and doing operations on it with Python scripts.
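
One way such a script can handle a multi-gigabyte text file is to stream it line by line instead of loading it whole. The sketch below assumes one call per line with the DISPID and GUID as the first two whitespace-separated fields; that layout and the name `count_functions` are hypothetical.

```python
from collections import Counter


def count_functions(path):
    """Count calls per (GUID, DISPID) pair while streaming the file.

    Only the counter is held in memory, so memory use grows with the
    number of distinct functions, not with the size of the file.
    """
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:  # the file object yields one line at a time
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            dispid, guid = fields[0], fields[1]
            counts[(guid, dispid)] += 1
    return counts
```

Calling `count_functions("calls.txt").most_common(10)` would then list the ten most frequent functions, where `calls.txt` stands in for the actual data file.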

9
Results and Findings Second Quarter
  • Phase IV
  • We have analyzed data from our first two result
    sets
  • Crawl with 5 initial seeds
  • 3,476,348 function calls
  • 109 distinct GUIDs, 7,364 GUID-DispID pairs
  • Crawl with 15 initial seeds
  • 3,706,454 function calls
  • 95 distinct GUIDs, 5,575 GUID-DispID pairs
  • Looked at most common functions, most common
    int-argument functions, and distribution of the
    argument values for these functions
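
The summary figures above (function calls, distinct GUIDs, GUID-DispID pairs, most common functions, and argument-value distributions) could be computed along these lines. This is a sketch under assumptions: one record per call, records already parsed into tuples, and `"I4"` as the label for integer arguments, which the slides do not confirm.

```python
from collections import Counter

INT_TYPE = "I4"  # assumed label for integer arguments; the slides only show "BSTR"


def summarize(records):
    """Summarize an iterable of (guid, dispid, arg_type, arg_value) tuples."""
    guids = set()
    pair_counts = Counter()      # calls per (GUID, DispID) pair
    int_pair_counts = Counter()  # calls per pair that took an int argument
    int_values = {}              # pair -> Counter of int argument values
    for guid, dispid, arg_type, arg_value in records:
        guids.add(guid)
        pair = (guid, dispid)
        pair_counts[pair] += 1
        if arg_type == INT_TYPE:
            int_pair_counts[pair] += 1
            int_values.setdefault(pair, Counter())[int(arg_value)] += 1
    return {
        "function_calls": sum(pair_counts.values()),
        "distinct_guids": len(guids),
        "distinct_pairs": len(pair_counts),
        "top_functions": pair_counts.most_common(3),
        "top_int_functions": int_pair_counts.most_common(3),
        "int_value_distribution": int_values,
    }
```

Running `summarize` once per result set gives two dictionaries that can be compared field by field.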

10
Results and Findings Second Quarter
  • Function 1
  • GUID 3050f55d-98b5-11cf-bb82-00aa00bdce0b
  • GUID object name DispHTMLWindow2
  • DispID 1103
  • Most popular int-argument function in both result
    sets
  • Mostly random distribution, but with signs of
    regularity
  • Results from two sets show significant differences

11
Results and Findings Second Quarter
12
Results and Findings Second Quarter
  • Function 2
  • GUID 3050f55f-98b5-11cf-bb82-00aa00bdce0b
  • GUID object name DispHTMLDocument
  • DispID 1013
  • Second most popular int-argument function in both
    result sets
  • Shows a regular distribution with distinct
    characteristics
  • Results from two sets show significant differences

13
Results and Findings Second Quarter
14
Results and Findings Second Quarter
  • Function 3
  • GUID 3050f51b-98b5-11cf-bb82-00aa00bdce0b
  • GUID object name DispHTMLIFrame
  • DispID -2147418107
  • Third most popular int-argument function in 1st
    result set, 95th most popular in 2nd result set
  • Shows a random distribution with distinct
    characteristics
  • Results are dramatically different between data
    sets
  • All arguments in the 2nd result set are 0

15
Results and Findings Second Quarter
16
Results and Findings Second Quarter
  • Found significant differences between the data
    sets in both the frequencies of specific
    functions and the arguments passed to those
    functions
  • Suspect that differences result from biases due
    to small number of original seeds (5 and 15)
  • Ran a much broader crawl (200 seeds) in hopes of
    getting more general, unbiased results
  • From partial results of this crawl (roughly
    8,000 websites), we have so far found:
  • A much larger average number of calls to our
    listener per website
  • A large percentage of function calls that take 0
    arguments
  • Will post complete results once crawl is finished
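
The two partial-crawl measurements mentioned above, average listener calls per website and the share of zero-argument calls, could be computed as in the sketch below. It assumes each recorded call is tagged with the site it came from, which the slides do not state; `crawl_summary` is a hypothetical name.

```python
def crawl_summary(calls):
    """Return (average calls per site, fraction of zero-argument calls).

    calls: iterable of (site, num_args) pairs, one per recorded call.
    Sites that produced no calls are not represented in the input, so
    they do not pull the average down.
    """
    per_site = {}
    zero_arg = 0
    total = 0
    for site, num_args in calls:
        per_site[site] = per_site.get(site, 0) + 1
        total += 1
        if num_args == 0:
            zero_arg += 1
    if total == 0:
        return 0.0, 0.0
    return total / len(per_site), zero_arg / total
```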

17
Direction for Next Quarter
  • Further analyze the gathered data for patterns
  • Compare trends in normal data to what occurs in
    malicious scripts