Title: ForNet: A Distributed Forensic Network
1 ForNet: A Distributed Forensic Network
- Kulesh Shanmugasundaram
- http://isis.poly.edu/projects/fornet/
2 Security Fails.
3 What Mechanisms Do We Have?
- Logging mechanisms and audit trails
- Alerts/Logs from security components
- Logs only perceived security threats
- Host Logs
- First thing to get disabled upon intrusion
- An insider would rather use a host without any logs
- Mobility and wireless networks create new problems
- Packet Logs
- Usually at the edges, hence easily blinded
- Can't keep data for long
- E.g., Infinistream, NFR, NetWitness, SilentRunner
4 What do we need?
- A reliable mechanism for analysis and attribution
- An effective response model
- Current response model is mostly manual
- Response time is in days or weeks
- Digital evidence disappears quickly
- Tools that complement security components
- Need a safety net to fall back on
- Forensics is that net
5 Challenges Facing Network Forensics
- Lack of Infrastructure
- For data collection, archival, and dissemination
- Volume of Data
- Prolonged storage, processing, and sharing of raw data is infeasible
- Even a network of 3000 hosts has a 1TB/day requirement!
- Process is Manual
- Spans multiple administrative domains
- Response times are very long (digital evidence disappears)
- Unreliable Logging Mechanisms
- Host logs are usually compromised
- Growing support for mobility makes it difficult
to maintain prudent logging policies on hosts
6 Our Solution
- Securely collect, store, disseminate, and process synopses of network traffic.
- In other words, a device analogous to a surveillance camera for the network. Even better, a co-operating network of such surveillance cameras.
- Goal of Project ForNet: development of tools, techniques, and infrastructure to aid rapid investigation and identification of cyber crimes
7 ForNet: A Unified, End-to-End Approach to Network Forensics
8 Design Goals and Trade-Offs
- Complete and Correct Evidence
- Longevity
- Succinctness
- Accessibility of Evidence
- Security and Privacy of Evidence
- Ubiquity
- Incremental Deployment
- Modular
- Scalable Design
9 Research Challenges
- Legally admissible evidence
- Chain of custody
- Privacy
- High-speed packet capture
- Storage systems
- Databases
- GUI
- Secure architecture
- Attacks on ForNet
- Use of ForNet
- Identification of events
- Architecture of ForNet
- Query Routing
- Fault Tolerance
- Design of Synopses
- Quantitative Analysis
10 Components of ForNet
- Architectural components can be grouped under the following functions
- Data Collection
- Data Retention
- Data Dissemination
11 Architecture of a SynApp
12 Synopses in ForNet
Currently being implemented
13 Architecture of the Forensic Server
14 Forensic Server
- The Forensic Server has the following functions
- Data archiving
- Query processing
- Policy management and enforcement
- Data Archiving
- Periodically receives data from SynApps and archives it on disk.
15 Forensic Server (cont.)
- Query Processor
- XML-based query language
- No relational databases (uses Berkeley DB)
- Performs coalescence of synopses
- Given a query, can locate the appropriate synopses
- Generates and executes a query plan to answer the query
- Distributed query processing
- Currently being implemented to talk to nearby forensic servers to answer/corroborate queries/evidence
- Policy Enforcer
- Two types of policies in ForNet
- Monitoring Policy: informs the user of what is being monitored
- Privacy Policy: informs the user of how/with whom data is shared, etc.
16 Architecture of Panorama
- A user interface for
- Building queries
- Data mining
- Visualization
17 Capturing Evidence on the Internet
- The Internet is a gigantic state machine
- To support forensics we need to know its precise state at any given time!
- Therefore, we need to keep track of state transitions
- State transitions can be characterized by
- Links/connections between nodes
- Link content
- Protocol mappings and aliases
- Aggregates generated by state transitions
Archiving raw traffic for this information is overkill!
18 What is a Synopsis?
- Data structures and algorithms that represent a set of elements succinctly, with a predefined loss of information, and that can recall the original set of elements with a preset accuracy.
19 Properties of a Good Synopsis
- Contains enough data to answer certain classes of queries
- Who sent payload XYZ?
- What did host bug.poly.edu send?
- Contains enough data to quantify confidence in its answers
- I'm 99.37% sure bug.poly.edu sent XYZ
- Has a small memory footprint and is easy to update
- Need 20GB/day to keep 1TB/day of raw network data
- Need to compute two hashes per packet
- Resource requirements are tunable
- Can only afford 3GB/day? Adjust the accuracy to accommodate this.
20 Advantages of Using Synopses
Can retain potential evidence for months!
- Succinct representation of raw data makes it possible to transfer network data to disks
- Sharing/transferring raw data over the network is impossible, but synopses can be moved to remote sites
- Query processing would be expensive with raw data
- What's the frequency of traffic to port 80 in the past week? (raw data vs. a histogram)
- Easily adaptable to various resource requirements
- Can adapt the size and processing requirements of a Bloom Filter to the available hardware resources and network load
Allows for cascading different techniques!
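To make the port-80 question concrete, here is a minimal Python sketch of a histogram synopsis. It assumes hourly buckets keyed by destination port. The class name `PortHistogram` and the bucket size are illustrative choices, not ForNet's actual format. With this layout, a one-week query reads at most 168 counters per port instead of rescanning the raw packets.

```python
import math
from collections import Counter


class PortHistogram:
    """Hypothetical histogram synopsis: packet counts per (time bucket, destination port)."""

    def __init__(self, bucket_seconds: int = 3600):
        self.bucket_seconds = bucket_seconds
        self.counts: Counter = Counter()

    def add(self, timestamp: float, dst_port: int) -> None:
        bucket = int(timestamp // self.bucket_seconds)
        self.counts[(bucket, dst_port)] += 1

    def count(self, dst_port: int, start: float, end: float) -> int:
        """Packets to dst_port in every bucket overlapping [start, end).

        Resolution is one bucket, so partial buckets at either edge count whole.
        Missing keys read as 0 (Counter semantics) and are not inserted.
        """
        first = int(start // self.bucket_seconds)
        last = math.ceil(end / self.bucket_seconds)  # exclusive
        return sum(self.counts[(b, dst_port)] for b in range(first, last))


hist = PortHistogram()
for ts, port in [(0, 80), (10, 80), (3700, 80), (3800, 443)]:
    hist.add(ts, port)

week = 7 * 24 * 3600
print(hist.count(80, 0, week))  # 3
print(hist.count(80, 0, 3600))  # 2
```

The trade-off is the one the slide names. The histogram cannot say *which* packets went to port 80, only how many went in each hour.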
21 Cascading of Synopses
22 Packet Digests: Bloom Filters
- Snoeren et al. used them successfully in SPIE for single-packet traceback (Hash-Based IP Traceback)
- Space efficient
- 16 bits per packet (m/n = 16) and 8 hashes (k = 8) give a false-positive (FP) rate of 5.74 x 10^-4
- No false negatives!
- However, suppose we don't have packets.
- We only have some excerpts of payload
- Don't know where the excerpt was aligned in the packet
Extend Bloom Filters to support excerpt/substring matching
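The false-positive figure above follows from the standard Bloom filter approximation FP ≈ (1 - e^(-k·n/m))^k. The short Python sketch below reproduces that figure. It also computes the textbook optimum k ≈ (m/n)·ln 2 for the same space budget. The function names are hypothetical.

```python
import math


def bloom_false_positive(bits_per_element: float, k: int) -> float:
    """Standard approximation of a Bloom filter's false-positive rate, (1 - e^(-k n/m))^k."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


def optimal_k(bits_per_element: float) -> int:
    """Hash count that minimizes the approximate FP rate for a given m/n: round((m/n) ln 2)."""
    return max(1, round(bits_per_element * math.log(2)))


print(f"{bloom_false_positive(16, 8):.2e}")  # 5.74e-04, the configuration on the slide
k_best = optimal_k(16)
print(k_best, f"{bloom_false_positive(16, k_best):.2e}")
```

With m/n = 16, the formula's optimum is k = 11. The slide's k = 8 computes fewer hashes per packet at a slightly higher FP rate.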
23 Block-based Bloom Filter (BBF)
Insert each block into a Bloom Filter
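Below is a minimal Python sketch of a Block-based Bloom Filter. It rests on several assumptions:

- each full, fixed-size block is inserted together with its block offset (block||offset);
- a trailing partial block is dropped;
- payloads are at most 1500 bytes;
- the SHA-256-based hashing, the class names and the parameters are illustrative, not ForNet's.

The alignment of an excerpt inside the payload is unknown, so the query tries every alignment and every starting offset.

```python
import hashlib
from typing import Optional, Tuple


class BloomFilter:
    """Plain Bloom filter; k bit positions come from seeded SHA-256 digests (illustrative)."""

    def __init__(self, m_bits: int, k: int, seed: bytes = b""):
        self.m, self.k, self.seed = m_bits, k, seed
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(self.seed + bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


class BlockBloomFilter:
    """Hypothetical BBF: every full block of size s is inserted as block||offset."""

    def __init__(self, block_size: int, m_bits: int = 1 << 20, k: int = 4,
                 max_payload: int = 1500, seed: bytes = b"bbf"):
        self.s = block_size
        self.max_blocks = max_payload // block_size
        self.bf = BloomFilter(m_bits, k, seed)

    @staticmethod
    def _key(block: bytes, offset: int) -> bytes:
        return block + offset.to_bytes(4, "big")  # block||offset

    def insert(self, payload: bytes) -> None:
        for i in range(len(payload) // self.s):  # a trailing partial block is dropped
            self.bf.add(self._key(payload[i * self.s:(i + 1) * self.s], i))

    def query(self, excerpt: bytes) -> Optional[Tuple[int, int]]:
        """Return (alignment within excerpt, block offset in payload) of a match, or None."""
        s = self.s
        for a in range(s):  # the excerpt's alignment is unknown
            blocks = [excerpt[j:j + s] for j in range(a, len(excerpt) - s + 1, s)]
            if not blocks:
                continue
            for start in range(self.max_blocks - len(blocks) + 1):
                if all(self._key(b, start + j) in self.bf for j, b in enumerate(blocks)):
                    return a, start
        return None


bbf = BlockBloomFilter(block_size=4)
bbf.insert(b"GET /index.html HTTP/1.1")
print(bbf.query(b"index.html"))           # (3, 2): skip 3 bytes, then block offset 2
print(bbf.query(b"nothing-here-at-all"))  # None

demo = BlockBloomFilter(block_size=1)
demo.insert(b"AB")
demo.insert(b"CD")
print(demo.query(b"AD"))                  # (0, 0): a false match
```

The last query shows the weakness that motivates the next slides. A was seen at offset 0 and D at offset 1, but in different packets. A BBF alone cannot tell them apart.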
24 [Figure: querying a BBF, e.g. H1(q0||0) = 1, H2(q0||0) = 1, H3(q0||0) = 0, so the lookup fails (X)]
q0q1q2 was seen in a payload at offset s
25 Offset Collisions
For query strings AD, CB, DR, AA, etc., the BBF falsely identifies them as seen in the payload!
This is because the BBF cannot distinguish between P1 and P2.
26 Hierarchical Bloom Filter
- An HBF is basically a set of BBFs for geometrically increasing block sizes.
27 Hierarchical Bloom Filter
- Querying is similar to querying a BBF.
- Matches at each level can be confirmed a level
above.
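Below is a minimal Python sketch of an HBF. It keeps the BBF conventions (block||offset keys, SHA-256-derived hashes) and makes these assumptions:

- level l uses blocks of size s·2^l;
- each level has its own Bloom filter.

A candidate match found at the bottom level is accepted only if every larger aligned block inside the matched region is also present at the levels above. The class names and parameters are hypothetical.

```python
import hashlib
from typing import Optional


class BloomFilter:
    """Plain Bloom filter; k bit positions come from seeded SHA-256 digests (illustrative)."""

    def __init__(self, m_bits: int, k: int, seed: bytes = b""):
        self.m, self.k, self.seed = m_bits, k, seed
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(self.seed + bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


class HierarchicalBloomFilter:
    """Hypothetical HBF: level l is a BBF over blocks of size s * 2**l (keys are block||offset)."""

    def __init__(self, block_size: int = 4, levels: int = 3, m_bits: int = 1 << 20,
                 k: int = 4, max_payload: int = 1500):
        self.s = block_size
        self.max_payload = max_payload
        self.filters = [BloomFilter(m_bits, k, seed=b"level%d" % l) for l in range(levels)]

    @staticmethod
    def _key(block: bytes, index: int) -> bytes:
        return block + index.to_bytes(4, "big")

    def insert(self, payload: bytes) -> None:
        for level, bf in enumerate(self.filters):
            size = self.s << level
            for i in range(len(payload) // size):
                bf.add(self._key(payload[i * size:(i + 1) * size], i))

    def _confirmed(self, excerpt: bytes, a: int, base: int, n: int) -> bool:
        """Every aligned block, at every level, lying inside [base, base + n*s) must be present."""
        end = base + n * self.s
        for level, bf in enumerate(self.filters):
            size = self.s << level
            pos = -(-base // size) * size  # first multiple of size at or after base
            while pos + size <= end:
                piece = excerpt[a + pos - base:a + pos - base + size]
                if self._key(piece, pos // size) not in bf:
                    return False
                pos += size
        return True

    def query(self, excerpt: bytes) -> Optional[int]:
        """Return the payload byte position where the excerpt starts, or None."""
        s = self.s
        for a in range(s):  # unknown alignment
            n = (len(excerpt) - a) // s  # full level-0 blocks available in the excerpt
            if n == 0:
                continue
            for o in range(self.max_payload // s - n + 1):
                base = o * s  # payload position of excerpt[a]
                if base >= a and self._confirmed(excerpt, a, base, n):
                    return base - a
        return None


hbf = HierarchicalBloomFilter(block_size=2, levels=3)
hbf.insert(b"ABCDEFGHIJKLMNOP")
print(hbf.query(b"CDEFGHIJ"))  # 2

small = HierarchicalBloomFilter(block_size=1, levels=2)
small.insert(b"AB")
small.insert(b"CD")
print(small.query(b"AD"))      # None
print(small.query(b"CD"))      # 0
```

The second filter replays the offset-collision case. The level-0 blocks of AD match, but the level-1 block AD at offset 0 was never inserted, so the HBF rejects the false match.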
28 Adapting an HBF for ForNet
- So far an HBF can attest to the presence of a bit-string in payloads
- We need to tie this bit-string to source and/or destination hosts
- Our Approach
- Similar to tying an offset to a block/bit-string
- In addition to inserting (block||offset), also insert (block||offset||hostid)
- Hostid could be (srcIP||dstIP)
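Below is a minimal Python sketch of this approach. It makes the following assumptions:

- a single Bloom filter holds both (block||offset) and (block||offset||hostid) keys;
- hostid = srcIP||dstIP is encoded as text;
- attribution happens at the block level only, not across a full HBF hierarchy, to keep the sketch short.

All class and method names are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (source IP, destination IP)


class BloomFilter:
    """Plain Bloom filter; k bit positions come from seeded SHA-256 digests (illustrative)."""

    def __init__(self, m_bits: int, k: int, seed: bytes = b""):
        self.m, self.k, self.seed = m_bits, k, seed
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(self.seed + bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


class AttributingBlockFilter:
    """Hypothetical block filter that also records which host pair sent each block."""

    def __init__(self, block_size: int = 4, m_bits: int = 1 << 20, k: int = 4,
                 max_payload: int = 1500):
        self.s = block_size
        self.max_blocks = max_payload // block_size
        self.bf = BloomFilter(m_bits, k, seed=b"attr")

    @staticmethod
    def _hostid(pair: Pair) -> bytes:
        return f"{pair[0]}|{pair[1]}".encode()  # hostid = srcIP||dstIP

    @staticmethod
    def _key(block: bytes, offset: int, hostid: bytes = b"") -> bytes:
        return block + offset.to_bytes(4, "big") + hostid

    def insert(self, payload: bytes, pair: Pair) -> None:
        hostid = self._hostid(pair)
        for i in range(len(payload) // self.s):
            block = payload[i * self.s:(i + 1) * self.s]
            self.bf.add(self._key(block, i))          # (block||offset)
            self.bf.add(self._key(block, i, hostid))  # (block||offset||hostid)

    def _seen(self, excerpt: bytes, hostid: bytes) -> bool:
        s = self.s
        for a in range(s):  # the excerpt's alignment is unknown
            blocks = [excerpt[j:j + s] for j in range(a, len(excerpt) - s + 1, s)]
            if not blocks:
                continue
            for start in range(self.max_blocks - len(blocks) + 1):
                if all(self._key(b, start + j, hostid) in self.bf for j, b in enumerate(blocks)):
                    return True
        return False

    def attribute(self, excerpt: bytes, candidates: Sequence[Pair]) -> List[Pair]:
        """Candidate (src, dst) pairs whose host-tagged blocks match the excerpt."""
        return [p for p in candidates if self._seen(excerpt, self._hostid(p))]


f = AttributingBlockFilter()
f.insert(b"hello world, this is a test", ("10.0.0.1", "10.0.0.2"))
f.insert(b"completely different bytes", ("10.0.0.3", "10.0.0.4"))
pairs = [("10.0.0.1", "10.0.0.2"), ("10.0.0.3", "10.0.0.4")]
print(f.attribute(b"world, this", pairs))  # [('10.0.0.1', '10.0.0.2')]
```

The query side needs a list of candidate host pairs to try. That is exactly the gap the next slide addresses.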
29 Coalescence of Synopses
- HBF requires
- Source IP, destination IP, excerpt
- But where do we get the source IP and destination IP?
- Connection Record/Neoflow
- Given two time intervals, can give us a list of source and destination IPs
- Coalesce these synopses
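The coalescence step can be sketched in Python as a two-stage query. It assumes two things:

- connection records carry (src, dst, start, end);
- the payload synopsis is exposed as a callable that answers "was this excerpt sent from src to dst?".

`ConnectionRecord`, `candidate_pairs`, and `coalesce` are hypothetical names. The `fake_hbf` function in the usage lines is a stand-in for a real HBF lookup.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source IP, destination IP)


@dataclass(frozen=True)
class ConnectionRecord:
    """Hypothetical connection record: who talked to whom, and when (seconds)."""
    src: str
    dst: str
    start: float
    end: float


def candidate_pairs(records: Iterable[ConnectionRecord], t_from: float, t_to: float) -> List[Pair]:
    """(src, dst) pairs whose connections overlap the interval [t_from, t_to]."""
    return sorted({(r.src, r.dst) for r in records if r.start <= t_to and r.end >= t_from})


def coalesce(records: Iterable[ConnectionRecord], t_from: float, t_to: float,
             excerpt: bytes, seen_from: Callable[[bytes, Pair], bool]) -> List[Pair]:
    """Query the payload synopsis only for pairs the connection records place in the window."""
    return [p for p in candidate_pairs(records, t_from, t_to) if seen_from(excerpt, p)]


def fake_hbf(excerpt: bytes, pair: Pair) -> bool:
    """Stand-in for a payload-synopsis (HBF) lookup; here it 'knows' a single pair."""
    return pair == ("10.0.0.9", "192.0.2.7")


records = [
    ConnectionRecord("10.0.0.5", "192.0.2.7", 100.0, 160.0),
    ConnectionRecord("10.0.0.9", "192.0.2.7", 120.0, 130.0),
    ConnectionRecord("10.0.0.5", "198.51.100.2", 900.0, 950.0),  # outside the window
]
print(candidate_pairs(records, 110.0, 140.0))
print(coalesce(records, 110.0, 140.0, b"some excerpt", fake_hbf))
```

The connection records act as a cheap first filter. The more expensive payload lookup runs only for the host pairs that were actually talking during the interval.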
30 ForNet in Intranet Usage
- Investigation based on payload characteristics
- Determine victims of worms, trojans, and other malware
- Detection of potential victims of phishing and spyware
- Source of IP theft
- Investigation based on connection characteristics
- Detection of zombies
- Detection of malware based on connection patterns
- Investigation based on aggregate characteristics
- Insider abuse
- Network resource abuse
- Etc.
31 ForNet Deployed on the Internet
- Investigation based on payload characteristics
- Traceback based on partial content of single packets
- Source of malware, worms, etc.
- Investigation based on connection characteristics
- Stepping stone detection
- Investigation based on aggregate characteristics
- Attack attribution
32 Current Status
- Implemented a PC-based SynApp device for placement within an intranet.
- Connection records, HBF, DNS, Neoflow, Mappings
- Implemented a Forensic Server with simple querying capabilities.
- The current Forensic Server has 1.3TB of storage with over 3 months' worth of data from the edge router and two subnets
- Normal bandwidth consumption of the network is about 1-2 TB/day
- Synopses reduce this traffic to about 20GB/day
- A 4TB Forensic Server will take over operations in July
- Implemented Panorama (GUI Client)
33 Tracking MyDoom
- Recorded all email traffic for a week
- Using HBF and raw traffic
- Was not aware of MyDoom during this collection
- When signatures became available, we used them to query the system
- To find hosts that are infected in our network
- How the hosts were infected
- Some statistics
- 679 hosts originated at least one copy of the virus
- 52 of which were in our network
- These hosts sent out copies of the virus to 2011 hosts outside our network boundary
34 Analyzing MyDoom Infections
35 MyDoom's Weekly Progress
36 Neoflow-NG
- Keeps track of everything Neoflow has, plus
- Individual packet sizes
- Be able to recall the packet sizes in the same sequence
- Inter-arrival times of packets
- Be able to recall the arrival times of individual packets
- Content type of flow
- Be able to recall the types of content carried by flows
- Audio, Video, Plain-Text, Encrypted, Compressed, etc.
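Below is a minimal Python sketch of a per-flow record. It recalls packet sizes in order and rebuilds arrival times from stored inter-arrival gaps. Several parts are assumptions for illustration:

- the flow key layout;
- classifying content only from the first packet's payload;
- using a byte-entropy threshold to flag encrypted or compressed data. This is one common heuristic, not necessarily what Neoflow-NG uses.

Audio/video detection is left out.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from itertools import accumulate


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def guess_content_type(sample: bytes) -> str:
    """Illustrative heuristic only: high entropy suggests encrypted or compressed data."""
    if byte_entropy(sample) > 7.5:
        return "encrypted/compressed"
    if all(32 <= b < 127 or b in (9, 10, 13) for b in sample):
        return "plain-text"
    return "other"


@dataclass
class FlowRecordNG:
    """Hypothetical Neoflow-NG-style record for one flow."""
    key: tuple                                       # e.g. (src, sport, dst, dport, proto)
    first_ts: float | None = None
    last_ts: float | None = None
    sizes: list[int] = field(default_factory=list)   # packet sizes, in arrival order
    gaps: list[float] = field(default_factory=list)  # inter-arrival times, seconds
    content_type: str | None = None

    def add_packet(self, ts: float, size: int, payload: bytes = b"") -> None:
        if self.first_ts is None:
            self.first_ts = ts
            if payload:
                self.content_type = guess_content_type(payload)
        else:
            self.gaps.append(ts - self.last_ts)
        self.last_ts = ts
        self.sizes.append(size)

    def arrival_times(self) -> list[float]:
        """Rebuild absolute arrival times from the first timestamp and the stored gaps."""
        if self.first_ts is None:
            return []
        return list(accumulate([self.first_ts] + self.gaps))


rec = FlowRecordNG(key=("10.0.0.1", 1234, "10.0.0.2", 80, "tcp"))
rec.add_packet(10.0, 60, b"GET / HTTP/1.1\r\n")
rec.add_packet(10.5, 1500)
rec.add_packet(11.0, 40)
print(rec.sizes, rec.arrival_times(), rec.content_type)
```

Storing gaps instead of absolute timestamps keeps the values small, which helps if they are later compressed. Absolute times are still recoverable with a running sum.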
37 Investigating Network Resource Abuse
38 Total Network Bandwidth Composition
39 Detection of Network Proxies
40 Zombie Detection
- Are there any zombies in the Poly network? And if so, how do we identify them?
- Criterion: hosts that portscan and use IRC are likely zombies.
- Tabulate the amount of IRC activity for each host.
- Tabulate the amount of portscan activity for each host.
- For each host, compute ZOMBIE_LIKELIHOOD from PORTSCANS and IRC_FLOWS.
- Sort all hosts by their ZOMBIE_LIKELIHOOD.
- List the top 200 hosts.
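The zombie-detection recipe can be sketched in Python over summarized connection records. Several details are assumptions:

- IRC is identified by destination port 6667 alone;
- portscan activity is the number of distinct (destination, port) pairs a host tried without an established connection;
- ZOMBIE_LIKELIHOOD is taken as the product PORTSCANS x IRC_FLOWS, so a host must show both behaviors to score above zero.

The `Conn` type and `zombie_ranking` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Conn(NamedTuple):
    """Hypothetical summary of one connection attempt taken from a connection record."""
    src: str
    dst: str
    dport: int
    established: bool


IRC_PORTS = {6667}  # assumption: IRC identified by destination port alone


def zombie_ranking(conns: Iterable[Conn], top: int = 200) -> List[Tuple[str, int]]:
    irc_flows: Counter = Counter()
    probes: Dict[str, Set[Tuple[str, int]]] = {}
    for c in conns:
        if c.established and c.dport in IRC_PORTS:
            irc_flows[c.src] += 1                                   # IRC activity per host
        elif not c.established:
            probes.setdefault(c.src, set()).add((c.dst, c.dport))   # unanswered attempt
    portscans = {host: len(targets) for host, targets in probes.items()}
    hosts = set(irc_flows) | set(portscans)
    # assumption: ZOMBIE_LIKELIHOOD = PORTSCANS x IRC_FLOWS
    likelihood = {h: portscans.get(h, 0) * irc_flows[h] for h in hosts}
    ranked = sorted(hosts, key=lambda h: (-likelihood[h], h))       # highest first, ties by name
    return [(h, likelihood[h]) for h in ranked[:top]]


conns = (
    [Conn("A", "irc.example", 6667, True)] * 2
    + [Conn("A", f"10.1.0.{i}", 22, False) for i in range(3)]
    + [Conn("B", f"10.2.0.{i}", 445, False) for i in range(10)]
    + [Conn("C", "irc.example", 6667, True), Conn("C", "10.1.0.9", 23, False)]
)
print(zombie_ranking(conns))  # [('A', 6), ('C', 1), ('B', 0)]
```

Host B scans the most but never uses IRC, so the product ranks it last. A weighted sum would instead let heavy scanning alone push a host up the list.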
41 Work currently in progress
- Identification of useful network events
- The network is the virtual crime scene that holds evidence in the form of network events
- Developing efficient synopses
- Handling connection-oriented and connectionless traffic
- Techniques for payload attribution (esp. IRC/IM)
- Automatic cascading of various synopsis techniques
- Analysis and mining techniques
- Zombie detection
- Stepping stone detection
42 Work currently in progress
- Integration of information from synopses across networks
- Development of a protocol for secure communication of various ForNet components
- Query processing and storage of synopses
- A query language transparent to the various underlying synopsis techniques
- A query management system to interpret the language for the underlying database
- Various storage and garbage collection strategies for collected-SynApps
- Storage and query processing infrastructure for Forensic Servers
43 Research Challenges
- Integration of information from synopses across networks
- The real power of ForNet is realized when information from SynApps is fused to answer queries
- Development of a protocol for secure communication of various ForNet components
- Query processing and storage of synopses
- A query language transparent to the various underlying synopsis techniques
- A query management system to interpret the language for the underlying database
- Analysis
- A framework to compare effectiveness/trade-offs of synopses
44 Attacks on ForNet
- Resource Exhaustion
- Flood the network with random bits of data
- Malicious Transformation
- Create packets of length (blocksize - 1)
- Stuffing
- Stuff every other block with application-dependent escape characters
- For smaller blocks we can try to guess; for larger blocks it is not possible!
- Exploiting Collisions
- Hash collisions
- Very unlikely for strong hash functions
- We use a random seed for every HBF, which makes this more difficult
- Packet collisions
- A possibility in Block-based Bloom Filters but not in HBFs
- Streaming transformations
- Encryption, compression
45 For more information, visit http://isis.poly.edu/projects/fornet/