Title: High-throughput Linked-Pattern Matching for Intrusion Detection Systems
1 High-throughput Linked-Pattern Matching for Intrusion Detection System
- Zachary Baker and Viktor K. Prasanna
- University of Southern California
- http://ceng.usc.edu/prasanna
2 Outline
- Introduction to Intrusion Detection and hardware pattern matching
- Performance-centered design flow
- Area, performance over large rule databases is more important
- Methodology
- Library of architectural options
- Separate pre-decoded pipelines
- Basic architecture results
- Customized performance
- Partitioning
- Prefix trees
- Correlated Content
- Tool details
- Reducing PAR cost through incremental synthesis
- Handling multiple streams through re-sending
- Efficiency of re-sending strategy
- Conclusion
3 IDS: Intrusion Detection Systems
- All incoming packets are filtered for specific characteristics or content
- Current databases have thousands of patterns requiring string matching
- FPGA allows fine-grained parallelism and computational reuse
- 10 Gb/s and higher rates desired
- This is a fairly artificial bound; header processing can reduce the overall string matching burden
- Provided by pipelined, streaming architectures
4 String Matching
- Throughput of units must equal maximum buffered traffic on network
- Current Strategies
- Naïve approach
- Hashing à la Bloom filters
- KMP, Boyer-Moore, Aho-Corasick (especially bit-splitting, Eatherton trie)
- Hardwired shift-and-compare
- Very fast and simple units
- Allows a variety of interesting meta-layer work to be tacked on
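As a rough software model of the hardwired shift-and-compare idea, the sketch below shifts one byte per "cycle" into a register as long as the pattern and compares the whole register against the pattern. The function name and the use of a deque as the shift register are illustrative assumptions, not the hardware described in the slides.

```python
from collections import deque

def shift_and_compare(stream: bytes, pattern: bytes) -> list:
    """Return the end offsets at which `pattern` occurs in `stream`."""
    window = deque(maxlen=len(pattern))   # models the shift register
    target = tuple(pattern)
    hits = []
    for offset, byte in enumerate(stream):
        window.append(byte)               # one byte shifts in per "clock cycle"
        if tuple(window) == target:       # every comparator agrees this cycle
            hits.append(offset)
    return hits

print(shift_and_compare(b"xxattackxattack", b"attack"))
```

In hardware every pattern would have its own comparators running in parallel; the loop above only models a single unit.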
5 Performance-Centered Design Flow
- System-centric view
- Several thousand pattern matching rules
- Unit view of matching units can be inefficient
- Unnecessary work and hardware
- Methodology works towards time and area performance metrics
- System level design
- Reuse of hardware elements
- Option to exchange some area efficiency for bandwidth
- Allows patterns to be combined to extract deeper information
6 Tool Flow
7 Library of Design Options
- Basic Architecture
- Shared decoding
- Characters are decoded into one-bit pipelines and distributed to units as needed
- Correlated Content Linkages
- Reduces false positives
- Reduces burden on external controller
- Customized performance
- Prefix trees
- Take advantage of partition-generated similarity by creating shared prefixes
- High throughput architectures
- Take advantage of low area requirements to replicate hardware
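The shared-decoding option can be pictured with a small software model: each input character is decoded once into a one-bit signal per character of interest, each signal is delayed through its own pipeline, and a pattern matches when the right bit is set at the right delay for every position. The function name and data layout are assumptions made for illustration only.

```python
def predecoded_match(stream: bytes, patterns: list) -> dict:
    """Model of shared decoding: one delayed one-bit pipeline per character."""
    alphabet = {c for p in patterns for c in p}    # decoded once, shared by all units
    depth = max(len(p) for p in patterns)
    # pipes[c][d] is 1 if character c appeared d cycles ago
    pipes = {c: [0] * depth for c in alphabet}
    hits = {p: [] for p in patterns}
    for offset, byte in enumerate(stream):
        for c, pipe in pipes.items():
            pipe.insert(0, 1 if byte == c else 0)  # shift the decoded bit in
            pipe.pop()                             # oldest bit falls off the end
        for p in patterns:
            m = len(p)
            # AND together the bit for p[i] at delay m-1-i
            if all(pipes[p[i]][m - 1 - i] for i in range(m)):
                hits[p].append(offset)
    return hits

print(predecoded_match(b"xxattack pattern", [b"attack", b"pattern"]))
```

Note that `attack` and `pattern` share the characters a, t and so on, so in this model they tap the same pipelines rather than each decoding the input separately.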
8 Basic Architecture: Partitioned Pipelines
9 Basic Architecture: Pre-Processing
- System-level partitioning of patterns
- Reduction in pipeline burden through min-cut partitioning
- Shared characters are grouped into independent pipelines, increasing single-chip throughput and allowing for effective multi-chip partitioning
- Tool first generates a graph representation of the pattern set, then executes the partitioning routine. The partitioned graphs are then translated into an architecture description
- Partitioning is also useful in reducing place and route (PAR) time
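The slides do not spell out the graph representation, so the following is only a guess at one plausible form: patterns are nodes, and an edge is weighted by the number of distinct characters two patterns share. The helper names and the edge metric are assumptions; the real tool hands the graph to the Metis library, which is not reproduced here. The sketch just evaluates a given partition assignment.

```python
from itertools import combinations

def build_pattern_graph(patterns):
    """Edge weight = number of distinct characters two patterns share (assumed metric)."""
    graph = {}
    for a, b in combinations(patterns, 2):
        shared = len(set(a) & set(b))
        if shared:
            graph[(a, b)] = shared
    return graph

def cut_weight(graph, assignment):
    """Total weight of edges whose endpoints land in different partitions."""
    return sum(w for (a, b), w in graph.items() if assignment[a] != assignment[b])

def pipeline_burden(assignment):
    """Distinct decoded characters each partition's pipeline has to carry."""
    chars = {}
    for pattern, part in assignment.items():
        chars.setdefault(part, set()).update(pattern)
    return {part: len(s) for part, s in chars.items()}

graph = build_pattern_graph(["abc", "abd", "xyz", "xyw"])
good = {"abc": 0, "abd": 0, "xyz": 1, "xyw": 1}
bad = {"abc": 0, "xyz": 0, "abd": 1, "xyw": 1}
print(cut_weight(graph, good), pipeline_burden(good))
print(cut_weight(graph, bad), pipeline_burden(bad))
```

Under this assumed metric, a lower cut weight goes together with fewer characters per pipeline, which is the burden reduction the slide refers to.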
10 Results - Basic Architecture
- Partitioning results in a 20% increase in clock frequency
- Optimal number of partitions is unpredictable
11 Flow Options: Area-Efficient Tree Architecture
- 4-byte prefixes turn out to be very appropriate for intrusion detection
- /cgi-bin/bigconf.cgi, /cgi-bin/common/listrec.pl, /cgi-sys/addalink.cgi, /cgi-sys/entropysearch.cgi
- Script searches for 4-byte prefixes and sub-prefixes, then generates prefix matchers in hardware description.
- Essentially a hardware Aho-Corasick tree
- Average of 15% decrease in area, 5% decrease in clock period over plain unary
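The prefix search can be sketched as a simple grouping step. The code below groups patterns by their first four bytes and counts how many per-character comparisons a shared prefix matcher would avoid. The function names and the savings count are illustrative assumptions, not the tool's actual area model.

```python
from collections import defaultdict

def group_by_prefix(patterns, prefix_len=4):
    """Group patterns that share the same leading `prefix_len` characters."""
    groups = defaultdict(list)
    for p in patterns:
        groups[p[:prefix_len]].append(p)
    return dict(groups)

def comparator_savings(patterns, prefix_len=4):
    """Character comparisons saved if each group shares one prefix matcher."""
    groups = group_by_prefix(patterns, prefix_len)
    return sum(len(prefix) * (len(members) - 1) for prefix, members in groups.items())

rules = [
    "/cgi-bin/bigconf.cgi",
    "/cgi-bin/common/listrec.pl",
    "/cgi-sys/addalink.cgi",
    "/cgi-sys/entropysearch.cgi",
]
print(group_by_prefix(rules))
print(comparator_savings(rules))
```

All four example patterns share the prefix `/cgi`, so one prefix matcher can feed four suffix matchers.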
12 Correlated Content Layer
- Link together pattern matchers
- Form state machines from low-level comparators
- form higher-level ideas
- basic regular expressions are available
- alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"attackpattern"; content:"attack"; content:"pattern"; within:5;)
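A counter-based link between two matchers might look like the sketch below. It treats the distance constraint as "the second pattern starts at most `within` bytes after the first one ends", which is a deliberate simplification and may not match the exact rule semantics of any particular IDS. All names are hypothetical.

```python
def find_all(stream: bytes, pattern: bytes) -> list:
    """Start offsets of every occurrence of `pattern`, overlaps included."""
    starts, i = [], stream.find(pattern)
    while i != -1:
        starts.append(i)
        i = stream.find(pattern, i + 1)
    return starts

def correlated_match(stream: bytes, first: bytes, second: bytes, within: int) -> bool:
    """True if `second` starts no more than `within` bytes after `first` ends."""
    first_ends = {s + len(first) for s in find_all(stream, first)}
    second_starts = set(find_all(stream, second))
    window_left = 0                       # counter; idle while zero
    for offset in range(len(stream) + 1):
        if offset in first_ends:
            window_left = within + 1      # (re)arm the counter
        if window_left > 0:
            if offset in second_starts:
                return True
            window_left -= 1              # counter runs down one byte at a time
    return False

print(correlated_match(b"GET /attack12345pattern", b"attack", b"pattern", 5))
print(correlated_match(b"GET /attack123456pattern", b"attack", b"pattern", 5))
```

The counter is only armed after the first pattern has matched, so it is the only state that has to be kept for this link.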
13 Correlated Content Layer
- Benefits
- reduces burden on external controller
- reduces number of inputs to priority encoder
- basic regular expression functionality
- AND, OR, !, within, distance, character classes
- Disadvantages
- Adds state that has to be maintained when streams switch
- But only for the counters that are active
14 Tool Details
- Implemented in Perl
- Text-oriented language for batch processing of text (pattern databases) and generation of VHDL outputs
- Utilizes the Metis partitioning library (U/Minn)
- Template-based generation of architecture descriptions
- Graph Creation, Partitioning
- Run time: 30 seconds for < 2000 patterns
- Insignificant time costs compared to improvements in performance
- Place and Route processing times dwarf architecture generation costs
- Problem with all hardwired shift-and-compare architectures
15 Small Changes to Rule-set
- In normal flow, a change to a single character would result in recompilation of the entire design
- Wasteful and a lengthy process
- In general, routing tools do not handle small changes well
- Reduced frequency performance
- Interaction of interconnect and mapping is highly connected to performance
- However, if blocks of architecture can be physically separated on the device, interaction is eliminated
- Creates a smaller place and route problem
- Small changes can be integrated without full recompilation
16 Increasing Speed: Place & Route
- Key: Predefinition of area constraints for each partition
- ideally the partitions are balanced
- Underutilization of device blocks makes meeting timing constraints easier
17 Increasing Speed: Place & Route
- Definitions
- The optimal partition is selected from the set of partitions P
- Sp is the set of characters required to represent the new pattern p
- Sp \ Pi is the set difference: the characters present in Sp that are not currently represented in Pi
- The partition which will require the addition of the minimum number of new characters is the optimal partition Pj
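The selection rule in the definitions above can be written out directly: for each partition, count the characters of the new pattern that the partition does not yet represent, and pick the partition with the smallest count. The function name and the tie-breaking rule (first minimum wins) are assumptions.

```python
def choose_partition(partitions: list, pattern: str) -> int:
    """Return the index j of the partition needing the fewest new characters."""
    needed = set(pattern)                                   # Sp
    costs = [len(needed - chars) for chars in partitions]   # new characters per Pi
    return costs.index(min(costs))                          # first minimum on ties

partitions = [set("abcd"), set("wxyz")]
print(choose_partition(partitions, "abe"))   # only "e" is new for partition 0
print(choose_partition(partitions, "zzy"))   # nothing is new for partition 1
```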
18 Incremental Synthesis
- Goal: Reduce place and route costs
- Using ISE 6.2 Incremental Synthesis support, each partition has independent area constraints on device
- Change/addition in one partition does not affect the placement of other partitions
- Cost for changing rules in one of k partitions
- 1/k guide file processing overhead
19 System Packet Flow
- Packets are reordered and packet contents are sent as a stream to string matching units
20 Suffix Resending
- If an attack spans multiple packets it will not be detected if the system looks at packets one by one
- Packets must be condensed into a stream
- If time multiplexing is required, some section of the previous session can be pre-pended to the new packets
- Reserved section equal to the length of the longest attack
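A minimal software sketch of suffix resending, assuming a per-session dictionary of saved tails and plain substring search in place of the hardware matchers (both are illustrative choices, and the class name is hypothetical):

```python
class ResendingScanner:
    """Keeps the tail of each session so patterns spanning batches are still seen."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        # Reserved section: length of the longest attack, as on the slide.
        self.reserve = max(len(p) for p in self.patterns)
        self.tails = {}                       # session id -> saved suffix

    def scan(self, session_id, payload: bytes) -> set:
        data = self.tails.get(session_id, b"") + payload   # pre-pend saved suffix
        found = {p for p in self.patterns if p in data}
        self.tails[session_id] = data[-self.reserve:]      # keep suffix for next time
        return found

scanner = ResendingScanner([b"attack"])
print(scanner.scan("A", b"xxatt"))    # first half only, nothing found yet
print(scanner.scan("B", b"ack"))      # a different session is interleaved
print(scanner.scan("A", b"ackyy"))    # session A resumes and the pattern completes
```

Because the reserved section is as long as the longest pattern, a match that lies entirely inside the saved suffix can be reported a second time; a real system would need to account for that.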
21 Suffix Resending
- The necessity to resend packets causes some inefficiency in a multiple stream system
- However, TCP and IP header overhead does not need to be handled by the string matching system, allowing us to make up the difference
- Average internet packet is 402 bytes long
- Longest attack in our survey of the Hogwash database is 257 bytes
- TCP and IP headers equal 40 bytes
- Thus, if 7 packets are issued to the string matching units at a time, the overheads are equalized and efficiency is 100%
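The break-even figure can be checked with a few lines of arithmetic. The sketch assumes that the 402-byte average includes the 40 bytes of headers and defines efficiency as network bytes received per byte the matcher has to process; both are interpretations rather than statements from the slides.

```python
import math

AVG_PACKET = 402       # bytes, average internet packet (from the slide)
HEADERS = 40           # bytes, TCP + IP headers
LONGEST_ATTACK = 257   # bytes, resent suffix per batch

def break_even_packets(resend=LONGEST_ATTACK, headers=HEADERS):
    """Smallest batch size whose skipped headers cover the resent suffix."""
    return math.ceil(resend / headers)

def efficiency(n, packet=AVG_PACKET, headers=HEADERS, resend=LONGEST_ATTACK):
    """Network bytes per matcher byte for a batch of n packets (assumed definition)."""
    wire_bytes = n * packet
    matcher_bytes = n * (packet - headers) + resend
    return wire_bytes / matcher_bytes

print(break_even_packets())
print(round(efficiency(6), 3), round(efficiency(7), 3))
```

With 7 packets the skipped headers amount to 280 bytes, which covers the 257 resent bytes; with 6 packets they amount to only 240 bytes.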
22 Overview
- Variations in tool flow provide customizable performance
- Tool Options
- Small partitioned and pre-decoded architecture
- Prefix trees
- Fast k-way architecture
- Potential reduction in hardware reconfiguration time
- Fast reconfiguration
- KMP architecture (FPGA 04)
- Meta-information can be extracted with correlation
- Thanks!
- For more information, http://ceng.usc.edu/prasanna