Title: Network Security Monitoring and Analysis based on Big Data Technologies
1Network Security Monitoring and Analysis based
on Big Data Technologies
August 26, 2013
2Outline
- Motivation
- Objectives
- System Design
- Monitoring and Visualization
- Network Measurement
- Classification and Identification of Network Objects
- Conclusion
- Future Work
3Motivation
- Traditional security systems assume a static system
- Network attacks
- sophisticated
- organized
- targeted
- persistent
- dynamic
- external
- internal
5Motivation
Cybersecurity, Big Data, Machine Learning
6Motivation
- Problem: Network security is becoming more challenging
- Resource: A large amount of security data
- Network flow
- Firewall log
- Application log
- Server log
- SNMP
- Opportunity: Big Data technologies, machine learning
7Objectives
- A network security monitoring and analysis system based on Big Data technologies that provides
- Network measurement
- Real-time continuous monitoring and interactive visualization
- Intelligent classification and identification of network objects, using role behavior as context
8Objectives
Network Security
Big Data
Machine Learning
9System Design
- Components
- Data collection
- Data storage
- Security gateway
- Data processing
- User Interfaces
- Features
- Monitoring and visualization
- Measurement
- Intelligent analysis
11System Design
12System Design
13System Design
14System Design
16System Design
- The design supports the following features
- Real-time continuous monitoring and interactive visualization
- Network measurement
- Classification and identification of network objects
17Monitoring and Visualization
- Real Time
- response within a time constraint
- Interactive
- involve user interaction
- Continuous
- continues to be effective over time in light of the inevitable changes that occur (NIST)
18Monitoring and Visualization
- Retrieve Data
- Web User Interfaces
- Video Demo
19Monitoring and Visualization
- Data Retrieving
- Data are stored with the IP address as the primary key and the time slice as the secondary key in a column
- Accessing these data takes constant time, O(1)
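The keyed layout described on this slide can be illustrated with a small in-memory sketch. It is not the system's actual storage engine, which the slides do not name; the class `FlowStore`, the 60-second slice length, and the counter fields are assumptions made for illustration. Nested hash maps give average constant-time access by (IP, time slice).

```python
from collections import defaultdict


class FlowStore:
    """Toy store: row key = IP address, column key = time slice (hypothetical layout)."""

    def __init__(self, slice_seconds=60):
        self.slice_seconds = slice_seconds  # assumed slice length
        self.rows = defaultdict(dict)       # ip -> {time_slice -> counters}

    def time_slice(self, timestamp):
        # Map a Unix timestamp to the index of the slice that contains it.
        return int(timestamp) // self.slice_seconds

    def add(self, ip, timestamp, n_bytes):
        # Two hash lookups: one for the row, one for the column.
        cell = self.rows[ip].setdefault(
            self.time_slice(timestamp), {"packets": 0, "bytes": 0}
        )
        cell["packets"] += 1
        cell["bytes"] += n_bytes

    def get(self, ip, timestamp):
        # Returns None when the host or the slice has no data.
        return self.rows.get(ip, {}).get(self.time_slice(timestamp))


store = FlowStore()
store.add("10.0.0.1", 120, 500)
store.add("10.0.0.1", 130, 700)
print(store.get("10.0.0.1", 125))  # {'packets': 2, 'bytes': 1200}
```

A query for one host at one time therefore never scans other hosts or other slices, which is the property a real-time dashboard relies on.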
20Real Time Querying
21Host Network Connection
22Network Status
23Top N
24Demo of Interactivity and Continuity
Video Demo
25Network Measurement
- A case study
- The Anonymity Technology Usage on Campus Network
- Using sFlow
- Geo-Location
- Usage of Anonymity Systems
26Geo-location of Anonymity Usage on Campus
- One instance: Bahamas, Belarus, Belgium, Bulgaria, Cambodia, Chile, Colombia, Estonia, Ghana, Greece, Hungary, Ireland, Israel, Jamaica, Jordan, Korea, Mongolia, Namibia, Nigeria, Pakistan, Panama, Philippines, Slovakia, Turkey, Ukraine, Vietnam, Zimbabwe
- Two instances: Chad, Czech Republic, Denmark, Hong Kong, Iran, Japan, Kazakhstan, Poland, Romania, Spain, Switzerland
- Three instances: Austria, France, Singapore
- Four instances: Australia, Indonesia, Taiwan, Thailand
27Usage of Anonymity Systems
System        Packets (%)      Traffic in MB (%)   Observed IPs (%)
Proxies       5,580 (62.65)    8.13 (43.53)        234 (3.23)
Tor           3,129 (35.13)    9.04 (48.37)        152 (0.25)
I2P           190 (2.13)       1.50 (8.02)         23 (1.01)
Commercial    7 (0.08)         0.016 (0.08)        2 (N/A)
Total         8,906 (100)      16.69 (100)         411 (N/A)
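The figures in parentheses appear to be each system's share of the column total, in percent. As a quick check under that assumption, the sketch below recomputes the packet shares from the packet counts in the table; only the packet column is covered here.

```python
# Packet counts copied from the table above.
packets = {"Proxies": 5580, "Tor": 3129, "I2P": 190, "Commercial": 7}

total = sum(packets.values())
# Share of each system in percent, rounded to two decimals as in the table.
shares = {name: round(100 * count / total, 2) for name, count in packets.items()}

print(total)   # 8906
print(shares)  # {'Proxies': 62.65, 'Tor': 35.13, 'I2P': 2.13, 'Commercial': 0.08}
```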
28Classification of Host Roles
- Data: three months of sFlow data from a large campus
Role               Count
Client             5494
Server             1920
Public Place       784
Personal Office    416
College1           163
College2           253
Web Server         56
Web Email Server   25
29Classification of Host Roles
- Algorithms
- Decision Tree
- On-line SVM
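To make "on-line" concrete, here is a minimal sketch of a linear SVM that is updated one sample at a time with stochastic gradient steps on the hinge loss. It is a generic textbook-style learner, not the authors' implementation; the class name, the learning rate, the regularization strength, and the two toy features are all assumptions.

```python
class OnlineLinearSVM:
    """Linear SVM trained incrementally with hinge-loss SGD (illustrative only)."""

    def __init__(self, n_features, learning_rate=0.1, reg=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate  # assumed step size
        self.reg = reg                      # assumed L2 regularization strength

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, x, y):
        """Update the model on a single sample; y must be +1 or -1."""
        violated = y * self.decision(x) < 1.0
        # L2 shrinkage is applied on every step.
        self.w = [wi * (1.0 - self.learning_rate * self.reg) for wi in self.w]
        if violated:
            # The sample is inside the margin: move the hyperplane toward it.
            self.w = [wi + self.learning_rate * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.learning_rate * y

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1


# Toy stream: two hypothetical normalized features per host, +1 = server, -1 = client.
stream = [([0.9, 0.1], +1), ([0.1, 0.9], -1), ([0.8, 0.2], +1), ([0.2, 0.8], -1)]

model = OnlineLinearSVM(n_features=2)
for _ in range(50):              # replay the stream to mimic continuous arrival
    for x, y in stream:
        model.partial_fit(x, y)

print(model.predict([0.85, 0.1]), model.predict([0.1, 0.85]))
```

Because each update touches only the current sample, the model can keep learning as new flow data arrive, without retraining on the full history.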
30Classification of Host Roles
- Features
- Ad hoc based on domain knowledge
- Aggregating features for on-line classification
- 24 features normalized between 0 and 1, inclusive
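The slides do not say how the features are scaled to the range 0 to 1. One common choice is min-max scaling per feature, sketched below as an assumption; the function name and the sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Three hosts, three raw features each (made-up values).
raw = [[10, 64, 1], [20, 128, 1], [30, 64, 1]]
print(min_max_normalize(raw))
# [[0.0, 0.0, 0.0], [0.5, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

For on-line classification the minimum and maximum would have to be fixed in advance or tracked as running values, since the full data set is not available up front.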
31Classification of Host Roles
- Features
- 24 features derived from
- src/dest IP address
- src/dest Port number
- TTL
- Packet size
- Transport protocol
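The 24 features themselves are not listed on the slides. As a sketch of what aggregating flow samples into per-host features could look like, the example below derives a few illustrative features from the fields named above. The `Sample` record, the function `host_features`, and the specific features are hypothetical.

```python
from collections import namedtuple

# One sampled packet header; field names are assumptions for illustration.
Sample = namedtuple("Sample", "src_ip dst_ip src_port dst_port ttl size proto")


def host_features(samples, host):
    """Aggregate the samples sent by `host` into a small feature dictionary."""
    sent = [s for s in samples if s.src_ip == host]
    if not sent:
        return None
    n = len(sent)
    return {
        "distinct_peers": len({s.dst_ip for s in sent}),
        # Servers tend to answer from well-known (below 1024) source ports.
        "low_src_port_fraction": sum(s.src_port < 1024 for s in sent) / n,
        "mean_packet_size": sum(s.size for s in sent) / n,
        "mean_ttl": sum(s.ttl for s in sent) / n,
        "tcp_fraction": sum(s.proto == "tcp" for s in sent) / n,
    }


samples = [
    Sample("10.0.0.5", "10.0.1.1", 80, 50000, 64, 1500, "tcp"),
    Sample("10.0.0.5", "10.0.1.2", 80, 50001, 64, 500, "tcp"),
    Sample("10.0.1.1", "10.0.0.5", 50000, 80, 128, 60, "tcp"),
    Sample("10.0.0.5", "10.0.1.1", 53, 40000, 64, 100, "udp"),
]
features = host_features(samples, "10.0.0.5")
print(features["distinct_peers"], features["low_src_port_fraction"])  # 2 1.0
```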
32Classification of Host Roles
- Ground Truth
- Host Information in Active Directory
- Crawler to validate its status
33Classification of Host Roles
- Classifying Client vs. Server
- Classifying Web Server vs. Web Email Server
- Classifying Hosts at Personal Office vs. Public Place
- Classifying Hosts at Two Different Colleges
- Feature Contributions
34Classifying Client vs. Server
35Classifying Web Server vs. Web Email Server
36Classifying Host From Personal Office vs. Public Place
37Classifying Host From Two Different Colleges
38Accuracy
- High accuracy of host role classification
Classification                                   Accuracy (%)
Clients vs. servers                              99.2
Regular web server vs. web email server          100
Hosts from personal offices vs. public places    93.3
Hosts from two different colleges                93.3
39Feature Contribution
40Identification of a User
- Data: NetFlow data from a large campus
Role        Count
College1    163
College2    253
41Identification of a User
- Algorithms
- Decision Tree
- On-line SVM
- Ground Truth
- Host Information in Active Directory
- Crawler to validate its status
42Identification of a User
- Features
- Discrete probability distribution function (pdf)
- An Example
- System port numbers: 6, 8, 9, 11, 14, 30, 80, 1020
- Outlier fraction (P) is 1%
- 80 is the port of interest (S)
- Number of bins (R) is 4
43Identification of a User
- An Example
- (1 - 0.01) x 8 = 7.92, rounded down to 7; the 7th value is 80
- Bin slice size: 80 / (4 - 1) = 26.67 (approximately)
- Port numbers: 6, 8, 9, 11, 14, 30, 80, 1020
- pdf: 0.625, 0.125, 0.125, 0.125
- Bins: {6, 8, 9, 11, 14}, {30}, {80}, {1020}
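The worked example above can be reproduced with a short sketch. The binning rule is inferred from the numbers on the slide and is an assumption: the largest value kept after dropping the top fraction P sets the upper edge, the first R - 1 bins split the range from 0 to that edge evenly, and the last bin collects the outliers. The role of S is not spelled out on the slides; in this example the edge happens to equal S = 80, and the sketch uses only the percentile rule. The function name `port_pdf` is hypothetical.

```python
import math


def port_pdf(ports, n_bins, outlier_fraction):
    """Discrete pdf over n_bins; the last bin is reserved for outliers (assumed rule)."""
    ordered = sorted(ports)
    # Number of values kept as non-outliers, e.g. floor(0.99 * 8) = 7.
    keep = max(1, math.floor((1 - outlier_fraction) * len(ordered)))
    cutoff = ordered[keep - 1]          # e.g. the 7th value, 80
    width = cutoff / (n_bins - 1)       # e.g. 80 / 3, about 26.67
    counts = [0] * n_bins
    for port in ordered:
        if port > cutoff:
            counts[-1] += 1             # outlier bin
        else:
            # The cutoff itself belongs to the last regular bin.
            counts[min(int(port / width), n_bins - 2)] += 1
    return [count / len(ordered) for count in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
print(port_pdf(ports, n_bins=4, outlier_fraction=0.01))
# [0.625, 0.125, 0.125, 0.125]
```

Reserving a bin for outliers keeps a single large port such as 1020 from stretching the bin width and collapsing all other ports into one bin.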
44Identification of a User
- An Example without P and S
- Bin slice size: 1024 / 4 = 256
- Port numbers: 6, 8, 9, 11, 14, 30, 80, 1020
- pdf: 0.875, 0, 0, 0.125
- Bins: {6, 8, 9, 11, 14, 30, 80}, {}, {}, {1020}
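For comparison, the plain equal-width binning of this second example can be sketched as follows, assuming the system port range 0 to 1023 is split into R equal slices. The function name `fixed_width_pdf` is hypothetical.

```python
def fixed_width_pdf(ports, n_bins, port_range=1024):
    """Discrete pdf with equal-width bins over [0, port_range)."""
    width = port_range / n_bins         # e.g. 1024 / 4 = 256
    counts = [0] * n_bins
    for port in ports:
        counts[min(int(port / width), n_bins - 1)] += 1
    return [count / len(ports) for count in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
print(fixed_width_pdf(ports, n_bins=4))
# [0.875, 0.0, 0.0, 0.125]
```

Seven of the eight ports fall into the first bin, so the resulting pdf carries much less detail about low port numbers than the outlier-aware version.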
45Identify a User Among Other Users
46Accuracy
- Identifying a particular user among other users
- Decision Tree: 93.3%
- On-line Support Vector Machine: 78.5%
47Feature Contribution
48Conclusion
- Major Contributions
- A Big Data analysis system
- a conference paper
- Monitoring and interactive visualization
- Usage of anonymity technologies
- a conference and a journal paper
- Models for classification of host roles and identification of users
- a conference paper
49Conclusion
- The Big Data analysis system is high-performance and scalable
- Real-time continuous network monitoring and interactive visualization are implemented and supported by the high-performance system
50Conclusion
- Proxies and Tor are the main anonymity technologies used on campus
- US, Germany, and China are the top 3 countries
- Models and features for classification of host roles
- client vs. server, non-web server vs. web server, personal office vs. public office, hosts from two different colleges
- Models and features for identification of a particular user among other users
51Future Work
- Improvement to the Current Work
- More interactive features and better user interfaces
- Further analysis of user identification: features and algorithms (such as deep learning)
52Future Work
- Extension to the Current Work
- Define and filter out background traffic
- Detection of operating system fingerprinting
- Identity anonymity
- Fusion with other network security data source
53Future Work
- Vision
- To provide network security as a service for individuals, small businesses, or government offices