Title: Academic Advisor: Dr. Yuval Elovici
1.
- Academic Advisor: Dr. Yuval Elovici
- Technical Advisor: Dr. Lidror Troyansky
2. A little bit of background
- PortAuthority offers businesses the opportunity
to gain insight into their information leak
vulnerabilities.
- 70% of information leaks are internal. Most
organizations focus on preventing outside-in
security breaches, but industry analysts argue
that up to 70% of security breaches occur from
the inside out. Leaks of private and
confidential information are a growing threat
to organizations of any size.
- Example of a file-sharing information leak:
http://www.ynet.co.il/articles/0,7340,L-2875208,00.html
An air force officer in the IDF was suspended
for sharing confidential army documents.
3. Where do we fit in?
- P2P networks.
- Gnutella, Gnutella2, BitTorrent, eDonkey2000,
Kademlia.
- P2P networks are typically used for connecting
nodes via largely ad hoc connections.
- Sharing content files containing audio, video,
data or anything in digital format is very common
(including confidential information).
- Real-time data, such as VoIP, is also passed
using P2P technology.
4. Where do we fit in? (Continued)
5. Our Project
- Develop a system which will:
- Allow the scanning parameters to be configured.
- Scan the P2P networks.
- Download files suspected of being confidential.
- Analyze the material using machine learning.
- Generate reports.
- Produce statistics.
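The steps above can be sketched as a small pipeline. This is only an illustration under assumptions: `ScanConfig`, `SearchHit`, `looks_suspicious` and `run_pipeline` are hypothetical names, the keyword match on file names is a stand-in for the real scanning criteria, and the network access and classifier are passed in as plain functions so that no particular P2P library is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class ScanConfig:
    """Scanning parameters (hypothetical fields)."""
    keywords: List[str]
    max_downloads: int = 10


@dataclass
class SearchHit:
    """One search result reported by the P2P network (hypothetical shape)."""
    filename: str
    host: str


def looks_suspicious(hit: SearchHit, config: ScanConfig) -> bool:
    # Assumption: a file is suspicious when its name contains a keyword.
    name = hit.filename.lower()
    return any(keyword.lower() in name for keyword in config.keywords)


def run_pipeline(
    config: ScanConfig,
    search: Callable[[List[str]], Iterable[SearchHit]],
    download: Callable[[SearchHit], str],
    classify: Callable[[str], bool],
) -> List[Dict[str, object]]:
    """Scan, download the suspicious hits, analyze them and build a report."""
    suspicious = [h for h in search(config.keywords) if looks_suspicious(h, config)]
    report = []
    for hit in suspicious[: config.max_downloads]:
        content = download(hit)
        report.append(
            {
                "filename": hit.filename,
                "host": hit.host,
                "confidential": classify(content),
            }
        )
    return report
```

In a test setup, `search`, `download` and `classify` can be simple stubs that return canned results, which keeps the pipeline logic testable without touching a live network.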
6. Architecture
7. Main Functional Requirements
- Scanning the P2P network (Gnutella) for
suspicious target information (e.g. information
that appears to be confidential).
8. Main Functional Requirements (Continued)
- Downloading the suspicious target information
(e.g. information that appears to be confidential)
from the P2P network (Gnutella).
9. Main Functional Requirements (Continued)
- Analyzing the scanned results (determining the
value of the documents).
- The system will use machine learning, based on
a filtering algorithm, to classify the
documents.
10.
- Bayesian filtering is the process of using
Bayesian statistical methods to classify documents
into categories.
- Bayesian filtering gained attention when it was
described in the paper "A Plan for Spam" by Paul
Graham, and has become a popular mechanism for
distinguishing illegitimate spam email from
legitimate "ham" email.
- Bayesian filtering takes advantage of Bayes'
theorem, which says that the probability that a
document belongs to a certain group (confidential
documents), given that it contains certain words,
is equal to the probability of finding those
words in a document from that group
(confidential documents), times the probability
that any document belongs to that group
(confidential documents), divided by the
probability of finding those words in any
document.
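A minimal sketch of such a filter follows. It assumes the "naive" simplification that words occur independently within a group, uses add-one smoothing for unseen words, and works with logarithms to avoid numeric underflow; none of these choices is stated on the slide. The class name `NaiveBayesFilter` and the group labels used in the usage note are illustrative only. The divisor in Bayes' theorem (the probability of the words in any document) is the same for every group, so it is omitted when groups are only being compared.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a document into lowercase words (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayesFilter:
    """Toy Bayesian document filter; an illustration, not the project's code."""

    def __init__(self):
        self.word_counts = {}        # group -> Counter of word occurrences
        self.doc_counts = Counter()  # group -> number of training documents
        self.vocabulary = set()

    def train(self, text, group):
        words = tokenize(text)
        self.word_counts.setdefault(group, Counter()).update(words)
        self.doc_counts[group] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log(P(words | group) * P(group)) for every known group."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        scores = {}
        for group, counts in self.word_counts.items():
            # log P(group): share of training documents in this group
            score = math.log(self.doc_counts[group] / total_docs)
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                # log P(word | group) with add-one smoothing
                score += math.log((counts[word] + 1) / denominator)
            scores[group] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

To use it, call `train(text, group)` once per labelled example document (for instance with the groups "confidential" and "public"), then call `classify(text)` on each downloaded document.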
11. Main Functional Requirements (Continued)
- Statistics gathering:
- The number of users who currently hold the
target information.
- The geographic location of the leaked
information, found using IP geolocation.
- The history of searched-for, downloaded and
analyzed files.
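As a rough illustration of the first statistic, the sketch below counts the distinct hosts seen sharing each file. It assumes every scan result can be reduced to a (file identifier, host address) pair; the function names and this record shape are hypothetical. Geolocation is left out because it depends on an external IP database.

```python
from collections import defaultdict


def holders_per_file(observations):
    """Map each file identifier to the set of distinct hosts sharing it."""
    holders = defaultdict(set)
    for file_id, host in observations:
        holders[file_id].add(host)
    return holders


def holder_counts(observations):
    """Number of distinct hosts currently holding each file."""
    return {
        file_id: len(hosts)
        for file_id, hosts in holders_per_file(observations).items()
    }
```

Using a set per file means a host that appears in several scans is counted only once.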
12. Main Functional Use Cases
13. Main Functional Use Cases (Continued)
Scan network - Use Case Diagram
14. Main Functional Use Cases (Continued)
Analyze downloaded files - Use Case Diagram
15. Main Functional Use Cases (Continued)
16. Non-Functional Requirements
- Performance constraints:
- The system should return a search result for a
suspicious target within 15 minutes.
- The system timeout for downloading should be
configurable.
- The system should keep history results and
statistics for no more than one year.
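These constraints could be captured in one settings object, as in the sketch below. The names `Limits`, `is_expired` and `purge_history` are hypothetical, the 30-minute download timeout is an arbitrary placeholder for the configurable value, and history records are assumed to be dictionaries with a `recorded_at` timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Limits:
    # Upper bound for returning a search result (from the requirement).
    search_timeout: timedelta = timedelta(minutes=15)
    # Configurable; 30 minutes is a placeholder, not a stated requirement.
    download_timeout: timedelta = timedelta(minutes=30)
    # History and statistics are kept for at most one year.
    retention: timedelta = timedelta(days=365)


def is_expired(recorded_at: datetime, now: datetime, limits: Limits) -> bool:
    """True when a history record is older than the retention window."""
    return now - recorded_at > limits.retention


def purge_history(records, now: datetime, limits: Limits):
    """Keep only the records that are still inside the retention window."""
    return [r for r in records if not is_expired(r["recorded_at"], now, limits)]
```

A different download timeout is set by constructing the object with an override, for example `Limits(download_timeout=timedelta(minutes=5))`.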
17. Non-Functional Requirements (Continued)
- Safety and security:
- The system will not be used for any purpose
other than finding information leaks in P2P
networks (e.g. it will not be used to find shared
MP3 files).
- The system will not expose the confidential
documents it downloads or the documents that
were used to train the machine learning algorithm.
18. Non-Functional Requirements (Continued)
- Platform constraints:
- OS: Windows XP.
- Database: MS SQL Server 2000.
- Programming languages (restricted to Python,
Java/J2EE, C and C)
19. Risks
- Mainly a research project.
- Algorithm risk (Machine Learning).
- Is it good for confidential documents?
- Action to be taken
- Feasibility Study.
20. What does successful mean?
21. Risks
- Gnutella is an old network.
- May not contain confidential information.
- Action to be taken
- Test suite.
- Use a different P2P network.
22. Epilogue
- www.cs.bgu.ac.il/amirf/AMOS