Exploiting Clustering Techniques for Web Session Inference

1
Exploiting Clustering Techniques for Web Session Inference

A. Bianco, G. Mardente, M. Mellia, M. Munafò, L. Muscariello
(Politecnico di Torino)

2
Outline

Web Session Model
Clustering techniques
The proposed algorithm
Performance of the algorithm
Session statistics

3
Web session definition

A single web client generates a succession of
TCP flows and think times

[Figure: TCP flows separated by think times Toff]

A session is defined here as the set of TCP flows arriving close enough to one another
For example, a threshold can be used to discriminate between think times and inter-arrivals of TCP flows
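The threshold idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_sessions`, the sample arrival times, and the 5-second threshold are hypothetical and do not come from the slides.

```python
from typing import List


def split_sessions(arrivals: List[float], threshold: float) -> List[List[float]]:
    """Group flow arrival times into sessions using a fixed gap threshold."""
    sessions: List[List[float]] = []
    for t in sorted(arrivals):
        # A gap longer than the threshold is treated as a think time (Toff),
        # so the flow opens a new session; otherwise it joins the current one.
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


# Hypothetical arrival times (seconds) of the TCP flows of one client.
flows = [0.0, 0.4, 1.1, 30.0, 30.2, 95.0]
sessions = split_sessions(flows, threshold=5.0)
print(sessions)
```

The result depends entirely on the chosen threshold, which is the weakness the following slides address.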

4
Algorithms

A threshold-based approach needs a priori knowledge of the source
An adaptive algorithm should be able to track traffic variations
Such an algorithm is expected to be less sensitive to traffic characteristics
Clustering is the chosen approach

5
Proposed algorithm

Three steps:
K-means is run on all samples to obtain a first clustering; K is chosen very large
Hierarchical clustering is run only on the representatives of each cluster; K is reduced
K-means is run on all samples again
Testing the algorithm requires traffic whose sessions are known a priori, so the traffic is artificially generated

6
First step: K-means

K is chosen large enough but significantly
smaller than the number of samples
The K farthest flows determine the first
partition
K-means is run for 1000 iterations on all samples
Each cluster is then represented by a small subset of samples, one or two in our algorithm:
The mean value (centroid method)
The g-th and (100-g)-th percentiles (single linkage method if g = 0)
[Figure: a cluster with its g-th and (100-g)-th percentiles marked]

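A minimal sketch of this first step follows, under several assumptions that the slides do not confirm: samples are one-dimensional flow arrival times, "the K farthest flows" is read as a farthest-point initialization, and percentiles use a simple rank-rounding convention. All function names are hypothetical.

```python
from typing import List, Tuple


def farthest_point_init(samples: List[float], k: int) -> List[float]:
    """Pick k initial centroids, each as far as possible from those already chosen."""
    centroids = [min(samples)]
    while len(centroids) < k:
        centroids.append(max(samples, key=lambda s: min(abs(s - c) for c in centroids)))
    return centroids


def kmeans_1d(samples: List[float], centroids: List[float],
              iterations: int = 1000) -> List[List[float]]:
    """Plain Lloyd iterations on 1-D samples; returns the non-empty clusters."""
    clusters: List[List[float]] = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for s in samples:
            nearest = min(range(len(centroids)), key=lambda i: abs(s - centroids[i]))
            clusters[nearest].append(s)
        clusters = [c for c in clusters if c]  # drop clusters that lost all samples
        updated = [sum(c) / len(c) for c in clusters]
        if updated == centroids:
            break  # converged before the iteration cap
        centroids = updated
    return clusters


def representatives(cluster: List[float], g: float) -> Tuple[float, float]:
    """Return the g-th and (100-g)-th percentiles of a cluster (rank rounding, an assumption)."""
    ordered = sorted(cluster)
    last = len(ordered) - 1
    return ordered[round(last * g / 100)], ordered[round(last * (100 - g) / 100)]


# Hypothetical arrival times; in practice K would be much larger than 3.
samples = [0.0, 0.4, 1.1, 30.0, 30.2, 95.0]
clusters = kmeans_1d(samples, farthest_point_init(samples, k=3))
reps = [representatives(c, g=0) for c in clusters]
print(sorted(reps))
```

With g = 0 the two representatives are the extreme samples of each cluster, so distances between representatives of different clusters coincide with nearest-neighbour (single linkage) distances.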
7
Second step: a hierarchical method

A hierarchical method is applied to the representatives only
This method merges clusters until a quality function determines that the optimal number of clusters Nc has been found
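The merging loop can be sketched as below. The slides do not define the quality function, so the stopping rule here is a stand-in chosen purely for illustration: merging stops when the next merge distance exceeds `jump` times the previous one. The name `merge_representatives`, the `jump` parameter, and the sample intervals are all hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (low representative, high representative)


def merge_representatives(reps: List[Interval], jump: float = 5.0) -> List[Interval]:
    """Agglomerate 1-D clusters by repeatedly merging the two closest neighbours.

    Stand-in stopping rule (an assumption, not the quality function of the slides):
    stop when the next merge distance exceeds `jump` times the previous one.
    The first merge is always performed, which presumes K was chosen too large.
    """
    clusters = sorted(reps)
    previous_gap: Optional[float] = None
    while len(clusters) > 1:
        # In one dimension the closest pair is always adjacent after sorting.
        gaps = [clusters[i + 1][0] - clusters[i][1] for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        # The rule is skipped while no positive merge distance has been seen yet.
        if previous_gap and gaps[i] > jump * previous_gap:
            break
        merged = (clusters[i][0], max(clusters[i][1], clusters[i + 1][1]))
        clusters[i:i + 2] = [merged]
        previous_gap = gaps[i]
    return clusters


# Hypothetical representatives produced by a first K-means pass with K = 5.
reps = [(0.0, 0.3), (0.5, 1.1), (30.0, 30.2), (30.6, 31.0), (95.0, 95.0)]
merged = merge_representatives(reps)
print(len(merged), merged)
```

Because only the representatives are processed, the cost of this step depends on K rather than on the number of samples.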

8
Gamma function typical behaviour

9
Third step: K-means

K-means is run on all samples
This last step is not critical, but it rearranges the positions of samples within clusters, that is, of flows within sessions
It consumes little CPU time, so including it costs little

10
Performance evaluation

Artificial traffic is generated according to an ON/OFF process
During ON periods a succession of flows is generated using i.i.d. inter-arrivals
In this model, inference means recognizing whether an inter-arrival is an OFF period or an inter-arrival between flows within an ON period
Every time the algorithm guesses incorrectly, an error is counted
All variables are assumed to be exponentially distributed
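The evaluation setup above can be sketched as follows. Several details are assumptions made only for illustration: each ON period carries a fixed number of flows, the mean values are arbitrary, and a fixed threshold stands in for the classifier under test. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def generate_on_off(n_sessions: int, flows_per_session: int,
                    mean_arrival: float, mean_off: float,
                    rng: random.Random) -> Tuple[List[float], List[bool]]:
    """Return the gaps between consecutive flows and, for each gap, whether it is an OFF period."""
    gaps: List[float] = []
    is_off: List[bool] = []
    for s in range(n_sessions):
        # Gaps between flows inside one ON period (exponential, i.i.d.).
        for _ in range(flows_per_session - 1):
            gaps.append(rng.expovariate(1.0 / mean_arrival))
            is_off.append(False)
        # One exponential OFF period separates consecutive sessions.
        if s < n_sessions - 1:
            gaps.append(rng.expovariate(1.0 / mean_off))
            is_off.append(True)
    return gaps, is_off


def count_errors(gaps: List[float], is_off: List[bool], threshold: float) -> int:
    """Count gaps whose guessed label (gap > threshold means OFF) differs from the truth."""
    return sum((gap > threshold) != off for gap, off in zip(gaps, is_off))


rng = random.Random(1)  # fixed seed so the run is repeatable
gaps, is_off = generate_on_off(n_sessions=50, flows_per_session=5,
                               mean_arrival=1.0, mean_off=60.0, rng=rng)
errors = count_errors(gaps, is_off, threshold=5.0)
print(f"{errors} errors out of {len(gaps)} gaps")
```

Because the true label of every gap is known by construction, any inference method can be scored the same way by replacing the threshold test with its own guesses.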

11
First step sensitivity (1/2)

If the initial number of clusters is chosen large enough, the method is less error-prone
The algorithm is much more sensitive to the value of the idle period

12
First step sensitivity (2/2)

Performance is sensitive to the choice of the
percentile g
When clusters are represented through flows at
the border of the session the method is less
sensitive to traffic, i.e. g1
This is because the cluster has a long and narrow shape, and those representatives model it well

13
Comparison with threshold-based algorithms: exponential case

Threshold-based algorithms work well if traffic characteristics are known
But they are very sensitive to the threshold value
If sessions are already well clustered because idle periods are large enough compared to flow inter-arrivals, our algorithm performs very well

14
Comparison with threshold-based algorithms: Pareto case

Threshold-based algorithms work well if traffic characteristics are known
But they are very sensitive to the threshold value
If sessions are already well clustered because idle periods are large enough compared to flow inter-arrivals, our algorithm performs very well

15
Some statistics on aggregated sessions

Session sizes are broadly heavy-tailed
Usually each session is made of a few TCP flows
The exact definition of flow termination is not very important

16
Some statistics on aggregated sessions

Similar results for server-to-client and client-to-server data
Similar distribution law; asymmetries in volume only

17
Flows and sessions inter-arrivals

The method infers sessions that are similar even across very different traces
Tarr and Toff are well identified

18
Conclusions

Clustering techniques can easily be used to infer web sessions
The proposed algorithm is a mix of known clustering approaches
It is able to deal with huge amounts of data
Sessions seem to be very well recognized
