Clustering Web Access Patterns Using Fuzzy Clustering Algorithms - PowerPoint PPT Presentation

Slides: 19
Provided by: Kar7167
Transcript and Presenter's Notes


1
Clustering Web Access Patterns Using Fuzzy
Clustering Algorithms
  • Kartik Menon

2
Overview
  • Goal
  • Web Mining
  • Web Data
  • Web Access Pattern Clustering
  • Clustering Techniques
  • Fuzzy C Means
  • Experimental Set-up
  • Results
  • Conclusion and Future work
  • Questions

3
Goal
  • Cluster similar web access traversal patterns and
    train the system to understand the needs and
    demands of different users accessing the website.

4
Web Mining
  • Web Mining
  • Learning about different users accessing a web
    page.
  • The needs and requirements of the user
  • Web Access Traversal Patterns
  • Links which are more popular than others
  • For example www.yahoo.com
  • Emails
  • Search engine
  • News
  • Greeting cards

5
Web Data
  • What information is important for mining?
  • Links traversed (URLs requested)
  • Documents downloaded
  • Time spent
  • GET or POST messages
  • Web Log servers
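As a minimal sketch of pulling the fields listed above out of a web server log, the snippet below parses one log line. It assumes the Common Log Format, which the slides do not confirm; `LOG_PATTERN`, `parse_log_line`, and the sample line are illustrative names and data, not part of the presentation.

```python
import re

# Assumption: Common Log Format. The slides do not state the actual log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Illustrative log line (made up for this sketch).
sample = '192.0.2.1 - - [10/Oct/2003:13:55:36 -0600] "GET /index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)
print(record["method"], record["url"], record["time"])
```

Time spent on a page is not recorded directly in such a log; it would have to be estimated from the gap between consecutive requests from the same host.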

6
Web Access Pattern Clustering
  • Find users with similar web access patterns
  • Grouping and separating users
  • Concise representation of a system's behavior

7
Clustering Techniques
  • Neural Nets
  • Self Organizing Maps (SOMs)
  • Statistical
  • K-Means
  • Fuzzy Logic
  • Fuzzy C Means
  • Fuzzy ISODATA

8
Fuzzy C Means
  • Is a data clustering technique where each data
    point belongs to a cluster to some degree that is
    specified by a membership function
  • Notation:
  • X is a set of n data sample vectors
  • U is a partition of X into c parts
  • V is the set of cluster centers
  • d² is a squared distance given by an
    inner-product-induced norm
  • u_ik is the grade of membership of x_k in cluster i,
    between 0 and 1
  • m is a parameter that increases or decreases the
    fuzziness.

9
Fuzzy C Means (contd.)
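For reference, a sketch of the standard fuzzy c-means formulation, written in the notation of the previous slide. This is the textbook form and is assumed here rather than taken from the slide; v_i denotes the i-th cluster center in V.

```latex
% Objective minimized by fuzzy c-means (m > 1)
J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m} \, d^2(x_k, v_i),
\qquad \text{subject to } \sum_{i=1}^{c} u_{ik} = 1 \text{ for every } k .

% Alternating updates: centers, then memberships
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}},
\qquad
u_{ik} = \left[ \sum_{j=1}^{c}
  \left( \frac{d(x_k, v_i)}{d(x_k, v_j)} \right)^{\frac{2}{m-1}} \right]^{-1} .
```

As m approaches 1 the memberships approach hard 0/1 assignments; larger m gives fuzzier partitions.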
10
Experimental Set-up
  • Target the website http://campus.umr.edu.
  • Mine the web log files for web data.
  • The main problem is to convert the web sites
    accessed into numeric values.
  • Identify all the URLs that can be reached from
    this web page
  • Number these URLs from 1 to N, where N is the
    number of URLs that can be accessed
  • Assign fuzzy weights (w(j)) to each URL that can
    be accessed
  • A Boolean variable s(j) is defined, which is set
    to 1 if the jth URL is accessed by the user;
    otherwise s(j) is set to 0 (null).
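The numbering and indicator steps above can be sketched as follows. The URL list and the weights w(j) are hypothetical, since the slides give neither, and 0 stands in for the slide's "null".

```python
# Hypothetical URL list; the real list would come from the site's links.
urls = ["/index.html", "/admissions", "/library", "/sports"]

# Number the URLs from 1 to N.
url_index = {url: j for j, url in enumerate(urls, start=1)}

# Illustrative fuzzy weights w(j); the slides do not say how they were chosen.
weights = {1: 0.1, 2: 0.4, 3: 0.3, 4: 0.2}

def indicator(session_urls):
    """Return s(j) for j = 1..N: 1 if the j-th URL was accessed, else 0."""
    visited = {url_index[u] for u in session_urls if u in url_index}
    return {j: (1 if j in visited else 0) for j in url_index.values()}

# One user session that visits the library page twice and the home page once.
s = indicator(["/library", "/index.html", "/library"])
print(s)
```

Repeat visits do not change s(j), because it records only whether a URL was accessed during the session.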

11
Experimental Set-up (contd.)
  • Define the data point x as the number
    corresponding to all the sites accessed
    by the user in that particular user session.
  • Apply fuzzy c-means by calculating the Euclidean
    distance between the data samples as
    d_ij = ||x_j - c_i||, where x_j is the data point
    and c_i is the center of cluster i.
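A minimal sketch of fuzzy c-means on one-dimensional data points, using the distance d_ij = |x_j - c_i|. It is the textbook algorithm in plain Python, not the presenter's implementation. The function name, the evenly spaced initial centers, the default m = 2, and the sample points are all assumptions made for illustration.

```python
def fuzzy_c_means(points, c, m=2.0, iterations=100, tol=1e-6):
    """Cluster 1-D points; returns (centers, u) with u[i][j] the membership
    of points[j] in cluster i. Requires m > 1."""
    lo, hi = min(points), max(points)
    # Assumption: deterministic start with centers spread evenly over the range.
    centers = [lo + (hi - lo) * (i + 0.5) / c for i in range(c)]
    u = []
    for _ in range(iterations):
        # Membership update from the distances d_ij = |x_j - c_i|.
        u = [[0.0] * len(points) for _ in range(c)]
        for j, x in enumerate(points):
            d = [abs(x - ci) for ci in centers]
            if min(d) == 0.0:
                # The point sits exactly on a center: full membership there.
                u[d.index(0.0)][j] = 1.0
                continue
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / dk) ** (2.0 / (m - 1.0)) for dk in d)
        # Center update: mean of the points weighted by u_ij ** m.
        new_centers = []
        for i in range(c):
            denom = sum(u[i][j] ** m for j in range(len(points)))
            numer = sum((u[i][j] ** m) * x for j, x in enumerate(points))
            new_centers.append(numer / denom if denom > 0 else centers[i])
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Illustrative session values, not data from the experiment.
points = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
centers, u = fuzzy_c_means(points, c=2)
print(centers)
```

Each column of `u` sums to 1, so every session belongs to all clusters to some degree, and the largest entry in a column gives that session's dominant cluster.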

12
(No Transcript)
13
(No Transcript)
14
Results - For 2 and 3 clusters
15
(No Transcript)
16
Conclusions
  • Fuzzy c-means is an easy way of clustering
    similar web access patterns for different user
    sessions
  • The use of Euclidean distance was very helpful in
    learning more about these web access patterns.
  • The experiment produced results and plots
    that were easy to interpret
  • We observe that fuzzy c-means provided
    stable results for the different data sets we
    took.

17
Future Work
  • Use other clustering algorithms and compare
  • Developing self-evolving web sites - sites that
    improve themselves by learning from user access
    patterns
  • The results we obtained using the fuzzy
    clustering algorithms could be used to make
    recommendations to the webmaster of
    http://campus.umr.edu
  • Increase the popularity of the web page by
    tailoring it more to the needs of the users
    accessing it

18
Questions?