Web Usage Mining - PowerPoint PPT Presentation

Slides: 32
Provided by: pud56

Transcript and Presenter's Notes

Title: Web Usage Mining


1
Web Usage Mining
  • ? ?

2
Outline
  • ??
  • Web Data
  • Web Usage Mining
  • Taxonomy and Project Survey

3
??
  • Web Mining is divided into three areas:
  • Web Structure Mining
  • Web Content Mining
  • Web Usage Mining
  • Web Usage Mining???????????????????????
  • ????????

4
Web Data
  • ??????????????
  • 1). Content???????
  • 2). Structure: intra-structure and inter-structure
  • 3). Usage: click stream
  • 4). User Profile: registration data and customer
    profiles

5
???(?)??????
  • ??
  • 1) Web Log File
  • 2) Packet Sniffing ??
  • 3) Web Page Content structure
  • 4) Application Server
  • Limitations
  • Cached page views are not recorded in the server log
  • Data passed through the POST method is not available in the
    server log
  • ???????????????
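
A Web log file is the first server-side source listed above. The sketch below shows one way a single log entry might be turned into a record. It assumes the Apache Combined Log Format and an English month abbreviation in the timestamp; the name `parse_log_line` and the sample entry are illustrative and do not come from the slides.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_log_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record

# Hypothetical log entry used only for illustration.
sample = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08"')
record = parse_log_line(sample)
```

The host, agent, timestamp, and URI fields extracted here are the inputs that later preprocessing steps (user and session identification) typically work from.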

6
???(?)?????
  • ??????Cache?session identification,???????????????
    ?
  • Applet Script
  • ???????,?????????
  • Modified browser (Mosaic, Mozilla)
  • ??????????????

7
???(?)???????
  • ??????Client???Server???????

8
????
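  • 1. User
  • 2. Page View (click): every file that contributes to the
    display in the user's browser at one time
  • 3. Click Stream: a sequential series of page view requests
  • 4. User Session (transaction): the click stream of a single
    user across the entire Web
  • 5. Server Session: the click stream of a user session on one
    particular Web site
  • 6. Episode: a semantically meaningful subset of the click stream
    within a session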

9
????(?)
Episodes
Server Session
User Session
Click Stream
Page View
Raw Data
10
Web Usage Mining?????
  • Inputs: Raw Logs, Site Files
  • Preprocessing → Preprocessed Clickstream Data
  • Pattern Discovery → Rules, Patterns, Statistics
  • Pattern Analysis → Interesting Rules, Patterns, Statistics
11
???
  • ???????
  • ????????

12
???????(?)
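  • With only the IP address, agent, and server-side click stream
    available, the following problems arise:
  • Single IP address/Multiple Server Sessions
  • Users behind an ISP share a proxy server
  • Multiple IP address/Single Server Session
  • Some ISPs assign a different IP address to each request from
    the same user
  • Multiple IP address/Single User
  • One user accesses the site from different machines
  • Multiple Agent/Single User
  • One user uses more than one browser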

13
???????(?)
  • ????????Click-Stream??????session?
  • ????????????????????????,????????????????????????s
    ession???????,?URI????????(?)
  • Cache??????????????
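
Splitting a click stream into sessions can be sketched as follows. This is a minimal illustration, not the method from the slides: it assumes a user is approximated by the (IP address, agent) pair and that a gap of more than 30 minutes between requests starts a new session. The timeout value and the function name `sessionize` are assumptions.

```python
from collections import defaultdict

def sessionize(requests, timeout=1800):
    """Group requests into sessions.

    requests: iterable of (ip, agent, timestamp_seconds, uri).
    A user is approximated by (ip, agent); a gap longer than `timeout`
    seconds (an assumed heuristic) starts a new session.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, uri in requests:
        by_user[(ip, agent)].append((ts, uri))

    sessions = []
    for key in sorted(by_user):
        current, last_ts = [], None
        for ts, uri in sorted(by_user[key]):
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((key, current))
                current = []
            current.append(uri)
            last_ts = ts
        sessions.append((key, current))
    return sessions

# Hypothetical requests: same IP, two different agents.
requests = [
    ("192.0.2.1", "Mozilla/4.08", 0, "/a"),
    ("192.0.2.1", "Mozilla/4.08", 60, "/b"),
    ("192.0.2.1", "Mozilla/4.08", 4000, "/a"),
    ("192.0.2.1", "MSIE 5.0", 30, "/c"),
]
sessions = sessionize(requests)
```

Treating (IP address, agent) as one user is exactly the approximation that breaks down in the proxy and multiple-agent cases listed on the previous slide, so results from a heuristic like this should be read with that in mind.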

14
????????
  • ??????
  • 1. Page????????
  • ???????????Page??
  • ?????????????
  • 2. Page ???????????
  • ?Page?????????

15
Pattern Discovery
  • Statistical Analysis
  • Association Rules
  • Clustering
  • Classification
  • Sequential Patterns
  • Dependency Modeling

16
Statistical Analysis
  • Descriptive statistics (frequency, mean, median, etc.) on
    variables such as page views, viewing time, and navigational
    path length
  • ????????????????,?
  • ????????
  • ????????
  • ???????????
  • ????????????,????????????????????
  • ??????????,?????????????,????????,????????
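
As a rough illustration of this kind of analysis, the sketch below computes page view counts, mean and median viewing time, and mean path length from session data. The input layout (a list of sessions, each a list of `(uri, viewing_time_seconds)` pairs) and the function name `usage_statistics` are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, median

def usage_statistics(sessions):
    """sessions: list of sessions, each a list of (uri, viewing_time_seconds)."""
    page_views = Counter(uri for s in sessions for uri, _ in s)
    times = [t for s in sessions for _, t in s]
    path_lengths = [len(s) for s in sessions]
    return {
        "page_views": page_views,
        "mean_viewing_time": mean(times),
        "median_viewing_time": median(times),
        "mean_path_length": mean(path_lengths),
    }

# Hypothetical sessions used only for illustration.
sessions = [
    [("/home", 10), ("/products", 30), ("/cart", 20)],
    [("/home", 5), ("/products", 35)],
]
stats = usage_statistics(sessions)
```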

17
Association Rules
  • ????????????????????????????90?????
  • ?Web Usage Mining ?,???????????????(???)??????????
    ???
  • ???Apriori ???????????????????????????????????
  • ??????????????????????????,??????????
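
The idea of relating pages that are visited together in one session can be sketched with support and confidence over page pairs. This is a simplified illustration covering only the first two levels of an Apriori-style search (single pages, then pairs), not a full Apriori implementation. The thresholds, the function name `pair_rules`, and the sample URIs are assumptions.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) over pairs of pages."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]

    # Level 1: pages that are frequent on their own.
    item_counts = Counter(p for s in page_sets for p in s)
    frequent = {p for p, c in item_counts.items() if c / n >= min_support}

    # Level 2: count only pairs built from frequent pages (Apriori pruning).
    pair_counts = Counter(
        pair
        for s in page_sets
        for pair in combinations(sorted(s & frequent), 2)
    )

    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/electronics", "/sports", "/home"],
    ["/electronics", "/sports"],
    ["/electronics", "/home"],
    ["/books"],
]
rules = pair_rules(sessions)
```

With this data, the rules that pass both thresholds are `/home → /electronics` and `/sports → /electronics`; the reverse directions are dropped because their confidence (2/3) is below 0.7.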

18
Clustering
  • ??????????????
  • Usage Clustering????????????????????????????????
  • Page Clustering?????????????????????
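
Usage clustering can be illustrated by grouping sessions whose sets of visited pages overlap. The sketch below uses Jaccard similarity and a simple single-pass "leader" scheme; both are choices made for this example rather than the technique named in the slides, and the threshold is an assumed value.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of pages."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_clustering(sessions, threshold=0.5):
    """Assign each session to the first cluster whose leader is similar enough.

    Returns a list of clusters; each cluster is a list of session indices
    and its first index is the leader.
    """
    clusters = []
    for i, session in enumerate(sessions):
        for cluster in clusters:
            if jaccard(session, sessions[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/home", "/sports", "/shoes"],
    ["/sports", "/shoes"],
    ["/books", "/home"],
    ["/books"],
]
clusters = leader_clustering(sessions)
```

The same similarity function could be applied to pages instead of sessions (comparing the sets of sessions in which each page appears) to sketch page clustering.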

19
Classification
  • ?Web Usage Mining ?,??????????
  • ???????????????
  • Decision tree
  • naïve Bayesian
  • k-nearest neighbor
  • Support Vector Machines, etc.

20
Sequential Patterns
  • Finds inter-session patterns in which one set of items is
    followed by another in time order. Related temporal analyses
    include trend analysis, change point detection, and similarity
    analysis
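
A very small form of sequential pattern discovery is to count ordered page pairs: page A visited before page B within a session, not necessarily adjacent. The sketch below is a simplification limited to patterns of length two; the support threshold and the function name `frequent_ordered_pairs` are assumptions.

```python
from collections import Counter

def frequent_ordered_pairs(sessions, min_support=0.5):
    """Return {(a, b): support} for pages a visited before b in a session."""
    counts = Counter()
    for session in sessions:
        seen = set()  # count each ordered pair at most once per session
        for i, a in enumerate(session):
            for b in session[i + 1:]:
                if a != b:
                    seen.add((a, b))
        counts.update(seen)
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical time-ordered sessions.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/cart"],
    ["/products", "/home"],
]
patterns = frequent_ordered_pairs(sessions)
```

Unlike the association rule case, order matters here: `("/home", "/cart")` and `("/cart", "/home")` are counted as different patterns.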

21
Dependency Modeling
  • ???????web????????????
  • ?????
  • Hidden Markov Model
  • Bayesian Belief Network
  • ??????????,????????????????,???????Web?????,??????
    ????
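
As a much simpler stand-in for the models named above, the sketch below estimates a first-order Markov chain over pages: the probability of the next page given the current one. It is not a Hidden Markov Model or a Bayesian belief network, only an illustration of modeling dependencies between successive page views. The function names are assumptions.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate P(next page | current page) from consecutive page views."""
    counts = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for cur, nexts in counts.items()
    }

def predict_next(model, page):
    """Return the most likely next page, or None if the page was never left."""
    nexts = model.get(page)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)

# Hypothetical time-ordered sessions.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
]
model = transition_probabilities(sessions)
next_page = predict_next(model, "/home")
```

A model of this kind is one possible basis for the pre-fetching and caching applications discussed later in the deck.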

22
Pattern Analysis
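  • Pattern analysis filters uninteresting rules and patterns out of
    the discovered set. The most common form is a knowledge query
    mechanism such as SQL; another is to load usage data into a
    data cube and perform OLAP operations.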
  • ?????????
  • ???????

23
Taxonomy Dimensions
  • the data sources used to gather input
  • the types of input data
  • the number of users represented in each data set
  • the number of Web sites represented in each data
    set
  • the application area focused on by the project
  • Most projects use single-site, multi-user, server-side usage data

24
Major Application Area for Web Usage Mining
25
???????
  • WebSIFT
  • SpeedTracer (IBM Watson): reconstructs user traversal paths and
    identifies user sessions, then mines the most common traversal
    paths and frequently visited page groups
  • WUM????????,??????????frequent
    path??????????sequence pattern????
  • WebLogMiner: loads Web log data into a data cube and applies
    OLAP operations such as roll-up and drill-down
  • Shahabi et al.: relies on client-side data collection.

26
Personalization
  • ?????????????
  • WebWatcher???????????????????,??????????WebWatche
    r???????????????????
  • SiteHelper????????????????????,??????
  • LetiziaClient Side agent.????????????????
  • Yan et. al.??Web????????????????????????????,????
    ????????????????

27
System Improvement
  • Web Usage Mining can help improve Web caching, network
    transmission, load balancing, and data distribution.
  • For security, it can help detect intrusion, fraud, and attempted
    break-ins.
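  • Almeida et al.: model the locality of page requests and use it
    to choose pre-fetching and caching strategies for a proxy
  • Schechter et al.: build path profiles from server log data and
    use them to pre-generate dynamic HTML pages, reducing latency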

28
Site Modification
  • Web Usage Mining ?????????,???????????
  • SCML???????????????????????????

29
Business Intelligence
  • Buchner et.al?Web Data?????????????????????????,?
    Web Usage Data????????????????
  • Commercial Products
  • SurfAid, Accrue, NetGenesis, Aria, Hitlist, WebTrends

30
Usage Characterization
  • ?Web characterization research?Web usage
    mining?????????
  • Catledge et al.???????????,??????????????????????
    ?,???,??,??????
  • Pitkow et al.??????,?????????????????????????????
    ????????????????????

31
? ? ? ?