Finding Frequent Webpage Access Patterns

Provided by: tri5498

1
Finding Frequent Webpage Access Patterns
  • Part 2: Searching for Frequent Trees
  • Jennifer Cuzzort
  • Ashley Plier

2
URLs List Represented as a Tree
  • If a user browses the internet without using the
    back button, the URLs visited can be represented
    as a sequence.
  • If a user browses the internet using the back
    button, the accessed URLs can be represented as a
    tree.
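
As a minimal sketch of this idea, the code below turns a click log into a tree. The log format is an assumption made for illustration: each entry is either a one-letter page label (a forward click) or the string "BACK". The tree is printed with each node's children in parentheses after the node.

```python
def build_tree(events):
    """Build a nested [label, children] tree from a click log.

    `events` is a hypothetical log format: each item is either a page
    label (a forward click) or the string "BACK" (the back button).
    """
    root = [events[0], []]
    path = [root]              # nodes from the first page to the current page
    for event in events[1:]:
        if event == "BACK":
            if len(path) > 1:  # cannot go back past the first page
                path.pop()
        else:
            node = [event, []]
            path[-1][1].append(node)  # new page is a child of the current page
            path.append(node)
    return root


def to_string(node):
    """Write a node followed by its children in parentheses."""
    label, children = node
    if not children:
        return label
    return label + "(" + "".join(to_string(c) for c in children) + ")"


no_back = ["A", "B", "C"]
with_back = ["A", "B", "E", "BACK", "F", "BACK", "BACK", "C"]
print(to_string(build_tree(no_back)))    # A(B(C))
print(to_string(build_tree(with_back)))  # A(B(EF)C)
```

Without the back button every page has exactly one child, so the tree degenerates into a chain, which is the sequence case.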

3
Tree Representation and Terminology

4
Tree Written Representation
  • The tree on the previous slide can be written in
    the following format:
  • A(B(EFG)C(HI)D(JK))
  • Notice that each node's children are listed
    inside a set of parentheses following their
    parent node
  • The first-generation node (the root) is listed
    first, outside of any parentheses
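
The written format can be read back into a tree with a short recursive parser. This is a sketch only: it assumes single-character node labels and well-formed input, and the function names `parse` and `serialize` are hypothetical.

```python
def parse(text):
    """Parse e.g. 'A(B(EFG)C(HI)D(JK))' into (label, [children])."""
    pos = 0

    def node():
        nonlocal pos
        label = text[pos]          # assumption: labels are one character
        pos += 1
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1               # skip '('
            while text[pos] != ")":
                children.append(node())
            pos += 1               # skip ')'
        return (label, children)

    tree = node()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return tree


def serialize(tree):
    """Write a tree back in the parenthesized format."""
    label, children = tree
    if not children:
        return label
    return label + "(" + "".join(serialize(c) for c in children) + ")"


tree = parse("A(B(EFG)C(HI)D(JK))")
print([child[0] for child in tree[1]])  # ['B', 'C', 'D']
print(serialize(tree))                  # A(B(EFG)C(HI)D(JK))
```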

5
Method of Implementation
  • The information from the file that is to be
    searched is stored in a 2-D character array
  • The program searches for the location of each
    occurrence of a character, beginning with A,
    inside the array
  • The location of the balanced parentheses
    following each occurrence of A is stored in a
    vector called FoundData
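
One way to sketch this search step is shown below. The record layout (character, start, end, line number), the 0-based index convention, and the sample lines are all assumptions; they are not meant to reproduce the program's actual storage or the numbers on the example slide.

```python
def matching_paren(line, open_idx):
    """Return the index of the ')' that balances the '(' at open_idx."""
    depth = 0
    for i in range(open_idx, len(line)):
        if line[i] == "(":
            depth += 1
        elif line[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")


def find_occurrences(lines, ch):
    """Return (ch, start, end, line_no) for every occurrence of ch.

    start/end are the 0-based indices of the balanced '(' and ')' that
    follow the character. A leaf (no parentheses) gets start == end ==
    its own index. Line numbers start at 1.
    """
    found = []
    for line_no, line in enumerate(lines, start=1):
        for i, c in enumerate(line):
            if c != ch:
                continue
            if i + 1 < len(line) and line[i + 1] == "(":
                found.append((ch, i + 1, matching_paren(line, i + 1), line_no))
            else:
                found.append((ch, i, i, line_no))
    return found


# Hypothetical input: one tree per line.
lines = ["A(B(CD)E(B))", "C(B(A(KM)R(LD)))"]
print(find_occurrences(lines, "B"))
# [('B', 3, 6, 1), ('B', 9, 9, 1), ('B', 3, 14, 2)]
```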

6
Method of Implementation
  • When all the occurrences of A are found and
    stored in FoundData, the program stores the
    location of all the characters that appear inside
    the balanced parentheses of the records in
    FoundData
  • This information is stored in FoundData2
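
The second pass can be sketched with a single stack-based scan that records the parenthesis span of every node on a line, then lists the nodes that fall inside a given span. As before, the 0-based indices, the tuple layout, and the sample line are assumptions for illustration.

```python
def node_spans(line):
    """Map each label index to the (start, end) of its balanced parentheses.

    Leaves map to (index, index). Indices are 0-based.
    """
    spans = {}
    stack = []  # indices of labels whose '(' is still open
    for i, c in enumerate(line):
        if c == "(":
            stack.append(i - 1)           # the parent label sits just before '('
        elif c == ")":
            parent = stack.pop()
            spans[parent] = (parent + 1, i)
        else:
            spans[i] = (i, i)             # provisional leaf span
    return spans


def descendants(line, start, end):
    """List (char, start, end) for every node strictly inside line[start..end]."""
    spans = node_spans(line)
    return [(line[i],) + spans[i]
            for i in range(start + 1, end)
            if line[i] not in "()"]


line = "C(B(A(KM)R(LD)))"          # hypothetical input line
print(descendants(line, 3, 14))    # span of B's parentheses on this line
# [('A', 5, 8), ('K', 6, 6), ('M', 7, 7), ('R', 10, 13), ('L', 11, 11), ('D', 12, 12)]
```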

7
Example: Searching for B
  • FoundData2
      • C (6, 6) on line 1
      • D (6, 6) on line 1
      • A (4, 7) on line 2
      • D (12, 12) on line 2
      • K (6, 6) on line 2
      • L (11, 11) on line 2
      • M (7, 7) on line 2
      • P (9, 12) on line 2
  • FoundData
      • B (3, 6) on line 1
      • B (10, 10) on line 1
      • B (3, 13) on line 2

8
Method of Implementation
  • The program uses the content of FoundData and
    FoundData2 to build all the trees with a first
    generation node B in the array
  • The information is stored in a vector called
    Trees

9
Example: Searching for B
  • Trees
      • B(CD) on line 1
      • B(C) on line 1
      • B(D) on line 1
      • B(A(KM)) on line 2
      • B(R(LD)) on line 2
      • B(A) on line 2
      • B(A(K)) on line 2
      • B(A(M)) on line 2
      • B(K) on line 2
      • B(M) on line 2
      • B(KM) on line 2
      • B(R) on line 2
      • B(R(L)) on line 2
      • B(R(D)) on line 2
      • B(L) on line 2
      • B(D) on line 2
      • B(LD) on line 2

10
Method of Implementation
  • The occurrences of each tree in the vector are
    counted
  • If a tree appears on more than the minimum
    support number of lines, it is considered
    frequent
  • Frequent trees are added to a vector called
    FinalTrees
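
The counting step might look like the sketch below. It follows the wording above and keeps a tree only if it appears on more than `min_support` distinct lines (many formulations use "at least" instead). The function name and the (tree, line number) pair format are assumptions; the sample data is a subset of the trees found when searching for B.

```python
from collections import defaultdict


def frequent_trees(trees, min_support):
    """Return trees that appear on more than min_support distinct lines.

    `trees` is a list of (tree_string, line_no) pairs.
    """
    lines_by_tree = defaultdict(set)
    for tree, line_no in trees:
        lines_by_tree[tree].add(line_no)   # a set counts each line only once
    return sorted(tree for tree, lines in lines_by_tree.items()
                  if len(lines) > min_support)


trees = [("B(CD)", 1), ("B(C)", 1), ("B(D)", 1),
         ("B(A)", 2), ("B(D)", 2), ("B(LD)", 2)]
print(frequent_trees(trees, min_support=1))  # ['B(D)']
```

Counting distinct lines rather than raw occurrences means a tree that shows up twice on the same line still counts once.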

11
Method of Implementation
  • The program repeats the previous steps until it
    has checked for all characters A through Z
  • At this point FinalTrees contains all of the
    frequent trees that occur in the input file
  • The contents of FinalTrees need to be filtered
    for a more concise output file

12
Method of Implementation
  • The contents of FinalTrees are printed to an
    intermediate output file called temp.txt
  • Then the information in temp.txt is stored in a
    second 2-D array
  • The trees that do not appear as part of a larger
    tree are recorded in a new vector
  • Trees that do appear in a larger tree are ignored

13
Example: Filtering the Trees in temp.txt
  • temp.txt
      • A(B(C(DE)))
      • A(C(E))
      • A(B(F))
      • B(D)
      • E(SJ)
      • E(S)
      • E(F(G(H)))
      • F(H)
  • Output.txt
      • A(B(C(DE)))
      • A(B(F))
      • E(SJ)
      • E(F(G(H)))
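
The filtering in this example can be reproduced with the sketch below. It rests on an assumed reading of "appears as part of a larger tree": the smaller tree's root matches some node of the larger tree, and each child in the smaller tree matches some descendant of that node, so intermediate nodes may be skipped. This is a simplification that ignores sibling order, lets two pattern children match the same node, and uses string length as a stand-in for tree size. All function names are hypothetical.

```python
def parse(text, pos=0):
    """Parse one node starting at pos; return (label, children, next_pos)."""
    label, children = text[pos], []
    pos += 1
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while text[pos] != ")":
            child_label, grandchildren, pos = parse(text, pos)
            children.append((child_label, grandchildren))
        pos += 1
    return label, children, pos


def descendants(node):
    """Yield every node strictly below `node`."""
    for child in node[1]:
        yield child
        yield from descendants(child)


def matches_at(pattern, node):
    """True if the labels agree and every child of the pattern matches
    some descendant of node."""
    if pattern[0] != node[0]:
        return False
    return all(any(matches_at(pc, d) for d in descendants(node))
               for pc in pattern[1])


def is_part_of(small, large):
    """True if `small` matches at the root or at any descendant of `large`."""
    return any(matches_at(small, n) for n in [large, *descendants(large)])


def filter_maximal(tree_strings):
    """Keep only the trees that are not part of a longer tree in the list."""
    trees = [parse(s)[:2] for s in tree_strings]
    keep = []
    for i, s in enumerate(tree_strings):
        covered = any(len(tree_strings[j]) > len(s)
                      and is_part_of(trees[i], trees[j])
                      for j in range(len(trees)))
        if not covered:
            keep.append(s)
    return keep


temp = ["A(B(C(DE)))", "A(C(E))", "A(B(F))", "B(D)",
        "E(SJ)", "E(S)", "E(F(G(H)))", "F(H)"]
print(filter_maximal(temp))
# ['A(B(C(DE)))', 'A(B(F))', 'E(SJ)', 'E(F(G(H)))']
```

Under this reading A(C(E)) is dropped because C and E both lie below A in A(B(C(DE))), while A(B(F)) is kept because no longer tree has F below B.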

14
Conclusion
  • Using the method described, the tree-searching
    part of the data mining program can effectively
    find the frequent trees in a list of URLs
  • The results of this search are concisely
    reported in the output file