Title: Visualization of the Web Access Popularity
Visualization of the Popularity of the Web Access for Ping Wales
Xiaochuan Huang (George), supervised by Dr Markus Roggenbach
Department of Computer Science, University of Wales Swansea
Nov. 2005 at Gregynog
Overview
- A Regular Website Report
- Specification
- Technologies Involved
- A First Approach
1. A Regular Website Report
- What the project is about
  - Our customer, Ping Media Ltd, and its website, Ping Wales
  - What they need, and the technical infrastructure
- Introducing similar tools
  - Log file analyzers: AWStats and Analog 6.0
  - Graphic statistics generated by AWStats and Analog
- Why this application is necessary
  - The customer's needs
  - The shortcomings of existing applications
  - An extendable project
2. Specification
- Components
  - The filter/parser
  - The analyzer
  - Two databases
  - Visualization
- Going through the processes
  - Take the daily log file -> parse it with DB1 -> output the filtered result -> write the result into DB2
  - Given a specified duration -> access DB2 -> generate the records -> output a visualized report
3. Technologies Involved
- The Apache log files
  - Introduction
  - Format ("combined"):
    "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
  - Example:
    220.244.224.104 - - [12/Jan/2005:00:12:38 +0000] "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" 200 11020 "http://www.pingwales.co.uk/business/apple-keynote.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041204 Epiphany/1.4.4"
  - Log string analysis
    - (%h) 220.244.224.104: the IP address of the client
    - (%l) -: the RFC 1413 identity of the client
    - (%u) -: the userid of the person requesting the document
    - (%t) [12/Jan/2005:00:12:38 +0000]: the time the request was received
    - ("%r") "GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0": the method, the requested page, and the client protocol
    - (%>s) 200: the status code
    - (%b) 11020: the size of the object returned to the client, in bytes
    - ("%{Referer}i") the site that the client reports having been referred from
    - ("%{User-agent}i") identifying information about the client's browser
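A minimal Ruby sketch of how one line in this format could be split into the fields listed above, using a regular expression with named captures. It assumes well-formed lines whose quoted fields contain no escaped quotes; the `COMBINED` pattern, the `Hit` struct and `parse_hit` are hypothetical names, not part of the project.

```ruby
require 'time'

# One line of Apache's "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Assumption: lines are well formed and quoted fields contain no escaped quotes.
COMBINED = /\A(?<host>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\d+|-) "(?<referer>[^"]*)" "(?<agent>[^"]*)"\z/

# Hypothetical record type for a parsed hit.
Hit = Struct.new(:ip, :time, :verb, :path, :protocol, :status, :bytes, :referer, :agent)

# Returns a Hit, or nil when the line does not match the pattern.
def parse_hit(line)
  m = COMBINED.match(line.chomp)
  return nil unless m

  # "%r" holds method, requested page and protocol separated by spaces.
  verb, path, protocol = m[:request].split(' ', 3)
  Hit.new(
    m[:host],
    Time.strptime(m[:time], '%d/%b/%Y:%H:%M:%S %z'), # e.g. 12/Jan/2005:00:12:38 +0000
    verb, path, protocol,
    m[:status].to_i,
    m[:bytes] == '-' ? 0 : m[:bytes].to_i, # Apache logs "-" when no body was sent
    m[:referer],
    m[:agent]
  )
end

line = '220.244.224.104 - - [12/Jan/2005:00:12:38 +0000] ' \
       '"GET /hardware/toshiba-small-80gb-hdd.html HTTP/1.0" 200 11020 ' \
       '"http://www.pingwales.co.uk/business/apple-keynote.html" ' \
       '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041204 Epiphany/1.4.4"'

hit = parse_hit(line)
puts hit.ip     # prints 220.244.224.104
puts hit.path   # prints /hardware/toshiba-small-80gb-hdd.html
puts hit.status # prints 200
```

Lines that do not match return nil, so a caller can count or skip malformed entries instead of aborting the whole daily file.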
- Programming language: Ruby
  - An interpreted scripting language for quick and easy object-oriented programming
    % cd sample
    % ruby eval.rb
    ruby> a = "Hello, world!"
    "Hello, world!"
    ruby> puts a
    Hello, world!
    nil
    ruby> ^D
    % ruby
    puts "Hello, world!"
    ^D
    Hello, world!
- Database access
  - MySQL
  - The two databases
  - Accessing the databases from Ruby
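Talking to MySQL needs a running server and a database driver, so the sketch below uses in-memory stand-ins to show the operations the two databases must support: DB1 answers "is this page an article?", and DB2 stores visit records and returns those within a time range. The class names, the `articles`/`visits` tables and the SQL in the comments are hypothetical; in a real setup each method would issue the corresponding query.

```ruby
require 'set'

# DB1 stand-in: the pages that count as articles on the site.
# Hypothetical MySQL equivalent: SELECT 1 FROM articles WHERE path = ?
class ArticleDB
  def initialize(paths)
    @paths = Set.new(paths)
  end

  def article?(path)
    @paths.include?(path)
  end
end

# DB2 stand-in: the filtered page-visit records.
# Hypothetical MySQL equivalents:
#   INSERT INTO visits (ip, path, visited_at) VALUES (?, ?, ?)
#   SELECT ip, path, visited_at FROM visits WHERE visited_at BETWEEN ? AND ?
class VisitDB
  Visit = Struct.new(:ip, :path, :visited_at)

  def initialize
    @rows = []
  end

  def insert(ip, path, visited_at)
    @rows << Visit.new(ip, path, visited_at)
  end

  # Records whose timestamp lies in [from, to], both ends inclusive (like BETWEEN).
  def between(from, to)
    @rows.select { |v| v.visited_at >= from && v.visited_at <= to }
  end
end

page = '/hardware/toshiba-small-80gb-hdd.html'
articles = ArticleDB.new([page])
visits = VisitDB.new
visits.insert('220.244.224.104', page, Time.utc(2005, 1, 12, 0, 12, 38)) if articles.article?(page)
puts visits.between(Time.utc(2005, 1, 12), Time.utc(2005, 1, 13)).size # prints 1
```

Keeping both stand-ins behind the same small interface means the filter and the analyzer need not change when MySQL-backed versions are swapped in.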
4. A First Approach
- Load the daily log file
- Parsing/filtering
  - While not end of file:
    - Read the hits line by line
    - For each hit: getIP(%h), getTime(%t), getReq("%r"), getSt(%>s)
    - Check if even(first(getSt())); if so, go through the articles database looking for getIP()
    - If it is there, write the hit to database 2 and read the next one
    - Otherwise, go to the next hit
- Analyzing
  - Specify StartingTime and EndTime; build an array/stack myArray
  - Read through the records from database 2, taking those within the specified time
  - For each hit:
    - If getIP() is in myArray, then counter++
    - Otherwise, write this hit to myArray and initialize its counter
  - Sort myArray by the counter of each element
  - Write the top N results to a file for visualization
- Water flow model
[Figure: water flow model diagram. Components: Daily Log File, Filter, Database 1 <webpage add DB>, Database 2 <page visits records>, Analyzer, Period entry, Records, Visualization Tool, Graphic Report]
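The parsing/filtering and analyzing steps of the first approach could look roughly like this in Ruby. Hits are assumed to be already parsed, and `filter_hits`, `top_n` and the `Hit` fields are hypothetical names. One interpretive assumption: the slide searches the articles database for getIP(), but since that database holds pages, this sketch matches on the requested page and counts per page by default; passing `key: :ip` counts per client address instead. A Hash replaces the linear search through myArray, so each lookup-and-increment is constant time on average.

```ruby
require 'set'

# Hypothetical record: only the fields the slide's algorithm reads.
Hit = Struct.new(:ip, :time, :path, :status)

# Parsing/filtering: keep a hit when the first digit of its status code is
# even (as on the slide) and the requested page is in the articles set (DB1).
def filter_hits(hits, article_paths)
  hits.select do |h|
    h.status.to_s[0].to_i.even? && article_paths.include?(h.path)
  end
end

# Analyzing: count hits per key whose time lies in [from, to], sort by count
# (ties broken by key) and return the top n [key, count] pairs.
# The Hash stands in for myArray: a missing key starts at 0, then counter++.
def top_n(records, from, to, n, key: :path)
  counts = Hash.new(0)
  records.each do |r|
    next unless r.time >= from && r.time <= to
    counts[r[key]] += 1
  end
  counts.sort_by { |k, c| [-c, k] }.first(n)
end

articles = Set.new(['/a.html', '/b.html'])
hits = [
  Hit.new('1.1.1.1', Time.utc(2005, 1, 12, 9), '/a.html', 200),
  Hit.new('2.2.2.2', Time.utc(2005, 1, 12, 10), '/a.html', 200),
  Hit.new('3.3.3.3', Time.utc(2005, 1, 12, 11), '/b.html', 304), # odd first digit: dropped
  Hit.new('4.4.4.4', Time.utc(2005, 1, 12, 12), '/c.html', 200), # not an article: dropped
  Hit.new('5.5.5.5', Time.utc(2005, 1, 12, 13), '/b.html', 200)
]
db2 = filter_hits(hits, articles)
p top_n(db2, Time.utc(2005, 1, 12), Time.utc(2005, 1, 13), 10)
# prints [["/a.html", 2], ["/b.html", 1]]
```

Note that the even-first-digit test keeps 4xx errors as well as 2xx successes; `h.status / 100 == 2` would keep only successful responses, if that is the intent.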
Summary
- What I have done so far
- What I am planning to do next
End
- Hey, wake up, there he ends!! LOL
- George, 21/11/2005 at Gregynog