Limits of the Web Log Analysis Artifacts


1
Limits of the Web Log Analysis Artifacts
  • Nikolai (Nick) Buzikashvili, Jim Jansen
  • Russian Academy of Sciences, Penn State
    University
  • buzik_at_cs.isa.ru, jjansen_at_ist.psu.edu

2
Comparing Results of the Web Log Analysis

3
  • Obviously Different.
  • Why?
  • features of interfaces, query languages and
    search techniques used in different search
    engines,
  • differences in context, including the
    recommended manner of usage and the overlap between
    the query language and real-life written language,
  • hypothetical cultural differences
  • This traditional approach is significant and
    well-grounded.

4
  • Different Statistically,
  • Different Significantly.
  • However, Really Different? Sure?
  • Here we consider
  • the DEPENDENCE of the RESULTS of Web Log Analysis on
    the METHODS
  • the sensitivity of the results to variations in method

5
Arithmetic of Web Log Analysis
  • Query 1: 5 terms; query 2: 1 term; query 3: 1
    term
  • if head-disrupted query 1 is counted → 3 unique
    queries, 8 transactions, 2 terms/query
  • if query 1 is ignored → 2 unique queries, 6
    transactions, 1 term/query
  • Transaction sequence:
    (1,0) (1,1) (2,0) (1,2) (2,1)
    (2,1) (3,0) (2,0) (3,1) (3,2) (4,0)
  • [Figure: the transaction sequence with the
    OBSERVATION PERIOD marked, showing a head-disrupted
    query and a tail-disrupted query]
  • Notation: (i,j) = (query, page of retrieved
    results)
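
The counts on this slide can be reproduced with a short sketch. The original figure is not available, so the window below is an assumption: it takes the observation period to cover the eight transactions from (1,1) through (3,1), which is the reading consistent with the slide's numbers. The names TERMS, WINDOW and summarize are hypothetical.

```python
# Term counts per query, taken from the slide.
TERMS = {1: 5, 2: 1, 3: 1}

# (query, page) transactions ASSUMED to fall inside the observation
# period. The window is inferred from the slide's counts, not read
# from the original figure.
WINDOW = [(1, 1), (2, 0), (1, 2), (2, 1), (2, 1), (3, 0), (2, 0), (3, 1)]


def summarize(transactions, ignored=()):
    """Return (unique queries, transactions, mean terms per unique query)."""
    kept = [(q, p) for q, p in transactions if q not in ignored]
    queries = sorted({q for q, _ in kept})
    mean_terms = sum(TERMS[q] for q in queries) / len(queries)
    return len(queries), len(kept), mean_terms


print(summarize(WINDOW))               # head-disrupted query 1 counted
print(summarize(WINDOW, ignored={1}))  # query 1 ignored
```

Under this assumption, counting query 1 gives 3 unique queries, 8 transactions and 7/3 terms per query (about 2, as on the slide); ignoring it gives 2 unique queries, 6 transactions and 1 term per query.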

6
Web TLA vs. Classic TLA: Two Palliative Cut-off
Variables
  • (1) If users are not 'cookied', then
  • CLIENT ≠ USER
  • and we need to exclude mixed users (local
    area networks) → LAN CUT-OFF variable
  • (e.g. 10 transactions per one-day observation
    period)
  • (2) There are no search sessions in Web logs
  • → TEMPORAL CUT-OFF variable
  • to segment transactions into temporal sessions
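
A minimal sketch of how a temporal cut-off might segment one client's transactions into sessions. The slide does not state the rule, so this assumes the common one: a new session starts when the gap between consecutive transactions exceeds the cut-off. The function name split_sessions and the sample times are hypothetical.

```python
def split_sessions(timestamps, cutoff_min=15):
    """Split one client's transaction times (in minutes) into sessions.

    Assumption: a gap larger than cutoff_min starts a new session;
    a gap equal to the cut-off stays in the current session.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= cutoff_min:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


times = [0, 5, 12, 40, 50, 130]  # made-up transaction times
print(split_sessions(times, cutoff_min=15))
print(split_sessions(times, cutoff_min=30))
```

With these made-up times, a 15-minute cut-off yields three sessions and a 30-minute cut-off yields two, so the same log produces different session counts depending on the value chosen.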

7
Suspect Factors
  • (1) Peculiarities of Web log analysis
  • Two cut-off variables, assigned arbitrarily in
    different Web log studies
  • (2) Common factors
  • 2.1 Sampling technique
  • 2.2 Observation period duration
  • To study the sensitivity of the results, we use the
    complete 8-hour Excite99 fragment and the Excite01
    and Yandex05 samples

8
1. Log Sampling-1

9
1. Log Sampling-2: fractions of quotation
queries in 3 samples

10
1. Log Sampling-3
  • (1) SIGNIFICANTLY affects results
  • (2) usually selected by the log owner (search
    engine team) rather than by the researchers
  • (3) the owners usually take little account of
    sample randomization

11
2. Observation Period
  • Varying the duration of the Observation Period
    significantly affects the results only if the
    duration is less than 2 hours.
  • Thus, we should exclude the Observation Period
    factor as NOT SIGNIFICANT

12
Cut-Off Variables
  • 3. Temporal cut-off
  • usual values: 15 min (Excite),
  • 30 min (AltaVista, Yandex),
  • 60 min (Yandex)
  • 4. LAN (local area network) cut-off
  • measured in transactions (usually queries),
  • or 0-page transactions,
  • or unique queries
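
The three units listed for the LAN cut-off can exclude different clients from the same log. The sketch below assumes that a client is dropped when its activity exceeds the cut-off, and that a "0-page transaction" is a request for the first page of results (page 0 in the (query, page) notation). The names activity and apply_lan_cutoff and the sample log are hypothetical.

```python
def activity(transactions, unit):
    """Count one client's activity in the chosen unit."""
    if unit == "transactions":
        return len(transactions)
    if unit == "zero_page":
        return sum(1 for _, page in transactions if page == 0)
    if unit == "unique_queries":
        return len({query for query, _ in transactions})
    raise ValueError(f"unknown unit: {unit}")


def apply_lan_cutoff(log, cutoff, unit):
    """Keep clients whose activity does not exceed the cut-off (assumed rule)."""
    return {c: tx for c, tx in log.items() if activity(tx, unit) <= cutoff}


# Made-up log: client -> list of (query, page) transactions.
log = {
    "client_a": [("cats", 0), ("cats", 1), ("cats", 2), ("dogs", 0)],
    "client_b": [("news", 0), ("maps", 0), ("mail", 0)],
}

print(sorted(apply_lan_cutoff(log, 3, "transactions")))
print(sorted(apply_lan_cutoff(log, 2, "zero_page")))
print(sorted(apply_lan_cutoff(log, 2, "unique_queries")))
```

In this toy log, a cut-off measured in transactions keeps only client_b, while cut-offs measured in 0-page transactions or unique queries keep only client_a.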

13
3. Temporal cut-off
  • A priori,
  • Per-transaction characteristics (query length,
    fractions of particular kinds of queries, e.g. Boolean,
    advanced etc.) don't depend on the temporal cut-off.
  • The same per-client characteristics don't
    depend on it either.
  • The same per-session characteristics SHOULD
    DEPEND on it.
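
A small illustration of the a priori argument, using made-up data for a single client. It assumes that a session breaks whenever the gap between consecutive transactions exceeds the cut-off; the function names and values are hypothetical.

```python
def sessions_by_cutoff(transactions, cutoff_min):
    """Group (time_min, n_terms) transactions of one client into sessions."""
    sessions = []
    for tx in sorted(transactions):
        # Compare with the time of the last transaction in the last session.
        if sessions and tx[0] - sessions[-1][-1][0] <= cutoff_min:
            sessions[-1].append(tx)
        else:
            sessions.append([tx])
    return sessions


def stats(transactions, cutoff_min):
    """Return (mean terms per transaction, mean transactions per session)."""
    sessions = sessions_by_cutoff(transactions, cutoff_min)
    terms_per_tx = sum(n for _, n in transactions) / len(transactions)
    tx_per_session = len(transactions) / len(sessions)
    return terms_per_tx, tx_per_session


# Made-up (time in minutes, number of query terms) pairs.
tx = [(0, 2), (5, 3), (12, 1), (40, 2), (50, 4), (130, 3)]
for cutoff in (15, 30, 60):
    print(cutoff, stats(tx, cutoff))
```

The mean query length stays at 2.5 terms for every cut-off because it never uses the session boundaries, while the number of transactions per session changes from 2.0 at 15 minutes to 3.0 at 30 and 60 minutes.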

14
4. LAN Cut-off
  • A priori,
  • the same characteristics SHOULD DEPEND on the LAN
    cut-off

15
3. Temporal cut-off (Excite99/Excite01, LAN = 5
u.q./h)

16
4. LAN cut-off (Excite99/Excite01, temporal
cut-off 15 min)

17
4. LAN cut-off: fractions of adv. queries in
Excite99 and Excite01

18
Suspect Factors

19
  • Thanks!