Geo-word Centric Association Rule Mining

Slides: 21
Provided by: hyes9
Transcript and Presenter's Notes

Title: Geo-word Centric Association Rule Mining


1
Geo-word Centric Association Rule Mining
2
About Me
  • Fahad Al-Emam
  • Bachelors of CSE from MSU (04)
  • Masters student in the College of Computing
    specializing in Software Engineering
  • Graduating this Fall!

3
Summary
  • Humans perceive a location in verbal form, such
    as an address, a landmark, or another well-known
    term.
  • Most user interfaces of location-based services
    provide input methods based on those verbal forms
    because they are the most intuitive and the
    easiest for human users.
  • This verbal form is referred to as a geo-word.
  • The paper focuses on using geo-words in
    association rule mining.

4
Background Terminology
  • Association rule mining finds all the rules
    existing in the database that satisfy a minimum
    support and confidence constraint.
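  • Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of
    items.
  • Let $T = \{t_1, t_2, \ldots, t_m\}$ be a set of
    transactions.
  • A rule is defined as an implication of the form
    $X \Rightarrow Y$, where $X, Y \subseteq I$ and
    $X \cap Y = \emptyset$.
  • Support of an itemset is the proportion of
    transactions that contain it.
  • Confidence of a rule:
    $\mathrm{conf}(X \Rightarrow Y) = \mathrm{sup}(X \cup Y) / \mathrm{sup}(X)$,
    where $\mathrm{sup}(X \cup Y)$ is the proportion
    of transactions that contain every item of both
    X and Y.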
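
The support and confidence definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy transactions are made up here and do not come from the paper or the slides.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Transaction]) -> float:
    """sup(X ∪ Y) / sup(X); returns 0.0 when X never occurs."""
    sup_x = support(x, transactions)
    if sup_x == 0.0:
        return 0.0
    return support(frozenset(x) | frozenset(y), transactions) / sup_x


# Toy transactions, invented for illustration only.
transactions = [
    frozenset({"tokyo", "hotel"}),
    frozenset({"tokyo", "hotel", "sushi"}),
    frozenset({"tokyo", "sushi"}),
    frozenset({"osaka", "hotel"}),
]

sup_tokyo = support({"tokyo"}, transactions)                # 3 of 4 transactions
conf_rule = confidence({"tokyo"}, {"hotel"}, transactions)  # (2/4) / (3/4)
```

Here the rule tokyo ⇒ hotel holds in two of the three transactions that contain "tokyo", so its confidence is two thirds.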

5
Geo-Words & Geo-Tuples
  • Geo-word: a proper noun that represents a
    location-related, human-understandable concept.
    An address is a typical example of a geo-word, as
    is the name of a landmark.
  • A non-proper noun such as "mountain" is not
    considered a geo-word. Neither is an IP address
    or a latitude/longitude pair.
  • Geo-word tuple: a set of items that contains at
    least one geo-word.
  • $T_G = (g, i_k, \ldots, i_n)$, where each $i$ can
    be a keyword, area status, or even temperature.
    (1)
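
One possible way to represent a geo-word tuple in code is sketched below. The gazetteer, the class name, and the helper function are all hypothetical; the slides do not say how geo-words are recognized, so a simple lookup against a fixed set of place names is assumed here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Hypothetical gazetteer: a stand-in for a real list of addresses and
# landmark names. Generic nouns such as "mountain" are deliberately absent.
GAZETTEER = frozenset({"Shibuya", "Tokyo Tower", "Kyoto Station"})


@dataclass(frozen=True)
class GeoWordTuple:
    geo_words: FrozenSet[str]   # at least one geo-word, as required by (1)
    items: FrozenSet[str]       # keywords, area status, temperature, ...


def make_geo_word_tuple(raw_items: Iterable[str]) -> Optional[GeoWordTuple]:
    """Split raw items into geo-words and other items.

    Returns None when no geo-word is present, because such a set of items
    is not a geo-word tuple.
    """
    raw = frozenset(raw_items)
    geo = raw & GAZETTEER
    if not geo:
        return None
    return GeoWordTuple(geo_words=geo, items=raw - geo)


t1 = make_geo_word_tuple(["Shibuya", "ramen", "open late"])
t2 = make_geo_word_tuple(["mountain", "hiking"])  # no geo-word, so None
```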

6
Geo-word Centric Association Rule
  • A geo-word centric association rule:
    $R_G : X_g \Rightarrow Y_g$
  • Geo-word association rule
  • The rule explicitly contains at least one
    geo-word, either on the left side of the rule (X)
    or on the right side of the rule (Y), as
    formulated:
  • (2)
  • Co-location association rule
  • The rule does not contain a geo-word, but it is
    generated from geo-word tuples, and the geo-word
    becomes a constraint on the rule. All of the
    items considered in the rule come from geo-word
    tuples of the same geo-word. In other words, when
    the geo-word tuples satisfy (1), the itemsets
    are related to a specific geo-word, as
    formulated:
  • (3)

7
Geo-word Centric Assc. Rule Mining
  • Overall process of geo-word centric association
    rule mining. Geo-word specific steps include
    geo-word cleaning, geo-word scaling, and
    geo-word specific interestingness measures. The
    association rule mining itself can use any
    available classic mining algorithm.

8
Geo-Word Tuple Processing
  • There are three types of geo-word tuple
    processing. Each produces an intermediate data
    representation that becomes the itemsets for
    association rule mining.
  • Geo-word tuple: This basic method considers only
    the items in the geo-word tuples. It generates
    geo-word association rules whose support is
    counted as the number of tuples that contain the
    itemset.
  • Session: When tuples contain a user ID and a
    timestamp, they can be grouped into sessions
    based on the unique users and a specified time
    interval. This method also generates geo-word
    association rules, but the support of a rule is
    the number of sessions instead of the number of
    tuples.
  • Co-location tuple: A co-location tuple contains
    only the items that co-occur with a geo-word in
    the tuple. Since co-location tuples do not
    necessarily contain any geo-word, the mining
    results can be regarded as co-location
    association rules.
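
The three processing modes can be sketched as follows. The `Request` record, the field names, and the 30-minute gap used to split sessions are assumptions made for this example; the slides only say that sessions are built from unique users and "a specified time interval".

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Request(NamedTuple):
    user_id: str
    timestamp: int               # seconds (hypothetical unit)
    geo_word: str
    free_words: Tuple[str, ...]


def geo_word_tuples(requests: List[Request]) -> List[FrozenSet[str]]:
    """One itemset per request: the geo-word plus its free-words."""
    return [frozenset((r.geo_word,) + r.free_words) for r in requests]


def sessions(requests: List[Request], gap: int = 1800) -> List[FrozenSet[str]]:
    """One itemset per session. A user's session ends when two consecutive
    requests are more than `gap` seconds apart (assumed session rule)."""
    by_user: Dict[str, List[Request]] = defaultdict(list)
    for r in requests:
        by_user[r.user_id].append(r)

    result: List[FrozenSet[str]] = []
    for user_requests in by_user.values():
        user_requests.sort(key=lambda r: r.timestamp)
        current: set = set()
        last_ts = None
        for r in user_requests:
            if last_ts is not None and r.timestamp - last_ts > gap:
                result.append(frozenset(current))
                current = set()
            current.add(r.geo_word)
            current.update(r.free_words)
            last_ts = r.timestamp
        if current:
            result.append(frozenset(current))
    return result


def co_location_tuples(requests: List[Request]) -> Dict[str, List[FrozenSet[str]]]:
    """Per geo-word, the free-word itemsets that co-occurred with it.
    The geo-word itself is dropped from each itemset."""
    grouped: Dict[str, List[FrozenSet[str]]] = defaultdict(list)
    for r in requests:
        if r.free_words:
            grouped[r.geo_word].append(frozenset(r.free_words))
    return dict(grouped)


# Toy access log, invented for illustration only.
log = [
    Request("u1", 0, "Shibuya", ("ramen",)),
    Request("u1", 600, "Shibuya", ("karaoke",)),
    Request("u1", 10_000, "Ginza", ("sushi",)),
    Request("u2", 100, "Shibuya", ("ramen", "cheap")),
]

tuples = geo_word_tuples(log)        # 4 itemsets, one per request
user_sessions = sessions(log)        # 3 itemsets, two for u1 and one for u2
co_located = co_location_tuples(log)
```

Each output can be passed to any standard association rule miner. Only the unit being counted for support changes: tuples, sessions, or the co-occurring items within a single geo-word's group.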

9
Geo-Word Tuple Processing (cont.)
  • Here is an example of the processing performed on
    an access log
  • Geo-word tuples are grouped
  • Sessions are grouped by user ID and time
  • Co-location tuples are grouped by the geo-word
    that relates to them

10
Experiment
  • Target log data: The access log data used for
    mining comes from an online commercial
    location-based search service (Japanese yellow
    pages). Each line of the log is called a request
    and includes a geo-word, free-words, and a
    timestamp. The data was collected over 3 months.
  • Data preprocessing: Each request in the log data
    is abstracted by picking only the items required
    for mining. After abnormal requests are removed,
    the log data is divided into sessions.
  • Statistics of the target data: The original raw
    data consisted of 500 million accesses and was
    150 gigabytes in size. After preprocessing, it
    was reduced to 14 million accesses and 800
    megabytes.
  • Association rules: Generated from the 14 million
    requests using the methods described in the
    Geo-Word Tuple Processing slides. The minimum
    support for the rules is 0.000002. Each rule
    includes geo-words and free-words.
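
To get a feel for how low the minimum support is, it can be converted into an absolute count. The sketch below assumes that support is measured as a fraction of the 14 million preprocessed requests; if the authors counted sessions instead, the absolute threshold would be smaller.

```python
from decimal import Decimal

total_requests = 14_000_000          # preprocessed accesses, from the slide
min_support = Decimal("0.000002")    # minimum support, from the slide

# Decimal avoids binary floating-point rounding in the product.
min_count = min_support * total_requests
print(min_count)
```

Under that assumption, an itemset needs to appear in only about 28 requests to pass the threshold.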

11
Experiment (Cont.)
12
Experiment (Cont.)
13
Experiment (Cont.)
14
Experiment (Cont.)
15
LIFT
  • Observation: In some cases, too many rules were
    generated for a given word. For example, more
    than 1000 kinds of association rules have
    "hotel" on the left side in our generated
    geo-word association rules. Therefore, we need a
    metric that can identify useful and interesting
    rules. One known metric is called lift. Lift
    measures how interesting or unique a rule is. The
    lift of a rule $R : X \Rightarrow Y$ is
    $\mathrm{sup}(X \cup Y) / (\mathrm{sup}(X)\,\mathrm{sup}(Y))$,
    where $\mathrm{sup}(X \cup Y)$ is the proportion
    of transactions that contain every item of both
    X and Y.
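
A minimal sketch of the lift computation, using made-up transactions, is shown below. The helper names are hypothetical. A lift above 1 means X and Y co-occur more often than they would if they were independent, and a lift below 1 means they co-occur less often.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(x: Iterable[str], y: Iterable[str],
         transactions: List[FrozenSet[str]]) -> float:
    """sup(X ∪ Y) / (sup(X) * sup(Y)); returns 0.0 if X or Y never occurs."""
    sup_x = support(x, transactions)
    sup_y = support(y, transactions)
    if sup_x == 0.0 or sup_y == 0.0:
        return 0.0
    sup_xy = support(frozenset(x) | frozenset(y), transactions)
    return sup_xy / (sup_x * sup_y)


# Toy transactions, invented for illustration only.
transactions = [
    frozenset({"hotel", "tokyo"}),
    frozenset({"hotel", "osaka"}),
    frozenset({"hotel", "kyoto"}),
    frozenset({"kabuki theater", "ginza"}),
    frozenset({"ramen", "tokyo"}),
]

common = lift({"hotel"}, {"tokyo"}, transactions)            # roughly 0.83
specific = lift({"kabuki theater"}, {"ginza"}, transactions)  # roughly 5
```

In this toy data "hotel" appears with many different places, so its rule with any single place gets a low lift. The rarer "kabuki theater" always appears with "ginza", so that rule gets a high lift.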

16
LIFT
17
Entropy
  • The geo-word for a requested word can have
    different significance based on the notions of
    local specialty and local commodity. If a user
    requests the word "supermarket", he/she may wish
    to find one in the vicinity, since supermarkets
    can be found in many places. However, if the word
    is less common, such as "kabuki theater", he/she
    may want to know the location regardless of
    proximity.
  • Entropy: a metric that can identify the degree of
    local specialty of a given word. If we regard a
    word $w$ as an information source for $g \in G$,
    then the entropy is
    $H(w) = -\sum_{g \in G} N(g,w) \log \frac{N(g,w)}{N(w)}$
  • When a free-word $w$ is requested, if the
    accompanying geo-word $g$ is the same every time,
    then $H(w) = 0$. If $g$ differs every time, then
    $H(w)$ is at its maximum.
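
The entropy measure can be sketched as below, following the formula as transcribed on the slide. Two assumptions are made here that the slide does not state: N(g, w) is read as the number of requests in which free-word w was accompanied by geo-word g, and N(w) as the total number of requests containing w. The optional `normalize` flag is an addition for this example; dividing by N(w) gives the usual Shannon entropy of the geo-word distribution.

```python
import math
from collections import Counter
from typing import Iterable


def entropy(geo_words_for_w: Iterable[str], normalize: bool = False) -> float:
    """Entropy of the geo-words that accompany one free-word w.

    `geo_words_for_w` lists the geo-word of every request that contained w.
    """
    counts = Counter(geo_words_for_w)   # N(g, w) for each geo-word g
    n_w = sum(counts.values())          # N(w)
    if n_w == 0:
        return 0.0
    # -N(g,w) * log(N(g,w) / N(w)) is rewritten as N(g,w) * log(N(w) / N(g,w)).
    h = sum(n * math.log(n_w / n) for n in counts.values())
    return h / n_w if normalize else h


# Toy data, invented for illustration only.
always_same = entropy(["Ginza"] * 5)                     # one geo-word only
all_different = entropy(["Shibuya", "Ueno", "Ginza", "Asakusa"])
shannon = entropy(["Shibuya", "Ueno", "Ginza", "Asakusa"], normalize=True)
```

A word that is always requested with the same geo-word scores 0, which suggests a local specialty. A word requested with a different geo-word every time scores the maximum for its request count, which suggests a local commodity.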

18
Entropy (Cont.)
19
Conclusion
  • The paper was extremely difficult to read.
  • The discussion needs more popular examples.
  • It didn't mention any actual mining algorithms
    used in the research or in the experiment.
  • By the end of the 3rd page, all citations were
    completed!
  • The math was presented without discussion.

20
  • Q & A