Efficient Mining of Group Patterns from User Movement Data

1
Efficient Mining of Group Patterns from User
Movement Data
  • A/P Lim Ee Peng
  • School of Computer Engineering, Nanyang
    Technological University

2
Acknowledgement
  • PhD students:
  • Wang Yida (just graduated)
  • Hady Lauw (in 2nd year of PhD program)
  • Collaborator:
  • A/P San-Yih Hwang, Department of Information
    Management, National Sun Yat-Sen
    University, Kaohsiung, Taiwan

3
Outline
  • Introduction
  • Algorithms: AGP and VG-growth
  • Spherical Location Summarization
  • Performance Evaluation
  • Conclusion

4
Introduction
  • People are affiliated with different kinds of
    groups: family, friends, school/class, company,
    etc.
  • Sociologists and management experts have studied
    groups and group dynamics extensively.
  • Peer pressure and group conformity can affect the
    behaviors of individuals.
  • Traditional applications
  • Improvement of communication
  • Roles of individuals in a group
  • Other applications
  • Group-oriented marketing strategy
  • Security

5
Introduction
  • Existing techniques for deriving groups
  • Static attributes: age, education, workplace,
    income, etc.
  • Transactional attributes: purchase history
  • Problems of existing techniques
  • Collection of static and transactional data
  • Groups derived are not necessarily useful for some
    applications

6
Research Objectives
  • Group pattern mining based on user movement data
  • Physical proximity: group members are physically
    close to one another
  • Temporal proximity: group members stay close for
    some meaningful time duration
  • Assumptions
  • User movement can be tracked by mobile phones,
    RFIDs and other devices.
  • Positioning technology can be as accurate as 1-20
    meters (GPS/AGPS).
  • Privacy issues can be addressed separately.

7
Example
8
Group Examples
9
Valid Segment of a User Group
  • User movement database $D = (D_1, D_2, \ldots, D_M)$
  • $D_i$ is a time series of locations of user $u_i$, with entries
    $\langle t, (x, y, z) \rangle$
  • $u_i[t].p = (x, y, z)$
  • Distance function
  • $d((x_1, y_1, z_1), (x_2, y_2, z_2)) =
    \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2 + (z_1-z_2)^2}$
  • max_dis and min_dur thresholds
  • max_dis: two users are close to each other if they are
    $\le$ max_dis apart
  • min_dur: the closeness should last $\ge$ min_dur to
    become meaningful
  • Given max_dis and min_dur, $[t_a, t_b]$ is a valid
    segment for a group of users $G$ if
  • $\forall u_i, u_j \in G,\ \forall t \in [t_a, t_b]:\ d(u_i[t].p, u_j[t].p) \le \mathit{max\_dis}$
  • $\exists u_i, u_j \in G:\ d(u_i[t_a-1].p, u_j[t_a-1].p) > \mathit{max\_dis}$
  • $\exists u_i, u_j \in G:\ d(u_i[t_b+1].p, u_j[t_b+1].p) > \mathit{max\_dis}$, and
  • $(t_b - t_a + 1) \ge \mathit{min\_dur}$
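
The valid-segment definition above can be sketched in Python. This is a minimal illustration, not the presenters' code: the `tracks` layout (one location per time point, indexed from 0) and the function name `valid_segments` are assumptions, and the start and end of the recorded data are treated as segment boundaries.

```python
from itertools import combinations
from math import dist

def valid_segments(tracks, group, max_dis, min_dur):
    """Return maximal inclusive (ta, tb) runs in which every pair in `group`
    stays within max_dis, keeping only runs of at least min_dur time points.

    tracks: dict user -> list of (x, y, z); list index = time point (assumed layout).
    """
    n = min(len(tracks[u]) for u in group)
    segments, start = [], None
    for t in range(n + 1):
        # t == n acts as a sentinel that closes a run reaching the end of the data
        close = t < n and all(
            dist(tracks[a][t], tracks[b][t]) <= max_dis
            for a, b in combinations(group, 2)
        )
        if close and start is None:
            start = t
        elif not close and start is not None:
            if t - start >= min_dur:
                segments.append((start, t - 1))
            start = None
    return segments

# Illustrative data: u1 stays at the origin, u2 moves away during time points 4..6.
tracks = {
    "u1": [(0, 0, 0)] * 10,
    "u2": [(x, 0, 0) for x in (1, 2, 3, 4, 50, 50, 50, 5, 5, 5)],
}
segments = valid_segments(tracks, ("u1", "u2"), max_dis=10, min_dur=3)
```

With this made-up input the two users are close during time points 0 to 3 and 7 to 9, so two segments are returned.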

10
User Movement Database
11
Group Pattern
  • Group pattern
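  • Given D, $\langle G, \mathit{max\_dis}, \mathit{min\_dur} \rangle$ forms a group
    pattern if G has a valid segment
  • A k-group pattern is a group pattern with k users
  • Weight of group pattern
  • Let $P = \langle G, \mathit{max\_dis}, \mathit{min\_dur} \rangle$ be a group pattern with
    valid segments $s_1, s_2, \ldots, s_n$, and
  • let N denote the number of time points in D; then
    $\mathit{weight}(P) = \frac{\sum_{i=1}^{n} |s_i|}{N}$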
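
As a small worked example of the weight, the sketch below assumes that the weight is the fraction of the N time points covered by valid segments, which agrees with the example later in the talk (segments [0,3] and [7,9] over 10 time points give 0.7). The function name `weight` and the inclusive `(ta, tb)` tuples are illustrative choices.

```python
def weight(segments, n_time_points):
    """Fraction of time points covered by valid segments (inclusive endpoints)."""
    covered = sum(tb - ta + 1 for ta, tb in segments)
    return covered / n_time_points

# Segments [0,3] and [7,9] cover 4 + 3 = 7 of 10 time points.
w = weight([(0, 3), (7, 9)], 10)
is_valid = w >= 0.5  # compare against a min_wei threshold of 50%
```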

12
Valid Group Pattern Mining
  • Valid group pattern
  • If weight(P) ≥ min_wei, P is called a valid group
    pattern, and G is called a valid group
  • min_wei is a user specified weight threshold
  • Valid Group (Pattern) Mining Problem
  • Given D, max_dis, min_dur, and min_wei, find all
    the valid group patterns (or valid groups).

13
Valid Group Mining vs Association Rule Mining
(ARM)
  • No explicit concept of transaction in a movement
    database
  • For each time point, group those users who are
    near one another into one transaction ⇒ large
    number of transactions
  • Weight defined for a group pattern is different from
    support in ARM
  • New algorithms are required

14
Apriori-like Group Pattern (AGP) Mining Algorithm
  • Based on the Apriori algorithm
  • Sub-group pattern
  • $P' = \langle G', \mathit{max\_dis}, \mathit{min\_dur} \rangle$ is a sub-group pattern
    of $P = \langle G, \mathit{max\_dis}, \mathit{min\_dur} \rangle$ if $G' \subseteq G$
  • Apriori property for group patterns
  • If a group pattern is valid, all of its sub-group
    patterns are valid as well
  • For example, {u1} and {u2} are valid if {u1, u2} is
    valid.
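
The Apriori property suggests a level-wise candidate step: a k-group can only be valid if all of its (k-1)-subgroups are valid. The sketch below is a generic Apriori-style candidate generator, not the AGP pseudocode from the slides (which did not survive in this transcript). The name `candidate_groups` is hypothetical, and the brute-force enumeration is chosen for brevity, not efficiency.

```python
from itertools import combinations

def candidate_groups(valid_prev, k):
    """Return candidate k-groups whose (k-1)-subgroups are all in valid_prev."""
    prev = {frozenset(g) for g in valid_prev}
    users = sorted(set().union(*prev)) if prev else []
    candidates = []
    for combo in combinations(users, k):
        # Apriori pruning: drop the candidate if any sub-group is not valid
        if all(frozenset(sub) in prev for sub in combinations(combo, k - 1)):
            candidates.append(combo)
    return candidates

valid_2 = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4")]
cands_3 = candidate_groups(valid_2, 3)
```

Each surviving candidate still has to be checked against the movement data, since the property is necessary but not sufficient.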

15
AGP Algorithm
16
AGP Algorithm
17
Example
max_dis = 10, min_dur = 3, and min_wei = 50%
s(u1,u2): valid segments of {u1,u2} = [0,3], [7,9]
weight({u1,u2}) = (4 + 3) / 10 = 0.7
18
Example
Set of all valid groups
19
Design of VG-growth Algorithm
  • Weaknesses of AGP
  • Huge sets of candidate k-groups
  • Multiple scans on D
  • FP-tree and FP-growth [Han, Pei and Yin 2000]
  • Challenges
  • The concept of transaction does not exist
  • Weight is more complex than support
  • VG-growth algorithm
  • Perform mining recursively until the largest
    group containing user ui is found

20
Valid Group graph (VG graph)
  • A VG graph is a directed graph (V, E)
  • V: vertices representing users in the set of
    valid 2-groups
  • E: directed edges, each representing a valid
    2-group (ui,uj)
  • Each edge is assigned the valid segments of (ui,uj)
  • Construction
  • Apply AGP to find valid 2-groups
  • Scan D once to derive valid segments for valid
    2-groups.
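
A VG graph can be held in a plain dictionary. The sketch below assumes the valid 2-groups and their valid segments are already known, and that each edge points from the user with the smaller id to the user with the larger id; both the direction convention and the name `build_vg_graph` are assumptions made for illustration.

```python
def build_vg_graph(valid_2_groups):
    """valid_2_groups: dict (ui, uj) -> list of inclusive (ta, tb) valid segments.

    Returns dict v -> {u: segments} with one entry per directed edge u -> v.
    """
    graph = {}
    for (a, b), segs in valid_2_groups.items():
        u, v = sorted((a, b))  # assumed convention: edge goes from smaller to larger id
        graph.setdefault(u, {})
        graph.setdefault(v, {})[u] = segs
    return graph

# Illustrative input only; the segment values are made up.
vg = build_vg_graph({("u5", "u1"): [(5, 9)], ("u2", "u5"): [(3, 9)]})
```

Storing incoming edges per vertex makes it cheap to list all users that point to a given user, which is the lookup the mining step relies on.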

21
VG Graph Example
22
VG-growth
  • Prefix-neighbor
  • If (u → v) is a directed edge in a VG-graph, u is
    called the prefix-neighbor of v
  • VG-growth algorithm
  • Traversing the VG-graph
  • Examine all the prefix-neighbors for each vertex
  • Recursively mining the conditional VG-graph
  • VG-growth is much more efficient than AGP with
    respect to mining valid k-groups (k > 2)
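
One way to read the recursion: for each vertex v, build a conditional graph over v's prefix-neighbors, where an edge survives only if the intersected valid segments still meet min_dur and min_wei, then recurse. The code below is a sketch under that reading, not the presenters' algorithm; `intersect`, `total_length`, and `vg_growth` are hypothetical names, and the graph layout (v -> {prefix-neighbor: segments}) is an assumption.

```python
def intersect(segs_a, segs_b, min_dur):
    """Intersect two lists of inclusive (ta, tb) segments, keeping runs >= min_dur."""
    out = []
    for a0, a1 in segs_a:
        for b0, b1 in segs_b:
            lo, hi = max(a0, b0), min(a1, b1)
            if hi - lo + 1 >= min_dur:
                out.append((lo, hi))
    return sorted(out)

def total_length(segs):
    return sum(tb - ta + 1 for ta, tb in segs)

def vg_growth(graph, n_points, min_dur, min_wei, suffix=()):
    """Return valid groups of 3 or more users reachable from `graph`."""
    found = []
    for v, prefixes in graph.items():
        cond = {}  # conditional graph restricted to v's prefix-neighbors
        for u in prefixes:
            for w in prefixes:
                if w in graph.get(u, {}):  # edge w -> u exists
                    segs = intersect(
                        intersect(graph[u][w], prefixes[u], min_dur),
                        prefixes[w], min_dur)
                    if total_length(segs) / n_points >= min_wei:
                        cond.setdefault(u, {})[w] = segs
                        found.append(tuple(sorted((w, u, v) + suffix)))
        if cond:
            found += vg_growth(cond, n_points, min_dur, min_wei, (v,) + suffix)
    return found

graph = {
    "u1": {},
    "u2": {"u1": [(0, 5)]},
    "u3": {"u1": [(2, 9)], "u2": [(0, 9)]},
}
groups = vg_growth(graph, n_points=10, min_dur=3, min_wei=0.3)
```

In this made-up input the three pairwise segment lists overlap on [2,5], which covers 4 of 10 time points, so the 3-group passes a 30% weight threshold.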

23
VG-growth
24
Mining of u5
Output: valid groups {u1,u5}, {u2,u5}, {u4,u5}
[Figure: VG-graph for mining u5, with edges labeled by valid segments: [5,9]; [3,9]; [0,3],[7,9]; [3,9]; [0,9]]
25
Visualization of Group Patterns
26
Experimental Results
  • 3-D user movement over a city layout, generated by
    the IBM City Simulator
  • 1000 users, 1000 time points (10 mins per point),
    1000 × 1500 × 100 m³ ⇒ 12 MB
  • Performance metrics
  • Total execution time (T)
  • Time for mining valid 2-groups (T2)
  • Time for mining all other valid groups (Tk)
  • Parameter settings
  • Vary min_wei from 1% to 10%
  • max_dis = 30, min_dur = 4

27
Results of AGP and VG-growth
28
Results of AGP and VG-growth
29
Results
30
Results
31
Common problem of AGP and VG-growth
  • A common step of AGP and VG-growth: mining valid
    2-groups
  • T2: time used for mining valid 2-groups
  • Tk: time used for mining valid k-groups (k > 2)
  • T: whole execution time (T = T2 + Tk)
  • Bottleneck
  • Assume M users and N time points in D
  • T2 dominates T, especially when M and N are large
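
A rough count shows why the 2-group step dominates. The arithmetic below assumes a naive approach that computes one distance per user pair per time point; the values M = 1000 and N = 1000 are taken from the experimental setting in this talk.

```python
from math import comb

M, N = 1000, 1000        # users and time points, as in the experiments
pairs = comb(M, 2)       # candidate 2-groups: M * (M - 1) / 2
checks = pairs * N       # one distance computation per pair per time point
```

Under that assumption there are 499,500 pairs and 499,500,000 distance computations, and the count grows quadratically in M and linearly in N.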

32
Summarization Method for Mining Group Patterns
  • Idea
  • Number of candidate 2-groups: M(M-1)/2 for M users
  • Reduce the number of candidate 2-groups
  • Spherical location summarization method
  • Partition each user's movement data into segments
    with equal duration
  • Summarize the locations within each segment by a
    sphere
  • D′: summarized user movement database

33
Spherical Location Summarization
  • Spherical location summarization
  • Divide movement database of a user into time
    segments with fixed length, w.
  • Summarize location points within each segment by
    a sphere.

Sphere(pc, r)
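
A minimal sketch of the summarization step follows. The slides do not say how the sphere is fitted, so this version assumes the centroid of the points as the center and the largest distance from it as the radius, which encloses all points but is not necessarily the smallest such sphere. The name `summarize` is hypothetical.

```python
from math import dist

def summarize(track, w):
    """Split a track into windows of w time points; return (center, radius) per window."""
    spheres = []
    for i in range(0, len(track), w):
        pts = track[i:i + w]
        # centroid as center (an assumption; a minimum enclosing sphere would be tighter)
        center = tuple(sum(c) / len(pts) for c in zip(*pts))
        radius = max(dist(center, p) for p in pts)
        spheres.append((center, radius))
    return spheres

track = [(0, 0, 0), (2, 0, 0), (10, 0, 0), (10, 4, 0)]
spheres = summarize(track, w=2)
```

Any enclosing sphere keeps the pruning safe; a tighter sphere only improves how often a pair can be decided without the original data.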
34
Spherical Location Summarization
35
Preprocessing the Sphere Data
  • Observation
  • If the minimum distance between two spheres S1
    and S2 is > dist, then all pairs of points from S1
    and S2 are > dist apart.
  • Preprocessing step
  • upper bound of max_dis
  • Compute the upper bounds of weight count and
    valid segment length (longest close sphere
    segment length) for each candidate 2-group
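
The preprocessing idea can be sketched as follows. The exact formulas on the following slides did not survive in this transcript, so this is only one plausible reading: a window counts as possibly close when the minimum distance between the two spheres is at most the upper bound of max_dis, and every time point in such a window is counted. The names `upper_bounds` and `max_dis_ub` are hypothetical, and each window is assumed to hold exactly w time points.

```python
from math import dist

def upper_bounds(spheres_a, spheres_b, w, max_dis_ub):
    """Return (upper bound on close time points, longest possibly-close run),
    both measured in time points, for two users' sphere sequences."""
    total = longest = run = 0
    for (ca, ra), (cb, rb) in zip(spheres_a, spheres_b):
        if dist(ca, cb) - (ra + rb) <= max_dis_ub:  # spheres may hold close points
            run += w
        else:
            total, longest, run = total + run, max(longest, run), 0
    return total + run, max(longest, run)

a = [((0, 0, 0), 1), ((0, 0, 0), 1), ((0, 0, 0), 1)]
b = [((5, 0, 0), 1), ((100, 0, 0), 1), ((3, 0, 0), 1)]
bound, longest_run = upper_bounds(a, b, w=4, max_dis_ub=10)
```

A candidate 2-group can then be dropped without touching the original data when `longest_run` is below min_dur or `bound / N` is below min_wei.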

36
Upper Bound Weight Count of a User Pair
Set of CSSs (close sphere segments) of {ui, uj}
37
Upper Bound Weight Count of a User Pair
38
Preprocessing the Sphere Data
39
Spherical Location Summarization based Algorithm
for Mining Valid 2-Groups (SLSV2G)
  • Prune unpromising candidate 2-groups using
    pre-computed upper bound information
  • Check the closeness relationship for each
    candidate 2-group based on the summarized
    database
  • Check the original database only when the
    closeness relationship cannot be determined

40
SLSV2G Algorithm
41
SLSV2G Algorithm
42
Closeness between Spheres
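  • Case 1: all pairs of points in the 2 spheres are ≤ max_dis apart
  • Case 2: all pairs of points in the 2 spheres are > max_dis apart

[Figure: spheres S1 = (c1, r1) and S2 = (c2, r2); their minimum
distance is d(c1,c2) - (r1 + r2), and Case 2 holds when
d(c1,c2) - (r1 + r2) > max_dis]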
43
Closeness between Spheres
  • Case 3: only some pairs of points in the 2 spheres are
    ≤ max_dis apart (need to check the original database)
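
The three cases can be expressed as a small classifier. The Case 2 test comes from the slide; the Case 1 test, d(c1,c2) + (r1 + r2) ≤ max_dis, is not stated in this transcript and is derived here from the triangle inequality. The function name and return labels are illustrative.

```python
from math import dist

def sphere_closeness(c1, r1, c2, r2, max_dis):
    """Classify two spheres (center, radius) against the max_dis threshold."""
    gap = dist(c1, c2)
    if gap + (r1 + r2) <= max_dis:
        return "all_close"      # Case 1: even the farthest pair is within max_dis
    if gap - (r1 + r2) > max_dis:
        return "all_far"        # Case 2: even the nearest pair exceeds max_dis
    return "undetermined"       # Case 3: must check the original database

case1 = sphere_closeness((0, 0, 0), 1, (3, 0, 0), 1, max_dis=10)
case2 = sphere_closeness((0, 0, 0), 1, (20, 0, 0), 1, max_dis=10)
case3 = sphere_closeness((0, 0, 0), 2, (10, 0, 0), 2, max_dis=10)
```

Only the third outcome forces a scan of the raw locations for that time window.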

44
Performance Evaluation
  • Synthetic data generator
  • IBM City Simulator
  • Three datasets

45
Performance Evaluation
  • Series-I: SLSV2G versus VG-growth
  • Focus on finding valid 2-groups
  • max_dis = 30, min_dur = 4
  • With location summarization vs. without
    summarization
  • Compare T2 and C2

Algorithm   Location summarization   Data placement
VG-growth   Without                  D is loaded in main memory
SLSV2G      With                     D is on hard disk; the summarized D is loaded in main memory
46
Time for finding valid 2-groups (T2)
  • T2 of SLSV2G is only 5%-8% of T2 of VG-growth
  • DBIII is too large to be mined without location
    summarization

47
Number of candidate 2-groups (C2)
  • Without location summarization, C2 is a
    constant
  • With location summarization, C2 is much smaller
    (1%-7%)

48
Performance Evaluation
  • Series-II Scalability of SLSV2G
  • Time window size, w
  • Number of users, M
  • Number of time points, N

49
Performance Evaluation
  • Does not scale linearly with time window size, w
  • There exists an optimal w for a given dataset
  • Overhead of scanning summarized database vs
    granularity of summarization

50
Performance Evaluation
  • SLSV2G is much more scalable than VG-growth with respect to M
  • The gap becomes larger when M increases

51
Performance Evaluation
  • Both are linear with N
  • SLSV2G is much more scalable

52
Conclusion
  • Methods for valid group pattern mining: AGP,
    VG-growth
  • Location summarization method for efficiently
    mining valid 2-groups.
  • Experiments show that the summarization method reduces
    the processing time significantly.
  • Other related work
  • Other kinds of location summarization models
  • Extension of valid group pattern mining to handle
    dynamic user movement database
  • Other practical issues: missing movement data,
    location accuracies

53
End of Talk: Q&A?
  • Ee-Peng Lim
  • aseplim_at_ntu.edu.sg

54
Publications
  • Jeng-Kuen Chiu, San-Yih Hwang, Ying-Han Liu,
    Ee-Peng Lim, Mining Mobile Group Patterns: A
    Trajectory-based Approach, 9th Pacific-Asia
    Conference on Knowledge Discovery and Data Mining
    (PAKDD2005), Hanoi, Vietnam, May 2005.
  • Hady Wirawan Lauw, Ee-Peng Lim, Mining Social
    Network from Spatio-Temporal Events, Workshop on
    Link Analysis, Counterterrorism and Security, in
    conjunction with SIAM Data Mining Conference,
    Newport Beach, April 2005.
  • Yida Wang, Ee-Peng Lim, San-Yih Hwang, Efficient
    Mining of Group Patterns from User Movement Data,
    to appear in Data and Knowledge Engineering,
    2005.
  • Yida Wang, Ee-Peng Lim, San-Yih Hwang, Efficient
    Mining of Group Patterns Using Data
    Summarization, International Conference on
    Database Systems for Advanced Applications
    (DASFAA2004), Jeju Island, 2004.
  • Yida Wang, Ee-Peng Lim, San-Yih Hwang, On Mining
    Group Patterns of Mobile Users, International
    Conference on Database and Expert Systems
    Applications (DEXA2003), Prague, 2003.