Title: Efficient Mining of Group Patterns from User Movement Data
1Efficient Mining of Group Patterns from User
Movement Data
- A/P Lim Ee Peng
- School of Computer Engineering, Nanyang Technological University
2Acknowledgement
- PhD students
- Wang Yida just graduated
- Hady Lauw in 2nd year of PhD program
- Collaborator
- A/P San-Yih Hwang, Department of Information Management, National Sun Yat-Sen University, Kaohsiung, Taiwan
3Outline
- Introduction
- Algorithms: AGP and VG-growth
- Spherical Location Summarization
- Performance Evaluation
- Conclusion
4Introduction
- People are affiliated with different kinds of groups: family, friends, school/class, company, etc.
- Sociologists and management experts have studied groups and group dynamics extensively.
- Peer pressure and group conformity can affect the behaviors of individuals.
- Traditional applications
- Improvement of communication
- Roles of individuals in a group
- Other applications
- Group-oriented marketing strategy
- Security
5Introduction
- Existing techniques for deriving groups
- Static attributes: age, education, work place, income, etc.
- Transactional attributes: purchase history
- Problems of existing techniques
- Collection of static and transactional data
- Groups derived are not necessarily useful for some applications
6Research Objectives
- Group pattern mining based on user movement data
- Physical proximity: Group members are physically close to one another
- Temporal proximity: Group members stay close for some meaningful time duration
- Assumptions
- User movement can be tracked by mobile phones, RFIDs and other devices.
- Positioning technology can be as accurate as 1-20 meters (GPS/AGPS).
- Privacy issues can be addressed separately.
7Example
8Group Examples
9Valid Segment of a User Group
- User movement database $D = (D_1, D_2, \ldots, D_M)$
- $D_i$ is a time series of locations of user $u_i$, each entry of the form $\langle t, (x, y, z) \rangle$
- $u_i[t].p = (x, y, z)$
- Distance function
- $d((x_1, y_1, z_1), (x_2, y_2, z_2)) = \left((x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2\right)^{1/2}$
- max_dis and min_dur thresholds
- max_dis: Two users are close to each other if they are ≤ max_dis apart
- min_dur: The closeness should last ≥ min_dur to become meaningful
- Given max_dis and min_dur, $[t_a, t_b]$ is a valid segment for a group of users $G$ if
- $\forall u_i, u_j \in G,\ \forall t \in [t_a, t_b]:\ d(u_i[t].p, u_j[t].p) \le \mathit{max\_dis}$
- $\exists u_i, u_j \in G:\ d(u_i[t_a - 1].p, u_j[t_a - 1].p) > \mathit{max\_dis}$
- $\exists u_i, u_j \in G:\ d(u_i[t_b + 1].p, u_j[t_b + 1].p) > \mathit{max\_dis}$, and
- $(t_b - t_a + 1) \ge \mathit{min\_dur}$
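The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `valid_segments` and the `tracks` layout (one location per user per time point) are assumptions, and a run of close time points that touches the first or last time point in D is treated as maximal on that side.

```python
from itertools import combinations
from math import dist


def valid_segments(tracks, group, max_dis, min_dur):
    """Return the valid segments of `group` as inclusive (ta, tb) tuples.

    tracks: dict mapping a user id to a list of (x, y, z) locations,
            one entry per time point (hypothetical layout).
    """
    n = len(tracks[group[0]])
    segments, start = [], None
    for t in range(n + 1):  # the extra step flushes a run that reaches the end
        close = t < n and all(
            dist(tracks[a][t], tracks[b][t]) <= max_dis
            for a, b in combinations(group, 2)
        )
        if close and start is None:
            start = t  # a maximal run of "all pairs close" begins here
        elif not close and start is not None:
            if t - start >= min_dur:  # keep the run only if it lasts long enough
                segments.append((start, t - 1))
            start = None
    return segments


# Two users on the x-axis; u2 drifts away during time points 4..6.
tracks = {
    "u1": [(0, 0, 0)] * 10,
    "u2": [(x, 0, 0) for x in [5, 5, 5, 5, 20, 20, 20, 5, 5, 5]],
}
print(valid_segments(tracks, ["u1", "u2"], max_dis=10, min_dur=3))
```

With max_dis = 10 and min_dur = 3, the two runs of closeness (time points 0 to 3 and 7 to 9) are both long enough to count as valid segments.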
10User Movement Database
11Group Pattern
- Group pattern
- Given D, <G, max_dis, min_dur> forms a group pattern if G has a valid segment
- A k-group pattern is a group pattern with k users
- Weight of group pattern
- Let P = <G, max_dis, min_dur> be a group pattern with valid segments $s_1, s_2, \ldots, s_n$, and let N denote the number of time points in D; then $\mathit{weight}(P) = \frac{\sum_{i=1}^{n} |s_i|}{N}$
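As a small illustration, the weight can be read as the fraction of the N time points covered by the valid segments, which agrees with the worked example later in the talk (segments of 4 and 3 time points over 10 give 0.7). The helper names `weight` and `is_valid_group` are hypothetical, and segments are assumed to be inclusive `(ta, tb)` pairs.

```python
def weight(segments, n_time_points):
    """Fraction of the time points in D covered by the valid segments."""
    covered = sum(tb - ta + 1 for ta, tb in segments)
    return covered / n_time_points


def is_valid_group(segments, n_time_points, min_wei):
    """A group is valid when its pattern weight reaches min_wei (as a fraction)."""
    return weight(segments, n_time_points) >= min_wei


print(weight([(0, 3), (7, 9)], 10))  # (4 + 3) / 10 = 0.7
```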
12Valid Group Pattern Mining
- Valid group pattern
- If weight(P) ≥ min_wei, P is called a valid group pattern, and G is called a valid group
- min_wei is a user-specified weight threshold
- Valid Group (Pattern) Mining Problem
- Given D, max_dis, min_dur, and min_wei, find all
the valid group patterns (or valid groups).
13Valid Group Mining vs Association Rule Mining
(ARM)
- No explicit concept of transaction in a movement database
- For each time point, group those users who are near one another into one transaction => large number of transactions
- Weight defined for a group pattern is different from support in ARM
- New algorithms are required
14Apriori-like Group Pattern (AGP) Mining Algorithm
- Based on Apriori Algorithm
- Sub-group pattern
- P' = <G', max_dis, min_dur> is a sub-group pattern of P = <G, max_dis, min_dur> if G' ⊆ G
- Apriori property for group patterns
- If a group pattern is valid, all of its sub-group patterns are valid as well
- For example, {u1} and {u2} are valid if {u1, u2} is valid.
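The Apriori property suggests the usual join-and-prune way of generating candidate k-groups from valid (k-1)-groups. The sketch below shows that generic Apriori step only; the function name `generate_candidates` and the representation of a group as a sorted tuple of user ids are assumptions, not details taken from the slides.

```python
from itertools import combinations


def generate_candidates(valid_prev):
    """Join valid (k-1)-groups into candidate k-groups, then prune.

    valid_prev: iterable of sorted tuples of user ids, all of size k-1.
    """
    prev = set(valid_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            # Join step: same first k-2 users, and a's last user precedes b's.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # Prune step (Apriori property): every (k-1)-subset must be valid.
                if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return candidates


valid_2 = {("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u2", "u4")}
print(generate_candidates(valid_2))
```

Here {u2, u3, u4} is never generated as a candidate because its sub-group {u3, u4} is not valid, so its weight does not have to be computed against D.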
15AGP Algorithm
16AGP Algorithm
17Example
max_dis = 10, min_dur = 3, and min_wei = 50%
s(u1, u2): valid segments of {u1, u2} = {[0, 3], [7, 9]}
weight(u1, u2) = (4 + 3) / 10 = 0.7
18Example
Set of all valid groups
19Design of VG-growth Algorithm
- Weaknesses of AGP
- Huge sets of candidate k-groups
- Multiple scans on D
- FP-tree and FP-growth (Han, Pei and Yin, 2000)
- Challenges
- The concept of transaction does not exist
- Weight is more complex than support
- VG-growth algorithm
- Perform mining recursively until the largest
group containing user ui is found
20Valid Group graph (VG graph)
- A VG graph is a directed graph (V, E)
- V: vertices representing users in the set of valid 2-groups
- E: directed edges, each representing a valid 2-group (ui, uj)
- Each edge is assigned the valid segments of (ui, uj)
- Construction
- Apply AGP to find valid 2-groups
- Scan D once to derive valid segments for valid
2-groups.
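A VG graph of this kind can be held in a plain dictionary keyed by edge. The sketch below is illustrative only: `build_vg_graph` is a hypothetical name, the valid 2-groups and their segments are assumed to be already available, and each edge is assumed to point from the smaller user id to the larger one (the slides do not state how the direction is chosen).

```python
def build_vg_graph(valid_pairs):
    """Build (vertices, edges) from valid 2-groups.

    valid_pairs: dict mapping a pair (ui, uj) to its list of valid
                 segments, each an inclusive (ta, tb) tuple.
    edges maps a directed edge (u, v), with u < v, to the sorted segments.
    """
    edges = {}
    for (a, b), segs in valid_pairs.items():
        u, v = sorted((a, b))  # assumed orientation: smaller id -> larger id
        edges[(u, v)] = sorted(segs)
    vertices = sorted({user for edge in edges for user in edge})
    return vertices, edges


vertices, edges = build_vg_graph({
    ("u2", "u1"): [(7, 9), (0, 3)],
    ("u1", "u5"): [(5, 9)],
})
print(vertices)
print(edges)
```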
21VG Graph Example
22VG-growth
- Prefix-neighbor
- If (u → v) is a directed edge in a VG-graph, u is called the prefix-neighbor of v
- VG-growth algorithm
- Traversing the VG-graph
- Examine all the prefix-neighbors for each vertex
- Recursively mining the conditional VG-graph
- VG-growth is much more efficient than AGP with respect to mining valid k-groups (k > 2)
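One way to read the recursion is sketched below. It is a simplified interpretation, not the authors' algorithm as published: the names `intersect` and `vg_growth` are hypothetical, edges are assumed to run from the smaller to the larger user id, segments are inclusive `(ta, tb)` pairs, and min_wei is given as a fraction. For a vertex v, the conditional VG-graph keeps an edge (u, w) between two prefix-neighbors of v only if the segments shared by (u, w), (u, v) and (w, v) still satisfy min_dur and min_wei.

```python
from itertools import combinations


def intersect(segs_a, segs_b, min_dur):
    """Intersect two sorted segment lists, keeping pieces of length >= min_dur."""
    out, i, j = [], 0, 0
    while i < len(segs_a) and j < len(segs_b):
        lo = max(segs_a[i][0], segs_b[j][0])
        hi = min(segs_a[i][1], segs_b[j][1])
        if hi - lo + 1 >= min_dur:
            out.append((lo, hi))
        if segs_a[i][1] < segs_b[j][1]:  # advance the segment that ends first
            i += 1
        else:
            j += 1
    return out


def vg_growth(edges, n_points, min_dur, min_wei, suffix=()):
    """Return valid groups as sorted tuples, mined from a (conditional) VG-graph."""
    groups = []
    for v in sorted({x for edge in edges for x in edge}):
        prefix = sorted(u for (u, w) in edges if w == v)  # prefix-neighbors of v
        groups.extend((u, v) + suffix for u in prefix)
        cond = {}  # conditional VG-graph of v
        for u, w in combinations(prefix, 2):
            if (u, w) in edges:
                segs = intersect(edges[(u, w)], edges[(u, v)], min_dur)
                segs = intersect(segs, edges[(w, v)], min_dur)
                if sum(tb - ta + 1 for ta, tb in segs) / n_points >= min_wei:
                    cond[(u, w)] = segs
        if cond:
            groups.extend(vg_growth(cond, n_points, min_dur, min_wei, (v,) + suffix))
    return groups


edges = {("u1", "u2"): [(0, 9)], ("u1", "u3"): [(0, 5)], ("u2", "u3"): [(2, 9)]}
print(vg_growth(edges, n_points=10, min_dur=3, min_wei=0.3))
```

In this toy graph the three users are all close during [2, 5], so {u1, u2, u3} is reported in addition to the three 2-groups without another scan of D.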
23VG-growth
24Mining of u5
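Output valid groups: {u1, u5}, {u2, u5}, {u4, u5}
[Figure: VG-graph fragment for mining u5; edges are labeled with valid segments: [5, 9], [3, 9], {[0, 3], [7, 9]}, [3, 9], [0, 9]]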
25Visualization of Group Patterns
26Experimental Results
- IBM City Simulator generated 3-D user movement over a city layout
- 1000 users, 1000 time points (10 mins per point), 1000 x 1500 x 100 m^3 => 12 MB
- Performance metrics
- Total execution time (T)
- Time for mining valid 2-groups (T2)
- Time for mining all other valid groups (Tk)
- Parameter settings
- Change min_wei from 1% to 10%
- max_dis = 30, min_dur = 4
27Results of AGP and VG-growth
28Results of AGP and VG-growth
29Results
30Results
31Common problem of AGP and VG-growth
- A common step of AGP and VG-growth: mining valid 2-groups
- T2: time used for mining valid 2-groups
- Tk: time used for mining valid k-groups (k > 2)
- T: whole execution time (T = T2 + Tk)
- Bottleneck
- Assume M users and N time points in D
- T2 dominates T, especially when M and N are large
32Summarization Method for Mining Group Patterns
- Idea
- Number of candidate 2-groups: all pairs of the M users, i.e. M(M-1)/2
- Reduce the number of candidate 2-groups
- Spherical location summarization method
- Partition each user's movement data into segments with equal duration
- Summarize the locations within each segment by a sphere
- D': summarized user movement database
33Spherical Location Summarization
- Spherical location summarization
- Divide the movement database of a user into time segments with fixed length, w.
- Summarize the location points within each segment by a sphere.
Sphere(pc, r)
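A minimal sketch of the summarization step follows. The slides do not say how the sphere's center and radius are chosen, so this version assumes the centroid of the window as the center and the largest distance from it as the radius; that gives a covering sphere, not necessarily the smallest one. The function name `summarize` is hypothetical.

```python
from math import dist


def summarize(track, w):
    """Summarize each window of w consecutive locations by (center, radius).

    track: list of (x, y, z) locations of one user, one per time point.
    """
    spheres = []
    for start in range(0, len(track), w):
        window = track[start:start + w]
        # Assumed center: centroid of the locations in the window.
        center = tuple(sum(axis) / len(window) for axis in zip(*window))
        # Radius large enough to cover every location in the window.
        radius = max(dist(center, p) for p in window)
        spheres.append((center, radius))
    return spheres


track = [(0, 0, 0), (2, 0, 0), (10, 0, 0), (10, 4, 0)]
print(summarize(track, w=2))
```

A larger w gives fewer spheres to scan but larger radii, so fewer windows can be decided from the summary alone.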
34Spherical Location Summarization
35Preprocessing the Sphere Data
- Observation
- If the minimum distance between two spheres S1 and S2 is > dist, then all pairs of points from S1 and S2 are > dist apart.
- Preprocessing step
- Upper bound of max_dis
- Compute the upper bounds of weight count and valid segment length (longest close sphere segment length) for each candidate 2-group, using the upper bound of max_dis
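The observation leads to a simple bound: only windows whose spheres could contain close points can contribute to a pair's weight. The sketch below is one plausible reading of the preprocessing step, not the authors' exact procedure. The names `upper_bounds`, `is_unpromising` and `max_dis_upper` are hypothetical, every window is assumed to hold exactly w time points, and min_wei is a fraction.

```python
from math import dist


def upper_bounds(spheres_a, spheres_b, w, max_dis_upper):
    """Return (weight-count bound, longest close sphere segment length).

    spheres_a, spheres_b: per-window (center, radius) summaries of two users.
    Both results are measured in time points.
    """
    total = longest = run = 0
    for (ca, ra), (cb, rb) in zip(spheres_a, spheres_b):
        # Minimum distance between the spheres; if it exceeds the bound,
        # no pair of points in this window can be close.
        if dist(ca, cb) - (ra + rb) <= max_dis_upper:
            run += w
            total += w
            longest = max(longest, run)
        else:
            run = 0
    return total, longest


def is_unpromising(total, longest, n_points, min_dur, min_wei):
    """Prune a candidate 2-group that cannot reach min_wei or min_dur."""
    return total / n_points < min_wei or longest < min_dur


a = [((0, 0, 0), 1)] * 4
b = [((5, 0, 0), 1), ((50, 0, 0), 1), ((5, 0, 0), 1), ((5, 0, 0), 1)]
print(upper_bounds(a, b, w=5, max_dis_upper=10))
```

Because the bounds are computed for an upper bound of max_dis, they stay valid for any smaller max_dis chosen at mining time.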
36Upper Bound Weight Count of a User Pair
Set of CSSs (close sphere segments) of {ui, uj}
37Upper Bound Weight Count of a User Pair
38Preprocessing the Sphere Data
39Spherical Location Summarization based Algorithm
for Mining Valid 2-Groups (SLSV2G)
- Prune unpromising candidate 2-groups using pre-computed upper bound information
- Check the closeness relationship for each candidate 2-group based on the summarized database
- Check the original database only when the closeness relationship cannot be determined
40SLSV2G Algorithm
41SLSV2G Algorithm
42Closeness between Spheres
- Case 1: All points in the 2 spheres are ≤ max_dis apart
- Case 2: All points in the 2 spheres are > max_dis apart: d(c1, c2) - (r1 + r2) > max_dis
[Figure: spheres S1 = (c1, r1) and S2 = (c2, r2); their minimum distance d(c1, c2) - (r1 + r2) is compared with max_dis]
43Closeness between Spheres
- Case 3: Only some points in the 2 spheres are ≤ max_dis apart (need to check the original database)
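The three cases can be told apart from the two centers and radii alone. In the sketch below, the Case 2 test is the one given on the slide; the Case 1 test, d(c1, c2) + (r1 + r2) ≤ max_dis, is an assumption derived from the largest possible distance between two points of the spheres. The function name `classify` and the string labels are hypothetical.

```python
from math import dist


def classify(s1, s2, max_dis):
    """Classify the closeness of two spheres, each given as (center, radius)."""
    (c1, r1), (c2, r2) = s1, s2
    d = dist(c1, c2)
    if d + (r1 + r2) <= max_dis:
        return "all_close"      # Case 1: even the farthest points are close
    if d - (r1 + r2) > max_dis:
        return "all_far"        # Case 2: even the nearest points are far
    return "undetermined"       # Case 3: fall back to the original database


print(classify(((0, 0, 0), 1), ((3, 0, 0), 1), 10))
print(classify(((0, 0, 0), 1), ((20, 0, 0), 1), 10))
print(classify(((0, 0, 0), 2), ((10, 0, 0), 2), 10))
```

Only the third call needs the original locations; the first two are decided from the summarized database.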
44Performance Evaluation
- Synthetic data generator
- IBM City Simulator
- Three datasets
45Performance Evaluation
- Series-I: SLSV2G versus VG-growth
- Focus on finding valid 2-groups
- max_dis = 30, min_dur = 4
- With location summarization vs. without summarization
- Compare T2 and C2
- VG-growth (without location summarization): D is loaded in main memory
- SLSV2G (with location summarization): D is on hard disk, the summarized database D' is loaded in main memory
46Time for finding valid 2-groups (T2)
- T2 of SLSV2G is only 5% - 8% of T2 of VG-growth
- DBIII is too large to be mined without location
summarization
47Number of candidate 2-groups (C2)
- Without location summarization, C2 is a constant
- With location summarization, C2 is much smaller (1% - 7%)
48Performance Evaluation
- Series-II Scalability of SLSV2G
- Time window size, w
- Number of users, M
- Number of time points, N
49Performance Evaluation
- Does not scale linearly with time window size, w
- There exists an optimal w for a given dataset
- Overhead of scanning summarized database vs
granularity of summarization
50Performance Evaluation
- SLSV2G is much more scalable than VG-growth with respect to M
- The gap becomes larger when M increases
51Performance Evaluation
- Both are linear with N
- SLSV2G is much more scalable
52Conclusion
- Methods for valid group pattern mining: AGP, VG-growth
- Location summarization method to efficiently mine valid 2-groups.
- Experiments show that the summarization method reduces the processing time significantly.
- Other related work
- Other kinds of location summarization models
- Extension of valid group pattern mining to handle dynamic user movement databases
- Other practical issues: missing movement data, location accuracies
53End of Talk: Q&A?
- Ee-Peng Lim
- aseplim_at_ntu.edu.sg
54Publications
- Jeng-Kuen Chiu, San-Yih Hwang, Ying-Han Liu, Ee-Peng Lim, Mining Mobile Group Patterns: A Trajectory-based Approach, 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2005), Hanoi, Vietnam, May 2005.
- Hady Wirawan Lauw, Ee-Peng Lim, Mining Social Network from Spatio-Temporal Events, Workshop on Link Analysis, Counterterrorism and Security, in conjunction with SIAM Data Mining Conference, Newport Beach, April 2005.
- Yida Wang, Ee-Peng Lim, San-Yih Hwang, Efficient mining of group patterns from user movement data. To appear at Data and Knowledge Engineering, 2005.
- Yida Wang, Ee-Peng Lim, San-Yih Hwang, Efficient mining of group patterns using data summarization, International Conference on Database Systems for Advanced Applications (DASFAA2004), Jeju Island, 2004.
- Yida Wang, Ee-Peng Lim, San-Yih Hwang, On mining group patterns of mobile users, International Conference on Database and Expert Systems Applications (DEXA2003), Prague, 2003.