Liang Jin and Chen Li - PowerPoint PPT Presentation


Learn more at: http://flamingo.ics.uci.edu

1
Selectivity Estimation for Fuzzy String Predicates in Large Data Sets

Liang Jin and Chen Li

VLDB 2005. Supported by NSF CAREER Award IIS-0238586
2
Example: a movie database
"Find movies starred Schwarrzenger"?
Find movies with a star "similar to" Schwarrzenger.

Star            Title                                         Year  Genre
Keanu Reeves    The Matrix                                    1999  Sci-Fi
Samuel Jackson  Star Wars: Episode III - Revenge of the Sith  2005  Sci-Fi
Schwarzenegger  The Terminator                                1984  Sci-Fi
Samuel Jackson  Goodfellas                                    1990  Drama

3
Queries with Fuzzy String Predicates

Star's name similar to Schwarrzenger
Employee's SSN similar to 430-87-7294
Customer's telephone number similar to 412-0964
"Similar to":
a domain-specific function
returns a similarity value between two strings
Example: edit distance
ed(s1, s2) = minimum # of operations (insertion, deletion, substitution) to change s1 to s2
ed(Tom Hanks, Ton Hank) = 2
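
The edit distance defined on this slide can be computed with the standard dynamic program. The sketch below is a generic Levenshtein implementation, not code from the talk; the function name edit_distance is chosen here for illustration.

```python
def edit_distance(s1, s2):
    """Minimum number of insertions, deletions and substitutions
    needed to change s1 into s2 (unit cost per operation)."""
    # prev[j] holds the distance between the processed prefix of s1
    # and the first j characters of s2.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

For the slide's example, edit_distance("Tom Hanks", "Ton Hank") evaluates to 2: one substitution (m to n) and one deletion (the trailing s).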

4
Selectivity Estimation: Problem Formulation
star SIMILARTO Schwarrzenger
Input: a fuzzy string predicate P(q, d) and a bag of strings
Output: # of strings s that satisfy dist(s, q) <= d
5
Why Selectivity Estimation?
SELECT * FROM Movies WHERE star SIMILARTO Schwarrzenger AND year BETWEEN [1970, 1971]
SELECT * FROM Movies WHERE star SIMILARTO Schwarrzenger AND year BETWEEN [1980, 1999]

Movies
Star            Title                                         Year  Genre
Keanu Reeves    The Matrix                                    1999  Sci-Fi
Samuel Jackson  Star Wars: Episode III - Revenge of the Sith  2005  Sci-Fi
Schwarzenegger  The Terminator                                1984  Sci-Fi
Samuel Jackson  Goodfellas                                    1990  Drama

The optimizer needs to know the selectivity of each predicate to decide on a good plan.
6
Rest of the talk

Motivation: selectivity estimation of fuzzy predicates
Our approach: SEPIA
Proximity between strings
Histograms and estimation algorithm
Construction and maintenance of SEPIA
Experiments

7
Intuition of SEPIA

Selectivity Estimation of Approximate Predicates

8
Proximity between Strings
Edit Distance? Not discriminative enough
9
Edit Vector from s1 to s2

A vector <I, D, S>
I: # of insertions
D: # of deletions
S: # of substitutions
in a sequence of edit operations whose length equals their edit distance
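
One way to obtain such a vector is to fill the usual edit-distance table and then trace back one optimal path, counting each kind of operation. This is a sketch under assumptions: the name edit_vector and the tie-breaking order (prefer match/substitution, then deletion, then insertion) are choices made here. Several optimal edit sequences can exist, and the slides do not say how the authors choose among them.

```python
def edit_vector(s1, s2):
    """Return (I, D, S): insertions, deletions and substitutions along one
    minimum-length edit sequence that changes s1 into s2."""
    m, n = len(s1), len(s2)
    # dp[i][j] = edit distance between s1[:i] and s2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution

    # Trace back from the bottom-right corner along one optimal path.
    ins = dele = sub = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                sub += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return (ins, dele, sub)
```

With this tie-breaking, edit_vector("Tom Hanks", "Ton Hank") gives (0, 1, 1). The components always sum to the edit distance, since the trace-back follows an optimal path.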

10
Why Edit Vector? More discriminative
11
SEPIA histograms: Overview
12
Frequency table for each cluster
13
Global PPD Table

Proximity Pair Distribution table

14
SEPIA histograms: Summary
15
Selectivity Estimation: ed(lukas, 2)

Do it for all v2 vectors in each cluster, for all
clusters
Take the sum of these contributions

16
Selectivity Estimation for ed(q, d)

For each cluster Ci
For each v2 in the frequency table of Ci
Use (v1, v2, d) to look up the PPD table
Take the sum of these f * N
Pruning possible (triangle inequality)
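
The loop on this slide can be sketched as follows. The data layout is an assumption made for illustration: each cluster is a dict with a "pivot" string and a "freq" table mapping an edit vector v2 to a count N, and the PPD table is a dict mapping (v1, v2, d) to a fraction f. Here v1 is taken to be the edit vector between the query and the cluster's pivot, supplied by a caller-provided edit_vector function. Triangle-inequality pruning is omitted, and all numbers in the usage part are made up.

```python
def estimate_selectivity(query, d, clusters, ppd, edit_vector):
    """Sum f * N over every (cluster, v2) pair, as outlined on the slide."""
    total = 0.0
    for cluster in clusters:
        # v1: edit vector between the query and this cluster's pivot.
        v1 = edit_vector(query, cluster["pivot"])
        for v2, count in cluster["freq"].items():
            # f: fraction of the strings in this group expected to lie within
            # distance d of the query; missing entries count as 0 (assumption).
            f = ppd.get((v1, v2, d), 0.0)
            total += f * count
    return total


# Toy usage with made-up values (not taken from the talk).
toy_vectors = {("lukas", "lucas"): (0, 0, 1)}
toy_clusters = [{"pivot": "lucas", "freq": {(0, 0, 0): 3, (0, 0, 1): 4}}]
toy_ppd = {
    ((0, 0, 1), (0, 0, 0), 2): 1.0,
    ((0, 0, 1), (0, 0, 1), 2): 0.5,
}
estimate = estimate_selectivity(
    "lukas", 2, toy_clusters, toy_ppd,
    edit_vector=lambda q, p: toy_vectors[(q, p)],
)
```

In this toy setup the estimate is 1.0 * 3 + 0.5 * 4 = 5.0 strings.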

17
Outline

Motivation: selectivity estimation of fuzzy predicates
Our approach: SEPIA
Proximity between strings
Histograms and estimation algorithm
Construction and maintenance of SEPIA
Experiments

18
Clustering Strings

Two example algorithms
Lexicographic-order based
K-Medoids
Choose initial pivots
Assign each string to its closest pivot
Swap a pivot with another string
Reassign the strings
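
The K-Medoids steps listed above can be sketched with edit distance as the metric. This is a simplified randomized variant written for illustration, not the authors' implementation. The function names, the max_swaps and seed parameters, and the rule of keeping a swap only when it lowers the total distance are all assumptions.

```python
import random


def edit_distance(s1, s2):
    """Levenshtein distance via a two-row dynamic program."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (c1 != c2)))
        prev = curr
    return prev[-1]


def assign(strings, pivots):
    """Assign every string to its closest pivot; return clusters and total cost."""
    clusters = {p: [] for p in pivots}
    cost = 0
    for s in strings:
        dist, best = min((edit_distance(s, p), p) for p in pivots)
        clusters[best].append(s)
        cost += dist
    return clusters, cost


def k_medoids(strings, k, max_swaps=100, seed=0):
    """Cluster a bag of strings around k pivots (k <= number of distinct strings)."""
    rng = random.Random(seed)
    candidates = sorted(set(strings))     # pivots are distinct strings
    pivots = rng.sample(candidates, k)    # choose initial pivots
    clusters, cost = assign(strings, pivots)
    for _ in range(max_swaps):
        slot = rng.randrange(k)
        candidate = rng.choice(candidates)
        if candidate in pivots:
            continue
        # Swap one pivot with another string and reassign all strings.
        trial = pivots[:slot] + [candidate] + pivots[slot + 1:]
        trial_clusters, trial_cost = assign(strings, trial)
        if trial_cost < cost:             # keep the swap only if it helps
            pivots, clusters, cost = trial, trial_clusters, trial_cost
    return pivots, clusters, cost
```

Each trial swap recomputes every string-to-pivot distance, so this sketch only suits small inputs; a production version would cache distances.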

19
Number of Clusters

It affects
Cluster quality
Similarity of strings within each cluster
Costs
Space
Estimation time

20
Constructing Frequency Tables

For each cluster, group strings based on their
edit vector from the pivot
Count the frequency for each group
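
A minimal sketch of this grouping step for a single cluster, assuming the edit vector is taken from the pivot to each member string. The helper names are illustrative, and the tie-breaking inside edit_vector is a choice made here rather than something the slides specify.

```python
from collections import Counter


def edit_vector(s1, s2):
    """(insertions, deletions, substitutions) along one optimal edit sequence."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + cost)
    ins = dele = sub = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                sub += cost
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dele += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return (ins, dele, sub)


def build_frequency_table(pivot, members):
    """Group a cluster's strings by their edit vector from the pivot
    and count the strings in each group."""
    return Counter(edit_vector(pivot, s) for s in members)
```

For a pivot "lucas" and members ["lucas", "lukas", "lucas", "luca"], the table has three groups: (0, 0, 0) with count 2, (0, 0, 1) with count 1, and (0, 1, 0) with count 1.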

21
Constructing PPD Table

Get enough samples of string triplets (q,p,s)
Propose a few heuristics
ALL_RAND
CLOSE_RAND
CLOSE_LEX
CLOSE_UNIQUE

22
Dynamic Maintenance: Frequency Table

Take insertion as an example

23
Dynamic Maintenance: PPD
24
Improving Estimation Accuracy

A post-processing step to further improve
estimation accuracy
See paper for details.

25
Outline

Motivation: selectivity estimation of fuzzy predicates
Our approach: SEPIA
Proximity between strings
Histograms and estimation algorithm
Construction and maintenance of SEPIA
Experiments

26
Data

Citeseer
71K author names
Length: [2, 20], avg 12
Movie records from UCI KDD repository
11K movie titles
Length: [3, 80], avg 35
Introduced duplicates
10% of records
# of duplicates: [1, 20], uniform
Final results
Citeseer: 142K author names
UCI KDD: 23K movie titles

27
Setting

Test bed
PC: 2.4 GHz P4, 1.2 GB RAM, Windows XP
Visual C++ compiler
Query workload
Strings from the data
Strings not in the data
Results similar
Quality measurements
Relative error: (f_est - f_real) / f_real
Absolute relative error: |f_est - f_real| / f_real
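
The two quality measures can be written out directly. In this sketch f_est is the estimated count and f_real the true count; the function names are illustrative, and f_real is assumed to be positive.

```python
def relative_error(f_est, f_real):
    """Signed error: negative for underestimates, positive for overestimates."""
    return (f_est - f_real) / f_real


def absolute_relative_error(f_est, f_real):
    """Magnitude of the relative error, ignoring its sign."""
    return abs(f_est - f_real) / f_real
```

For example, an estimate of 80 against a true count of 100 gives a relative error of -0.2 and an absolute relative error of 0.2.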

28
Quartile distribution of relative errors
Data set 1, CLOSE_RAND, 1000 clusters
29
Number of Clusters
30
Dynamic Maintenance

More results in the paper
Extension to other similarity functions
More experimental results

31
Related Work

Traditional histograms
Selectivity estimation for predicates with wildcards: star LIKE Hanks
Answering fuzzy predicates efficiently (another talk in this conference)

32
Conclusions

Important to support queries with fuzzy string predicates
SEPIA provides accurate selectivity estimation
Structures can be efficiently constructed and maintained
Extendable to various similarity measurements

The Flamingo Project: http://www.ics.uci.edu/flamingo/
Q&A?
