Tuning using Synthetic Workload

About This Presentation

Title:

Tuning using Synthetic Workload

Description:

eTUNER: Tuning Schema Matching Software using Synthetic Scenarios ... Naive Bayes matcher. TF/IDF name matcher. SVM matcher. Characteristics of attr. Post-prune? ...

Provided by: rishirak

Transcript and Presenter's Notes

Title: Tuning using Synthetic Workload

Schema Matching Systems
Modeling Schema Matching Systems
Tuning Schema Matching Systems

Schema Matching
Finding semantic matches between the schemas of disparate data sources
Applications: data warehousing, scientific collaboration, e-commerce, bioinformatics, data integration on the WWW, ...
Current Trends
Manually finding matches is labor-intensive
Numerous automatic matching techniques have been developed
Each technique has its own strengths and weaknesses
Hence, most current matching systems adopt a multi-component strategy
Each component employs a particular matching technique
Highly extensible and customizable
Examples: LSD, COMA, GLUE, Embley02, SimFlood, iMAP, ProtoPlasm, ...

Matching tool M (L, G, k)
Given a particular matching situation, how do we select the right matching components to execute, and how do we adjust the multiple knobs of those components?

L: Library of matching components (e.g., matchers, combiners, filters)

G: Execution graph

k: Collection of control variables (i.e., knobs)
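
One way to picture the triple M (L, G, k) is as a small data structure. The sketch below is illustrative only: the class name `MatchingTool`, its field names, the toy matcher, and the `threshold` knob are assumptions made for this example, not part of any of the systems listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A component maps two attribute names and its knob settings to a similarity score.
Component = Callable[[str, str, Dict[str, float]], float]


@dataclass
class MatchingTool:
    """Hypothetical model of a matching tool M (L, G, k)."""
    # L: the library of available components, keyed by name
    library: Dict[str, Component]
    # G: the execution graph, as component name -> names of components feeding it
    graph: Dict[str, List[str]] = field(default_factory=dict)
    # k: the knob values, grouped per component
    knobs: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def knob_count(self) -> int:
        """Number of individual knobs that a tuner would have to set."""
        return sum(len(settings) for settings in self.knobs.values())


def exact_name_matcher(a: str, b: str, knobs: Dict[str, float]) -> float:
    """Toy matcher: 1.0 when the names agree (ignoring case), else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


tool = MatchingTool(
    library={"name_matcher": exact_name_matcher},
    graph={"name_matcher": []},
    knobs={"name_matcher": {"threshold": 0.5}},
)
```

Grouping the knobs per component keeps the size of the tuning space visible: every component added to L contributes its own entries to k.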

Tuning is necessary to get high matching accuracy
Crucial in many applications: automatic data exchange, data integration, peer-to-peer systems, ...
Tuning is extremely difficult
Huge space of knobs
Wide variety of matching techniques
Complex interactions among the components
No reasonable guidelines for tuning

Example: LSD (L, G, k)
Developing efficient techniques for tuning is now crucial!
Generating Synthetic Workload
Formalization of Tuning Problem
The eTUNER Architecture

Generate synthetic workload
Tune a matching system M using the synthetic workload and tuning procedures stored in the repository
Exploit user assistance to generate an even higher-quality synthetic workload, if possible

Exploiting user assistance
- Grouping semantically equivalent attributes over S
- Adding domain-specific perturbation rules

General tuning problem
Given
M: a schema matching tool
Workload: a set of matching scenarios (S1,T1), (S2,T2), ..., (Sk,Tk)
U: a utility function defined over the process of matching two schemas
Find the knob configuration k maximizing the utility over the workload
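
Read literally, the general tuning problem is a search for the knob configuration with the highest utility over the workload. A minimal sketch of the brute-force version follows. The function name `tune_exhaustively`, the grid representation of the knob space, the choice to sum utility across scenarios, and the toy utility are all assumptions for illustration; the slides do not prescribe a search method here.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Knobs = Dict[str, float]
Scenario = Tuple[str, str]  # (source schema, target schema), kept abstract here


def tune_exhaustively(
    knob_space: Dict[str, Sequence[float]],
    workload: Iterable[Scenario],
    utility: Callable[[Knobs, Scenario], float],
) -> Tuple[Knobs, float]:
    """Return the knob configuration with the highest total utility over the workload."""
    scenarios: List[Scenario] = list(workload)
    names = sorted(knob_space)
    best_config: Knobs = {}
    best_score = float("-inf")
    # Enumerate every combination of candidate knob values.
    for values in product(*(knob_space[name] for name in names)):
        config = dict(zip(names, values))
        score = sum(utility(config, scenario) for scenario in scenarios)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Toy utility (hypothetical): peaks when the threshold knob equals 0.5.
def toy_utility(config: Knobs, scenario: Scenario) -> float:
    return 1.0 - abs(config["threshold"] - 0.5)


best, score = tune_exhaustively(
    {"threshold": [0.1, 0.5, 0.9]},
    [("S1", "T1"), ("S2", "T2")],
    toy_utility,
)
```

The number of combinations grows multiplicatively with each knob, which is why the huge knob space makes this direct approach impractical for real matching systems.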

[Figure: generating the synthetic workload from a schema S with table EMPLOYEES, producing perturbed schemas up to Vn]
Split S into V and U with disjoint data tuples
Perturb number of tables
Perturb number of columns in each table
Perturb column and table names

Our tuning problem
Given
M: a schema matching tool
S: a source schema
Workload: a set of matching scenarios (S,T1), (S,T2), ..., (S,Tk)
(The Ti's are future schemas)
U: matching accuracy
Find the knob configuration k maximizing the average accuracy
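
The objective in this restricted problem is the average accuracy over the scenarios (S, Ti). The slides do not say how accuracy is measured, so the sketch below assumes F1 over sets of attribute matches; the function names and the sample matches are likewise hypothetical.

```python
from typing import Iterable, Set, Tuple

Match = Tuple[str, str]  # (source attribute, target attribute)


def f1_accuracy(predicted: Set[Match], correct: Set[Match]) -> float:
    """F1 of predicted matches against the correct ones (an assumed accuracy measure)."""
    if not predicted or not correct:
        return 0.0
    hits = len(predicted & correct)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(correct)
    return 2 * precision * recall / (precision + recall)


def average_accuracy(results: Iterable[Tuple[Set[Match], Set[Match]]]) -> float:
    """Mean accuracy over (predicted, correct) pairs, one pair per scenario (S, Ti)."""
    scores = [f1_accuracy(p, c) for p, c in results]
    return sum(scores) / len(scores) if scores else 0.0


correct = {("id", "id"), ("wage", "salary")}
perfect = {("id", "id"), ("wage", "salary")}
partial = {("id", "id"), ("wage", "last")}
avg = average_accuracy([(perfect, correct), (partial, correct)])
```

Computing this average requires the correct matches for every scenario, which is exactly what is unavailable for future schemas and what a synthetic workload supplies.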

Perturb data tuples in each table
[Figure: tables EMPS and EMPLOYEES, drawn from the schemas V1 and U]
O1: a set of semantic matches
EMPS.emp-last = EMPLOYEES.last
EMPS.id = EMPLOYEES.id
EMPS.wage = EMPLOYEES.salary
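
The steps in the figure (split S into V and U, perturb V, record the matches O1) can be sketched roughly as follows. This is a simplified illustration, not the eTUNER implementation: the function names, the in-memory table representation, the single rename map shared by tables and columns, and the assumption that EMPS is the perturbed copy of EMPLOYEES are all made up for the example. Perturbing the number of tables and the data tuples is omitted.

```python
import random
from typing import Dict, List, Set, Tuple

Table = Dict[str, List[object]]      # column name -> column values
Schema = Dict[str, Table]            # table name -> table
Match = Tuple[str, str]              # ("TABLE.col" in Vi, "TABLE.col" in U)


def split_schema(schema: Schema) -> Tuple[Schema, Schema]:
    """Split S into V and U: same tables and columns, disjoint halves of the tuples."""
    v: Schema = {}
    u: Schema = {}
    for tname, table in schema.items():
        v[tname] = {col: vals[: len(vals) // 2] for col, vals in table.items()}
        u[tname] = {col: vals[len(vals) // 2:] for col, vals in table.items()}
    return v, u


def perturb(v: Schema, renames: Dict[str, str], drop_prob: float,
            rng: random.Random) -> Tuple[Schema, Set[Match]]:
    """Rename tables and columns, randomly drop columns; return Vi and the known matches."""
    vi: Schema = {}
    matches: Set[Match] = set()
    for tname, table in v.items():
        new_tname = renames.get(tname, tname)
        vi[new_tname] = {}
        for col, vals in table.items():
            if rng.random() < drop_prob:
                continue  # perturb the number of columns
            new_col = renames.get(col, col)
            vi[new_tname][new_col] = list(vals)
            # The match is known by construction, so no manual labeling is needed.
            matches.add((f"{new_tname}.{new_col}", f"{tname}.{col}"))
    return vi, matches


s: Schema = {"EMPLOYEES": {"id": [1, 2, 3, 4], "last": ["a", "b", "c", "d"],
                           "salary": [10, 20, 30, 40]}}
v, u = split_schema(s)
v1, o1 = perturb(v, {"EMPLOYEES": "EMPS", "last": "emp-last", "salary": "wage"},
                 drop_prob=0.0, rng=random.Random(0))
```

Because every perturbation is applied by the generator itself, the correct matches between each Vi and U come for free, which is what makes the workload usable for measuring accuracy.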
Tuning using Synthetic Workload
Experimental Results
Summary and Future Work

Efficient tuning is extremely important
Our contributions
Establish that tuning matching systems automatically is feasible
Synthesize workload to estimate the quality of a matching system with given knob configurations
Establish that staged tuning is a reasonable optimization technique
Experiment extensively over 4 real-world domains with 4 matching systems
Future Work
Explore better search methods and more extensive evaluation
Apply the idea of using synthetic input/output pairs to other applications (e.g., wrapper maintenance)

Staged Tuning
[Figure: components arranged in Levels 1 to 4, with an arrow labeled Tuning direction]

Tune sequentially starting from the lowest-level components
Find best knob configuration for a component based on matching accuracy over the synthetic workload
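
Staged tuning as described above can be sketched as a greedy, level-by-level search. The code below is a simplification under stated assumptions: each knob stands in for one component, `evaluate` is a hypothetical function returning accuracy over the synthetic workload, and the knob names and toy accuracy function are invented for the example.

```python
from typing import Callable, Dict, List, Sequence

Config = Dict[str, float]


def staged_tuning(
    levels: List[Dict[str, Sequence[float]]],
    evaluate: Callable[[Config], float],
    defaults: Config,
) -> Config:
    """Tune knobs level by level, lowest level first, freezing each choice before moving up."""
    config = dict(defaults)
    for level in levels:                      # levels[0] is the lowest level
        for knob, candidates in level.items():
            best_value = config[knob]
            best_score = evaluate(config)
            for value in candidates:
                trial = {**config, knob: value}
                score = evaluate(trial)
                if score > best_score:
                    best_value, best_score = value, score
            config[knob] = best_value         # freeze before tuning the next knob
    return config


# Hypothetical accuracy estimate over a synthetic workload: best at (0.3, 0.7).
def toy_accuracy(config: Config) -> float:
    return 1.0 - abs(config["matcher_threshold"] - 0.3) - abs(config["combiner_weight"] - 0.7)


tuned = staged_tuning(
    levels=[{"matcher_threshold": [0.1, 0.3, 0.5]}, {"combiner_weight": [0.5, 0.7, 0.9]}],
    evaluate=toy_accuracy,
    defaults={"matcher_threshold": 0.5, "combiner_weight": 0.5},
)
```

The number of evaluations grows with the sum of the candidate counts rather than their product. The trade-off is that interactions between knobs at different levels are only partly explored, so the result may not be the global optimum.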
