Tuning using Synthetic Workload - PowerPoint PPT Presentation

About This Presentation
Title:

Tuning using Synthetic Workload

Description:

eTUNER: Tuning Schema Matching Software using Synthetic Scenarios ... Naïve Bayes matcher. TF/IDF name matcher. SVM matcher. Characteristics of attr. Post-prune? ...

Provided by: rishirak

Transcript and Presenter's Notes

Schema Matching Systems
Modeling Schema Matching Systems
Tuning Schema Matching Systems
  • Schema Matching
  • Finding semantic matches between the schemas of
    disparate data sources
  • Applications: data warehousing, scientific
    collaboration, e-commerce, bioinformatics, data
    integration on the WWW, ...
  • Current Trends
  • Manually finding matches is labor-intensive
  • Numerous automatic matching techniques have been
    developed
  • Each technique has its own strengths and weaknesses
  • Hence, most current matching systems adopt a
    multi-component strategy
  • Each component employs a particular matching
    technique
  • Highly extensible and customizable
  • Examples: LSD, COMA, GLUE, Embley02, SimFlood,
    iMAP, ProtoPlasm, ...

Matching tool: M = (L, G, k)
Given a particular matching situation, how
to select the right matching components to
execute, and how to adjust the multiple
knobs of the components?
  • L: library of matching components
    (e.g. matchers, combiners, filters, etc.)
  • G: execution graph
  • k: collection of control variables (i.e. knobs)
  • Tuning is necessary to get high matching accuracy
  • Crucial in many applications: automatic data
    exchange, data integration, peer-to-peer systems, ...
  • Tuning is extremely difficult
  • Huge space of knobs
  • Wide variety of matching techniques
  • Complex interactions among the components
  • No reasonable guideline for tuning
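The model M = (L, G, k) described above can be pictured as a small data structure. The following is only a sketch under stated assumptions: the class name `MatchingTool`, the component names, the knob names, and the `placeholder` function are all hypothetical and are not taken from eTUNER or from any of the listed systems. The execution graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MatchingTool:
    """Hypothetical container for a matching tool M = (L, G, k)."""
    library: Dict[str, Callable[..., Any]]               # L: component name -> implementation
    graph: Dict[str, List[str]]                          # G: component -> components it consumes
    knobs: Dict[str, Any] = field(default_factory=dict)  # k: knob name -> current value

    def execution_order(self) -> List[str]:
        """Order components so each one runs after the components it consumes."""
        order: List[str] = []
        seen = set()

        def visit(name: str) -> None:
            if name in seen:
                return
            seen.add(name)
            for upstream in self.graph.get(name, []):
                visit(upstream)
            order.append(name)

        for name in self.graph:
            visit(name)
        return order


def placeholder(*args: Any) -> None:
    """Stand-in for a real matcher, combiner, or filter (assumption)."""
    return None


tool = MatchingTool(
    library={n: placeholder for n in ("name_matcher", "naive_bayes", "combiner", "filter")},
    graph={
        "filter": ["combiner"],
        "combiner": ["name_matcher", "naive_bayes"],
        "name_matcher": [],
        "naive_bayes": [],
    },
    knobs={"combiner.strategy": "average", "filter.threshold": 0.5},
)
order = tool.execution_order()
print(order)
```

Keeping the knobs in one flat dictionary makes a "knob configuration" a single value that a tuner can copy, modify, and score.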

Example: LSD = (L, G, k)
Developing efficient techniques for tuning is now
crucial!
Generating Synthetic Workload
Formalization of Tuning Problem
The eTUNER Architecture
  • Generate synthetic workload
  • Tune a matching system M using the synthetic
    workload and tuning procedures stored in the
    repository
  • Exploit user assistance to generate an even
    higher quality synthetic workload, if possible

Exploiting user assistance
  • Grouping semantically equivalent attributes over S
  • Adding domain-specific perturbation rules
  • General tuning problem
  • Given
  • M: a schema matching tool
  • Workload: a set of matching scenarios (S1,T1),
    (S2,T2), ..., (Sk,Tk)
  • U: a utility function defined over the process
    of matching two schemas
  • Find the knob configuration k maximizing the
    utility over the workload
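Read literally, the general tuning problem above is a search over knob configurations. The sketch below uses brute-force enumeration, which is not the method the presentation proposes and is only practical for tiny knob spaces. It assumes that "utility over the workload" means the average utility across scenarios; the knob names, the toy utility function, and the scenario labels are made up for illustration.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable, List, Tuple

Knobs = Dict[str, Any]
Scenario = Tuple[Any, Any]  # one matching scenario (S_i, T_i)


def enumerate_configs(knob_space: Dict[str, List[Any]]) -> Iterable[Knobs]:
    """Yield every combination of candidate knob values."""
    names = list(knob_space)
    for values in product(*(knob_space[n] for n in names)):
        yield dict(zip(names, values))


def tune(
    knob_space: Dict[str, List[Any]],
    workload: List[Scenario],
    utility: Callable[[Knobs, Any, Any], float],
) -> Tuple[Knobs, float]:
    """Return the configuration with the highest average utility (assumed aggregate)."""
    best_cfg: Knobs = {}
    best_score = float("-inf")
    for cfg in enumerate_configs(knob_space):
        score = sum(utility(cfg, s, t) for s, t in workload) / len(workload)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Toy stand-in for running the matcher and scoring it (assumption).
def toy_utility(cfg: Knobs, source: Any, target: Any) -> float:
    bonus = 0.1 if cfg["post_prune"] else 0.0
    return 1.0 - abs(cfg["threshold"] - 0.5) + bonus


knob_space = {"threshold": [0.3, 0.5, 0.7], "post_prune": [False, True]}
workload = [("S1", "T1"), ("S2", "T2")]
best_cfg, best_score = tune(knob_space, workload, toy_utility)
print(best_cfg, best_score)
```

The number of configurations is the product of the candidate counts per knob, which is why a huge knob space rules out this approach in practice.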

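Figure: generating the synthetic workload from S. Split S into V and U with disjoint data tuples, then produce perturbed schemas V1, ..., Vn by perturbing the number of tables, the number of columns in each table, and the column and table names (example table: EMPLOYEES).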
  • Our tuning problem
  • Given
  • M: a schema matching tool
  • S: a source schema
  • Workload: a set of matching scenarios (S,T1),
    (S,T2), ..., (S,Tk)
  • (the Ti are future schemas)
  • U: matching accuracy
  • Find the knob configuration k maximizing the
    average accuracy

Figure: perturbing data tuples in each table. Example tables: EMPLOYEES and its perturbed counterpart EMPS.
O1: a set of semantic matches between V1 and U:
EMPS.emp-last = EMPLOYEES.last, EMPS.id = EMPLOYEES.id, EMPS.wage = EMPLOYEES.salary
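The workload generation pictured above can be approximated in a few lines. This sketch covers only two of the steps (splitting the tuples and renaming the table and its columns) and assumes that EMPS belongs to the perturbed schema V1 and EMPLOYEES to U. The function names, the rename map, and the sample rows are hypothetical. Because every rename is applied by the generator itself, the correct matches are known without manual labeling.

```python
import random
from typing import Dict, List, Set, Tuple

Table = Dict[str, List]  # column name -> column values, one entry per tuple


def split_disjoint(table: Table, rng: random.Random) -> Tuple[Table, Table]:
    """Split the rows of a table into two halves that share no tuple."""
    n_rows = len(next(iter(table.values())))
    rows = list(range(n_rows))
    rng.shuffle(rows)
    v_rows, u_rows = rows[: n_rows // 2], rows[n_rows // 2:]

    def pick(keep: List[int]) -> Table:
        return {col: [vals[i] for i in keep] for col, vals in table.items()}

    return pick(v_rows), pick(u_rows)


def make_scenario(
    name: str,
    table: Table,
    new_name: str,
    renames: Dict[str, str],
    rng: random.Random,
) -> Tuple[Table, Table, Set[Tuple[str, str]]]:
    """Build one synthetic scenario (V1, U) together with its correct matches."""
    v, u = split_disjoint(table, rng)
    v1: Table = {}
    matches: Set[Tuple[str, str]] = set()
    for col, vals in v.items():
        new_col = renames.get(col, col)  # perturb the column name if a rule exists
        v1[new_col] = vals
        matches.add((f"{new_name}.{new_col}", f"{name}.{col}"))
    return v1, u, matches


# Made-up rows, for illustration only.
employees: Table = {
    "last": ["Lee", "Kim", "Roy", "Diaz"],
    "id": [1, 2, 3, 4],
    "salary": [50, 60, 70, 80],
}
v1, u, o1 = make_scenario(
    "EMPLOYEES", employees, "EMPS",
    {"last": "emp-last", "salary": "wage"},
    random.Random(0),
)
print(sorted(o1))
```

A fuller generator would also perturb the number of tables, the number of columns, and the data values, updating the match set whenever a column is dropped.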
Tuning using Synthetic Workload
Experimental Results
Summary and Future Work
  • Efficient tuning is extremely important
  • Our contributions
  • Establish that tuning matching systems
    automatically is feasible
  • Synthesize workload to estimate the quality of a
    matching system with given knob configurations
  • Establish that staged tuning is a reasonable
    optimization technique
  • Experiment extensively over 4 real-world domains
    with 4 matching systems
  • Future Work
  • Explore better search methods and more extensive
    evaluation
  • Deploy the idea of using synthetic input/output
    pairs to other applications
  • (e.g. wrapper maintenance)

Staged Tuning
Figure: components arranged in Level 1 through Level 4, with an arrow marking the tuning direction.
  • Tune sequentially starting from the lowest-level
    components
  • Find best knob configuration for a component
    based on matching accuracy over the synthetic
    workload
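The two steps above amount to a greedy, level-by-level search. The sketch below is one possible reading, not the presentation's actual procedure: it assumes each level is a list of components, each component exposes a small knob space, and `evaluate` returns the matching accuracy of a full configuration over the synthetic workload. The knob names and the toy `evaluate` function are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List

Knobs = Dict[str, Any]
KnobSpace = Dict[str, List[Any]]


def staged_tune(
    levels: List[List[KnobSpace]],
    initial: Knobs,
    evaluate: Callable[[Knobs], float],
) -> Knobs:
    """Tune one component at a time, lowest level first, freezing each result."""
    config = dict(initial)
    for components in levels:          # levels[0] is the lowest level
        for knob_space in components:  # one knob space per component
            names = list(knob_space)
            best_values, best_score = None, float("-inf")
            for values in product(*(knob_space[n] for n in names)):
                trial = {**config, **dict(zip(names, values))}
                score = evaluate(trial)
                if score > best_score:
                    best_values, best_score = values, score
            config.update(zip(names, best_values))
    return config


# Toy accuracy function standing in for a run over the synthetic workload.
def evaluate(cfg: Knobs) -> float:
    return (1.0
            - abs(cfg["name_matcher.threshold"] - 0.4)
            - abs(cfg["combiner.weight"] - 0.6))


levels = [
    [{"name_matcher.threshold": [0.2, 0.4, 0.6]}],  # level 1: a matcher
    [{"combiner.weight": [0.3, 0.6, 0.9]}],         # level 2: a combiner
]
initial = {"name_matcher.threshold": 0.6, "combiner.weight": 0.9}
tuned = staged_tune(levels, initial, evaluate)
print(tuned)
```

The cost grows with the sum of the per-component knob spaces rather than their product, at the price of possibly missing interactions between components tuned at different stages.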