Requests to Tsong-Li - PowerPoint PPT Presentation

1
Requests to Tsong-Li
  • 1. Related work at end of each section
  • 2. Screen dumps of treebase at end of treesearch
    section (you'll see where)
  • 3. Web addresses at the very end.

2
Searching for and Comparing Trees and Graphs
  • Dennis Shasha, shasha_at_cs.nyu.edu
  • Courant Institute, NYU
  • Joint work with
  • Kaizhong Zhang and Jason Wang

3
Philosophy
  • Trees and graphs represent data in many domains:
    linguistics, chemistry, and maybe even the web.
  • Question: why can't I search for trees or graphs
    at the speed of keyword searches?
  • Why can't I compare trees (or graphs) as easily
    as I can compare strings?

4
Tree Searching
  • Given a small tree t, is it present in a bigger
    tree T?

5
What does "present" mean?
  • Preserving sibling order or not
  • Preserving ancestor order
  • Preserving distance
  • Mismatches

6
Sibling Order
  • Order of children of a node

[Figure: two trees rooted at A with children B and C, compared with "?".]

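The sibling-order question can be made concrete with a small sketch. The tuple representation `(label, [children])` and the function names below are assumptions made for illustration; they are not taken from the talk or from any of its tools.

```python
def same_ordered(a, b):
    """Equal only if labels agree and children agree in left-to-right order."""
    return (a[0] == b[0]
            and len(a[1]) == len(b[1])
            and all(same_ordered(x, y) for x, y in zip(a[1], b[1])))


def canonical(tree):
    """Order-insensitive form: children are canonicalized, then sorted."""
    label, children = tree
    return (label, tuple(sorted(canonical(c) for c in children)))


def same_unordered(a, b):
    """Equal if the trees agree once sibling order is ignored."""
    return canonical(a) == canonical(b)


t1 = ("A", [("B", []), ("C", [])])
t2 = ("A", [("C", []), ("B", [])])

print(same_ordered(t1, t2))    # sibling order preserved?
print(same_unordered(t1, t2))  # sibling order ignored?
```

Under the ordered definition the two trees differ; under the unordered one they are the same tree.
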
7
Ancestor Order
  • Order between children and parent.

[Figure: two trees over the nodes A, B, and C with different parent-child arrangements, compared with "?".]

8
Ancestor Distance
  • Can children become grandchildren?

[Figure: two trees rooted at A containing B and C, one of them with an extra node X, compared with "?".]

9
Mismatches
  • Can there be relabellings, inserts, and deletes
    (Tolstoy problem)?

[Figure: two trees rooted at A over the nodes B, C, and X, annotated "how far?".]

10
Bottom Line
  • There is no one definition of mismatch or subtree
    (Tolstoy problem). You must choose the package
    that suits you.
  • I will tell you about three.

11
TreeSearch Query Language
  • The query language is simply a tree decorated with
    single-length don't cares (?) and variable-length
    don't cares (*).

[Figure: example query tree with root A, nodes B, C, and D, and don't-care nodes; annotations "> 0, on each side" and "1".]

12
Exact Match
  • A query matches exactly if it is contained in the
    data tree, regardless of sibling order or other
    nodes.

[Figure: a query tree with don't cares matched exactly against a larger data tree.]

13
Inexact Match
  • A match is inexact if node labels are missing or
    differ. Differences higher in the tree cost more.

[Figure: a query tree with don't cares matched inexactly against a larger data tree; annotation "Differ by 1".]

14
Treesearch Conceptual Algorithm
  • Take all paths in query tree.
  • Find out where each path is in the data tree.
  • So the notion of distance is the number of paths
    that differ. Higher nodes are more important.
  • Implementation: suffix array. A few seconds on
    several thousand trees.
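
The conceptual steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: trees are `(label, [children])` tuples, a plain Python set stands in for the suffix array, all paths are weighted equally (the extra weight on higher nodes is not modeled), and each query path is looked up independently of the others. The function names are hypothetical.

```python
def root_to_leaf_paths(tree):
    """Label sequences from the root of the query down to each leaf."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + p for c in children for p in root_to_leaf_paths(c)]


def downward_paths(tree):
    """Set of label sequences read along every downward path in the tree."""
    found = set()

    def visit(node):
        label, children = node
        starting_here = [(label,)]  # paths that begin at this node
        for child in children:
            starting_here += [(label,) + p for p in visit(child)]
        found.update(starting_here)
        return starting_here

    visit(tree)
    return found


def path_distance(query, data):
    """Number of query paths that do not occur anywhere in the data tree."""
    index = downward_paths(data)
    return sum(1 for p in root_to_leaf_paths(query) if p not in index)


data = ("X", [("A", [("C", []), ("B", []), ("D", [])]), ("Y", [])])
exact = ("A", [("B", []), ("D", [])])
inexact = ("A", [("B", []), ("E", [])])

print(path_distance(exact, data))    # every query path is found
print(path_distance(inexact, data))  # the path A-E is missing
```

Because paths run from ancestor to descendant, ancestor order matters in this sketch while sibling order does not.
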

15
Treesearch Review
  • Ancestor order matters.
  • Sibling order doesn't.
  • Don't cares: * and ?
  • Distance metric is based on the number of path
    differences.
  • Sister system built by Divesh and Sihem at Bell
    Labs that allows terms to be generalized.

16
Tsong-Li: screen dumps of treebase, then related
work
17
Tree Edit
  • Order of children matters

[Figure: two trees rooted at A with children B and C; edit script: A->A, del(B), ins(B).]

18
Tree Edit in General
  • Operations are relabel (A->A), delete (X), insert
    (B).

[Figure: two trees rooted at A over the nodes B, X, and C; edit script: A->A, del(B), ins(B).]

19
Review of Tree Edit
  • Generalizes string edit distance to trees; computed
    by a dynamic programming algorithm.
  • O(|T1| |T2| depth(T1) depth(T2))
  • The basis for XMLdiff.
  • Also has * and best removal of subtrees.
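
To make the idea of an ordered tree edit distance concrete, here is a minimal sketch. It is a straightforward memoized recursion over forests, not the optimized dynamic program whose running time is quoted above. It assumes unit cost for every relabel, insert, and delete, and the representation (a tree is `(label, children)` with `children` a tuple of trees) and function names are chosen only for this illustration.

```python
from functools import lru_cache

# A forest is a tuple of trees; tuples keep the arguments hashable for the cache.


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests with unit costs."""
    if not f and not g:
        return 0
    if not f:
        # Only inserts remain: add the rightmost root of g.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Only deletes remain: the children of the removed root move up.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Match the two rightmost roots: compare their children and the rest.
    match = (forest_distance(v_kids, w_kids)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


abc = ("A", (("B", ()), ("C", ())))
acb = ("A", (("C", ()), ("B", ())))
axbc = ("A", (("X", (("B", ()), ("C", ()))),))

print(tree_edit_distance(abc, acb))   # swapped siblings
print(tree_edit_distance(axbc, abc))  # one extra inner node X
```

Since the order of children matters, swapping B and C costs two operations (for example, delete B and insert B on the other side), while removing the inner node X costs one.
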

20
Tsong-Li: related work here
21
Graph Edit
  • Thesis work of Rosalba Giugno.
  • Find a small graph (with * and ?) in a big graph.
  • Doesn't work fast if the query graph is big,
    because subgraph isomorphism takes exponential
    time in the worst case.

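The cost of a large query graph shows up in even the simplest search. The sketch below is a plain backtracking matcher written for illustration; it is not the GraphGrep method. It assumes undirected graphs given as a label dictionary plus an edge list, treats a query label of "?" as matching any single node, and does not handle variable-length don't cares. All names are hypothetical.

```python
def find_subgraph(q_labels, q_edges, g_labels, g_edges):
    """Return one mapping from query nodes to data nodes, or None."""
    def adjacency(labels, edges):
        adj = {n: set() for n in labels}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        return adj

    q_adj = adjacency(q_labels, q_edges)
    g_adj = adjacency(g_labels, g_edges)
    order = list(q_labels)
    mapping = {}

    def extend(i):
        if i == len(order):
            return True
        q = order[i]
        for cand in g_labels:
            if cand in mapping.values():
                continue  # each data node is used at most once
            if q_labels[q] != "?" and q_labels[q] != g_labels[cand]:
                continue  # labels must agree unless the query says "?"
            # Every query neighbour mapped so far must be adjacent to cand.
            if all(mapping[n] in g_adj[cand] for n in q_adj[q] if n in mapping):
                mapping[q] = cand
                if extend(i + 1):
                    return True
                del mapping[q]  # backtrack and try the next candidate
        return False

    return dict(mapping) if extend(0) else None


g_labels = {1: "A", 2: "B", 3: "C", 4: "D"}
g_edges = [(1, 2), (2, 3), (3, 4)]

hit = find_subgraph({"x": "A", "y": "?", "z": "C"},
                    [("x", "y"), ("y", "z")], g_labels, g_edges)
miss = find_subgraph({"x": "A", "y": "D"}, [("x", "y")], g_labels, g_edges)
print(hit, miss)
```

Each query node may be tried against every data node, so the number of candidate assignments grows exponentially with the size of the query graph.
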

22
Example of GraphGrep
  • The query graph has nodes and don't cares.

[Figure: example query graph with nodes A, B, C, and D and don't cares.]

23
Summary of Tools
  • Why can't tree and graph search be like keyword
    search?
  • We are getting there and will provide software if
    you are interested.
  • Currently about 50 downloads.