1
Extraction of Vectorized Graphical Information
from Scientific Chart Images
Ruizhe Liu, Weihua Huang, Chew Lim Tan
School of Computing, National University of Singapore
2
Outline
  • Introduction
  • Related works
  • The proposed approach
  • Experiment results
  • Conclusion

3
Introduction
  • What is represented in a scientific chart?
  • Recognition and interpretation is the reverse
    process

4
Introduction
  • The significance of recognition and
    interpretation of scientific chart images
    • Automatic document processing: convert the
      charts into machine-readable form
    • Information retrieval and extraction: recover
      the tabular data and the intended message

5
Introduction
  • Major steps in the system

6
Introduction
  • The role of vectorization
    • Converts pixels into graphical primitives
    • Forms the basis for graphical symbol
      construction [1]
  • The proposed approach
    • Directional single-connected chains (DSCC)
      and curve fitting

7
Outline
  • Introduction
  • Related works
  • The proposed approach
  • Experiment results
  • Conclusion

8
Related Works
  • Point-based vectorization
    • A set of points from the contour or the
      skeleton of the lines is chosen.
    • Curve fitting methods are applied.
  • Limitations
    • A proper point set is required; otherwise the
      fit easily shifts from the true curve when the
      lines are distorted.
    • Line intersections are difficult to handle.

9
Related Works
  • Point based vectorization

Straight line fitting
Ellipse fitting
10
Related Works
  • Segment-based vectorization
    • Segments obtained from thinning, contour
      tracking, run-length coding or medial-axis
      tracking
    • Make use of geometric features
  • Limitations
    • Difficult to extract geometric features for
      complex curves
    • Some still have difficulty with line
      intersections

11
Related Works
  • Segment based vectorization

Thinning based method
Sparse pixel tracking method
Progressive simplification based tracking method
12
Outline
  • Introduction
  • Related works
  • The proposed approach
  • Experiment results
  • Conclusion

13
The Proposed Approach
  • Major steps
    • It's a 2-pass vectorization process
    • Run-lengths break lines and curves at each
      intersection
    • Broken lines and curves are re-joined during
      post-processing

14
The Proposed Approach
  • The directional single-connected chain (DSCC)
    • A chain of run-lengths following a single
      direction; it can be treated as a segment
    • Originally proposed for form recognition
    • The average length of the run-lengths estimates
      the thickness of the chain
    • The set of mid-points of the run-lengths in a
      chain allows curve fitting to the chain
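
The two DSCC properties above (thickness from the average run-length, and mid-points as input to curve fitting) can be sketched as follows. This is a minimal illustration, not the authors' code: the `(position, start, end)` tuple layout and all function names are assumptions made for the example.

```python
from statistics import mean

# Assumed layout: a vertical run-length is (column, top_row, bottom_row),
# with both row indices inclusive. The slides do not specify a data structure.

def run_length(run):
    """Number of pixels covered by one run-length."""
    _, start, end = run
    return end - start + 1

def chain_thickness(chain):
    """Estimate the stroke thickness as the average run-length in the chain."""
    return mean(run_length(r) for r in chain)

def chain_midpoints(chain):
    """Mid-point of every run-length; these points feed the curve fitting."""
    return [(pos, (start + end) / 2) for pos, start, end in chain]

chain = [(10, 4, 6), (11, 5, 7), (12, 5, 8), (13, 6, 8)]
thickness = chain_thickness(chain)
midpoints = chain_midpoints(chain)
```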

15
The Proposed Approach
  • Construction of DSCC (straight-line)

Horizontal run-length
Vertical run-length
  • Chain formed by run-lengths that
    • have similar length
    • keep the direction of the chain (run-length
      does not shift too much)
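
A rough sketch of how such chains might be built from vertical run-lengths is given below. It only illustrates the two conditions listed above (similar length, small shift); the tolerances `max_len_diff` and `max_shift`, the function names and the image representation (a list of rows of 0/1) are all assumptions, and the original DSCC construction may differ in detail.

```python
def extract_vertical_runs(image):
    """Return {column: [(top, bottom), ...]} for a binary image (rows of 0/1)."""
    height = len(image)
    width = len(image[0]) if image else 0
    runs = {}
    for col in range(width):
        col_runs, row = [], 0
        while row < height:
            if image[row][col]:
                top = row
                while row < height and image[row][col]:
                    row += 1
                col_runs.append((top, row - 1))
            else:
                row += 1
        runs[col] = col_runs
    return runs

def build_chains(runs, max_len_diff=1, max_shift=1.0):
    """Link runs in adjacent columns that have similar length and small shift."""
    finished, open_chains = [], []
    for col in sorted(runs):
        unused = list(runs[col])
        still_open = []
        for chain in open_chains:
            _, p_top, p_bot = chain[-1]
            matches = [
                (top, bot) for top, bot in unused
                if abs((bot - top) - (p_bot - p_top)) <= max_len_diff
                and abs((top + bot) / 2 - (p_top + p_bot) / 2) <= max_shift
            ]
            if len(matches) == 1:
                unused.remove(matches[0])
                chain.append((col, *matches[0]))
                still_open.append(chain)
            else:
                finished.append(chain)  # no unique continuation: close the chain
        # Every run that did not extend a chain starts a new one.
        still_open.extend([[(col, top, bot)] for top, bot in unused])
        open_chains = still_open
    return finished + open_chains

# A horizontal stroke (rows 2-3) crossed by a vertical stroke (column 3).
image = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
]
chains = build_chains(extract_vertical_runs(image))
```

In this example the long run at the crossing column fails the similar-length test, so the horizontal stroke is broken into two chains there, which is the kind of break that the post-processing stage later re-joins.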

16
The Proposed Approach
  • Construction of DSCC (arc)
  • Chain formed by run-lengths that
    • have similar length
    • keep the direction of the chain (run-length
      does not shift too much)
    • number of neighbors 2

17
The Proposed Approach
  • Post-processing
    • Filtering: remove run-lengths with length 1 and
      1 neighbors
    • Smoothing: combine two run-lengths in the same
      column/row if the blank area between them is
      less than a threshold T
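
The smoothing rule can be sketched for a single column as below. The function name and the `(top, bottom)` run representation are assumptions; the slides give no value for T, so `gap_threshold` is left as a parameter.

```python
def smooth_column(runs, gap_threshold):
    """Merge runs in one column whose blank gap is smaller than gap_threshold.

    Each run is (top, bottom) with inclusive row indices.
    """
    merged = []
    for top, bottom in sorted(runs):
        # Number of blank pixels between the previous run and this one.
        if merged and top - merged[-1][1] - 1 < gap_threshold:
            prev_top, prev_bottom = merged[-1]
            merged[-1] = (prev_top, max(prev_bottom, bottom))
        else:
            merged.append((top, bottom))
    return merged

column = [(2, 4), (6, 9), (15, 18)]
smoothed = smooth_column(column, gap_threshold=2)
```

Here the one-pixel gap between the first two runs is closed, while the five-pixel gap before the third run is kept.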

Before
After
18
The Proposed Approach
  • Post-processing
    • Splitting: iteratively divide a DSCC at the
      turning point, i.e. the run-length with the
      maximum distance to the line formed by the
      start and end points of the DSCC

P1
P3
P2
Each DSCC is now a straight line, an arc or a
polyline
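
The splitting step resembles the familiar recursive "farthest point from the chord" subdivision. The sketch below works on the mid-points of a chain. The stopping rule (a distance `tolerance`) is an assumption, since the slides do not say when the iteration ends; with a tight tolerance a true arc would also be split, so the value would need tuning.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def split_chain(points, tolerance):
    """Recursively split a point chain at its turning point.

    The turning point is the point farthest from the line joining the
    first and last points. Splitting stops once no point is farther than
    `tolerance` (an assumed stopping rule).
    """
    if len(points) < 3:
        return [points]
    start, end = points[0], points[-1]
    distances = [point_line_distance(p, start, end) for p in points[1:-1]]
    max_dist = max(distances)
    if max_dist <= tolerance:
        return [points]
    turn = distances.index(max_dist) + 1
    # The turning point ends the first piece and starts the second.
    return (split_chain(points[:turn + 1], tolerance)
            + split_chain(points[turn:], tolerance))

# An L-shaped chain: the corner at (3, 0) is the turning point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
pieces = split_chain(points, tolerance=0.5)
```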
19
The Proposed Approach
  • Apply ellipse fitting to each DSCC (A.
    Fitzgibbon, 1999)

Minimize the squared algebraic distance
$F(A, X) = A \cdot X = ax^2 + bxy + cy^2 + dx + ey + f = 0$,
where $A = [a\ b\ c\ d\ e\ f]^T$ and $X = [x^2\ xy\ y^2\ x\ y\ 1]^T$.
$F(A, X_i)$ is the algebraic distance of a point
$(x_i, y_i)$ to the conic $F(A, X) = 0$.
$SA = \lambda CA, \quad A^T C A = 1$
$S$ is the scatter matrix $D^T D$. $D = [X_1\ X_2\ \dots\ X_n]^T$
is called the design matrix. $C$ is the matrix
that expresses the constraint. $\lambda$ is the Lagrange
multiplier.
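
To make the quantities above concrete, the sketch below builds the design rows, the scatter matrix S and the constraint matrix C, and evaluates the fitting objective for a known conic. It does not solve the generalized eigenproblem. The form of C follows the ellipse constraint 4ac - b^2 = 1 used in Fitzgibbon et al.'s direct least-squares method; the function names are illustrative assumptions.

```python
import math

def design_row(x, y):
    """Vector X = [x^2, xy, y^2, x, y, 1] for one point."""
    return [x * x, x * y, y * y, x, y, 1.0]

def algebraic_distance(A, x, y):
    """F(A, X) = A . X for conic coefficients A = [a, b, c, d, e, f]."""
    return sum(a * t for a, t in zip(A, design_row(x, y)))

def scatter_matrix(points):
    """S = D^T D, where the rows of D are the design rows of the points."""
    D = [design_row(x, y) for x, y in points]
    return [[sum(row[i] * row[j] for row in D) for j in range(6)]
            for i in range(6)]

def quadratic_form(A, M):
    """Compute A^T M A for a 6-vector A and a 6x6 matrix M."""
    return sum(A[i] * M[i][j] * A[j] for i in range(6) for j in range(6))

# Constraint matrix C such that A^T C A = 4ac - b^2.
C = [[0.0] * 6 for _ in range(6)]
C[0][2] = C[2][0] = 2.0
C[1][1] = -1.0

# Points on the unit circle x^2 + y^2 - 1 = 0.
points = [(math.cos(t), math.sin(t)) for t in (0.0, 0.7, 1.9, 3.1, 4.4, 5.5)]
unit_circle = [1.0, 0.0, 1.0, 0.0, 0.0, -1.0]

S = scatter_matrix(points)
objective = quadratic_form(unit_circle, S)   # sum of squared algebraic distances
constraint = quadratic_form(unit_circle, C)  # 4ac - b^2 for this conic
```

The sum of squared algebraic distances equals A^T S A, which is why minimizing it under A^T C A = 1 leads to the eigen-system SA = λCA. For the exact circle the objective is zero up to rounding, and scaling the coefficients by 1/2 would bring the constraint value from 4 to 1.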
20
The Proposed Approach
  • Result of ellipse fitting
  • Classification and verification
    • Straight line: max_radius / min_radius > T
    • Circular arc: max_radius / min_radius ≈ 1
    • Elliptic arc: max_radius / min_radius < T
    • Polyline: through error checking
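
A possible way to compute the radius ratio directly from the fitted conic coefficients and apply the classification is sketched below. The comparison directions (a very elongated ellipse is a straight line, a ratio near 1 is a circular arc) are an interpretation of the slide, and `line_threshold` and `circle_tolerance` are made-up values, since the slide does not give T. The polyline check is not covered.

```python
import math

def axis_ratio(a, b, c):
    """max_radius / min_radius of the ellipse ax^2 + bxy + cy^2 + ... = 0.

    The semi-axes are proportional to 1/sqrt(|eigenvalue|) of the matrix
    [[a, b/2], [b/2, c]], so the ratio depends only on a, b and c.
    """
    if b * b - 4 * a * c >= 0:
        raise ValueError("coefficients do not describe an ellipse")
    root = math.hypot(a - c, b)
    lam_big = (abs(a + c) + root) / 2
    lam_small = (abs(a + c) - root) / 2
    return math.sqrt(lam_big / lam_small)

def classify(ratio, line_threshold=50.0, circle_tolerance=0.05):
    """Classify a fitted DSCC by its axis ratio (thresholds are assumptions)."""
    if ratio > line_threshold:
        return "straight line"
    if ratio - 1.0 <= circle_tolerance:
        return "circular arc"
    return "elliptic arc"

ellipse_ratio = axis_ratio(1 / 25, 0.0, 1 / 4)   # x^2/25 + y^2/4 = 1
circle_ratio = axis_ratio(1.0, 0.0, 1.0)         # x^2 + y^2 = r^2
thin_ratio = axis_ratio(1.0, 0.0, 1e-6)          # extremely elongated ellipse
labels = [classify(r) for r in (ellipse_ratio, circle_ratio, thin_ratio)]
```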

21
The Proposed Approach
  • Combine straight lines
    • The two lines must be within the same
      connected area.
    • The angle between the two lines should be less
      than 10 degrees.
  • Combine arcs
    • The two arcs must be within the same connected
      area.
    • The two arcs have a common center and radius.
    • The angle between the tangent lines at the
      starting or ending points of the two arcs
      should be less than 10 degrees.
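
The straight-line merging test can be sketched as follows. Connected-area membership is passed in as a flag because component labelling is outside the scope of this example; the function names are assumptions, and only the 10-degree limit comes from the slide.

```python
import math

def direction_angle(segment):
    """Direction of a segment ((x1, y1), (x2, y2)) in radians."""
    (x1, y1), (x2, y2) = segment
    return math.atan2(y2 - y1, x2 - x1)

def angle_between(seg_a, seg_b):
    """Smallest angle in degrees (0 to 90) between two undirected lines."""
    diff = abs(direction_angle(seg_a) - direction_angle(seg_b)) % math.pi
    return math.degrees(min(diff, math.pi - diff))

def can_merge_lines(seg_a, seg_b, same_connected_area, max_angle_deg=10.0):
    """Both merging conditions for straight lines listed on the slide."""
    return same_connected_area and angle_between(seg_a, seg_b) < max_angle_deg

left = ((0.0, 0.0), (10.0, 0.0))
right = ((12.0, 0.5), (20.0, 1.0))
upright = ((5.0, 0.0), (5.0, 8.0))

merge_collinear = can_merge_lines(left, right, same_connected_area=True)
merge_crossing = can_merge_lines(left, upright, same_connected_area=True)
```

Taking the angle modulo 180 degrees makes the test independent of the order in which a segment's end points are stored.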

22
Outline
  • Introduction
  • Related works
  • The proposed approach
  • Experiment results
  • Conclusion

23
Experimental Results
  • Dataset
    • 200 chart images, including 2D and 3D bar
      charts, 2D and 3D pie charts, and 2D line
      charts.
    • Multi-level ground-truth information is
      available for performance evaluation, including
      vector-level information on straight lines,
      circular arcs and elliptic arcs.

24
Experimental Results
  • Evaluation Criteria
    • s: the overlapping segment between an extracted
      vector vd and the corresponding vector vg in
      the ground truth
    • Coverage(s, vi): the length of s divided by the
      length of vi, where vi is either vd or vg
    • Correct: if both Coverage(s, vd) and
      Coverage(s, vg) are ≥ 90%
    • Broken: if Coverage(s, vd) is ≥ 90% but
      Coverage(s, vg) is not
    • Wrong: if both Coverage(s, vd) and
      Coverage(s, vg) are below 90%
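
The criteria can be illustrated with a simplified one-dimensional model in which each vector is an interval `(start, end)` along a shared line. This representation, the function names and the reading of the threshold as "at least 90%" are assumptions. The case where only the ground-truth coverage passes is not described on the slide, so the sketch returns `None` for it.

```python
def overlap_length(vd, vg):
    """Length of the overlapping segment s between two 1-D intervals."""
    return max(0.0, min(vd[1], vg[1]) - max(vd[0], vg[0]))

def coverage(s_length, v):
    """Coverage(s, v): length of s divided by the length of v."""
    return s_length / (v[1] - v[0])

def judge(vd, vg, threshold=0.9):
    """Label an extracted vector vd against its ground-truth vector vg."""
    s = overlap_length(vd, vg)
    detected_ok = coverage(s, vd) >= threshold
    truth_ok = coverage(s, vg) >= threshold
    if detected_ok and truth_ok:
        return "correct"
    if detected_ok:
        return "broken"   # vd lies on vg but covers only part of it
    if not truth_ok:
        return "wrong"
    return None           # case not described on the slide

ground_truth = (0.0, 100.0)
verdicts = [
    judge((0.0, 95.0), ground_truth),
    judge((0.0, 40.0), ground_truth),
    judge((50.0, 200.0), ground_truth),
]
```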

25
Experimental Results
  • Results

26
Outline
  • Introduction
  • Related works
  • The proposed approach
  • Experiment results
  • Conclusion

27
Conclusion
  • A method for obtaining vector information from
    scientific chart images is introduced.
  • The method is based on construction of DSCC and
    ellipse fitting.
  • The resulting vectors are to be used to construct
    graphical symbols for further recognition and
    interpretation purposes.

28
Thank you!
  • Questions?