Whole Slide Imagery as an Enabling Technology for Content-Based Image Retrieval: A review of current capabilities, opportunities and challenges - PowerPoint PPT Presentation

1
Whole Slide Imagery as an Enabling Technology for
Content-Based Image Retrieval: A review of
current capabilities, opportunities and challenges
  • Ulysses J. Balis, M.D.
  • Director, Division of Pathology Informatics
  • Department of Pathology
  • University of Michigan Health System
  • ulysses_at_umich.edu

2
Disclosures
  • Aperio
  • Technical Advisory Board and Shareholder
  • Living Microsystems/Artemis Health, Inc.
  • Founder and Shareholder
  • Cellpoint Diagnostics
  • Founder and Shareholder

These are listed for completeness only; this
presentation does not contain proprietary or
commercial content from any of the above entities.
3
Overview of Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

4
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

5
Thesis Statement
  • The availability of digital whole slide data sets
    represents an enormous opportunity to carry out
    new forms of numerical and data-driven query, in
    modes not based on textual, ontological or
    lexical matching.
  • Search image repositories with whole images or
    image regions of interest
  • Carry out search in real time via use of scalable
    computational architectures

Extraction from Image repositories based
upon spatial information
001011010111010111..
Analysis of data in the digital domain
6
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

7
Definition
  • Content-Based Image Retrieval (CBIR)
  • Within the context of an image-based repository,
    searching for matching predicates with
    image-based operators in lieu of text matching
  • Reverse Metadata Lookup (RML)
  • Using the cohort of returned images from a CBIR
    query to generate a list of associated metadata
    concept terms
  • Anatomic frame of reference
  • Prior diagnoses
  • Differential Diagnosis
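
As a minimal sketch of the Reverse Metadata Lookup idea defined above, the Python below tallies the concept terms attached to the images returned by a CBIR query. The function name `reverse_metadata_lookup`, the dictionary layout and the sample terms are all hypothetical and are not taken from the presentation.

```python
from collections import Counter


def reverse_metadata_lookup(hits, metadata, top_n=3):
    """Rank metadata terms by how many of the returned images carry them."""
    counts = Counter()
    for image_id in hits:
        # Count each term once per image so one record cannot dominate.
        counts.update(set(metadata.get(image_id, [])))
    return counts.most_common(top_n)


# Hypothetical repository metadata: image id -> associated concept terms.
metadata = {
    "img1": ["colon", "adenocarcinoma"],
    "img2": ["colon", "adenocarcinoma", "prior polyp"],
    "img3": ["colon", "tubular adenoma"],
}
hits = ["img1", "img2", "img3"]  # ids assumed to come back from a CBIR query

ranked_terms = reverse_metadata_lookup(hits, metadata, top_n=2)
print(ranked_terms)
```

The ranked terms could then seed an anatomic frame of reference or a differential diagnosis list, as the slide suggests.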

8
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

9
A Quick History of CBIR
  • 1970s: Corona Satellite Remote Sensing
    Initiative
  • Film-based
  • Resultant analog content, when digitized,
    represented gigabytes of data (consider the
    computational burden for 1972)
  • Several numerical approaches devised to quickly
    crunch data
  • Many approaches based on conventional image
    analysis: one or more specific algorithms
    developed for each feature to be extracted /
    identified
  • Technically challenging
  • Time consuming
  • Computationally expensive
  • The term CBIR was first coined in 1992 by T. Kato to
    describe automatic retrieval of images from a
    database.
  • One promising approach also explored was Vector
    Quantization (V.Q.)
  • Many-log increase in computational throughput
    required for routine use

10
(No Transcript)
11
CBIR Operational Modes
  • Query by Example
  • "Find pictures that contain this snippet / ROI"
  • Semantic Retrieval
  • "Find pictures like adenocarcinoma"
  • "Like this adenocarcinoma"
  • Multimodal Retrieval
  • Search for matches based on imagery data combined
    with other search metrics
  • High-throughput omics data, etc.
  • Patient clinical outcomes and therapeutic
    response data
  • Other imaging modalities

12
CBIR Techniques (conventional)
  • Color Operators
  • Texture operators
  • Shape
  • Spectral information
  • Frequency and phase domain information

There are at least several thousand major classes
of conventional image analysis operations, with
most exhibiting the common trait of requiring
some degree of application tuning for the
intended use-case. Hence, this class of
approaches should not be generally viewed as
turnkey solutions.
13
CBIR Techniques (innovative)
  • Genetic Image Exploration
  • Originally designed to analyze multispectral
    satellite data
  • Semi-autonomous systems that employ a
    decision-tree to search a known repertoire of
    conventional image analysis algorithms for the
    most sensitive and specific combination of
    algorithms that fits the query predicate
  • is representative
  • (Los Alamos National Labs)
  • Autonomous operation comes at a price: the need
    for significant computational throughput in
    training mode (i.e., slow)

14
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

15
Prior Work
  • Conventional Image analysis
  • Conventional Vector Quantization

16
Conventional Image Analysis
  • At present, confined to specific use-cases
  • Quantitative IHC
  • FDA validation an ongoing challenge
  • Not reduced to practice as an integral tool of
    the pathologist's workstation

17
Conventional Vector Quantization
Original Image
Division of image into local domains
Extraction of Local Domain Composite Vectors
?
VKSLx0y0Order , LxnymOrder
Vectorization of each local kernel
Individual assessment of each vector dimension
38857448643
18
Conventional Vector Quantization
VKSLx0y0Order , LxnymOrder
Established Vocabulary
Query Against library (Vocabulary) of Established
Vectors
Novel Vector
Previously Identified Vector
Assignment of a unique serial number and
inclusion into global vocabulary
Assembly of compressed dataset
38857448643
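
The two slides above can be approximated with a short Python sketch: the image is divided into local domains, each domain is vectorized, and each vector is looked up in a growing vocabulary that assigns a serial number to every novel vector. This is an illustration under simplifying assumptions, not the presenter's implementation: it uses exact byte-for-byte matching (a practical VQ would match within a tolerance), and the name `vq_encode` and the 4 x 4 block size are hypothetical.

```python
import numpy as np


def vq_encode(image, block=4, vocabulary=None):
    """Replace each block x block kernel of a 2-D image with a serial number."""
    vocabulary = {} if vocabulary is None else vocabulary
    rows, cols = image.shape[0] // block, image.shape[1] // block
    codes = np.empty((rows, cols), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            kernel = image[r * block:(r + 1) * block, c * block:(c + 1) * block]
            key = kernel.tobytes()  # vectorization of the local kernel
            if key not in vocabulary:
                # Novel vector: assign the next serial number.
                vocabulary[key] = len(vocabulary)
            codes[r, c] = vocabulary[key]
    return codes, vocabulary


# Toy 8 x 8 image: dark everywhere except a bright lower-right quadrant.
image = np.zeros((8, 8), dtype=np.uint8)
image[4:, 4:] = 255

codes, vocabulary = vq_encode(image)
print(codes)
print(len(vocabulary), "vectors in the vocabulary")
```

The `codes` array is the compressed, spatially preserved dataset; searching it means comparing small integers rather than raw pixels.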
19
VQ-Based Image Compression as the Original
Predicate for Carrying Out Image-Based Search
Raw Data
Restored Data
Compressed data: The spatially-preserved
organization of the encoded data represents a
many-fold decrease in overall search dataset
size, thus providing a significant computational
opportunity for accelerated search. Additionally,
the vectors identified as contributing to a
match may be visually interrogated for
confirmation of their predictive morphologic
content.
20
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

21
The Challenge That Is Pathology CBIR
  • Start with some conservative initial assumptions,
    concerning a prototypic image repository, in
    terms of search potential
  • Ability to search 10 years of data
  • 1,000 slides/day → 200,000 slides/year
  • 500 MB of compressed whole slide data/slide
  • Operational goal of being able to:
  • Search in real-time
  • Re-index the database every evening, such that
    searches carried out the next day are current

22
The Challenge That Is Pathology CBIR
  • Net storage required for ten years' worth of
    data
  • 1 billion megabytes
  • 10^6 gigabytes
  • 10^3 terabytes
  • 10^0 petabytes = 1 petabyte
  • Current conservative enterprise storage is $2,000/
    terabyte
  • The full petabyte would cost $2M
  • A single Genetic-type search across all images,
    assuming 5 seconds of computation / slide, would
    be
  • 200,000 slides x 10 x 5 seconds → 5 million
    seconds
  • This is 6 log too slow
  • 8.27 weeks, or about 6 searches per year
  • (on an original Apple IIe: 78 years)
  • So we would need to save our queries for those
    really important image searches.
  • Conventional VQ, which is 100 times faster, is
    still not fast enough: 13.8 hours per feature
    search
  • Yet another 4 log of performance is required
  • Two ways to address this:
  • 10,000 parallel processors, or
  • better algorithms
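
The back-of-envelope figures on this slide can be rechecked with a few lines of Python. The constant names are hypothetical. One caveat: the product as literally written (200,000 x 10 x 5) comes to 10 million seconds, whereas the quoted 5 million seconds is the value that reproduces the 8.27 weeks and roughly 13.8 hours stated on the slide, so both are computed here.

```python
SLIDES_PER_YEAR = 200_000
YEARS = 10
MB_PER_SLIDE = 500
USD_PER_TB = 2_000          # assumed to be US dollars per terabyte
SECONDS_PER_SLIDE = 5
SECONDS_PER_WEEK = 7 * 24 * 3600

total_slides = SLIDES_PER_YEAR * YEARS
total_mb = total_slides * MB_PER_SLIDE      # megabytes over ten years
total_tb = total_mb / 1_000_000             # decimal terabytes
storage_cost = total_tb * USD_PER_TB

quoted_search_s = 5_000_000                 # figure quoted on the slide
literal_search_s = total_slides * SECONDS_PER_SLIDE  # product as written
quoted_weeks = quoted_search_s / SECONDS_PER_WEEK
vq_hours = quoted_search_s / 100 / 3600     # conventional VQ, 100x faster

print(f"storage: {total_mb:,} MB = {total_tb:,.0f} TB, cost ${storage_cost:,.0f}")
print(f"quoted search time: {quoted_weeks:.2f} weeks")
print(f"literal product: {literal_search_s:,} seconds")
print(f"conventional VQ: {vq_hours:.1f} hours per feature search")
```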

23
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

24
On Current Technology
  • Modern computational throughput continues to
    increase, with this capability representing an
    opportunity for perhaps 1-2 log performance
    increase in the next decade
  • With a one-log increase, we are still left with a
    five-log gap that needs to be made up by improved
    algorithmic performance.

25
Recent Developments
  • A number of promising algorithms being developed
  • Support Vector Machines (SVM)
  • Principal Component Analysis
  • High-dimensional reduction approaches
  • Spatially-invariant VQ (SiVQ)

26
VQ Revisited and SiVQ
  • Q: What is conventional VQ's greatest weakness?
  • A: Too many required vectors to represent a
    single atomic morphologic feature
  • (promiscuity of vector set growth with continued
    training)

27
Conventional VQ Vector Growth during training
28
A Matter of Degrees of Freedom
How many ways can this be sampled?
29
How Many Ways Can A Candidate Feature Be Matched
During Training?
Y Translational Freedom
X Translational Freedom
Rotational Freedom
30
In VQ it may be the same feature, but there are
excessively many ways to sample it
  • Typical Feature Vector
  • 25 x 25 pixels (x by y) or larger
  • = 625 translational degrees of freedom
  • Effective radius of 12.5 pixels
  • After Nyquist rotational sampling (2x spatial
    frequency)
  • 2 x 12.5 x π ≈ 79 separate rotations
  • 3 color planes
  • 2 mirror symmetries
  • At least 20 possible semi-discrete length-scale
    Nyquist samples
  • All together, there are at least 625 x 79 x 3 x 2
    x 20 = 5,925,000 possible ways to represent one
    possible vector (assuming twenty fixed
    magnifications in use)
  • This explains the non-asymptotic (unbounded)
    vector growth observed for some histology
    patterns.
  • Multispectral data (e.g. 28 vs. 3 bands) will
    further multiply the diagnostic power of SiVQ
    vectors (55,300,000 degrees of freedom / vector)
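
The degrees-of-freedom count on this slide can be reproduced as follows. The variable names are hypothetical. The rotation count of 79 is read here as the circumference at the effective radius, 2π x 12.5 ≈ 78.5 pixels, rounded up; that reading is an assumption made to match the slide's totals.

```python
import math

side = 25                                      # feature vector is 25 x 25 pixels
translations = side * side                     # 625 x/y offsets
radius = side / 2                              # effective radius of 12.5 pixels
rotations = math.ceil(2 * math.pi * radius)    # circumference in pixels, rounded up
color_planes = 3
mirror_symmetries = 2
length_scales = 20                             # assumed fixed magnifications

rgb_total = (translations * rotations * color_planes
             * mirror_symmetries * length_scales)
multispectral_total = (translations * rotations * 28
                       * mirror_symmetries * length_scales)

print(f"{rotations} rotations")
print(f"{rgb_total:,} ways to sample one RGB feature")
print(f"{multispectral_total:,} ways with 28 spectral bands")
```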

31
Consequences of SiVQ
  • Use one spatially-invariant vector to do the work
    of 5,925,000 spatially-constrained vectors
  • 5,925,000x faster
  • 5,925,000-fold fewer vectors to store per feature
    archetype
  • 6 log increase in algorithmic performance (we
    only needed 4 log, so we have CPU to burn)
  • Implies an operational solution to the real-time
    requirement for large datasets
  • CBIR is essentially reduced to practice for a
    sizable contingent of textural-based whole slide
    image-retrieval use-cases
  • Emergent property: SiVQ works equally well on all
    structurally-repetitive data sets (e.g. remote
    sensing, Google-like image searches of the Web)
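
The slides do not spell out how SiVQ achieves its invariance. Purely as an illustration of how one vector can stand in for many rotated copies, the Python sketch below samples a ring of pixels around a candidate point and takes the best match over all circular shifts of that ring. The ring-sampling scheme, the function names `sample_ring` and `ring_distance`, and all parameters are assumptions of this sketch and should not be read as the SiVQ algorithm itself.

```python
import numpy as np


def sample_ring(image, cy, cx, radius, n_points):
    """Sample n_points pixels on a circle around (cy, cx), nearest-neighbour."""
    angles = 2 * np.pi * np.arange(n_points) / n_points
    ys = np.rint(cy + radius * np.sin(angles)).astype(int)
    xs = np.rint(cx + radius * np.cos(angles)).astype(int)
    return image[ys, xs].astype(float)


def ring_distance(ring_a, ring_b):
    """Smallest sum of absolute differences over all circular shifts of ring_b."""
    return min(
        float(np.abs(ring_a - np.roll(ring_b, shift)).sum())
        for shift in range(len(ring_b))
    )


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(33, 33))   # odd size so (16, 16) is the centre

ring = sample_ring(image, 16, 16, radius=8, n_points=16)
rotated_ring = sample_ring(np.rot90(image), 16, 16, radius=8, n_points=16)

# A 90-degree rotation of the image only shifts the ring, so the best-shift
# distance between the two rings is zero.
print(ring_distance(ring, rotated_ring))
```

A single stored ring therefore matches the feature at any of the sampled orientations, at the cost of one comparison per shift at query time.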

32
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

33
Interactive Demonstration
34
Topics
  • Thesis statement
  • Definitions
  • A quick history of content-based image retrieval
    (CBIR)
  • Prior work
  • The challenge that is Pathology CBIR
  • Current technology and recent developments
  • Demonstrations
  • Opportunities
  • upcoming Web-enabled tool suites
  • Intended use-cases

35
Opportunities and Future Work
  • CBIR development will continue
  • Many groups already demonstrating feasibility of
    real-time query capability
  • Activity at Rutgers, U. of Pittsburgh and
    Caltech
  • For the UofM Group
  • Rapid dissemination of the algorithm and
    libraries via peer-reviewed publications and/or
    e-pubs
  • Extension of the discovery tool suite to support
    multiple-vector classification, similar to the
    approaches taken for prior VQ systems, with rapid
    follow-on publications
  • Ground-Truth Engine for integrative
    multimodality studies
  • Activation of an open-architecture website that
    will provide a downloadable tool suite and a
    Web-Based, real-time decision support environment
    for submitted images, operating in two general
    use-cases
  • Surface classification with rare event detection
    (anything not classified as normal)
  • Differential diagnosis generation with return of
    matching images and associated metadata
  • Generation of a classification library of
    extensive normal SiVQ vectors for each organ
    system
  • Actively pursue collaboration to form a core team
    to adjudicate needed normal and abnormal vector
    classes

36
Closing Remarks
  • CBIR is not vaporware or an elusive computational
    goal
  • Contemporary computation speed is, actually,
    quite adequate for many CBIR tasks
  • Much work remains to realize its full potential
  • SiVQ will likely be one of a plurality of
    compelling solutions in the Image Query /
    Decision-support armamentarium

37
Acknowledgements
  • Jerome Cheng, U. of Michigan
  • Anastasios Markas, Insilica Corporation
  • Mehmet Toner and Ronald Tompkins, Harvard Medical
    School
  • Mike Feldman, U. of Pennsylvania