Whole Slide Imagery as an Enabling Technology for Content-Based Image Retrieval: A review of current capabilities, opportunities and challenges - PowerPoint PPT Presentation


Slides: 38

Provided by: Ulys6

Learn more at: https://digitalpathologyassociation.org


Transcript and Presenter's Notes


1
Whole Slide Imagery as an Enabling Technology for
Content-Based Image Retrieval: A review of
current capabilities, opportunities and challenges

Ulysses J. Balis, M.D.
Director, Division of Pathology Informatics
Department of Pathology
University of Michigan Health System
ulysses_at_umich.edu

2
Disclosures

Aperio
Technical Advisory Board and Shareholder
Living Microsystems/Artemis Health, Inc.
Founder and Shareholder
Cellpoint Diagnostics
Founder and Shareholder

These are listed for completeness only; this
presentation does not contain proprietary or
commercial content from any of the above entities.
3
Overview of Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

4
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

5
Thesis Statement

The availability of digital whole slide data sets
represents an enormous opportunity to carry out
new forms of numerical and data-driven query, in
modes not based on textual, ontological or
lexical matching.
Search image repositories with whole images or
image regions of interest
Carry out search in real time via use of scalable
computational architectures

Extraction from image repositories based
upon spatial information
Analysis of data in the digital domain
6
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

7
Definitions

Content-Based Image Retrieval (CBIR)
Within the context of an image-based repository,
searching for matching predicates with
image-based operators in lieu of text matching
Reverse Metadata Lookup (RML)
Using the cohort of returned images from a CBIR
query to generate a list of associated metadata
concept terms
Anatomic frame of reference
Prior diagnoses
Differential Diagnosis
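Reverse metadata lookup, as defined above, can be sketched as a frequency tally over the metadata attached to the images that a CBIR query returns. The record layout, the field names `site` and `diagnosis`, and the function name are hypothetical illustrations. They are not taken from any particular system.

```python
from collections import Counter


def reverse_metadata_lookup(hits, fields=("site", "diagnosis"), top_n=3):
    """Tally metadata concept terms across the images returned by a CBIR query.

    hits: list of dicts, one per returned image (hypothetical record layout).
    fields: metadata keys to summarize.
    Returns {field: [(term, count), ...]} with the most frequent terms first.
    """
    summary = {}
    for field in fields:
        counts = Counter(hit[field] for hit in hits if field in hit)
        summary[field] = counts.most_common(top_n)
    return summary


if __name__ == "__main__":
    hits = [
        {"site": "colon", "diagnosis": "adenocarcinoma"},
        {"site": "colon", "diagnosis": "tubular adenoma"},
        {"site": "rectum", "diagnosis": "adenocarcinoma"},
        {"site": "colon", "diagnosis": "adenocarcinoma"},
    ]
    summary = reverse_metadata_lookup(hits)
    print(summary["site"])       # anatomic frame of reference
    print(summary["diagnosis"])  # candidate terms for a differential diagnosis
```

A real system would probably also weight each hit by its similarity score. Plain counts keep the idea visible.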

8
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

9
A Quick History of CBIR

1970s Corona Satellite Remote Sensing
Initiative
Film-based
Resultant analog content, when digitized,
represented gigabytes of data (consider the
computational burden for 1972)
Several numerical approaches devised to quickly
crunch data
Many approaches based on conventional image
analysis: one or more specific algorithms
developed for each feature to be extracted /
identified
Technically challenging
Time consuming
Computationally expensive
The term CBIR was first coined in 1992 by T. Kato to
describe automatic retrieval of images from a
database.
One promising approach also explored was vector
quantization (VQ)
Routine use required a many-log (many orders of
magnitude) increase in computational throughput

10
(No Transcript)
11
CBIR Operational Modes

Query by Example
Find pictures that contain this snippet / ROI
Semantic Retrieval
Find pictures like adenocarcinoma
Like this adenocarcinoma
Multimodal Retrieval
Search for matches based on imagery data combined
with other search metrics
High-throughput omics data, etc.
Patient clinical outcomes and therapeutic
response data
Other imaging modalities

12
CBIR Techniques (conventional)

Color Operators
Texture operators
Shape
Spectral information
Frequency and phase domain information

There are at least several thousand major classes
of conventional image analysis operations, with
most exhibiting the common trait of requiring
some degree of application tuning for the
intended use-case. Hence, this class of
approaches should not be generally viewed as
turnkey solutions.
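Here is one conventional color operator used for query by example, as a minimal sketch. It builds a joint RGB histogram and compares histograms by histogram intersection. The function names and the example pixel data are hypothetical. The `bins` parameter stands in for the application tuning mentioned above, because different bin counts can change the ranking.

```python
def color_histogram(pixels, bins=4):
    """Joint RGB histogram with `bins` levels per channel, normalized to sum to 1.

    pixels: non-empty list of (r, g, b) tuples with 0-255 integer values.
    Assumes `bins` divides 256 evenly (e.g. 2, 4, 8, 16).
    """
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query_pixels, library, bins=4):
    """Order library entries (name -> pixel list) from most to least similar."""
    query = color_histogram(query_pixels, bins)
    scores = {name: histogram_intersection(query, color_histogram(px, bins))
              for name, px in library.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    library = {
        "eosin_rich": [(235, 110, 205)] * 8 + [(60, 40, 140)] * 2,
        "hematoxylin_rich": [(60, 40, 140)] * 10,
    }
    query = [(235, 110, 205)] * 10
    print(rank_by_color(query, library))  # ['eosin_rich', 'hematoxylin_rich']
```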
13
CBIR Techniques (innovative)

Genetic Image Exploration
Originally designed to analyze multispectral
satellite data
Semi-autonomous systems that employ a
decision-tree to search a known repertoire of
conventional image analysis algorithms for the
most sensitive and specific combination of
algorithms that fits the query predicate
(the Los Alamos National Labs system is
representative)
Autonomous operation comes at a price: the need
for significant computational throughput in
training mode (i.e., slow)
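The search for the most sensitive and specific combination of operators can be shown in a much simplified form. The sketch below is not the Los Alamos system. It substitutes an exhaustive search over AND-combinations of binary detector outputs for the decision-tree search, and uses the Youden index (sensitivity + specificity - 1) as an assumed fitness criterion. The detector names and data are hypothetical. The nested loop grows combinatorially with the size of the operator repertoire, which helps explain why training is slow.

```python
from itertools import combinations


def sensitivity_specificity(predictions, labels):
    """Sensitivity and specificity of boolean predictions vs. boolean labels.

    Assumes the labels contain at least one positive and one negative.
    """
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    tn = sum(1 for p, l in zip(predictions, labels) if not p and not l)
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    return tp / positives, tn / negatives


def best_combination(detectors, labels, max_size=2):
    """Exhaustively score AND-combinations of binary detectors.

    detectors: {name: [bool per training example]}, e.g. outputs of
               thresholded color, texture or shape operators.
    Returns (names, sensitivity, specificity) for the combination with
    the highest Youden index.
    """
    best = None
    for size in range(1, max_size + 1):
        for names in combinations(sorted(detectors), size):
            predictions = [all(detectors[n][i] for n in names)
                           for i in range(len(labels))]
            sens, spec = sensitivity_specificity(predictions, labels)
            youden = sens + spec - 1
            if best is None or youden > best[0]:
                best = (youden, names, sens, spec)
    return best[1:]


if __name__ == "__main__":
    labels = [True, True, True, False, False, False]
    detectors = {
        "texture": [True, True, True, False, True, False],
        "color": [True, True, True, True, False, False],
        "shape": [False, True, False, False, False, False],
    }
    print(best_combination(detectors, labels))  # (('color', 'texture'), 1.0, 1.0)
```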

14
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

15
Prior Work

Conventional Image analysis
Conventional Vector Quantization

16
Conventional Image Analysis

At present, confined to specific use-cases
Quantitative IHC
FDA validation is an ongoing challenge
Not reduced to practice as an integral tool of
the pathologist's workstation

17
Conventional Vector Quantization
Original image
Division of image into local domains
Extraction of local-domain composite vectors:
$V_K = \{L^{\mathrm{Order}}_{x_0 y_0}, \ldots, L^{\mathrm{Order}}_{x_n y_m}\}$
Vectorization of each local kernel
Individual assessment of each vector dimension
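Dividing the image into local domains and vectorizing each local kernel, as described above, can be sketched as follows. The sketch assumes a single-channel image stored as a list of rows and non-overlapping square blocks. The block size and function name are illustrative assumptions. Pixels that do not fill a whole block at the right or bottom edge are simply dropped.

```python
def extract_block_vectors(image, block=2):
    """Split a 2-D single-channel image into non-overlapping block x block
    local domains and flatten each into a row-major tuple.

    image: list of equal-length rows of pixel values.
    Returns [(block_row, block_col, vector), ...] in raster order.
    Pixels that do not fill a whole block at the right/bottom edge are dropped.
    """
    height, width = len(image), len(image[0])
    vectors = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            vector = tuple(image[top + dy][left + dx]
                           for dy in range(block) for dx in range(block))
            vectors.append((top // block, left // block, vector))
    return vectors


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    for row, col, vec in extract_block_vectors(image):
        print(row, col, vec)
```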
18
Conventional Vector Quantization
$V_K = \{L^{\mathrm{Order}}_{x_0 y_0}, \ldots, L^{\mathrm{Order}}_{x_n y_m}\}$
Query against library (vocabulary) of established vectors
Previously identified vector: existing serial number reused
Novel vector: assignment of a unique serial number and
inclusion into global vocabulary
Assembly of compressed dataset
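The vocabulary lookup described above can be sketched as a loop over the vectors. Each vector is either matched to an established vector, whose serial number is reused, or added as a novel vector with a new serial number. The squared Euclidean distance, the `tolerance` parameter and the function name are assumptions for illustration. With exact matching (`tolerance=0`), even a one-level difference creates a new entry, which gives a hint of how the vocabulary can keep growing with training.

```python
def quantize(vectors, vocabulary=None, tolerance=0.0):
    """Assign each vector a serial number from a growing vocabulary.

    vectors: equal-length numeric tuples in scan order.
    vocabulary: previously established vectors; list index = serial number.
    tolerance: largest squared Euclidean distance still treated as a match
               (0.0 = exact match only).
    Returns (serial_numbers, vocabulary); the serial-number stream is the
    compressed dataset.
    """
    vocabulary = list(vocabulary) if vocabulary else []
    serials = []
    for vec in vectors:
        best_idx, best_dist = None, None
        for idx, codeword in enumerate(vocabulary):
            dist = sum((a - b) ** 2 for a, b in zip(vec, codeword))
            if best_dist is None or dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is not None and best_dist <= tolerance:
            serials.append(best_idx)            # previously identified vector
        else:
            vocabulary.append(tuple(vec))       # novel vector joins vocabulary
            serials.append(len(vocabulary) - 1)
    return serials, vocabulary


if __name__ == "__main__":
    patches = [(0, 0, 0, 0), (255, 255, 255, 255),
               (0, 0, 0, 1), (255, 255, 255, 255)]
    print(quantize(patches, tolerance=1))    # near-duplicate reuses serial 0
    print(quantize(patches, tolerance=0))    # exact matching grows the vocabulary
```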
19
VQ-Based Image Compression as the Original
Predicate for Carrying Out Image-Based Search
Figure: raw data, compressed data and restored data
The spatially preserved organization of the
encoded data represents a many-fold decrease in
overall search dataset size, thus providing a
significant computational opportunity for
accelerated search. Additionally, the vectors
identified as contributing to a match may be
visually interrogated for confirmation of their
predictive morphologic content.
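Searching the compressed representation can be sketched as follows, assuming each image has already been encoded as a 2-D map of serial numbers (one per block). A query then becomes a set of serial numbers judged to represent the feature. The search touches one entry per block rather than every pixel. Because block positions are kept, the matching blocks can be decoded for visual confirmation. The function names and the index map are hypothetical.

```python
def find_codewords(index_map, query_serials):
    """(row, col) positions in a compressed index map whose serial number
    belongs to the query set."""
    return [(r, c)
            for r, row in enumerate(index_map)
            for c, serial in enumerate(row)
            if serial in query_serials]


def match_fraction(index_map, query_serials):
    """Fraction of blocks in the image that match the query set."""
    total = sum(len(row) for row in index_map)
    return len(find_codewords(index_map, query_serials)) / total


if __name__ == "__main__":
    # Hypothetical compressed image: each entry is the serial number of one block.
    index_map = [[0, 1, 1],
                 [2, 1, 0]]
    print(find_codewords(index_map, {1}))   # [(0, 1), (0, 2), (1, 1)]
    print(match_fraction(index_map, {1}))   # 0.5
```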
20
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

21
The Challenge That Is Pathology CBIR

Start with some conservative initial assumptions
concerning a prototypic image repository, in
terms of search potential
Ability to search 10 years of data
1,000 slides/day ≈ 200,000 slides/year
500 MB of compressed whole slide data/slide
Operational goal of being able to:
Search in real time
Re-index the database every evening, such that
searches carried out the next day are current

22
The Challenge That Is Pathology CBIR

Net storage required for ten years' worth of
data
1 billion megabytes
= 10⁶ gigabytes
= 10³ terabytes
= 10⁰ petabytes = 1 petabyte
Current conservative enterprise storage costs
$2,000/terabyte
The full petabyte would cost $2M
A single genetic-type search across all images,
assuming 5 seconds of computation/slide, would
take
200,000 slides/year x 10 years x 5 seconds = 10 million
seconds
This is 6 log too slow
About 16.5 weeks, or about 3 searches per year
(about 156 years on an original Apple IIe)
So we would need to save our queries for those
really important image searches.
Conventional VQ, which is 100 times faster, is
still not fast enough: about 27.8 hours per feature
search
Yet another 4 log of performance is required
Two ways to address this:
10,000 parallel processors, or
better algorithms
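The storage and search-time figures above come from a few multiplications. The script below recomputes them from the stated assumptions: 200,000 slides/year, 10 years, 500 MB/slide, $2,000/TB, 5 s of genetic-type search per slide, and conventional VQ 100 times faster. The use of decimal units (1 PB = 10^9 MB) and a 52-week year are assumptions of this sketch.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_WEEK = 7 * 24 * SECONDS_PER_HOUR

# Assumptions stated on the slides
slides_per_year = 200_000
years = 10
mb_per_slide = 500
dollars_per_tb = 2_000
seconds_per_slide = 5        # one genetic-type search, per slide
vq_speedup = 100             # conventional VQ relative to genetic-type search

total_slides = slides_per_year * years
total_mb = total_slides * mb_per_slide
total_tb = total_mb / 1e6    # decimal units: 1 TB = 10^6 MB
total_pb = total_mb / 1e9    # 1 PB = 10^9 MB
storage_cost = total_tb * dollars_per_tb

search_seconds = total_slides * seconds_per_slide
search_weeks = search_seconds / SECONDS_PER_WEEK
searches_per_year = 52 / search_weeks
vq_hours = search_seconds / vq_speedup / SECONDS_PER_HOUR

print(f"storage: {total_pb:.0f} PB, cost ${storage_cost:,.0f}")
print(f"genetic search: {search_seconds:,} s = {search_weeks:.1f} weeks "
      f"(~{searches_per_year:.1f} searches/year)")
print(f"conventional VQ search: {vq_hours:.1f} hours")
```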

23
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

24
On Current Technology

Modern computational throughput continues to
increase, with this capability representing an
opportunity for perhaps 1-2 log performance
increase in the next decade
With a one-log increase, we are still left with a
five-log gap that needs to be made up by improved
algorithmic performance.

25
Recent Developments

A number of promising algorithms being developed
Support Vector Machines (SVM)
Principal Component Analysis (PCA)
Dimensionality-reduction approaches for high-dimensional data
Spatially-invariant VQ (SiVQ)

26
VQ Revisited and SiVQ

Q: What is conventional VQ's greatest weakness?
A: Too many vectors are required to represent a
single atomic morphologic feature
(promiscuous growth of the vector set with continued
training)

27
Conventional VQ Vector Growth during training
28
A Matter of Degrees of Freedom
How many ways can this be sampled?
29
How Many Ways Can A Candidate Feature Be Matched
During Training?
Y Translational Freedom
X Translational Freedom
Rotational Freedom
30
In VQ it may be the same feature, but there are
far too many ways to sample it

Typical feature vector
25 x 25 pixels (x by y) or larger
→ 625 translational degrees of freedom
Effective radius of 12.5 pixels
After Nyquist rotational sampling (2x spatial
frequency)
$2\pi \times 12.5 \approx 79$ separate rotations
3 color planes
2 mirror symmetries
At least 20 possible semi-discrete length-scale
Nyquist samples
All together, there are at least
$625 \times 79 \times 3 \times 2 \times 20 = 5{,}925{,}000$
possible ways to represent one feature vector
(assuming twenty fixed magnifications in use)
This explains the non-asymptotic (unbounded)
vector growth observed for some histology
patterns.
Multispectral data (e.g., 28 vs. 3 bands) will
further multiply the diagnostic power of SiVQ
vectors (55,300,000 degrees of freedom/vector)
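The degrees-of-freedom count multiplies out as follows. In this sketch, the rotation count is taken as one step per pixel of arc at the 12.5-pixel effective radius (2π × 12.5 ≈ 79), which reproduces the 79 rotations used in the product. The other factors come from the slide. Swapping the 3 color planes for 28 spectral bands gives the multispectral figure.

```python
import math

side = 25
translations = side * side                  # 625 positions of a 25 x 25 kernel
radius = side / 2                           # 12.5-pixel effective radius
rotations = round(2 * math.pi * radius)     # ~79 one-pixel arc steps
color_planes = 3
mirror_symmetries = 2
length_scales = 20                          # twenty fixed magnifications


def representations(bands=color_planes):
    """Number of distinct ways one feature can be sampled."""
    return translations * rotations * bands * mirror_symmetries * length_scales


print(representations())      # 5925000
print(representations(28))    # 55300000
```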

31
Consequences of SiVQ

Use one spatially-invariant vector to do the work
of 5,925,000 spatially-constrained vectors
5,925,000x faster
5,925,000x fewer vectors to store per feature
archetype
6 log increase in algorithmic performance (we
only needed 4 log, so we have CPU to burn)
Implies an operational solution to the real-time
requirement for large datasets
CBIR is essentially reduced to practice for a
sizable contingent of texture-based whole slide
image-retrieval use-cases
Emergent property: SiVQ works equally well on all
structurally repetitive data sets (e.g., remote
sensing, Google-like image searches of the Web)
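One way to picture spatial invariance is sketched below. A single ring of samples around a center point is compared with the ring around every pixel (translation), under every circular shift (rotation) and reversal (mirror image). This avoids storing each translated, rotated and mirrored copy of the feature. The sketch is an illustrative assumption, not the published SiVQ implementation. The ring geometry, nearest-pixel sampling, squared-distance score and all function names are hypothetical, and length-scale and color-plane invariance are omitted.

```python
import math


def ring_vector(image, cy, cx, radius, samples):
    """Pixel values at `samples` equally spaced angles on a circle of the
    given radius around (cy, cx), using nearest-pixel sampling."""
    ring = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        y = round(cy + radius * math.sin(theta))
        x = round(cx + radius * math.cos(theta))
        ring.append(image[y][x])
    return ring


def rotation_invariant_distance(query, ring):
    """Smallest squared distance between the query ring and any circular
    shift (rotation) of `ring` or of its reversal (mirror image)."""
    n = len(query)
    best = math.inf
    for candidate in (ring, ring[::-1]):
        for shift in range(n):
            d = sum((query[i] - candidate[(i + shift) % n]) ** 2
                    for i in range(n))
            best = min(best, d)
    return best


def scan(image, query, radius, samples):
    """Score every pixel whose ring fits inside the image (translation).

    Returns {(row, col): distance}; 0 means an exact rotated/mirrored match.
    """
    height, width = len(image), len(image[0])
    margin = math.ceil(radius)
    scores = {}
    for cy in range(margin, height - margin):
        for cx in range(margin, width - margin):
            ring = ring_vector(image, cy, cx, radius, samples)
            scores[(cy, cx)] = rotation_invariant_distance(query, ring)
    return scores


if __name__ == "__main__":
    size = 7
    image = [[y * size + x for x in range(size)] for y in range(size)]
    # The same image rotated 90 degrees clockwise.
    rotated = [[image[size - 1 - x][y] for x in range(size)]
               for y in range(size)]
    query = ring_vector(image, 3, 3, radius=2, samples=8)
    scores = scan(rotated, query, radius=2, samples=8)
    best = min(scores, key=scores.get)
    print(best, scores[best])
```

In the usage example, the ring taken from the original image is still found at the center of the rotated copy, because one stored query ring covers every rotation.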

32
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

33
Interactive Demonstration
34
Topics

Thesis statement
Definitions
A quick history of content-based image retrieval
(CBIR)
Prior work
The challenge that is Pathology CBIR
Current technology and recent developments
Demonstrations
Opportunities
upcoming Web-enabled tool suites
Intended use-cases

35
Opportunities and Future Work

CBIR development will continue
Many groups already demonstrating feasibility of
real-time query capability
Activity at Rutgers, U. of Pittsburgh and
Caltech
For the U of M group:
Rapid dissemination of the algorithm and
libraries via peer-reviewed publications and/or
e-pubs
Extension of the discovery tool suite to support
multiple-vector classification, similar to the
approaches taken for prior VQ systems, with rapid
follow-on publications
Ground-Truth Engine for integrative
multimodality studies
Activation of an open-architecture website that
will provide a downloadable tool suite and a
Web-Based, real-time decision support environment
for submitted images, operating in two general
use-cases
Surface classification with rare event detection
(anything not classified as normal)
Differential diagnosis generation with return of
matching images and associated metadata
Generation of a classification library of
extensive normal SiVQ vectors for each organ
system
Actively pursue collaboration to form a core team
to adjudicate needed normal and abnormal vector
classes
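The rare-event detection use case above ("anything not classified as normal") can be sketched as a nearest-neighbor test against a library of normal vectors. A tile is flagged when even its closest normal vector lies beyond a distance threshold. The library contents, the squared Euclidean distance and the threshold value are hypothetical choices for illustration.

```python
def nearest_distance(vector, library):
    """Squared Euclidean distance from `vector` to its closest library entry."""
    return min(sum((a - b) ** 2 for a, b in zip(vector, ref)) for ref in library)


def flag_rare_events(tile_vectors, normal_library, threshold):
    """Indices of tiles whose nearest normal vector lies farther than
    `threshold`, i.e. tiles "not classified as normal"."""
    return [i for i, vec in enumerate(tile_vectors)
            if nearest_distance(vec, normal_library) > threshold]


if __name__ == "__main__":
    normal_library = [(0, 0), (10, 10)]          # hypothetical "normal" vectors
    tiles = [(1, 0), (10, 9), (50, 50), (5, 5)]
    print(flag_rare_events(tiles, normal_library, threshold=4))  # [2, 3]
```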

36
Closing Remarks

CBIR is not vaporware or an elusive computational
goal
Contemporary computation speed is, actually,
quite adequate for many CBIR tasks
Much work remains to realize its full potential
SiVQ will likely be one of a plurality of
compelling solutions in the Image Query /
Decision-support armamentarium

37
Acknowledgements