Crystallization Image Analysis on the World Community Grid


1
Crystallization Image Analysis on the World
Community Grid
  • Christian A. Cumbaa and Igor Jurisica
  • Jurisica Lab, Division of Signaling Biology
  • Ontario Cancer Institute,
  • Toronto, Ontario

2
Why automate classification of protein
crystallization trial images?
  • Hauptman-Woodward has 65,000,000 images.
  • They want 65,000,000 outcomes.

3
Why automate classification of protein
crystallization trial images?
  • Assist or replace human screening
  • Speed the search phase in protein crystallization
  • Improve throughput, consistency, objectivity
  • Enables data mining and statistical optimization
    of the crystallization process

(Figure labels: clear, precipitate, crystal)
4
Image classification
5
Truth data
  • 96-study
      • 96 proteins × 1536 images, hand-scored by 3
        experts
      • Presence/absence of 7 independent outcomes
  • NESG and SGPP
      • 15,000 images
      • Hand-scored by 1 expert, same scoring system
  • 50 unanimously-scored images
  • 10 most interesting compound categories

(Figure labels: 96-study, NESG (crystals), SGPP (crystals))
6
Feature set
  • 12,375 features computed per image
  • A few basic statistics
  • 50 microcrystal features
  • Euler number features, two variations
      • 11 blur levels
      • 11 blur levels × 4 thresholds
  • Image energy
      • 11 blur levels
  • 2,925 Grey-Level Co-occurrence Matrix (GLCM) features
      • 3 different grey-level quantizations
      • 13 basic functions
      • 25 sample distances
      • 100 directions
      • Computable from every point in the image
      • Distilled to max range, max mean, min mean
  • 9,500 image-blob features
      • Radon edge-detection
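
The GLCM feature count can be checked with a little arithmetic. Assuming the three distillations (max range, max mean, min mean) each summarise the 100 directions, 3 quantizations × 13 functions × 25 distances × 3 distillations gives 2,925. The Python sketch below verifies that product and builds one co-occurrence matrix with a single texture function (contrast). The function names, the toy image, and the choice of contrast are illustrative assumptions, not the project's actual code.

```python
def quantize(image, levels, max_value=255):
    """Map grey values in 0..max_value onto 0..levels-1."""
    return [[p * levels // (max_value + 1) for p in row] for row in image]


def glcm(image, levels, dx, dy):
    """Normalised co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < height and 0 <= x2 < width:
                counts[image[y][x]][image[y2][x2]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """One example texture function: sum of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Feature-count check, under the assumption stated in the lead-in.
QUANTIZATIONS, FUNCTIONS, DISTANCES, DISTILLATIONS = 3, 13, 25, 3
glcm_feature_count = QUANTIZATIONS * FUNCTIONS * DISTANCES * DISTILLATIONS

# Toy 4x4 image that already uses 4 grey levels (made-up data).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_contrast = contrast(glcm(toy_image, levels=4, dx=1, dy=0))
```

In a fuller version, `glcm` would be evaluated for every direction at a given sample distance, and the per-direction values of each function would then be reduced to the three summary statistics.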

7
Our image analysis problem
  • Computing all 12,375 features takes >5 hours for
    a single image
  • We have 165,000 images in our training set
  • Features must be evaluated for quality
  • The best features (tens or low hundreds) must be
    computed for the remaining 65,000,000 images
  • Massive computing resources required!
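
To put the numbers above in perspective, here is a small Python sketch of the arithmetic. It treats 5 hours per image as a lower bound (the slide says more than 5) and assumes one CPU working around the clock for a 365-day year. Under those assumptions the training set alone needs roughly 94 CPU-years, and running every feature on the full archive would need about 37,000 CPU-years. That gap is why only the best features are carried forward. The constant and function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: one CPU busy all year


def cpu_years(n_images, hours_per_image):
    """CPU-years needed to process n_images at a fixed cost per image."""
    return n_images * hours_per_image / HOURS_PER_YEAR


# Lower bounds, since each image takes more than 5 hours.
training_cost = cpu_years(165_000, 5)
archive_cost = cpu_years(65_000_000, 5)

print(f"training set: at least {training_cost:.0f} CPU-years")
print(f"full archive, all features: at least {archive_cost:.0f} CPU-years")
```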

8
Image analysis on the World Community Grid
  • http://www.worldcommunitygrid.org
  • A global, distributed-computing platform for
    solving large scientific computing problems with
    human impact
  • 377,627 volunteers contribute the idle CPU time of
    960,346 devices.
  • Our project: Help Conquer Cancer (HCC),
    launched November 2007.
  • HCC has two goals:
      • To survey a wide tract of image-feature space and
        identify image analysis algorithms and parameters
        (features) that best determine crystallization
        outcome.
      • To perform the necessary image analysis on
        Hauptman-Woodward's archive of 65,000,000
        crystallization trial images.

(Note: "Help Conquer Cancer" is a fundraising slogan of the Ontario Cancer
Institute and its parent organization.)
9
Image analysis on the World Community Grid
  • HCC has two phases:
  • Phase I: calculate 12,375 features per image on
    high-priority images, including 165,441
    hand-scored images.
      • November 2007-May 2008
      • Analysis on hand-scored images completed January
        2008
  • Phase II: calculate the best features from Phase
    I on the backlog of HWI images
  • Grid members have contributed 8,919 CPU-years so
    far to HCC, an average of 55 CPU-years per day.

10
(No Transcript)
11
(No Transcript)
12
Phase I feature assessment
13
Measuring feature quality
  • Treat as random variables:
      • Image class
      • Feature value
  • Measure the mutual information between them
    (unit: bits)
  • = entropy(class) + entropy(feature) −
    entropy(class, feature)

(Diagram labels: feature entropy, class entropy)
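
The mutual-information score on this slide can be sketched in a few lines of Python using the standard identity I(class; feature) = H(class) + H(feature) − H(class, feature). Feature values are continuous, so they have to be discretised before entropies can be estimated from counts. The equal-width binning below is an assumption made for illustration, as are the function names; the slides do not say how the project binned its features.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of samples."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def discretize(values, n_bins):
    """Equal-width binning (an illustrative choice, not from the slides)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(classes, feature_bins):
    """I(class; feature) = H(class) + H(feature) - H(class, feature)."""
    joint = list(zip(classes, feature_bins))
    return entropy(classes) + entropy(feature_bins) - entropy(joint)


# Made-up toy data: the feature separates the two classes perfectly.
labels = ["clear", "clear", "crystal", "crystal"]
feature = discretize([0.0, 0.4, 0.6, 1.0], n_bins=2)
score = mutual_information(labels, feature)
```

A feature that says nothing about the class scores 0 bits, and no feature can score more than the class entropy, which gives a natural scale for comparing features.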
14
Measuring feature quality
15
Information density: microcrystal counts parameter space
16
Information density: GLCM maximum range parameter space
(Panels: Clear, Precipitate, Crystal)
17
Information density: Radon-Sobel soft sum parameter space
(Panels: Clear, Precipitate, Crystal)
18
Information density: Radon-Sobel blob metrics (means) parameter space
(Panels: Clear, Precipitate, Crystal)
19
Towards Phase II image classification
20
Building classifiers
  • Handpicked 74 features from peaks in the clear,
    precipitate, and other mutual information plots
  • Two classification schemes:
      • Three-way: clear, non-crystal precipitate, other
      • Ten-way: clear, phase separation, phase separation
        and precipitate, skin, phase separation and crystal,
        precipitate, precipitate and skin, precipitate and
        crystal, crystal, garbage
  • Naïve Bayes model
  • Leave-one-out cross-validation
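
As a minimal sketch of how a naïve Bayes model and leave-one-out cross-validation fit together, the Python below trains on all samples but one, predicts the held-out sample, and repeats for every sample. The slides do not say which naïve Bayes variant was used. The Gaussian likelihood, the variance floor, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import log, pi


def fit(X, y, var_floor=1e-6):
    """Per-class log prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Class with the highest log posterior, features treated as independent."""
    def score(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: score(model[label]))


def leave_one_out_accuracy(X, y):
    """Hold out each sample in turn, train on the rest, and score it."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


# Made-up, well-separated toy data with two features per image.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = ["clear", "clear", "clear", "other", "other", "other"]
accuracy = leave_one_out_accuracy(X, y)
```

Leave-one-out uses almost all of the hand-scored data for training in every round, which matters when some classes are rare.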

21
Measuring classifier accuracy: precision and recall
(Diagram labels: "crystals", "I think these are crystals", recall, precision)
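
In standard terms, recall is the fraction of true crystal images that the classifier labels as crystals, and precision is the fraction of images labelled as crystals that really are crystals. A minimal Python sketch with made-up labels follows; the function name and the data are illustrative assumptions.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Precision and recall for one class of interest."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(t == target and p == target for t, p in pairs)
    predicted = sum(p == target for _, p in pairs)  # "I think these are crystals"
    actual = sum(t == target for t, _ in pairs)     # the real crystals
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Made-up example: 4 real crystals, 3 images predicted as crystal.
truth = ["crystal", "crystal", "crystal", "crystal", "clear", "clear"]
guess = ["crystal", "crystal", "clear", "clear", "crystal", "clear"]
precision, recall = precision_recall(truth, guess, "crystal")
```

Reporting both numbers matters when classes are unbalanced: a classifier can reach high recall on a rare class simply by over-predicting it, at the cost of precision.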
22
Three-class distribution
Clear 24.3%
Precipitate AND NOT crystal 52.7%
Other 23.0%
Confusion matrix
23
Recall and precision
24
10-class distribution
Clear 33.83%
Phase separation 7.00%
Phase separation and precipitate 0.50%
Skin 0.79%
Phase separation and crystal 2.32%
Precipitate 34.25%
Precipitate and skin 4.95%
Precipitate and crystal 7.53%
Crystal 8.34%
Garbage 0.55%
25
Confusion matrix
26
Recall and precision
27
Acknowledgements
  • Hauptman-Woodward Medical Research Institute
      • George DeTitta, Joe Luft, Eddie Snell, Mike
        Malkowski, Angela Lauricella, Max Thayer, Raymond
        Nagel, Steve Potter, and the 96-study reviewers.
  • World Community Grid
      • Bill Bovermann, Viktors Berstis, Jonathan D.
        Armstrong, Tedi Hahn, Kevin Reed, Keith J.
        Uplinger, Nels Wadycki
  • IBM Deep Computing
      • Jerry Heyman
  • Jurisica Lab
      • Richard Lu
  • All crystallization images were generated at the
    High-Throughput Screening lab at The
    Hauptman-Woodward Institute.
  • Funding from
      • NIH U54 GM074899
      • Genome Canada
      • IBM
      • NSERC
  • (and earlier work from)
      • NIH P50 GM62413
      • NSERC
      • CITO