Title: Condor in CryoEM image processing
Weimin Wu, Wen Jiang
Department of Biological Sciences, Purdue University
04/30/2008
Cryo-EM: low-temperature electron microscopy
Image processing: obtain the 3D reconstruction from 2D images
Introduction
Viral
infections have been and remain one of the major
threats to human health. Viruses are large
assemblies of proteins and nucleic acids that
rely on infection of hosts to complete their life
cycle and sustain their propagation. High-resolution
3-D structures of virus particles will provide
important insights into these processes and into
the development of effective prevention and
treatment strategies.
Recently we have demonstrated, in collaboration
with researchers at Baylor College of Medicine
and MIT, the 3-D reconstruction of the infectious
bacterial virus Epsilon15 (e15) at 4.5 Å
resolution, which allowed tracing of the
polypeptide backbone of its major capsid protein
gp7 (Jiang et al., Nature 451(7182):1130-4,
2008).
For many of the tailed dsDNA viruses, for example
the bacterial viruses T7, T3 and e15, one of the
12 icosahedral 5-fold vertices is occupied by a
unique 12-fold portal protein complex. This
unique portal vertex is responsible for packaging
the dsDNA genome into the protein shell during
assembly and for ejecting the dsDNA genome out of
the virus and into the host cell during
infection. However, high-resolution structures of
these virus particles, especially of the
non-icosahedrally organized components such as
the portal complex, the tail and the encapsulated
dsDNA genome, are lacking. I am working on this
kind of project without enforcing any symmetry on
the virus. We have now reached a sub-nanometre
resolution result, which enables us to visualize
the secondary structure of the portal, tail hub
and tail spikes.
(A) Schematic diagram of the T7/T3 phage particle
assembly and dsDNA genome packaging pathway.
Adapted from (Serwer, 2004). (B) A cryo-EM
micrograph of T3 phage showing the particles
representing each of the major stages during
assembly and genome packaging.
Image processing is a critical step in generating
the 3D structure of a macromolecule from the 2D
images taken with the cryo-EM technique. This
step includes 2D alignment and 3D reconstruction,
both of which need intensive computing power. The
high-performance computing (HPC) resources
supported by RCAC enable us to work on huge
datasets to reach high-resolution results and
therefore learn more details of the biological
system.
Scientific needs
Two major steps are involved in cryo-EM image
processing. The first is the 2D alignment step,
which finds the orientation and center of each
sample particle by matching its image (a 2D
projection of the particle) against the
reference. The second is the 3D reconstruction
step, which generates the 3D map by collecting
the orientation and center information of all
the particles and averaging them.
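The 2D alignment step described above can be illustrated with a minimal sketch. It is not the software used in this work: the function names, the normalized cross-correlation score and the brute-force cyclic shift search are all assumptions made for illustration. Each reference projection is assumed to stand for one known orientation, so the index of the best-matching projection gives the orientation and the best shift gives the center.

```python
import math

def normalized_correlation(a, b):
    """Normalized cross-correlation of two equally sized 2D images (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def shift(image, dy, dx):
    """Cyclically shift an image by dy rows and dx columns."""
    h, w = len(image), len(image[0])
    return [[image[(r - dy) % h][(c - dx) % w] for c in range(w)]
            for r in range(h)]

def align(image, projections, max_shift=1):
    """Return (score, projection_index, (dy, dx)) for the best match.

    Hypothetical helper: tries every reference projection and every
    integer shift up to max_shift, keeping the highest correlation.
    """
    best = (-2.0, None, (0, 0))
    for idx, proj in enumerate(projections):
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                score = normalized_correlation(shift(image, dy, dx), proj)
                if score > best[0]:
                    best = (score, idx, (dy, dx))
    return best

# Toy data: two reference "projections" and one off-center raw image.
horizontal_bar = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
vertical_bar = [[0, 1, 0, 0] for _ in range(4)]
raw_image = [[0, 0, 1, 0] for _ in range(4)]  # vertical bar, one column off

score, index, (dy, dx) = align(raw_image, [horizontal_bar, vertical_bar])
```

In this toy case the raw image matches the second projection after a shift of one column. Every image is scored against every projection and every shift, which is why this step dominates the computing cost and why it splits naturally into many independent Condor jobs.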
1 raw image vs. 1 projection: 1 second
22K CPU hours
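As a rough check on the scale of these figures, the arithmetic can be sketched as follows. It assumes (this is not stated on the slide) that the CPU time is dominated by image-versus-projection comparisons at about 1 second each. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600

def comparisons_for_budget(cpu_hours, seconds_per_comparison=1.0):
    """Number of comparisons that fit in a CPU-hour budget (assumed model)."""
    return cpu_hours * SECONDS_PER_HOUR / seconds_per_comparison

def cpu_hours_needed(n_images, n_projections, seconds_per_comparison=1.0):
    """CPU hours to compare every image against every projection once."""
    return n_images * n_projections * seconds_per_comparison / SECONDS_PER_HOUR

# Under the 1-second assumption, 22K CPU hours is about 7.9e7 comparisons.
budget_comparisons = comparisons_for_budget(22_000)
```

Under this assumption the cost grows with the product of the number of raw images and the number of reference projections, and it is paid again in every iteration.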
GroEL is used as an example to show the 3D
reconstruction and the many iterations needed to
reach high resolution. For our e15 project, even
though we started with an intermediate-resolution
map (7 Å), more than 10 further iterations were
needed to achieve 4.5 Å.
Features as a function of resolution, showing how
to evaluate the resolution qualitatively from the
density map.
Condor Performance
We feel lucky at Purdue to have so many resources
supported by RCAC; without them our research would
take forever. Here I list the Condor jobs we
submitted and the CPU hours we used.
Each job took about half an hour.
Each job took about one hour, due to a different
algorithm and other reasons.
Running jobs versus time. This is a long run,
about 64 hours. There are clearly three major
peaks, which correspond to the overnight periods.
In the daytime, the number of running jobs drops
sharply because the machine owners are using
them. The three peaks get successively smaller,
which means our user priority is getting lower.
Now that it is the summer holiday, I can get more
than 3,000 nodes for my Condor jobs.
We tried to use all the platforms to run our
Condor jobs. How does the performance of the
different platforms compare?
The LINUX 64-bit machines are not as fast as we
expected. Why?
We checked the remote hosts that the Condor jobs
were submitted to in this test: 90 of the LINUX
64-bit machines were from ccl00.cse.nd.edu.
Condor jobs can go to nodes off campus, and the
performance was only slightly worse. This made us
more confident about seriously considering the
TeraGrid; we have tried the TeraGrid, but so far
we have still used on-campus resources.
However, it remains a problem when the files to be
transferred are large, for example more than
700 MB.
High-quality alpha-helix, beta-sheet and
side-chain features, which enabled us to do the
modeling and get the backbone structure.
With icosahedral symmetry
Our problems/concerns about Condor
- Operation: the best thing for us is to submit the
Condor jobs from our desktop and let Condor
itself find the resources, but at present we need
to specify where the jobs go when using TeraGrid.
- File transfer: when large files are transferred,
the network becomes the bottleneck, which can
easily overload the head node and crash it,
especially when the files go off campus. This is
due to the large number of reads from the only
copy of a large dataset. However, this might be
circumvented by adding a P2P client to Condor. In
our 2D alignment step, each image is compared to
all the reference projections, and those
projections may already have been sent to
neighboring computers to run another Condor job.
For the current job, the file could therefore be
transferred from a neighboring node. The number
of reads from the original copy would then drop
sharply, in theory to just a few, and the file
transfer speed would also increase dramatically.
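The effect of the proposed P2P transfer on the head node can be sketched with a toy model. This is not an existing Condor feature or API. The model is an assumption: a job reads the reference projections from the original copy only if no earlier job at the same site (a hypothetical grouping of neighboring nodes) has already received them.

```python
def count_origin_reads(job_sites, use_peers):
    """Count reads from the original copy of the dataset.

    job_sites: site name for each job, in submission order (hypothetical).
    use_peers: if True, a job copies from a neighbor at the same site
               whenever that site already holds the file.
    """
    sites_with_copy = set()
    origin_reads = 0
    for site in job_sites:
        if use_peers and site in sites_with_copy:
            continue  # served by a neighboring node, not the head node
        origin_reads += 1
        sites_with_copy.add(site)
    return origin_reads

# Toy example: 6 jobs on campus and 4 jobs at one off-campus site.
jobs = ["campus"] * 6 + ["off-campus"] * 4
reads_without_p2p = count_origin_reads(jobs, use_peers=False)
reads_with_p2p = count_origin_reads(jobs, use_peers=True)
```

In this model the reads from the original copy fall from one per job to one per site, which is the "just a few times" behaviour described above. A real scheme would also have to handle peers that leave when their owners reclaim the machines.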
Acknowledgment
Preston Smith, David Braun, Steve Wilson, Pia
Mikeal, Bruce L. Fuller
Reference
- Jiang et al., Nature, Vol. 439, 2 February 2006, doi:10.1038/nature04487
- Jiang et al., Nature, Vol. 451, 28 February 2008, doi:10.1038/nature06665