Title: Bioinformatics at NASA or Yes Virginia, NASA does do biology
1 Bioinformatics at NASA or Yes Virginia, NASA does do biology!
Maryland
- Michael New
- Astrobiology Discipline Scientist
2 Bioinformatics at NASA?
- Bioinformatics is used at NASA in several ways
- Fundamental Space Biology: How do organisms, including humans, adapt to the space environment?
- Planetary Protection: What is the nature of the community of micro-organisms living in spacecraft assembly areas and on spacecraft?
- Astrobiology: What can the genomes of life on Earth tell us about the origin, evolution, distribution and future of life on Earth and the potential for life elsewhere?
3 Fundamental Space Biology
- How are molecular signals, pathways, and products in humans and model organisms (e.g., mice) altered by exposure to microgravity and space radiation factors?
- How is drug metabolism affected by space-related effects?
- Are there critical stages in development that are affected by altered gravity?
- Why does the virulence of pathogens appear to increase in space?
4 Small Sats On-board Expression Measurements
Bioinformatics
30 cm x 10 cm x 10 cm
How to make inferences?
5 Making good inferences is the key
Experimental Data
Analysis Algorithm
New Knowledge
Andrew Pohorille, Jeff Shrager and Steve
Racunas NASA Center for Astrobioinformatics,
Karl Schweighofer
6 An example: the Jnk Pathway
External Stimuli: TGF-β, TPA
Kinases: Jnk1, Jnk2
Transcription Factors: c-Jun, JunD, . . .
mRNA Transcripts: IL-11, p53, nur77, p19
7 Expression studies are inconclusive
p value: Probability that posterior of H, p(D|H),
is just spurious (i.e., same posterior likely
with random D when H)
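One common way to put a number on "the same score is likely with random data" is a permutation test. The sketch below is illustrative only and is not the method used in the work described on these slides. It assumes a hypothetical two-group expression comparison, uses the absolute difference of group means as the score, and shuffles the group labels to see how often random assignments score at least as well.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Estimate how often shuffled labels give a mean difference at least
    as large as the observed one (two-sided permutation test)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression levels for one transcript in two conditions.
control = [1.0, 1.2, 0.9, 1.1, 1.0]
treated = [10.1, 10.3, 9.9, 10.2, 10.0]
print(permutation_p_value(control, treated, n_permutations=2000))
```

A small value says only that random relabelling rarely reproduces the observed difference; it does not by itself say which pathway model explains the data.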
8 Background knowledge makes a difference!
9 Need a system for evaluating biological models
10 Planetary Protection
- What organisms are present in and on spacecraft?
- How can we assess the bioburden of spacecraft?
- How can we ensure that no Terran life hitchhikes to a clement spot on another planet?
- How can we assess the safety of returned samples?
11 Assessing crud
- What is the diversity of low-biomass samples taken from a spacecraft assembly clean room?
- Comparing two new techniques: Affymetrix's PhyloChip and 454 sequencing.
12 Third Generation PhyloChip
- Additional advancements
- Smaller feature size -> no increase in chip cost.
- Smaller sample volumes decreased cost in reagents.
- Improved analysis
- More sophisticated fragmentation method
- Refined analysis software
- Improved validation approach.
Relatively inexpensive and suitable for repeated assays. Less robust quantitation.
13 454 Sequencing: The Sogin Survey Method
In a single run, 454 technology can generate up to 200,000 independent sequence reads of 100 bases each.
Comprehensively samples short variable rRNA regions.
First report on deep sea diversity estimates 10-100 times more species than previously suspected (Sogin et al., PNAS 2006).
A few species are common; the vast majority are rare.
This method is easily adapted to spacecraft bioburden inventory.
Gives some estimate of quantity as well as phylogeny.
454 Inc
Method is expensive and requires large amounts of DNA. More suitable for infrequent assays of pooled samples.
14 Family-level Comparisons
- Overall, both methods showed high agreement of detection at the family level, but only when data from all temperature gradients were compiled.
Families detected by both methods: 65
454 V6 Pyrosequencing: Families Detected 87; Detected exclusively by 454 pyrosequencing 22
G2 PhyloChip: Families Detected 96; Detected exclusively on PhyloChip 31
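The Venn counts on this slide can be cross-checked with simple arithmetic. The sketch below assumes that 22 families were exclusive to 454 pyrosequencing and 31 were exclusive to the PhyloChip, which is the only reading under which both totals (87 and 96) imply the same overlap. The function name and the returned keys are illustrative.

```python
def venn_counts(total_a, only_a, total_b, only_b):
    """Derive the shared count and the union from two totals and two
    exclusive counts, checking that both sides imply the same overlap."""
    shared_from_a = total_a - only_a
    shared_from_b = total_b - only_b
    if shared_from_a != shared_from_b:
        raise ValueError("totals and exclusive counts are inconsistent")
    return {
        "shared": shared_from_a,
        "only_a": only_a,
        "only_b": only_b,
        "union": shared_from_a + only_a + only_b,
    }


# a = 454 V6 pyrosequencing, b = G2 PhyloChip (counts from the slide)
counts = venn_counts(total_a=87, only_a=22, total_b=96, only_b=31)
print(counts)
```

Under that reading, 65 of the 118 families seen by either method were seen by both.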
15 Astrobiology: Life in a Universal Context
- How does life begin and evolve?
- What do the rock record and genomes tell us?
- Does life exist elsewhere in the Universe?
- Life as we know it?
- Weird life?
- How can either be detected?
- What is the future for life on Earth and beyond?
16 Three case studies
- Development of a new tool to assess horizontal gene transfer (HGT).
- Peter Gogarten and Olga Zhaxybayeva
- Use of standard tools to look for independent leaps to land.
- Zoe Cardon, Louise Lewis, and Harry Frank
- Resurrecting ancient proteins.
- Steve Benner, et al.
17 How can we assess the degree of HGT present on the early Earth?
- A quartet is the smallest unit of phylogenetic information
- Each quartet can have three unrooted tree topologies
- Support for different quartet topologies can be summarized for all gene families
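The three unrooted topologies of a quartet are just the three ways of splitting four taxa into two pairs. The following minimal sketch enumerates them and tallies which topology each gene family favors. The taxon labels, family names, and the `best_per_family` mapping are hypothetical, and the sketch does not reproduce the authors' quartet decomposition software.

```python
from collections import Counter


def quartet_topologies(taxa):
    """Return the three unrooted topologies for four taxa, each written
    as a pair of pairs, e.g. (('A', 'B'), ('C', 'D')) for AB|CD."""
    if len(taxa) != 4 or len(set(taxa)) != 4:
        raise ValueError("a quartet needs exactly four distinct taxa")
    a, b, c, d = sorted(taxa)
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def summarize_support(best_per_family):
    """Count how many gene families favor each topology of one quartet."""
    return Counter(best_per_family.values())


topologies = quartet_topologies(["A", "B", "C", "D"])
# Hypothetical result: the best-supported topology for each gene family.
best_per_family = {
    "family_1": topologies[0],
    "family_2": topologies[0],
    "family_3": topologies[2],
}
print(summarize_support(best_per_family).most_common())
```

The most common topology plays the role of the plurality signal, and the remaining counts are the conflicting signal.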
18 Why use embedded quartets?
- No assumption that all genes in a genome have the same phylogenetic history.
- The total number of quartets is much smaller than the number of tree topologies, which makes it possible to evaluate all quartets.
- Gene families present in only a few of the analyzed genomes can be included in the analyses
- Phylogenetic signal can be divided into the plurality consensus and the conflicting signal.
- Allows us to partition analyzed genomes according to some scenario (e.g., grouping by ecology) and retrieve gene families that support or conflict with it.
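The size argument can be made concrete with standard counting formulas: n taxa contain C(n, 4) quartets, while the number of unrooted, strictly bifurcating trees on n labelled taxa is (2n-5)!!. The sketch below evaluates both for 11 taxa, the number of genomes in the cyanobacterial example; the function names are illustrative.

```python
from math import comb


def n_quartets(n_taxa):
    """Number of distinct four-taxon subsets (embedded quartets)."""
    return comb(n_taxa, 4)


def n_unrooted_binary_trees(n_taxa):
    """Number of unrooted bifurcating trees on n labelled taxa: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= k
    return count


print(n_quartets(11))               # 330
print(n_unrooted_binary_trees(11))  # 34459425
```

Even with three topologies per quartet, 990 evaluations per gene family are tractable, whereas scoring every full tree is not.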
19 Example: Cyanobacteria and their Genes
- Analyzed gene families in 11 sequenced cyanobacterial genomes using the developed quartet decomposition method
- Cyanobacterial genomes reveal a complex evolutionary history, which cannot be represented by a single strictly bifurcating tree for all genes or even most genes.
- Across short phylogenetic distances, all types of genes appear to be equally affected by transfer. Across large phylogenetic distances, genes encoding metabolic functions are more frequently transferred, and genes in transcription and translation are less frequently transferred
Olga Zhaxybayeva, J. Peter Gogarten, Robert L. Charlebois, W. Ford Doolittle and R. Thane Papke, "Phylogenetic Analyses Of Cyanobacterial Genomes: Quantification Of Horizontal Gene Transfer Events", Genome Research, 2006, 16:1099-1108.
20 What traits were needed for leap to land?
Numerous independent habitat transitions provide statistical power for detecting traits correlated with successful leaps from water to land.
[Figure: tree of the green plants, with "?" marks next to the algal classes. Labels: 5 major green algal classes (sensu Mattox and Stewart, 1984; a recent revision divides Charophyceae into 6 classes): Prasinophyceae, Chlorophyceae, Trebouxiophyceae, Ulvophyceae, Charophyceae; terrestrial green plants.]
The famous leap to land: Embryophytes, N = 1
N = ? leaps of eukaryotic green algae from aquatic or marine habitats to land
21 Bioinformatics used to
- Infer evolutionary relationships among known aquatic and recently isolated desert algae using data from nucleotide sequences (large data sets, multiple genes) to estimate diversity and describe new species.
- Estimate the number of transitions from aquatic to terrestrial habitats (Bayesian methods). To date, we estimate at least 40 evolutionarily independent transitions!
- Test the correlation of source habitat type with traits that occur in our desert and related aquatic algae, using comparative statistical methods that take into account evolutionary relationships among taxa.
Lewis and Lewis 2005, Systematic Biology, 54:936-947
Gray et al. 2007, Plant Cell and Environment, 30:1240-1255
Cardon et al. 2008, BioScience, 58:114-122
Lewis, unpublished
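To illustrate what "counting habitat transitions on a tree" means, here is a minimal sketch using Fitch parsimony. This is a deliberately simpler stand-in for the Bayesian methods named on the slide, and it gives only a minimum number of changes without their direction. The toy tree, the taxon names, and the habitat assignments are all hypothetical.

```python
def fitch_min_changes(tree, leaf_states):
    """Minimum number of state changes on a rooted binary tree.

    `tree` is a nested tuple of leaf names; `leaf_states` maps each
    leaf name to its observed state (here, a habitat)."""
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its observed state
            return {leaf_states[node]}
        left, right = node
        left_set, right_set = state_set(left), state_set(right)
        common = left_set & right_set
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: one change
        return left_set | right_set

    state_set(tree)
    return changes


# Hypothetical toy phylogeny: two desert taxa nested in separate aquatic clades.
tree = ((("aquatic_1", "desert_1"), ("aquatic_2", "desert_2")),
        ("aquatic_3", "aquatic_4"))
habitat = {name: name.split("_")[0] for name in
           ["aquatic_1", "desert_1", "aquatic_2", "desert_2",
            "aquatic_3", "aquatic_4"]}
print(fitch_min_changes(tree, habitat))
```

On this toy tree the two desert taxa require two separate changes, which is the sense in which transitions are counted as evolutionarily independent.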
22 Moving from single cells to multicellular animals
- This seems hard to do from the perspective of molecular biology
- Change the goal of life from replicating cells as fast as possible (what bacteria do) to replicating cells under control, and then not at all (what you do)
- The fossil record makes the transition seem sudden (but the fossil record may be missing many things)
- We are not certain that the transition is not driven by planetary change, such as the emergence of abundant oxygen in the atmosphere
Understanding how this transition took place on Earth helps NASA infer how likely it is to have taken place elsewhere, a key part of the Drake equation to estimate the likelihood of intelligent life elsewhere in the cosmos.
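For reference, the Drake equation in its usual form is a product of seven factors, N = R* x fp x ne x fl x fi x fc x L. The sketch below simply evaluates that product. The parameter names follow the conventional symbols, and the input values in the example are placeholders chosen to show the arithmetic, not estimates from this talk or from any study.

```python
def drake_equation(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Expected number of detectable civilizations in the galaxy.

    r_star   : rate of star formation (stars per year)
    f_p      : fraction of stars with planets
    n_e      : habitable planets per planetary system
    f_l      : fraction of those on which life arises
    f_i      : fraction of those on which intelligence arises
    f_c      : fraction of those that become detectable
    lifetime : years a civilization remains detectable
    """
    for name, value in (("f_p", f_p), ("f_l", f_l), ("f_i", f_i), ("f_c", f_c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


# Placeholder inputs only: halving f_i halves the result.
print(drake_equation(1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 10.0))
```

Because the factors multiply, any uncertainty in how readily complex multicellular life arises carries straight through to the final estimate.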
23 Since fossils are no help, turn to genomes
- Exhaustive matching supported models for protein sequence evolution
- New tools to score amino acid replacements
- Tools to extend the model that scores replacements
- Tools to exploit homoplasy, compensatory covariation, and other non-Markovian behaviors in the evolution of real proteins diverging under functional constraints
- Gonnet, G. H., Cohen, M. A., Benner, S. A. (1992) Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445
Sequencing of a choanoflagellate provides an outgroup, a lineage diverging just before multicellularity emerges
King, N. et al. (2008) The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788
Multicellularity emerges
What happened here in the genome?
24 So what happened?
- Many things
- Steroid receptors emerged, together with oxygen-dependent proteins that make steroid hormones key at many places in metazoan biology
- Protein tyrosine-phosphorylating kinases emerged from serine kinases
- Protein tyrosine phosphatases emerged (from an unknown source)
- Kinase substrates emerged that were phosphorylated on tyrosines
- SH2 domains that bind to phosphotyrosine emerged (unknown source)
And not just one example. Lots of them with correlated evolution.
JAK
STAT
JAK is a two-domain kinase. The domains are duplicates of a single domain; the duplication occurred in this episode.
STAT is a family of substrates for JAK, also arising by duplication at the same time as the JAK domains duplicated.
25 How do we know that the ancestral proteins were doing phosphorylation, being phosphorylated, etc. at that time?
Bring the experimental method to bear on historical hypotheses, using biotech to resurrect genes and proteins having the inferred ancestral sequence and studying their behavior in the lab.
Consider the SH2 domains, which bind to phosphotyrosine, a new function emerging together with multicellularity. The SH2 domains are a large family having various binding specificities. Resurrection shows that the ancestral proteins bind as well, and shows their specificity. (Benner, et al., unpublished)
Binds (Gln or Tyr)-Asn-Tyr
Binds (Ile or Val)-Asn-(Val or Pro)
outgroup
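The two binding specificities on this slide can be written as short sequence patterns in one-letter amino acid codes. The sketch below is a minimal illustration that checks whether a peptide contains either pattern. It treats each motif as a plain residue triplet, which is an assumption: the slide does not say where the motif sits relative to the phosphotyrosine, and the example peptides are made up.

```python
import re

# Motifs transcribed from the slide into one-letter codes
# (Gln=Q, Tyr=Y, Asn=N, Ile=I, Val=V, Pro=P).
MOTIFS = {
    "(Gln or Tyr)-Asn-Tyr": re.compile(r"[QY]NY"),
    "(Ile or Val)-Asn-(Val or Pro)": re.compile(r"[IV]N[VP]"),
}


def matching_motifs(peptide):
    """Return the names of the slide's motifs found anywhere in `peptide`."""
    sequence = peptide.upper()
    return [name for name, pattern in MOTIFS.items() if pattern.search(sequence)]


# Hypothetical peptides, for illustration only.
for peptide in ["AQNYG", "GVNPA", "AAAA"]:
    print(peptide, matching_motifs(peptide))
```

A real specificity assay measures binding in the lab; a pattern match like this only flags candidate sequences to test.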
26 Acknowledgements
- Andrew Pohorille (NASA ARC)
- Jeff Shrager (Stanford)
- Stephen Racunas (Stanford)
- Karl Schweighofer (SETI Inst)
- Catharine Conley (NASA ARC)
- Mitch Sogin (MBL)
- Kasthuri Venkateswaran (JPL)
- Gary Andersen (LBL)
- J. Peter Gogarten (U Conn)
- Olga Zhaxybayeva (Dalhousie)
- Zoe Cardon (MBL)
- Louise Lewis (U Conn)
- Frank Lewis (U Conn)
- Steve Benner (FFAME)
- Jason Raymond (UC Merced)
- Rob Knight (CUB)
- Eric Gaucher (GA Tech)
27 Questions? Comments? Brickbats?