Title: A Systematic approach to the Large-Scale Analysis of Genotype-Phenotype correlations
1A Systematic approach to the Large-Scale Analysis
of Genotype-Phenotype correlations
Paul Fisher, Dr. Robert Stevens, Prof. Andrew Brass
2Genotype
- The entire genetic identity of an individual, which
is not itself outwardly visible, e.g. genes, mutations
Genes
DNA
Mutations
ACTGCACTGACTGTACGTATATCT ACTGCACTGTGTGTACGTATATCT
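The two sequences above differ at only a few positions. A minimal Python sketch of how such differences could be listed is given here, assuming the sequences are already aligned and of equal length. The function name `find_substitutions` is illustrative and not part of the original talk.

```python
def find_substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, variant base) for each mismatch.

    Assumes the two sequences are already aligned and of equal length,
    so only substitutions (not insertions or deletions) are detected.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


# The two sequences shown on the slide.
reference = "ACTGCACTGACTGTACGTATATCT"
variant = "ACTGCACTGTGTGTACGTATATCT"

substitutions = find_substitutions(reference, variant)
for position, ref_base, var_base in substitutions:
    print(f"position {position}: {ref_base} -> {var_base}")
```

Real comparisons would normally need an alignment step first, since insertions and deletions shift every later position.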
3Phenotype
- (harder to characterise)
- The observable expression of genes producing
notable characteristics in an individual, e.g.
Hair or eye colour, body mass, resistance to
disease
vs.
Brown
White and Brown
4Genotype to Phenotype
5Current Methods
Genotype
Phenotype
200
?
What processes to investigate?
6Phenotype
Genotype
200
?
Metabolic pathways
Phenotypic response investigated using microarray
in form of expressed genes or evidence provided
through QTL mapping
Genes captured in microarray experiment and
present in QTL (Quantitative Trait Loci) region
Microarray QTL
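Selecting genes that are both captured in the microarray experiment and present in the QTL region amounts to a set intersection. The sketch below assumes both gene lists use the same identifier scheme. The gene names are hypothetical placeholders, not data from the study.

```python
def candidate_genes(qtl_genes: set[str], expressed_genes: set[str]) -> list[str]:
    """Genes that lie in the QTL region AND appear in the microarray results."""
    return sorted(qtl_genes & expressed_genes)


# Hypothetical identifiers, for illustration only.
qtl_region_genes = {"geneA", "geneB", "geneC", "geneD"}
microarray_genes = {"geneC", "geneD", "geneE"}

candidates = candidate_genes(qtl_region_genes, microarray_genes)
print(candidates)
```

In practice the two data sources often use different identifiers, so a mapping step would be needed before the intersection.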
7Issues with current approaches
8Huge amounts of data
QTL region on chromosome
Microarray
1000 Genes
200 Genes
How do I look at ALL the genes systematically?
9Hypothesis-Driven Analyses
200 QTL genes
Pick the genes involved in immunological process
Case: African Sleeping sickness - parasitic
infection - known immune response
40 QTL genes
Pick the genes that I am most familiar with
2 QTL genes
- Result: African Sleeping sickness
- Immune response
- Cholesterol control
- Cell death
Biased view
10Manual Methods of data analysis
No explicit methods
Tedious and repetitive
Human error
Navigating through hyperlinks
11Implicit methods
12Issues with current approaches
- Scale of analysis task
- User bias and premature filtering
- Hypothesis-Driven approach to data analysis
- Constant flux of data - problems with re-analysis of data
- Implicit methodologies (hyper-linking through web pages)
- Error proliferation from any of the listed issues
- Solution: Automate through workflows
13The Two Ws
- Web Services
- Technology and standard for exposing code / database by a means that can be consumed by a third party remotely
- Describes how to interact with it
- Workflows
- General technique for describing and executing a process
- Describes what you want to do
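The split between the two Ws can be sketched in a few lines of Python: each service says how to do one thing, and the workflow is an explicit, ordered description of what to do with them. This is only an illustration under stated assumptions. The `Workflow` class and the two step functions are hypothetical, and plain local functions stand in for remote web services. It is not the workflow system used in the project.

```python
from typing import Any, Callable

# Stand-in for a remote web service call: takes some data, returns some data.
Service = Callable[[Any], Any]


class Workflow:
    """An explicit, ordered description of the services to run."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: list[tuple[str, Service]] = []

    def add_step(self, label: str, service: Service) -> "Workflow":
        self.steps.append((label, service))
        return self  # returning self allows steps to be chained

    def run(self, data: Any) -> Any:
        # Feed the output of each step into the next one, in declared order.
        for _label, service in self.steps:
            data = service(data)
        return data


# Hypothetical local functions standing in for remote services.
def strip_whitespace(identifiers: list[str]) -> list[str]:
    return [identifier.strip() for identifier in identifiers]


def remove_duplicates(identifiers: list[str]) -> list[str]:
    return sorted(set(identifiers))


workflow = (
    Workflow("clean gene identifiers")
    .add_step("strip whitespace", strip_whitespace)
    .add_step("remove duplicates", remove_duplicates)
)
result = workflow.run([" geneB", "geneA ", "geneB"])
print(result)
```

Because the steps are declared rather than performed by hand, the same workflow can be rerun unchanged on new input.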
14Hypothesis
- Utilising the capabilities of workflows and the
pathway-driven approach, we are able to provide an
analysis that is more:
- - systematic
- - efficient
- - scalable
- - un-biased
- - unambiguous
- The benefit will be that new biology results
will be derived, increasing community knowledge
of genotype and phenotype interactions.
15QTL mapping study
Microarray gene expression study
Statistical analysis
Identify genes in QTL regions
Identify differentially expressed genes
Genomic Resource
Annotate genes with biological pathways
Annotate genes with biological pathways
Pathway Resource
Workflow methods
Select common biological pathways
Hypothesis generation and verification
Wet Lab
Literature
Manual methods
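The workflow half of this diagram (annotate both gene lists with pathways, then select the pathways they share) can be sketched as follows. This is a minimal illustration, assuming a gene-to-pathway lookup is already available as a dictionary. All gene and pathway names are hypothetical, and the function names are not taken from the original workflows.

```python
def pathways_for(genes: set[str], gene_to_pathways: dict[str, set[str]]) -> set[str]:
    """Collect every pathway annotated to at least one of the given genes."""
    pathways: set[str] = set()
    for gene in genes:
        # Genes with no annotation contribute nothing.
        pathways.update(gene_to_pathways.get(gene, set()))
    return pathways


def common_pathways(
    qtl_genes: set[str],
    expressed_genes: set[str],
    gene_to_pathways: dict[str, set[str]],
) -> list[str]:
    """Pathways supported by both the QTL genes and the expressed genes."""
    qtl_pathways = pathways_for(qtl_genes, gene_to_pathways)
    expression_pathways = pathways_for(expressed_genes, gene_to_pathways)
    return sorted(qtl_pathways & expression_pathways)


# Hypothetical annotation table, for illustration only.
annotations = {
    "geneA": {"pathway_1"},
    "geneB": {"pathway_2"},
    "geneC": {"pathway_2", "pathway_3"},
    "geneD": {"pathway_4"},
}

shared = common_pathways({"geneA", "geneB"}, {"geneC", "geneD"}, annotations)
print(shared)
```

In this toy example the two gene lists have no gene in common, yet they still share a pathway. Comparing at the pathway level can therefore surface candidates that a direct gene-by-gene comparison would miss.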
16Replicated original chain of data analysis
17Trypanosomiasis in Africa
Steve Kemp
Andy Brass
many Others
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
18Preliminary Results
- A strong candidate gene was found for Trypanosomiasis resistance: Daxx
- Daxx was not found using manual investigation methods
- The gene was identified from analysis of biological pathway information
- Sequencing of the Daxx gene in the Wet Lab showed mutations that changed the structure of the protein
- The mutation was published in scientific literature, noting its effect on the binding of Daxx protein to another protein; that other protein controls one of the phenotypes of Trypanosomiasis resistance
- FOUND NEW BIOLOGY!
19Recycling, Reuse, Repurposing
Now the e-Science!
- Trypanosomiasis mouse workflow reused without change in Trichuris muris infection in mice
- Identified biological pathways involved in sex dependence
- Previous manual two-year study of candidate genes had failed to do this.
- Social networking for scientists
Workflows now being run over Colitis/Inflammatory Bowel Disease in Mice (without change)
20What next?
- More use cases??
- Can be done, but not likely for my project
- Text Mining !!!
- Aid biologists in identifying novel links between pathways
- Link pathways to phenotype through literature
21To Sum Up
- Need for Genotype-Phenotype correlations with respect to disease control
- High-throughput data can provide links between Genotype and Phenotype
- Highlighted issues with manually conducted in silico experiments
- Improved the methods of current microarray and QTL based investigations by making them systematic
- Increased reproducibility of our methods
- - workflows stored in XML based schema
- - explicit declaration of services, parameters, and methods of data analysis
- Shown workflows are capable of deriving new biologically significant results
- - African Trypanosomiasis in the mouse
- - Infection of mice with Trichuris muris
- The workflows require expansion to accommodate new analysis techniques: text mining
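The reproducibility point above (workflows stored in an XML based schema, with services and parameters declared explicitly) can be illustrated with Python's standard `xml.etree.ElementTree` module. The element and attribute names, the service names, and the parameter values below are all invented for this sketch. They are not the schema or the services used by the actual workflows.

```python
import xml.etree.ElementTree as ET


def workflow_to_xml(name: str, steps: list[dict]) -> str:
    """Serialise a workflow description so that every service and parameter is explicit."""
    root = ET.Element("workflow", {"name": name})
    for step in steps:
        step_element = ET.SubElement(root, "step", {"service": step["service"]})
        # Sort parameters so the same workflow always produces the same XML.
        for key, value in sorted(step["parameters"].items()):
            ET.SubElement(step_element, "parameter", {"name": key, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical services and parameters, for illustration only.
steps = [
    {"service": "identify_genes_in_qtl", "parameters": {"region": "example_region"}},
    {"service": "annotate_with_pathways", "parameters": {"source": "example_resource"}},
]

xml_text = workflow_to_xml("qtl_pathway_analysis", steps)
print(xml_text)

# Reading the file back recovers exactly which services were declared, and in what order.
parsed = ET.fromstring(xml_text)
declared_services = [step.get("service") for step in parsed.findall("step")]
print(declared_services)
```

Storing the description as a file, rather than as a sequence of manual clicks through web pages, is what lets another researcher rerun or audit the same analysis.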
22Many thanks to
including Joanne Pennock, EPSRC, OMII, myGrid,
and lots more people