Title: Computational methods for genetic mapping of quantitative traits
1Computational methods for genetic mapping of
quantitative traits
- Kajsa Ljungberg
- February 17, 2006
2Kajsa Ljungberg
- M.Sc. Biotechnology Engineering (civilingenjör Molekylär bioteknik), Uppsala.
- Ph.D. Scientific Computing, Uppsala.
- Various computational projects at e.g.
AstraZeneca and UC San Francisco
3Overview
- Background of my Ph.D. project: quantitative traits, genetics, experiments.
- My work: mathematical problem, methods (principles only), results.
- Two related problems in computational biology.
- Lessons learned in industry and academia.
4Heritable traits, examples
- Earlobes: loose or attached
- Blood group: A, B, AB or O
5Qualitative traits
- Can be divided into classes in a natural way.
- Often governed by a single gene, and in such cases the genetic basis is relatively easy to study.
- Examples: earlobes, blood group, Huntington's disease.
6Heritable traits, more examples
- Height
- Hair colour
- Blood pressure
7Quantitative traits
- Vary continuously on some scale.
- Often governed by multiple interacting genes and the environment.
- Examples: blood pressure, cholesterol levels, growth rate in e.g. farm animals.
- Most traits of economic or medical importance are quantitative.
8QTL
- Abbreviation of Quantitative Trait Locus
- Region in the genome where one or several genes influencing a quantitative trait are located.
- Can be found using QTL mapping, which is an important step in the process of finding the individual genes.
9HUGO and QTL mapping
- HUGO project
- Sequence → Gene → Function/Trait
- QTL mapping
- Function/Trait → QTL (→ Gene)
10Experiments in QTL mapping
0000000000000000000000000000000000
1111111111111111111111111111111111
0000000000000000000000000000000000
0000000000000000011111111111111111
0000000000000000011110000011111111
0000000000000000000011111111111110
0000000000000000000000001111111111
0000000000000000000000000001111111
0000000000000000011111111100000111
11Experiment → model
Fit model A(x)b = y, with b = (b0, b1). b0 denotes the mean weight and b1 the effect of the genotype.
Genotypes, one string per individual:
1111000001111111111 0001111111111111000 1111111111110000000
0000000111111111111 0000000000111111111 1111111111111000111 ...
A(x), one row per individual, from the genotype at position x: 11 11 11 10 10 11 ..
Weights y: 1672g 945g 213g 1212g 418g 744g ..
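As a minimal sketch of this fitting step, the Python below builds A(x) from the genotype strings on the slide and solves the two-parameter least-squares problem through its normal equations. The function names, the 0-based position index, the 0/1 genotype coding and the fallback for a position where every individual has the same genotype are assumptions made for illustration.

```python
import math

# Genotype strings and weights (grams) as shown on the slide, one per individual.
GENOTYPES = [
    "1111000001111111111",
    "0001111111111111000",
    "1111111111110000000",
    "0000000111111111111",
    "0000000000111111111",
    "1111111111111000111",
]
WEIGHTS = [1672.0, 945.0, 213.0, 1212.0, 418.0, 744.0]


def design_matrix(genotypes, x):
    """Rows (1, g): an intercept and the genotype digit at 0-based position x."""
    return [(1.0, float(g[x])) for g in genotypes]


def fit(A, y):
    """Solve min ||A b - y|| for b = (b0, b1) via the 2x2 normal equations."""
    n = len(y)
    s_g = sum(g for _, g in A)
    s_gg = sum(g * g for _, g in A)
    s_y = sum(y)
    s_gy = sum(g * yi for (_, g), yi in zip(A, y))
    det = n * s_gg - s_g * s_g
    if det == 0:
        # Assumption: if all individuals share one genotype, b1 cannot be
        # estimated, so fall back to the mean-only model.
        return s_y / n, 0.0
    b1 = (n * s_gy - s_g * s_y) / det
    b0 = (s_y - b1 * s_g) / n
    return b0, b1


def residual_norm(A, y, b):
    """Euclidean norm of y - A b."""
    return math.sqrt(sum((yi - (b[0] * one + b[1] * g)) ** 2
                         for (one, g), yi in zip(A, y)))


A = design_matrix(GENOTYPES, 3)
b0, b1 = fit(A, WEIGHTS)
print(b0, b1, residual_norm(A, WEIGHTS, (b0, b1)))
```

At 0-based position 3 the rows of A(x) are 11 11 11 10 10 11, matching the slide, and the fit gives b0 = 815 g and b1 = 78.5 g. With the 0/1 coding assumed here, b0 is the mean weight of the genotype-0 individuals; six individuals only illustrate the mechanics.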
12If 1 QTL only
[Diagram: Model and Experiment data A(x), y → Least-squares problem → Residual norm]
[Plot: residual norm against x, the genome with all chromosomes lined up]
13If 1 QTL only, contd.
[Plot: residual norm against x, the genome with all chromosomes lined up]
The genome position with the smallest residual
norm (best model fit) indicates the most likely
position of the QTL.
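The scan described here can be sketched as a loop over all positions that records the residual norm at each one and keeps the smallest. This is an illustration only: it reuses the six genotype strings and weights from the earlier slide, treats every character of a string as one candidate position, and computes the residual by Gram-Schmidt projection; the function names and the tolerance are assumptions.

```python
import math

GENOTYPES = [
    "1111000001111111111",
    "0001111111111111000",
    "1111111111110000000",
    "0000000111111111111",
    "0000000000111111111",
    "1111111111111000111",
]
WEIGHTS = [1672.0, 945.0, 213.0, 1212.0, 418.0, 744.0]


def project_out(v, basis):
    """Return v minus its components along the orthonormal vectors in basis."""
    v = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(q, v))
        v = [a - c * b for a, b in zip(v, q)]
    return v


def residual_norm(columns, y, tol=1e-10):
    """Norm of y minus its projection onto span(columns)."""
    basis = []
    for col in columns:
        v = project_out(col, basis)
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:  # skip columns that add no new direction
            basis.append([a / norm for a in v])
    r = project_out(y, basis)
    return math.sqrt(sum(a * a for a in r))


def scan_one_qtl(genotypes, y):
    """Residual norm at every position, and the position where it is smallest."""
    ones = [1.0] * len(y)
    norms = []
    for x in range(len(genotypes[0])):
        g = [float(s[x]) for s in genotypes]
        norms.append(residual_norm([ones, g], y))
    best = min(range(len(norms)), key=norms.__getitem__)
    return best, norms


best, norms = scan_one_qtl(GENOTYPES, WEIGHTS)
print(best, norms[best])
```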
14If 2 QTL
Searching for two QTL corresponds to a
two-dimensional optimization problem.
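To make the two-dimensional search concrete, here is a brute-force sketch that evaluates every pair of positions. It assumes a purely additive model b0 + b1 g(x1) + b2 g(x2), which is a simplification chosen for illustration, and it reuses the six genotype strings and weights from the earlier slide; all function names are hypothetical.

```python
import itertools
import math

GENOTYPES = [
    "1111000001111111111",
    "0001111111111111000",
    "1111111111110000000",
    "0000000111111111111",
    "0000000000111111111",
    "1111111111111000111",
]
WEIGHTS = [1672.0, 945.0, 213.0, 1212.0, 418.0, 744.0]


def project_out(v, basis):
    """Return v minus its components along the orthonormal vectors in basis."""
    v = list(v)
    for q in basis:
        c = sum(a * b for a, b in zip(q, v))
        v = [a - c * b for a, b in zip(v, q)]
    return v


def residual_norm(columns, y, tol=1e-10):
    """Norm of y minus its projection onto span(columns)."""
    basis = []
    for col in columns:
        v = project_out(col, basis)
        norm = math.sqrt(sum(a * a for a in v))
        if norm > tol:  # skip columns that add no new direction
            basis.append([a / norm for a in v])
    r = project_out(y, basis)
    return math.sqrt(sum(a * a for a in r))


def search_two_qtl(genotypes, y):
    """Try every pair x1 < x2 and return (smallest residual norm, x1, x2)."""
    ones = [1.0] * len(y)
    cols = [[float(s[x]) for s in genotypes] for x in range(len(genotypes[0]))]
    best = None
    for x1, x2 in itertools.combinations(range(len(cols)), 2):
        r = residual_norm([ones, cols[x1], cols[x2]], y)
        if best is None or r < best[0]:
            best = (r, x1, x2)
    return best


print(search_two_qtl(GENOTYPES, WEIGHTS))
```

With 19 positions this is 171 least-squares problems; the count grows quadratically with the number of positions, which is why an exhaustive search becomes expensive on a real genome.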
15Computational problem
Searching for n QTL corresponds to
an n-dimensional optimization problem, where the
objective function is the residual norm of a
least-squares problem.
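Written out, and assuming the Euclidean norm and a parameter vector b as in the single-QTL model, the problem stated above reads:

```latex
\min_{x = (x_1, \dots, x_n)} f(x),
\qquad
f(x) = \min_{b} \, \lVert A(x)\, b - y \rVert_2
```

Here A(x) is the design matrix built from the genotypes at positions x_1, ..., x_n and y holds the measured trait values.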
16My project
- Minimize f(x), i.e. find the genome positions giving the best model fit, faster.
- Two main strategies:
- 1) Speed up the computation of the residual norm. Linear algebra, updated QR factorizations...
- 2) Speed up the multidimensional search for the optimum. DIRECT algorithm.
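The slide names updated QR factorizations without giving details. The sketch below illustrates only the general idea behind the first strategy, under the assumption that part of the model (here the intercept) is the same at every position: that part is orthogonalized once, and each position then costs a single extra projection. It is not the algorithm from the project, and all names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Return v minus its components along the orthonormal vectors in basis."""
    v = list(v)
    for q in basis:
        c = dot(q, v)
        v = [a - c * b for a, b in zip(v, q)]
    return v


def orthonormal_basis(columns, tol=1e-10):
    """Gram-Schmidt basis for span(columns), skipping dependent columns."""
    basis = []
    for col in columns:
        v = project_out(col, basis)
        norm = math.sqrt(dot(v, v))
        if norm > tol:
            basis.append([a / norm for a in v])
    return basis


def scan_with_reuse(fixed_columns, genotype_columns, y, tol=1e-10):
    """Residual norm for each genotype column, reusing the fixed part."""
    basis = orthonormal_basis(fixed_columns)  # computed once
    r0 = project_out(y, basis)                # computed once
    norms = []
    for g in genotype_columns:
        v = project_out(g, basis)  # the only position-dependent work
        vv = dot(v, v)
        if vv > tol * tol:
            c = dot(v, r0) / vv
            r = [a - c * b for a, b in zip(r0, v)]
        else:
            r = r0  # g lies in the span of the fixed columns
        norms.append(math.sqrt(dot(r, r)))
    return norms


ones = [1.0] * 6
y = [1672.0, 945.0, 213.0, 1212.0, 418.0, 744.0]
g_cols = [[1.0, 1.0, 1.0, 0.0, 0.0, 1.0], [1.0] * 6]
print(scan_with_reuse([ones], g_cols, y))
```

The saving is small for a one-column fixed part, but it grows when the fixed part contains many columns that would otherwise be re-factorized at every position.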
17Results
- 2-3 orders of magnitude speed-up for typical problems, even more for complicated models. Computational times measured in minutes instead of days.
- This allows for more thorough data exploration and model testing.
19- PSEUDOMARKER 2.02
- Hao Wu (1), Saunak Sen (2), Kajsa Ljungberg (3), Karl W. Broman (4), Gary A. Churchill (1)
- (1) The Jackson Laboratory, Bar Harbor, ME
- (2) Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA
- (3) Department of Information Technology, Division of Scientific Computing, Uppsala University, Uppsala, Sweden
- (4) Department of Biostatistics, Johns Hopkins University, Baltimore, MD
- http://www.jax.org/staff/churchill/labsite/software/pseudomarker/
20Related problems, 1
- Gene expression levels from microarray experiments are also quantitative traits.
- QTL analysis on microarray data can help distinguish between covariation and causation.
- Hot topic!
21Related problems, 2
- QTL analysis in human populations (different statistical issues).
- Examples at AstraZeneca:
- 1) Procardis: risk factors for heart disease.
- 2) Identify QTL related to adverse side effects, e.g. liver problems (Exanta).
22Lessons learned
- (as a computational expert among experimentalists)
- The importance of visualization
- Differences in language
- Real data is a pain
- People use what they know
- → Future projects?
24Thanks
- Sverker Holmgren
- Örjan Carlborg
- Kateryna Mishchenko
- Mahen Jayawardena
- Leif Andersson
- Martina Hägglund
- Forskarskolan i Matematik och Beräkningsvetenskap
- Linné-centrum
25Paper 1
[Diagram 1, "Standard method": boxes labelled Genetic info, Experiment data, Computation, Constant info, Prediction]
[Diagram 2: boxes labelled Genetic info, Computation, Constant info, Prediction]
26Paper 1 (Ljungberg, Holmgren, Carlborg)
[Diagram: boxes labelled Model, Experiment data, Computation, Prediction, Real value, Difference]
27Paper 2 (Ljungberg, Holmgren, Carlborg)
[Plot: difference between prediction and real value, against the genome with all chromosomes lined up]
Search carefully in regions with good values, but
only sparsely in regions with bad values.
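The idea of spending evaluations where the values look good can be illustrated with a simple coarse-to-fine search over integer positions. This is a simplified stand-in, not the DIRECT algorithm named earlier in the talk; the step size, the number of regions kept and the toy objective are all assumptions.

```python
def coarse_to_fine_min(f, lo, hi, coarse_step=16, keep=3):
    """Evaluate f sparsely, then densely around the `keep` best coarse points.

    Returns (best position, best value, number of evaluations of f).
    """
    cache = {}

    def value(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]

    # Sparse pass over the whole range.
    coarse = list(range(lo, hi + 1, coarse_step))
    promising = sorted(coarse, key=value)[:keep]

    # Dense pass, only around the most promising coarse points.
    for c in promising:
        start = max(lo, c - coarse_step + 1)
        stop = min(hi, c + coarse_step - 1)
        for x in range(start, stop + 1):
            value(x)

    best = min(cache, key=cache.get)
    return best, cache[best], len(cache)


def toy_objective(x):
    """Hypothetical stand-in for the residual norm along the genome."""
    return (x - 137) ** 2


print(coarse_to_fine_min(toy_objective, 0, 1000))
```

A fixed coarse grid like this can miss a narrow minimum that falls between grid points in a region that otherwise looks bad, so it should be read as an illustration of the trade-off rather than a reliable global method.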
28Paper 3 (Ljungberg, Mishchenko, Holmgren)
[Plot: difference between prediction and real value, against the genome]
When a region with good values has been found,
use a special algorithm which is efficient in
finding the bottom of the closest valley.
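The slide does not say which local algorithm is used. As an illustration of descending into the closest valley from a good starting point, the sketch below uses a compass (pattern) search on integer positions, which is a swapped-in technique chosen for simplicity; the function name, step size and toy objective are assumptions.

```python
def compass_search(f, start, lo, hi, step=8):
    """Descend from `start` by trying +/- step along each coordinate.

    The step is halved whenever no neighbour improves, and the search
    stops once the step falls below one position.
    """
    x = list(start)
    fx = f(x)
    while step >= 1:
        improved = False
        for i in range(len(x)):
            for delta in (-step, step):
                cand = list(x)
                cand[i] += delta
                if lo <= cand[i] <= hi:
                    fc = f(cand)
                    if fc < fx:
                        x, fx, improved = cand, fc, True
        if not improved:
            step //= 2
    return x, fx


def toy_objective(p):
    """Hypothetical two-QTL objective with its valley bottom at (40, 75)."""
    return (p[0] - 40) ** 2 + (p[1] - 75) ** 2


print(compass_search(toy_objective, (32, 64), 0, 100))
```

A search of this kind only finds the bottom of the valley it starts in, which is why it is paired with a global search that first locates a promising region.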
29Paper 4 (Ljungberg)
[Diagram: labels 100, 010, 000, 111 and 11 00 01 11; 2; boxes labelled Model, Experiment data, Computation, Prediction]
30Paper 5 (Jayawardena, Holmgren, Ljungberg)
[Diagram: three sets of boxes labelled Model, Data, Computation; axis label: difference between prediction and real value]
Perform several computations simultaneously in an
organized and efficient manner.
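Because the evaluations at different positions are independent of each other, they can be handed to a pool of workers. The sketch below shows this pattern with Python's standard `concurrent.futures` module; it is not the parallel scheme of the paper, and the function name and toy objective are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(objective, candidates, max_workers=4):
    """Evaluate objective on every candidate concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(objective, candidates))


def toy_objective(position):
    """Hypothetical stand-in for one least-squares computation."""
    return (position - 137) ** 2


positions = list(range(0, 1000, 50))
values = evaluate_all(toy_objective, positions)
best_value, best_position = min(zip(values, positions))
print(best_position, best_value)
```

In CPython, threads do not speed up pure-Python arithmetic; `ProcessPoolExecutor` from the same module offers the same `map` interface for CPU-bound work, at the cost of requiring picklable functions and arguments.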