Title: The PGA: PhysGen Bioinformatics Component
1. The PGA: PhysGen Bioinformatics Component
What are the genetic components of given phenotypic traits?
Michael A. Thomas, Ph.D.; Peter J. Tonellato, Ph.D.
Bioinformatics Research Center, Medical College of Wisconsin
2. Main components of the PhysGen Program: Linking physiology to the genome
- Genomics: Create biologically interesting and genomically engineered strains
- Phenotypes: Collect extensive physiological measurements on all strains
- Research Service: Biotechnology development in support of genotyping service
- Bioinformatics: Quality control, data processing, data management and release, analysis tools
3. Bioinformatics Component Overview
- Data processing and QC
- Data warehousing
- Online access
- Analytical tool development
4. Bioinformatics: Database organization
[Diagram labels]
- Data from lab (quarterly release)
- Eurus: Public
- Tarzan: Tool and data testing
- Dolphin: Tool development
- New tools
- Mirror sites
- Data frozen
5. Bioinformatics: 1. Data processing and QC
Data Checking
1. Check phenotype name
2. Check for duplicates
3. Check data formatting
   - Date
   - Case
   - Number of digits
4. Check data domain
   - Atmosphere Condition: HPX / NMX
   - Diet Condition: HS / LS
   - Protocol name: 7 protocols
   - Gender: Female / Male
5. Check for outliers
   - Criteria: Mean +/- 3 STDV
[Flowchart labels: LAB database; Y / N]
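The domain, duplicate, and outlier checks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`animal_id`, `phenotype`, `atmosphere`, `diet`, `gender`), and the function names are hypothetical and do not come from the PhysGen pipeline. Only the allowed values and the mean +/- 3 STDV criterion are taken from the slide.

```python
from statistics import mean, stdev

# Allowed values come from the slide; the field names are assumptions.
DOMAINS = {
    "atmosphere": {"HPX", "NMX"},
    "diet": {"HS", "LS"},
    "gender": {"Female", "Male"},
}


def domain_errors(record):
    """Return the names of fields whose value is outside its allowed domain."""
    return [field for field, allowed in DOMAINS.items()
            if record.get(field) not in allowed]


def find_duplicates(records, key_fields=("animal_id", "phenotype")):
    """Return keys seen more than once (key_fields is an assumed identity)."""
    seen, dupes = set(), set()
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


def flag_outliers(values, n_sd=3.0):
    """Return values outside mean +/- n_sd sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > n_sd * s]
```

Note that with very small groups no value can lie more than 3 standard deviations from the mean, so the outlier rule only becomes informative as the group size grows.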
6. Bioinformatics: 2. Data release
- Quarterly
- In conjunction with:
  - major web site improvements
  - release of analytical tools
7. Bioinformatics: 3. Online access and analysis
http://pga.mcw.edu
8. Bioinformatics: 4. Analytical tool development
- Q: Are there differences between means for a given phenotype among various rat treatments?
- Differences among inbred strains can be attributed to any number of genetic differences
- Differences among consomic or congenic strains can be attributed to the inserted chromosome (or chromosomal region)
9. The PhysGen data: renal
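10. Bioinformatics: 4. Analytical tool development
\[
\begin{array}{cccc}
y_{11} & y_{12} & \cdots & y_{1g} \\
y_{21} & y_{22} & \cdots & y_{2g} \\
y_{31} & y_{32} & \cdots & y_{3g} \\
\vdots & \vdots &        & \vdots \\
s_1    & s_2    & \cdots & s_g
\end{array}
\]
\[ H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{(no differences between strains)} \]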
11. How do we find the answers? By Analysis of Variance (ANOVA)
- Test if all the means are equal using ANOVA
- If not, test by pairs (t-test or ANOVA)
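The "test by pairs" step can be illustrated with a pooled-variance two-sample t statistic computed for every pair of groups. This is a minimal sketch, not the PhysGen implementation: the function names and the `{name: values}` input layout are assumptions, and no adjustment for multiple comparisons is applied.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def pairwise_t(groups):
    """Map each pair of group names to its (t, df); groups is {name: values}."""
    return {(x, y): pooled_t(groups[x], groups[y])
            for x, y in combinations(groups, 2)}
```

Each |t| would then be compared with the critical value from a t table for the returned degrees of freedom.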
12. If data are normal: Conventional ANOVA
- Test of $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$
- Construct a statistic F
- Find the distribution of the statistic under $H_0$: the F-distribution
- Compare the calculated F with the critical value, $F_\alpha$
- If $F > F_\alpha$, then $H_0$ is rejected
Variances among and within groups are compared
[Figure label: $F_\alpha$]
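The F statistic compares the variance among groups with the variance within groups. A minimal pure-Python sketch follows; the function name `anova_f` and the list-of-lists input are assumptions made for illustration.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA; returns (F, df_between, df_within) for a list of groups."""
    g = len(groups)
    n_total = sum(len(grp) for grp in groups)
    grand_mean = sum(sum(grp) for grp in groups) / n_total
    # Variation among groups: group means around the grand mean.
    ss_between = sum(len(grp) * (mean(grp) - grand_mean) ** 2 for grp in groups)
    # Variation within groups: observations around their own group mean.
    ss_within = sum((y - mean(grp)) ** 2 for grp in groups for y in grp)
    df_between, df_within = g - 1, n_total - g
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For three made-up groups `[1, 2, 3]`, `[2, 3, 4]`, `[5, 6, 7]` this gives F = 13 with (2, 6) degrees of freedom, which exceeds the tabulated critical value of about 5.14 at alpha = 0.05, so $H_0$ would be rejected.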
13. Test of equal variance: Levene's Test
- $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_g^2$
- (The dataset is homoscedastic)
- Calculate W
- Compare with F table critical value
- If $W > F_\alpha$, we reject $H_0$
\[
W = \frac{(N - g) \sum_{j=1}^{g} N_j \left( \bar{Z}_{.j} - \bar{Z}_{..} \right)^2}
         {(g - 1) \sum_{j=1}^{g} \sum_{i=1}^{N_j} \left( Z_{ij} - \bar{Z}_{.j} \right)^2}
\]
- If the data pass Levene's Test, a conventional ANOVA will be undertaken.
- If the data fail Levene's Test, the non-parametric ANOVA will be suggested.
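Levene's W can be computed directly from the group data. The sketch below assumes the common mean-based form of the test, in which $Z_{ij} = |y_{ij} - \bar{y}_{.j}|$ is the absolute deviation of observation i from the mean of group j. The slides do not say which variant was used, and the function name `levene_w` is hypothetical.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W using absolute deviations from each group mean."""
    g = len(groups)
    n_total = sum(len(grp) for grp in groups)
    # Z_ij = |y_ij - mean of group j|
    z = [[abs(y - mean(grp)) for y in grp] for grp in groups]
    z_group = [mean(zj) for zj in z]              # group means of Z
    z_grand = sum(sum(zj) for zj in z) / n_total  # overall mean of Z
    numerator = (n_total - g) * sum(
        len(zj) * (zbar - z_grand) ** 2 for zj, zbar in zip(z, z_group))
    denominator = (g - 1) * sum(
        (zij - zbar) ** 2 for zj, zbar in zip(z, z_group) for zij in zj)
    return numerator / denominator
```

W is then compared with the F table critical value for (g - 1, N - g) degrees of freedom.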
14. What if equal variance does not hold?
- Solution 1: Non-parametric ANOVA (not sensitive to unequal variances)
- Solution 2: Dynamic ANOVA (requires normality)
15. ANOVA
- Conventional ANOVA: Powerful. Requires normality and equal variances.
- Non-parametric ANOVA: Less powerful. Normality not required. Much less sensitive to unequal variances.
- Dynamic ANOVA: Requires normality but not equal variances. Requires much more computation time.
16. If data are not normal: Non-Parametric ANOVA (Kruskal-Wallis Test)
- Test for equality of group means without assuming normality
- Create ranks for each value in the set
- Calculate the H statistic and compare with the H-distribution table (asymptotic to the $\chi^2$ distribution)
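The ranking and H-statistic steps can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: tied values receive the average of their ranks, no tie correction factor is applied, and the function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H using mid-ranks for ties (no tie correction)."""
    # Pool all values, remembering which group each one came from.
    pooled = sorted((y, j) for j, grp in enumerate(groups) for y in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        k = i
        while k < n_total and pooled[k][0] == pooled[i][0]:
            k += 1
        mid_rank = (i + 1 + k) / 2  # average of ranks i+1 .. k
        for _, j in pooled[i:k]:
            rank_sums[j] += mid_rank
        i = k
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(grp) for r, grp in zip(rank_sums, groups))
    return h - 3 * (n_total + 1)
```

For larger samples H is usually compared with the $\chi^2$ critical value on g - 1 degrees of freedom, for example about 5.99 for three groups at alpha = 0.05.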
17. Current implementation
Data -> Levene's Test -> Equal $\sigma^2$?
  Y -> Conventional ANOVA
  N -> Non-Par ANOVA
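The branching in this flowchart reduces to one comparison. In the sketch below the function name and its parameters are hypothetical. It assumes that Levene's W has already been computed and that the critical value has been read from an F table.

```python
def choose_anova(levene_w, f_alpha):
    """Pick the test suggested by the flowchart from Levene's W."""
    if levene_w > f_alpha:
        # Equal-variance hypothesis rejected: Levene's test failed.
        return "non-parametric ANOVA"
    # Equal variances not rejected: Levene's test passed.
    return "conventional ANOVA"
```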
18. Q: Are there differences between means for a given phenotype among various rat treatments?
- The user can answer this question for the particular protocol and phenotypes of interest
- The built-in tools from the PhysGen web site help the user analyze the data with the appropriate statistical tools
19. The user selects the data by phenotype between different rat treatment categories
Category:
- Strain: BN, SS, FHH, .. .., SS-BN-16
- Atmosphere Condition: Hypoxia, Normoxia
- Gender: Male, Female
- Diet Condition: High salt, Low salt
20. The user selects the protocol
21. The user selects the data
22. Understanding PGA Data
Independent Variables
Phenotype
23. Understanding PGA Data
405.2, 399.9, 402.6, . . . Mean: 401.0
Group Means
Q: Are there differences between means for a given phenotype among various rat treatments?
Values in the group are used to determine the group mean.
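Grouping measurements by the independent variables and averaging each group can be sketched as below. The record layout, field names, and values are made up for illustration and are not PhysGen data. The function name `group_means` is also hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: field names are assumptions, values are invented.
records = [
    {"strain": "SS", "gender": "Male", "diet": "HS", "value": 10.0},
    {"strain": "SS", "gender": "Male", "diet": "HS", "value": 12.0},
    {"strain": "BN", "gender": "Male", "diet": "HS", "value": 7.0},
    {"strain": "BN", "gender": "Male", "diet": "HS", "value": 9.0},
]


def group_means(records, by=("strain", "gender", "diet")):
    """Group phenotype values by the independent variables, then average."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in by)].append(rec["value"])
    return {key: mean(vals) for key, vals in groups.items()}


means = group_means(records)
```

Each key of `means` is one treatment category, and the grouped values behind it are what an ANOVA would compare.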
24. I. Levene's test: Passed. Conventional ANOVA
A statistically significant difference is observed among the groups.
25. I. Continued: Comparison between pairs
No statistically significant difference is
observed between the two
26. II. Levene's test: Failed. Non-parametric ANOVA
A statistically significant difference is
observed.
27. III. Levene's test: Passed
No statistically significant difference is
observed among the groups.
28. IV. Levene's test: Passed
A significant difference is observed among the
groups.
29. Next steps
Currently, we ask:
Q: Are there differences between means for a given phenotype among various rat treatments?
A: (via conventional or non-parametric ANOVA) This will be improved by providing more accurate and meaningful pair-wise comparisons using user-specific reference strain(s).
Soon, we'll ask:
Q: For a given pair (or group) of rat treatments, which phenotypes best explain the differences?
A: via a fuzzy/neural net approach
30. Potential projects for bioinformatics students
- Combining physiological data with microarray data produced from the same experiments
- New analytical tools and/or approaches
- New ways to manage and present the data