R Packages for Genome-Wide Association Studies

1
R Packages for Genome-Wide Association Studies
  • Qunyuan Zhang
  • Division of Statistical Genomics
  • Statistical Genetics Forum
  • March 10, 2008

2
What is R?
  • R is a free software environment for statistical
    computing and graphics.
  • Runs on a wide variety of UNIX platforms,
    Windows and MacOS (interactive or batch mode)
  • Free and open source, can be downloaded from
    cran.r-project.org
  • Wide range of packages (base and contributed),
    novel methods available
  • Concise grammar and good structure (function, data
    object, methods and class)
  • Help from manuals and email group
  • Slow, time and memory consuming (can be overcome
    by parallel computation, and/or integration with
    C)
  • Popular, used by 70-80% of statisticians

3
R Task Views
http://cran.r-project.org/web/views/

4
Statistical Genetics Packages in R
http://cran.r-project.org/web/views/Genetics.html
  • Population Genetics: genetics (basic), Geneland
    (spatial structures of genetic data), rmetasim
    (population genetics simulations), hapsim
    (simulation), popgen (clustering SNP genotype
    data and SNP simulation), hierfstat (hierarchical
    F-statistics of genetic data), hwde (modeling
    genotypic disequilibria), Biodem
    (biodemographical analysis), kinship (pedigree
    analysis), adegenet (population structure), ape
    and apTreeshape (phylogenetic and evolution
    analyses), ouch (Ornstein-Uhlenbeck models),
    PHYLOGR (simulation and GLS model), stepwise
    (recombination breakpoints)
  • Linkage and Association: gap (both population
    and family data, sample size calculations,
    probability of familial disease aggregation,
    kinship calculation, linkage and association
    analyses, haplotype frequencies), tdthap
    (haplotype transmission/disequilibrium tests,
    TDT), powerpkg (power analyses for the
    affected sib pair and the TDT design), hapassoc
    (likelihood inference of trait associations with
    haplotypes in GLMs), haplo.ccs (haplotype and
    covariate relative risks in case-control data by
    weighted logistic regression), haplo.stats
    (haplotype analysis for unrelated subjects),
    ldDesign (experiment design for
    association and LD studies), LDheatmap (heatmap
    of pairwise LD), mapLD (LD and haplotype
    blocks), pbatR (R version of PBAT), GenABEL
    and SNPassoc (for GWAS)
  • QTL mapping for data from experimental
    crosses: bqtl (inbred crosses and recombinant
    inbred lines), qtl (genome-wide scans),
    qtlDesign (designing QTL experiments and power
    computations), qtlbim (Bayesian interval QTL
    mapping)
  • Sequence and Array Data Processing: seqinr,
    Bioconductor packages

5
GenABEL
Aulchenko Y.S., Ripke S., Isaacs A., van Duijn C.M.
GenABEL: an R package for genome-wide association
analysis. Bioinformatics. 2007; 23(10):1294-6.
  • GenABEL: genome-wide SNP association analysis
  • A package for genome-wide association analysis
    between quantitative or binary traits and
    single-nucleotide polymorphisms (SNPs).
  • Version: 1.3-5
  • Depends: R (>= 2.4.0), methods, genetics,
    haplo.stats, qvalue, MASS
  • Date: 2008-02-17
  • Author: Yurii Aulchenko, with contributions from
    Maksim Struchalin, Stephan Ripke and Toby Johnson
  • Maintainer: Yurii Aulchenko <i.aoultchenko at
    erasmusmc.nl>
  • License: GPL (>= 2)
  • In views: Genetics
  • CRAN checks: GenABEL results

6
GenABEL: Data Objects
phdata: phenotypic data (data frame)
gtdata: genotypic data (snp.data-class)
  • gwaa.data-class
  • 2-bit storage
  • 0: 00
  • 1: 01
  • 2: 10
  • 11
  • Saves 75%

load.gwaa.data(phenofile = "pheno.dat", genofile = "geno.raw")
  • convert.snp.text(): from text file (GenABEL
    default format)
  • convert.snp.ped(): from Linkage, Merlin, Mach,
    and similar files
  • convert.snp.mach(): from Mach format
  • convert.snp.tped(): from PLINK TPED format
  • convert.snp.illumina(): from
    Illumina/Affymetrix-like format
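
The 75% saving follows from storing each genotype in 2 bits rather than a full byte, so four genotypes share one byte. The sketch below illustrates that idea in base R only; it is not GenABEL's actual internal layout. The helper names pack2bit() and unpack2bit() are hypothetical, and treating code 3 as the meaning of the unlabelled 11 pattern is an assumption made for illustration.

```r
# Pack genotype codes (0-3, two bits each) four to a byte.
# Code 3 is an assumed placeholder for the unlabelled "11" pattern.
pack2bit <- function(g) {
  stopifnot(all(g %in% 0:3))
  pad <- (4 - length(g) %% 4) %% 4            # pad to a multiple of 4
  m <- matrix(c(g, rep(0, pad)), nrow = 4)    # one column per output byte
  as.raw(colSums(m * c(64, 16, 4, 1)))        # weights = bit positions 6, 4, 2, 0
}

# Recover the first n genotype codes from the packed bytes.
unpack2bit <- function(bytes, n) {
  b <- as.integer(bytes)
  g <- rbind(b %/% 64L, (b %/% 16L) %% 4L, (b %/% 4L) %% 4L, b %% 4L)
  as.vector(g)[seq_len(n)]
}

g <- c(0, 1, 2, 3, 2, 1)
packed <- pack2bit(g)
length(packed)                 # number of bytes used for 6 genotypes
unpack2bit(packed, length(g))  # round trip back to the genotype codes
```

With one byte per genotype the same six values would need six bytes; packed, they need two.
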
7
GenABEL: Data Manipulation
  • snp.subset(): subset data by SNP names or by QC
    criteria
  • add.phdata(): merge extra phenotypic data into the
    gwaa.data-class object
  • ztransform(): standard normal (z-score)
    transformation of phenotypes
  • rntransform(): rank-normalization of phenotypes
  • npsubtreated(): non-parametric adjustment of
    phenotypes for medicated subjects
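
The two phenotype transformations can be illustrated in base R. This is a rough sketch of the general techniques, not the GenABEL implementations: z_sketch() and rn_sketch() are hypothetical names, and the (rank - 0.5) / n offset is one common convention for the rank-based inverse normal transform, assumed here.

```r
# z-score: centre on the mean and scale by the standard deviation.
z_sketch <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Rank-based inverse normal transform: replace each value by the normal
# quantile of its rank. Missing values stay missing.
rn_sketch <- function(x) {
  r <- rank(x, na.last = "keep")
  qnorm((r - 0.5) / sum(!is.na(x)))
}

x <- c(3.1, 10, 2, NA, 7.5)
z <- z_sketch(x)
rn <- rn_sketch(x)
```

The rank-based version depends only on the ordering of the values, so a single extreme phenotype cannot dominate the result.
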

8
GenABEL: QC and Summarization
  • summary.snp.data(): summary of SNP data (number
    of observed genotypes, call rate, allelic
    frequency, genotypic distribution, P-value of HWE
    test)
  • check.trait(): summary of phenotypic data and
    outlier check based on a specified p/FDR cut-off
  • check.marker(): SNP selection based on call rate,
    allele frequency and deviation from HWE
  • HWE.show(): shows HWE tables, chi-square and exact
    HWE P-values
  • perid.summary(): call rate and heterozygosity per
    person
  • ibs(): matrix of average IBS for a group of
    people and a given set of SNPs
  • hom(): average homozygosity (inbreeding) for a
    set of people, across multiple markers
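
The HWE check used in marker QC can be sketched as a 1-d.f. chi-square test on the three genotype counts. This is a generic base-R illustration, not the GenABEL code (which also reports an exact test); hwe_chisq() is a hypothetical helper name.

```r
# Chi-square test of Hardy-Weinberg equilibrium from genotype counts.
hwe_chisq <- function(nAA, nAB, nBB) {
  n <- nAA + nAB + nBB
  p <- (2 * nAA + nAB) / (2 * n)                  # frequency of allele A
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
  observed <- c(nAA, nAB, nBB)
  stat <- sum((observed - expected)^2 / expected)
  c(chisq = stat,
    p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

hwe_chisq(25, 50, 25)   # counts exactly at HWE proportions
hwe_chisq(30, 40, 30)   # a deficit of heterozygotes
```
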

9
GenABEL: SNP Association Scans
  • scan.glm(): SNP association test using GLM in R
    library
  • scan.glm("y ~ x1 + x2 + CRSNP", family = gaussian(), data, snpsubset, idsubset)
  • scan.glm("y ~ x1 + x2 + CRSNP", family = binomial(), data, snpsubset, idsubset)
  • scan.glm.2D(): 2-SNP interaction scan
  • Fast scan (calls C code)
  • ccfast(): case-control association analysis by
    computing chi-square test from 2x2 (allelic) or
    2x3 (genotypic) tables
  • emp.ccfast(): genome-wide significance
    (permutation) for ccfast() scan
  • qtscore(): association test (GLM) for a trait
    (quantitative or categorical)
  • emp.qtscore(): genome-wide significance
    (permutation) for qtscore() scan
  • mmscore(): score test for association between a
    trait and genetic polymorphism, in samples of
    related individuals (needs stratification
    variable, scores are computed within strata and
    then added up)
  • egscore(): association test, adjusted for
    possible stratification by principal components
    of genomic kinship matrix (SNP correlation matrix)
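
Conceptually, a GLM scan fits one model per SNP, substituting each SNP in turn for the CRSNP placeholder in the formula. The base-R sketch below shows that loop on simulated data. scan_glm_sketch() is a hypothetical helper, not the GenABEL function, and the genotypes, covariates and effect sizes are all made up for illustration.

```r
set.seed(1)
n <- 200   # subjects
m <- 5     # SNPs

# Simulated genotypes coded as 0/1/2 copies of the B allele.
geno <- matrix(rbinom(n * m, 2, 0.3), nrow = n,
               dimnames = list(NULL, paste0("snp", 1:m)))

# Simulated covariates and a quantitative trait influenced by snp2.
pheno <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
pheno$y <- 0.5 * geno[, "snp2"] + 0.3 * pheno$x1 + rnorm(n)

# Fit the same GLM once per SNP and keep the p-value of the SNP term.
scan_glm_sketch <- function(formula, family, data, geno) {
  sapply(colnames(geno), function(s) {
    data$CRSNP <- geno[, s]                 # plug the current SNP into the model
    fit <- glm(as.formula(formula), family = family, data = data)
    coef(summary(fit))["CRSNP", 4]          # 4th column holds the p-value
  })
}

pvals <- scan_glm_sketch("y ~ x1 + x2 + CRSNP", gaussian(), pheno, geno)
pvals
```

Refitting a full GLM for every SNP is what makes this kind of scan slow at genome-wide scale, which is the motivation for the fast score-test functions listed above.
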

10
GenABEL: Haplotype Association Scans
  • scan.haplo(): haplotype association test using
    GLM in R library
  • scan.haplo.2D(): 2-haplotype interaction scan
  • (haplo.stats package required)
  • Sliding window strategy
  • Posterior prob. of haplotypes via EM algorithm
  • GLM-based score test for haplotype-trait
    association (Schaid DJ, Rowland CM, Tines DE,
    Jacobson RM, Poland GA. 2002. Score tests for
    association of traits with haplotypes when
    linkage phase is ambiguous. Am J Hum Genet
    70:425-434.)

11
GenABEL: GWAS results from scan.glm, scan.haplo,
ccfast, qtscore, emp.ccfast, emp.qtscore
  • scan.gwaa-class
  • Names
  • snpnames: list of names of SNPs tested
  • P1df: p-values of 1-d.f. (additive or allelic)
    test for association
  • P2df: p-values of 2-d.f. (genotypic) test for
    association
  • Pc1df: p-values from the 1-d.f. test for
    association between SNP and trait; the statistic
    is corrected for possible inflation
  • effB: effect of the B allele in allelic test
  • effAB: effect of the AB genotype in genotypic
    test
  • effBB: effect of the BB genotype in genotypic
    test
  • Map: list of map positions of the SNPs
  • Chromosome: list of chromosomes the SNPs belong
    to
  • Idnames: list of subjects used in analysis
  • Lambda: inflation factor estimate, as computed
    using lower portion (say, 90%) of the
    distribution, and standard error of the estimate
  • Formula: formula/function used to compute
    p-values
  • Family: family of the link function / nature of
    the test
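
The relation between P1df, Lambda and Pc1df can be sketched in base R. This is an illustration of the general genomic-control idea under stated assumptions, not GenABEL's code: est_lambda_sketch() and correct_p_sketch() are hypothetical names, lambda is taken as the through-origin regression slope of observed on expected 1-d.f. chi-square quantiles over the lower 90%, and no standard error is computed.

```r
# Estimate the inflation factor from 1-d.f. p-values using the lower
# portion of the chi-square distribution.
est_lambda_sketch <- function(p, proportion = 0.9) {
  obs <- sort(qchisq(p, df = 1, lower.tail = FALSE))
  n <- length(obs)
  expd <- qchisq(ppoints(n), df = 1)        # expected quantiles under the null
  keep <- seq_len(floor(n * proportion))    # drop the top of the distribution
  sum(obs[keep] * expd[keep]) / sum(expd[keep]^2)   # slope through the origin
}

# Divide each test statistic by lambda and recompute the p-value.
correct_p_sketch <- function(p, lambda) {
  chi <- qchisq(p, df = 1, lower.tail = FALSE)
  pchisq(chi / lambda, df = 1, lower.tail = FALSE)
}

# Toy input: null statistics uniformly inflated by a factor of 1.2.
chi_inflated <- 1.2 * qchisq(ppoints(1000), df = 1)
p1df <- pchisq(chi_inflated, df = 1, lower.tail = FALSE)

lambda <- est_lambda_sketch(p1df)
pc1df <- correct_p_sketch(p1df, lambda)
```

Restricting the estimate to the lower part of the distribution keeps true associations, which sit in the upper tail, from inflating lambda.
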

12
GenABEL: Table and Graphic Functions
  • descriptives.marker(): table of marker info.
  • descriptives.trait(): table of trait info.
  • descriptives.scan(): table of scan results
  • plot.scan.gwaa(): plot of scan results
  • plot.check.marker(): plot of marker data (QC
    etc.)

13
GenABEL: Computer Efficiency
2000 subjects x 500K chip
  • Memory: 3.2 GB
  • Loading time: 4 min.
  • SNP summary: 1 min.
  • Call ccfast: 0.5 min.
  • Call qtscore: 2 min.
  • Total: < 10 min.
  • Permutation test (N = 10,000): 73-120 hrs (3-5
    days)
Intel Xeon 2.8GHz processor, SuSE Linux 9.2, R 2.4.1

14
SNPassoc
González J.R., et al. SNPassoc: an R package to
perform whole genome association studies.
Bioinformatics. 2007; 23(5):654-655.
  • SNPassoc: SNPs-based whole genome association
    studies
  • This package carries out the most common analyses
    in whole genome association studies.
    These include descriptive statistics and
    exploratory analysis of missing values,
    calculation of Hardy-Weinberg equilibrium,
    analysis of association based on generalized
    linear models (either for quantitative or binary
    traits), and analysis of multiple SNPs (haplotype
    and epistasis analysis). Permutation test and
    related tests (sum statistic and truncated
    product) are also implemented.
  • Version: 1.4-9
  • Depends: R (>= 2.4.0), haplo.stats, survival,
    mvtnorm
  • Date: 2007-Oct-16
  • Author: Juan R González, Lluís Armengol, Elisabet
    Guinó, Xavier Solé, and Víctor Moreno
  • Maintainer: Juan R González <jrgonzalez at
    imim.es>
  • License: GPL version 2 or newer
  • URL: http://www.r-project.org and
    http://davinci.crg.es/estivill_lab/snpassoc
  • In views: Genetics
  • CRAN checks: SNPassoc results

15
SNPassoc: Data Summary
  • setupSNP(data = snp-pheno.table, info = map.table, colSNPs, sep = "/", ...)
  • summary()
  • allele frequencies
  • percentage of missing values
  • HWE test
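
For a single SNP stored as "A/G"-style strings (the sep = "/" convention above), the allele frequencies and missing percentage in such a summary can be computed by hand in base R. This is a toy sketch with made-up genotypes, not SNPassoc's summary() method.

```r
# Made-up genotypes for one SNP; NA marks a failed call.
geno <- c("A/G", "G/G", NA, "A/A", "A/G", "G/G")

# Split each called genotype into its two alleles and tabulate them.
alleles <- unlist(strsplit(geno[!is.na(geno)], "/", fixed = TRUE))
allele_freq <- prop.table(table(alleles))

# Percentage of subjects with no genotype call.
missing_pct <- 100 * mean(is.na(geno))

allele_freq
missing_pct
```
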

16
SNPassoc: Association Tests
  • WGassociation(y ~ x1 + x2, data, model,
    quantitative, level = 0.95), where model is
    codominant, dominant, recessive, overdominant,
    log-additive or all
  • scanWGassociation(): only p-values
  • association(): only for selected SNPs, can do
    stratified and GxE interaction analyses
  • Results
  • Summary: a summary table by genes/chromosomes
  • WGstats: detailed output (case-control numbers,
    percentages, odds ratios / mean differences, 95%
    confidence intervals, P-value for the likelihood
    ratio test of association, AIC, etc.)
  • Pvalues: a table of p-values for each genetic
    model for each SNP
  • Plot: p-values on the -log scale, via
    plot.WGassociation()
  • Labels: returns the names of the SNPs analyzed
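
The five genetic models differ only in how the genotype is coded before it enters the GLM. The base-R sketch below makes that coding explicit and computes a likelihood ratio test p-value per model on simulated case-control data. code_genotype() and lrt_p() are hypothetical helpers, not SNPassoc functions, and the data are simulated for illustration.

```r
# Recode a genotype (0, 1 or 2 copies of the variant allele) per model.
code_genotype <- function(g, model) {
  switch(model,
         codominant     = factor(g, levels = 0:2),  # separate effect per genotype
         dominant       = as.integer(g >= 1),       # carriers vs non-carriers
         recessive      = as.integer(g == 2),       # two copies vs the rest
         overdominant   = as.integer(g == 1),       # heterozygotes vs homozygotes
         "log-additive" = g,                        # linear in allele count
         stop("unknown model"))
}

set.seed(2)
g <- rbinom(300, 2, 0.4)                          # simulated genotypes
status <- rbinom(300, 1, plogis(-0.5 + 0.4 * g))  # simulated case/control status

# Likelihood ratio test of the SNP term against an intercept-only model.
lrt_p <- function(model) {
  d <- data.frame(status = status, snp = code_genotype(g, model))
  full <- glm(status ~ snp, family = binomial(), data = d)
  null <- glm(status ~ 1, family = binomial(), data = d)
  anova(null, full, test = "Chisq")[2, "Pr(>Chi)"]
}

models <- c("codominant", "dominant", "recessive",
            "overdominant", "log-additive")
model_p <- sapply(models, lrt_p)
model_p
```

The codominant model spends 2 d.f. on the SNP; the other four spend 1 d.f. each, which is why their p-values can differ noticeably for the same data.
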

17
SNPassoc: Multiple-SNP Analysis
  • SNP x SNP Interaction
  • interactionPval(): epistasis analysis between all
    pairs of SNPs (and covariates)
  • Haplotype Analysis
  • haplo.glm(): association analysis of haplotypes
    with a response via GLM, using the R package
    haplo.stats
  • haplo.interaction(): interactions between
    haplotypes (and covariates)

18
SNPassoc: Computer Efficiency
  • 1000 subjects x 3000 SNPs
  • 5 min.: import data
  • 40 min.: setupSNP()
  • 30 min.: scanWGassociation(), only p-values
    (including permutation test)
  • Memory usage: 750 MB