Bioinformatics tools and techniques: Into the heart of darkness

Provided by: colmodu

1
Bioinformatics tools and techniques: Into the
heart of darkness
  • Elaine Kenny
  • Colm O'Dushlaine
  • 15/11/07

2
Summary
  • Simple overviews of some of the tools and methods
    used by EK and COD
  • TK notebook
  • get_hapmap_snps.pl: retrieve HapMap (HM) genotype
    information for a list of SNPs
  • GeneViewer.pl / cross_ref.pl: visualise e.g. SNPs
    in the context of other genomic landmarks. Score
    SNPs depending on how many of these landmarks
    they overlap with
  • ld_expander.pl: find SNPs in LD with SNPs of
    interest, based on user-specified r² and LD
    window (distance between SNPs)
  • STATA
  • VIM: command-line text editor
  • Lab website

3
TK notebook
  • Application for saving notes, to-do lists, daily
    logs, and any other kind of textual information
    in a place where you can find it all again, and
    where related information is easily found
  • Easy to edit and rapidly searchable
  • DEMO editing
  • DEMO search

4
get_hapmap_snps.pl
  • Simple script to read in a 1-column list of SNPs
    and retrieve HapMap genotypes
  • Can select population and strand
  • DEMO
  • Retrieved data can be loaded into HaploView
  • DEMO

5
cross_ref_scored.pl
  • Score SNPs based on how many putatively
    functional regions they overlap with
  • On a per-gene / per-chromosome basis
  • Gene basis. Type:
      perl cross_ref_scored.pl file_A file_B file_C ...
    where:
      • file_A: 2-column file of SNPs (format:
        id, location)
      • file_B: 3-column file of exons (format:
        id/name, start, stop)
      • file_C ...: whatever you want, i.e. other
        regions like CpGs, TFBS, clusters, in any
        order (format: id/name, start, stop)
6
cross_ref_scored.pl example output
Can then be merged with HapMap / Perlegen to
retrieve MAF data for SNPs

7
Merge cross_ref_scored data with HapMap / Perlegen
data using merge_per_hap.pl
  • Type:
      perl merge_per_hap.pl perlegen.txt hapmap.txt
      overlapped_region_scored.txt
  • where:
      • hapmap.txt: 3-column file (format: rsid,
        ref_allele, ref_allele_freq)
      • perlegen.txt: 3-column file (format: rsid,
        ref_allele, ref_allele_freq)
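A merge of this kind could look roughly like the Python below. The slides do not show the output format of merge_per_hap.pl, so the row layout here (rsid, score, Perlegen MAF, HapMap MAF) is an assumption, as are the function names. MAF is derived from the reference allele frequency as min(f, 1 - f), which assumes biallelic SNPs.

```python
def read_freqs(lines):
    """Parse 3-column lines (rsid, ref_allele, ref_allele_freq) into a dict."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            table[fields[0]] = (fields[1], float(fields[2]))
    return table


def maf(ref_allele_freq):
    """Minor allele frequency for a biallelic SNP, rounded to 4 decimals."""
    return round(min(ref_allele_freq, 1.0 - ref_allele_freq), 4)


def merge_scores(scored, perlegen, hapmap):
    """Attach Perlegen and HapMap MAFs to each scored SNP.

    A SNP missing from a source gets None for that source.
    """
    rows = []
    for rsid, score in scored.items():
        p_maf = maf(perlegen[rsid][1]) if rsid in perlegen else None
        h_maf = maf(hapmap[rsid][1]) if rsid in hapmap else None
        rows.append((rsid, score, p_maf, h_maf))
    return rows


# Toy stand-ins for perlegen.txt, hapmap.txt and the scored SNP file.
perlegen = read_freqs(["rs1 A 0.30", "rs2 G 0.90"])
hapmap = read_freqs(["rs1 A 0.25"])
scored = {"rs1": 2, "rs2": 1}
merged = merge_scores(scored, perlegen, hapmap)
print(merged)  # [('rs1', 2, 0.3, 0.25), ('rs2', 1, 0.1, None)]
```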

8
cross_ref.pl applied to WGA data
  • cross_ref.pl: scoring SNPs throughout the genome
  • Data analysed on a coding / non-coding basis
  • (coding)
  • perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/coding_non_synon_SNPs_UCSC.clean3 WGA_databases/coding_synon_SNPs_UCSC.clean2 WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid1 WGA_databases/Triplexes_may2006.bed2 WGA_databases/splice_site_SNPs_UCSC.clean2 > Overlapped_regions_scored.WTCCC.chr22.coding.log
  • (input-dependent, coding/non-coding dependent,
    arbitrary)
  • (noncoding)
  • perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/TFBS.chr221 WGA_databases/CpG_islands_UCSC.uniqid1 WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean1 WGA_databases/promoters_knowngene_hg18.txt1 WGA_databases/sno_or_miRNA_UCSC.uniqid1 > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log

9
cross_ref.pl
  • cross_ref.pl output
  • Load into STATA. If SNPs have e.g. association
    p-values, calculate adjusted p-value (R. Anney)
    as -log10P
    cross_ref_score

10
GeneViewer.pl
  • GeneViewer.pl: visualise overlapping features
    (e.g. exons, SNPs etc.) along e.g. your gene of
    interest (HTML output)

11
ld_expander.pl
  • Find proxies (SNPs in LD) for a list of SNPs
  • User specifies the r² and LD window
  • Currently configured to obtain proxies from HM
    CEU
  • Result is a list of additional proxy SNPs that
    have been obtained by LD expansion
  • DEMO
  • Note: don't LD-expand >150,000 SNPs, or HapMap
    will ban you! COD has an alternative version
    that uses local pre-computed pairwise LD SNP files

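LD expansion from local pre-computed pairwise LD data can be sketched as below. This is not ld_expander.pl itself. The record layout (snp_a, pos_a, snp_b, pos_b, r2), the function name ld_expand and the default thresholds are assumptions chosen for illustration.

```python
def ld_expand(query_snps, ld_pairs, min_r2=0.8, max_distance=250_000):
    """Return proxy SNPs in LD with any query SNP.

    ld_pairs is an iterable of (snp_a, pos_a, snp_b, pos_b, r2) records,
    e.g. parsed from a local pairwise LD file. A partner SNP is kept as a
    proxy when r2 >= min_r2 and the two SNPs lie within max_distance bp.
    SNPs already in the query list are not reported as proxies.
    """
    query = set(query_snps)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in ld_pairs:
        if r2 < min_r2 or abs(pos_a - pos_b) > max_distance:
            continue
        if snp_a in query and snp_b not in query:
            proxies.add(snp_b)
        elif snp_b in query and snp_a not in query:
            proxies.add(snp_a)
    return sorted(proxies)


# Toy pairwise LD records.
pairs = [
    ("rs1", 1_000, "rs2", 5_000, 0.95),    # kept: high r2, close by
    ("rs1", 1_000, "rs3", 900_000, 0.99),  # dropped: outside the LD window
    ("rs4", 2_000, "rs1", 1_000, 0.50),    # dropped: r2 too low
    ("rs5", 3_000, "rs1", 1_000, 0.85),    # kept: query SNP is second in pair
]
proxies = ld_expand(["rs1"], pairs)
print(proxies)  # ['rs2', 'rs5']
```

Working from local files in this way avoids sending large numbers of queries to a remote server.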
12
STATA
  • Extremely powerful and flexible
  • >65k rows handled (shock horror!)
  • Can write scripts to automate tasks, e.g. read in
    a file, do the analysis, save the results
  • When you use the GUI to run commands, they are
    shown in the command window, so you can save
    them in a .do file
  • COD, EK and R. Anney strongly advocate this as a
    platform for both file manipulation and
    statistical analysis

13
STATA example using WTCCC data
Bipolar Disorder, Coronary Artery Disease,
Crohn's Disease, Hypertension, Rheumatoid
Arthritis, Type 1 Diabetes, Type 2 Diabetes

14
DATA FORMAT
  • 3 folders
      • Basic: each case collection against the
        pooled control groups 58C and UKBS
      • Combined cases: combining phenotypically
        relevant case collections (e.g. RA/T1D,
        autoimmune)
      • Combined controls: combining other case
        collections as controls
  • Data are split by chromosome

15
Questions
  • How do I get all of the chromosome data for my
    gene of interest into one file?
  • How do I search easily all of the SNP information
    for my gene(s) of interest?
  • Create a .do file for all manipulations that
    you want to carry out to the data
  • DEMO
  • Good starting resource: http://www.ats.ucla.edu/stat/stata/
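The slides do this step with a STATA .do file, which is not reproduced here. As a rough equivalent in Python, the sketch below pulls every row for a gene region out of several tab-delimited result files and writes them to one output. The column names (rsid, chromosome, pos, pvalue), the coordinates and the function names are all hypothetical.

```python
import csv
import io


def extract_region(sources, chrom, start, stop):
    """Collect rows from several tab-delimited sources that fall in a region.

    Each source is an open file (or file-like object) with a header row
    containing at least 'chromosome' and 'pos' columns (assumed names).
    """
    kept = []
    for handle in sources:
        for row in csv.DictReader(handle, delimiter="\t"):
            if row["chromosome"] == chrom and start <= int(row["pos"]) <= stop:
                kept.append(row)
    return kept


def write_rows(rows, handle):
    """Write the collected rows to one tab-delimited output with a header."""
    if not rows:
        return
    writer = csv.DictWriter(
        handle, fieldnames=list(rows[0].keys()),
        delimiter="\t", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Toy stand-ins for two per-chromosome result files.
file_one = io.StringIO(
    "rsid\tchromosome\tpos\tpvalue\n"
    "rs1\t22\t1500\t0.01\n"
    "rs2\t22\t9000\t0.5\n"
)
file_two = io.StringIO(
    "rsid\tchromosome\tpos\tpvalue\n"
    "rs3\t22\t1800\t0.2\n"
)
rows = extract_region([file_one, file_two], "22", 1000, 2000)
out = io.StringIO()
write_rows(rows, out)
print(out.getvalue())
```

With real data, the io.StringIO objects would be replaced by files opened with open(path, newline="").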

16
VIM
  • Vi Improved. Mainly UNIX but cross-platform
    text editor (available for Windows).
  • Full list of commands outside scope of this
    demonstration
  • Very fast and efficient, esp. with search and
    replace functions on large datasets
  • Regular expression pattern matching
  • DEMO
  • Integrates with Cygwin (www.cygwin.com, a very
    useful UNIX emulator for Windows)

17
Group website
  • Some useful stuff up there!
  • Please send information about current projects
    etc. Good for our image as a group and minimal
    effort required on your part
  • DEMO

18
Conclusions
  • Small summary of some things you can do
  • Slides and video demonstrations will be online
    at http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/
  • COD and EK available for advice (Fridays
    9-9.02am)
  • These things will help you in your work!!