Bioinformatics tools and techniques: Into the heart of darkness

1
Bioinformatics tools and techniques: Into the heart of darkness

Elaine Kenny
Colm O'Dushlaine
15/11/07

2
Summary

Simple overviews of some of the tools and methods used by EK and COD
TK notebook
get_hapmap_snps.pl: retrieve HapMap (HM) genotype information for a list of SNPs
GeneViewer.pl, cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks; score SNPs depending on how many of these landmarks they overlap with
ld_expander.pl: find SNPs in LD with SNPs of interest, based on user-specified r² and LD window (distance between SNPs)
STATA
VIM: command-line text editor
Lab website

3
TK notebook

Application for saving notes, to-do lists, daily
logs, and any other kind of textual information
in a place where you can find it all again, and
where related information is easily found
Easy to edit and rapidly searchable
DEMO editing
DEMO search

4
get_hapmap_snps.pl

Simple script to read in a 1-column list of SNPs
and retrieve HapMap genotypes
Can select population and strand
DEMO
Retrieved data can be loaded into HaploView
DEMO
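
The input stage described above (a 1-column list of SNPs) can be sketched as follows. This is a minimal Python illustration, not the code of get_hapmap_snps.pl: the function name read_snp_list is hypothetical, and it assumes one rs ID per line, with blank lines and lines starting with "#" ignored. The HapMap retrieval itself is not shown.

```python
import re

# Assumption: SNP identifiers are dbSNP-style rs IDs such as "rs123".
RSID = re.compile(r"^rs\d+$")


def read_snp_list(lines):
    """Return the unique rs IDs from a 1-column list, in first-seen order."""
    seen = set()
    snps = []
    for raw in lines:
        snp = raw.strip()
        if not snp or snp.startswith("#"):
            continue  # skip blank lines and comments
        if not RSID.match(snp):
            raise ValueError(f"not an rs ID: {snp!r}")
        if snp not in seen:
            seen.add(snp)
            snps.append(snp)
    return snps


# In-memory stand-in for the lines of a SNP list file.
example = ["rs123\n", "rs456\n", "\n", "rs123\n"]
print(read_snp_list(example))  # ['rs123', 'rs456']
```

Removing duplicates before any lookup keeps the number of requests down, which matters when the genotypes come from a remote service.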

5
cross_ref_scored.pl

Score SNPs based on how many putatively
functional regions they overlap with
On a per gene / chromosome basis
Gene basis
Type: perl cross_ref_scored.pl file_A file_B file_C ...
where
  file_A: 2-column file of SNPs (format: id, location)
  file_B: 3-column file of EXONS (format: id/name, start, stop)
  file_C ...: whatever you want (format: id/name, start, stop), i.e. other regions like CpGs, TFBS, clusters. Any order.
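
The scoring idea on this slide can be sketched in Python as below. This is an illustration under stated assumptions, not the logic of cross_ref_scored.pl itself: the function names are hypothetical, the files are assumed to be whitespace-delimited with no header row, every region set is assumed to carry equal weight, and a SNP scores one point per region set that contains its location.

```python
def parse_regions(lines):
    """Parse 3-column lines (id/name, start, stop) into (start, stop) tuples."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        start, stop = int(fields[1]), int(fields[2])
        regions.append((min(start, stop), max(start, stop)))
    return regions


def score_snps(snp_lines, region_sets):
    """Score each SNP by how many region sets have a region containing it."""
    scores = {}
    for line in snp_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        snp_id, pos = fields[0], int(fields[1])
        scores[snp_id] = sum(
            any(start <= pos <= stop for start, stop in regions)
            for regions in region_sets
        )
    return scores


# Made-up coordinates standing in for file_A, file_B and file_C.
snps = ["rs1 150", "rs2 900", "rs3 5000"]
exons = parse_regions(["exon1 100 200", "exon2 850 950"])
cpgs = parse_regions(["cpg1 120 180"])
print(score_snps(snps, [exons, cpgs]))  # {'rs1': 2, 'rs2': 1, 'rs3': 0}
```

The linear scan over every region is fine for a single gene; genome-wide input would want sorted regions or an interval index instead.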

6
cross_ref_scored.pl example output

Can then be merged with HapMap / Perlegen to retrieve MAF data for SNPs

7
Merge cross_ref_scored data with HapMap / Perlegen data using merge_per_hap.pl

Type: perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt
where
  hapmap.txt: 3-column file (format: rsid, ref_allele, ref_allele_freq)
  perlegen.txt: 3-column file (format: rsid, ref_allele, ref_allele_freq)
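
A merge of this kind can be sketched in Python as follows. This is not the code of merge_per_hap.pl: the function names are hypothetical, the files are assumed to be whitespace-delimited with no header row, the scored file is assumed to carry the rsid in its first column, and MAF is derived from the reference allele frequency as min(f, 1 - f), which holds for biallelic SNPs.

```python
def read_ref_freqs(lines):
    """Map rsid -> (ref_allele, ref_allele_freq) from 3-column lines."""
    freqs = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        rsid, ref_allele, freq = fields
        freqs[rsid] = (ref_allele, float(freq))
    return freqs


def maf(ref_freq):
    """Minor allele frequency of a biallelic SNP from its ref allele frequency."""
    return min(ref_freq, 1.0 - ref_freq)


def merge(scored_lines, perlegen, hapmap):
    """Attach Perlegen and HapMap MAFs to each scored SNP (None if absent)."""
    merged = []
    for line in scored_lines:
        fields = line.split()
        if not fields:
            continue
        rsid = fields[0]
        row = {"rsid": rsid, "fields": fields[1:]}
        for name, table in (("perlegen", perlegen), ("hapmap", hapmap)):
            entry = table.get(rsid)
            row[name + "_maf"] = maf(entry[1]) if entry else None
        merged.append(row)
    return merged


# Made-up values standing in for the three input files.
perlegen = read_ref_freqs(["rs1 A 0.75", "rs2 C 0.5"])
hapmap = read_ref_freqs(["rs1 A 0.25"])
for row in merge(["rs1 2", "rs3 0"], perlegen, hapmap):
    print(row)
```

SNPs missing from either source are kept with an empty MAF rather than dropped, so the scored list stays complete.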

8
cross_ref.pl applied to WGA data

cross_ref.pl: scoring SNPs throughout the genome
Data analysed on a coding / non-coding basis
(coding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/coding_non_synon_SNPs_UCSC.clean3 WGA_databases/coding_synon_SNPs_UCSC.clean2 WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid1 WGA_databases/Triplexes_may2006.bed2 WGA_databases/splice_site_SNPs_UCSC.clean2 > Overlapped_regions_scored.WTCCC.chr22.coding.log
(input-dependent, coding/non-coding dependent, arbitrary)
(noncoding)
perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/TFBS.chr221 WGA_databases/CpG_islands_UCSC.uniqid1 WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean1 WGA_databases/promoters_knowngene_hg18.txt1 WGA_databases/sno_or_miRNA_UCSC.uniqid1 > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log

9
cross_ref.pl

cross_ref.pl output
Load into STATA. If SNPs have e.g. association
p-values, calculate adjusted p-value (R. Anney)
as -log10P
cross_ref_score

10
GeneViewer.pl

GeneViewer.pl: visualise overlapping features (e.g. exons, SNPs etc.) along e.g. your gene of interest (HTML output)

11
ld_expander.pl

Find proxies (SNPs in LD) for a list of SNPs
User specifies the r² and LD window
Currently configured to obtain proxies from HM CEU
Result is a list of additional proxy SNPs that have been obtained by LD expansion
DEMO
Note: don't LD-expand >150000 SNPs, or HapMap will ban you! COD has an alternative version that uses local pre-computed pairwise LD SNP files
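
LD expansion against local pairwise LD data can be sketched in Python as below. This is an illustration only, not the code of ld_expander.pl or of COD's alternative version: the function name, the tuple layout (snp_a, pos_a, snp_b, pos_b, r2) and the default thresholds are all assumptions made for the example.

```python
def ld_expand(query_snps, ld_pairs, min_r2=0.8, max_distance=250000):
    """Return proxy SNPs in LD with any query SNP.

    ld_pairs is an iterable of (snp_a, pos_a, snp_b, pos_b, r2) tuples,
    a hypothetical layout for pre-computed pairwise LD records.
    """
    query = set(query_snps)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in ld_pairs:
        # Apply the user-specified r2 threshold and LD window.
        if r2 < min_r2 or abs(pos_a - pos_b) > max_distance:
            continue
        if snp_a in query and snp_b not in query:
            proxies.add(snp_b)
        elif snp_b in query and snp_a not in query:
            proxies.add(snp_a)
    return sorted(proxies)


# Made-up pairwise LD records.
pairs = [
    ("rs1", 1000, "rs2", 1500, 0.95),    # passes both filters
    ("rs1", 1000, "rs3", 900000, 0.90),  # outside the LD window
    ("rs4", 2000, "rs1", 1000, 0.50),    # r2 too low
    ("rs5", 3000, "rs6", 3100, 1.00),    # neither SNP is in the query list
]
print(ld_expand(["rs1"], pairs))  # ['rs2']
```

Working from a local file avoids sending one request per SNP to a remote server, which is the concern raised in the note above.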

12
STATA

Extremely powerful and flexible
>65k rows handled (shock horror!)
Can write scripts to automate tasks, e.g. read in file, do analysis, save results
When you use the GUI to run commands, they are shown in the command window, so you can save them in a do file
COD, EK and R. Anney strongly advocate this as a platform for both file manipulation and statistical analysis

13
STATA example using WTCCC data

Bipolar Disorder, Coronary Artery Disease, Crohn's Disease, Hypertension, Rheumatoid Arthritis, Type 1 Diabetes, Type 2 Diabetes

14
DATA FORMAT

3 folders
  Basic: each case collection against the pooled control groups 58C and UKBS
  Combined cases: combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune)
  Combined controls: combining other case collections as controls
Data are split by chromosome

15
Questions

How do I get all of the chromosome data for my gene of interest into one file?
How do I easily search all of the SNP information for my gene(s) of interest?
Create a .do file for all manipulations that you want to carry out on the data
DEMO
Good starting resource: http://www.ats.ucla.edu/stat/stata/
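
Outside STATA, the first question can be sketched in Python as below. This is an illustration only, not the .do file from the demo: the function name is hypothetical, and each input table is assumed to be whitespace-delimited with the rsid in column 1 and the chromosomal position in column 2. Real WTCCC result files may be laid out differently.

```python
def collect_gene_rows(tables, start, stop):
    """Gather rows whose position falls within [start, stop] from several tables.

    tables maps a label (e.g. a case collection) to an iterable of
    whitespace-delimited lines: rsid, position, then any other columns.
    """
    combined = []
    for label, lines in tables.items():
        for line in lines:
            fields = line.split()
            if len(fields) < 2 or not fields[1].isdigit():
                continue  # skip blank lines and header rows
            if start <= int(fields[1]) <= stop:
                combined.append([label] + fields)
    return combined


# Made-up rows standing in for per-collection files on one chromosome.
tables = {
    "BD": ["rsid pos p", "rs1 100 0.01", "rs2 5000 0.2"],
    "T2D": ["rs1 100 0.5"],
}
for row in collect_gene_rows(tables, 50, 200):
    print("\t".join(row))
```

Tagging each row with its source label keeps the combined file searchable by both SNP and case collection.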

16
VIM

Vi Improved. Cross-platform text editor, mainly used on UNIX (also available for Windows)
Full list of commands is outside the scope of this demonstration
Very fast and efficient, esp. with search and replace functions on large datasets
Regular expression pattern matching
DEMO
Integrates with Cygwin (www.cygwin.com: very useful UNIX emulator for Windows)

17
Group website

Some useful stuff up there!
Please send information about current projects
etc. Good for our image as a group and minimal
effort required on your part
DEMO

18
Conclusions

Small summary of some things you can do
Slides and video demonstrations will be online at http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/
COD & EK available for advice (Fridays 9-9.02am)
These things will help you in your work!!
