Title: Keith Satterley, Bioinformatics Division, WEHI


1
Bioinformatics Seminar 13/11/07
  • Keith Satterley, Bioinformatics Division, WEHI

2
Summary
  • GABOS: Get A Bit Of Sequence.
  • GAFEP: Get A Few Exon Primers.
  • Functions and Facilities
  • WEB interface.
  • Command Line Interface.
  • Data Management
  • Genome data.
  • Result data.
  • Tools Used
  • Perl
  • HTML
  • PHP
  • Javascript
  • Availability.
  • Future Work.

3
  • GABOS version 1 is at
    http://unix28.alpha.wehi.edu.au/bioinformatics/gabos
  • WEB Page version 1 limitations
  • Exons, DNA, Transcripts available.
  • Genomes are a hard coded list of latest version
    data only.
  • Annotation File is a hard coded list covering all
    genomes.
  • Chromosome selection was a list of the common
    chromosome filenames.
  • Data Files Availability
  • All data has been downloaded from UCSC's download
    site. It is described at
  • http://hgdownload.cse.ucsc.edu/downloads.html and
    can be downloaded by FTP from
  • ftp://hgdownload.cse.ucsc.edu/goldenPath/
  • Genome data is stored on the WEHI Disk Server,
    accessible from
  • WEHI Unix computers:
  • /home/users/lab0605/Bioinformatics/databases/genomes/UCSC
  • WEHI Windows computers: map a network drive to
  • \\unix33\bioinformatics
  • WEHI Macintoshes: Connect to Server at
  • smb://unix33/Bioinformatics

4
  • Genomes at WEHI
  • Jul 24 01:05 canFam -> canFam2
  • Jul 23 15:16 canFam1
  • Jul 23 15:16 canFam2
  • Jul 22 01:17 danRer -> danRer4
  • Jul 23 10:33 danRer3
  • Jul 23 15:20 danRer4
  • Nov 6 01:10 dm -> dm3
  • Nov 5 16:37 dm3
  • Jul 22 01:17 galGal -> galGal3
  • Jul 20 17:27 galGal2
  • Jul 23 10:11 galGal3
  • Jul 22 01:17 hg -> hg18
  • Jul 23 10:29 hg17
  • Jul 23 10:29 hg18
  • Aug 24 01:10 mm -> mm9
  • Jul 23 10:30 mm7
  • Aug 23 14:50 mm8
  • Aug 23 18:12 mm9

5
  • Chromosome data Files
  • Aug 23 14:09 chr9_random.fa
  • Aug 23 14:09 chrM.fa
  • Aug 23 14:09 chrUn_random.fa
  • Aug 23 14:14 chrX.fa
  • Aug 23 14:14 chrX_random.fa
  • Aug 23 14:14 chrY.fa
  • Aug 23 14:16 chrY_random.fa
  • Jul 23 16:11 chr9.fa
  • Jul 23 16:11 chrM.fa
  • Jul 23 16:13 chrNA_random.fa
  • Jul 23 16:14 chrUn_random.fa
  • Jul 23 16:14 md5sum.txt
  • Jul 23 16:14 README.txt
  • Jul 23 16:16 scaffoldNA_random.fa
  • Jul 23 16:16 scaffoldUn_random.fa
  • Jun 22 04:05 chr2L.fa
  • Annotation Data Files

6
  • Data Management
  • Amount of data
  • How many genomes are local? Currently 10, 96 GB.
  • 19 Vertebrates available, 9 sequence only.
  • 15 Insects, 5 Nematodes, 4 others available.
  • How many versions of each? mm7, mm8, mm9?
  • 2 or 3 of each?
  • Chromosome data: 10-50 files per genome.
  • Annotation data: 5-10 files per genome version
  • RefSeq, genscan, mgc, xenoRef, uniGene, refFlat,
    ESTs, mRNAs
  • Up to date data!
  • A tool is currently being written to check UCSC
    nightly, then
  • download, unpack and sort annotation files.
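
The slides do not say how the nightly check decides that a local file is out of date. One plausible approach, sketched here in Python for illustration (GABOS itself is written in Perl), is to compare local files against the md5sum.txt listing that appears alongside the chromosome files. The function names and the "digest, whitespace, filename" line layout are assumptions, not part of GABOS.

```python
import hashlib
from pathlib import Path


def parse_md5sum(text):
    """Parse md5sum.txt-style content into {filename: md5 hex digest}.

    Assumes one "<digest> <filename>" pair per line (hypothetical layout).
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            expected[name.lstrip("*")] = digest.lower()
    return expected


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large chromosome files are not loaded whole."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stale_files(directory, md5sum_text):
    """Names listed in the checksum file that are missing locally or differ."""
    directory = Path(directory)
    stale = []
    for name, digest in sorted(parse_md5sum(md5sum_text).items()):
        local = directory / name
        if not local.is_file() or file_md5(local) != digest:
            stale.append(name)
    return stale
```

A nightly job could fetch the remote md5sum.txt, call stale_files on the local genome directory, and download only the files it returns.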

7
  • GABOS Sequence Retrieval Features
  • Specify Search Criteria as either
  • Gene Name List
  • as in Annotation Files
  • NM_001037759, NM_145692, NM_027033, NM_013715 as
    in RefSeq.txt
  • Sgk3, 4930418G15Rik, Cops5, Sulf1 as in
    RefFlat.txt
  • Chromosome Sequence Range specification.
  • Chr1013,500,000 - 14,550,000
  • This will select all genes in this region that
    are defined in the annotation file(s) specified.
  • Exons (incl. EST exons), Transcripts of Genes or
    straight DNA sequence can be retrieved.
  • Specify either strand or both strands.
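
The region selection described on this slide can be sketched as follows, in Python for illustration only. It assumes the standard UCSC refFlat column layout (gene name, RefSeq name, chromosome, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, tab-separated, 0-based half-open coordinates). Whether GABOS selects genes that overlap the region or only those fully inside it is not stated; this sketch assumes overlap. All names below are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_name: str
    refseq: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    exons: List[Tuple[int, int]]


def parse_refflat_line(line):
    """Turn one tab-separated refFlat line into a Gene record."""
    f = line.rstrip("\n").split("\t")
    # exonStarts and exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    return Gene(f[0], f[1], f[2], f[3], int(f[4]), int(f[5]),
                list(zip(starts, ends)))


def genes_in_region(genes, chrom, start, end, strand=None):
    """Genes whose transcript overlaps [start, end) on chrom.

    strand=None keeps both strands; "+" or "-" keeps one (assumed behaviour).
    """
    return [g for g in genes
            if g.chrom == chrom
            and g.tx_start < end and g.tx_end > start
            and (strand is None or g.strand == strand)]
```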

8
  • Extra Sequence Parameters
  • Range of bases in data object (e.g. bps in an
    Exon)
  • 1-e: all, base 1 to the end base (the default)
  • 1-10: bases 1 to 10
  • 10-e: base 10 to end base in object.
  • Range of objects requested (e.g. a range of
    Exons)
  • 1-e: all exons (the default)
  • 1-3: exons 1 to 3.
  • 1: first exon only
  • e: last exon only
  • Possible Extensions
  • (e-3)-e last three objects (or bases)
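
The range notation above ("1-e", "1-10", "10-e", "1", "e") can be resolved with a few lines of code. This is a minimal Python sketch, not the GABOS implementation: the function names are hypothetical, out-of-bounds ranges are rejected rather than clipped (an assumption), and the proposed "(e-3)-e" extension is not handled.

```python
def parse_range(spec, n):
    """Resolve a range spec to 1-based inclusive (first, last) for n items.

    "e" stands for the last item; a single token selects one item.
    """
    def position(token):
        token = token.strip()
        return n if token == "e" else int(token)

    if "-" in spec:
        first, last = (position(t) for t in spec.split("-", 1))
    else:
        first = last = position(spec)
    if not 1 <= first <= last <= n:
        raise ValueError(f"range {spec!r} is invalid for {n} items")
    return first, last


def select(items, spec):
    """Apply a range spec to a sequence of bases or a list of exons."""
    first, last = parse_range(spec, len(items))
    return items[first - 1:last]
```

The same function serves both uses on the slide: select(exon_sequence, "1-10") picks bases within one object, and select(exon_list, "1-3") picks objects.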

9
  • GABOS Extras
  • Specify the line length of the FASTA output file.
  • Output Sequence Lines ONLY.
  • Output Fasta Description Lines ONLY.
  • Concatenate ALL Sequences.
  • Concatenate ONLY Sequence from a DNA object (each
    gene's exons concatenated, for example).
  • String of characters to be inserted BEFORE each
    DNA object.
  • String of characters to be inserted AFTER each
    DNA object.
  • Specify flanking bases.
  • Show co-ordinates relative to Chromosome, Exon,
    Transcript
  • Uses either RefSeq or Browser gene names in
    refFlat.txt
  • GAFEP (Get a Few Exon Primers)
  • Use output of GABOS to find primers around each
    exon.

10
  • GABOS Command Line Version (CLI).
  • Same code. Program detects environment and
    adjusts accordingly.
  • CLI use of GABOS caters for programmatic use of
    the tool as part of other tasks.
  • For example, collecting 5000 bases before a
    transcript and 5000 into the transcript, to be
    used for promoter/regulation searching across
    thousands of genes.

CLI example:
gabos -afile refFlat.txt -genome mm9 -seqrange 4,482,560-4,483,185 -chr 1 -pre 420 -post 420 -fastaonly > my_results.fa
Options can be in any order. Output can be redirected to a
file as shown. A file of gene names could be used
as input instead of a chromosome sequence
range. gabos -help lists all options.
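
The promoter use case mentioned on this slide (5000 bases before a transcript and 5000 into it) depends on strand: for a minus-strand gene the transcript starts at its highest coordinate and the retrieved sequence must be reverse-complemented. The Python sketch below works through that arithmetic. It assumes 0-based half-open coordinates as in UCSC annotation tables; the function names are hypothetical and this is not how GABOS is known to compute flanks.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(tx_start, tx_end, strand, upstream=5000, downstream=5000):
    """Chromosome window covering `upstream` bases before the transcript
    start and `downstream` bases into the transcript.

    Returns 0-based half-open (lo, hi); lo is clipped at 0.
    """
    if strand == "+":
        lo, hi = tx_start - upstream, tx_start + downstream
    elif strand == "-":
        # Transcription starts at tx_end and runs towards lower coordinates.
        lo, hi = tx_end - downstream, tx_end + upstream
    else:
        raise ValueError(f"unknown strand {strand!r}")
    return max(lo, 0), hi


def extract(chrom_seq, lo, hi, strand):
    """Slice the chromosome and orient the piece along the gene's strand."""
    piece = chrom_seq[lo:hi]
    return piece if strand == "+" else reverse_complement(piece)
```

Python slicing stops at the end of the string, so a window that runs past the end of a chromosome is shortened rather than raising an error.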
11
  • CLI additional abilities.
  • Gene lists read from a file or piped in.
  • Debugging options available.
  • Specification of alternate locations (enables use
    of the program at other sites without
    modification) for
  • Annotation files.
  • Genome data files.
  • Checks if data files are latest version and
    updates if not (To be replaced with upgraded
    procedure).

12
GABOS Command Line options
All GAFEP programs can also be run at the command
line, in particular Combine_overlapping_exons,
Create_primers1, Create_primers2, Makep3i and
P3out2tab.
  • -addend=s
  • -addstart=s
  • -dna=s
  • -basedir=s
  • -genome=s
  • -afile=s
  • -adir=s
  • -gdir=s
  • -check!
  • -name=s
  • -namep=s
  • -namef=s
  • -chr=s
  • -seqrange=s
  • -strand=s
  • -dataobject=s
  • -objectrange=s
  • -baserange
  • -seqonly
  • -fastaonly
  • -linelength=i
  • -relative=s
  • -pre=i
  • -post=i
  • -v!
  • -debug1=i
  • -debug2=i
  • -debug3=i
  • -debug4=i
  • -debug5=i
  • -debug6=i
  • -debugall=i
  • -h|help|?
  • -version

13
  • Demo of GABOS version 2.
  • http://unix28.alpha.wehi.edu.au/bioinformatics/gabos/testing_index.php
  • Improvements
  • Automatically reads genomes available
  • Automatically shows chromosome data for genome
    selected.
  • Automatically shows Annotation data files for
    genome selected.
  • Includes ability to read EST data files.
  • Uses alternate gene name in refFlat.txt.
  • Faster processing of large data files
    using/making presorted versions.

14
  • GAFEP: Get A Few Exon Primers.
  • This is a suite of programs.
  • Combines overlapping exons into one CExon.
  • Displays Primer3 options and collects choices.
  • Creates input files for Primer3 in the required
    format.
  • Runs Primer3, displays output on the web page and
    reformats the output suitable for pasting into
    Excel.
  • The same code runs from the web interface or
    from a Command Line Interface.
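
Primer3 reads and writes "Boulder-IO" records: TAG=VALUE lines, with a line containing only "=" ending each record. The Python sketch below shows one way to write such an input record for an exon target and to turn output records into tab-separated rows for pasting into Excel. The tag names follow the Primer3 2.x manual and are an assumption here; the slides do not list the tags GAFEP writes, and the Primer3 release in use at the time may have used older names. The function names are hypothetical.

```python
def boulder_record(seq_id, template, target_start, target_length,
                   product_size="100-300"):
    """One Primer3 input record asking for primers around a target region.

    target_start is 1-based because PRIMER_FIRST_BASE_INDEX is set to 1.
    """
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("SEQUENCE_TARGET", f"{target_start},{target_length}"),
        ("PRIMER_FIRST_BASE_INDEX", "1"),
        ("PRIMER_PRODUCT_SIZE_RANGE", product_size),
    ]
    return "".join(f"{key}={value}\n" for key, value in tags) + "=\n"


def parse_boulder(text):
    """Split Boulder-IO text into a list of {tag: value} dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if line == "=":
            records.append(current)
            current = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return records


def to_tab(records, columns):
    """Tab-separated table with a header row; missing tags become empty cells."""
    rows = ["\t".join(columns)]
    for record in records:
        rows.append("\t".join(record.get(c, "") for c in columns))
    return "".join(row + "\n" for row in rows)
```

Passing columns such as SEQUENCE_ID, PRIMER_LEFT_0_SEQUENCE and PRIMER_RIGHT_0_SEQUENCE to to_tab would give one spreadsheet row per exon, assuming Primer3 2.x output tag names.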

15
Combining exons to reduce the number of primers
needed.
[Figure: overlapping exons combined into 1 CExon]
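
Combining overlapping exons is an interval-merging problem: sort the exons by start and extend the current combined exon while the next one overlaps it. The Python sketch below illustrates the idea; it is not the Combine_overlapping_exons program itself. It assumes 0-based half-open (start, end) coordinates, and it also merges exons that merely touch, which is a design choice rather than documented GAFEP behaviour.

```python
def combine_overlapping_exons(exons):
    """Merge overlapping (start, end) exon intervals into combined exons.

    Returns a sorted list of non-overlapping (start, end) tuples.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous combined exon: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]
```

Exons from several transcripts of one gene can be passed in together, so that a single primer pair covers each combined exon instead of one pair per transcript variant.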
18
  • Demonstration of GAFEP

19
GAFEP Output
21
  • An example application
  • Ben Kile's lab is using GABOS/GAFEP to create
    primers to search for sequence variations
    caused by ENU mutations in mice.

22
Random chemical mutagenesis in the mouse
N-ethyl-N-nitrosourea (ENU)
  • Alkylating agent
  • Point mutagen
  • Efficiently mutates mouse spermatogonial stem
    cells
  • Male mice treated with ENU produce offspring
    heterozygous for ENU-induced mutations at the
    rate of 1 mutation per 1.5 megabases

23
Phenotyping screen measuring platelet number
[Figure: blood test of mutant offspring; platelet counts, axis: platelet count (x10^3/uL)]
24
Mapping strategy for dominant mutations
[Figure: breeding scheme with labels Affected, Wild-type, C57BL/6, Balb/c, 1st Outcross, F1 Generation, 2nd Outcross, F2 Generation (Affected, Unaffected); animals marked "m"]
25
Mapping strategy for dominant mutations
  • Genome-wide scan with 80-100 microsatellites
  • 20 affected and 20 unaffected animals
  • Result: mutation assigned to a chromosome
  • 2. Fine mapping
  • 200-1,000 informative meioses, genotyped with
    SSLPs at increasing density
  • Result: candidate interval refined to 1-3 Mb
  • Issues
  • Recombination cold spots
  • Polymorphism deserts

[Figure: SNP density map of mouse chromosome 1 (C57BL/6 v 129Sv)]
26
Candidate intervals
[Figure: two candidate intervals, labelled "Heaven" and "Hell": Chromosome 2, 20-21 Mb and Chromosome 11, 70-71 Mb]
27
Candidate gene sequencing
  • Prioritize candidates for sequencing on the basis
    of
  • Known function
  • Homology to other genes of known function
  • Tissue expression pattern
  • Domain structure
  • Exhaustive literature searches.


28
Candidate gene sequencing
1. Automated PCR primer design
Robotic liquid handling
2. Genomic PCR
In-well template clean-up
3. Direct amplicon sequencing
4. Capillary electrophoresis
5. Sequence analysis
29
  • Tools used to develop GABOS/GAFEP
  • Perl programming language for all programs.
  • Web interface
  • HTML coding
  • PHP embedded in the HTML and processed by the
    webserver before the resulting HTML is sent to
    the client.
  • Javascript processed by the client's web browser
    (Mozilla Firefox or Safari for example)

30
WEHI Computing Layout
  • Unix server unix28: Apache webserver; PHP
    processed here; HTML produced here; GABOS/GAFEP
    on the unix28 disk.
  • unix33: genome data, reached from unix28 over
    NFS; downloaded from UCSC by FTP.
  • Client (Mac, Windows), connected over WAN/LAN:
    browser (Firefox, IE); HTML processed here;
    Javascript acts here in response to the user;
    GABOS/GAFEP displayed here.
31
  • Web Interface Debugging tools
  • Firefox Error Console
  • Firebug add-on for Firefox

32
  • Future Work
  • Short term
  • Finalize GABOS version 2
  • Transcript, DNA working
  • Complete data download maintenance program
  • Automate sorting of annotation files and modify
    GABOS to be aware of sorted/non-sorted data and
    act accordingly.
  • Include ability to retrieve RNA data
  • Will run on any unix server not just unix28.
  • Web Interface available on WEHI's public server.
  • Source code will be made freely available.
  • Longer Term
  • Retrieve data for UTRs, others?
  • Provide web interface access to annotation files.
  • Remove need for BioPerl to be installed.

33
  • Acknowledgements
  • Bioinformatics Division
  • Terry Speed and Gordon Smyth for the opportunity to
    pursue this project in an excellent environment.
  • All others in Bioinformatics for many and varied
    help.
  • WEHI ITS
  • Nick Tan, Jakub Szarlat for Unix help.
  • Dung Tran, Scott Wood for network help.
  • Tri Le and John Nguyen for MS Windows support.
  • Tony Kyne and others in ITS for many questions
    answered.
  • Molecular Medicine
  • Doug Hilton, Ben Kile for explaining their needs.
  • Users for their feedback.
  • Kylie Greig, Adrienne Hilton, Greg Hather,
    Carolyn de Graaf