Title: PGA Education Module: Bioinformatics Tools, ELXR and eTBLAST
1 PGA Education Module: Bioinformatics Tools, ELXR and eTBLAST
UTSW PGA Education Module, Webex, 12/17/03
Presented by Skip Garner. This reflects the work of many researchers and staff.
2 Agenda for module - Today
- Welcome PGAers and others
- Interested persons should have visited our
web-site education module - http://pga.swmed.edu/new_pga/Dreamweaver/
- Brief introduction
- Other tools to be addressed in future Webex demos
- Intro to hardware and use tracking
- Operations and use of the ELXR code
- Utility, use
- Examples
- Operations and use of the eTBLAST code
- Utility, use
- Examples
- Questions and Answers
3 A family of bioinformatics tools and their
computed databases have been developed.
Genomic Annotation
Text data mining
Gene collection identification and analysis
Polymorphism Prediction
4 Our Computational Biology / Bioinformatics toolset is very applied.
- ELXR: Exon locator and extractor for resequencing.
- POMOUS/Rep-X and SNIDE: polymorphism prediction software.
- eTBLAST, FRISC, TRITE, IRIDESCENT: Text data mining and knowledge discovery tools.
- PANORAMA: A DNA/Protein sequence analysis and visualization tool.
- ARROGANT: A gene/clone collection analysis tool.
- Local BLAST Server: UTSW BLAST utility for comparison against EST/cDNA/RefSeq sequences from UTSW microarrays, specialized collections and BioThreat work.
- MarC-V, Signal, SNPCEQer.
5 Hardware and Databases
We have established hardware and databases for
this effort.
- Linux, UNIX, Solaris and Windows servers
- >10 TB primary storage
- All major languages, scripts and databases
6 For our applications that have web interfaces, we
monitor their usage and can estimate their
utility to the wider research community.
These numbers are only for users external to
UTSW, and they may contain web-bot hits that we
estimate from their origin to be about 50% of the
total.
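A minimal sketch of how such an estimate could be made, assuming each logged hit carries the resolved hostname of its origin. The internal suffix `swmed.edu`, the bot hostname fragments, and the sample hits are illustrative assumptions, not the lab's actual tracking rules or data.

```python
from collections import Counter

# Assumed markers; the real tracking rules are not described in the slides.
INTERNAL_SUFFIX = "swmed.edu"
BOT_FRAGMENTS = ("crawl", "spider", "bot")

def classify_hit(hostname: str) -> str:
    """Label one hit as internal, bot, or external from its origin hostname."""
    host = hostname.lower()
    if host == INTERNAL_SUFFIX or host.endswith("." + INTERNAL_SUFFIX):
        return "internal"
    if any(fragment in host for fragment in BOT_FRAGMENTS):
        return "bot"
    return "external"

def summarize(hostnames):
    """Count hits per class and estimate the bot share of non-internal hits."""
    counts = Counter(classify_hit(h) for h in hostnames)
    outside = counts["bot"] + counts["external"]
    bot_share = counts["bot"] / outside if outside else 0.0
    return counts, bot_share

# Made-up hits for illustration only.
sample_hits = [
    "pc12.swmed.edu",
    "crawl-66.example.com",
    "lab.example.org",
    "spider.example.net",
    "dialup.example.edu",
]
counts, bot_share = summarize(sample_hits)
print(counts["external"], counts["bot"], f"{bot_share:.0%}")
```

Matching on hostname fragments is crude; it only illustrates the idea of separating internal, automated and genuine external traffic by origin.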
7 Primers for all human, mouse and rat exons, etc.
have been computed and experimentally verified.
ELXR components:
- fastacmd (NCBI): genomic and RefSeq mRNA sequence retrieval
- blastn (NCBI): local genomic sequence alignment to EST/cDNA sequence input
- sim4 (PSU): alignment of transcribed and spliced DNA sequence to genomic sequence containing that gene while predicting donor/acceptor sites between introns and exons
- primer3 (WI): designs PCR/sequencing primer pairs
[Figure: ELXR output for H2AFY2, showing the exon 1 primers]
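The last step of the component chain can be sketched as follows: given exon coordinates on a genomic sequence (the kind of information sim4 reports), build a primer3 input record that asks for primers flanking one exon. This is a sketch under stated assumptions, not ELXR's actual code: the function name, the flank size, the product size range and the toy sequence are hypothetical, and the Boulder-IO tag names are those of current primer3 releases, which may differ from the version ELXR used.

```python
def primer3_record(seq_id, genomic, exon_start, exon_end, flank=50):
    """Build a primer3 Boulder-IO record targeting one exon.

    exon_start and exon_end are 0-based, end-exclusive coordinates on
    `genomic`. Up to `flank` bases are kept on each side of the exon so
    that primers can sit in the surrounding intron sequence.
    """
    if not 0 <= exon_start < exon_end <= len(genomic):
        raise ValueError("exon coordinates fall outside the genomic sequence")
    left = max(0, exon_start - flank)
    right = min(len(genomic), exon_end + flank)
    template = genomic[left:right]
    target_start = exon_start - left      # 0-based position inside template
    target_len = exon_end - exon_start
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        "PRIMER_FIRST_BASE_INDEX=0",      # state the indexing explicitly
        f"SEQUENCE_TARGET={target_start},{target_len}",
        f"PRIMER_PRODUCT_SIZE_RANGE={target_len}-{len(template)}",
        "=",                              # record terminator
    ]
    return "\n".join(lines) + "\n"

# Toy sequence: 60 bases of "intron", a 40-base "exon", 60 bases of "intron".
toy_genomic = "A" * 60 + "C" * 40 + "G" * 60
record = primer3_record("toy_exon1", toy_genomic, 60, 100)
print(record)
```

A record like this is the kind of text that could be fed to `primer3_core` on standard input, one record per exon.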
8 ELXR data sets have been verified, and ELXR is now used in the
NHLBI Program in Genomics Applications (PGA) SNP
discovery pipeline.
Latest numbers: >6,000 primer pairs in use for the
PGA project.
9 Pathogene
- Primers for every ORF for every microorganism computed
- From genome annotation
- From GLIMMER output (overestimates ORFs)
- Primer pairs in experimental validation
- Soon to be available via our www page at rce.swmed.edu (along with our dedicated microorganism BLAST server)
[Figure: Pathogene interface, with an example ORF search and the resulting primers]
10 eTBLAST: electronic Text Basic Local Alignment and Similarity Tool
For document clustering, a new/better way for us to access the literature
11 eTBLAST is an automated document similarity search and retrieval tool.
- Input is text (paragraphs, proposals, abstracts, documents, sentences) via a www browser.
- Stop words are eliminated, and keywords are extracted (with lexical variation and synonyms expanded in the query) and weighted.
- An indexed database of documents (Medline with 13,000,000 abstracts and some book chapters currently implemented) is searched and ranked for similarity. Alternate similarity algorithms are planned (grammar induction).
- The top 200 hits are re-ranked using a dynamic programming algorithm.
- Variety of data outputs in a browser.
- This is a work in progress, but has already found a number of users (and has made me look smarter than I really am).
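The steps above can be sketched in a few lines. This is a toy illustration, not the eTBLAST implementation: the stop-word list, the inverse-document-frequency weighting, and the use of a Smith-Waterman style local alignment over word sequences as the dynamic programming re-ranking step are all assumptions chosen to mirror the description, and synonym and lexical-variant expansion is omitted.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}

def keywords(text):
    """Lowercase, split into words, and drop stop words (order preserved)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def idf_weights(docs_tokens):
    """Weight each keyword by how rare it is across the collection."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(w for tokens in docs_tokens for w in set(tokens))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}

def keyword_score(query, doc, weights):
    """First pass: sum of weights of the distinct keywords shared with the query."""
    return sum(weights.get(w, 0.0) for w in set(query) & set(doc))

def local_alignment_score(query, doc, weights, mismatch=-1.0, gap=-1.0):
    """Second pass: best local alignment of the two word sequences."""
    best = 0.0
    prev = [0.0] * (len(doc) + 1)
    for q_word in query:
        curr = [0.0]
        for j, d_word in enumerate(doc, start=1):
            match = weights.get(q_word, 1.0) if q_word == d_word else mismatch
            cell = max(0.0, prev[j - 1] + match, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

def search(query_text, documents, top_n=200):
    """Rank documents by keyword score, then re-rank the top_n hits by alignment."""
    docs_tokens = [keywords(d) for d in documents]
    weights = idf_weights(docs_tokens)
    query = keywords(query_text)
    first_pass = sorted(
        range(len(documents)),
        key=lambda i: keyword_score(query, docs_tokens[i], weights),
        reverse=True,
    )[:top_n]
    return sorted(
        first_pass,
        key=lambda i: local_alignment_score(query, docs_tokens[i], weights),
        reverse=True,
    )

# Made-up miniature "database" for illustration.
abstracts = [
    "Primer design for exon resequencing in the human genome.",
    "Text mining of biomedical abstracts with weighted keywords.",
    "Weighted keywords and biomedical text: a mining approach to abstracts.",
]
ranking = search("text mining of biomedical abstracts", abstracts)
print(ranking)
```

The two abstracts that share all four query keywords tie in the first pass; the alignment step then favors the one whose keywords appear in the same order as in the query, which is the point of re-ranking by word order rather than by keyword overlap alone.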
12 The eTBLAST algorithm, with its natural query input,
filtering, and output, is philosophically similar
to BLAST.
13 Where eTBLAST has advantages
- First, eTBLAST is no substitute for traditional search engines
- eTBLAST is particularly valuable when entering a new area of research for which selection of keywords may be difficult
- eTBLAST can be applied to bulk text that one often has:
- Abstracts while reviewing papers
- Grant proposal abstracts can be directly submitted
- Student proposals
- Reference finding while writing papers or proposals
- Uniqueness searching for proposed manuscripts or patentable ideas
- General text that defines a new area you are beginning to study
14 Using eTBLAST
- Go to http://innovation.swmed.edu/Biocomputing/Computing.htm, select eTBLAST and follow the directions (page 1) (page 2)
- Those directions include:
- Pasting in or entering text
- Enter the email address where you want the results sent
- Go to the next page and refine your search parameters or go with the default
- Submit the search
- Wait, and the results will be sent to you
- Inspect the results, interact with the returned links
- Refine your search and/or iterate
- Save the link or save the page locally, for increased usage may require results to be purged roughly monthly
15 This is how you receive your results
Click here and your results will come up in your browser
We monitor our email
Raw results (not user friendly)
16 eTBLAST by extension has led to other opportunistic applications - FRISC, TRITE
- eTBLAST: similarity comparison engine for electronic text using weighted keywords, concepts and grammar induction. Other types of literature have begun to be available (Book - Cancer Medicine).
- FRISC: using eTBLAST, a UTSW faculty research interests page is checked regularly against new biomedical abstracts from Medline, and the results are ranked to cluster the information that best fits the researcher's interests. (UTSW, Brown) A PGA-specific User Profile Builder will be available soon!
- TRITE: using eTBLAST, topical interests will be searched regularly against new biomedical abstracts in Medline.
17 eTBLAST is still experimental
- eTBLAST is offered as a free service of the Garner Lab
- eTBLAST is not funded, but we continue to develop and extend it, and new features and increased speed are coming
- eTBLAST is experimental, and as such there will be bugs and occasional temporary problems with availability
- Email us with problems, suggestions, comments
18 http://pga.swmed.edu/