Bioinformatics at Promega Corporation - PowerPoint Presentation

1
Bioinformatics at Promega Corporation
Intro to Bioinformatics, Biotec, November 28, 2006
Ethan Strauss, Sr. Scientist, R&D Bioinformatics, Promega
Ethan.strauss@promega.com
http://q7.com/ethan/molbio
2
My Background
  • Bachelors degree in biology
  • PhD and work experience in Molecular Biology
  • Eight years in Promega Technical Services
  • Almost two years in Bioinformatics (officially)
  • No formal computer training
  • No formal bioinformatics training

3
Bioinformatics at Promega Corporation
  • Bioinformatics did not exist as a separate function until 2001
  • One person, 2001-2005
  • Two people, 2005-?
  • Bioinformatics primarily supports R&D (100 scientists)
  • Mentor and train R&D scientists
  • Provide expertise for projects (120 requests per year)
  • Propose and evaluate new acquisitions
  • Liaison to the IT department
  • Manage bioinformatics infrastructure (15 tools)
  • Develop new tools and adapt existing tools in-house

4
Bioinformatics Projects
  • Programming
  • Tools for internal and external Promega customers
  • Plexor Primer Design System (https://www.promega.com/techserv/tools/plexor/logon.aspx)
  • Biomath (http://www.promega.com/biomath/)
  • siRNA Designer (http://www.promega.com/siRNADesigner/)
  • Sequence analysis for Excel and Microsoft Word (http://www.promega.com/enotes/features/fe0025.htm)
  • Analysis of BLAST results
  • Automated data retrieval (Web services)
  • Database for tracking vector construction
  • Database for keeping track of plasmid features

5
Bioinformatics Projects
  • Biocomputing (use of computers in biological
    research)
  • Database searches
  • Data mining
  • Discovery research
  • Primer design
  • BLAST analysis and interpretation
  • Etc.

6
NCBI
  • I recently took the Powerscripting course from NCBI
  • NCBI has a lot of very powerful tools and databases.
  • They are not as well documented as they might be.
  • Check them out periodically.
  • Databases at NCBI I was not aware of, but now am:
  • PubMed Central: articles with free full text
  • 3D Domains and Structure: 3D structural information
  • GEO (Gene Expression Omnibus): microarray expression data
  • There are many more in the drop-down list that I don't really know anything about

7
NCBI ftp site
  • Most NCBI data is available by FTP from http://www.ncbi.nlm.nih.gov/Ftp/
  • I have used it for a number of projects, including an analysis of the amino acid residue distribution over the first 11 positions of human and E. coli proteins
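The residue-distribution analysis comes down to a tally for each position. The sketch below is minimal and is not the original analysis. It assumes the protein sequences have already been read from the downloaded files into plain strings. The function names are hypothetical.

```python
from collections import Counter


def positional_residue_counts(sequences, n_positions=11):
    """Count amino acids at each of the first n_positions of every protein.

    Returns a list of Counters; index 0 is residue position 1 (usually Met).
    Proteins shorter than n_positions contribute only the positions they have.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for i, residue in enumerate(seq[:n_positions].upper()):
            counts[i][residue] += 1
    return counts


def positional_frequencies(counts):
    """Convert per-position counts into fractions that sum to 1 at each position."""
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({aa: n / total for aa, n in c.items()} if total else {})
    return freqs
```

Running the two functions once on the human set and once on the E. coli set gives directly comparable per-position profiles.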

8
NCBI - Entrez Programming Utilities
  • Programmatic access to Entrez (http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html)
  • Allows incorporation of Entrez functionality into third-party tools (http://www.promega.com/techserv/tools/plexor/NewQpcrProject.aspx)
  • Allows automation of Entrez searches: analysis of large datasets, automation of searches and queries
  • Accessible using HTTP or SOAP

9
NCBI - Entrez Programming Utilities
  • Programs available
  • ESearch: searches and retrieves primary IDs and term translations, and optionally retains results for future use in the user's environment.
  • ESummary: retrieves document summaries from a list of primary IDs or from the user's environment.
  • EFetch: retrieves records in the requested format from a list of one or more primary IDs or from the user's environment.
  • ELink: checks for links from the query ID numbers to other Entrez databases.
  • EInfo: provides field index term counts, last update, and available links for each database.
  • EPost: posts a file containing a list of primary IDs for future use in the user's environment, to use with subsequent search strategies.
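Over HTTP, each of these programs is a URL of the form `<base>/<program>.fcgi?<parameters>`. Below is a minimal Python sketch of building such URLs, assuming the current base address `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/` (the slides use an older address). The helper name `eutils_url` is hypothetical. It only builds URLs and makes no network calls. Any real script should also follow NCBI's usage guidelines, such as request-rate limits and the identifying `tool`/`email` parameters.

```python
from urllib.parse import urlencode

# Current E-utilities base URL (the 2006 slides use an older
# www.ncbi.nlm.nih.gov/entrez/eutils address).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program, **params):
    """Build a GET URL for one E-utility: 'esearch' -> .../esearch.fcgi?...

    Parameter values are URL-encoded, so field-tagged terms such as
    'Human[Organism] apoptosis' can be passed as plain strings.
    """
    return f"{EUTILS_BASE}{program}.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # ESearch on PubMed, keeping the result on the History server.
    print(eutils_url("esearch", db="pubmed", term="zanzibar",
                     usehistory="y", retmax=1))
```

The same helper covers ESummary, EFetch, ELink, EInfo and EPost, because only the program name and parameters change.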

10
NCBI - Entrez Programming Utilities
Let's try it! Go to http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/eu.html and play. Now try http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/epipe.html

11
NCBI - Entrez Programming Utilities
These utilities can be accessed programmatically using Perl. See the demonstration programs at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

12
NCBI - Entrez Programming Utilities
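    use strict;
    use warnings;
    use LWP::Simple;    # exports get()

    # Prompt for a value, falling back to a default
    # (helper not shown on the slide; assumed definition)
    sub ask_user {
      my ($prompt, $default) = @_;
      print "$prompt [$default]: ";
      chomp(my $answer = <STDIN> // "");
      return $answer eq "" ? $default : $answer;
    }

    # 2006 address; the current base is https://eutils.ncbi.nlm.nih.gov/entrez/eutils
    my $utils  = "http://www.ncbi.nlm.nih.gov/entrez/eutils";
    my $db     = ask_user("Database", "Pubmed");
    my $query  = ask_user("Query",    "zanzibar");
    my $report = ask_user("Report",   "abstract");

    # ESearch: run the query and keep the results on the History server
    my $esearch = "$utils/esearch.fcgi?db=$db&retmax=1&usehistory=y&term=";
    my $esearch_result = get($esearch . $query);
    print "\nESEARCH RESULT: $esearch_result\n";

    # Extract the hit count and the History keys from the XML
    my ($Count, $QueryKey, $WebEnv) = $esearch_result =~
      m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s;
    print "Count = $Count; QueryKey = $QueryKey; WebEnv = $WebEnv\n";

    # EFetch: page through the stored results, $retmax records at a time
    my $retmax = 3;
    for (my $retstart = 0; $retstart < $Count; $retstart += $retmax) {
      my $efetch = "$utils/efetch.fcgi?rettype=$report&retmode=text"
                 . "&retstart=$retstart&retmax=$retmax"
                 . "&db=$db&query_key=$QueryKey&WebEnv=$WebEnv";
      print "\nEF_QUERY=$efetch\n";
      my $efetch_result = get($efetch);
      print "---------\nEFETCH RESULT(" . ($retstart + 1) . ".." . ($retstart + $retmax)
          . "): $efetch_result\n-----PRESS ENTER!!!-------\n";
      <STDIN>;    # wait for Enter before fetching the next batch
    }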

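The Perl script above pulls Count, QueryKey and WebEnv out of the ESearch XML with a regular expression. A hedged Python alternative uses the standard-library ElementTree parser instead. It reads only the direct children of the result element, so it does not depend on element order or whitespace. The function names are hypothetical. `batch_ranges` mirrors the script's EFetch loop; it only computes the paging offsets.

```python
import xml.etree.ElementTree as ET


def parse_esearch(xml_text):
    """Return (count, query_key, web_env) from an ESearch reply made with usehistory=y.

    Only direct children of <eSearchResult> are read, so nested <Count>
    elements (e.g. inside <TranslationStack>) are ignored.
    """
    root = ET.fromstring(xml_text)
    count, query_key, web_env = (root.findtext(tag) for tag in ("Count", "QueryKey", "WebEnv"))
    if None in (count, query_key, web_env):
        raise ValueError("ESearch reply lacks Count/QueryKey/WebEnv (was usehistory=y set?)")
    return int(count), query_key, web_env


def batch_ranges(count, retmax=3):
    """Yield (retstart, retmax) pairs that cover all `count` records,
    mirroring the EFetch for-loop in the Perl example."""
    for retstart in range(0, count, retmax):
        yield retstart, retmax
```

Each `(retstart, retmax)` pair, together with `query_key` and `WebEnv`, supplies the parameters for one EFetch request.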
13
Bioinformatics Advice
  • Be aware of bias in databases!
  • Search GenBank (Nucleotide) for Human[Organism] apoptosis. How many hits?
  • Now try Orcinus[Organism] apoptosis. How many hits?
  • Can you conclude that Orcinus does not have
    apoptosis?

14
Bioinformatics Advice
  • Bioinformatics is changing and advancing very
    rapidly.
  • Don't forget to notice what is new.
  • NCBI now has 20 different databases; only 3-5 years ago they had two.
  • If you want to do something that you know can't be done, check again in two weeks!
  • My standard computer can process the entire human genome for restriction sites, ORFs, etc. in a few hours. Not long ago, the best computers couldn't even hold that much data!
  • If old tools work, don't feel you need to use the newest tools.
  • I still do much of my analysis with Microsoft Word

15
LIMS (Laboratory Information Management System)
  • Goal: manage in-house DNA sequences and associated data
  • Under evaluation: Sesame, from the UW-Madison Center for Eukaryotic Structural Genomics (http://www.sesame.wisc.edu/)
Sesame is designed to organize and record data relevant to complex scientific projects, to launch computer-controlled processes, and to help decide about subsequent steps on the basis of the information available. The Sesame system is based on the multi-tier paradigm and consists of a framework and application modules that carry out specific tasks. Users interact with Sesame through a series of web-based Java applet-applications designed to organize data. It allows collaborators on a given project to enter, process, view, and extract relevant data, regardless of location, as long as web access is available. Data reside in an Oracle relational database. Sesame serves as a digital laboratory notebook and allows users to attach numerous files and images.

16
Programming
  • Tools for Promega customers
  • Biomath (http://www.promega.com/biomath/)
  • Basic calculations (most can be done easily by hand)
  • Simple code (JavaScript)
  • Established theory
  • Universal (not Promega specific)
  • siRNA Designer (http://www.promega.com/siRNADesigner/)
  • Complex calculations
  • More complex code (VBScript)
  • Rapidly evolving theory
  • Partially Promega specific
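The slides do not list Biomath's individual calculators. As an illustration of the kind of by-hand arithmetic meant by "basic calculations", here is a sketch that converts a mass of double-stranded DNA to picomoles:

pmol = µg × 10^6 / (660 × N)

N is the length in base pairs. The sketch assumes an average mass of 660 g/mol per base pair. The function name is hypothetical.

```python
AVG_BP_MASS = 660.0  # assumed average mass of one DNA base pair, g/mol (= pg/pmol)


def dsdna_ug_to_pmol(ug, length_bp):
    """Convert a mass of double-stranded DNA (micrograms) to picomoles.

    ug * 1e6 converts micrograms to picograms; dividing by
    (660 pg/pmol per bp * length_bp) gives picomoles of the molecule.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return ug * 1e6 / (AVG_BP_MASS * length_bp)
```

For example, 1 µg of a 3,000 bp plasmid works out to 10^6 / 1,980,000, or about 0.505 pmol.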

17
Programming
  • Tools for Promega customers
  • Plexor Primer Design (https://www.promega.com/techserv/tools/plexor)
  • Complex calculations
  • Complex code (C#.NET)
  • Separate user interface and main calculations
  • Multiple interacting modules
  • Database integration
  • Integration with Genbank (through a web service)
  • Proprietary improvements on established theory
  • Very Promega specific

18
Programming
  • Tools for internal use
  • BLAST analysis of Plexor Primers
  • Primer specificity is important
  • BLAST can determine specificity, but the output is very complex.
  • Simplify
  • Combine all hits from the same gene
  • Only show hits that could mis-prime
  • Group hits by species
  • Allow sorting by species
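The simplification steps can be sketched as a filter followed by grouping. The slides do not describe how BLAST output is parsed or which hits count as possible mis-priming, so this sketch makes two assumptions. First, hits are already parsed into hypothetical `Hit` records. Second, it applies an assumed rule: a hit can mis-prime if its alignment reaches the primer's 3' end with at most two mismatches.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST alignment of a primer, in a hypothetical already-parsed form."""
    species: str
    gene: str
    q_end: int       # last primer position covered by the alignment (1-based)
    mismatches: int  # mismatches within the aligned region


def could_misprime(hit, primer_len, max_mismatches=2):
    """Assumed rule: the alignment must reach the primer's 3' end
    and contain no more than max_mismatches mismatches."""
    return hit.q_end == primer_len and hit.mismatches <= max_mismatches


def summarize(hits, primer_len):
    """Keep possible mis-priming hits, merge hits from the same gene,
    and group genes by species, with species sorted alphabetically."""
    by_species = defaultdict(set)
    for h in hits:
        if could_misprime(h, primer_len):
            by_species[h.species].add(h.gene)
    return {sp: sorted(genes) for sp, genes in sorted(by_species.items())}
```

Collapsing many hits to one entry per gene and species is what lets a 30-page BLAST report shrink to a short table.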

19
Programming
  • Tools for internal use
  • BLAST analysis of Plexor Primers

Initial BLAST results (1 page out of 30)
Analyzed BLAST results (complete!)
20
Programming
  • Tools for internal use
  • Vector/Insert Database
  • Promega's Flexi Vector System has a very structured cloning procedure.
  • R&D has been making many different Flexi vector backbones with many inserts.
  • Keeping track has been a problem.
  • A database is in development
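The design of the database in development is not described. One minimal way to model "many backbones with many inserts" is a link table between backbones and inserts. The sketch below uses SQLite. The table, column and function names are hypothetical, and the schema does not model Flexi-specific cloning details.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical vector/insert schema."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE backbone (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE insert_seq (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        -- one row per construct: a backbone carrying a given insert
        CREATE TABLE construct (
            id INTEGER PRIMARY KEY,
            backbone_id INTEGER NOT NULL REFERENCES backbone(id),
            insert_id INTEGER NOT NULL REFERENCES insert_seq(id),
            UNIQUE (backbone_id, insert_id)
        );
    """)
    return con


def add_construct(con, backbone, insert):
    """Record that `insert` has been cloned into `backbone`; repeats are ignored."""
    con.execute("INSERT OR IGNORE INTO backbone(name) VALUES (?)", (backbone,))
    con.execute("INSERT OR IGNORE INTO insert_seq(name) VALUES (?)", (insert,))
    con.execute("""
        INSERT OR IGNORE INTO construct(backbone_id, insert_id)
        SELECT b.id, i.id FROM backbone b, insert_seq i
        WHERE b.name = ? AND i.name = ?""", (backbone, insert))


def inserts_in(con, backbone):
    """List the names of inserts already cloned into a given backbone."""
    rows = con.execute("""
        SELECT i.name FROM construct c
        JOIN backbone b ON b.id = c.backbone_id
        JOIN insert_seq i ON i.id = c.insert_id
        WHERE b.name = ? ORDER BY i.name""", (backbone,))
    return [r[0] for r in rows]
```

The `UNIQUE (backbone_id, insert_id)` constraint keeps the same construct from being recorded twice. That duplication is one of the ways tracking tends to break down.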

21
Programming
  • Tools for internal use

22
Programming
  • Internal Projects
  • Which restriction enzyme cuts least frequently in human ORFs?
  • Method
  • Download human RefSeq database (ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/)
  • Load into a local database
  • Scan each sequence for each RE site
  • The scan took 2-3 hours to complete

http://www.promega.com/pnotes/89/12416_11/12416_11.pdf

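The scan step can be sketched as below, assuming the ORF sequences have already been read from the local database into strings. The enzyme table is a small illustrative subset, not the set analyzed in the project. It uses only palindromic sites, so scanning one strand finds every site. The sketch ranks enzymes by the number of ORFs containing at least one site; that measure of "cuts least frequently" is an assumption. The function names are hypothetical.

```python
# Illustrative subset of palindromic recognition sites (5'->3').
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
}


def count_sites(seq, site):
    """Count occurrences of `site` in `seq`, including overlapping ones."""
    seq = seq.upper()
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def rank_enzymes(orfs, sites=SITES):
    """Return (enzyme, number of ORFs cut) pairs, least frequent cutter first."""
    cut = {name: sum(1 for orf in orfs if count_sites(orf, site) > 0)
           for name, site in sites.items()}
    return sorted(cut.items(), key=lambda item: (item[1], item[0]))
```

Non-palindromic sites would also need a scan of the reverse complement.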
23
Programming
  • Internal Projects
  • Which human genes in GenBank are the most popular?
  • Method
  • Download Gene database (ftp://ftp.ncbi.nlm.nih.gov/gene/)
  • Download Gene Ontology information (http://www.geneontology.org/)
  • Use web services to get pathway information from KEGG (http://www.genome.jp/kegg/)
  • Use web services to get citation information from PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed)
  • Load everything into a local database
  • Rank genes by desired criteria
  • Size
  • Function
  • Localization
  • Pathways
  • Publications
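Once the sources are merged, ranking is a filter followed by a sort. In this sketch the merged records are hypothetical `Gene` objects. Function and localization are both represented as GO terms, and the publication count stands in for "popularity"; all three are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical merged record: Gene + GO + KEGG + PubMed data for one gene."""
    symbol: str
    length_bp: int
    go_terms: set = field(default_factory=set)   # function and localization
    pathways: set = field(default_factory=set)   # KEGG pathway identifiers
    pubmed_count: int = 0                        # number of citing articles


def rank_genes(genes, key="pubmed_count", go_term=None, pathway=None, top=10):
    """Filter by an optional GO term and/or pathway, sort by the numeric
    attribute `key` in descending order, and return the top gene symbols."""
    selected = [g for g in genes
                if (go_term is None or go_term in g.go_terms)
                and (pathway is None or pathway in g.pathways)]
    selected.sort(key=lambda g: getattr(g, key), reverse=True)
    return [g.symbol for g in selected[:top]]
```

With the data in a real local database, the same filter-then-sort logic maps onto a SQL `WHERE ... ORDER BY ... LIMIT` query.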

24
Database searches and data mining
  • Question: Can you reformat this sequence for me?
    Tool: ReadSeq (http://bimas.dcrt.nih.gov/molbio/readseq), macros
  • Question: How many viral proteins start with Met-His?
    Tool: HITS database motif searches (http://hits.isb-sib.ch/)
  • Question: How many different bacterial two-domain proteins are known?
    Tool: SCOP database (http://scop.berkeley.edu/)
  • Question: How do I design PCR primers selective for bacterial species X?
    Tool: Ribosomal database 16S rRNA alignment (http://rdp.cme.msu.edu)
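For a local set of sequences, the Met-His question reduces to a prefix test. This sketch assumes the viral protein sequences are available as FASTA text. It is a plain-Python stand-in for a motif search and does not reproduce how the HITS database works. The function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def count_starting_with(records, prefix="MH"):
    """Count protein sequences whose first residues match `prefix` (MH = Met-His)."""
    return sum(1 for _, seq in records if seq.upper().startswith(prefix))
```

Joining multi-line sequence records back into single strings is also the core of the reformatting that ReadSeq handles for many more formats.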