Title: HCS806 Methods in Horticulture and Crop Science
1 HCS806 Methods in Horticulture and Crop Science
Introduction to methods in Bioinformatics for plant science.
David Francis (Coordinator), Ian Holford (Molecular and Cellular Imaging Center), Xiaodong Bai (Entomology)
2 http://www.oardc.ohio-state.edu/tomato/
4 Survey: HCS806SurveyPre.doc
Goals:
1) Establish the knowledge base and comfort level of students and staff.
2) Assess available equipment and internet capabilities.
5 The course is being taught under the methods number because it is intended to provide hands-on practical training. At the end of the class, participants (graduate students and staff) are expected to have gained:
- Familiarity with sequence databases and how data are stored.
- Skills needed to retrieve, organize, and store sequence data.
- Working knowledge of Linux commands for manipulating sequence files.
- Working knowledge of stand-alone BLAST and of running it in the UNIX environment.
- Working knowledge of BioPerl and its use to parse BLAST outputs.
6 Estimated Time line (Week 1)
Monday 7/13
- Introduction to BioInformatics (David Francis)
- Distributed resources on the web (DF)
- Creating and downloading datasets (DF)
Tuesday 7/14
- Setting up your computer for the class (DF)
- Installing Unix emulation (Cygwin) for Windows (DF)
- Unix/Linux Commands (Ian Holford)
Wednesday 7/15
- Installing Stand-alone BLAST (IH)
- Formatting Data for Stand-alone BLAST (IH)
Thursday 7/16
- Stand-alone BLAST and interpreting BLAST outputs
7 Estimated Time line (Week 2)
Monday 7/20
- Introduction to Perl Lecture (Bai)
- BioPerl installation demonstration (Bai)
Tuesday 7/21
- BioPerl modules (Bai)
8 BioInformatics
Definition (from Wikipedia): the application of information technology to the field of molecular biology. It entails the creation and manipulation of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data.
BioInformatics data are most commonly in the form of DNA or protein sequence. Computer scientists refer to this type of data as a string.
BioInformatics aims to facilitate sequence analysis, genome annotation, evolutionary biology, biodiversity, analysis of gene expression, analysis of regulation, prediction of structure, etc.
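Because a sequence is a string, ordinary string operations already answer simple biological questions. The sketch below is only an illustration in Python (the course itself uses Perl and BioPerl); the function names and the example sequence are made up for this note and are not part of the course materials.

```python
# Illustrative sketch: a DNA sequence handled as an ordinary string.
# Function names and the example sequence are hypothetical.

# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the fraction of G and C bases (0.0 for an empty string)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGC"  # made-up sequence
    print(len(dna))
    print(reverse_complement(dna))
    print(round(gc_content(dna), 2))
```

Length, complementing, and base counting all come from built-in string methods; no sequence-specific library is needed at this level.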
9 Algorithm: a procedure or formula for solving a problem.
An algorithm describes an explicit series of steps that can be used to solve a problem. In this class we want to encourage the algorithm as a way of thinking. Formulating the biological questions is up to us. We then need to design the algorithms to address the question. If these procedures are repetitive, they lend themselves to automation.
http://www.cs.sunysb.edu/~algorith/
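As a small illustration of "an explicit series of steps" that is worth automating, the following sketch finds every position of a short motif in a sequence and then repeats the same procedure over a set of sequences. It is written in Python purely as an example; the function names, the motif, and the sequences are assumptions made for this note.

```python
# Illustrative sketch: one explicit procedure, then automated over many inputs.
# All names and data here are hypothetical.

def find_motif(seq, motif):
    """Return 1-based start positions of every (possibly overlapping) match."""
    if not motif:
        return []
    positions = []
    start = seq.find(motif)          # step 1: look for the first match
    while start != -1:               # step 2: repeat until no match is left
        positions.append(start + 1)  # step 3: record the position (1-based)
        start = seq.find(motif, start + 1)  # step 4: search past this match
    return positions


def scan_all(sequences, motif):
    """Apply find_motif to every named sequence in a dictionary."""
    return {name: find_motif(seq.upper(), motif.upper())
            for name, seq in sequences.items()}


if __name__ == "__main__":
    seqs = {"seqA": "GATATATGCATATACTT", "seqB": "cccccc"}  # made-up data
    print(scan_all(seqs, "ATAT"))
```

The biological question (which motif, which sequences) stays with the researcher; only the repetitive search is handed to the computer.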
10 Perhaps the most common tool used for sequence analysis is the Basic Local Alignment Search Tool (BLAST). BLAST finds regions of local similarity between sequences. The algorithm implemented by BLAST emphasizes speed over sensitivity.
For more information on what BLAST does, see http://en.wikipedia.org/wiki/BLAST
For more information on how to use BLAST, see http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
Know:
- The BLAST score indicates how many words overlap.
- Significance scores are based on a distribution, base (or nucleotide) frequency, and database size.
- Alignments are used to look for similarity (form implies function).
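The idea of overlapping "words" can be made concrete with a toy example. The sketch below, in Python, lists every exact word of a fixed length shared by a query and a subject sequence. It is a deliberately simplified illustration of word matching only, not BLAST's actual implementation: real BLAST goes on to extend and score these hits and to compute significance. The function names and the word size of 4 are arbitrary choices made for this note.

```python
# Toy illustration of word matching between two sequences.
# This is NOT BLAST's implementation; names and word size are hypothetical.
from collections import defaultdict


def index_words(seq, word_size):
    """Map each word (substring of length word_size) to its 0-based positions."""
    index = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        index[seq[i:i + word_size]].append(i)
    return index


def shared_words(query, subject, word_size=4):
    """Return (query_pos, subject_pos, word) for every exact shared word."""
    subject_index = index_words(subject, word_size)
    hits = []
    for q in range(len(query) - word_size + 1):
        word = query[q:q + word_size]
        for s in subject_index.get(word, []):
            hits.append((q, s, word))
    return hits


if __name__ == "__main__":
    for hit in shared_words("ACGTAC", "TTACGT"):
        print(hit)
```

Indexing the subject once and then looking up each query word is what makes this kind of search fast compared with aligning every position against every other position.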
11 Where do I go for data?
General:
- National Center for Biotechnology Information (NCBI): http://www.ncbi.nlm.nih.gov/
- UniProt (SWISS-PROT): http://www.uniprot.org/
- European Molecular Biology Laboratory (EMBL) nucleotide sequence database: http://www.ebi.ac.uk/embl/
Crop/family specific databases:
- Solanaceae Genomics Network (SGN): http://sgn.cornell.edu/
- The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org/
- Gene indexes (formerly TIGR): http://compbio.dfci.harvard.edu/tgi/
19 GenBank flat file format
20 GenBank flat file format (continued)
21 GenBank flat file format (continued)
23 FASTA file format
FASTA is the standard format for sequence data. A ">" is followed by a name/description of the sequence. Everything following the first line break is expected to be a sequence string of nucleotide or protein sequence.
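The FASTA layout described above is simple enough to read with a few lines of code. The following Python sketch is one possible minimal reader, not the course's method (the course uses BioPerl for parsing); the function name and the two example records are invented for illustration.

```python
# Minimal FASTA reader sketch; function name and example records are hypothetical.

def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header = line[1:].strip()     # text after ">" is the name/description
            chunks = []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)     # finish the last record


if __name__ == "__main__":
    example = """>seq1 made-up example record
ATGGCC
ATTGTA
>seq2 another made-up record
GGGCCC
"""
    for name, seq in parse_fasta(example.splitlines()):
        print(name, len(seq))
```

The same function accepts an open file handle in place of the list of lines, since a file is also an iterable of lines.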
25 Descriptions of sequence databases
- Nucleotide: Contains high quality annotated sequences.
- EST: Expressed Sequence Tag. Derived from cDNA (mRNA) and therefore represents transcribed (expressed) sequences. Generally derived from single-pass Sanger sequencing.
- GSS: Genome Survey Sequence. Similar to the EST archive, but contains genomic sequence, for example sequenced PCR products.
28 Other databases
- The SWISS-PROT database contains high-quality annotation, is non-redundant, and is cross-referenced to many other databases. On May 26, 2009, the SWISS-PROT database was merged into the UniProt database. http://www.uniprot.org/
- European Molecular Biology Laboratory (EMBL) nucleotide sequence database: http://www.ebi.ac.uk/embl/
31 Other databases
Crop/family specific databases:
- e.g. Solanaceae Genomics Network (SGN): http://sgn.cornell.edu/
- e.g. The Arabidopsis Information Resource (TAIR): http://www.arabidopsis.org/
- Gene indexes (formerly TIGR): http://compbio.dfci.harvard.edu/tgi/
38 This ends the introduction to on-line databases. Next: a discussion of downloading customized data.
39 The following slides are related to future lectures.
40 http://www.psc.edu/general/software/packages/