What is Bioinformatics - PowerPoint PPT Presentation

Slides: 23
Provided by: fenBilk

1
What is Bioinformatics?
  • Bioinformatics: collection and storage of
    biological information
  • Computational biology: development of algorithms
    and statistical models to analyze biological data

2
Jobs for bioinformaticians
3
Databases make biological data available to
scientists
  • As biology has increasingly turned into a
    data-rich science, the need for storing and
    communicating large datasets has grown
    tremendously.
  • Nucleotide, protein sequences
  • Protein structure
  • Expression data
  • Gene/protein networks

4
Nucleotide Databases
  • EMBL: www.ebi.ac.uk/embl/
  • The EMBL (European Molecular Biology Laboratory)
    nucleotide sequence database is maintained by the
    European Bioinformatics Institute (EBI) in
    Hinxton, Cambridge, UK.

5
Nucleotide Databases cont.
  • GenBank: maintained by the National Center for
    Biotechnology Information (NCBI) and searched
    through Entrez, which gives access to
    nucleotides, proteins, annotations, etc.
  • www.ncbi.nlm.nih.gov/Genbank/
  • UniGene: a non-redundant set of gene-oriented
    clusters. www.ncbi.nlm.nih.gov/UniGene/

6
Protein Databases
  • SWISS-PROT: a protein sequence database that
    aims to provide a high level of annotation
    (such as the description of a protein's
    function, its domain structure,
    post-translational modifications, variants,
    etc.), a minimal level of redundancy, and a high
    level of integration with other databases.
    www.expasy.ch/sprot/

7
Protein Databases
  • PIR
  • http://pir.georgetown.edu/
  • The Protein Information Resource (PIR) is a
    division of the National Biomedical Research
    Foundation (NBRF) in the US. It collaborates
    with the Munich Information Center for Protein
    Sequences (MIPS) and the Japan International
    Protein Information Database (JIPID).
    Release 67.00 (31 Dec 2000) contains 198,801
    entries.

8
Sequence Motif Databases
  • Pfam
  • www.sanger.ac.uk/Software/Pfam/
  • Pfam is a database of protein families defined as
    domains (contiguous segments of entire protein
    sequences). For each domain, it contains a
    multiple alignment of a set of defining sequences
    (the seeds) and the other sequences in SWISS-PROT
    that can be matched to that alignment.

9
3D-Structure Databases
  • PDB
  • www.rcsb.org/pdb/
  • The PDB is the main primary database for 3D
    structures of biological macromolecules
    determined by X-ray crystallography and NMR.
    Structural biologists usually deposit their
    structures in the PDB on publication, and some
    scientific journals require this before accepting
    a paper. It also accepts the experimental data
    used to determine the structures.

10
How to get sequences?
  • The Entrez database provides nucleotide and
    protein sequences in different formats.
  • One of these formats is FASTA.
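
As an illustration of retrieving a sequence in FASTA format, the sketch below builds a request URL for NCBI's E-utilities `efetch` service, one programmatic route into Entrez. The slides do not mention E-utilities, so treat this as an assumption about how one might fetch records; `ACCESSION_ID` is a hypothetical placeholder, and the helper names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Build an efetch URL that asks for one record as FASTA text."""
    params = {
        "db": db,            # e.g. "protein" or "nuccore"
        "id": accession,     # accession or identifier of the record
        "rettype": "fasta",  # return the record in FASTA format
        "retmode": "text",   # plain text rather than XML
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, db="protein"):
    """Download the record (needs network access; not called here)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


# "ACCESSION_ID" is a placeholder, not a real accession.
url = efetch_fasta_url("ACCESSION_ID")
print(url)
```

Only the URL is built when the block runs; calling `fetch_fasta` would perform the actual download.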

11
FASTA FORMAT
  • Each sequence begins with a description line
    that starts with '>'.

12
A protein in FASTA format
  • >HBA_ALLMI
  • VLSMEDKSNVKAIWGKASGHLEEYGAEALEMFCAYPQTKIYFPHFDMSH
    NSAQIRAHGKKVFSALHEAVNHIDDLPGALCRLSELHAHSLRVDPVNFKF
    LAHCVLVVFAIHHPSALSPEIHASLDKFLCAVSAVLTSKYR
  • The first line is the description line. It starts
    with the character '>', which marks the start of
    a sequence record. The string that follows the
    '>' and ends at the first space (' ') is the
    sequence ID (HBA_ALLMI).
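
The format rules can be sketched as a small parser. This is a minimal illustration, not a replacement for an established library: it assumes description lines start with '>', that the sequence ID runs up to the first whitespace, and that sequence lines are simply concatenated. The function names are hypothetical.

```python
def _make_record(header, chunks):
    """Split a header into (id, description) and join the sequence lines."""
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (sequence_id, description, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record.
            if header is not None:
                records.append(_make_record(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


example = """>HBA_ALLMI
VLSMEDKSNVKAIWGKASGHLEEYGAEALEMFCAYPQTKIYFPHFDMSH
NSAQIRAHGKKVFSALHEAVNHIDDLPGALCRLSELHAHSLRVDPVNFKF
LAHCVLVVFAIHHPSALSPEIHASLDKFLCAVSAVLTSKYR
>X sequence
ATGAATAGCACAGAGAGACCAAGAGAGAGAGAGAGACCCAGATATATCA
GATAGAGA
"""

records = parse_fasta(example)
for seq_id, description, sequence in records:
    print(seq_id, description, len(sequence))
```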

13
A DNA sequence in Fasta
  • >X sequence
  • ATGAATAGCACAGAGAGACCAAGAGAGAGAGAGAGACCCAGATATATCA
    GATAGAGA

14
Why align sequences?
  • Find evolutionary relationships between species
    and/or genes.
  • Identify novel genes and define similar genes in
    other species.
  • Study genomes and how they change.

15
Sequence Alignment
  • Homology means that two (or more) sequences have
    a common ancestor.
  • An example of a sequence alignment:

[Figure: alignment of Sequence 1 and Sequence 2]
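
To make the idea of aligning two sequences concrete, here is a sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The slides do not name an algorithm, and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions. Tools for multiple alignment use more elaborate methods and scoring matrices.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_1, aligned_2, total = global_align("ACGT", "AGT")
print(aligned_1)
print(aligned_2)
print(total)
```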
16
CLUSTALW: software for aligning sequences
http://www.ebi.ac.uk/clustalw/
17
Genome Databases
  • www.ensembl.org

18
Genome Databases: Gene Prediction
  • Define the location of genes (coding sequences,
    regulatory regions)
  • Gene prediction using software based on rules and
    patterns. Find Open Reading Frames (ORFs), with
    additional criteria for good start sequence for a
    gene.
  • Gene identification through alignment with known
    proteins and EST sequences (Expressed Sequence
    Tags: short sequences derived from mRNA).
  • Gene prediction through similarity with proteins
    or ESTs in other organisms.
  • Gene prediction through comparison with other
    genomes: conserved regions are probably coding
    or regulatory regions.
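
The rule-based step of finding Open Reading Frames can be sketched as follows. This is a deliberately simplified illustration: it scans only the three forward reading frames, treats ATG as the only start codon, uses the standard stop codons TAA, TAG and TGA, and applies a hypothetical minimum length. Real gene finders add the reverse strand, start-site criteria, and statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for forward-strand ORFs.

    start and end are 0-based, end-exclusive positions; the returned
    sequence includes the start codon and the stop codon.
    min_codons counts codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


orfs = find_orfs("CCATGAAATTTTAGGG")
print(orfs)
```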

19
Genome Databases: Annotation
  • Annotation of the genes: compare with
    genes/proteins of known function in other
    organisms.
  • Functional classification. Broad groups of
    functional characterization, such as 'ribosomal
    proteins', 'nucleotide metabolism', 'signal
    transduction'.

20
Genome Databases: Evolution
  • Evolutionary history
  • Genome duplications
  • Gene loss

21
Transcription Databases
  • Microarrays can analyze thousands of transcripts
    simultaneously.
  • They allow analysis of genes whose expression is
    higher or lower in diseased tissue than in
    normal tissue, for example.
  • Microarray databases contain large amounts of
    expression data.
  • Stanford Microarray Database
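
One common way to compare expression between normal and disease samples is the log2 fold change. The sketch below uses made-up intensity values for hypothetical genes; the pseudocount and the threshold of 1.0 (a two-fold change) are illustrative assumptions, and real analyses also need normalization and statistical testing.

```python
import math


def log2_fold_change(disease, normal, pseudocount=1.0):
    """log2 ratio of disease to normal expression (pseudocount avoids log of 0)."""
    return math.log2((disease + pseudocount) / (normal + pseudocount))


def classify(disease, normal, threshold=1.0):
    """Label a gene as 'up', 'down', or 'unchanged' in disease vs normal."""
    change = log2_fold_change(disease, normal)
    if change >= threshold:
        return "up"
    if change <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical intensities: gene name -> (normal, disease).
expression = {
    "geneA": (100.0, 400.0),
    "geneB": (200.0, 50.0),
    "geneC": (120.0, 130.0),
}

calls = {gene: classify(disease, normal)
         for gene, (normal, disease) in expression.items()}
print(calls)
```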

22
Signaling and Metabolic Pathways
  • Analyze how genes/proteins interact and learn
    about function of genes
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
  • http://www.genome.ad.jp/kegg/