BIIN200: Bioinformatics I

Provided by: craigs89
Learn more at: http://www.mscs.mu.edu

1
BIIN200: Bioinformatics I
  • Introduction
  • Craig A. Struble, Ph.D.
  • Department of Mathematics, Statistics, and
    Computer Science
  • Marquette University
  • Norie Dela Cruz, Ph.D.
  • Rat Genome Database
  • Medical College of Wisconsin

2
Overview
  • Introduction to Bioinformatics
  • Syllabus
  • Student Introductions

3
What Is Bioinformatics?
  • "Bioinformatics is a new subject of genetic data
    collection, analysis and dissemination to the
    research community." Hwa A. Lim (1987)
  • Bioinformatics: "Research, development, or
    application of computational tools and approaches
    for expanding the use of biological, medical,
    behavioral or health data, including those to
    acquire, store, organize, archive, analyze, or
    visualize such data." NIH working definition
    (2000)

4
What is Bioinformatics?
  • Bioinformatics sits at the intersection of
  • Informatics (computer science, computer
    engineering, information science)
  • Biology and other natural sciences
  • Mathematics and statistics

5
Bioinformatics: Related Fields
  • Computational biology
  • Computational molecular biology
  • Biomolecular informatics
  • Computational genomics

6
Biological Data
  • Genomes
  • DNA Sequences of A, T, C, G
  • Annotated with function, interesting features
  • Proteins
  • Amino Acid Sequences
  • Sequences of 20 letters
  • Annotated with structure, function, etc.

7
Biological Data
  • Gene Expression
  • Dynamic behavior of genes
  • Protein Expression
  • Dynamic behavior of proteins
  • Structural Features
  • RNA and proteins

8
Biological Data: Sus scrofa agouti-related protein gene
     1 ggcacattct cctgttgagc caggctatgc tgaccacaat gttgctgagc tgtgccctac
    61 tgctggcaat gcccaccatg ctgggggccc agataggctt ggcccccctg gagggtatcg
   121 gaaggcttga ccaagccttg ttcccagaac tccaaggtca gtgcgggcag gagtgggttg
   181 ggtggggctt ggacatcctc tggccacaaa gtattctgct tgtatgagcc ctttcttccc
   241 cttcccaatc ccaggcctgg gaggtgggtg ttttgtgcat gggtggttct gccctcacat
   301 catctgtccc agatctaggc ctgcagcccc cactgaagag gacaactgca gaacgggcag
   361 aagaggctct gctgcagcag gccgaggcca aggccttggc agaggtaaca gctcagggaa
   421 agggctgagg ccacaagtct tgagtgggtg tgtcaagcat caacctctat ctgtgcttgg
   481 agttgccact gtggtacaac gggattggcg gtgtcttggg agcgctggga cgtggtttca
   541 tccccggcca gcacaagtgg gttaaggatc tggccttgcc atcccttcag cttaggctga
   601 gactgtggct tggagctgat ctctgaccgg aagctccata tgctctgggg tgaccaaaaa
   661 tggaaaaaca aacatacaaa acacctctac ctgcacttcc tgaccccctc acccggggcg
   721 acactgcaga ccatcccgtt cacgctccac ttccatcctg ccttgatctg gcgcattcca
   781 tgaatgtgct tttggaagtc cttgtttccc aacccttgta ggtgctagat cctgaaggac
   841 gcaaggcacg ctccccacgt cgctgcgtaa ggctgcacga atcctgtctg ggacaccagg
   901 taccatgctg cgacccatgt gctacatgct actgccgttt cttcaacgcc ttctgctact
   961 gccgcaagct gggtactgcc acgaacccct gcagccgcac ctagctggcc agccaatgtc
  1021 gtcg
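
A record like the one above is usually cleaned before analysis. The sketch below is one possible way to strip the position numbers and spacing and then compute length and GC content. The function names are illustrative, not from the course, and only the first two lines of the record are used as sample input.

```python
from collections import Counter


def parse_numbered_sequence(text: str) -> str:
    """Keep only a, c, g, t, n; drops position numbers, spaces and newlines.

    Assumption: other IUPAC ambiguity codes do not occur and may be discarded.
    """
    return "".join(ch for ch in text.lower() if ch in "acgtn")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    counts = Counter(seq)
    return (counts["g"] + counts["c"]) / len(seq)


# First two lines of the Sus scrofa record from the slide.
record = """
     1 ggcacattct cctgttgagc caggctatgc tgaccacaat gttgctgagc tgtgccctac
    61 tgctggcaat gcccaccatg ctgggggccc agataggctt ggcccccctg gagggtatcg
"""

seq = parse_numbered_sequence(record)
print(len(seq), round(gc_content(seq), 3))
```

Filtering on the nucleotide alphabet, rather than deleting digits, keeps the parser indifferent to how the position column is laid out.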

9
Genome Sizes
10
Database Growth
11
Database Growth
12
Database Growth
13
Database Growth
  • Exponential growth in sequence data
  • Not much growth in sequence size
  • Expect exponential growth in annotation
    information
  • We have lots of data, but it's difficult to make
    sense of it.

14
Fundamental Problems in Bioinformatics
  • Pairwise Sequence Alignment
  • Multiple Sequence Alignment
  • Phylogenetic Analysis
  • Sequence Based Database Searches
  • Gene Prediction
  • Structure Prediction (RNA and Protein)
  • Protein Classification
  • Gene Expression
  • ...

15
Pairwise Sequence Alignment
  • Given two DNA or AA sequences, find the best way
    to line them up
  • Biology allows for variation
  • Gaps, mismatches, etc.

Sequences:
HEAGAWGHEE
PAWHEAE

Two possible alignments:
HEAGAWGHE-E
P-A--W-HEAE

HEAGAWGHE-E
--P-AW-HEAE
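
One standard way to find a best global alignment is Needleman-Wunsch dynamic programming. The sketch below applies it to the two sequences from the slide. The scoring scheme (match +1, mismatch -1, gap -1) is an assumption chosen for simplicity. Real protein alignments normally use a substitution matrix and often separate gap-open and gap-extend penalties.

```python
def score_alignment(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score an existing alignment column by column ('-' marks a gap)."""
    total = 0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(best_score)
print(top)
print(bottom)
```

`score_alignment` can also score either alignment shown on the slide, which makes it easy to see how the choice of scoring scheme decides which one counts as "best".
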
16
Multiple Sequence Alignment
  • Extend pairwise problem to multiple sequences
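
One common way to carry a pairwise score over to several sequences is the sum-of-pairs score: each column is scored by adding up the pairwise scores of all its character pairs. The sketch below assumes the same simple scoring as a pairwise example might use (match +1, mismatch -1, gap -1, gap against gap 0). The aligned rows are made-up toy data.

```python
from itertools import combinations


def sum_of_pairs(rows, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Sum-of-pairs score of a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*rows):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue            # gap against gap contributes nothing
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


# Hypothetical three-row alignment, for illustration only.
toy_alignment = ["HEAGAWGHE-E", "P-A--W-HEAE", "HEA-AW-HEAE"]
print(sum_of_pairs(toy_alignment))
```

This only scores a given alignment. Finding the best-scoring multiple alignment is much harder than the pairwise case, which is why practical tools rely on heuristics.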

17
Phylogenetic Analysis
  • Study relationships between organisms
  • Characteristic similarity
  • Sequence similarity
  • Whole genome comparison
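
Sequence similarity is often turned into a distance before a tree is built. A minimal sketch, assuming the sequences are already aligned, is the p-distance: the proportion of compared sites that differ. The taxon names and sequences below are invented for illustration.

```python
def p_distance(x: str, y: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """All-against-all p-distances, keyed by (name, name)."""
    names = sorted(aligned)
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in names for q in names}


# Hypothetical aligned sequences.
toy = {
    "taxon_a": "ACGTACGTAC",
    "taxon_b": "ACGTACGTTC",
    "taxon_c": "ACGAAC-TTC",
}
for pair, dist in sorted(distance_matrix(toy).items()):
    print(pair, round(dist, 3))
```

A matrix like this is the usual input to distance-based tree-building methods such as neighbor joining. The p-distance itself ignores multiple substitutions at the same site, so corrected distances are normally preferred for divergent sequences.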

18
Phylogenetic Analysis
19
Sequence Based Database Searches
  • Keyword
  • Find all sequences named cytochrome c
  • Sequence
  • Find all sequences similar to HEAGAWGHEE
  • Remember, there are gigabytes to search, and I'm
    not about to wait two days for an answer!
  • BLAST, FASTA, ...
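
Tools such as BLAST and FASTA gain speed by first looking for short words shared between the query and database sequences, and only then doing more expensive alignment. The sketch below illustrates that seeding idea alone, with an exact k-mer index. It is not the algorithm of either tool, and the three-entry database is made up.

```python
from collections import defaultdict


def build_kmer_index(database: dict, k: int = 3) -> dict:
    """Map every k-mer to the set of database sequence names containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_hits(query: str, index: dict, k: int = 3) -> list:
    """Rank database sequences by the number of distinct query k-mers they share."""
    counts = defaultdict(int)
    for kmer in {query[i:i + k] for i in range(len(query) - k + 1)}:
        for name in index.get(kmer, ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy database; names are placeholders.
database = {
    "seq1": "PAWHEAE",
    "seq2": "HEAGAWGHEE",
    "seq3": "MKVLAAGIV",
}
index = build_kmer_index(database)
hits = candidate_hits("HEAGAWGHEE", index)
print(hits)
```

The index is built once and reused for every query, so a search touches only the sequences that share at least one word with the query instead of the whole database.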

20
Gene Prediction
  • Does the following sequence contain a gene?
  • How many introns? Exons? Promoters? Other
    features?

TTGTAATCTCCTCTGTGACTATAATGACTAGTCTCAGGCCTGCCTTCCCCAGAAACCTCTCTTTTGGCTATTTCTCTTTC
TAGTTCTCTGTTTAAACAAAATTTATTCTATATATCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATC
TATCTATCTATCTATCATCTACTTATCATCTGTCTAGCCATTTGAAGCATCTTTGTGTTTTAGGTCCTGTTAGATTCTCC
TTTCAGCCAGTGGAGGATCTGGACAGAGCTATTTCTTAGCTTCCCCTAAGCCATGTTGTTAGAACGAATCCCCCACACCT
CCTCTGAGTGCTACGTCTCCGTCAAGAATTATGTATGTGGGATCCAGATGGCCCAGTGGATAAAACTGCAAGTGTCATGA
CCATGACCTGACTTCAAGGGATTGTGTAGAAAGGGAGTTATCACAGTGTGAGGGACAGGGCTAAGGACACTAACCCGTAT
GTTGAGGGGCACAGACGCTAGCAACAACAGTGAAGTGTTTAAAAAGGCAAAAATCATGTTTCTAGAAGTCAGGAAGAGCC
TAACTTGTGGACAAGGACCAACAGGCAGCAGTTGTAATGGGGCAGGGCAGAGGGAGAGCGGACACGCAGCTTTTGGCATC
AAACACACCCAGAGTGTGGATAGAGAGTAGGGAAATACTCTAGTCTCTGGCTAGGATACTCCCCTCTCTTTTTGACATTT
CTCATTGGCAGCCCCAAGTGGTCACTGGAGAGCCAGGAAGCCTAAAGGACACAGTTAGTAGCAGCCAGCTCCTTTGGTGG
AATTTTGGGGACATGGTGGGGTGACTTGGCTCTATCCAGGCCAGGGCTGGGTGTGAGTATACACTTAGTGACTGGCCTTC
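
A very rough first pass at the question "does this sequence contain a gene?" is to scan for open reading frames (ORFs). The sketch below is a naive scanner under stated assumptions: forward strand only, ATG as the only start codon, and a minimum length cutoff picked arbitrarily. It does not model introns, exons or promoters, so it is far from a real gene predictor for eukaryotic DNA.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list:
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon.
    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts codons before the stop, including the ATG.
    """
    dna = re.sub(r"\s+", "", dna).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Tiny made-up example: ATG AAA TTT TAG embedded in frame 2.
print(find_orfs("CCATGAAATTTTAGGG", min_codons=2))
```

Applied to a genomic sequence like the one on the slide, a scanner of this kind mostly shows why the problem is hard: short ORFs appear by chance, and real genes can be split across exons that no single reading frame spans.
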
21
Gene Prediction
22
Structure Prediction (RNA, Protein)
  • From sequence, predict 2D and 3D structures.
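
For RNA, a classic introductory approach to 2D (secondary) structure is Nussinov-style dynamic programming, which maximizes the number of nested base pairs. The sketch below is a simplified version that assumes Watson-Crick and G-U wobble pairs and a minimum hairpin loop of three unpaired bases. It counts pairs only and ignores the thermodynamic energies that practical predictors use.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in rna.

    A pair (i, j) is allowed only if at least min_loop bases lie between i and j.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum pairs within rna[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = max(best[i + 1][j], best[i][j - 1])   # i or j left unpaired
            if (rna[i], rna[j]) in ALLOWED_PAIRS:
                value = max(value, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                       # split into two parts
                value = max(value, best[i][k] + best[k + 1][j])
            best[i][j] = value
    return best[0][n - 1]


# Made-up hairpin: three G-C pairs around an AAA loop.
print(nussinov_pairs("GGGAAACCC"))
```

The triple loop makes this cubic in sequence length, which is acceptable for short RNAs. Predicting 3D protein structure from sequence is a much harder problem and is not covered by this sketch.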

23
Protein Classification
  • From sequence, identify characteristics of a
    protein
  • Active sites
  • Families (e.g. globin)
  • Blocks
  • Domains
  • Folds
  • Motifs
  • Etc.
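
Short motifs are often described as patterns that can be matched directly against a sequence. As an example, the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P} (entry PS00001) translates into a regular expression. The sketch below scans a protein for it, including overlapping matches. The test protein is made up, and a pattern match alone does not prove the site is functional.

```python
import re

# PROSITE PS00001, N-{P}-[ST]-{P}: N, then anything but P, then S or T,
# then anything but P. The lookahead lets overlapping matches be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein: str, pattern: re.Pattern = N_GLYCOSYLATION) -> list:
    """Return (1-based position, matched text) for every match of pattern."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


# Hypothetical protein fragment.
print(find_motif("MNASAPNPSANNTSA"))
```

Families, domains and folds usually need richer models than a single pattern, such as profiles or hidden Markov models, but the scan-and-report structure stays similar.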

24
Gene Expression
  • Study of gene activity under experimental
    conditions
  • Large scale studies with microarrays
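
A common first summary of a two-condition expression experiment is the log2 fold change of each gene between conditions. The sketch below assumes one intensity per gene per condition, a pseudocount of 1 to avoid division by zero, and a cutoff of one log2 unit (a twofold change). The gene names and intensities are invented, and real analyses add normalization and statistical testing.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 of the treated/control ratio, with a pseudocount added to both."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def changed_genes(intensities: dict, threshold: float = 1.0) -> dict:
    """Genes whose absolute log2 fold change is at least threshold."""
    result = {}
    for gene, (treated, control) in intensities.items():
        lfc = log2_fold_change(treated, control)
        if abs(lfc) >= threshold:
            result[gene] = lfc
    return result


# Hypothetical (treated, control) intensities.
intensities = {
    "gene_a": (1500.0, 400.0),
    "gene_b": (820.0, 790.0),
    "gene_c": (60.0, 510.0),
}
for gene, lfc in changed_genes(intensities).items():
    print(gene, round(lfc, 2))
```

Working on the log scale makes up- and down-regulation symmetric: a doubling gives +1 and a halving gives -1.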

25
Bioinformatics-Based Application: VCMAP
  • Comparative mapping is a strategy that allows
    cross-organism study of physiological genomics
  • Virtual Comparative Map (VCMap) performs homology
    analysis with mathematical predictions to
    construct cross-organism maps, untested in the
    wet lab, between human, rat, mouse and
    zebrafish
  • This application provides a highly modular
    investigative environment for the:
  • Analysis of multiple organisms including
    Zebrafish
  • Collection of genetic and radiation hybrid maps
  • Prediction of Genes based on homology

26
VCMAP
  • Homology analysis was based on sequence
    similarity (Altschul et al., 1990) and curated
    homologous genes.
  • 85% similarity over a 100 bp stretch across all
    species was used to create the maps
  • NCBI's UniGene sequence sets, RH and genetic
    maps were chosen to create anchor objects
    (Kwitek-Black et al., 2001).
  • 1-to-1 homologous objects were used for building
    the virtual comparative maps with a pipeline
    architecture
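
The thresholding and 1-to-1 filtering described above can be illustrated with a small sketch. It reads the slide's similarity cutoff as at least 85% identity over an alignment of at least 100 bp, which is an interpretation, and it treats a pair as 1-to-1 when neither member has any other qualifying partner. The `Hit` record, its field names and the sample hits are all hypothetical. This is not VCMap's actual code or data model.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical summary of one cross-species similarity hit."""
    query: str
    subject: str
    percent_identity: float
    alignment_length: int


def passes_threshold(hit: Hit, min_identity: float = 85.0, min_length: int = 100) -> bool:
    """Assumed cutoff: identity and alignment length both meet the minimum."""
    return hit.percent_identity >= min_identity and hit.alignment_length >= min_length


def one_to_one_pairs(hits: list) -> list:
    """Keep (query, subject) pairs where each side has exactly one qualifying partner."""
    pairs = {(h.query, h.subject) for h in hits if passes_threshold(h)}
    query_counts = Counter(q for q, _ in pairs)
    subject_counts = Counter(s for _, s in pairs)
    return sorted(p for p in pairs
                  if query_counts[p[0]] == 1 and subject_counts[p[1]] == 1)


# Made-up hits between placeholder rat and human sequence identifiers.
hits = [
    Hit("rat_1", "human_1", 92.0, 150),   # kept
    Hit("rat_2", "human_2", 90.0, 80),    # too short
    Hit("rat_3", "human_3", 88.0, 200),   # rat_3 has two partners
    Hit("rat_3", "human_4", 86.0, 120),
    Hit("rat_5", "human_5", 84.9, 300),   # identity too low
]
print(one_to_one_pairs(hits))
```

Discarding ambiguous pairs trades coverage for confidence, which suits anchors that are meant to tie maps of different organisms together.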

27
VCMAP
  • Pipeline diagram (box labels as they appear on
    the slide)
  • Download UniGene data from NCBI
  • Mask UniGene sequences
  • Load UniGene data to DB
  • DB
  • Format masked sequences
  • Blast
  • Map Data
  • Search UniGene
  • VC Maps Building
  • Anchor Report
  • Generate anchor report
  • Create Homolog UniGene Object and Scoring
  • 1-to-1 Objects

28
VCMAP
29
Different perspectives on Bioinformatics
  • Bioinformatics is a tool
  • Biologists, biochemists, medical professionals,
    etc.
  • Obtain meaningful and understandable results
  • Bioinformatics is a discipline
  • Informaticians, mathematicians, statisticians,
    etc.
  • Generate meaningful and understandable results

30
Goals of the Course
  • Communication between biologists and
    computational scientists
  • Access, retrieve, and analyze bioinformatic data
  • Know fundamental problems in bioinformatics
  • Use standard bioinformatic tools to answer
    biological questions
  • Understand theories used to build the tools
  • Critically assess solutions to bioinformatic
    problems

31
How Are We Going To Get There?
  • References
  • Cynthia Gibas and Per Jambeck, Developing
    Bioinformatics Computer Skills, O'Reilly
    Publishers, 2001, ISBN 1-56592-644-1.
  • David W. Mount, Bioinformatics: Sequence and
    Genome Analysis, Cold Spring Harbor Laboratory,
    2001, ISBN 0879696087.

32
How Are We Going To Get There?
  • Lab Assignments
  • Nine (9) assignments covering the major topics
  • Maintain a lab notebook, collected 3 times for
    review
  • Bistro Lab
  • 368 Cudahy Hall
  • Windows workstations, Sun server
  • Variety of software
  • Lab orientation (when should we have it?)

33
How Are We Going To Get There?
  • Lab Web Page
  • http://bistro.mscs.mu.edu
  • For a 70% grade
  • Post 3 stories with commentary
  • Post 3 links to bioinformatic tools, properly
    categorized
  • Post 5 comments on others' stories
  • More posts, writing plug-ins, writing lab HOWTOs,
    etc. will increase grade

34
How Are We Going To Get There?
  • Exams
  • Midterm
  • Final
  • Intangibles
  • Discussion with instructors
  • Being engaged in the class
  • Suggestions about the lab

35
Grading
36
Who We Are
  • Craig A. Struble, Ph.D.
  • Ph.D. in Computer Science, 2000 from Va. Tech
  • 3rd year at Marquette
  • Interests: microarray data analysis, medical
    literature mining, miRNA, ...
  • Norie Dela Cruz, Ph.D.
  • Rat Genome Database

37
Who Are You?
  • Name
  • Where are you from?
  • Background
  • Why bioinformatics?

38
Summary
  • Bioinformatics is truly interdisciplinary
  • Biology (natural sciences), informatics,
    mathematics, statistics
  • Databases
  • Large, semistructured, incomplete, inaccurate
  • Wide range of problems
  • Solutions employ knowledge from sciences with
    algorithms and models from informatics,
    mathematics, and statistics