Title: Genomics, Proteomics, and Bioinformatics
1. Genomics, Proteomics, and Bioinformatics
- Biology 224
- Instructor: Tom Peavy
- January 29, 2008
2. What is bioinformatics?
- The interface of biology and computers
- The analysis of genomes, genes, mRNAs, and proteins using computer algorithms and computer databases
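Here is a small example of the kind of algorithmic analysis meant here. The Python sketch below computes the GC content of a DNA sequence, which is the fraction of bases that are G or C. The function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive).

    Ambiguous bases such as N are left out of both the count and the total.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Hypothetical example sequence (not from the lecture)
print(gc_content("ATGCGCNNTA"))  # 4 G/C out of 8 unambiguous bases -> 0.5
```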
3. What is Genomics?
What is Proteomics?
What is the Transcriptome?
4. What do you want out of this course?
5. Top ten challenges for bioinformatics
1) Precise models of where and when transcription will occur in a genome (initiation and termination)
2) Precise, predictive models of alternative RNA splicing
3) Precise models of biological pathways: the ability to predict cellular responses to external stimuli
4) Determining protein-DNA, protein-RNA, and protein-protein recognition codes
5) Accurate ab initio protein structure prediction
6. Top ten challenges for bioinformatics
6) Rational design of small-molecule inhibitors of proteins
7) Mechanistic understanding of protein evolution
8) Mechanistic understanding of speciation
9) Development of effective gene ontologies: systematic ways to describe gene and protein function
10) Education: development of bioinformatics curricula
Source: Ewan Birney, Chris Burge, Jim Fickett
7. Themes throughout the course: gene/protein families
- Retinol-binding protein 4 (RBP4)
  - member of the lipocalin family
  - small, abundant carrier protein
- We will study it in a variety of contexts, including:
  - homologs in various species
  - sequence alignment
  - gene expression
  - protein structure
  - phylogeny
9. Tool-users and tool-makers
10. [Figure: DNA, RNA, protein, and phenotype, alongside genomic DNA databases; cDNA, ESTs, and UniGene; and protein sequence databases]
11. There are three major public DNA databases:
- GenBank: housed at NCBI (National Center for Biotechnology Information)
- EMBL: housed at EBI (European Bioinformatics Institute)
- DDBJ: housed in Japan
12. Growth of GenBank
[Figure: base pairs of DNA (billions) and number of sequences (millions) in GenBank by year, 1982 to 2002; updated 8-12-04: >40 billion base pairs]
13. Press release (August 22, 2005)
- 100 gigabases of sequence data across NCBI, EMBL, and DDBJ
- Sequences from over 165,000 organisms
14. The growth of GenBank. The blue area shows the
total number of bases including those from whole
genome shotgun sequencing projects (WGS). The
checkered area shows only the non-WGS portion.
With release 149, the number of WGS bases
exceeded the number of bases in the traditional
GenBank divisions.
15. Go to the NCBI website: http://www.ncbi.nlm.nih.gov/
16. PubMed is
- the National Library of Medicine's search service
- 12 million citations in MEDLINE
- links to participating online journals
- PubMed tutorial (via Education on the sidebar)
17. Entrez integrates
- the scientific literature
- DNA and protein sequence databases
- 3D protein structure data
- population study data sets
- assemblies of complete genomes
18. Entrez is a search and retrieval system that integrates NCBI databases.
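You can also query Entrez programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL and does not send a request. The helper name `esearch_url` and the example query are hypothetical. `esearch.fcgi` and its `db`, `term`, and `retmax` parameters are part of E-utilities. Check NCBI's E-utilities usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; fetching it would return matching record IDs as XML."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# Hypothetical query: nucleotide records mentioning RBP4 in humans
url = esearch_url("nucleotide", "RBP4 AND Homo sapiens[Organism]")
print(url)
```

Building the URL separately from fetching it keeps the example offline. You could pass the result to any HTTP client.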
19. BLAST is
- Basic Local Alignment Search Tool
- NCBI's sequence similarity search tool
- supports analysis of DNA and protein databases
- 80,000 searches per day
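BLAST is a fast heuristic for finding local alignments. The exact dynamic-programming method it approximates is the Smith-Waterman algorithm, sketched below in Python. This is not BLAST itself. The scores used here (+2 match, -1 mismatch, -2 per gap) are simple assumptions, whereas BLAST uses substitution matrices (e.g., BLOSUM62 for proteins) and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment of a and b.

    Uses a simple match/mismatch score and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + sub(i, j)
            up = H[i - 1][j] + gap    # gap in b
            left = H[i][j - 1] + gap  # gap in a
            # The 0 floor lets a local alignment start fresh at any cell
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical sequences: the query ACGT lies inside the longer subject sequence
print(smith_waterman("TTGACGTA", "ACGT"))  # (8, 'ACGT', 'ACGT')
```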
20. OMIM is
- Online Mendelian Inheritance in Man
- a catalog of human genes and genetic disorders
- edited by Dr. Victor McKusick and others at JHU
21. Books is
- a searchable resource of online books
22. TaxBrowser is
- a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
- taxonomy information such as genetic codes
- molecular data on extinct organisms
23. Structure site includes
- Molecular Modelling Database (MMDB)
  - biopolymer structures obtained from the Protein Data Bank (PDB)
- Cn3D (a 3D-structure viewer)
- Vector Alignment Search Tool (VAST)
24. Review of Genetics, Biochemistry, and Evolution
25. Human Genome Project
26. What is the typical genomic structure of a eukaryotic gene?
30. Synonymous vs. nonsynonymous changes
31. Synonymous Substitution
Non-synonymous Substitution
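Here is a minimal sketch of the distinction, assuming the standard genetic code. A single-base change is synonymous if the mutant codon encodes the same amino acid, and nonsynonymous otherwise. The names `CODON_TABLE` and `classify_substitution` are hypothetical. Stop codons are written as `*`, so a change from a sense codon to a stop codon counts as nonsynonymous.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (third position varies fastest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at position 0, 1, or 2 of a DNA codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutant]:
        return "synonymous"
    return "nonsynonymous"


print(classify_substitution("CTT", 2, "C"))  # CTT and CTC both encode Leu -> synonymous
print(classify_substitution("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val) -> nonsynonymous
```

The second call is the change associated with sickle cell anemia. In that case, codon 6 of the β-globin gene mutates from GAG (glutamate) to GTG (valine).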
32. Central Dogma
- DNA → RNA → protein
- sequence → structure → function → evolution
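The DNA → RNA → protein flow can be sketched in a few lines of Python. The sketch assumes the input is the DNA coding strand, already in frame and starting at a start codon, and it uses the standard genetic code. Real eukaryotic genes also need splicing and other mRNA processing, which this sketch ignores. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code for mRNA codons, enumerated in UCAG order
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA matches the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base, stopping at the first stop codon (*)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATAG")  # hypothetical coding-strand sequence
print(mrna)             # AUGGCCAAAUAG
print(translate(mrna))  # MAK
```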
33. What kinds of modifications are made to eukaryotic mRNAs?
34. RNA Modifications
37. What are cDNAs?
38. Protein structures
- Determined by X-ray crystallography and nuclear magnetic resonance (NMR)
- Primary structure
  - linear sequence of amino acids
- Secondary structure
  - alpha helices and beta sheets
- Tertiary structure
  - 3D fold that exposes binding domains, etc.
40. Linkage maps
- YAC: yeast artificial chromosome
- BAC: bacterial artificial chromosome
  - used to clone large pieces of DNA
  - overlapping clones
- Are genes linked?
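Linkage between two genes is typically assessed from the recombination frequency (RF) in a test cross. RF is the fraction of offspring with non-parental combinations of alleles. The minimal sketch below uses hypothetical counts and function names, plus the convention that 1% recombination corresponds to 1 centimorgan (cM).

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring carrying a non-parental combination of alleles."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need total > 0 and 0 <= recombinants <= total")
    return recombinants / total


def map_distance_cm(recombinants: int, total: int) -> float:
    """Approximate map distance, where 1% recombination is 1 centimorgan (cM).

    This direct conversion is only reliable for closely linked genes, because
    double crossovers go undetected over longer distances.
    """
    recombination_frequency(recombinants, total)  # validates the inputs
    return 100 * recombinants / total


# Hypothetical test-cross counts (not from the lecture)
print(recombination_frequency(170, 1000))  # 0.17
print(map_distance_cm(170, 1000))          # 17.0
# Genes that assort independently give an RF of about 0.5, so an RF well
# below 0.5 suggests the two genes are linked on the same chromosome.
```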
41. Organization of genomes
- Groups of genes within a species
  - comparative genomics
- Plastid genomes and mitochondrial (mt) genomes
43. How do we determine functions of genes?
44. How do we determine functions of genes?
- Expression patterns
  - Northern blots
  - RT-PCR
  - SAGE
  - Microarrays
- Transgenics
  - insert genes: what results?
- Mutants
  - classical genetics
  - molecular genetics
- Functional protein assays
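Expression-pattern results, for example from microarrays, are often summarized as a log2 ratio of a gene's signal in a sample versus a reference. The sketch below assumes already-normalized, positive intensities and a hypothetical 2-fold cutoff (|log2 ratio| >= 1). The function names and numbers are illustrative only.

```python
import math


def log2_ratio(sample: float, reference: float) -> float:
    """log2 of a gene's signal in the sample relative to the reference."""
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample / reference)


def call_change(ratio: float, threshold: float = 1.0) -> str:
    """Label a gene using a |log2 ratio| cutoff (1.0 corresponds to a 2-fold change)."""
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized intensities (sample, reference) for three genes
genes = {"geneA": (800.0, 200.0), "geneB": (100.0, 400.0), "geneC": (300.0, 250.0)}
for name, (sample, reference) in genes.items():
    ratio = log2_ratio(sample, reference)
    print(name, round(ratio, 2), call_change(ratio))
```

A log scale makes up- and down-regulation symmetric. A 4-fold increase is +2 and a 4-fold decrease is -2.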
45. Charles Darwin
- Descent with modification
  - species change through time and are related to a common ancestor
- Natural selection is the process by which this change occurs
46. Understanding natural selection
- Selection acts on individuals, though its consequences occur in populations
- An individual's phenotype is the reason it survived and reproduced
- Over time, this changes the distribution of traits in the population
- What ultimately changes?
  - The gene pool
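The standard one-locus, two-allele selection model makes it concrete how selection changes the gene pool. It assumes random mating, an effectively infinite population, and constant genotype fitnesses w_AA, w_Aa, and w_aa. Each generation, the frequency p of allele A becomes p' = (p² w_AA + pq w_Aa) / w̄, where q = 1 - p and w̄ is the mean fitness. The function names and fitness values below are hypothetical.

```python
def next_generation(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of allele A after one generation of selection under random mating."""
    q = 1.0 - p
    # Mean fitness: genotype frequencies (Hardy-Weinberg) weighted by fitness
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A copies come from AA individuals (all alleles) and Aa individuals (half)
    return (p * p * w_AA + p * q * w_Aa) / w_bar


def simulate(p0: float, generations: int, fitness: tuple) -> list:
    """Return the frequencies of allele A for generations 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_generation(freqs[-1], *fitness))
    return freqs


# Hypothetical fitnesses: aa homozygotes leave 20% fewer offspring
trajectory = simulate(p0=0.1, generations=50, fitness=(1.0, 1.0, 0.8))
print(f"A: {trajectory[0]:.2f} -> {trajectory[-1]:.2f} after 50 generations")
```

Selection acts on individual genotypes through their fitness, but the quantity that changes is the allele frequency of the population.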
47. New alleles
- A point change is all that is needed
  - not always a "big deal"
    - neutral change
  - but it can be, as in sickle cell anemia
48. Gene duplication
- Creates an additional copy of a gene
  - unequal crossing-over
  - X-rays
- Are these duplicates maintained in populations?
  - Pseudogenes
50. Polyploidy
- An additional set of chromosomes
- Found in plants
- Also in amphibians and invertebrates
  - through a type of parthenogenesis
- Triploids
  - poor fertility
- Arises through hybridization or meiotic malfunction
51. Homology
- Literally, the study of likeness
- Similarity between species (or genes) that results from inheritance of traits from a common ancestor
- Unless a common ancestor is known, be careful when using this word
52. Orthologous vs. Paralogous Genes
[Figure: gene tree in which a gene duplication and a speciation event give rise to gene copies in Species 1 and Species 2]
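The distinction can be stated as a rule on a gene tree whose internal nodes are labeled as speciation or duplication events. Two genes are orthologs if their most recent common ancestor is a speciation, and paralogs if it is a duplication. The `Node` class, the example tree, and the gene names below are hypothetical. This is a sketch of the rule, not a real tree-reconciliation tool.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity so they can be stored in a set
class Node:
    name: str
    event: Optional[str] = None  # "duplication" or "speciation" for internal nodes
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def lineage(node: Optional[Node]) -> Iterator[Node]:
    """Yield a node followed by each of its ancestors up to the root."""
    while node is not None:
        yield node
        node = node.parent


def relationship(a: Node, b: Node) -> str:
    """Classify two genes by the event at their most recent common ancestor."""
    if a is b:
        raise ValueError("need two different genes")
    ancestors_of_a = set(lineage(a))
    lca = next(n for n in lineage(b) if n in ancestors_of_a)
    if lca.event == "speciation":
        return "orthologs"
    if lca.event == "duplication":
        return "paralogs"
    raise ValueError("common ancestor has no event label")


# Hypothetical gene tree: a duplication, then a speciation in each copy's lineage
root = Node("ancestral gene", event="duplication")
copy_a = root.add(Node("copy A", event="speciation"))
copy_b = root.add(Node("copy B", event="speciation"))
a1 = copy_a.add(Node("A, species 1"))
a2 = copy_a.add(Node("A, species 2"))
b1 = copy_b.add(Node("B, species 1"))
b2 = copy_b.add(Node("B, species 2"))

print(relationship(a1, a2))  # orthologs: separated by speciation
print(relationship(a1, b2))  # paralogs: separated by the duplication
print(relationship(a1, b1))  # paralogs, even though both are in species 1
```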
53. Species
- All organisms alive today can trace their ancestry back to the origin of life some 3.8 billion years ago
- Since then, millions if not billions of branching events have occurred
- Mechanisms have to be in place for change to occur
  - genetic drift and natural selection
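Genetic drift can be illustrated with a Wright-Fisher simulation. This standard model assumes a constant-size diploid population of N individuals, random mating, and no selection, mutation, or migration. Each generation, the 2N gene copies are resampled at random from the previous allele frequency. Frequencies therefore wander, and alleles are eventually lost or fixed by chance. The function name, parameters, and seed are hypothetical.

```python
import random


def wright_fisher(p0: float, pop_size: int, generations: int, seed: int = 1) -> list:
    """Allele frequency trajectory under pure genetic drift.

    Each generation draws 2N gene copies, each carrying the allele with
    probability equal to the previous frequency (binomial sampling).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    n_copies = 2 * pop_size    # diploid population
    freqs = [p0]
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


trajectory = wright_fisher(p0=0.5, pop_size=20, generations=100)
print(trajectory[-1])  # 0.0 or 1.0 would mean the allele was lost or fixed by chance
```

Smaller values of `pop_size` make the random fluctuations larger, which is why drift matters most in small populations.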