Title: Genomics for Librarians
1Genomics for Librarians
Stuart M. Brown, Ph.D.
Director, Research Computing, NYU School of Medicine
2A Genome Revolution in Biology and Medicine
- We are in the midst of a "Golden Era" of biology
- The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine
- The revolution is about treating biology as an information science, not about specific biochemical technologies.
3The Human Genome Project
4The job of the biologist is changing
As more biological information becomes available
and laboratory equipment becomes more automated
...
- The biologist will spend more time using computers on experimental design and data analysis (and less time doing tedious lab biochemistry)
- Biology will become a more quantitative science (think how the periodic table affected chemistry)
5 A review of some basic genetics
7DNA
- 4 bases (G, C, T, A)
- base pairs
- G--C
- T--A
- genes
- non-coding regions
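The pairing rules listed above (G with C, T with A) are enough to derive one DNA strand from the other. A minimal Python sketch follows; the function name and the example sequence are made up for illustration.

```python
# Base-pairing rules from the slide: G pairs with C, T pairs with A.
PAIRS = {"G": "C", "C": "G", "T": "A", "A": "T"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))  # TGTAATC
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.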
8Decoding Genes
9What is Bioinformatics?
- The use of information technology to collect, analyze, and interpret biological data.
- An ad hoc collection of computing tools that are used by molecular biologists to manage research data.
- Computational algorithms
- Database schema
- Statistical methods
- Data visualization tools
10Genomics
- What is Genomics?
- An operational definition
- The application of high-throughput automated technologies to molecular biology.
- A philosophical definition
- A holistic or systems approach to the study of information flow within a cell.
11Genomics makes LOTS of data!
- Investigators need complex databases just to manage their own experiments
- Biologists need to know how to do data mining to answer even simple questions in these huge data sets
- Librarians understand the challenges of storage and searching of large amounts of data
12New Biology New Librarians?
- How do Genomics and Bioinformatics overlap or interact with Library Science?
- The NCBI (Natl. Center for Biotechnology Information), the home of GenBank, is part of the National Library of Medicine
- We store and organize genes like journal articles
- accession number, annotation, etc.
- A big part of bioinformatics involves keyword searches and SQL queries in relational databases
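To make the keyword-search point concrete, here is a small sketch using Python's built-in sqlite3 module. The table name, column names, and accession numbers are hypothetical and are not GenBank's actual schema or records; the sketch only shows what a keyword query over annotated records can look like.

```python
import sqlite3

# In-memory database with a hypothetical table of annotated sequence records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_record ("
    "accession TEXT PRIMARY KEY, organism TEXT, annotation TEXT)"
)
conn.executemany(
    "INSERT INTO sequence_record VALUES (?, ?, ?)",
    [
        ("ACC0001", "Bos taurus", "cDNA clone, 5' end"),
        ("ACC0002", "Homo sapiens", "apolipoprotein mRNA, complete cds"),
        ("ACC0003", "Homo sapiens", "hypothetical protein"),
    ],
)


def keyword_search(keyword: str) -> list:
    """Return accession numbers whose annotation contains the keyword."""
    rows = conn.execute(
        "SELECT accession FROM sequence_record "
        "WHERE annotation LIKE ? ORDER BY accession",
        (f"%{keyword}%",),
    )
    return [accession for (accession,) in rows]


if __name__ == "__main__":
    print(keyword_search("mRNA"))
```

Note that a plain substring match on "protein" also returns the "apolipoprotein" record, a small example of why keyword searching over annotations needs care.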
13Bioinformatics is Not Library Science
- We are NOT cataloging a set of known information
- Programming and complex algorithms: pattern matching, string matching, biostatistics
- Data mining and multi-dimensional visualization tools
- Uncertainty of the data and constant revision of the known
- Genes are guesses based on complex algorithms, not books on the shelf
16Raw Genome Data
17BLAST Similarity Search
>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
Length = 369
Score = 272 bits (137), Expect = 4e-71
Identities = 258/297 (86%), Gaps = 1/297 (0%)
Strand = Plus / Plus

Query 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
Sbjct 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
Sbjct 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
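The "Identities" and "Gaps" figures in a report like this are simple column counts over the aligned sequences. The sketch below shows that counting step only; it is not the BLAST search algorithm, and the function name and the short test sequences are invented for illustration.

```python
def alignment_stats(query: str, subject: str):
    """Count identical columns and gap columns in two aligned sequences.

    Both strings must already be aligned (same length, '-' marks a gap).
    Returns (identities, gaps, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, gaps, len(query)


if __name__ == "__main__":
    ident, gaps, length = alignment_stats("acgt-acgt", "acgtaacct")
    # Integer division mirrors the whole-number percentage shown in the report.
    print(f"Identities {ident}/{length} ({100 * ident // length}%), "
          f"Gaps {gaps}/{length}")
```

With the numbers from the report above, 258 identical columns out of 297 gives 86 when rounded down to a whole percent.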
18Multiple Alignment
19Protein domains (Pattern analysis)
20Clustering (Phylogenetics)
21UCSC
23The Challenge of New Data Types (Genomics)
- Gene expression microarrays
- thousands of genes, imprecise measurements
- huge images, private file formats
- Proteomics
- high-throughput Mass Spec
- protein chips, protein-protein interactions
- Genotyping
- thousands of alleles, thousands of individuals
- Regulatory Networks
24Biological Information
25Microarray Technology
26Spot your own Chip (plans available for free from Pat Brown's website)
Robot spotter
Ordinary glass microscope slide
27cDNA spotted microarrays
28Goal of Microarray experiments
- Microarrays are a very good way of identifying a bunch of genes involved in a disease process
- Differences between cancer and normal tissue
- Tuberculosis infected vs resistant lung cells
- Mapping out a pathway
- Co-regulated genes
- Finding function for unknown genes
- Involved in these processes
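One common way to pick out genes that differ between two samples, such as cancer versus normal tissue, is to compare expression levels as a log2 ratio and keep genes above a cutoff. The sketch below assumes made-up gene names and intensity values and a two-fold cutoff chosen only for illustration; real microarray analysis also needs normalization and statistics, given how imprecise the measurements are.

```python
import math

# Hypothetical expression intensities (arbitrary units) for two samples.
normal = {"geneA": 100.0, "geneB": 250.0, "geneC": 80.0}
cancer = {"geneA": 400.0, "geneB": 260.0, "geneC": 20.0}


def changed_genes(treated, control, min_abs_log2=1.0):
    """Return {gene: log2 ratio} for genes whose change meets the cutoff.

    A log2 ratio of +1 means doubled expression, -1 means halved.
    """
    result = {}
    for gene in treated:
        ratio = math.log2(treated[gene] / control[gene])
        if abs(ratio) >= min_abs_log2:
            result[gene] = ratio
    return result


if __name__ == "__main__":
    print(changed_genes(cancer, normal))
```

In this toy data, geneA is up four-fold, geneC is down four-fold, and geneB is essentially unchanged, so it is filtered out.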
29Proteomics
- Identify all of the proteins in an organism
- Potentially many more than genes due to alternative splicing and post-translational modifications
- Quantitate in different cell types and in response to metabolic/environmental factors
- Protein-protein interactions
30Yeast Proteome
Jeong H, Mason SP, A.-L. Barabasi, Nature 411 (2001) 40-41
31Human Genetic Variation
- Every human has essentially the same set of genes
- But there are different forms of each gene, known as alleles
- blue vs. brown eyes
- genetic diseases such as cystic fibrosis or Huntington's disease are caused by dysfunctional alleles
32
- Alleles are created by mutations in the DNA sequence of one person, which are passed on to their descendants
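At the sequence level, two alleles of the same gene can differ by as little as a single base. The sketch below compares two equal-length sequences and lists the substitutions; the function name and the example sequences are hypothetical, and it does not handle insertions or deletions.

```python
def point_differences(reference: str, variant: str):
    """List (position, reference_base, variant_base) for each substitution.

    Positions are 1-based; both sequences must have the same length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i, r, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


if __name__ == "__main__":
    print(point_differences("ATGGCC", "ATGACC"))  # [(4, 'G', 'A')]
```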
33High-Throughput Genotyping
34Relate genes to Organisms
- Diseases
- OMIM Human Genetic Disease
- Metabolic and regulatory pathways
- KEGG
- Cancer Genome Project
36Human Alleles
- The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes.
- It contains a total of about 2,000 genetic diseases and another 11,000 genetic loci with known phenotypes, but not necessarily known gene sequences
- It is designed for use by physicians
- can search by disease name
- contains summaries from clinical studies
38Training "computer savvy" scientists
- Know the right tool for the job
- Get the job done with tools available
- Network connection is the lifeline of the scientist
- Jobs change, computers change, projects change, scientists need to be adaptable
39Why teach genomics in undergraduate (or Medical) education?
- Demand for trained graduates from the biomedical industry
- Bioinformatics is essential to understand current developments in all fields of biology
- We need to educate an entire new generation of scientists, health care workers, etc.
- Use bioinformatics to enhance the teaching of other subjects: genetics, evolution, biochemistry
40Genomics in Medical Education
- "The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it."
- Francis Collins
41Long Term Implications
- A "periodic table for biology" will lead to an explosion of research and discoveries: we will finally have the tools to start making systematic analyses of biological processes (quantitative biology).
- Understanding the genome will lead to the ability to change it: to modify the characteristics of organisms and people in a wide variety of ways
42Stuart M. Brown, Ph.D.
stuart.brown_at_med.nyu.edu
www.med.nyu/rcr
Bioinformatics: A Biologist's Guide to Biocomputing and the Internet
Essentials of Medical Genomics
43www.GenomicsHelp.com