Genomics for Librarians - PowerPoint PPT Presentation

1
Genomics for Librarians
Stuart M. Brown, Ph.D.
Director, Research Computing, NYU School of Medicine
2
A Genome Revolution in Biology and Medicine
  • We are in the midst of a "Golden Era" of biology
  • The Human Genome Project has produced a huge
    storehouse of data that will be used to change
    every aspect of biological research and medicine
  • The revolution is about treating biology as an
    information science, not about specific
    biochemical technologies.

3
The Human Genome Project
4
The job of the biologist is changing
As more biological information becomes available
and laboratory equipment becomes more automated
...
  • The biologist will spend more time using
    computers for experimental design and data
    analysis (and less time doing tedious lab
    biochemistry)
  • Biology will become a more quantitative science
    (think how the periodic table affected chemistry)

5
A review of some basic genetics
6
(No Transcript)
7
DNA
  • 4 bases (G, C, T, A)
  • base pairs
  • G--C
  • T--A
  • genes
  • non-coding regions
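
The pairing rules on this slide (G with C, T with A) are enough to derive one DNA strand from the other. A minimal Python sketch, assuming plain uppercase or lowercase A/C/G/T input; the function names and the example sequence are illustrative and not part of the original talk:

```python
# Watson-Crick pairing from the slide: G pairs with C, T pairs with A.
PAIR = {"G": "C", "C": "G", "T": "A", "A": "T"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    example = "GATTACA"  # made-up sequence, for illustration only
    print(complement(example))          # CTAATGT
    print(reverse_complement(example))  # TGTAATC
```

Any character other than the four bases raises a KeyError, which is a deliberate simplification; real sequence files also contain ambiguity codes such as N.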

8
Decoding Genes
9
What is Bioinformatics?
  • The use of information technology to collect,
    analyze, and interpret biological data.
  • An ad hoc collection of computing tools that are
    used by molecular biologists to manage research
    data.
  • Computational algorithms
  • Database schema
  • Statistical methods
  • Data visualization tools

10
Genomics
  • What is Genomics?
  • An operational definition
  • The application of high throughput automated
    technologies to molecular biology.
  • A philosophical definition
  • A holistic or systems approach to the study of
    information flow within a cell.

11
Genomics makes LOTS of data!
  • Investigators need complex databases just to
    manage their own experiments
  • Biologists need to know how to do data mining to
    answer even simple questions in these huge data
    sets
  • Librarians understand the challenges of storage
    and searching of large amounts of data

12
New Biology New Librarians?
  • How do Genomics and Bioinformatics overlap or
    interact with Library Science?
  • The NCBI (Natl. Center for Biotechnology
    Information), the home of GenBank, is part of
    the National Library of Medicine
  • We store and organize genes like journal articles
    - accession number, annotation, etc.
  • A big part of bioinformatics involves keyword
    searches and SQL queries in relational databases
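
To make the catalog analogy concrete, here is a small sketch of a keyword search over sequence records in a relational database, using Python's built-in sqlite3 module. The table layout and the two DEMO records are assumptions made up for illustration; only the BE588357 entry is taken from the BLAST slide later in this talk. This is not the actual GenBank schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of sequence records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence_record ("
        " accession TEXT PRIMARY KEY,"
        " organism TEXT,"
        " annotation TEXT,"
        " length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sequence_record VALUES (?, ?, ?, ?)",
        [
            # Record taken from the BLAST hit shown in this talk.
            ("BE588357", "Bos taurus", "194087 BARC 5BOV cDNA 5'", 369),
            # Made-up records, for illustration only.
            ("DEMO0001", "Homo sapiens", "hypothetical example record", 1200),
            ("DEMO0002", "Bos taurus", "hypothetical example record", 845),
        ],
    )
    return conn


def search_by_keyword(conn: sqlite3.Connection, keyword: str) -> list:
    """Return accession numbers whose organism or annotation contains keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT accession FROM sequence_record"
        " WHERE organism LIKE ? OR annotation LIKE ?"
        " ORDER BY accession",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(search_by_keyword(db, "Bos taurus"))  # ['BE588357', 'DEMO0002']
```

The accession number plays the role of a call number: it is the stable key, while the annotation is free text that gets revised over time.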

13
Bioinformatics is Not Library Science
  • We are NOT cataloging a set of known information
  • Programming and complex algorithms - pattern
    matching, string matching, biostatistics
  • Data mining and multi-dimensional visualization
    tools
  • Uncertainty of the data and constant revision of
    the known
  • Genes are guesses based on complex algorithms,
    not books on the shelf

14

15
(No Transcript)
16
Raw Genome Data
17
BLAST Similarity Search
>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
          Length = 369

 Score = 272 bits (137), Expect = 4e-71
 Identities = 258/297 (86%), Gaps = 1/297 (0%)
 Strand = Plus / Plus

Query  17   aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca  76
Sbjct   1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc  59

Query  77   agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca  136
Sbjct  60   agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca  119

Query  137  gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc  196
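
The "Identities" and "Gaps" figures in a BLAST report are column counts over the aligned rows. A minimal Python sketch of that bookkeeping, applied to the first Query/Sbjct block shown on this slide; the function name is an assumption, and this only counts columns in an existing alignment, it does not perform the BLAST search or scoring itself:

```python
def alignment_stats(query: str, subject: str, gap: str = "-"):
    """Count identical columns and gap columns in two aligned rows.

    Returns (identities, gaps, columns). Both rows must already be
    aligned, i.e. padded with the gap character to equal length.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    identities = 0
    gaps = 0
    for q, s in zip(query.lower(), subject.lower()):
        if q == gap or s == gap:
            gaps += 1        # a gap in either row is never an identity
        elif q == s:
            identities += 1  # same base in both rows
    return identities, gaps, len(query)


# First alignment block from the slide (Query 17-76, Sbjct 1-59).
QUERY_17_76 = ("aggatccaacgtcgctccagctgctcttgacgactccac"
               "agataccccgaagccatggca")
SBJCT_1_59 = ("aggatccaacgtcgctgcggctacccttaaccact-cgc"
              "agaccccccgcagccatggcc")

if __name__ == "__main__":
    identities, gaps, columns = alignment_stats(QUERY_17_76, SBJCT_1_59)
    print(f"Identities = {identities}/{columns}, Gaps = {gaps}/{columns}")
```

The totals in the report (258/297 identities, 1/297 gaps) cover the full alignment, of which the slide shows only the first part, so the counts for a single block will not match those totals.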

18
Multiple Alignment
19
Protein domains (Pattern analysis)
20
Clustering (Phylogenetics)
21
UCSC
22
(No Transcript)
23
The Challenge of New Data Types (Genomics)
  • Gene expression microarrays
  • thousands of genes, imprecise measurements
  • huge images, private file formats
  • Proteomics
  • high-throughput Mass Spec
  • protein chips, protein-protein interactions
  • Genotyping
  • thousands of alleles, thousands of individuals
  • Regulatory Networks

24
Biological Information
25
Microarray Technology
26
Spot your own Chip (plans available for free
from Pat Brown's website)
Robot spotter
Ordinary glass microscope slide
27
cDNA spotted microarrays
28
Goal of Microarray experiments
  • Microarrays are a very good way of identifying a
    bunch of genes involved in a disease process
  • Differences between cancer and normal tissue
  • Tuberculosis infected vs resistant lung cells
  • Mapping out a pathway
  • Co-regulated genes
  • Finding function for unknown genes
  • Involved in these processes
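
One common way to pick out "a bunch of genes" that differ between two samples is to compare intensities gene by gene on a log2 scale. The sketch below assumes already-normalized intensity values and a two-fold cutoff; the gene names, numbers, and threshold are all made up for illustration and are not taken from the talk. Real analyses also need replicates and statistics to cope with imprecise measurements.

```python
import math


def log2_ratios(sample: dict, reference: dict) -> dict:
    """Return log2(sample / reference) for every gene present in both."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference
    }


def changed_genes(ratios: dict, threshold: float = 1.0) -> list:
    """List genes whose absolute log2 ratio meets the threshold.

    A threshold of 1.0 corresponds to a two-fold change up or down.
    """
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)


if __name__ == "__main__":
    # Hypothetical spot intensities, for illustration only.
    tumor = {"geneA": 800.0, "geneB": 100.0, "geneC": 210.0}
    normal = {"geneA": 200.0, "geneB": 400.0, "geneC": 200.0}

    ratios = log2_ratios(tumor, normal)
    print(changed_genes(ratios))  # ['geneA', 'geneB']
```

Working in log2 makes up- and down-regulation symmetric: a four-fold increase is +2 and a four-fold decrease is -2.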

29
Proteomics
  • Identify all of the proteins in an organism
  • Potentially many more than genes due to
    alternative splicing and post-translational
    modifications
  • Quantitate in different cell types and in
    response to metabolic/environmental factors
  • Protein-protein interactions

30
Yeast Proteome
Jeong H, Mason SP, A.-L. Barabasi
Nature 411 (2001) 40-41
31
Human Genetic Variation
  • Every human has essentially the same set of genes
  • But there are different forms of each gene --
    known as alleles
  • blue vs. brown eyes
  • genetic diseases such as cystic fibrosis or
    Huntington's disease are caused by dysfunctional
    alleles

32
  • Alleles are created by mutations in the DNA
    sequence of one person - which are passed on to
    their descendants

33
High-Throughput Genotyping
34
Relate genes to Organisms
  • Diseases
  • OMIM Human Genetic Disease
  • Metabolic and regulatory pathways
  • KEGG
  • Cancer Genome Project

35
(No Transcript)
36
Human Alleles
  • The OMIM (Online Mendelian Inheritance in Man)
    database at the NCBI tracks all human mutations
    with known phenotypes.
  • It contains a total of about 2,000 genetic
    diseases and another 11,000 genetic loci with
    known phenotypes - but not necessarily known gene
    sequences
  • It is designed for use by physicians
  • can search by disease name
  • contains summaries from clinical studies

37
(No Transcript)
38
Training "computer savvy" scientists
  • Know the right tool for the job
  • Get the job done with tools available
  • Network connection is the lifeline of the
    scientist
  • Jobs change, computers change, projects change,
    scientists need to be adaptable

39
Why teach genomics in undergraduate (or Medical)
education?
  • Demand for trained graduates from the biomedical
    industry
  • Bioinformatics is essential to understand current
    developments in all fields of biology
  • We need to educate an entire new generation of
    scientists, health care workers, etc.
  • Use bioinformatics to enhance the teaching of
    other subjects: genetics, evolution, biochemistry

40
Genomics in Medical Education
  • "The explosion of information about the new
    genetics will create a huge problem in health
    education. Most physicians in practice have had
    not a single hour of education in genetics and
    are going to be severely challenged to pick up
    this new technology and run with it."
  • Francis Collins

41
Long Term Implications
  • A "periodic table for biology" will lead to an
    explosion of research and discoveries - we will
    finally have the tools to start making systematic
    analyses of biological processes (quantitative
    biology).
  • Understanding the genome will lead to the
    ability to change it - to modify the
    characteristics of organisms and people in a wide
    variety of ways

42
Stuart M. Brown, Ph.D.
stuart.brown_at_med.nyu.edu
www.med.nyu/rcr
Bioinformatics: A Biologist's Guide to
Biocomputing and the Internet
Essentials of Medical Genomics
43
www.GenomicsHelp.com