Title: Bioinformatics: Definitions, Challenges and Impact on Health Care Systems
1Bioinformatics: Definitions, Challenges and
Impact on Health Care Systems
- Joyce Mitchell, PhD
- Professor and Chair
- Department of Biomedical Informatics
- University of Utah School of Medicine
- http://uuhsc.utah.edu/medinfo
2Topics
- What is Bioinformatics?
- Scope of Bioinformatics
- Genomics
- Proteomics
- Functional genomics
- Genomics data and patient care
- Impact of Bioinformatics on Health Information
Systems
- What is coming?
3Central Dogma of Molecular Biology
[Diagram: DNA (replication) -> RNA (transcription) -> Protein (translation, post-translational modification) -> Phenotype]
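The information flow on this slide can be illustrated with a minimal Python sketch. It assumes the input is the coding strand of the DNA (so transcription amounts to replacing T with U) and uses the standard genetic code. The names `transcribe` and `translate` are illustrative and do not come from the slides; replication and post-translational modification are not modeled.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (assumption: the coding strand is given)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein in one-letter codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```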
4 What is Bioinformatics?
5NIH Definition
- Bioinformatics applies principles of information
sciences and technologies to make the vast,
diverse, and complex life sciences data more
understandable and useful.
6NIH Definition (cont.)
- Bioinformatics: Research, development, or
application of computational tools and approaches
for expanding the use of biological, medical,
behavioral or health data, including those to
acquire, store, organize, archive, analyze, or
visualize such data.
- http://www.bisti.nih.gov/CompuBioDef.pdf
7Another: NCBI (National Center for Biotechnology
Information)
- Bioinformatics is the field of science in
which biology, computer science, and information
technology merge into a single discipline. The
ultimate goal of the field is to enable the
discovery of new biological insights and to
create a global perspective from which unifying
principles in biology can be discerned.
- http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
8Bioinformatics and Health Informatics
- Bioinformatics is the study of the flow of
information in biological sciences.
- Health Informatics is the study of the flow of
information in patient care.
- These two fields are on a collision course as
genomics data becomes used in patient care.
- Russ Altman, MD, PhD, Stanford Univ.
10Scope of Bioinformatics
11Omes and Omics
- Genomics
- Primarily sequences (DNA and RNA)
- Databanks and search algorithms
- Supports studies of molecular evolution (Tree
wars)
- Proteomics
- Sequences (Protein) and structures
- Mass spectrometry, X-ray crystallography
- Databanks, knowledge bases, visualization
- Functional Genomics (transcriptomics)
- Microarray data (and SNP Chips)
- Databanks, analysis tools, controlled
terminologies
- Genetic Epidemiology: finding gene-disease
associations
- Linkage studies
- GWAS (genome-wide association studies)
- Systems Biology (metabolomics)
- Metabolites and interacting systems
(interactomics)
- Graphs, visualization, modeling, networks of
entities
12Central Dogma of Molecular Biology
[Diagram: DNA -> RNA -> Protein -> Phenotype, with labels Structural Genomics, Functional Genomics (Transcriptomics), Proteomics, Phenomics]
13Human Genome Project
- Human Genome Project
- International research effort
- Determine sequence of human genome and other
model organisms
- Began 1990, completed 2003
- We are now in the Post-Genomic era
- Next steps for 20,000 genes
- Function and regulation of all genes
- Significance of variations between people
- Cures, therapies, genomic healthcare
14Genome and Genomics
- Genome: entire complement of DNA in a species
- Both nuclear and mitochondrial/chloroplast
- Variants among individuals
- Genomics: study of the sequence, structure and
function of the genome. Study relationships
among sets of genes rather than single genes.
- Comparative genomics: study of the differences
among species. Usually covers evolutionary
studies of differences and conservation over time.
15Genome Databases (e.g., GenBank)
- Consists of
- Long strings of DNA bases (ATCG...)
- Annotations of this database to attach meaning to
the sequence data.
- Example entry from GenBank
- http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_000410&dopt=gb
(Hemochromatosis gene HFE)
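As a hedged illustration of retrieving such an entry programmatically, the sketch below builds a request URL for NCBI's E-utilities `efetch` service for the accession on this slide. This is a different access route from the viewer link above, and the helper names `genbank_url` and `fetch_genbank` are illustrative. The download function needs network access and is not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_url(accession: str) -> str:
    """Build an efetch URL asking for a nucleotide record in GenBank flat-file format."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_genbank(accession: str) -> str:
    """Download the record as text (requires network access; not called below)."""
    with urlopen(genbank_url(accession)) as response:
        return response.read().decode("utf-8")


# Accession taken from the slide (HFE, the hemochromatosis gene).
print(genbank_url("NM_000410"))
```

The returned text has the two parts the slide describes: annotation fields that attach meaning, followed by the sequence itself.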
16The Genome Sequence is at hand... so?
The good news is that we have the human genome.
The bad news is it's just a parts list.
17The Human Genome Project has catalyzed striking
paradigm changes in biology - biology is now an
information science.
- Leroy Hood, MD, PhD
- Institute for Systems Biology
- Seattle, Washington
18Genomes In Public Databases
- Published complete genomes
- Ongoing prokaryotic genome projects
- Ongoing eukaryotic genomes
2700
http://www.genomesonline.org/
19Genomics activities
- Sequence the genes and chromosomes: done by
breaking the DNA into parts
- Map the location of various gene entities to
establish their order
- Compare the sequences with other known sequences
to determine similarity
- Across species, conserved sequence motifs
- Predict secondary structure of proteins
- Create large databases: GenBank, EMBL, DDBJ
- Develop algorithms and similarity measures
- BLAST and its many forms
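One way to make "similarity measure" concrete is the Smith-Waterman local alignment score, sketched below. This is not BLAST itself: BLAST is a faster heuristic that approximates this kind of exact dynamic-programming search. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("ACGT", "ACGT"))         # 8: four matches
print(smith_waterman_score("TTACGTTT", "GGACGGG"))  # 6: shared motif ACG
```

A higher score means a longer or better-conserved shared region, which is the basis for finding conserved motifs across species.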
20Central Dogma of Molecular Biology
[Diagram: DNA -> RNA -> Protein -> Phenotype, with labels Genomics, Transcriptomics / Functional Genetics, Proteomics]
21Proteome vs Transcriptome
- Functional genomics (transcriptomics) looks at
the timing and regulation of gene products (mRNA,
primarily).
- Proteome is the final end-product (set of many or all
proteins).
- Relationship between transcriptome and proteome
is complex, due to longevity of mRNA signal,
subsequent control of translation to protein, and
post-translational modifications.
22Functional Genomics Technologies: Gene Chips,
Microarrays, etc.
23Functional Genomics: Microarrays
- Transcriptome and transcriptomics
- High-throughput technique designed to measure the
relative abundance of mRNA in a cell or tissue
in response to an experiment.
- Also called gene expression analysis
- (and multiple kinds of microarrays)
24Gene Chips (SNP-Chips)
- High-throughput technique designed to measure
whether or not a sample of tissue has various
SNPs in its DNA.
- The Gene Chip has small segments of DNA on the
chip with known variants (SNPs). 1 million
SNPs per chip.
25GeneChip synthesis
26GeneChip synthesis
- Now have RNA, DNA and protein chips on the same
piece of equipment.
- 25 DNA or RNA bases or amino acids in each of 1
million spots
- DNA microarray synthesis
- http://www.hhmi.org/biointeractive/media/gene_chips-lg.mov
27Characteristics of Array Data
- Voluminous: tens of thousands of variables with
relatively few observations of each (upside down
vs. classical biostatistics)
- Noisy: error rates up to 8%
- Methods designed to detect patterns and
associations always find patterns and
associations
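The last point can be demonstrated with a small simulation, offered as a sketch under stated assumptions rather than a recommended analysis. Every "gene" below is pure noise (two groups of 5 samples drawn from the same normal distribution), yet a per-gene t-test at the 5% level still reports hits. The gene count, group size and seed are arbitrary choices for illustration.

```python
import random
import statistics

# Two-sided 5% critical value of Student's t with 8 degrees of freedom
# (matches two groups of 5 samples each).
T_CRITICAL = 2.306


def t_statistic(group_a, group_b):
    """Equal-variance two-sample t statistic for groups of equal size."""
    n = len(group_a)
    standard_error = ((statistics.variance(group_a) + statistics.variance(group_b)) / n) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


def count_false_positives(n_genes=5000, n_per_group=5, seed=0):
    """Test every gene on pure noise and count the 'significant' results."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(t_statistic(group_a, group_b)) > T_CRITICAL:
            hits += 1
    return hits


# Expect roughly 5% of 5000 genes to look "differentially expressed" by chance alone.
print(count_false_positives())
```

With tens of thousands of variables and few observations, several hundred such chance findings are expected unless the analysis corrects for multiple testing.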
28Experimental Design
- A fundamental challenge of microarray
experiments: underdetermined systems
Kohane IS, Kho AT, Butte AJ. Microarrays for an
Integrative Genomics. (The MIT Press, Cambridge,
MA, 2003), p. 11.
29GWAS Studies
- Genome Wide Association Studies
- Use SNP chip data to look for associations
between SNP profiles and diseases.
- Beyond single-gene studies into multiple-gene
studies. Usually common diseases.
- Beyond family studies into population studies
- Some analyses on combining GWAS analyses for
diseases in pedigrees.
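A single-SNP association test of the kind a GWAS repeats across the whole chip can be sketched as a chi-square test on allele counts in cases and controls. The counts below are invented for illustration, and the Bonferroni threshold assumes 1 million SNPs tested at an overall 5% level, which gives the 5 x 10^-8 cutoff conventionally used for genome-wide significance.

```python
import math


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 degree of freedom) for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Bonferroni correction: 5% overall error rate spread over 1 million SNPs.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000

# Hypothetical allele counts: 600 vs 400 in cases, 500 vs 500 in controls.
chi_square = allelic_chi_square(600, 400, 500, 500)
p_value = p_value_1df(chi_square)
print(round(chi_square, 2))          # 20.2
print(p_value < GENOME_WIDE_ALPHA)   # False: not genome-wide significant
```

The example shows why GWAS need large populations: an allele frequency difference of 60% versus 50% in 1000 chromosomes per group looks strong for a single test but does not survive correction for a million tests.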
30Published Genome-Wide Associations through
3/2009: 398 published GWA at p < 5 x 10^-8
NHGRI GWA Catalog: www.genome.gov/GWAStudies
31Public Microarray Data Repositories
- Major public repositories
- GEO (NCBI)
- http://www.ncbi.nlm.nih.gov/geo/
- ArrayExpress (EBI)
- http://www.ebi.ac.uk/arrayexpress/
32Standards and Repositories
- Brazma, A, et al. Minimum information about a
microarray experiment (MIAME)-toward standards
for microarray data. Nature Genetics. 2001
Dec;29(4):373.
- http://www.nature.com/cgi-taf/DynaPage.taf?file=/ng/journal/v29/n4/full/ng1201-365.html
- Ball, CA, et al. Submission of Microarray Data to
Public Repositories. PLoS Biology. 2004
September;2(9):e317.
- http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15340489
33Public GWAS Study Repositories
- dbGaP (database of Genotypes and Phenotypes) at
the NLM
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap&cmd=search&term=
34Central Dogma of Molecular Biology
[Diagram: DNA -> RNA -> Protein -> Phenotype (tissues, organs, organisms), with labels Genomics, Transcriptomics / Functional Genetics, Proteomics]
35Proteome and Proteomics
- Proteome: the entire set of proteins (and other
gene products) made by the genome.
- Proteomics: study of the interactions among
proteins in the proteome, including networks of
interacting proteins and metabolic
considerations. Also includes differences in
developmental stages, tissues and organs.
36Protein Functions
- Catalysis
- Transport
- Nutrition and storage
- Contraction and mobility
- Structural elements
- Cytoskeleton
- Basement membranes
- Defense mechanisms
- Regulation
- Genetic
- Hormonal
- Buffering capacity
37Protein Databases
- SwissProt
- PIR http://www.pir.uniprot.org/
- GENE http://www.ncbi.nlm.nih.gov/gene
- InterPro http://www.ebi.ac.uk/interpro/
- Correspond to (and derived from) genome
databases
- All connected by Reference Sequences (NCBI)
UniProt
38Gene/Protein Database entries
- HFE record in Entrez GENE (NCBI)
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=Graphics&list_uids=3077
39Structure Databases
- Contain experimentally determined and predicted
structures of biological molecules
- Most structures determined by X-ray
crystallography, NMR
- Example: MMDB (molecular modeling database)
http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
- HFE Entry
- http://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdbsrv.cgi?form=6&db=t&Dopt=s&uid=9816
40Protein Interaction Databases
- Record observations of protein-protein
interactions in cells
- Attempts to detail interactions observed in
thousands of small-scale experiments described in
published articles
- Examples
- BIND: Biomolecular Interaction Network Database
- DIP: Database of Interacting Proteins
- MIPS: Munich Information Center for Protein
Sequences
- PRONET: Protein interaction on the Web
- Many others, both academic and commercial
41Controlled Vocabularies in Bioinformatics
- The Gene Ontology http://www.geneontology.org/
- Knowledge about gene function (the ontology
itself)
- Annotation of gene products (for comparisons)
- The MGED Ontology (arising from MIAME)
- http://mged.sourceforge.net/
- Annotation of microarray experiments for public
repositories
- Clinical Bioinformatics Ontology
- Annotation of gene tests in electronic medical
records
- http://www.cerner.com/cbo
- MIAPE from Proteomics Standards Initiative (PSI)
- Annotation of proteomics experiments for public
repositories
- http://psidev.sourceforge.net/
42Genomics Data and Patient Care
- From genotype to phenotype
43Human Disease Gene Specifics
- Genes linked to human diseases (9-2004)
- 425 in 2 yrs
- 1700/20,000, about 9% of loci
44Informatics Issues related to Genomics Data and
Patient Care
- Linking known data for genes causing human
diseases to clinical decision support and EMR
documentation
- Linking treatment/dx data from gene expression
profiles in EMRs
- Representation of genetic data in electronic
medical records
- Making this understandable to providers and
patients
45Genetics is Impacting Medicine Today
- 2000 genes linked to health conditions
- > 1700 single gene tests for diagnosis
- http://www.Genetests.org
- Growing number of gene expression profiles
(multi-gene microarray data)
- Relate to diagnosis, therapy, drug dosage,
occupational hazards, reproductive plans, health
risks, ...
46Well-known Examples (germ line)
- Pharmacogenetics
- CYP450 alleles: exaggerated, diminished or
ultra-rapid drug responses. E.g., warfarin: 93%
of patients are OK on standard doses; 7% of
patients have severe hemorrhage. CYP2C9*2 and
CYP2C9*3 are the most severe of 6 known mutations.
- Environmental susceptibility
- Sickle Cell trait carrier and malaria parasite
- Nutrition
- PKU and avoidance of phenylalanine
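The pharmacogenetics example lends itself to a decision-support rule. The sketch below is a hypothetical illustration only, not clinical guidance: it flags a warfarin order when the patient's recorded genotype includes one of the two CYP2C9 alleles named on the slide. The lookup tables, function name and message wording are all assumptions.

```python
# Alleles named on the slide as the most severe CYP2C9 variants.
HIGH_RISK_ALLELES = {"CYP2C9": {"*2", "*3"}}
# Hypothetical drug-to-gene lookup; a real system would use a curated knowledge base.
DRUG_GENES = {"warfarin": "CYP2C9"}


def dosing_alert(drug, genotype):
    """Return an alert message if the patient carries a flagged allele, else None.

    genotype maps a gene symbol to the pair of alleles found,
    e.g. {"CYP2C9": ("*1", "*3")}.
    """
    gene = DRUG_GENES.get(drug.lower())
    if gene is None or gene not in genotype:
        return None
    flagged = HIGH_RISK_ALLELES.get(gene, set()).intersection(genotype[gene])
    if not flagged:
        return None
    return f"{drug}: patient carries {gene} {', '.join(sorted(flagged))}; review dose"


print(dosing_alert("warfarin", {"CYP2C9": ("*1", "*3")}))
print(dosing_alert("warfarin", {"CYP2C9": ("*1", "*1")}))  # None
```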
48Clinical Uses of Expression Profiling
- Disease Dx and Tx
- Distinguish morphologically similar cancers
- DLBCL (Poulsen et al. (2005) Microarray-based
classification of diffuse large B-cell lymphomas.
European Journal of Haematology 74(6):453-65.)
- Therapy potential
- Rabson AB, Weissmann D. From microarray to
bedside: targeting NF-kappaB for therapy of
lymphomas. Clin Cancer Res. 2005 Jan 1;11(1):2-6.
49More Applications
- Diagnostic tool to screen for infective agents
- Chip imprinted with set of pathogenic genomes
used to identify bacterial, viral, or parasite
genomic material in patients' body fluids
- Diagnostic chip to check for mutations involved
in drug-gene interactions.
- Roche Amplichip
50Cardiac Transplant Rejection: Howard Eisen MD,
Chief of Cardiology at Hahnemann (5-2009)
- Have 11-gene profile (the Allomap genes) to
predict transplant rejection.
- http://www.xdx.com/allomap/
- Some centers reduce the numbers of biopsies if
the molecular profile predicts low risk.
- New studies of other transplants (liver, kidney,
etc.)
- Brouard et al. PNAS 2007;104:15448-53. Biomarker
panel for renal transplant rejection
51Endomyocardial biopsy
- Currently the only way to test for rejection
- Associated risk factors
52Breast Cancer Gene Expression Profiling
- 4 main molecular classes of breast cancer using
gene expression profiles (signatures)
- Basal-like
- HER2-positive
- Luminal B
- Luminal A
- MammaPrint: commercially available genomic assay
for outcome prediction
- http://en.wikipedia.org/wiki/MammaPrint
- Treatment based on profiles
- Sotiriou and Pusztai. NEJM 2009;360:790-800.
Gene expression signatures in breast cancer.
53Example (somatic mutation): Iressa (gefitinib)
and erlotinib
- Non-small cell lung CA: 140,000 pt/yr
- Iressa (AstraZeneca) causes remission in 1 of 10
patients. Newer drug is erlotinib.
- Iressa and erlotinib efficacy correlates with EGFR
mutation in the tumor. Now have gene testing for
EGFR so can target appropriate people.
http://www.sciencemag.org/cgi/content/full/305/5688/1222a
54Implications for Health Care System
- More gene tests will be ordered. Reports of 300%
increase in gene tests in 2003.
- Arch Pathol Lab Med 2004;128(12):1330-1333
- The FDA will regulate panels of tests.
- http://www.fda.gov/bbs/topics/news/2004/new01149.html
- Non-discrimination laws for insurance and
employment (GINA) will open a floodgate.
- Preventive healthcare will play a larger part.
- Environmental risk factors dictate OSHA-type
approach to worker empowerment and education
about safe behavior
55Unsolved Informatics Issues: What Should Be
Stored in the EMR?
- Complete DNA sequence for specific genes into the
EMR? Where?
- Microarray expression and gene chip data?
- Meta-data about the DNA sequence arrays?
- If storing not the sequence but only the diff from
a reference sequence, what to do when the reference
sequence changes? Or gene chip changes?
- How to trigger alerts and reminders? And for
what?
- How to link to the patient's PHR?
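One possible answer to the reference-sequence question is to store each variant together with the identity and version of the reference it was measured against, so that stale results can be found when the reference changes. The record layout below is a hypothetical sketch, not an established EMR standard. All field names are assumptions, and the values (including the version number) are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredVariant:
    gene: str
    reference_accession: str  # which reference sequence the difference is measured against
    reference_version: int    # version of that reference when the test was reported
    position: int             # position within the reference sequence
    reference_allele: str
    observed_allele: str


def needs_review(variant: StoredVariant, current_reference_version: int) -> bool:
    """Flag results recorded against a reference version that has since changed."""
    return variant.reference_version != current_reference_version


# Illustrative values only.
result = StoredVariant("HFE", "NM_000410", 1, 845, "G", "A")
print(needs_review(result, current_reference_version=1))  # False
print(needs_review(result, current_reference_version=2))  # True
```

The design choice here is to never store a bare position: a difference is only interpretable alongside the exact reference it refers to.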
56Genetic data in electronic medical records
- Implications for component systems
- Laboratory
- Pharmacy
- Computerized order entry
- Documentation and notes
- Knowledge management
- Alerts and reminders
- Finding patients matching profiles
- Practice guidelines and clinical trials
- Appropriate therapies and medications
- Keeping current
57Genome Data and Other Information Systems
- Genomic information will be pervasive in all
healthcare information systems.
- Also in public health systems
- Newborn screening
- Tissue and organ banks
- DOD requires DNA samples
- Bioterrorism and homeland security
- Identification of World Trade Center victims
- Infectious agent identification, origin and spread
- Privacy and security issues are important but not
inherently different from other EMR data.
58Consumer Health Issues related to
Genetics/genomics
59Genetics Home Reference
- Consumer health resource to help the public
navigate from phenotype to genotype.
- Focus on health implications of the Human Genome
Project.
- http://ghr.nlm.nih.gov
- Mitchell, Fun, McCray. JAMIA. 2004 Nov;11(6):439-447.
60Direct to consumer genetic tests
- 23andMe https://www.23andme.com/
- Navigenics http://www.navigenetics.com
- deCODEme http://www.decodeme.com/
61Reports on genotype-phenotype associations
63What is Coming?
- Next generation sequencing
- More public information (personal genome project)
- Environmental variables to correlate with
genotype and functional info
- Human microbiome http://nihroadmap.nih.gov/hmp/
- Epigenetics
- Nanoparticles and nanomedicine
- More consumer activism
- Personalized Medicine
- And all of this in the EMR??
64Summary
- Informatics will be the key enabling technology
for personalized, genomic medicine.
- Current separation between bioinformatics and
clinical informatics is diminishing as the two
subdisciplines become more entwined.
65Discussion?
66Optional Exercise: Hands-on with GHR
- Scavenger hunt with hemochromatosis and the genes
that influence it.
- Explore the Genetics Home Reference by answering
the following questions. Start at
http://ghr.nlm.nih.gov .
67GHR Scavenger Hunt
- How common is hemochromatosis?
- How many genes have been proven to be involved in
hemochromatosis when the genes are mutated?
- What are the symbols for these genes?
- Can you find the link to MedlinePlus with health
information on hemochromatosis?
68GHR Scavenger Hunt
- What are the names of the patient support
associations for hemochromatosis?
- One synonym for this condition is bronze
diabetes. Can you find a reason for this?
- What kind of damage is done to the liver of
people with hemochromatosis?
69GHR Scavenger Hunt
- For the genes involved in hemochromatosis, how
many of them are available as a DNA test?
- Give one place where you would choose to send a
tissue sample for DNA testing.
- What sites are listed under Research Resources
for the TFR2 gene?
- How many alternatively spliced proteins are there
for TFR2?
- In what tissues is this gene expressed?
70GHR Scavenger Hunt
- How do people inherit hemochromatosis?
- Do the genes involved in hemochromatosis cause
other health conditions when they are mutated?
- Can you find a protein sequence for one of the
genes?
- What clinical trials are available for
hemochromatosis patients close to where you live?
71Questions to
- Joyce Mitchell
- Joyce.mitchell_at_hsc.utah.edu
- http://uuhsc.utah.edu/medinfo