Title: Donna Maglott, Ph.D.
1. PRO and Medical Genetics resources at NCBI
2. Opportunities
- The medical genetics group is a relatively recent
addition to the suite of resources at NCBI and
manages the NIH Genetic Testing Registry (GTR),
ClinVar, and MedGen. These databases share the
need to standardize the representation of genes,
proteins, small molecules, variation, conditions,
and phenotypes, not only with respect to the terms
themselves but also to the relationships among
those terms. This presentation focuses on
opportunities for using the Protein Ontology (PRO)
in NCBI's Medical Genetics group.
3. Case studies
- Medical genetics: ClinVar, Gene, GTR, MedGen
4. A quick tour
From the home page
5. Using the resource sections
6. Try all sections
7. Try all sections
8. Major domains of information
Concept | NCBI database/resource | Used in
Diseases and their defining features | MedGen (Diseases, Findings) | ClinVar, dbVar, Gene, GTR, PheGenI, dbGaP
Drugs | MedGen (Pharmacologic Substance) | ClinVar, GTR
Genes and gene products | Gene, Nucleotide, Protein, HomoloGene, RefSeq | ClinVar, dbSNP, dbVar, GTR
Biological processes, cellular components, molecular functions | --- | Gene
Interactions and pathways | BioSystems, Gene | BioSystems, Gene
Variation | ClinVar, dbSNP, dbVar | ClinVar, dbSNP, dbVar
Records are connected by reciprocal, generic links via database identifiers.
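The reciprocal, identifier-based links between records can be followed programmatically through NCBI's E-utilities. The sketch below is a minimal illustration, assuming the public ELink endpoint; it only builds the request URL and makes no network call. The Gene ID 4292 and the target database `medgen` are illustrative choices, and `elink_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public E-utilities base URL; ELink follows Entrez links between databases.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build an ELink request that follows Entrez links from one record
    (identified by its UID in `dbfrom`) to linked records in `db`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative: Gene record 4292 -> linked MedGen concepts.
    print(elink_url("gene", "medgen", "4292"))
```

Fetching the URL (for example with `urllib.request.urlopen`) returns the linked UIDs; keeping URL construction separate makes the request easy to inspect before it is sent.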
9. Some talking points
- Except for RefSeq, curation is minimal
- RefSeq-based, with pointers to UniProtKB
- Use ontologies to acquire and represent standard terms
- Point to ontologies, but do not use them to support node-based query interfaces
- Capture primary data that can be used to drive development of ontologies
- Some user communities think in terms of nucleotide sequence only
- Data are being submitted with uncertain significance
- Look for opportunities to add value to NCBI's databases and tools
10. Gene and data standards
- Name of the gene (nomenclature committees)
- Names of protein products
- Primary product (Swiss-Prot)
- Isoforms (RefSeq)
- Names of associated conditions (multiple)
- Descriptions of pathways (submitters)
- Biological processes, cellular components, and
molecular functions (GO)
- HIV interactions (NIAID)
- http://www.ncbi.nlm.nih.gov/gene?term=hiv1interactions[Properties]
- http://www.ncbi.nlm.nih.gov/projects/RefSeq/HIVInteractions/
11. Human mismatch repair
12. Restrict to those reported to be disease-causing
13. www.ncbi.nlm.nih.gov/gene/4292
Gene record sections: Summary, Bibliography, Interactions, Pathways, Gene Ontology, General protein information, Reference sequences, Locus-specific databases
Phrase found in:
14. Titles of pathways; descriptions of interactions
16. Gene <-> Protein
17. HomoloGene
18. Diseases and phenotypes
- MedGen: UMLS, HPO, OMIM, ORDO, GTR
19. Why MedGen?
- A stable node of identifiers within NCBI for
disease names, their clinical features, and
pharmacological substances
- Built on the foundation of a subset of UMLS, with
supplements from HPO, OMIM (between UMLS
releases), and submissions to GTR and ClinVar
- Primarily automated, but with some oversight by M.D.s
and genetic counselors on staff, and feedback
from the community
20. Terms from UMLS/OMIM/GTR/ClinVar
21. Hierarchies curated by GTR staff
Guided by OMIM's clinical series and user feedback
22. Hierarchies computed from nodes in UMLS
23. Hierarchy from DNA Repair Deficiency Disorders
24. Using HPO for clinical features
- Partial display
- Organized by top nodes of the ontology
- Each specific term supports a link to disorders
manifesting that feature
25. ClinVar: reported variation-phenotype relationships
26. ClinVar: reported variation-phenotype relationships
- Submitter archive (not curated)
- Variant
- Disease and/or phenotypes
- Interpretation
- Confidence
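One way to picture a single submitted assertion with the fields listed above is as a small immutable record type. This is a minimal sketch only: the class and field names are hypothetical and do not reproduce ClinVar's actual submission format or XML schema, and the values are placeholders rather than a real submission.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubmittedAssertion:
    """Hypothetical shape of one submitter's variant-phenotype assertion.

    Field names are illustrative only; this is not ClinVar's schema.
    """
    submitter: str               # source of the archived (uncurated) record
    variant: str                 # variant description, e.g. HGVS-style text
    conditions: Tuple[str, ...]  # disease and/or phenotype terms
    interpretation: str          # e.g. "Pathogenic", "Uncertain significance"
    confidence: str              # submitter-reported level of support


# Placeholder values, not a real submission.
record = SubmittedAssertion(
    submitter="Example Lab",
    variant="NM_000000.0:c.100A>G",
    conditions=("Example condition",),
    interpretation="Uncertain significance",
    confidence="single submitter",
)
print(record.interpretation)
```

Making the record frozen reflects that an archived submission is stored as submitted rather than edited in place.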
27. Subset of a detailed record
- Gene name and symbol
- Sequence Ontology for molecular and functional
consequences
- Diseases
- Identifiers and links
- Observed phenotypes (as distinct from those
reported to be characteristic of the diagnostic
term)
- Protein change from the variant
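To illustrate how a protein change follows from a nucleotide change, the sketch below translates a single-nucleotide substitution in a coding sequence using the standard genetic code. It is a toy under stated assumptions: the coding sequence starts at the ATG, positions are 1-based c. coordinates, and the result uses simplified one-letter notation (e.g. `A2S`) rather than full HGVS `p.` syntax. It ignores indels, splicing, and alternate genetic codes, and is not NCBI's annotation pipeline; `protein_change` is a hypothetical helper.

```python
# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def protein_change(cds: str, pos: int, ref: str, alt: str) -> str:
    """Return a one-letter protein change (e.g. 'M1I') for a substitution
    at 1-based position `pos` of coding sequence `cds`."""
    cds, ref, alt = cds.upper(), ref.upper(), alt.upper()
    if cds[pos - 1] != ref:
        raise ValueError(f"reference base mismatch at c.{pos}")
    codon_index = (pos - 1) // 3          # 0-based codon number
    start = codon_index * 3
    ref_codon = cds[start:start + 3]
    offset = (pos - 1) % 3                # position within the codon
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]       # '*' marks a stop codon
    return f"{ref_aa}{codon_index + 1}{alt_aa}"


print(protein_change("ATGGCTTGA", 4, "G", "T"))  # A2S
```

The reference-base check guards against applying a variant to the wrong transcript, which is one reason the genome -> transcript -> protein relationship needs to be explicit.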
28. Data sources and growth
29. Submissions from UniProt
Summarize submissions by genes, diseases, and
phenotypes
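Summarizing submissions by gene and by disease amounts to counting records per key. A minimal sketch, using hypothetical in-memory tuples of (submitter, gene, condition) with placeholder values rather than any actual UniProt or ClinVar data; `summarize` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical submission records: (submitter, gene symbol, condition).
Submission = Tuple[str, str, str]


def summarize(submissions: Iterable[Submission]) -> Tuple[Counter, Counter]:
    """Count submissions per gene and per condition."""
    by_gene: Counter = Counter()
    by_condition: Counter = Counter()
    for _submitter, gene, condition in submissions:
        by_gene[gene] += 1
        by_condition[condition] += 1
    return by_gene, by_condition


records = [
    ("UniProt", "GENE_A", "Condition X"),
    ("UniProt", "GENE_A", "Condition Y"),
    ("UniProt", "GENE_B", "Condition X"),
]
genes, conditions = summarize(records)
print(genes.most_common())       # [('GENE_A', 2), ('GENE_B', 1)]
print(conditions.most_common())  # [('Condition X', 2), ('Condition Y', 1)]
```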
30. Current status: ClinGen-related
- Diseases
- Genes
- Variants
- Predictions
- Conserved sequence
- Conserved domains
- Pathways
http://www.clinicalgenome.org/
31. Phenotype and ClinGen/ClinVar
- Working group on phenotype
- Make distinctions among
  - Disease category (body system, metabolic perturbation, cancer)
  - Diagnosis
  - Characteristic features
    - General or gene-specific
  - Diseases targeted by drugs for which the response is genetically determined
  - Observed phenotypes
    - HPO
    - PhenoDB
  - Indications for testing
- Standardization
  - One ontology or many?
  - Relationship to OMIM
32. Variation and ClinGen/ClinVar
- Sequence Ontology for variant location and effect
- Coordinate with PharmGKB for pharmacogenomics
- Description of haplotypes
- No discussion yet about authorities for pathways,
conserved domains, or post-translational
modifications
33. Current status: NCBI
- Working with UMLS to improve representation of terms and relationships
  - Mapping concepts
  - Reporting relationships
- Supplement current UMLS with HPO, Orphanet (ORDO, in progress), and recent data from OMIM
- Working with the Clinical Pharmacogenetics Implementation Consortium (CPIC) and PharmGKB
  - Representation of haplotypes/star alleles
  - Drug responses/disease targets
- Consumer of ontologies to standardize terminology, with definitions
  - Link to resource site
  - Provide attribution
  - Support term-specific queries
34. Current status: NCBI
- Queries are currently term by term, not by node
- Some relationships are based on links in Entrez
  - Gene <-> disease
  - Disease <-> clinical feature
  - Variation <-> gene
- Some relationships are explicit
  - Genome -> transcript -> protein
  - Nucleotide change -> protein change
- Some relationships are reported as hierarchies
  - GTR
  - MedGen (MeSH)
  - ORDO (in progress)
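The difference between querying term by term and querying by node can be shown with a small hierarchy: a node-based query first expands a term to all of its descendants, then matches records annotated with any of them. The hierarchy, the intermediate term "Mismatch repair disorder", and the record IDs below are hypothetical toy data, not MedGen's or GTR's actual hierarchies.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical parent -> children hierarchy (toy data).
HIERARCHY: Dict[str, List[str]] = {
    "DNA repair deficiency disorder": ["Mismatch repair disorder", "Xeroderma pigmentosum"],
    "Mismatch repair disorder": ["Lynch syndrome"],
}

# Hypothetical records, each annotated with a single term.
RECORDS: Dict[str, str] = {
    "test-1": "Lynch syndrome",
    "test-2": "Xeroderma pigmentosum",
    "test-3": "DNA repair deficiency disorder",
}


def descendants(term: str) -> Set[str]:
    """Return the term plus every term below it (breadth-first walk)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in HIERARCHY.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def query_by_term(term: str) -> List[str]:
    # Exact match only, mirroring a term-by-term query.
    return sorted(r for r, t in RECORDS.items() if t == term)


def query_by_node(term: str) -> List[str]:
    # Match the term or any of its descendants in the hierarchy.
    terms = descendants(term)
    return sorted(r for r, t in RECORDS.items() if t in terms)


print(query_by_term("DNA repair deficiency disorder"))  # ['test-3']
print(query_by_node("DNA repair deficiency disorder"))  # ['test-1', 'test-2', 'test-3']
```

The `seen` set keeps the walk finite even if a hierarchy computed from UMLS contains a term reachable by more than one path.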
35. Current status: NCBI
- Maintenance
  - Primarily automatic
  - Some curatorial review by staff of ClinVar and the NIH Genetic Testing Registry (GTR)
  - Expect expanded review from the ClinGen group
- Data freely available by FTP or E-utilities
  - ftp://ftp.ncbi.nih.gov/pub/clinvar/
  - ftp://ftp.ncbi.nih.gov/gene/
  - ftp://ftp.ncbi.nih.gov/pub/GTR/
  - ftp://ftp.ncbi.nih.gov/pub/medgen/
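As an example of E-utilities access, the sketch below builds an ESearch request against ClinVar and extracts the ID list from a JSON response. It assumes the public ESearch endpoint with `retmode=json`; the query term `MLH1[gene]`, the helper names, and the canned response are illustrative. The network call is kept in a separate function so the URL building and parsing can be checked offline.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks for a JSON response."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def fetch(url: str, timeout: float = 30.0) -> str:
    """Perform the HTTP request (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_ids(response_text: str) -> List[str]:
    """Extract the UID list from an ESearch JSON response."""
    return json.loads(response_text)["esearchresult"]["idlist"]


if __name__ == "__main__":
    print(esearch_url("clinvar", "MLH1[gene]"))
    # Canned, made-up response so parsing can be tried offline.
    sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
    print(parse_ids(sample))  # ['111', '222']
```

For bulk work, the FTP directories listed above are usually a better fit than many E-utilities requests.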
36. Acknowledgements
Slava Gorelenkov: MedGen
Melissa Landrum: ClinVar
Jennifer Lee: GTR, ClinVar
Terence Murphy: Gene
Lon Phan: dbSNP/dbVar
Kim Pruitt: RefSeq
Wendy Rubinstein: GTR, MedGen
Ming Ward: dbSNP
and all their staff