Donna Maglott, PH.D. - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Donna Maglott, PH.D.


1
PRO and Medical Genetics resources at NCBI
  • Donna Maglott, Ph.D.

2
Opportunities
  • The medical genetics group is a relatively recent
    addition to the suite of resources at NCBI, and
    manages the NIH Genetic Testing Registry (GTR),
    ClinVar, and MedGen.  These databases share the
    need to standardize representation of genes,
    proteins, small molecules, variation, conditions,
    and phenotypes, not only with respect to explicit
    terms, but also the relationships among those
    terms. This presentation focuses on
    opportunities for using PRO in NCBI's
    Medical Genetics group.

3
Case studies
  • Medical genetics: ClinVar, Gene, GTR, MedGen

4
A quick tour
From the home page
5
Using the resource sections
6
Try all sections
7
Try all sections
8
Major domains of information
Concept | NCBI database/Resource | Used in
Diseases and their defining features | MedGen (Diseases, Findings) | ClinVar, dbVar, Gene, GTR, PheGenI, dbGaP
Drugs | MedGen (Pharmacologic Substance) | ClinVar, GTR
Genes and gene products | Gene, Nucleotide, Protein, HomoloGene, RefSeq | ClinVar, dbSNP, dbVar, GTR
Biological processes, cellular components, molecular functions | --- | Gene
Interactions and pathways | Biosystems, Gene | Biosystems, Gene
Variation | ClinVar, dbSNP, dbVar | ClinVar, dbSNP, dbVar
Records connected by reciprocal, generic links via database identifiers
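The identifier-based links described above can also be followed programmatically through the E-utilities ELink endpoint. The sketch below only builds the request URL and makes no network call. The helper name `elink_url` is hypothetical, and Gene ID 4292 is borrowed from the gene record that appears later in this deck.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ELink service.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build an ELink URL asking for records in `db` linked to `uid` in `dbfrom`.

    Hypothetical helper for illustration; it does not contact NCBI.
    """
    return ELINK + "?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})

# From a Gene record to the ClinVar records linked to it.
forward = elink_url("gene", "clinvar", "4292")
print(forward)
```

Swapping the `dbfrom` and `db` arguments (and passing a ClinVar identifier) would request the reciprocal direction.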
9
Some talking points
  • Except for RefSeq, curation minimal
  • RefSeq-based with pointers to UniProtKB
  • Use ontologies to acquire and represent standard
    terms
  • Point to ontologies, but not used to support
    node-based query interfaces
  • Capturing primary data that can be used to drive
    development of ontologies
  • Some user communities think in terms of
    nucleotide only
  • Data being submitted with uncertain significance
  • Look for opportunities for adding value to NCBI's
    databases and tools

10
Gene and data standards
  • Name of the gene (nomenclature committees)
  • Names of protein products
  • Primary product (Swiss-Prot)
  • Isoforms (RefSeq)
  • Names of associated conditions (multiple)
  • Descriptions of pathways (submitters)
  • Biological processes, cellular components and
    molecular functions (GO)
  • HIV interactions (NIAID)
  • http://www.ncbi.nlm.nih.gov/gene?term=hiv1interactions[Properties]
  • http://www.ncbi.nlm.nih.gov/projects/RefSeq/HIVInteractions/

11
Human mismatch repair
12
Restrict to those reported to be disease-causing
13
www.ncbi.nlm.nih.gov/gene/4292
Phrase found in: Summary, Bibliography, Interactions, Pathways, Gene Ontology, General protein information, Reference sequences, Locus-specific databases
14
Titles of pathways; descriptions of interactions
15
(No Transcript)
16
Gene <-> Protein
17
HomoloGene
18
Diseases and phenotypes
  • MedGen: UMLS, HPO, OMIM, ORDO, GTR

19
Why MedGen?
  • A stable node of identifiers within NCBI for
    disease names, their clinical features, and
    pharmacological substances
  • Built on the foundation of a subset of UMLS, with
    supplements from HPO, OMIM (between UMLS
    releases), and submissions to GTR and ClinVar
  • Primarily automated, but some overview by M.D.s
    and genetic counselors on staff, and feedback
    from the community

20
Terms from UMLS/OMIM/GTR/ClinVar
21
Hierarchies curated by GTR staff
Guided by OMIM's clinical series and user feedback
22
Hierarchies computed from nodes in UMLS
23
Hierarchy from DNA Repair Deficiency Disorders
24
Using HPO for clinical features
  • Partial display
  • Organized by top nodes of the ontology
  • Each specific term supports a link to disorders
    manifesting that feature

25
ClinVar: reported variation-phenotype relationship
26
ClinVar: reported variation-phenotype relationship
  • Submitter archive (not curated)
  • Variant
  • Disease and/or phenotypes
  • Interpretation
  • Confidence
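As a rough illustration of the record shape listed above, the sketch below models one submission as a small data structure. The class name, field names, and default values are hypothetical and are not ClinVar's actual schema. The example values are placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubmittedAssertion:
    """One submitter's assertion, mirroring the four items on the slide.

    All names here are illustrative assumptions, not ClinVar field names.
    """
    variant: str                                          # the variant as described by the submitter
    conditions: List[str] = field(default_factory=list)   # diseases and/or phenotypes
    interpretation: str = "not provided"                  # the submitter's interpretation
    confidence: str = "not provided"                      # the submitter's stated confidence
    curated: bool = False                                 # submitter archive: stored as received

example = SubmittedAssertion(
    variant="hypothetical variant description",
    conditions=["hypothetical condition name"],
    interpretation="uncertain significance",
)
print(example)
```

Keeping `curated` as an explicit flag makes the "not curated" point visible in the data itself rather than leaving it implicit.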

27
Subset of a detailed record
  • Gene name and symbol
  • Sequence ontology for molecular and functional
    consequences
  • Diseases
  • Identifiers and links
  • Observed phenotypes (as distinct from those
    reported to be characteristic of the diagnostic
    term)
  • Protein change from the variant

28
Data sources and growth
29
Submissions from UniProt
Summarize submissions by genes, diseases, and
phenotypes
30
Current status: ClinGen-related
  • Diseases
  • Genes
  • Variants
  • Predictions
  • Conserved sequence
  • Conserved domains
  • Pathways

http://www.clinicalgenome.org/
31
Phenotype and ClinGen/ClinVar
  • Working group on phenotype
  • Make distinctions among
  • Disease category (body system, metabolic
    perturbation, cancer)
  • Diagnosis
  • Characteristic features
  • General or gene-specific
  • Diseases targeted by drugs for which the response
    is genetically determined
  • Observed phenotypes
  • HPO
  • PhenoDB
  • Indications for testing
  • Standardization
  • One ontology or many?
  • Relationship to OMIM

32
Variation and ClinGen/ClinVar
  • Sequence Ontology for variant location and effect
  • Coordinate with PharmGKB for pharmacogenomics
  • Description of haplotypes
  • No discussion yet about authorities for pathways,
    conserved domains, post-translational
    modifications

33
Current status: NCBI
  • Working with UMLS to improve representation of
    terms and relationships
      • Mapping concepts
      • Reporting relationships
  • Supplement current UMLS with HPO, Orphanet (ORDO,
    in progress), and recent data from OMIM
  • Working with Clinical Pharmacogenetics
    Implementation Consortium (CPIC) and PharmGKB
      • Representation of haplotypes/star alleles
      • Drug responses/Disease target
  • Consumer of ontologies to standardize
    terminology, with definitions
      • Link to resource site
      • Provide attribution
      • Support term-specific queries

34
Current status: NCBI
  • Queries currently term by term, not by node
  • Some relationships based on links in Entrez
      • Gene <-> disease
      • Disease <-> clinical feature
      • Variation <-> gene
  • Some relationships explicit
      • Genome -> transcript -> protein
      • Nucleotide change -> protein change
  • Some relationships reported as hierarchies
      • GTR
      • MedGen (MeSH)
      • ORDO (in progress)
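The distinction in the first bullet, term-by-term versus node-based queries, can be made concrete with a toy example. The sketch below is not how any NCBI resource is implemented. The hierarchy, record identifiers, and function names are all hypothetical placeholders chosen only to show the difference in what each kind of query returns.

```python
from typing import Dict, List, Set

# Hypothetical parent -> children hierarchy; placeholder names, not MedGen content.
CHILDREN: Dict[str, List[str]] = {
    "disorder A": ["disorder A1", "disorder A2"],
    "disorder A1": ["disorder A1a"],
}

# Hypothetical records, each annotated with exactly one term.
RECORDS: Dict[str, str] = {
    "rec1": "disorder A",
    "rec2": "disorder A1",
    "rec3": "disorder A1a",
    "rec4": "disorder B",
}

def term_query(term: str) -> Set[str]:
    """Term-by-term: return only records annotated with exactly this term."""
    return {rid for rid, t in RECORDS.items() if t == term}

def descendants(term: str) -> Set[str]:
    """Collect the term plus every term beneath it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def node_query(term: str) -> Set[str]:
    """Node-based: return records annotated with the term or any descendant."""
    terms = descendants(term)
    return {rid for rid, t in RECORDS.items() if t in terms}

print(sorted(term_query("disorder A")))   # exact-term matches only
print(sorted(node_query("disorder A")))   # matches for the whole subtree
```

In this toy data, a term query on the parent misses the records annotated with more specific child terms, which is the gap a node-based interface would close.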

35
Current status: NCBI
  • Maintenance
      • Primarily automatic
      • Some curatorial review by staff of ClinVar and
        NIH Genetic Testing Registry (GTR)
      • Expect expanded review from the ClinGen group
  • Data freely available by FTP or E-utilities
      • ftp://ftp.ncbi.nih.gov/pub/clinvar/
      • ftp://ftp.ncbi.nih.gov/gene/
      • ftp://ftp.ncbi.nih.gov/pub/GTR/
      • ftp://ftp.ncbi.nih.gov/pub/medgen/
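For the E-utilities route mentioned above, a minimal sketch of building ESearch requests against the four medical genetics databases follows. It constructs URLs only and does not contact NCBI. The helper name `esearch_url` and the `retmax` default are assumptions made for illustration, and the search phrase is taken from the mismatch repair example earlier in the deck.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for `term` in Entrez database `db`.

    Hypothetical helper for illustration; it only formats the request.
    """
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})

# One request per database named on the slide.
urls = {
    db: esearch_url(db, "mismatch repair")
    for db in ("clinvar", "gene", "gtr", "medgen")
}
for db, url in urls.items():
    print(db, url)
```

Bulk downloads from the FTP directories are usually the better fit for whole-database processing, while requests like these suit targeted lookups.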

36
Acknowledgements
Slava Gorelenkov MedGen
Melissa Landrum ClinVar
Jennifer Lee GTR, ClinVar
Terence Murphy Gene
Lon Phan dbSNP/dbVar
Kim Pruitt RefSeq
Wendy Rubinstein GTR, MedGen
Ming Ward dbSNP
and all their staff