Disease Informatics: Brush up the terms describing techniques and resources

Provided by: msc3

1
Disease Informatics: Brush up the terms
describing techniques and resources
  • R. P. Deolankar

Half knowledge is always dangerous
2
Wet lab
  • A laboratory allowing for hands-on scientific
    research and equipped with
  • Appropriate plumbing
  • Ventilation
  • Equipment

3
High-throughput technology
  • Technology that handles a high volume of data or
    material
  • Large-scale methods to purify, identify, and
    characterize DNA, RNA, proteins and other
    molecules. These methods are usually automated,
    allowing rapid analysis of very large numbers of
    samples.

4
Microarray
  • A tool used to sift through and analyze the
    information contained within a genome. A
    microarray consists of different nucleic acid
    probes that are chemically attached to a
    substrate, which can be a microchip, a glass
    slide or a microsphere-sized bead.

5
DNA microarray
  • A microarray of immobilized single-stranded DNA
    fragments of known nucleotide sequence that is
    used especially in the identification and
    sequencing of DNA samples and in the analysis of
    gene expression (as in a cell or tissue)

6
Protein microarray
  • Protein microarray is a piece of glass on which
    different molecules of protein have been affixed
    at separate locations in an ordered manner thus
    forming a microscopic array.

7
Mass spectrometry
  • An instrumental method for identifying the
    chemical constitution of a substance by means of
    the separation of gaseous ions according to their
    differing mass and charge -- called also mass
    spectroscopy
  • Mass spectrometry: a method used to determine the
    masses of atoms or molecules in which an
    electrical charge is placed on the molecule and
    the resulting ions are separated by their
    mass-to-charge ratio

8
Tandem mass spectrometry
  • Multiple steps of mass spectrometry selection,
    with some form of fragmentation occurring in
    between the stages
  • Immunofluorescence and immunocytochemistry,
    ELISA, immunoblotting

9
Dry lab
  • A laboratory for making computer simulations or
    for data analysis especially by computers (as in
    bioinformatics) -- called also dry laboratory

10
Gene prioritization
  • The results of experimental or computational
    analyses in the post-genomic era (e.g., those
    from microarrays, proteomics, ChIP-chip,
    genome-wide in silico searches, genetic linkages,
    etc.) often consist of long lists of candidate
    genes. Some methods assign a score to each gene
    and rank the genes accordingly. This process is
    known as gene prioritization.
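
The score-and-rank idea can be sketched as follows. This is a minimal illustration, not the method of any particular tool: the gene names, evidence sources, scores and weights are all hypothetical, and a weighted sum is only one of many possible scoring schemes.

```python
# Hypothetical evidence scores (0..1) per candidate gene; illustrative only.
candidates = {
    "GENE_A": {"expression": 0.9, "linkage": 0.2, "interaction": 0.7},
    "GENE_B": {"expression": 0.4, "linkage": 0.9, "interaction": 0.1},
    "GENE_C": {"expression": 0.1, "linkage": 0.1},  # no interaction evidence
}

# Hypothetical weights expressing how much each evidence source is trusted.
weights = {"expression": 0.5, "linkage": 0.3, "interaction": 0.2}


def prioritize(candidates, weights):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = []
    for gene, evidence in candidates.items():
        # Missing evidence counts as 0 for that source.
        score = sum(w * evidence.get(source, 0.0) for source, w in weights.items())
        scored.append((gene, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = prioritize(candidates, weights)
for gene, score in ranked:
    print(f"{gene}\t{score:.2f}")
```

The top of the ranked list is what would be carried forward for experimental follow-up.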

11
PhenoGO
  • PhenoGO is a multiorganism database that provides
    phenotypic context, such as the cell type,
    disease, and tissue and organ to existing
    associations between gene products and Gene
    Ontology (GO) terms as specified in the Gene
    Ontology Annotations (GOA).

12
BioMedLEE
  • One existing Natural Language Processing (NLP)
    system, known as BioMedLEE, automatically
    extracts biological information consisting of
    bio-molecular substances and phenotypic data.

13
MeSH
  • Medical Subject Headings
  • MeSH is the National Library of Medicine's
    controlled vocabulary thesaurus. It consists of
    sets of terms naming descriptors in a
    hierarchical structure that permits searching at
    various levels of specificity.

14
PhenOS
  • Phenotype Organizer System, PhenOS is a system
    under development by the Lussier research group
    with the purpose of bridging the gap between
    heterogeneous biomedical terminologies.

15
Inparanoid algorithm
  • The protein interaction networks of two species
    are aligned by assigning proteins to sequence
    homology clusters using the Inparanoid algorithm

16
POCUS
  • Prioritization of candidate genes using
    statistics
  • Reference: Turner FS, Clutterbuck DR, Semple CA.
    POCUS: mining genomic sequence annotation to
    predict disease genes. Genome Biol.
    2003;4(11):R75.

17
OMIM
  • Online Mendelian Inheritance in Man
  • The Online Mendelian Inheritance in Man. A
    catalog of human genes and genetic disorders
    authored and edited by Dr. Victor A. McKusick and
    his colleagues at Johns Hopkins and elsewhere,
    and provided through NCBI. The database contains
    information on disease phenotypes and genes,
    including extensive descriptions, gene names,
    inheritance patterns, map locations and gene
    polymorphisms.

18
TOM
  • A web-based integrated approach for
    identification of candidate disease genes,
    Transcriptomics of OMIM
  • Reference: Rossi S, Masotti D, Nardini C, Bonora
    E, Romeo G, Macii E, Benini L, Volinia S. TOM: a
    web-based integrated approach for identification
    of candidate disease genes. Nucleic Acids Res.
    2006 Jul 1;34

19
Data mining
  • Data mining (sometimes called data or knowledge
    discovery) is the process of analyzing data from
    different perspectives and summarizing it into
    useful information

20
Online Predicted Human Interactions Database or
OPHID
  • Designed to be both a resource for the laboratory
    scientist to explore known and predicted
    protein-protein interactions, and to facilitate
    bioinformatics initiatives exploring protein
    interaction networks.

21
Single nucleotide polymorphisms (SNPs)
  • A single nucleotide polymorphism (SNP, pronounced
    snip), is a DNA sequence variation occurring when
    a single nucleotide - A, T, C, or G - in the
    genome (or other shared sequence) differs between
    members of a species (or between paired
    chromosomes in an individual).

22
Synonymous - nonsynonymous substitutions
  • Substitutions that result in amino acid
    replacements are said to be nonsynonymous while
    substitutions that do not cause an amino acid
    replacement (such as a GGG to GGC change - both
    codons still encode glycine) are said to be
    synonymous substitutions. Because of the
    difference in their effects on the physiology of
    the organism, synonymous and nonsynonymous
    substitutions can have quite different dynamics.
    For example, synonymous substitutions usually
    occur at a much faster rate than do nonsynonymous
    substitutions. Hence, for coding sequence it is
    often desirable to separate these two.
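
A minimal sketch of how a single-codon change can be classified, assuming the standard genetic code. The helper name `classify_substitution` is illustrative and not taken from any library.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "nonsynonymous"


# GGG and GGC both encode glycine (G), as in the example above.
print(classify_substitution("GGG", "GGC"))
# GGG (glycine) to AGG (arginine) changes the amino acid.
print(classify_substitution("GGG", "AGG"))
```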

23
Ka/Ks values
  • In genetics, the Ka/Ks ratio or dN/dS ratio is
    the ratio of the rate of non-synonymous
    substitutions (Ka) to the rate of synonymous
    substitutions (Ks), which can be used as an
    indication of selection on a protein-coding gene.
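
As a rough sketch, the ratio and its usual reading can be written as below. The input rates are hypothetical, the tolerance around 1 is an arbitrary illustrative choice, and estimating Ka and Ks from real alignments requires dedicated methods that are not shown here.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of the nonsynonymous rate (Ka) to the synonymous rate (Ks)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def interpret(ratio, tolerance=0.05):
    """Conventional reading of Ka/Ks; the tolerance is an assumption."""
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical rates for one protein-coding gene.
ratio = ka_ks_ratio(ka=0.02, ks=0.10)
print(f"Ka/Ks = {ratio:.2f}: {interpret(ratio)}")
```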

24
dbSNP
  • db (Database) of Single nucleotide polymorphism
  • A public-domain archive for a broad collection of
    single nucleotide polymorphisms (SNPs), hosted at
    the National Center for Biotechnology
    Information.

25
Orthodisease
  • OrthoDisease, a comprehensive database of model
    organism genes that are orthologous to human
    disease genes
  • Orthodisease is constructed primarily using
    Inparanoid analysis. Inparanoid is a program that
    automatically detects orthologs (or groups of
    orthologs) from 2 species

26
Field Biology
  • Biology of organisms living in their natural
    environments
  • Applications in Ecology and Evolutionary Biology

27
Epidemiology
  • Epidemiology is the study of how often diseases
    occur in different groups of people and why
  • Planning and evaluating strategies to prevent
    illness
  • Guide to the management of patients in whom
    disease has already developed
  • Reference: Epidemiology for the Uninitiated by
    Coggon, Rose and Barker

28
Population at risk
  • The population at risk is the group of people,
    healthy or sick, who would be counted as cases if
    they had the disease being studied
  • It defines the denominator for the calculation of
    incidence and prevalence rates
  • It is the number of persons potentially capable
    of experiencing the event or outcome of interest

29
Floating numerator
  • Numerator floating without its denominator
  • Common error occurring in field investigations
  • The error occurs due to the number of cases not
    relating to the at risk population
  • Epidemiological conclusions (on risk) cannot be
    drawn from purely clinical data (on the number of
    sick people seen)

30
Target population
  • It is the population about which the conclusions
    are to be drawn
  • Sometimes measurements can be made on the full
    target population; otherwise study samples are
    used

31
Study population and study sample
  • The group of individuals in a study
  • In a clinical trial, the participants make up the
    study population
  • Study sample is chosen from study population

32
Aetiology
  • The study of the factors that predispose to or
    precipitate the disease
  • An external agent, a susceptible host, and an
    environment that brings the host and agent
    together form the disease aetiology triad

33
Surveillance
  • Watching over a population and recording data
    likely to have epidemiological significance,
    usually with the aim of early detection of
    disease. Essentially an interventionist exercise
    compared with monitoring, which is passive.

34
Case
  • Disease in populations exists as a continuum of
    severity rather than as an all or none phenomenon
  • The real question in population studies is not
    "Has the person got the disease?" but "How much
    of the disease has he or she got?"
  • Diagnostic continuum is dichotomized into cases
    and non-cases on the basis of statistical,
    clinical, prognostic or operational options
  • Hence case definition should be precise and
    unambiguous.
  • Epidemiological case definitions are narrower and
    more rigid than clinical ones

35
Incidence
  • It is the rate at which new cases occur in a
    population during a specified period
  • Incidence = (number of new cases) /
    [(population at risk) × (time during which cases
    were ascertained)]
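
A small worked example of the incidence calculation, using hypothetical numbers (30 new cases in a population at risk of 1,000 followed for 2 years):

```python
def incidence_rate(new_cases, population_at_risk, time_period):
    """New cases per person at risk per unit of time."""
    if population_at_risk <= 0 or time_period <= 0:
        raise ValueError("population at risk and time period must be positive")
    return new_cases / (population_at_risk * time_period)


# Hypothetical study: 30 new cases, 1,000 people at risk, 2 years of follow-up.
rate = incidence_rate(new_cases=30, population_at_risk=1000, time_period=2)
print(f"{rate * 1000:.1f} new cases per 1,000 persons per year")
```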

36
Prevalence
  • Point prevalence
  • The proportion of a population that are cases at
    a point in time
  • Period prevalence
  • The proportion of a population that are cases at
    any time within a stated period
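
The two measures can be contrasted with a short sketch. The data are hypothetical, and the code assumes at most one disease episode per person, recorded as (onset day, recovery day).

```python
def point_prevalence(episodes, population_size, day):
    """Proportion of the population that are cases on a given day."""
    cases = sum(1 for onset, recovery in episodes if onset <= day <= recovery)
    return cases / population_size


def period_prevalence(episodes, population_size, start, end):
    """Proportion of the population that are cases at any time in [start, end]."""
    cases = sum(
        1 for onset, recovery in episodes if onset <= end and recovery >= start
    )
    return cases / population_size


# Hypothetical population of 10 people, 4 of whom had one episode each.
population_size = 10
episodes = [(1, 5), (3, 12), (8, 9), (15, 20)]

print(point_prevalence(episodes, population_size, day=4))
print(period_prevalence(episodes, population_size, start=1, end=10))
```

Period prevalence is never smaller than point prevalence for a day inside the same period, because it also counts cases that started or ended on other days.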

37
Attributable risk and relative risk
  • Attributable risk is the difference between the
    disease rate in exposed persons and that in
    people who are unexposed
  • Relative risk is the ratio of the disease rate in
    exposed persons to that in people who are
    unexposed
  • Attributable risk = rate of disease in unexposed
    persons × (relative risk − 1)
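
A brief numerical check of how the two measures relate, with hypothetical disease rates. Attributable risk is taken here as the rate difference (exposed minus unexposed) and relative risk as the rate ratio.

```python
def relative_risk(rate_exposed, rate_unexposed):
    """Ratio of the disease rate in exposed to that in unexposed persons."""
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed, rate_unexposed):
    """Difference between the disease rates in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed


# Hypothetical rates: 30 per 1,000 in exposed, 10 per 1,000 in unexposed.
rate_exposed, rate_unexposed = 0.030, 0.010

rr = relative_risk(rate_exposed, rate_unexposed)
ar = attributable_risk(rate_exposed, rate_unexposed)

# The same attributable risk via: unexposed rate x (relative risk - 1).
ar_from_rr = rate_unexposed * (rr - 1)

print(f"relative risk = {rr:.1f}")
print(f"attributable risk = {ar:.3f} (via relative risk: {ar_from_rr:.3f})")
```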

38
Confounding
  • Confusion about causation that arises when two or
    more variables are associated with the disease
  • Confounding may give rise to spurious
    associations when in fact there is no causal
    relation, or at the other extreme, it may obscure
    the effects of a true cause

39
Bias
  • Bias is the deviation of inferences from the
    truth
  • Selection bias is the biased selection of
    individuals into the study
  • Information bias is the biased collection or
    biased analysis of the data
  • The motto of the epidemiologist could well be
    "dirty hands but a clean mind" (manus sordidae,
    mens pura)

40
Chance
  • A measure of how likely it is that some event
    will occur
  • Random, unpredictable influences on events
  • The association between the exposure and disease
    is conventionally considered to be statistically
    significant if the probability associated with
    the test statistic (the p-value) is < 0.05
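
One way to illustrate the 0.05 convention is a chi-square test on a 2×2 exposure-by-disease table. The choice of test is an assumption made for this sketch (the slide does not name one), no continuity correction is applied, and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value (1 degree of freedom) for the table
        exposed:   a diseased, b healthy
        unexposed: c diseased, d healthy
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30/100 exposed and 10/100 unexposed people are diseased.
chi2, p_value = chi_square_2x2(30, 70, 10, 90)
verdict = "significant" if p_value < 0.05 else "not significant"
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f} ({verdict} at the 0.05 level)")
```

A small p-value says only that chance is an unlikely explanation; it does not rule out bias or confounding.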

41
Sensitivity
  • The proportion of persons with the disease who
    are correctly identified by defined criteria
  • The proportion of persons with the disease who
    are correctly identified by a screening test
  • The ability of a system to detect epidemics and
    other changes in disease occurrence
  • A sensitive test detects a high proportion of the
    true cases

42
Specificity
  • The proportion of persons without a disease who
    are correctly identified by a test
  • The number of true negative results divided by
    the total number of all those without the disease
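
Sensitivity and specificity can both be computed from the four cells of a screening-test table. The counts below are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of people with the disease whom the test identifies."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of people without the disease whom the test identifies."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical screening results:
#   100 people with the disease:    90 test positive, 10 test negative
#   200 people without the disease: 160 test negative, 40 test positive
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=160, false_positives=40)

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```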

43
Randomization
  • Randomization is used to obtain a similar
    allocation of individuals to each group; the
    groups are followed at the same time
  • Purpose of randomization: to obtain unbiased
    estimates of differences among treatment
    responses (means or effects) and to obtain an
    unbiased estimate of the random error variation
    in the experiment
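
A minimal sketch of simple random allocation, using hypothetical participant identifiers. Real trials typically use more elaborate schemes (for example blocked or stratified randomization), which are not shown here.

```python
import random


def randomize(participants, n_groups=2, seed=None):
    """Shuffle participants and deal them into n_groups of near-equal size."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal round-robin so that group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical participant identifiers P01..P10.
participants = [f"P{i:02d}" for i in range(1, 11)]
treatment, control = randomize(participants, n_groups=2, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides the allocation, neither the investigator nor the participant can influence who receives which treatment.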

44
Replication and Local control
  • Replication is the repetition of an experiment in
    order to test the validity of its conclusion
  • Local control is blocking or grouping to
    eliminate or to control the various sources of
    variation (error)
  • Replication and local control are necessary to
    achieve a reduction in the random variation among
    treatment effects in the experiment

45
Observational (non-experimental) studies
  • Person-level unit of observation
    1. Longitudinal measurements
       a. Cohort samples
       b. Case-control samples
    2. Cross-sectional measurements
  • Aggregate level units of observation (ecological
    studies)
  • Reference: Epidemiology Kept Simple: An
    Introduction to Traditional and Modern
    Epidemiology by B. Burt Gerstman

46
Personal-level vs. Aggregate-level
  • A person-level study on smoking might collect
    information on each person's smoking habits, age
    and disease status
  • An aggregate-level study on smoking might collect
    information on each region's per capita cigarette
    consumption, age distribution and disease rate

47
Longitudinal studies
  • Longitudinal studies are studies in which the
    sequence of events in individuals can be
    delineated over time
  • In cohort studies the incidence of disease in
    exposed and non-exposed groups is compared
  • In case-control studies people with disease
    (cases) and people without disease (controls) are
    sampled from the source population and exposure
    histories of cases and controls are compared

48
Longitudinal vs. Cross sectional studies
  • Longitudinal measurements relate exposures and
    diseases in individuals at various points in
    time
  • Cross-sectional measurements are not definitively
    time sequenced in individuals
  • In cross-sectional studies the data analysed
    are gathered from samples at one point in time.
    Since both the outcome and the variables are
    measured at the one time these studies are not
    strong at showing cause-effect relationships.

49
Experimental studies
  • In experimental studies, the investigator
    introduces or removes an exposure in order to
    observe its influence on a health outcome. Such
    allocations may be based on chance mechanism
    (randomized trials) or on other deliberate
    mechanisms built into the study's protocol
    (non-randomized trials)

50
Other disease informatics lectures: Supercourse
(Epidemiology, the Internet and Global Health),
lecture numbers 31981, 30331, 28921, 25381,
25371, and 34011
Thank you