Title: Disease Informatics: Brush up the terms describing techniques and resources
1Disease Informatics: Brush up the terms
describing techniques and resources
Half knowledge is always dangerous
2Wet lab
- A laboratory allowing for hands-on scientific
research and equipped with:
- Appropriate plumbing
- Ventilation
- Equipment
3High-throughput technology
- The technology handling a high volume of data or
material
- Large-scale methods to purify, identify, and
characterize DNA, RNA, proteins and other
molecules. These methods are usually automated,
allowing rapid analysis of very large numbers of
samples.
4Microarray
- A tool used to sift through and analyze the
information contained within a genome. A
microarray consists of different nucleic acid
probes that are chemically attached to a
substrate, which can be a microchip, a glass
slide or a microsphere-sized bead.
5DNA microarray
- A microarray of immobilized single-stranded DNA
fragments of known nucleotide sequence that is
used especially in the identification and
sequencing of DNA samples and in the analysis of
gene expression (as in a cell or tissue)
6Protein microarray
- Protein microarray is a piece of glass on which
different molecules of protein have been affixed
at separate locations in an ordered manner thus
forming a microscopic array.
7Mass spectrometry
- An instrumental method for identifying the
chemical constitution of a substance by means of
the separation of gaseous ions according to their
differing mass and charge; also called mass
spectroscopy
- A method used to determine the
masses of atoms or molecules in which an
electrical charge is placed on the molecule and
the resulting ions are separated by their
mass-to-charge ratio
8Tandem mass spectrometry
- Multiple steps of mass spectrometry selection,
with some form of fragmentation occurring in
between the stages
- Immunofluorescence and immunocytochemistry,
ELISA, immunoblotting
9Dry lab
- A laboratory for making computer simulations or
for data analysis especially by computers (as in
bioinformatics); also called dry laboratory
10Gene prioritization
- The results of experimental or computational
analyses in the post-genomic era (e.g., those
from microarrays, proteomics, ChIP-chip,
genome-wide in silico searches, genetic linkages,
etc.) often consist of long lists of candidate
genes. There are methods that assign a score to
each gene and rank the genes accordingly. This
process is known as gene prioritization.
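A minimal sketch of the score-and-rank idea described above. The gene names, evidence sources, values and weights are all hypothetical, and a weighted sum is only one possible scoring rule; real prioritization tools use their own methods.

```python
from typing import Dict, List, Tuple

def prioritize_genes(evidence: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each candidate gene as a weighted sum of its evidence values
    and return (gene, score) pairs sorted from highest to lowest score."""
    scored = []
    for gene, sources in evidence.items():
        score = sum(weights.get(src, 0.0) * val for src, val in sources.items())
        scored.append((gene, score))
    # Sort by descending score; ties are broken alphabetically by gene name.
    return sorted(scored, key=lambda gs: (-gs[1], gs[0]))

# Hypothetical candidate list and evidence values (illustrative only).
candidates = {
    "GENE_A": {"expression": 0.9, "linkage": 0.2},
    "GENE_B": {"expression": 0.4, "linkage": 0.8},
    "GENE_C": {"expression": 0.1, "linkage": 0.1},
}
weights = {"expression": 0.5, "linkage": 0.5}

ranking = prioritize_genes(candidates, weights)
for rank, (gene, score) in enumerate(ranking, start=1):
    print(rank, gene, round(score, 3))
```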
11PhenoGO
- PhenoGO is a multiorganism database that provides
phenotypic context, such as the cell type,
disease, and tissue and organ to existing
associations between gene products and Gene
Ontology (GO) terms as specified in the Gene
Ontology Annotations (GOA).
12BioMedLEE
- One existing Natural Language Processing (NLP)
system, known as BioMedLEE, automatically
extracts biological information consisting of
bio-molecular substances and phenotypic data.
13MeSH
- Medical Subject Heading
- MeSH is the National Library of Medicine's
controlled vocabulary thesaurus. It consists of
sets of terms naming descriptors in a
hierarchical structure that permits searching at
various levels of specificity.
14PhenOS
- Phenotype Organizer System, PhenOS is a system
under development by the Lussier research group
with the purpose of bridging the gap between
heterogeneous biomedical terminologies.
15Inparanoid algorithm
- The protein interaction networks of two species
are aligned by assigning proteins to sequence
homology clusters using the Inparanoid algorithm
16POCUS
- Prioritization of candidate genes using
statistics
- Reference: Turner FS, Clutterbuck DR, Semple CA.
POCUS: mining genomic sequence annotation to
predict disease genes. Genome Biol.
2003;4(11):R75.
17OMIM
- Online Mendelian Inheritance in Man
- The Online Mendelian Inheritance in Man. A
catalog of human genes and genetic disorders
authored and edited by Dr. Victor A. McKusick and
his colleagues at Johns Hopkins and elsewhere,
and provided through NCBI. The database contains
information on disease phenotypes and genes,
including extensive descriptions, gene names,
inheritance patterns, map locations and gene
polymorphisms.
18TOM
- A web-based integrated approach for
identification of candidate disease genes,
Transcriptomics of OMIM
- Reference: Rossi S, Masotti D, Nardini C, Bonora
E, Romeo G, Macii E, Benini L, Volinia S. TOM: a
web-based integrated approach for identification
of candidate disease genes. Nucleic Acids Res.
2006 Jul 1;34
19Data mining
- Data mining (sometimes called data or knowledge
discovery) is the process of analyzing data from
different perspectives and summarizing it into
useful information
20Online Predicted Human Interactions Database or
OPHID
- Designed to be both a resource for the laboratory
scientist to explore known and predicted
protein-protein interactions, and to facilitate
bioinformatics initiatives exploring protein
interaction networks.
21Single nucleotide polymorphisms (SNPs)
- A single nucleotide polymorphism (SNP, pronounced
snip), is a DNA sequence variation occurring when
a single nucleotide - A, T, C, or G - in the
genome (or other shared sequence) differs between
members of a species (or between paired
chromosomes in an individual).
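As a rough illustration of a single-nucleotide difference, the sketch below compares two sequences position by position. It assumes the two sequences are already aligned, have equal length and contain no gaps; the sequences are made up. A real SNP call is made across a population, not from one pair of sequences.

```python
from typing import List, Tuple

def single_base_differences(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every mismatching position.
    Positions are 0-based. Assumes pre-aligned, gap-free, equal-length input."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
            if a != b]

# Hypothetical aligned sequences differing at one position.
differences = single_base_differences("ATGGCCTA", "ATGGTCTA")
print(differences)
```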
22Synonymous - nonsynonymous substitutions
- Substitutions that result in amino acid
replacements are said to be nonsynonymous while
substitutions that do not cause an amino acid
replacement (such as a GGG to GGC change - both
codons still encode glycine) are said to be
synonymous substitutions. Because of the
difference in their effects on the physiology of
the organism, synonymous and nonsynonymous
substitutions can have quite different dynamics.
For example, synonymous substitutions usually
occur at a much faster rate than do nonsynonymous
substitutions. Hence, for coding sequence it is
often desirable to separate these two.
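The distinction can be checked mechanically by translating the codon before and after a substitution. The sketch below builds the standard genetic code and classifies a codon change; the function name is hypothetical, and changes to or from a stop codon are simply reported as nonsynonymous here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid,
    otherwise 'nonsynonymous'."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"

# GGG and GGC both encode glycine, as in the example on the slide.
print(classify_substitution("GGG", "GGC"))
# GGG (glycine) to GAG (glutamate) replaces the amino acid.
print(classify_substitution("GGG", "GAG"))
```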
23Ka/Ks values
- In genetics, the Ka/Ks ratio or dN/dS ratio is
the ratio of the rate of non-synonymous
substitutions (Ka) to the rate of synonymous
substitutions (Ks), which can be used as an
indication of selection on a protein-coding gene.
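A small sketch of the ratio itself. The Ka and Ks values are made up, and the interpretation thresholds follow the common textbook convention (below 1 suggests purifying selection, above 1 suggests positive selection); the tolerance around 1 is an arbitrary assumption, not part of the definition.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of the nonsynonymous rate (Ka) to the synonymous rate (Ks)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks

def interpret_ka_ks(ratio: float, tolerance: float = 0.1) -> str:
    """Rough reading of the ratio; the tolerance band is an assumption."""
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"

# Hypothetical rates for one protein-coding gene.
ratio = ka_ks_ratio(ka=0.02, ks=0.10)
label = interpret_ka_ks(ratio)
print(round(ratio, 3), label)
```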
24dbSNP
- Database of Single Nucleotide Polymorphisms
- A public-domain archive for a broad collection of
Single Nucleotide Polymorphisms (SNPs), hosted at
the National Center for Biotechnology
Information.
25Orthodisease
- OrthoDisease, a comprehensive database of model
organism genes that are orthologous to human
disease genes
- OrthoDisease is constructed primarily using
Inparanoid analysis. Inparanoid is a program that
automatically detects orthologs (or groups of
orthologs) from two species
26Field Biology
- Biology of organisms living in their natural
environments
- Applications in Ecology and Evolutionary Biology
27Epidemiology
- Epidemiology is the study of how often diseases
occur in different groups of people and why
- Planning and evaluating strategies to prevent
illness
- Guide to the management of patients in whom
disease has already developed
- Reference: Epidemiology for the Uninitiated by
Coggon, Rose and Barker
28Population at risk
- The population at risk is the group of people,
healthy or sick, who would be counted as cases if
they had the disease being studied
- It defines the denominator for the calculation of
rates of incidence and prevalence
- It is the number of persons potentially capable
of experiencing the event or outcome of interest
29Floating numerator
- Numerator floating without its denominator
- Common error occurring in field investigations
- The error occurs when the number of cases is not
related to the at-risk population
- Epidemiological conclusions (on risk) cannot be
drawn from purely clinical data (on the number of
sick people seen)
30Target population
- It is the population about which the conclusions
are to be drawn
- Sometimes measurements can be made on the full
target population; otherwise study samples are
used
31Study population and study sample
- The group of individuals in a study
- In a clinical trial, the participants make up the
study population
- The study sample is chosen from the study
population
32Aetiology
- The study of the factors that predispose to or
precipitate the disease
- An external agent, a susceptible host, and an
environment that brings the host and agent
together form the disease etiology triad
33Surveillance
- Watching over a population and recording data
likely to have epidemiological significance,
usually with the aim of early detection of
disease. Essentially an interventionist exercise
compared with monitoring, which is passive.
34Case
- Disease in populations exists as a continuum of
severity rather than as an all-or-none phenomenon
- The real question in population studies is not
"Has the person got the disease?" but "How much
of the disease has he or she got?"
- The diagnostic continuum is dichotomized into
cases and non-cases on the basis of statistical,
clinical, prognostic or operational options
- Hence the case definition should be precise and
unambiguous.
- Epidemiological case definitions are narrower and
more rigid than clinical ones
35Incidence
- It is the rate at which new cases occur in a
population during a specified period
- Incidence = (number of new cases) /
[(population at risk) × (time during which cases
were ascertained)]
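A worked sketch of the incidence calculation: new cases divided by the product of the population at risk and the observation time. The numbers are invented for illustration.

```python
def incidence_rate(new_cases: int, population_at_risk: int,
                   time_period: float) -> float:
    """New cases per person per unit of time (e.g. per person-year)."""
    if population_at_risk <= 0 or time_period <= 0:
        raise ValueError("population at risk and time period must be positive")
    return new_cases / (population_at_risk * time_period)

# Hypothetical study: 50 new cases among 10,000 people at risk over 2 years.
rate = incidence_rate(new_cases=50, population_at_risk=10_000, time_period=2)
print(f"{rate * 1000:.1f} new cases per 1,000 person-years")
```

Without the denominator (the population at risk and the time observed), the 50 cases alone would be a floating numerator and could not be read as a rate.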
36Prevalence
- Point prevalence
- The proportion of a population that are cases at
a point in time
- Period prevalence
- The proportion of a population that are cases at
any time within a stated period
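The difference between the two measures can be sketched with a toy population. Each person is represented by the first and last day on which they were a case, or None if they never were; this data layout and the figures are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Episode = Optional[Tuple[int, int]]  # (first_day_as_case, last_day_as_case) or None

def point_prevalence(episodes: List[Episode], day: int) -> float:
    """Proportion of the population that are cases on the given day."""
    cases = sum(1 for e in episodes if e is not None and e[0] <= day <= e[1])
    return cases / len(episodes)

def period_prevalence(episodes: List[Episode], start: int, end: int) -> float:
    """Proportion of the population that are cases at any time in [start, end]."""
    cases = sum(1 for e in episodes
                if e is not None and e[0] <= end and e[1] >= start)
    return cases / len(episodes)

# Hypothetical population of 10 people; 4 of them are cases at some time.
population: List[Episode] = [(1, 5), (3, 10), (8, 12), (20, 25)] + [None] * 6

print(point_prevalence(population, day=4))           # cases present on day 4
print(period_prevalence(population, start=1, end=12))  # cases at any time in days 1-12
```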
37Attributable risk and relative risk
- Attributable risk is the difference between the
disease rate in exposed persons and that in people
who are unexposed
- Relative risk is the ratio of the disease rate in
exposed persons to that in people who are
unexposed
- Attributable risk = rate of disease in unexposed
persons × (relative risk - 1)
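A short numerical sketch of the two measures, taking attributable risk as the rate difference and relative risk as the rate ratio, and checking that attributable risk equals the unexposed rate times (relative risk - 1). The disease rates are invented.

```python
def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the disease rate in exposed persons to that in unexposed persons."""
    return rate_exposed / rate_unexposed

def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Difference between the disease rate in exposed and unexposed persons."""
    return rate_exposed - rate_unexposed

# Hypothetical rates: 30 per 1,000 in exposed, 10 per 1,000 in unexposed.
rate_exposed, rate_unexposed = 0.030, 0.010

rr = relative_risk(rate_exposed, rate_unexposed)
ar = attributable_risk(rate_exposed, rate_unexposed)
ar_from_rr = rate_unexposed * (rr - 1)  # same quantity via the relative risk

print(round(rr, 2), round(ar, 3), round(ar_from_rr, 3))
```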
38Confounding
- Confusion about causation that arises when two or
more variables are associated with the disease
- Confounding may give rise to spurious
associations when in fact there is no causal
relation or, at the other extreme, it may obscure
the effects of a true cause
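A constructed example of a spurious association. The counts are invented so that, within each age group, exposed and unexposed people have the same risk, yet the crude (unstratified) comparison suggests that exposure nearly triples the risk, because exposure is more common in the older, higher-risk group. Age as the confounder and the stratified comparison are illustrative choices, not taken from the slides.

```python
def relative_risk(exp_cases: int, exp_total: int,
                  unexp_cases: int, unexp_total: int) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)

# Hypothetical counts per stratum:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "older":   (80, 800, 20, 200),   # 10% risk in both exposure groups
    "younger": (2, 200, 8, 800),     # 1% risk in both exposure groups
}

# Crude table: add the strata together, ignoring age.
crude_counts = tuple(sum(counts[i] for counts in strata.values()) for i in range(4))
crude_rr = relative_risk(*crude_counts)

# Stratum-specific relative risks: compare like with like.
stratum_rr = {name: relative_risk(*counts) for name, counts in strata.items()}

print("crude RR:", round(crude_rr, 2))
print("stratum RRs:", stratum_rr)
```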
39Bias
- Bias is the deviation of inferences from the
truth
- Selection bias is the biased selection of
individuals into the study
- Information bias is the biased collection or
biased analysis of the data
- The motto of the epidemiologist could well be
"dirty hands but a clean mind" (manus sordidae,
mens pura)
40Chance
- A measure of how likely it is that some event
will occur
- Random, unpredictable influences on events
- The association between the exposure and disease
is considered to be statistically significant
if the probability (p-value) associated with the
test statistic is less than 0.05
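One way to make the 0.05 rule concrete is a chi-square test on a 2x2 exposure-by-disease table. The slides do not name a particular test, so the Pearson chi-square statistic (without continuity correction) is an assumption here, and the counts are invented. For one degree of freedom the p-value can be computed from the complementary error function, so only the standard library is needed.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table
           diseased  not diseased
    exposed    a          b
    unexposed  c          d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical study: 30 of 100 exposed and 10 of 100 unexposed are diseased.
chi2 = chi_square_2x2(a=30, b=70, c=10, d=90)
p = p_value_df1(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.5f}, significant at 0.05: {p < 0.05}")
```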
41Sensitivity
- The proportion of persons with the disease who
are correctly identified by defined criteria
- The proportion of persons with the disease who
are correctly identified by a screening test
- The ability of a system to detect epidemics and
other changes in disease occurrence
- A sensitive test detects a high proportion of the
true cases
42Specificity
- The proportion of persons without a disease who
are correctly identified by a test
- The number of true negative results divided by
the total number of all those without the disease
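Sensitivity and specificity can both be read off a table of test results against true disease status. The sketch below uses invented screening counts.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of persons with the disease whom the test identifies."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of persons without the disease whom the test identifies."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical screening results:
# 100 people with the disease: 90 test positive, 10 test negative.
# 200 people without the disease: 160 test negative, 40 test positive.
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=160, false_positives=40)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```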
43Randomization
- Randomization is used to obtain a similar
allocation of individuals to each group; the
groups are followed at the same time
- Purpose of randomization: to obtain unbiased
estimates of differences among treatment
responses (means or effects) and to obtain an
unbiased estimate of the random error variation
in the experiment
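A minimal sketch of random allocation: participants are shuffled and then dealt out to the groups in turn, which gives groups of (nearly) equal size. The participant identifiers, group names and fixed seed are assumptions for illustration; real trials use pre-specified and concealed allocation procedures.

```python
import random
from typing import Dict, List, Optional, Sequence

def randomize(participants: Sequence[str],
              groups: Sequence[str] = ("treatment", "control"),
              seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Shuffle the participants and deal them to the groups in turn."""
    rng = random.Random(seed)          # local generator; seed only for reproducibility
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation: Dict[str, List[str]] = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        allocation[groups[i % len(groups)]].append(person)
    return allocation

# Hypothetical participant identifiers.
participants = [f"P{n:02d}" for n in range(1, 11)]
allocation = randomize(participants, seed=42)
for group, members in allocation.items():
    print(group, members)
```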
44Replication and Local control
- Replication is the repetition of an experiment in
order to test the validity of its conclusion
- Local control is blocking or grouping to
eliminate or to control the various sources of
variation (error)
- Replication and local control are necessary to
achieve a reduction in the random variation among
treatment effects in the experiment
45 Observational (non-experimental) studies
- Person-level unit of observation
- 1. Longitudinal measurements
- a. Cohort samples
- b. Case control samples
- 2. Cross-sectional measurements
- Aggregate level units of observation (ecological
studies)
- Reference: Epidemiology Kept Simple: An
Introduction to Traditional and Modern
Epidemiology by B. Burt Gerstman
46Personal-level vs. Aggregate-level
- A personal-level study on smoking might collect
information on each person's smoking habits, age
and disease status
- An aggregate-level study on smoking might collect
information on each region's per capita cigarette
consumption, age distribution and disease rate
47Longitudinal studies
- Longitudinal studies are studies in which the
sequence of events in individuals can be
delineated over time
- In cohort studies the incidence of disease in
exposed and non-exposed groups is compared
- In case-control studies people with disease
(cases) and people without disease (controls) are
sampled from the source population and exposure
histories of cases and controls are compared
48Longitudinal vs. Cross sectional studies
- Longitudinal measurements relate exposures and
diseases in individuals at various points in
time
- Cross-sectional measurements are not definitively
time sequenced in individuals
- In cross-sectional studies the data analyzed are
gathered from samples at one point in time.
Since both the outcome and the variables are
measured at the same time, these studies are not
strong at showing cause-effect relationships.
49Experimental studies
- In experimental studies, the investigator
introduces or removes an exposure in order to
observe its influence on a health outcome. Such
allocations may be based on chance mechanism
(randomized trials) or on other deliberate
mechanisms built into the study's protocol
(non-randomized trials)
50Other disease informatics lectures: Supercourse:
Epidemiology, the Internet and Global
Health. Lecture numbers 31981, 30331, 28921,
25381, 25371, and 34011
Thank you