Title: CRITICAL ANALYSIS
1CRITICAL ANALYSIS
- WHICH RESEARCH DESIGN FOR WHICH CLINICAL PROBLEM?
6Appraising a Clinical Experimental Study
- Population/Subjects
- What was the source population?
- What were the inclusion/exclusion criteria?
- Were they a representative and relevant sample?
- How long was the follow-up period?
7- Are the Results of the Study Valid?
- Randomization? Was the randomization list hidden?
- Were baseline characteristics of the groups the same at the start?
- Was there an intention-to-treat analysis?
- Were interventions and outcomes clearly defined and replicable?
- How complete was blinding? Was it assessed at the end?
- Apart from the experimental intervention, were the groups treated equally, i.e. same co-interventions?
- Was the comparison group contaminated with the main interventions?
- Was compliance with interventions measured/assured?
- Were all subjects accounted for at the end? (Was follow-up complete?)
- Was follow-up time sufficient to detect relevant outcomes?
8- Results
- How large were the intervention effects?
- What measure(s) of 'event rate' or outcome were used?
- What was the NNT or NNH?
- How precise were the estimates of the intervention effects, e.g. p-values, confidence intervals?
- Did the study have sufficient power?
9- Applicability and Conclusions
- Applicability and Relevance?
- (to your patients, and is the treatment feasible and available?)
- Were all important outcomes considered?
- Are the likely treatment benefits worth the potential harm and costs? (adverse effects)
- Strengths and weaknesses?
10Appraising a Diagnostic Study
- Population/Subjects/Setting
- What was the source population tested?
- What were the inclusion/exclusion criteria?
- Were subjects a representative and relevant sample?
- How did they recruit subjects?
11- Validity
- Was there a comparison with a 'Gold Standard' test?
- How did they define the 'caseness' to be detected by the test?
- If there is no 'Gold Standard', can the test be validated in other ways?
- Was there blinding of subjects and of investigators to the theory?
- How thorough was this, and was it assessed at the end?
- Was the sample size adequate for power?
- Did all subjects get both the new test and the Gold Standard test?
- Was there testing by 2 independent investigators?
- Was there planning for any adverse effects and dropouts?
- Was the statistical analysis sensible?
- Were test-retest issues discussed?
12- Conclusions
- Sensitivity - the proportion of people who truly have the condition who are correctly identified (true positives) by a test or by epidemiological screening.
- Specificity - the proportion of people who truly do not have the condition who are correctly identified (true negatives) by a test or by epidemiological screening.
- Did the test work as well as the Gold Standard?
- Benefits vs harm?
- Relevance? Practicality in the real world?
- Are the likely clinical benefits worth the potential harm and costs? (e.g. adverse effects)
- Strengths and weaknesses?
- How could it be improved?
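A minimal sketch of how sensitivity and specificity fall out of a 2x2 table comparing a new test with the Gold Standard. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic; predictive values are included because they come from the same table.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summarise a 2x2 table of new test vs Gold Standard.

    tp: test positive, Gold Standard positive (true positives)
    fp: test positive, Gold Standard negative (false positives)
    fn: test negative, Gold Standard positive (false negatives)
    tn: test negative, Gold Standard negative (true negatives)
    """
    return {
        # Of those who truly have the condition, what fraction test positive?
        "sensitivity": tp / (tp + fn),
        # Of those who truly do not have it, what fraction test negative?
        "specificity": tn / (tn + fp),
        # Of those who test positive, what fraction truly have the condition?
        "ppv": tp / (tp + fp),
        # Of those who test negative, what fraction truly do not have it?
        "npv": tn / (tn + fn),
    }


# Hypothetical counts (not from any study): 100 true cases, 200 non-cases.
accuracy = diagnostic_accuracy(tp=90, fp=30, fn=10, tn=170)
for name, value in accuracy.items():
    print(f"{name}: {value:.2f}")
```

Note that the predictive values depend on how common the condition is in the sample tested, whereas sensitivity and specificity do not.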
13APPRAISING A CAUSATION STUDY
- Population/Subjects
- What is the source population being studied?
- Did they define an 'exposed' group vs a 'comparison' group (cohort study)?
- Or define controls (case-control study)? Any randomisation?
- What were the inclusion/exclusion criteria?
- Were subjects a representative and relevant sample?
- How did they recruit subjects and comparisons/controls?
14- Basic Structure of Study
- Cohort study?
- A longitudinal study in which groups of people are interviewed repeatedly over a period of time; respondents usually share a common characteristic. Where the same group of people is followed up over time, this is known as a cohort study. If a different group of people is interviewed in each wave of a survey, this is known as a trend design.
- Case-control study? (did exposure precede outcome?)
- Cross-sectional study? (did exposure precede outcome?)
- Did Researchers Define
- The causal factor studied - is their theory sensible?
- The 'outcome' caused by the causal factor?
- Often the Risk Ratio is discussed (a comparison of the risk of some health-related event, such as disease or death, in two groups)
- Was there Blinding?
- Re the hypothesis - ideally both subjects and assessors
- How good was this and was it assessed at the end?
15- Data Validity
- Was the sample size adequate for power?
- Did they follow up long enough?
- How did they allow for and manage dropouts?
- Significance? C.I.s? Dose-response? Specificity?
- Conclusions
- Relevance and usefulness?
- Strengths and Weaknesses of study?
- How could it be improved?
16APPRAISING A PROGNOSIS STUDY
- A Prognostic Factor is a patient characteristic that can predict the patient's eventual outcome
- demographic e.g. sex, age, race
- disease-specific e.g. tumour stage, symptom pattern
- comorbidity: other co-existing conditions
- Articles that report prognostic factors often use two independent patient samples
- derivation sets ask - "what factors might predict patient outcomes?"
- validation sets ask - "do these prognostic factors predict patient outcomes accurately?"
17- Methods
- Design? (cohort / case series / prospective vs. retrospective)
- Setting? hospital / location / clinic
- Patient Population? - number / screening or enrollment methods / number screened vs number enrolled
- Description of prognostic or outcome factors considered
- Prognostic Outcome Factors are the numbers of events that occur over time, expressed in
- absolute terms e.g. 5-year survival rate
- relative terms e.g. risk from prognostic factor
- survival curves: a curve that starts at 100% of the study population and shows the % of the population still surviving at successive times. Can also be applied to the onset of a disease, complication or some other endpoint (e.g. time before relapse)
18- Validity
- Was a defined, representative sample of patients assembled at a common (usually early) point of the illness?
- Inclusion and exclusion criteria?
- Selection biases?
- Stage of disease?
- Was patient follow-up sufficiently long and complete?
- Reasons for incomplete follow-up?
- Prognostic factors similar for patients lost and not-lost to follow-up?
- Were objective and unbiased outcome criteria used?
- Outcomes defined at start of study?
19- Validity
- Assessors and subjects blinded to prognostic factor theory?
- Statistical models seem OK?
- Follow-up duration / completeness / accounting for patients
- If subgroups with different prognoses were identified
- Was there adjustment for important prognostic factors?
- Are the (hopefully valid) results of this prognosis study important? i.e.
- How large is the likelihood of the outcome event(s) in a specified time?
- Survival curves?
- How precise are prognostic estimates?
- Confidence intervals?
20- Conclusions
- Strengths and Weaknesses of Study
- In context of other studies and/or current standard of care?
- Next steps for further study of this problem?
- Can you apply the (hopefully valid and important) results of this study to caring for your own patients?
- i.e. - were the study patients similar to your own?
- patients similar for demographics, severity, co-morbidity, and other prognostic factors?
- will this evidence make a clinically important impact on your views on what to tell or to offer your patients?
- Compelling reason why the results should not be applied?
- Will the results lead directly to you selecting or avoiding therapy?
- Are the results useful for reassuring or counselling patients?
21- Incidence
- can be defined as the number of new occurrences of a phenomenon, e.g. illness, in a defined population in a specified period. An incidence rate is the rate at which new cases of the phenomenon occur in a given population.
- Prevalence (also called Prevalence Rate re prevalence across time)
- the number of cases (or events, or conditions) within a specified time period, e.g. prevalence of a condition includes all people with the condition, even if the condition started prior to the start of the specified time period.
- Period prevalence: the amount of a particular disease present in a population over a period of time.
- Point prevalence: the amount of a particular disease present in a population at a single point in time.
22Appraising Systematic Reviews (of Treatment / Intervention Studies)
- What were the relevant population(s)?
- What were the main exposure(s)?
- What were the comparison(s)?
- What were the outcome(s)?
- Design of the Studies
- experimental or non-experimental ?
- cross-sectional or longitudinal ?
- All trials included in a review should first have
been appraised using the model for experimental
studies
23- Validity of Review Results
- Were the criteria used to select studies for inclusion in the Review both explicit and appropriate?
- Is it likely that any important, relevant studies were missed? (completeness of literature search)
- Was the validity of the included studies appraised?
- Were assessments of the studies reproducible? (documented and replicated)
- Were the results similar from study to study? (tests of heterogeneity)
24- Results (Size of Effects and Precision)
- What were the overall results of the review - how large were the effects?
- How precise were the results?
- Applicability and Relevance
- Are the results applicable in normal practice?
- Were all important outcomes considered?
- Are the likely treatment benefits worth the potential harm and costs? (e.g. adverse effects etc.)
- Strengths and weaknesses of the Review?
- How could the Review be improved?
25Critical Appraisal - NNTs and NNHs
- Decide from reading the study if the experimental group had a better outcome than the control group
- if so, do the NNT
- Or
- if the control group had a better outcome than the experimental group
- if so, do the NNH
- When the experimental treatment increases the probability of a good outcome, NNT and RBI (relative benefit increase) are useful
- Number Needed to Treat = number of patients who need to be treated to cause 1 additional good outcome
- Number Needed to Harm = number of patients who need to be treated to cause 1 additional bad outcome
26- EER = event rate in the experimental group
- CER = event rate in the control group
- When taking the difference between them, ignore minus signs except as a reminder as to whether treatment was overall helpful or harmful
- E (event) = outcome (express it as a decimal, e.g. 40% occurrence as 0.4)
- e.g. in a study comparing mood stabilisers, a bad outcome might be that the manic state does not improve with the treatment, or gets worse
- Absolute Benefit Increase = the difference in the rate of good outcomes when the treatment benefits more experimental subjects than control subjects
- ABI = EER - CER
- Relative Benefit Increase = the proportional increase in good outcomes in the experimental group compared with the control group
- RBI = (EER - CER) / CER
- NNT = 1 / ABI
27- EXAMPLE
- Treatment of acute mania.
- The outcome is a reduction of a certain amount on the Young Mania Rating Scale (YMRS) after 1 week
- DRUG A
- 65% OF SUBJECTS HAD OUTCOME
- PLACEBO
- 30% OF SUBJECTS HAD OUTCOME
- EER = event rate in the experimental group
- CER = event rate in the control group
- E (event) = outcome: 65% (S), 30% (C)
- EER IS THUS 0.65; CER IS THUS 0.30
- ABI = EER - CER = 0.35
- NNT = 1 / ABI = 1 / 0.35 = 2.86
- So the number needed to treat is close to 3, i.e. we have to treat 3 patients for 1 to get benefit. This would be an extremely good and impressive NNT.
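The arithmetic of the acute mania example can be checked with a short sketch. The function name `benefit_measures` is hypothetical; the formulas are the ABI, RBI and NNT definitions given earlier, applied to a good outcome with event rates expressed as decimals.

```python
import math


def benefit_measures(eer: float, cer: float) -> dict:
    """ABI, RBI and NNT for a GOOD outcome; rates given as decimals."""
    if cer <= 0:
        raise ValueError("CER must be above zero to compute RBI")
    abi = eer - cer  # absolute benefit increase
    if abi == 0:
        raise ValueError("No difference between groups, so NNT is undefined")
    rbi = abi / cer  # relative benefit increase
    nnt = 1 / abs(abi)  # ignore the sign, as the slides advise
    return {
        "ABI": abi,
        "RBI": rbi,
        "NNT": nnt,
        # Patients come in whole numbers, so round up.
        "NNT_rounded": math.ceil(nnt),
    }


# Drug A: 65% responded; placebo: 30% responded.
result = benefit_measures(eer=0.65, cer=0.30)
print(f"ABI = {result['ABI']:.2f}")
print(f"RBI = {result['RBI']:.2f}")
print(f"NNT = {result['NNT']:.2f}, i.e. about {result['NNT_rounded']}")
```

If the control group did better, the same calculation on the rate of the bad outcome gives the NNH instead.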
28Asking a Research Question
- What is the Question? (the Clinical Problem to be answered)
- What sort of issue is being investigated?
- An Intervention or Treatment ?
- A Diagnostic Test or Instrument ?
- A Causal factor ?
- A Prognostic Factor ?
- What is the main alternative for Comparison
- A Control group?
- A Comparison group?
- A Placebo group?
- Comparing 2 interventions?
- What is the main Outcome or Outcomes?
29Examples
- You are sure that on-call nights for
psychiatric registrars and crisis nurses are
always busier when there is a full moon. How
would you try to determine whether this is in
fact the case?
30- You are working in the C-L service of a
general hospital. Budget cuts are threatened and
you have to justify maintaining the C-L service
to several medical wards. One ward refers to C-L
a lot, and the other hardly ever. You feel that
your service's C-L input shortens the length of
stay for patients with delirium and self-harm.
How could you demonstrate this in time for next
year's budgeting round in 9 months' time?
31Significance - p values
- The statistical significance of a result is judged by asking how likely the observed relationship (e.g., between variables) or difference (e.g., between means) in a sample would be to arise by pure chance if, in the population from which the sample was drawn, no such relationship or difference existed.
- The p-value is that probability: the chance of obtaining a result at least as extreme as the one observed when there is no true relationship. It is often loosely described as the probability of error in accepting our observed result as valid, or "representative of the population."
32P-values
- A p-value of 0.05 (1 in 20) indicates that, if there were no true relation between the variables, a result at least this strong would turn up in about 5% of samples as a "fluke."
- p values of <0.05 are by convention 'just' significant
- but this level of significance still involves a pretty high probability of error (5%).
- Results that are significant at the p <0.01 level are considered by convention statistically significant, and p <0.005 or p <0.001 levels are often called highly significant.
33Data-mining and spurious significance
- The more analyses you perform on a data set, the more results will "by chance" meet the conventional significance level.
- For example, if you calculate correlations between ten variables (i.e., 45 different correlation coefficients), then you should expect to find by chance that about two (i.e., one in every 20) correlation coefficients are significant at the p <0.05 level, even if the values of the variables were totally random and don't correlate in the population.
- Some statistical methods that involve many comparisons include some "correction" for the total no. of comparisons - but not all do.
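The ten-variable example can be worked through numerically. This is a sketch under a simplifying assumption: the 45 tests are treated as independent, which pairwise correlations among the same ten variables are not exactly. The Bonferroni adjustment shown is one common "correction", named here as an illustration rather than as the method any particular study used.

```python
import math

n_variables = 10
alpha = 0.05

# Number of distinct pairs among ten variables: 10 choose 2.
n_tests = math.comb(n_variables, 2)

# Expected number of "significant" results when nothing is really going on.
expected_false_positives = n_tests * alpha

# Chance of at least one false positive, ASSUMING independent tests.
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: divide the significance level by the number of tests.
bonferroni_alpha = alpha / n_tests

print(f"Number of correlations: {n_tests}")
print(f"Expected chance findings at p < {alpha}: {expected_false_positives:.2f}")
print(f"Probability of at least one chance finding: {p_at_least_one:.2f}")
print(f"Bonferroni-adjusted threshold: {bonferroni_alpha:.4f}")
```

So the "about two" chance findings in the text corresponds to 45 x 0.05 = 2.25.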
35Correlation Coefficients
- Shows the extent to which a change in one variable is associated with change in another variable, i.e. the relationship between them.
- Best to have +/-0.90 and above to show a correlation
- Range from -1.00 to +1.00.
- -1.00 = perfect (strong) negative relationship.
- +1.00 = perfect (strong) positive relationship.
- 0.00 (midpoint) = no relationship at all.
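A minimal sketch of how a correlation coefficient is computed, assuming the Pearson product-moment coefficient is meant (the slides do not name a specific coefficient). The function name and the data are hypothetical.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data illustrating the two ends of the range.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # y rises exactly with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y falls exactly with x
print(f"Perfect positive: {r_positive:+.2f}")
print(f"Perfect negative: {r_negative:+.2f}")
```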
36Strength vs Reliability of a Relationship
Between Variables
- In general, in a sample of a particular size, the larger the size of the relationship between variables, the more reliable the relationship.
- If there are few observations, then there are also few possible combinations of values, so the probability of a chance combination showing a strong correlation is high - so small 'n' studies are statistically weak.
- If a correlation between the variables in question is very small in the population, then there's no way to identify it in a study unless the sample is very large.
- Similarly, if a correlation is very large in the population, then it can be found to be highly significant even in a very small sample.
- If a coin is slightly asymmetrical, and when tossed is slightly more likely to produce heads than tails (e.g. 60% vs. 40%), then ten tosses would not be enough to show that the coin is asymmetrical. But if the coin is weighted to almost always fall as heads, then ten tosses would be quite enough to show this.
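The coin example can be made concrete with exact binomial probabilities. This sketch asks how often a fair coin would produce a result at least as lopsided as the one seen (a one-sided p-value); the helper name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more heads in n tosses when P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A 60/40 coin typically gives about 6 heads in 10 tosses.
# How surprising is 6 or more heads if the coin were actually fair?
p_six_or_more = prob_at_least(6, 10)

# A heavily weighted coin may give 10 heads in 10 tosses.
p_all_ten = prob_at_least(10, 10)

print(f"P(6+ heads in 10 | fair coin)  = {p_six_or_more:.3f}")
print(f"P(10 heads in 10 | fair coin) = {p_all_ten:.4f}")
```

Six heads in ten happens with a fair coin more than a third of the time, so it shows nothing, whereas ten out of ten happens about once in a thousand tries.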
38- Other terms and concepts to learn
- Measures of Central Tendency and of Variability
- Types of Data
39Confidence Interval
- If the Confidence Interval for a difference does not overlap zero, the effect is said to be statistically significant (for a ratio such as an odds ratio, the equivalent check is whether the interval excludes 1)
- CI is a range of values within which we're fairly sure the true value of the parameter being investigated lies.
- If independent samples are taken repeatedly from the population and a Confidence Interval is calculated for each, a certain percentage (the confidence level) of the intervals will include the unknown population parameter. Confidence intervals are usually calculated so that this percentage is 95%.
- Width of the confidence interval shows how uncertain we are about the unknown parameter. A very wide interval means more data should be collected before anything definite can be said about the parameter.
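A sketch of a 95% confidence interval for the difference between two response rates, using the simple normal-approximation (Wald) formula. The response rates echo the acute mania example, but the group sizes are assumptions made up for illustration, since the slides do not give them.

```python
import math
from statistics import NormalDist


def diff_in_proportions_ci(p1, n1, p2, n2, level=0.95):
    """Wald confidence interval for p1 - p2 (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# ASSUMED group sizes: 50 per group, then only 10 per group.
lower, upper = diff_in_proportions_ci(0.65, 50, 0.30, 50)
small_lower, small_upper = diff_in_proportions_ci(0.65, 10, 0.30, 10)

print(f"n = 50 per group: {lower:.3f} to {upper:.3f}")
print(f"n = 10 per group: {small_lower:.3f} to {small_upper:.3f}")
```

With 50 per group the interval stays above zero; with 10 per group the same response rates give a much wider interval that overlaps zero, which is the "more data should be collected" situation.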
40Odds Ratios
- Compares frequency of exposure to risk factors in epidemiological studies
- The odds ratio is a reasonable approximation of the relative risk when the outcome is relatively rare (e.g., when less than 1% of the people exposed to an agent develop disease). The odds ratio produces larger errors as the outcome rate rises above 1%.
- You can say that a proposed risk factor acts as a significant risk to disease if
- odds ratio is >1
- lower edge of the C.I. is >1
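A short sketch comparing the odds ratio with the relative risk for a rare and a common outcome. The 2x2 counts are hypothetical and the function names are illustrative; the point is only that the two measures agree closely when the outcome is rare and drift apart as it becomes common.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """a, b = exposed with / without disease; c, d = unexposed with / without."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Same cell layout as relative_risk: cross-product ratio (a*d)/(b*c)."""
    return (a * d) / (b * c)


# Hypothetical rare outcome: 1% of exposed vs 0.5% of unexposed develop disease.
rare = dict(a=10, b=990, c=5, d=995)
# Hypothetical common outcome: 40% of exposed vs 20% of unexposed.
common = dict(a=400, b=600, c=200, d=800)

rr_rare, or_rare = relative_risk(**rare), odds_ratio(**rare)
rr_common, or_common = relative_risk(**common), odds_ratio(**common)

print(f"Rare outcome:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")
print(f"Common outcome: RR = {rr_common:.2f}, OR = {or_common:.2f}")
```

In both tables the relative risk is 2, but the odds ratio only stays near 2 in the rare-outcome table.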
41VARIOUS TESTS
- Have some idea what each is for
- A reasonable reference is
- http://www.une.edu.au/WebStat/unit_materials/c6_common_statistical_tests/
- Parametric Tests and Non-Parametric Tests
- Nonparametric methods are used when we know nothing about the distribution of the variable in the population. It is not so much that they are for non-normally distributed data, but that they make no assumption of a normal distribution
- Parametric tests are used where a normal distribution can be assumed
42Parametric vs Non-Parametric tests
- Memorize a name of each sort e.g.
43Null Hypothesis
- The hypothesis set up in opposition to the researcher's theory. It usually assumes that there is no relationship between the dependent and independent variables. The null hypothesis is assumed to be correct until research demonstrates that it is incorrect. This process is known as falsification.
44POWER
- Type I Error Rate (Alpha)
- The probability of incorrectly rejecting a true null hypothesis (a Type I error gives a false positive result)
- Type II Error Rate (Beta)
- The probability of incorrectly accepting a false null hypothesis (a Type II error gives a false negative result)
45- In the social sciences there are conventions that
- α, the Type I error (risk of a false positive)
- must be kept at or below 0.05 (5%)
- β, the Type II error (risk of a false negative)
- must be kept low as well (20% or less, generally)
- Statistical Power is equal to 1 - β
- and must be kept correspondingly high
- Power should be at least 0.80 (80%) to detect a reasonable departure from the null hypothesis
- Statistical Power = the probability of rejecting a false null hypothesis
46In Reject-Support (RS) research (the usual kind)
- (the opposite is true in Accept-Support (AS) research)
- The researcher wants to reject the null hypothesis
- "Society" wants to control Type I error (false positives)
- The researcher is very concerned about Type II error (a false negative means missing the fact that you have a result that supports your theory, and a supportive result is much more likely to get published)
- High sample size works for the researcher
- But if there is "too much power", trivial effects become "highly significant"
47Factors influencing power in a statistical test
- 1. What kind of statistical test is being used
- 2. Sample size
- 3. Size of the experimental effect
- 4. Level of error in experimental measurements
- A Sampling Distribution
- the distribution of a statistic over repeated samples
- The Standard Error of the Proportion
- the standard deviation of the distribution of the sample proportion over repeated samples
48Power Analysis in Studies
- In planning a study, one must estimate
- What would be the reasonable minimum experimental effect that one wants to detect
- A minimum Power to detect that effect
- The sample size that will achieve that desired level of Power
49Steps required for Power analysis and sample size
estimation
- The type of analysis and the null hypothesis are specified
- Power and required sample size for a reasonable range of likely experimental effects are investigated
- The sample size required to detect a reasonable experimental effect (i.e. departure from the null hypothesis) with a reasonable level of power is calculated, while allowing for a reasonable margin of error
50- Method (Excerpt) Statistical analysis
- It was estimated that in order to detect a 30% difference between the percentage of responders in the control group compared with that in the exercise group at the P = 0.05 level of significance, a sample size of 40 subjects per group would be required to give a power of 90%. Data on poorly responsive depression are scant but the proportion of responders in the control group was reasonably anticipated to be 10%, compared with an anticipated 40% in the exercise group.
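A rough check of the figures in this excerpt, as a sketch only. The excerpt does not say which sample size formula the authors used, so the standard normal-approximation formula for comparing two proportions (two-sided test, unpooled variance) is an assumption here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.90) -> int:
    """Approximate subjects needed per group to compare two proportions.

    Uses n = (z_alpha + z_beta)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2,
    a textbook approximation ASSUMED for illustration.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole subjects


# Anticipated responders: 10% in controls, 40% in the exercise group.
n_90 = n_per_group(0.10, 0.40, alpha=0.05, power=0.90)
n_80 = n_per_group(0.10, 0.40, alpha=0.05, power=0.80)
print(f"Per group for 90% power: {n_90}")
print(f"Per group for 80% power: {n_80}")
```

Under these assumptions the answer for 90% power lands just under the 40 per group quoted, and dropping to the conventional 80% power would need noticeably fewer subjects.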
51- Was a power analysis done prior to the study? What is the main implication?
- Yes. The power was set at 0.9 (90%)
- Power = 1 - beta (beta is the probability of making a Type-II error)
- So, 0.9 = 1 - beta, or beta = 1 - 0.9, which is 0.1 or 10%. Thus the risk of making a Type-II error in this study was 10%, as opposed to most studies which set Power at 0.8 - i.e. they tolerate a risk of 20% of making a Type-II error (a false negative)
- Main Implication was that the study did have enough power to detect a significant improvement, which it did not do
52Ethics in Research
- http://www.wma.net/e/policy/b3.htm World Medical Association Helsinki principles for research in humans
- Ethics Committees: think about their role and how to design studies to meet these requirements
- RANZCP principles from Code of Ethics
- Psychiatrists involved in clinical research shall adhere to those relevant ethical principles embodied in national and international guidelines
53College Code of Ethics (paraphrased)
- It's done on people, so high standards are needed and it must be scientifically justified
- Must be OKd by an Ethics Committee
- Minimize any harm to subjects
- The interests of subjects always take precedence over science or society's interests
- Informed consent must be obtained from people participating in research
- Special care to be taken with consent from those in dependent relationships, e.g. students, prisoners, the elderly
- For minors - consent from parent/guardian
54College Code of Ethics (paraphrased)
- If subjects aren't competent to consent, get this from a relative or guardian
- Subjects can withdraw at any time and it won't jeopardise their care
- If a researcher uncovers clinically relevant information needing acting on, the researcher should tell the patient and their doctor
- Confidential information obtained from the research stays within the study
- No plagiarism, acknowledge all references
- Research reports to be truthful and accurate
- Ensure participants are deidentified
- Declare any conflict of interest in all publications