Title: Data Management
1Data Management Issues for Clinical Research
- Allan Williams - Database Manager
- alwill_at_scharp.org
- Anthony Mwatha - Statistician
- mwatha_at_scharp.org
- SCHARP
- Statistical Center for HIV/AIDS Research and
Prevention
2Key Principles
- Keep your focus on the big picture. Some things
are more important than others.
- Pay attention to the details. The devil is in
those details.
- Case Report Form Development
- Data Collection and Management
- Use a Risk Management perspective: what is most
important to worry about, and if problems occur,
what is your mitigation strategy?
3Important References
- Take good care of your data, Svend Juul, 2004
(www.epidata.dk/downloads/takecare.pdf)
- Very wise, excellent practical suggestions
- Examples for EpiData, SPSS, Stata but useful for
any data management and analysis environment
- Good Clinical Data Management Practices, Version
4, September 2007, Society for Clinical Data
Management (www.scdm.org/GCDMP)
- Excellent reference for entire field
- Recommendations for Best Practices and Standard
Operating Procedures (SOPs)
4What is the big picture?
- Protocol: Primary Objectives, Endpoints
- How will they be measured, collected, checked?
- Secondary Objectives
- Other major pre-planned analyses
5How Much Data?
- Friedman, Furberg & DeMets, 1996
- Beware of "it would be interesting to know", Pg.
355
- AIDS Clinical Trials, 1995
- A typical Phase III clinical trial of primary HIV
infection conducted by the ACTG collects more
than 1,000 data items per patient. Less than 100
items usually appear in the published reports.
NDAs also tend to focus on the same items.
- Clinical Trial Safety Surveillance, DIA, 1997
- The more that is asked for on a CRF, the more
variability will occur in the answers.
Investigators complain about being overwhelmed
with CRFs. This does not create an ideal
environment for collecting quality safety data.
- DeMets, 2006 Course Description, Introduction to
Clinical Trials
- Many studies also fail because the amount of
data collected exceeds what is necessary and what
is affordable.
6Less is More
- "Make everything as simple as possible, but no
simpler" (Einstein)
- Long questionnaires do not necessarily give
better or more complete information
- Respondent fatigue
- Strategies for avoidance Bias
- Overly complex data collection may decrease
overall quality enough to jeopardize the ability
to answer the primary hypothesis
- Not having a primary hypothesis is itself a
problem with respect to "less is more"
7Definition of Data Quality
- There can be no perfect data set
- "Quality data would therefore be defined as data
that sufficiently support conclusions and
interpretations equivalent to those derived from
error-free data."
- From Assuring Data Quality and Validity in
Clinical Trials for Regulatory Decision Making,
Workshop Report, National Academy Press,
Washington, D.C., 1999.
- http://www.nap.edu/catalog/9623.html
- Data can be overcleaned: "Perfect Data" is
probably not true
8Pay attention to the details
- Design your forms, database, and QA
activities around those primary, secondary and
other major pre-planned objectives.
- Build in support mechanisms and layered QA
processes around the data for those key
objectives.
- Validate that those processes and QA systems work
as intended
- Think about Risk Management and Audits
9The CRFs are the study
- Form design is even harder than a good protocol
- Focus on Primary/Secondary Hypothesis Definition
- Focus on Safety (if clinical intervention)
- Organize Visit Structure
- Pilot test forms and visit flow
- Details count. Finding operational problems early
is better than patching them up afterwards
- Review, Review, Review! Multiple people with
different viewpoints, responsibilities
10CRF Design
- Standardized Modules
- Eligibility
- Enrollment/Randomization Confirmation
- Vital Signs/Physical Exam
- Medical History/Concomitant Disorders
- Participant Compliance
- Outcome Measurements
- Laboratory Tests and Specimen Tracking
- Adverse Experiences, SAE Reporting
- Concomitant Medications
- Behavioral (e.g. Quality of Life, compliance,
diary)
- Administrative (visit record, change in study
arm/cycle)
- Termination (Finished/Withdrawal)
11Visit Schedules
- Which data and lab tests are collected at each
visit?
- What are visit windows?
- Are data allowed to be collected outside the
window, or is the visit counted as missed?
- Which forms are required at a visit? Which are
optional? How is each tracked?
- Events are often treated as logs (adverse
experiences, concomitant meds, patient diaries)
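A visit-window rule can be written down as a small table plus one function. The sketch below is illustrative only: the schedule (target days and window widths) and the names `VISIT_SCHEDULE` and `classify_visit` are assumptions, not taken from any protocol. What to do with an "early" or "late" visit (accept it, or count the visit as missed) is the protocol decision raised above.

```python
from datetime import date, timedelta

# Hypothetical schedule: target day after enrollment and allowed +/- window (days).
VISIT_SCHEDULE = {
    1: {"target_day": 30, "window_days": 7},
    2: {"target_day": 90, "window_days": 14},
}

def classify_visit(enrollment: date, visit_number: int, visit_date: date) -> str:
    """Return 'in_window', 'early', or 'late' for a visit date."""
    rule = VISIT_SCHEDULE[visit_number]
    target = enrollment + timedelta(days=rule["target_day"])
    window = timedelta(days=rule["window_days"])
    if visit_date < target - window:
        return "early"
    if visit_date > target + window:
        return "late"
    return "in_window"
```

Keeping the schedule in a table rather than in code branches makes it easier to review against the protocol.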
12Data Collection Forms
- Designing data collection forms
- Organization and content
- Review protocol for required data; create as
form items
- Deciding on what forms are needed by visit and
category
- Some logical divisions include
- Who will be completing the forms?
- When will the data be available?
- Where will the data be collected?
- Better to have many pages than to try to fit too
much on one page.
13Data Collection Forms
- Design forms with ease of use in mind
- For the individuals filling in the forms,
entering the data, and analyzing the data.
- Responses to questions include text strings,
numeric or categorical values.
- Text strings: responses to these types of
questions contain data that is typically not
analyzed, e.g. name of a medication. May have to
categorize later if you want to analyze.
- Allow adequate space for handwriting the
information.
14Data Collection Forms (Keys)
- Participant ID number
- Don't overload with meaning
- Use of checksum
- Initials for cross-check (confidentiality?)
- Unique to study or keep across studies
- Study ID
- Site ID
- Form type/page sequence number
- Visit number, visit date
- Interviewer ID (Regulatory requirement)
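One way to add a checksum to a participant ID is a trailing check digit. The sketch below uses the Luhn algorithm, which is an assumption chosen for illustration; the slides do not say which scheme to use. The function names are hypothetical. Luhn catches any single mistyped digit and most swaps of adjacent digits.

```python
def luhn_check_digit(base: str) -> str:
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left; double every second digit, starting with the rightmost.
    for i, ch in enumerate(reversed(base)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def make_participant_id(base: str) -> str:
    """Append the check digit to a numeric base ID."""
    return base + luhn_check_digit(base)

def is_valid_participant_id(pid: str) -> bool:
    """True if the last digit matches the check digit of the rest."""
    return pid.isdigit() and len(pid) > 1 and luhn_check_digit(pid[:-1]) == pid[-1]
```

At data entry, an ID that fails `is_valid_participant_id` can be rejected before the record is stored.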
15Data Collection Forms (numeric)
- Provide correct number of boxes for the answer;
pre-print decimal points, commas or other
punctuation; specify relevant units; if rounding
is required, be clear on whether to round up or
down.
- Examples
- Weight ____ . _ kg
- CD4 cells ____ cells/µL
- BP ___ / ___ mmHg
16Format of Questions
- Keep text as concise and clear as possible
- __ __ . __ °C Temperature
- __ __ . __ °C What was the patient's temperature
at delivery
- Use terminology that is familiar / pilot the
forms
- Date formats, US vs. international:
09-OCT-2003 / 10-9-2003 / 9-10-2003
- Time: 24 hr clock, 0000 hrs vs. 2400 hrs
- Unknown/Not applicable/Not available/Not done
- Other
- Give reason for stopping treatment
- 1. Completed per protocol
- 2. Refused
- 3. Side effects
- 4. Other, specify ________________
- 9. Unknown
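The date example shows why a spelled-out month helps: 10-9-2003 can be read as 9 October or 10 September, while 09-OCT-2003 cannot. A minimal sketch of a strict parser that accepts only the DD-MON-YYYY form follows; the function name `parse_crf_date` and the choice to reject every other format are assumptions for illustration.

```python
from datetime import date

# English three-letter month abbreviations, spelled out to avoid locale dependence.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

def parse_crf_date(text: str) -> date:
    """Parse a DD-MON-YYYY date; raise ValueError for any other format."""
    parts = text.strip().upper().split("-")
    if len(parts) != 3 or parts[1] not in MONTHS:
        raise ValueError(f"expected DD-MON-YYYY, got {text!r}")
    day, year = int(parts[0]), int(parts[2])
    # date() itself rejects impossible dates such as 31-FEB-2003.
    return date(year, MONTHS[parts[1]], day)
```

An all-numeric entry such as 10-9-2003 raises `ValueError` instead of being silently interpreted one way or the other.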
17Modifications to CRF
- After activation of study, only when absolutely
essential
- If necessary, things to consider
- Change adds a question
- Change adds an additional response
- A question is being removed
- Form version numbers/dates
- Procedures for dealing with missing data (before
addition)
- Documentation of change, impacts, mitigation
procedures
18Data Collection and Management Process Overview
(multi-part paper forms)
[Figure: process overview diagram; labels: Site,
Data Management]
19Database Design
- Desirable Clinical Data Management system
features
- DE screens easy to set up and maintain
- Supports variable labels, range checking,
category checks, skip patterns, edit-check
language
- Options for missing values: Missing vs. NA vs.
required skip
- Supports relational links (mother, children
associations)
- Support for double entry and verification
- Ability to do queries without extensive
programming
- Good links to SPSS, SAS, other statistics
packages
- Multi-user, simultaneous read/write
- Cost
- Patient tracking
- Security
- Audit trails
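The range checks, category checks, skip patterns, and missing-value options listed above can be illustrated with a small edit-check function. This is a sketch only: the field names, the limits, and the numeric codes for Missing, NA, and required skip are all hypothetical, and a real system would express these rules in its own edit-check language.

```python
# Hypothetical codes that distinguish the three kinds of "no value".
MISSING, NOT_APPLICABLE, SKIPPED = -9, -8, -7
NO_VALUE_CODES = {MISSING, NOT_APPLICABLE, SKIPPED}

def check_record(rec: dict) -> list:
    """Return a list of edit-check messages; an empty list means the record passes."""
    problems = []
    # Range check: plausible adult weight (limits are illustrative).
    weight = rec.get("weight_kg", MISSING)
    if weight not in NO_VALUE_CODES and not (30.0 <= weight <= 200.0):
        problems.append("weight_kg out of range 30-200")
    # Category check: only the listed codes are allowed.
    if rec.get("sex") not in {"M", "F"}:
        problems.append("sex must be M or F")
    # Skip pattern: 'pregnant' is asked only when sex is F.
    pregnant = rec.get("pregnant", MISSING)
    if rec.get("sex") == "M" and pregnant != SKIPPED:
        problems.append("pregnant must be coded as required skip for males")
    if rec.get("sex") == "F" and pregnant not in {0, 1}:
        problems.append("pregnant must be 0 or 1 for females")
    return problems
```

Keeping a separate code for "required skip" lets the check tell a correctly skipped question apart from one that was simply left blank.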
20Database/Entry Options (1)
- Microsoft Excel (Not recommended)
- General Database Software
- Microsoft Access
- Filemaker
21Database/Entry Options (2)
- Epi. Data Entry/Data Management Systems
- Epi Info (www.cdc.gov/epiinfo/)
- Uses Microsoft Access, writes Access .mdb
files, which can be read by SPSS, SAS, Stata
- Strongly Point and Click oriented, easy to use
- Supported by CDC
- EpiData (www.epidata.dk/index.htm)
- Danish, Free
- Not Access (not affected by Access updates)
- Outputs to Excel, SAS, SPSS, Stata
- Oriented to Batch Processing, but easy to set up
- Good edit check language and Codebook
documentation features
22Database/Entry Options (3)
- Scanning
- Optical Mark Reading ("fill in the bubble"): NCS
- Fax submission, image scanning: Teleform, DataFax
- Commercial Clinical Trial Data Management
Software
- Oracle Clinical
- Phase Forward InForm (Web based data capture),
Clintrial (Oracle based)
- DataFax: image based CRF transfer, central data
entry, often internet based transfer
23Paper Change Control
- Multi-part Paper Model
- Forms separated when sent to Sponsor for Data
Entry
- Change Control very complicated (Requires
change-NCR forms!)
- Result is delay of sending and extensive onsite
checking by CRA prior to splitting
(harvesting) to send to Stat. Center for batch
Data Entry
24Centralized Paper Models
- Most familiar to pharmaceutical and FDA
- CRFs filled out at site, sent to the Stat.
Center (SC)
- Multi-part carbonless forms standard
- CRA checks forms before harvesting to send to
SC
- SC double-enters paper forms in batches
- Site focuses on forms and protocol, not
technology
- Huge paper flow tracking issues
- Change Control and QC flow is complex
- May get into field quickly. Easiest for large
sites
- Strong software support from major
vendors (Clintrial, Oracle Clinical, SAS
PH-Clinical)
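Double entry works by keying the same form twice, independently, and resolving any disagreement against the paper. A minimal sketch of the comparison step, assuming each entry is held as a dictionary of field name to keyed value (the function name and data layout are hypothetical):

```python
def compare_entries(first: dict, second: dict) -> list:
    """Return (field, first_value, second_value) for every field that differs."""
    fields = sorted(set(first) | set(second))
    return [(f, first.get(f), second.get(f))
            for f in fields
            if first.get(f) != second.get(f)]
```

For example, if the two passes key a weight as "70.5" and "75.0", the function returns one discrepancy for that field, which is then checked against the original form.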
25WEB Models
- Mixture of remote and centralized
- Remote data entry onto central database
- Single or double entry?
- Less site IT support: use browser model
- Control over entry, edits like distributed model
- Infrastructure requirements
- Web, network, good ISPs, bandwidth issues
- New environment, FDA cautious about Investigator
responsibility/control
- Roll your own or purchased (www.phaseforward.com)
26Image Change Control
- Image/Fax Model
- Single piece of paper
- Always at study site
- All changes made directly to form and resent
- Important: change made in white space, initialed
and dated
- Encourages sending of CRF immediately after use
- Image and data can be immediately available to
DE, programmers, statisticians, QA at same time
- Audit trail and backup very important
- Corrections may require multiple
transmissions/entry
- Pages in a form may be sent out of sequence
27Risk Management Approach
- Invest in selected prevention and risk reduction
activities to optimize regulatory compliance,
problem resolution efforts and costs.
- Bottom Line: Avoid costly activities that have
no, or little, value in order to prevent
significant regulatory and operational costs
throughout the system life cycle
28Layered Risk Management
- Most secure, least risky systems use a layered
approach, with multiple methods and points for
proactively reducing risk, and/or detecting risk
events and having an effective mitigation
strategy upon detection
- Increasing probability of detecting a fault
condition is an effective risk mitigation strategy
29Categorize Risk Priority
- Risk Type
- Regulatory risk: potential impact on patient
safety, decision quality, or regulated data
integrity
- Business risk: potential impact on reputation,
brand, development cycle time, or revenue
- Likelihood of event occurring (high, medium, low)
- Severity of impact (high, medium, low)
- Detectability of a fault condition
(high, medium, low)
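The three ratings can be combined into a single number for ranking risks. The sketch below multiplies them, in the style of an FMEA risk priority number; that scoring rule is an assumption for illustration and is not prescribed by the slides. Detectability is inverted, because a fault that is easy to detect is less of a risk.

```python
LEVEL = {"low": 1, "medium": 2, "high": 3}

def risk_priority(likelihood: str, severity: str, detectability: str) -> int:
    """Score from 1 (lowest priority) to 27 (highest priority)."""
    # High detectability lowers the score, so map high->1, medium->2, low->3.
    detection_difficulty = 4 - LEVEL[detectability]
    return LEVEL[likelihood] * LEVEL[severity] * detection_difficulty
```

A likely, severe, hard-to-detect risk scores 27; an unlikely, minor, easily detected one scores 1. The score is only a way to order the list for discussion.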
30Sources of Error in a Study
- Omission, mis-communication
- Transcription
- Data entry errors
- Programming, summary tables
- Statistical interpretation
- Clinical interpretation
31Questions to Ask
- The following questions should be answered to
identify critical systems/data.
- Does the system have direct or indirect impact on
patient safety?
- Which functions of the system have the most/least
risk?
- Is the data included in regulatory submissions?
- Has the FDA audited the data in the past?
- What is the worst thing that could happen if data
were lost or corrupted?
- What will happen if the system is not available?
- Are there external processes that can detect
failure of the system?
- Are all processes documented and can the process
be reconstructed?
- Determine how the validation effort will be
focused in order to thoroughly test the most
critical aspects of your system
32Standardizing, Organizing, Documenting
- Standardizing
- Standard CRF modules and items
- Standard Operating Procedures
- Organizing
- Process flow
- Study communications
- Programs and systems
- Documenting
- All the above
- Exceptions and deviations from standard
procedures and mitigation strategy to prevent
future occurrences
33Standard Operating Procedures
- Commitment to developing SOPs, following those
SOPs, being able to document that they are/were
followed, and being able to detect when they are
not being followed (risk mitigation)
- Use Society for Clinical Data Management (SCDM)
Good Clinical Data Management Practices as a
guide (GCDMP, Version 3 or 4) for developing your
own SOPs.
- Carefully re-read Take Good Care of your Data
by Svend Juul
34Keeping Organized
- Create Project master folder, separate from
software
- Organize sub-directories
- Data, code/procedures, documentation, analysis,
etc.
- Keep a log of all actions, modifications,
analyses, locations, flow of procedures (Data
Flow Diagram)
- Standardize on file naming (and extension)
conventions
- Keep a copy of each procedure that modifies or
extends the data. Do not work interactively and
then not save a copy of what you did. Keep copies
of any program/processing so that it can be
re-run later
- Document your procedures/programs
- Purpose, inputs, outputs, directory locations,
programmer, modification history
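The folder layout and action log described above can be set up by a short script so that every project starts the same way. This is a minimal sketch; the sub-directory names, the log file name, and the tab-separated log format are assumptions, not a standard.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical standard layout for every project master folder.
SUBDIRS = ["data", "code", "documentation", "analysis"]

def create_project(root: Path) -> Path:
    """Create the standard sub-directories and return the path of the project log."""
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    log = root / "documentation" / "project_log.txt"
    log.touch(exist_ok=True)
    return log

def log_action(log: Path, who: str, action: str) -> None:
    """Append one timestamped, tab-separated line to the project log."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with log.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{who}\t{action}\n")
```

The log is append-only by convention: entries are added, never edited, so the history of the project can be reconstructed later.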
35Batch vs. Interactive Processes
- Most data management and analysis programs should
be run as a batch program. This means that the
program is not run interactively but is saved in
a standard location as a file and run by an
external procedure.
- Change control procedures should document each
update to the program, and documentation should
exist that shows that proper testing of the
changes was performed.
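A batch program in this sense is a saved file with a documented header, fixed inputs and outputs, and no interactive steps. The sketch below shows the shape of such a program; the recoding rule, the column names, and the missing-value code are hypothetical.

```python
"""
Purpose : Recode blank weight values to the missing-value code
Inputs  : CSV file with columns pid, weight_kg
Outputs : new CSV file; the input file is never modified
Programmer / modification history: fill in per your change control procedure
"""
import csv
import sys

MISSING = "-9"  # hypothetical missing-value code

def clean(in_path: str, out_path: str) -> int:
    """Copy in_path to out_path, recoding blank weights; return the number of rows changed."""
    changed = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["weight_kg"].strip() == "":
                row["weight_kg"] = MISSING
                changed += 1
            writer.writerow(row)
    return changed

if __name__ == "__main__" and len(sys.argv) == 3:
    # Run from a shell or scheduler with the input and output paths as arguments.
    print(f"{clean(sys.argv[1], sys.argv[2])} rows recoded")
```

Because the raw file is read-only input and the program is saved, the cleaned file can be regenerated at any time by re-running the same program.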
36Backing Up, Archiving
- Backing Up
- Daily, to disk, tape, or both
- Purpose: to restore data if lost (accidental
deletion, disk failure, theft, flood, fire, etc.)
- Back up when anything of importance changes
- new data, any modifications, analysis changes,
doc. changes
- Keep one copy off-site (able to retrieve quickly
but distant)
- Archiving
- At final or major intermediate stages
(presentation, publication, outside review, etc.)
- Snapshot: Save all data that went into the event
along with all programs that were necessary to
create publications and all output.
- Archiving: Save all original data, programs to
clean, merge, check, analyze. Save codebook, study
protocol, study logs, all documentation, all
final output, analysis. Document
structure/directories on CD/DVD/Tape
- Make multiple copies, keep one off-site in safe
location permanently
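A snapshot can be made verifiable by storing a checksum for every archived file, so that a later copy can be checked against the original. The sketch below copies a project tree and writes a SHA-256 manifest; the function name, the manifest file name, and the labeling convention are assumptions. The archive location must be outside the project folder.

```python
import hashlib
import shutil
from pathlib import Path

def snapshot(project: Path, archive_root: Path, label: str) -> Path:
    """Copy the project tree to archive_root/label and write a SHA-256 manifest."""
    dest = archive_root / label
    # copytree refuses to overwrite, so an existing snapshot is never changed.
    shutil.copytree(project, dest)
    lines = []
    for path in sorted(p for p in dest.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(dest).as_posix()}")
    manifest = dest / "MANIFEST.sha256"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

When the archive is copied to CD, DVD, or tape, recomputing the checksums and comparing them with the manifest shows whether every file survived intact.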
37Articles
- Good Clinical Data Management Practices, Society
for Clinical Data Management, Version 3,
September 2003. (www.scdm.org/GCDMP)
- Guidance for Industry: E6 Good Clinical Practice:
Consolidated Guidance, ICH, April 1996, 63 pgs.
(http://www.ich.org, click on Guidelines)
- North, Phillip: Ensuring Good Statistical
Practice in Clinical Research: Guidelines for
Standard Operating Procedures (An Update), Drug
Information Journal, Vol. 32, pp. 665-682, 1998
- Data Management for Multicenter Studies: Methods
and Guidelines, Controlled Clinical Trials, Vol.
16, Number 2S, April 1995
- Good Clinical Laboratory Practice Training
Workshop, PPD-HVTN-NIAID, Washington, DC, May
12-14, 2002
(www3.niaid.nih.gov/research/resources/DAIDSClinRsrch/PDF/labs/GCLP.pdf)
38Articles (cont)
- Svend Juul. Take Good Care of your Data.
(http://www.epidata.dk/documentation.php)
- Assuring Data Quality and Validity in Clinical
Trials for Regulatory Decision Making. Institute
of Medicine. National Academy Press, 1999.
(www.nap.edu/catalog/9623.html)
- Review of the HIVNET 012 Perinatal HIV Prevention
Study. Institute of Medicine. National Academy
Press, 2005. (http://www.nap.edu/catalog/11264.html)
- Reviewer Guidance: Conducting a Clinical Safety
Review of a New Product Application and Preparing
a Report on the Review, FDA/CDER, February 2005,
79 pgs. (www.fda.gov/cder/guidance/3580fnl.pdf)
39Articles (cont)
- Clinical Data Capture, Clinical Trial EDC Task
Group, PhRMA Biostatistics and Data Management
Technical Group, Society for Clinical Data
Management, 2005.
(http://www.scdm.org/professionals/phrma_edc.pdf)
- IATA Dangerous Goods Regulations Manual (DGR)
(http://www.iata.org/ps/publications/9065.htm) or
(http://www.thecompliancecenter.com/publications/iata.htm)
40Books
- J. Kolman, P. Meng, G. Scott, Good Clinical
Practice: Standard Operating Procedures for
Clinical Researchers, Wiley, 1998
- E. McFadden, Management of Data in Clinical
Trials, Wiley, 1998
- D. Finkelstein, D. Schoenfeld, AIDS Clinical
Trials: Guidelines for Design and Analysis,
Wiley, 1995
- S.C. Gad, S.M. Taulbee, Handbook of Data
Recording, Maintenance and Management for the
Biomedical Sciences, CRC Press, 1996
- John M. Barry, The Great Influenza: The Epic
Story of the Deadliest Plague in History, Viking
Press, 2004 (Origin of American experimental
medicine)