Data Management for Research - PowerPoint PPT Presentation

1 / 53
About This Presentation
Title:

Data Management for Research

Description:

Calculate BMI. Standard Data Entry Conventions ... Computer chart abstraction form showing two common data entry conventions. Demonstration ... – PowerPoint PPT presentation

Number of Views:66
Avg rating:3.0/5.0
Slides: 54
Provided by: michae116
Category:

less

Transcript and Presenter's Notes

Title: Data Management for Research


1
Data Management for Research
  • Michael A. Kohn, MD, MPP
  • 7 January 2003

2
Assumptions about Students
  • Actively involved in a clinical research study
  • Some experience with entering and maintaining
    data in single-table spreadsheet or statistical
    software

3
Housekeeping
  • Check Lab sheet to confirm which lab you are in.
  • Labs will be in Library 222.
  • Bring a diskette and your syllabus to labs.
  • Syllabus and Learn MS Access 2000 CD

4
Requirements
  • Turn in all 4 assignments on time
  • Turn in course evaluation

5
Course Objective
  • Learn how to develop a multi-table, relational
    database for a research study using Microsoft
    Access.
  • Example Infant Jaundice Study

6
Data Management for Clinical Research
  • We know how to
  • define the study population, the independent
    variables and the outcome variables
  • measure these variables and anticipate problems
    with measurement
  • analyze the results.

7
Data Management for Clinical Research
  • The DBMS (Database Management System) is for
  • entering and storing the measurements,
  • entering and storing the other information
    necessary to administer the study (subject
    contact information, exam schedules,
    reimbursement records, etc.),
  • monitoring the study, and
  • either analyzing the results or formatting the
    results for analysis.

8
Four Types of Research Database
  • Combination of paper files, Excel spreadsheets,
    and direct keyboard entry into the statistical
    analysis package.
  • Desktop multi-table relational database.
  • Client-Server multi-table relational database.
  • Internet database server.

9
Advantages of a computerized database
  • Ease of data entry
  • Automatic data validation
  • Automatic error checking
  • Alternative is a stack of paper forms

10
Advantages of a Multi-Table Relational Database
  • Eliminates redundancy
  • Ensures data integrity
  • Note Unless you plan on doing your analysis
    long-hand, you always need a computer database of
    some sort (a Stata dataset or an Excel
    spreadsheet may be adequate) you dont always
    need a multi-table relational DBMS (like
    Microsoft Access).

11
Multi-Table Relational Database
Collection of spreadsheet-like, two-dimensional
tables. Rows in Tables Records Columns in
Tables Attributes Tables are related
one-to-many, many-to-many, and one-to-one.
12
Jaundice and Infant Feeding Study Cohort study to
determine the 5-year neuropsychiatric sequelae of
infants with neonatal jaundice or feeding
disorders.
13
Infant Jaundice Study (Our fictional version of
JIFee)
Methods Design-Cohort study. Setting-Single,
urban medical center Subjects-Infants with
neonatal jaundice and randomly selected
non-jaundiced infants Independent
Variable-Presence or absence of jaundice Outcome
Variable- Neuropsychiatric score (ranging from 55
to 145) at age 5 Analysis- ?
14
Infant Jaundice Study Data
  • Approximately 400 children
  • 5 examiners (doctors)
  • Approximately 700 neuropsychiatric examinations,
    measuring weight, height, and NPScore (IQ)
  • Some children to be examined more than once
  • No examiner to see the same child twice
  • If child died before age 5, store age and
    circumstances of death

15
Assignments
Lab 1 Tables and Relationships 1/14/2003 Send
Access file LastnameLab1.mdb to Andrew High,
(ahigh_at_psg.ucsf.edu) by 1/20/2003.
Lab 2 Forms, Queries, and Reports 1/21/2003 Send
Access file LastnameLab2.mdb to Andrew High,
(ahigh_at_psg.ucsf.edu) by 1/27/2003.
We will work through these assignments in the
labs, so you dont need to have Access2000 at
home.
16
Assignments (contd)
Lab 3 Exporting and Analyzing Data 1/28/2003
Determine if neonatal jaundice was associated
with the 5-year neuropsychiatric scores and
create a table, figure, or paragraph appropriate
for the Results section of a manuscript
summarizing the association.
Write a sentence or two for the Methods section
on inter-rater reliability. (Use Bland and
Altman, BMJ 1996 313744)
Send assignment to ahigh_at_psg.ucsf.edu by 2/3/2003.
17
Assignments (contd)
Class session 5 (not a lab) Planning and
Budgeting for Data Management 2/4/2003
Write a one-page data management section for your
research study protocol and create a budget for
data management. If you do not have your own
research study protocol, write the data
management section and create a budget for the
fictional Infant Jaundice Study protocol. We can
compare your budget with the real budget of JIFee.
Send assignment to ahigh_at_psg.ucsf.edu by
2/10/2003.
18
Table of Study Subjects
Infant Jaundice Study Table of Subjects Baby
Row Individual Infant Columns ID, Name,
DOB, Sex, Jaundice. If one set of measurements
per infant, put measurements in subject
table. This is a single-table database.
19
Demonstration
Label columns and enter rows of data in datasheet
view
20
Demonstration
Table design or data dictionary view
21
Demonstration
Disallowed values
Duplicate primary keys
This automatic error checking and data validation
IS why you need to enter your data into a
computer it is NOT why you need a relational
DBMS. Many single-table products (Filemaker Pro,
SAS FSP, even Excel) can do error checking and
data validation.
22
Acceptable table showing one set of exam results
per participant.
23
Table of Study Subjects
Table of Study Subjects Row Individual
Infant Columns ID, Name, DOB, Sex, Jaundice If
some infants have more than one exam, what do you
do?
24
Unacceptable table showing multiple exam results
per study participant.
25
Common Error
  • If you find yourself creating multiple columns
    for the same measurement, e.g., Date1, Score1,
    Date2, Score2, Date3, Score3,
  • Or if your table is more than about 30 columns
    wide,
  • It is time to restructure your table.

26
Unacceptable table with participant-specific data
duplicated for each exam. (Note problem with
Helens DOB.)
27
Normalization
If some infants have multiple exams, normalize
the records into two tables, one for subjects and
one for examinations.
28
Figure 6. Data normalized into two tables one
(Baby) with rows comprising subject-specific
information the other (Exam) with rows
comprising exam-specific information. Note that
Helen can only have one birth date. Subjects
with no exams, e.g. Alejandro, still appear in
the database. SubjectID functions as the
primary key in the Baby table and as the
foreign key in the Exam table.
29
Figure 7. Relationships diagram showing the
one-to-many relationship between the table of
subjects (Baby) and the table of measurements
(Exam).
30
Demonstration
Inability to create integrity violations with
normalized tables.
This IS why you need a multi-table relational
DBMS.
31
Table of Examiners
Neuropsychiatric outcomes are assessed by 5
different examiners (doctors)
May want to assess whether examiner
characteristics (sex, specialty, age) affect
neuropsychiatric scores
Doctor examines many children each child may
have more than one exam but a child is never
examined by the same doctor twice.
32
Table of examiners with multiple
examiner-specific fields.
33
Figure 9. Undesirable table in which
examiner-specific data is repeated with each
examination. (Note that Dr. Novello is a female
pediatrician for two examinations and a male
internist for an exam in between.)
34
Figure 10. Normalization into two tables, one for
exam-specific information and one for
examiner-specific information. (Note that Dr.
Novello cannot change specialty or gender between
examinations.) DocID functions as a second
foreign key in the Exam table. (The other
foreign key is SubjectID.)
35
Figure 11. Relationships diagram showing the
relationships between the table of subjects
(Baby), the table of measurements (Exam) and the
table of examiners (Doctor). The Exam table
functions as a linkage or join table between
Baby and Doctor creating a many-to-many
relationship between study subjects and examiners.
36
One-to-One Relationship Infants and Deaths.
37
Some fields are subject specific but valued for
only a few subjects. Maintaining columns for
these fields in the table of subjects leads to
empty fields and wasted space.
38
Figure 12. Some fields are subject specific but
valued for only a few subjects. Maintaining
columns for these fields in the table of subjects
leads to empty fields and wasted space.
39
Figure 13. Creating a separate table with a
one-to-one relationship eliminates the empty
fields and wasted space.
40
Figure 14. The relationships diagram now includes
a table (Death) with a one-to-one relationship
with the table of subjects (Baby). A subject
can only have one record in the
one-to-one-related table, but the vast majority
of subjects will not have any Death record.
41
Undesirability of Storing Calculated Values
Store raw data, not calculated fields, e.g.,
store dates and times calculate
intervals.   Storing a patients birth date
allows calculation of his or her exact age on the
date of a particular measurement.
42
Figure 15. Calculated fields such as
AgeInMonths are undesirable. What if the birth
date for SubjectID 2322 (Helen) is corrected in
the Baby table?
43
Demonstration
  • Calculate Age
  • Calculate BMI

44
Standard Data Entry Conventions
Several conventions for data entry and display
have developed over time. Although most users of
screen forms are not aware of these conventions,
they have come to expect them subconsciously.
For example, a series of mutually exclusive,
collectively exhaustive choices is usually
displayed as an option group consisting of
several different radio buttons, whereas
choices which are not mutually exclusive are
displayed as check boxes. N.B. An option group
of mutually exclusive choices is a single column
or field. A group of N check boxes represents N
yes/no fields.
45
Use check boxes when options are not mutually
exclusive. (5 fields)
Use radio buttons when options are mutually
exclusive. (1 field)
Computer chart abstraction form showing two
common data entry conventions.
46
Demonstration
Option group for examiners medical specialty
47
Guidelines for Data Management in Clinical
Research
Establish the database tables, their rows and
columns, and their relationships correctly at the
outset.   A poorly organized database makes
data maintenance and retrieval nearly impossible.
Make sure the data are normalized. The data
structures should never require duplicate data
entry or redundant storage.
? MS Genetics Example
48
Guidelines for Data Management in Clinical
Research
Establish and follow naming conventions for
columns and tables.   Short field names without
spaces or underscores are convenient for
programming, querying, and other manipulations.
Instead of spaces or underscores, use IntraCaps
(upper case letters within the variable name) to
distinguish words, e.g. StudyID, FName,
FdDisord, or ExamDate. Table names should be
singular, e.g. Baby instead of Babies, Exam
instead of Exams.
49
Guidelines for Data Management in Clinical
Research
Obtain baseline demographic and clinical
information about members of the study population
from existing computer databases.   Avoid
re-entering data which are already available (in
digital formats) from other sources. In the
JIFee Study, the patient demographic data and
contact information are obtained from the
hospital database. Computer systems can almost
always produce text-delimited or
fixed-column-width character files that the
database management system can import.
50
Guidelines for Data Management in Clinical
Research
Minimize the extent to which study measurements
are recorded on paper forms.   Enter data
directly into the computer database or move data
from paper forms into the computer database as
close to the data collection time as possible.
When you define a variable in a computer
database, you specify both its format and its
domain or range of allowed values. Using these
format and domain specifications, computer data
entry forms give immediate feedback about
improper formats and values that are out of
range. The best time to receive this feedback is
when the study subject is still on site.
51
Guidelines for Data Management in Clinical
Research
Back up the database regularly and check the
adequacy of the back up procedure by periodically
restoring a file from the back up medium.
52
Desktop DBMS
The processing of records is done by the desktop.
The server simply stores files (file server).
Microsoft Access Claris Filemaker
Pro Paradox Microsoft Visual FoxPro Dataease
53
Client-Server DBMS
The processing of records is done by the server.
The desktop manages the screen, but passes
queries on to the server. (Just to confuse
things, MS Access can be a client for SQL Server,
and other enterprise systems. The ultimate in
thin clients is a browser (Internet Explorer).
In this case, the server is an intranet or
internet database server.)
Microsoft SQL Server Oracle Informix Sybase
Write a Comment
User Comments (0)
About PowerShow.com