What is a LIMS - PowerPoint PPT Presentation

Provided by: CurtisJ
Title: What is a LIMS


1
What is a LIMS ?
  • LIMS: Laboratory Information Management System
  • computerized system that tracks and manages
    samples through a protocol
  • provides interfaces for both laboratory personnel
    and instruments
  • helps support high-throughput operations
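
As a rough illustration of what tracking a sample through a protocol can involve, the sketch below moves a sample through a fixed list of steps and logs who handled it at each one. The class name `Sample`, the `advance` method, and the step names are hypothetical and are not taken from any particular LIMS.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical protocol steps; a real LIMS would load these from its protocol definition.
PROTOCOL = ["received", "rna_extracted", "labeled", "hybridized", "scanned"]


@dataclass
class Sample:
    sample_id: str
    step_index: int = 0
    custody_log: list = field(default_factory=list)

    @property
    def status(self) -> str:
        return PROTOCOL[self.step_index]

    def advance(self, operator: str) -> str:
        """Move the sample to the next protocol step and record who did it."""
        if self.step_index + 1 >= len(PROTOCOL):
            raise ValueError(f"{self.sample_id} has already completed the protocol")
        self.step_index += 1
        self.custody_log.append((self.status, operator, datetime.now(timezone.utc)))
        return self.status


sample = Sample("S-0001")
sample.advance("tech_a")      # a person performs the extraction
sample.advance("scanner_1")   # an instrument can be recorded as the operator too
print(sample.status, [(step, who) for step, who, _ in sample.custody_log])
```

The log of (step, operator, timestamp) entries is one simple way to represent a chain of custody; both people and instruments write to it through the same call.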

2
Types of LIMS
  • Enterprise
    • cover all aspects of scientific research
    • data capture
    • reagent use and purchasing tracking
  • Protocol-specific
    • cover a specific protocol
    • data capture

3
sample management
inventory management
data collection
instrument management
data warehouse
chain of custody
resource management
data analysis
5
Microarrays
  • large-scale sequencing projects like the Human
    Genome Project have given us the ability to
    examine the complete transcriptome (the
    transcriptional response to an environmental
    challenge)
  • new (and expensive) technology
  • large output of data

6
Microarray Data
  • produced in a tabular format (rows and columns)
  • users are relatively unsophisticated in
    computational and informatic skills
  • much data ends up in spreadsheets which lack the
    capability to handle rich datasets (no complex
    query or visualization capabilities)

7
Microarray Databases
  • plethora of databases and schemas
  • three types of interactions
    • local data management
    • publication of data in a repository
    • analysis of repository data
  • the latter two interactions require a certain
    level of sophistication to consolidate exogenous
    data

8
Microarrays Concept
9
Microarrays Raw Data
10
Microarrays Data
1 AC3.5 Member of the aminopeptidase protein family 5 10337580 AC3.5 20834 2/25/00 0 849 196 653 650 144 506 438 97 341 199 155 161 924 1.290513 0.774885 1.913864 0.522503 0.734 0.688 0.870632 0.71 0.71 1787 51 1802 66 1 1 1 1 A 1 0 U
2 AC3.7 Member of the UDP-glucuronosyltransferase protein family 5 10344769 AC3.7 20835 2/25/00 4 234 186 48 188 154 34 127 104 23 187 163 79 594 1.411764 0.708333 2.093682 0.477628 1.2 0.116 0.219089 0.32 0.21 1798 953 1809 964 1 1 1 1 A 2 2 U
3 AC3.8 Member of the UDP-glucuronosyltransferase protein family 5 10347864 AC3.8 20836 2/25/00 0 363 198 165 348 155 193 235 105 130 254 221 121 593 0.854922 1.169696 1.267871 0.788724 1.241 1.046 0.858487 0.25 0.29 1788 71 1801 84 1 1 1 1 A 3 0 U
11
Local Databases
  • make data available to local researchers
  • may have WWW-based tools
  • database and compute server centralized and
    closely linked

12
GeneX
  • National Center for Genome Resources
  • www.ncgr.org/research/genex
  • relational database with Perl, R, and Java
    components

13
GeneX Features
  • Free
  • integrated and extensible toolset
  • multiple types of array technology in single
    database
  • experiment-centric design
  • supports an XML specification to allow
    interchange between databases

14
BASE
  • BioArray Software Environment
  • http://base.thep.lu.se/
  • Relational database (MySQL) with WWW interface
    built upon C/JavaScript/PHP

15
BASE Features
  • Free
  • MIAME compliant
  • user administration
  • array production
  • sample management

16
(No Transcript)
17
Repositories
  • provide public access to multiple datasets
  • create standard databases similar to those for
    sequence data
  • automatic deposition of data upon publication

18
Stanford Microarray Database
  • genome-www4.stanford.edu/MicroArray
  • WWW-based database and a dataset distribution
    system
  • relational database
  • Perl/Java toolset
  • supports some complex querying as well as
    browsing for datasets
  • datasets distributed as compressed flat-files
    and/or graphical images

19
GEO
  • Gene Expression Omnibus
  • www.ncbi.nlm.nih.gov/geo/
  • data repository and distribution system
  • precomputed definitions and descriptions of data
    to aid in data set retrieval

20
Data Interchange
  • Proposed interchange standard
    • MIAME
  • Proposed OMG exchange standards
    • MAML
    • GEML
    • NetGenics

21
MIAME
  • Minimum Information About a Microarray Experiment
  • www.mged.org/Annotations-wg/
  • Goal
    • specify the minimum amount of information needed
      to ensure interpretability
    • facilitate creation of repositories
    • encourage journals and funding agencies to
      require submission of data to repositories

22
Design Considerations
  • reflect data accurately
  • efficient access to data
  • efficient storage of data
  • compatibility with other databases

23
Data Representation
[Diagram: External Sequence Databases, GIPO, spots, Conditions, Experiment, Sample, Tissue, Species, Protocol]
24
MIAME Considerations
  • Experimental design: the set of hybridization
    experiments as a whole
  • Array design: each array used and each element
    (spot) on the array
  • Samples: samples used, extract preparation and
    labeling
  • Hybridizations: procedures and parameters
  • Measurements: images, quantitation,
    specifications
  • Normalization controls: types, values,
    specifications

25
(No Transcript)
26
Background
  • Center for Biomedical Genomics and Informatics
  • Engaged in a number of gene expression studies,
    including liver disease, osteoarthritis, and
    cancer
  • Species studied: human and rat
  • cDNA slides printed in-house (5K human chip, 40K
    human chip)

27
GMU Clinical Genomics
  • studying the relationship between disease and
    genome expression
  • clinical measurements
    • standard battery of tests
  • genomic measurements
    • gene expression levels
    • genetic variation
  • derive correlation between clinical/genomic
    factors and treatment outcome

28
Gene Expression Queries
Patient Demographic Queries
Microarray Data
Clinical Data
Clinical Database
Expression Database
29
Dataflow
Clinical Tests and Samples
Clinical Database
Analysis (Genespring, etc.)
RNA Extraction Protocols
LIMS
WWW Access (GENet)
Researchers
Microarray Experiment Protocols
BASE
30
Generic difference in gene expression patterns
  • We do this via visual inspection following
    clustering (genes and samples)
  • Often we will reduce the number of genes by some
    criterion (e.g., cluster only on genes that show
    at least a 2-fold change in at least one
    sample/category)
  • Often we will group the samples by condition in
    order to compensate for the lack of replicates
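
The gene-reduction step can be sketched as a simple filter applied before clustering. This is only an illustration under stated assumptions: the input is taken to be log2 expression ratios per gene, "2-fold" is read as a change in either direction, and the function name `filter_by_fold_change` and the values are made up.

```python
import math


def filter_by_fold_change(log2_ratios, min_fold=2.0):
    """Keep genes that change at least `min_fold` (up or down) in at least one sample."""
    threshold = math.log2(min_fold)
    return {
        gene: values
        for gene, values in log2_ratios.items()
        if any(abs(v) >= threshold for v in values)
    }


# Made-up log2(ratio) values for three genes across three samples.
data = {
    "geneA": [0.1, -0.3, 1.4],
    "geneB": [0.2, 0.5, -0.4],
    "geneC": [-1.0, 0.0, 0.3],
}
kept = filter_by_fold_change(data)
print(sorted(kept))
```

Only the genes that pass the filter would then be handed to the clustering step, which keeps the clustered matrix small enough to inspect visually.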

31
Clustering of genes and samples
32
Disease vs. Normal
33
Clinical Data Challenges
  • Collection
    • text formats
    • dispersed sources
  • Storage
    • heterogeneous
    • incomplete
    • degenerate
  • Protection
    • HIPAA regulations

34
Large Clinical Databases
  • Nadkarni and Brandt (1998) JAMIA 5, 511
    • Issues involved in data mining EAV databases
  • Nadkarni et al. (1999) JAMIA 6, 478
    • Extension of EAV with classes and relationships
  • Chen et al. (2000) JAMIA 7, 475
    • Performance of EAV/CR

35
Issues with Clinical Data
  • Too many columns
    • Over 43,000 attributes
  • Sybase capacity
    • 1024 columns per table
    • 32 indexed
    • up to 50 tables per query
  • Sparse data
  • Multiple entries
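
A quick back-of-the-envelope check shows why a one-column-per-attribute layout strains these limits. It uses only the figures on this slide and treats 43,000 as the attribute count, although the slide says "over", so the result is a lower bound; key columns repeated in every table are ignored.

```python
import math

attributes = 43_000            # "over 43,000" on the slide, so a lower bound
max_columns_per_table = 1024
max_tables_per_query = 50

# Minimum number of wide tables needed to hold one column per attribute.
tables_needed = math.ceil(attributes / max_columns_per_table)
print(tables_needed)
# Whether a single query could still join every one of those tables.
print(tables_needed <= max_tables_per_query)
```

Under these assumptions at least 42 tables are needed, which leaves little headroom below the 50-table query limit.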

36
Sample Clinical Table
37
Solution: EAV
  • Entity-Attribute-Value
  • form of row modeling
  • turns columns into rows
  • eliminates sparse data
  • reduction in database size
  • Faster single value queries
  • Pushes depth rather than width
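
A minimal sketch of "turning columns into rows": each non-missing cell of a wide record becomes one (entity, attribute, value) triple, so empty cells are simply never stored. The function name `to_eav` and the sample values are illustrative assumptions.

```python
def to_eav(rows, key="patient"):
    """Convert wide rows (one column per attribute) into (entity, attribute, value) triples.

    Missing values (None) are skipped, so sparse columns take up no space.
    """
    triples = []
    for row in rows:
        entity = row[key]
        for attribute, value in row.items():
            if attribute != key and value is not None:
                triples.append((entity, attribute, value))
    return triples


# Two made-up wide records with several empty cells.
wide = [
    {"patient": 1017, "BMI": 27.4, "glucose": None, "ALT": 41},
    {"patient": 1018, "BMI": None, "glucose": 5.6, "ALT": None},
]
eav = to_eav(wide)
for triple in eav:
    print(triple)
```

Six attribute cells in the wide form become three rows in the EAV form; the table grows in depth as measurements are added, while its width stays fixed.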

38
EAV Clinical Table
39
Accessing Single Attributes
Traditional
SELECT patient, date, BMI FROM relTable
WHERE patient = 1017 AND BMI IS NOT NULL;
EAV
SELECT patient, date, value FROM EAVTable
WHERE patient = 1017 AND test = 'BMI';
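
The two queries can be tried side by side with Python's built-in sqlite3 module. The table and column names follow the slide, written with explicit `=` and `IS NOT NULL` comparisons; the `glucose` column, the column types, and the rows are made-up assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relTable (patient INTEGER, date TEXT, BMI REAL, glucose REAL);
    CREATE TABLE EAVTable (patient INTEGER, date TEXT, test TEXT, value REAL);

    INSERT INTO relTable VALUES (1017, '2002-01-10', 27.4, NULL),
                                (1017, '2002-03-02', NULL, 5.6),
                                (1018, '2002-01-11', 31.0, 6.1);
    INSERT INTO EAVTable VALUES (1017, '2002-01-10', 'BMI', 27.4),
                                (1017, '2002-03-02', 'glucose', 5.6),
                                (1018, '2002-01-11', 'BMI', 31.0),
                                (1018, '2002-01-11', 'glucose', 6.1);
""")

# Conventional layout: the attribute is a column, and empty cells must be filtered out.
traditional = conn.execute(
    "SELECT patient, date, BMI FROM relTable WHERE patient = ? AND BMI IS NOT NULL",
    (1017,),
).fetchall()

# EAV layout: the attribute name is data, so it is matched in the WHERE clause.
eav = conn.execute(
    "SELECT patient, date, value FROM EAVTable WHERE patient = ? AND test = ?",
    (1017, "BMI"),
).fetchall()

print(traditional)
print(eav)
```

Both queries return the same single BMI measurement for patient 1017; the EAV version needs no NULL check because missing measurements have no row at all.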
40
Limitations for Data Mining
  • Complex boolean queries tough
    • no set operations
  • Complex SQL
    • nested subqueries
    • self-joins
  • Performance
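
To show why boolean conditions across attributes get awkward, here is a sketch of "BMI above 30 AND glucose above 7" against an EAV table, again using sqlite3. The table layout, thresholds, and rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EAVTable (patient INTEGER, test TEXT, value REAL);
    INSERT INTO EAVTable VALUES (1017, 'BMI', 27.4), (1017, 'glucose', 7.9),
                                (1018, 'BMI', 31.0), (1018, 'glucose', 8.2),
                                (1019, 'BMI', 33.5);
""")

# On a wide table this would be: WHERE BMI > 30 AND glucose > 7.
# In EAV each condition matches a different row, so the table is joined to itself,
# once per attribute involved in the condition.
rows = conn.execute("""
    SELECT DISTINCT a.patient
    FROM EAVTable AS a
    JOIN EAVTable AS b ON b.patient = a.patient
    WHERE a.test = 'BMI' AND a.value > 30
      AND b.test = 'glucose' AND b.value > 7
    ORDER BY a.patient
""").fetchall()
print(rows)
```

Every additional attribute in the condition adds another self-join, which is the source of both the SQL complexity and the performance cost listed on the slide.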

41
Ad Hoc Query Interface
  • Presents a user interface which generates the
    required complex SQL queries

42
EAV/CR
  • Simulation of a complex logical schema using an
    extensive yet simple physical schema
  • Addition of object tables to contain like
    attributes
  • strong data typing
  • Creates metadata about objects to help describe
    the relationships between data objects

43
(No Transcript)
44
(No Transcript)
45
Testing EAV/CR
  • Data sources
    • used microbiology data from VA patients
    • extracted from existing DB
    • loaded in EAV/CR schema
    • scaled by replicating data with new IDs
  • Benchmarking
    • two attribute-centered queries
    • two entity-centered queries

46
(No Transcript)
47
(No Transcript)
48
Results
  • Comparable speeds for entity queries
  • massive hit for attribute query
    • up to 10-fold worse
  • "ancestor" improvement
    • represents denormalization
    • space for performance trade-off

49
EAV for Clinical Genomics ?
  • performance issues a problem
    • data mining on attributes
    • I/O issues
  • full EAV not feasible
  • partial row modeling a good option

50
Clinical Database
  • Used CGO database out of Univ of Arkansas as a
    template
  • Myeloma database
  • Want to generalize it for any cancer

51
(No Transcript)
52
Altering CGO
  • remove gene chip references
    • Affymetrix
    • MIAME/MAGE non-compliant
  • attach to GeneX
  • generalize clinical system
    • row model test results
    • row model questionnaires

53
Patient: id birthdate race occupation ...
LabReport: id test_cat_id(FK) test_id(FK) patient_id(FK) test_date result
LabTest: test_cat_id test_id test_cat_desc test_desc
Questionaire: id patient_id
Alcohol: id study_id(FK) Q01 Q02 Q03 ...
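
The row-modeled lab tables on this slide could be declared roughly as follows. This is a sketch using sqlite3: the table and column names come from the slide, while the column types, the composite key on LabTest, and the sample rows are assumptions. The columns the slide elides are not filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumptions; the slide lists names only.
    CREATE TABLE Patient (
        id INTEGER PRIMARY KEY,
        birthdate TEXT,
        race TEXT,
        occupation TEXT          -- the slide elides further columns
    );
    CREATE TABLE LabTest (
        test_cat_id INTEGER,
        test_id INTEGER,
        test_cat_desc TEXT,
        test_desc TEXT,
        PRIMARY KEY (test_cat_id, test_id)   -- assumed composite key
    );
    -- Row-modeled results: one row per patient, test, and date.
    CREATE TABLE LabReport (
        id INTEGER PRIMARY KEY,
        test_cat_id INTEGER,
        test_id INTEGER,
        patient_id INTEGER REFERENCES Patient(id),
        test_date TEXT,
        result TEXT,
        FOREIGN KEY (test_cat_id, test_id) REFERENCES LabTest(test_cat_id, test_id)
    );
    INSERT INTO Patient VALUES (1, '1950-04-02', NULL, NULL);
    INSERT INTO LabTest VALUES (10, 3, 'chemistry', 'glucose');
    INSERT INTO LabReport VALUES (100, 10, 3, 1, '2002-01-10', '5.6');
""")

# Reassemble a readable lab result by joining the row-modeled table to its lookups.
rows = conn.execute("""
    SELECT p.id, t.test_desc, r.test_date, r.result
    FROM LabReport AS r
    JOIN Patient AS p ON p.id = r.patient_id
    JOIN LabTest AS t ON t.test_cat_id = r.test_cat_id AND t.test_id = r.test_id
""").fetchall()
print(rows)
```

Only the lab results are row-modeled here; Patient stays a conventional table, which matches the idea of partial rather than full row modeling.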
54
HIPAA
  • Health Insurance Portability and Accountability
    Act
  • ensure the integrity and confidentiality of
    patient information, protect against reasonably
    anticipated threats or hazards to the security or
    integrity of the information or unauthorized uses
    or disclosures of the information

55
Clinical Data Flow
clinical database
redacted database
cleansing protocol
publication services
redacted database
research protocol