Title: The Tissue Microarray Data Exchange Specification
1 The Tissue Microarray Data Exchange Specification
Presented for Cambridge Healthtech Institute, Microarrays in Medicine, Boston, MA, April 26, 2004
Jules J. Berman, Ph.D., M.D., Program Director for Pathology Informatics, Cancer Diagnosis Program, National Cancer Institute, National Institutes of Health, Rockville, MD
This presentation is a U.S. government-sponsored work in the public domain.
2 In brief
The TMA Specification is an open access document that can be used without any restriction. Its development was sponsored by the NCI and by the Association for Pathology Informatics.
All the documents and software that you might need to obtain, understand and implement the specification are available in two recently published open access manuscripts.
3 Basics of the specification:
Jules J Berman, Mary Edgerton and Bruce Friedman. The tissue microarray data exchange specification: a community-based, open source tool for sharing tissue microarray data. BMC Med Inform Decis Mak. 2003 May 23;3:5.
Real-world implementation example:
Jules J Berman, Milton Datta, Andre Kajdacsy-Balla, Jonathan Melamed, Jan Orenstein, Kevin Dobbin, Ashok Patel, Rajiv Dhir, Michael J Becich. The tissue microarray data exchange specification: implementation by the Cooperative Prostate Cancer Tissue Resource. BMC Bioinformatics 2004 Feb 27;5:19.
4 Why is it important to have a data exchange
specification for TMAs? The greatest value of
TMAs is the ability to link TMA data with data
from other TMAs and from other databases that
inform on the data contained in the TMA database.
That value is essentially untapped because
there has been no way to publish, exchange, merge
and link TMA datasets in a manner that everyone
can use and understand. The data exchange
specification provides a common intermediate
structure for TMA data that can be used to
exchange data between different TMA databases.
5 Analogous situation
WordPerfect (different versions), Word (different versions), AbiWord, PostScript, PDF
One vendor's software often cannot open files prepared in another vendor's software. But any good word processor should be able to export a file as an RTF file (simple ASCII with markup for formatting), and should be able to import an RTF file and convert it to its preferred proprietary format.
6
- We wanted to make a flexible specification for TMAs that would permit researchers with proprietary systems to port their TMA data into a file that could be easily disassembled and re-assembled into other formats.
- The basic properties of the file:
  - Self-describing
  - Made from commonly understood data structures
  - Extremely simple (most of our stakeholders are not sophisticated bioinformaticians, computer scientists, or metadata experts)
  - Infinitely scalable (can be endlessly combined with other data sources)
7 The first draft of the specification was
developed through open workshops held at
meetings sponsored by the Association for
Pathology Informatics and the National Cancer
Institute.
8
- May 30, 2001. Ann Arbor, Michigan. Chair of speaker session: Mark A Rubin. Speakers: David Rimm, Steve Bova, Matt Van de Rijn, Jules Berman.
- Oct. 6, 2001. Pittsburgh, PA, co-sponsored by the National Cancer Institute. Chair: Mary Edgerton. Speakers: Olli Kallioniemi, Chris Chute, Richard Lieberman, Paul Spellman. Chair of Data Exchange Workshop: Mary Edgerton.
- May 22, 2002. Ann Arbor, Michigan, co-sponsored by the National Cancer Institute. Chair of speaker session: Mark A. Rubin. Speakers: James Bacus, Angelo de Marzo, Peggy Porter, David Rimm and Guido Sauter. Chair of Data Exchange Workshop: Dr. Mary Edgerton.
- October 4, 2002. Held in conjunction with Advancing Pathology Informatics, Imaging and the Internet, Pittsburgh, PA. Chair of speaker session: Mary Edgerton. Speakers: Steve Hewitt, Ulysses Balis. Chair of Data Exchange Workshop: Mary Edgerton.
9 Specification is XML
XML allows heterogeneous systems to communicate and exchange their data. It achieves this through metadata (data about data). XML can produce an ideal document that completely describes itself, including all data and all metadata.
10 Four required sections
1) Header, containing the specification Dublin Core identifiers,
2) Block, describing the paraffin-embedded array of tissues,
3) Slide, describing the glass slides produced from the Block, and
4) Core, containing all data related to the individual tissue samples contained in the array.
11 Eighty Common Data Elements (CDEs), conforming
to the ISO-11179 specification for data elements,
constitute the XML tags used in the TMA data exchange
specification. Only a handful of these are
required in TMA files. A set of six simple
semantic rules describes the complete data
exchange specification. Anyone using the data
exchange specification can validate their TMA
files using a software implementation written in
Perl and distributed as a supplemental file with
this publication.
12
<histo>
   <tma>
      <header>
      </header>
      <block>
         <slide>
         </slide>
         <core>
         </core>
      </block>
   </tma>
</histo>
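The nesting shown above can be checked mechanically. The following is a minimal Python sketch, not the Perl validator distributed with the specification: it only confirms that the four required sections are present at the expected positions. The function name `missing_sections` and the constant `REQUIRED_PATHS` are hypothetical, and the sketch assumes a file without a default namespace, as in the bare skeleton.

```python
import xml.etree.ElementTree as ET

# The bare skeleton from the slide, with no namespaces declared.
SKELETON = """<histo>
  <tma>
    <header></header>
    <block>
      <slide></slide>
      <core></core>
    </block>
  </tma>
</histo>"""

# Paths, relative to <histo>, of the four required sections.
REQUIRED_PATHS = ("tma/header", "tma/block", "tma/block/slide", "tma/block/core")


def missing_sections(xml_text):
    """Return the required section paths that are absent from a TMA file."""
    root = ET.fromstring(xml_text)
    if root.tag != "histo":
        # Without the expected root element, nothing else can be located.
        return ["histo"]
    return [path for path in REQUIRED_PATHS if root.find(path) is None]


print(missing_sections(SKELETON))
```

An empty list means all four sections were found; a file that declares a default namespace on `histo` would need namespace-qualified paths instead.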
13
<?xml version="1.0" ?>
<histo xmlns="http://65.222.228.150/jjb/tma_cde.htm"
       xmlns:cpctr="http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"
       xmlns:dc="http://dublincore.org">
<tma>
<header>
<dc:title>Cooperative Prostate Cancer Tissue Resource (CPCTR) Prostate Cancer Microarray 1-2</dc:title>
<dc:creator>CPCTR</dc:creator>
<dc:subject>Prostate tissue microarray</dc:subject>
<dc:description>CPCTR TMA XML datafile for Microarray 1-2</dc:description>
<dc:publisher>CPCTR</dc:publisher>
<dc:contributor>CPCTR</dc:contributor>
<dc:date>2003-10-05</dc:date>
<dc:type>Prostate Cancer Tissue Microarray</dc:type>
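Because the header is ordinary namespaced XML, any standard parser can read the Dublin Core identifiers. The sketch below uses Python's `xml.etree.ElementTree`; it is one possible reading script, not part of the specification. The sample document is a shortened copy of the header above with closing tags added so that it parses, and the function name `read_header` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example file; the "tma" prefix is only a
# local label for the file's default namespace.
NS = {
    "tma": "http://65.222.228.150/jjb/tma_cde.htm",
    "dc": "http://dublincore.org",
}

# Shortened copy of the example header, closed so that it is well-formed.
DOC = """<?xml version="1.0" ?>
<histo xmlns="http://65.222.228.150/jjb/tma_cde.htm"
       xmlns:dc="http://dublincore.org">
  <tma>
    <header>
      <dc:title>Cooperative Prostate Cancer Tissue Resource (CPCTR) Prostate Cancer Microarray 1-2</dc:title>
      <dc:creator>CPCTR</dc:creator>
      <dc:date>2003-10-05</dc:date>
    </header>
  </tma>
</histo>"""


def read_header(xml_text):
    """Return the Dublin Core elements of the header as a dictionary."""
    root = ET.fromstring(xml_text)
    header = root.find("tma:tma/tma:header", NS)
    if header is None:
        raise ValueError("no <header> section found")
    dc_prefix = "{%s}" % NS["dc"]
    return {
        child.tag[len(dc_prefix):]: (child.text or "").strip()
        for child in header
        if child.tag.startswith(dc_prefix)
    }


for name, value in read_header(DOC).items():
    print(name, "=", value)
```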
14
<record>
<cpctr:IMS_Case_Identifier>1053371588</cpctr:IMS_Case_Identifier>
<cpctr:Location_Code>G61</cpctr:Location_Code>
<cpctr:Race>Caucasian</cpctr:Race>
<cpctr:Year_of_Birth>1926</cpctr:Year_of_Birth>
<cpctr:Year_of_Diagnosis>1991</cpctr:Year_of_Diagnosis>
<cpctr:Year_of_Prostatectomy>1991</cpctr:Year_of_Prostatectomy>
<cpctr:Is_Residual_Carcinoma_Present>Yes</cpctr:Is_Residual_Carcinoma_Present>
<cpctr:Most_Prominent_Histologic_Type>adenocarcinoma NOS aka acinar</cpctr:Most_Prominent_Histologic_Type>
<cpctr:Gleason_Primary_Grade>4</cpctr:Gleason_Primary_Grade>
<cpctr:Gleason_Secondary_Grade>3</cpctr:Gleason_Secondary_Grade>
<cpctr:Gleason_Sum_Score>7</cpctr:Gleason_Sum_Score>
<cpctr:Number_of_Nodes_Examined>5</cpctr:Number_of_Nodes_Examined>
<cpctr:Number_of_Nodes_Positive>0</cpctr:Number_of_Nodes_Positive>
<cpctr:Distant_Mets__1_at_Time_of_Diagn>Bladder</cpctr:Distant_Mets__1_at_Time_of_Diagn>
<cpctr:pT_Stage>pT3b</cpctr:pT_Stage>
<cpctr:pN_Stage>pN0</cpctr:pN_Stage>
<cpctr:pM_Stage>pMX</cpctr:pM_Stage>
<cpctr:Vital_Status>Alive</cpctr:Vital_Status>
<cpctr:Year_of_PSA_Recurrence></cpctr:Year_of_PSA_Recurrence>
<cpctr:PSA_Recurrence_Status>Unknown</cpctr:PSA_Recurrence_Status>
<cpctr:Recurrence_Free_Year></cpctr:Recurrence_Free_Year>
<array_locations>row 9, column 18; row 10, column 4</array_locations>
</record>
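A record of this kind maps naturally onto a key/value structure, which is the first step in loading it into any database. The following Python sketch flattens a few fields of the example record into a dictionary, treating empty elements as missing values, and adds one simple sanity check (that the Gleason sum equals primary plus secondary grade). The function names are hypothetical, the check is an illustration rather than one of the specification's semantic rules, and the sample declares the `cpctr` namespace on `<record>` only so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# A few fields from the example record, made self-contained for parsing.
RECORD = """<record xmlns:cpctr="http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf">
  <cpctr:Race>Caucasian</cpctr:Race>
  <cpctr:Gleason_Primary_Grade>4</cpctr:Gleason_Primary_Grade>
  <cpctr:Gleason_Secondary_Grade>3</cpctr:Gleason_Secondary_Grade>
  <cpctr:Gleason_Sum_Score>7</cpctr:Gleason_Sum_Score>
  <cpctr:Year_of_PSA_Recurrence></cpctr:Year_of_PSA_Recurrence>
</record>"""


def record_to_dict(xml_text):
    """Flatten one <record> into {element name: text, or None if empty}."""
    record = ET.fromstring(xml_text)
    fields = {}
    for child in record:
        # ElementTree reports namespaced tags as "{uri}Name"; keep only Name.
        name = child.tag.split("}", 1)[-1]
        text = (child.text or "").strip()
        fields[name] = text or None
    return fields


def gleason_is_consistent(fields):
    """Check that the Gleason sum equals primary plus secondary grade."""
    primary = int(fields["Gleason_Primary_Grade"])
    secondary = int(fields["Gleason_Secondary_Grade"])
    return primary + secondary == int(fields["Gleason_Sum_Score"])


fields = record_to_dict(RECORD)
print(fields)
print(gleason_is_consistent(fields))
```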
15 Implementing the specification
- We provide:
  - The specification (XML data structure and 80 common data elements)
  - A Perl-script validator
  - A paper that describes a real-world implementation (porting TMA data from an Excel spreadsheet)
- You provide:
  - Whatever database you like for storing your TMA data
  - A script (Java, Perl, Python, whatever) that can port your data into the TMA specification
  - A script that can port TMA files in the data exchange specification into whatever database you prefer
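As an illustration of the kind of porting script a user would supply, the Python sketch below turns spreadsheet rows (assumed here to be exported as CSV) into the TMA XML structure. It is a sketch under stated assumptions, not the CPCTR implementation: the function name `rows_to_tma` is hypothetical, the column headings are assumed to match CPCTR element names exactly, the header and slide sections are left empty, and placing each `<record>` inside `<core>` is an assumption about the layout.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI for the CPCTR data elements, as given in the example file.
CPCTR_NS = "http://www.pathology.pitt.edu/pdf/cpctr/cpctr-cde-v22.pdf"
ET.register_namespace("cpctr", CPCTR_NS)


def rows_to_tma(csv_text):
    """Build a TMA element tree with one <record> per spreadsheet row."""
    histo = ET.Element("histo")
    tma = ET.SubElement(histo, "tma")
    ET.SubElement(tma, "header")   # Dublin Core identifiers would go here
    block = ET.SubElement(tma, "block")
    ET.SubElement(block, "slide")  # slide data would go here
    core = ET.SubElement(block, "core")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: each record is stored inside <core>.
        record = ET.SubElement(core, "record")
        for column, value in row.items():
            field = ET.SubElement(record, "{%s}%s" % (CPCTR_NS, column))
            field.text = value
    return histo


# A one-row sheet using values from the example record.
SHEET = "Race,Year_of_Birth,Gleason_Sum_Score\nCaucasian,1926,7\n"
root = rows_to_tma(SHEET)
print(ET.tostring(root, encoding="unicode"))
```

The reverse direction (TMA file into a local database) follows the same pattern: parse the file, walk the records, and write each field to whatever storage is preferred.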
16 Future?