Title: Looking into the future
1. Looking into the future
- Providing Social Science Data Services
- Jim Jacobs
2. First principles
- Metadata are data about data: information about information.
- It's all about having complete, accurate, re-usable metadata.
- Software to process the metadata is secondary. We should be able to have metadata today that we know will be usable in unforeseeable computing environments (operating systems, software, hardware).
3. First principles
- Comprehensive
- Complete
- Uncompromised
- Consistent
- Flexible
- Sharable
- Usable and re-usable
- Preservable
- Parseable by computer
- Documented
- Non-proprietary
4. How XML fits in
- XML is designed to be parseable with generic tools.
- XML can encode meaning and can be self-documenting.
- XML is non-proprietary, open, and flexible.
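To illustrate "parseable with generic tools," here is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree`, a parser that knows nothing about DDI. The sample fragment and the helper name `list_elements` are hypothetical. The point is that a generic parser can read the document, and the element names themselves carry meaning a reader can interpret.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the parser needs no DDI-specific knowledge to read it.
SAMPLE = """
<citation>
  <titlStmt>
    <titl>Great Power Wars, 1495-1815</titl>
  </titlStmt>
  <rspStmt>
    <AuthEnty>Levy, Jack S.</AuthEnty>
  </rspStmt>
</citation>
"""

def list_elements(xml_text: str) -> list:
    """Return every element tag in document order, using only a generic parser."""
    root = ET.fromstring(xml_text.strip())
    # iter() walks the whole tree depth-first, starting with the root itself.
    return [elem.tag for elem in root.iter()]

if __name__ == "__main__":
    for tag in list_elements(SAMPLE):
        print(tag)
```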
5. How XML fits in
XML is designed to make it easy to find and use just the elements you need from a large document.
Cherry picking
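Cherry picking can be done without first loading a large document into memory. A minimal sketch, assuming Python's standard-library `ElementTree.iterparse`; the function `cherry_pick` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def cherry_pick(source, wanted_tag: str) -> list:
    """Stream through an XML source and keep only the text of `wanted_tag` elements.

    iterparse reports each element when its closing tag has been read, so the
    text we want is available without building the full tree first.
    """
    found = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted_tag:
            found.append(elem.text or "")
        # Drop the element's children, attributes, and text once it has been handled.
        elem.clear()
    return found

if __name__ == "__main__":
    # Hypothetical document with many <var> elements; we want only their labels.
    doc = io.BytesIO(
        b"<dataDscr><var name='V1'><labl>Label for V1</labl></var>"
        b"<var name='V2'><labl>Label for V2</labl></var></dataDscr>"
    )
    print(cherry_pick(doc, "labl"))  # ['Label for V1', 'Label for V2']
```

Clearing each element after it is handled keeps memory use low. For very large files one would typically also clear the root from time to time, since the emptied elements remain attached to it.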
6. How XML fits in
<stdyDscr>
  <citation>
    <titlStmt>
      <titl>Great Power Wars, 1495-1815</titl>
      <IDNo>9955</IDNo>
    </titlStmt>
    <rspStmt>
      <AuthEnty>Levy, Jack S.</AuthEnty>
    </rspStmt>
    <prodStmt>
      <fundAg>National Science Foundation.</fundAg>
      <grantNo>SES86-10567</grantNo>
    </prodStmt>
    <distStmt>
      <distrbtr abbr="ICPSR" affiliation="Institute for Social Research, University of Michigan" URI="http://www.icpsr.umich.edu">Inter-university Consortium for Political and Social Research</distrbtr>
      <distDate date="1994-05-20">1994-05-20</distDate>
    </distStmt>
    <serStmt></serStmt>
    <verStmt>
      <dateAdded>1994-05-20</dateAdded>
      <dateUpdated>1994-05-20</dateUpdated>
    </verStmt>
    <biblCit>Levy, Jack S. GREAT POWER WARS, 1495-1815 [Computer file]. New Brunswick, NJ and Houston, TX: Jack S. Levy and T. Clifton Morgan ...
<titl>Great Power Wars, 1495-1815</titl>
You can cherry-pick just what you need from a large XML document.
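The same cherry picking can be expressed as a path query. Below is a minimal sketch that uses the limited XPath subset (ElementPath) in Python's standard-library `ElementTree` on an abridged copy of the slide's study description. One assumption is worth stating: real DDI files may declare an XML namespace, and the paths would then need namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# Abridged from the slide's example (distributor, version, and citation parts omitted).
STDY_DSCR = """<stdyDscr>
  <citation>
    <titlStmt>
      <titl>Great Power Wars, 1495-1815</titl>
      <IDNo>9955</IDNo>
    </titlStmt>
    <rspStmt>
      <AuthEnty>Levy, Jack S.</AuthEnty>
    </rspStmt>
    <prodStmt>
      <fundAg>National Science Foundation.</fundAg>
      <grantNo>SES86-10567</grantNo>
    </prodStmt>
  </citation>
</stdyDscr>"""

root = ET.fromstring(STDY_DSCR)

# findtext() returns the text of the first element matching the path (or None).
title = root.findtext("citation/titlStmt/titl")
study_id = root.findtext("citation/titlStmt/IDNo")
grant = root.findtext(".//grantNo")  # ".//" searches at any depth

print(title)     # Great Power Wars, 1495-1815
print(study_id)  # 9955
print(grant)     # SES86-10567
```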
7. From legacies to the future
- HTML
- PDF
- Any stat package
- Nesstar, SDA, Dataverse
- Library OPAC
- Google
- OAI, METS, etc.
- RSS, RDF
- GIS
- DDI 3, 4
- SAS
- SPSS
- OSIRIS
- PDF
- Paper
- Data dictionary
- Etc.
DDI
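The idea of one metadata source feeding many outputs can be sketched in a few lines. The sketch reads a legacy data dictionary, expresses it as simplified DDI 2-style `<var>`/`<labl>` elements, and renders HTML from that single source. This is a hedged illustration: the tab-separated dictionary layout and the function names are assumptions, and a real DDI codebook carries far more structure.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical legacy data dictionary: one "name<TAB>label" pair per line.
LEGACY_DICTIONARY = "V1\tLabel for variable one\nV2\tLabel for variable two"

def dictionary_to_ddi(text: str) -> ET.Element:
    """Convert the legacy lines into simplified DDI 2-style <var> elements."""
    data_dscr = ET.Element("dataDscr")
    for line in text.splitlines():
        name, label = line.split("\t", 1)
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label
    return data_dscr

def ddi_to_html(data_dscr: ET.Element) -> str:
    """Render the same metadata as an HTML list; other outputs could be generated likewise."""
    items = [
        f"<li><b>{html.escape(var.get('name'))}</b>: {html.escape(var.findtext('labl', ''))}</li>"
        for var in data_dscr.iter("var")
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

if __name__ == "__main__":
    ddi = dictionary_to_ddi(LEGACY_DICTIONARY)
    print(ET.tostring(ddi, encoding="unicode"))
    print(ddi_to_html(ddi))
```

The design choice is that every output is generated from the DDI layer and not from the legacy file. A new target (PDF, an index, a newer DDI version) then needs only a new renderer.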
8. From many contributors to many uses
- The web
- Live documents
- Databases
- Publications
- Data archives
- Data libraries
- Institutional repositories
- Secondary analysis
- New research
- New knowledge
- Researcher
- Data collector
- Analyst
- Data producer, distributor
- Data archivist
- Data librarian
- Users of statistics
- Government agency
DDI
9. OAIS Functional Model
(Diagram: the OAIS functional model; labeled entities include Archival Storage and Access.)
10. Information Packages
(Diagram: the OAIS information model, showing Submission (SIP), Archival (AIP), and Dissemination (DIP) Information Packages.)
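The package flow in the OAIS model can be sketched as plain data types. This is a deliberately simplified, hypothetical model of the flow: a producer submits a SIP, Ingest turns it into an AIP held in Archival Storage, and Access delivers a DIP to a consumer. The class fields, the `ARCHIVAL_STORAGE` dictionary, and the identifiers are illustrative assumptions, not part of the OAIS standard.

```python
from dataclasses import dataclass, field

@dataclass
class SIP:
    """Submission Information Package: what a producer sends to the archive."""
    content: str
    metadata: dict

@dataclass
class AIP:
    """Archival Information Package: what the archive preserves."""
    content: str
    metadata: dict
    preservation_info: dict = field(default_factory=dict)

@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    content: str
    metadata: dict

# Hypothetical stand-in for the Archival Storage entity: AIPs keyed by identifier.
ARCHIVAL_STORAGE: dict = {}

def ingest(sip: SIP, archive_id: str) -> AIP:
    """Ingest: turn a submission into an archival package and store it."""
    aip = AIP(sip.content, dict(sip.metadata), {"archive_id": archive_id})
    ARCHIVAL_STORAGE[archive_id] = aip
    return aip

def access(archive_id: str, fields: list) -> DIP:
    """Access: build a dissemination package carrying only the requested metadata fields."""
    aip = ARCHIVAL_STORAGE[archive_id]
    subset = {k: v for k, v in aip.metadata.items() if k in fields}
    return DIP(aip.content, subset)

if __name__ == "__main__":
    ingest(SIP("data file", {"titl": "Great Power Wars, 1495-1815", "IDNo": "9955"}), "id-0001")
    print(access("id-0001", ["titl"]))
```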
11. Data stewardship life cycle
12. DDI Production
13. DDI Use
14. DDI will enable transformation
- New kinds of data discovery (beyond indexing)
- Metadata as a primary resource (metadata as data)
15. Metadata for data discovery
- ICPSR already uses DDI metadata to create its Variables database.
- Nesstar and Dataverse software use metadata to produce searchable indexes of data repositories.
- In the future we should see DDI harvested from many repositories to create indexes across collections (oclc.org/oaister/).
- In the future we'll see data discovery by concept, methodology, geography, and time period, not just by keyword.
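A cross-collection index built from harvested records can be sketched as follows. The sketch assumes each harvested DDI record has already been reduced to a few fields. The `StudyRecord` fields, repository names, and `search` function are hypothetical. The geography and time-period filters stand in for richer concept- and methodology-based discovery.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyRecord:
    """Hypothetical summary of one harvested DDI record."""
    repository: str
    title: str
    keywords: set
    geography: str
    start_year: int
    end_year: int

def build_index(records):
    """Merge records harvested from several repositories into one keyword index."""
    index = defaultdict(list)
    for rec in records:
        for kw in rec.keywords:
            index[kw.lower()].append(rec)
    return index

def search(index, keyword, geography: Optional[str] = None, covers_year: Optional[int] = None):
    """Keyword lookup, optionally narrowed by geography and time-period facets."""
    hits = index.get(keyword.lower(), [])
    if geography is not None:
        hits = [r for r in hits if r.geography == geography]
    if covers_year is not None:
        hits = [r for r in hits if r.start_year <= covers_year <= r.end_year]
    return [f"{r.repository}: {r.title}" for r in hits]

if __name__ == "__main__":
    records = [
        StudyRecord("Repository A", "Study 1", {"wars"}, "Region A", 1900, 1945),
        StudyRecord("Repository B", "Study 2", {"Wars", "casualties"}, "Region B", 1950, 1953),
        StudyRecord("Repository B", "Study 3", {"elections"}, "Region A", 1990, 2000),
    ]
    index = build_index(records)
    print(search(index, "wars"))                    # ['Repository A: Study 1', 'Repository B: Study 2']
    print(search(index, "wars", covers_year=1951))  # ['Repository B: Study 2']
```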
16. Metadata as data
- By structuring metadata according to a methodology (the lifecycle-of-data approach), we create metadata that we can treat as data.
- We can analyze metadata the way we would analyze any data file.
- As more metadata of this kind are created, we accumulate a body of information that makes it possible to study trends across time and geography.
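Treating metadata as data can be as simple as tabulating fields pulled from many study descriptions. In the sketch below, the rows and the field names are hypothetical. It counts data-collection modes by decade, which is the kind of trend across time this slide describes. Geography could be added as another key in the same way.

```python
from collections import Counter

# Hypothetical rows extracted from DDI records: one row per study.
METADATA_ROWS = [
    {"year": 1972, "geography": "Region A", "mode": "face-to-face"},
    {"year": 1978, "geography": "Region A", "mode": "face-to-face"},
    {"year": 1985, "geography": "Region B", "mode": "telephone"},
    {"year": 1999, "geography": "Region A", "mode": "telephone"},
    {"year": 2003, "geography": "Region B", "mode": "web"},
]

def mode_by_decade(rows):
    """Tabulate data-collection mode by decade, treating metadata as an ordinary dataset."""
    return Counter((row["year"] // 10 * 10, row["mode"]) for row in rows)

if __name__ == "__main__":
    for (decade, mode), n in sorted(mode_by_decade(METADATA_ROWS).items()):
        print(f"{decade}s  {mode:<13} {n}")
```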
17. Metadata as data
- The technical documentation for the Army's Korean conflict casualty electronic records file has casualty codes that were never used in the data files.
- The presence of codes in the metadata for injury by lethal gas and by radiation exposure suggests that the Army personnel who designed this record-keeping system anticipated that gas and radiation might be used as weapons. Examination of the data alone would have missed this suggestion.
- The codes for "place of casualty" included, in addition to South Korea Sector and North Korea Sector, the Indo-China Sector, Tibet Sector, Mongolia Sector, Honan Sector (sic), Manchuria Sector, North Japan Sector, South Japan Sector, South China Sector, and Formosa Sector.
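The comparison behind the Korean conflict example sets codes documented in the metadata against values that actually appear in the data, and it amounts to a set difference. Below is a minimal sketch with a hypothetical codebook and hypothetical observed values (not the Army file's actual codes).

```python
# Hypothetical codebook (code -> label) and hypothetical observed values from a data file.
CODEBOOK = {1: "Sector A", 2: "Sector B", 3: "Sector C", 4: "Sector D"}
OBSERVED = [1, 2, 2, 1, 1, 2]

def unused_codes(codebook: dict, observed: list) -> dict:
    """Return codes documented in the metadata that never occur in the data."""
    seen = set(observed)
    return {code: label for code, label in codebook.items() if code not in seen}

def undocumented_values(codebook: dict, observed: list) -> set:
    """The reverse check: values in the data with no entry in the codebook."""
    return set(observed) - set(codebook)

if __name__ == "__main__":
    print(unused_codes(CODEBOOK, OBSERVED))         # {3: 'Sector C', 4: 'Sector D'}
    print(undocumented_values(CODEBOOK, OBSERVED))  # set()
```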
18. Metadata as data
- A researcher at the Danish Data Archive is doing
a qualitative analysis of the questionnaires used
in seven surveys about ethnic minorities in
Danish society, "with the purpose of showing how
surveys ... mirror and project societal
understandings of the subjects under
investigation."
19. Metadata as data
- Wendy Thomas of the Minnesota Population Center examined U.S. Census metadata from 1790 through 2000 and compared how the concepts of race and ethnicity, as embodied in the categories used in Census Bureau questions, changed over time. Those concepts are documented only in the metadata, not in the Census data files themselves.
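A comparison of this kind can be sketched as a difference between the category sets of consecutive years. The years and category names below are hypothetical placeholders, not actual Census categories.

```python
# Hypothetical category sets keyed by year; real Census categories are not reproduced here.
CATEGORIES_BY_YEAR = {
    1900: {"Category A", "Category B"},
    1950: {"Category A", "Category B", "Category C"},
    2000: {"Category A", "Category C", "Category D"},
}

def category_changes(by_year):
    """For each pair of consecutive years, list the categories added and dropped."""
    years = sorted(by_year)
    changes = []
    for earlier, later in zip(years, years[1:]):
        added = sorted(by_year[later] - by_year[earlier])
        dropped = sorted(by_year[earlier] - by_year[later])
        changes.append((earlier, later, added, dropped))
    return changes

if __name__ == "__main__":
    for earlier, later, added, dropped in category_changes(CATEGORIES_BY_YEAR):
        print(f"{earlier} -> {later}: added {added}, dropped {dropped}")
```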