Title: XML Web Services: XML Metadata in Acrobat PDF Files
- Brand Niemann
- XML Web Services Evangelist (My Internet Handle)
- US EPA Office of Environmental Information
- May 2, 2002
Overview
- 1. E-Gov E-Records Management Project
- 2. Adobe 5.0 Document Metadata
- 3. Repurposing Adobe PDF Documents
- 4. Large Legacy Document Collections
- 5. Contact Information
1. E-Gov E-Records Management Project
- NARA
  - Has the lead on the E-Gov E-Records Management Project.
  - Identified two study areas:
    - Accepting new electronic record formats such as scanned images, PDF, etc.
    - Develop archival and records management metadata to facilitate transfer of these records.
- EPA
  - Interested in leading or co-leading the second study area as part of ERDMS work.
1. E-Gov E-Records Management Project
- Two milestones
  - September 30, 2002
    - Define standardized organizational strategic approaches to phased implementation of electronic recordkeeping in a federal agency.
  - December 31, 2002
    - Develop the metadata record needed to use XML as the common language for transferring permanent e-records to NARA.
- Goal
  - Propose and pilot test common metadata stylesheets for NARA and federal agencies to use in transferring electronic records.
1. E-Gov E-Records Management Project
- Problem with Acrobat PDF files
  - NARA will not accept them because they are not XML.
  - NARA is in discussions with Adobe, Inc. about this problem.
- Adobe, Inc. position on XML
  - XML and PDF complement one another.
    - My comments
      - Actually, one can produce a PDF from XML by using an XSL Formatting Objects (XSL-FO) stylesheet.
  - An Acrobat plug-in provides PDF-to-XML conversion.
    - My comments
      - This only works if the PDF is of the third type (tagged), and it still needs additional work to make a useful XML document.
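The XML-to-PDF route mentioned in the comment above can be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow used at EPA or by Adobe: the `<record>` source document with `<title>` and `<para>` children is hypothetical, and the formatting-objects tree is built directly in Python rather than through an XSLT stylesheet, because the Python standard library has no XSLT processor. Turning the resulting XSL-FO into an actual PDF still requires an external FO formatter, which is not invoked here.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"  # XSL-FO namespace
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def xml_to_fo(source_xml):
    """Build a minimal XSL-FO document from a hypothetical <record> element."""
    record = ET.fromstring(source_xml)

    root = ET.Element(fo("root"))
    # Page layout: a single US Letter page master with one body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"),
                         {"master-name": "page",
                          "page-height": "11in",
                          "page-width": "8.5in"})
    ET.SubElement(page, fo("region-body"))

    # Content: one page sequence flowing into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})

    # The title becomes a bold block; each paragraph becomes a plain block.
    title = ET.SubElement(flow, fo("block"),
                          {"font-size": "16pt", "font-weight": "bold"})
    title.text = record.findtext("title", default="")
    for para in record.findall("para"):
        block = ET.SubElement(flow, fo("block"), {"space-before": "6pt"})
        block.text = para.text

    return ET.tostring(root, encoding="unicode")


# Hypothetical source document, for illustration only.
SAMPLE = """<record>
  <title>Sample Document</title>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</record>"""

fo_document = xml_to_fo(SAMPLE)
print(fo_document)
```

The point of the sketch is the direction of the conversion: the XML stays the source of record, and the PDF is one rendering derived from it.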
2. Adobe 5.0 Document Metadata
- Viewing Document Metadata
  - In Acrobat 5.0, Adobe PDF files contain Document Metadata in XML format. This Document Metadata contains (but is not limited to) information that is also in the Document Properties. Any changes made in the Acrobat Document Properties dialog box are reflected in the Document Metadata. Because Document Metadata is in XML format, it can be extended and modified using third-party products. You can copy and paste the Document Metadata XML source code.
- To view the Document Metadata
  - 1. Choose File, Document Properties, Document Metadata.
  - 2. The Document Metadata dialog box displays all the metadata embedded in the document. (Metadata is displayed by schema, that is, in predefined groups of related information.) The information associated with each schema is visible by default; it can be hidden by clicking the triangle next to the schema name. If a schema doesn't have a recognized name, it is listed as Unknown. The XML namespace is contained in parentheses after the schema name.
  - 3. To view the XML code, click View Source. You can cut, copy, and paste XML code from the Metadata Source View dialog box. Click OK to return to the Document Metadata dialog box.
  - 4. Click OK to close the Document Metadata dialog box, or click Cancel to close the dialog box without making any changes.
- See next slides.
2. Adobe 5.0 Document Metadata
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:iX='http://ns.adobe.com/iX/1.0/'>
  <rdf:Description about=''
      xmlns='http://ns.adobe.com/pdf/1.3/'
      xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>
    <pdf:ModDate>2001-07-30T17:32:38-06:00</pdf:ModDate>
    <pdf:CreationDate>2001-07-30T17:32:04-06:00</pdf:CreationDate>
    <pdf:Producer>Acrobat Distiller 4.05 for Windows</pdf:Producer>
  </rdf:Description>
  <rdf:Description about=''
      xmlns='http://ns.adobe.com/xap/1.0/'
      xmlns:xap='http://ns.adobe.com/xap/1.0/'>
    <xap:ModifyDate>2001-07-30T17:32:38-06:00</xap:ModifyDate>
    <xap:CreateDate>2001-07-30T17:32:04-06:00</xap:CreateDate>
  </rdf:Description>
</rdf:RDF>
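Because the metadata listed above is ordinary namespaced XML, it can be read with any XML parser once it has been copied out of the View Source dialog. The following is a minimal sketch using Python's standard library. The `read_properties` helper is a hypothetical name introduced for illustration, and the sample string is the slide's listing with its stripped punctuation restored, which is an assumption about the original text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Metadata as copied from the View Source dialog (punctuation restored;
# the reconstruction is an assumption about the original listing).
METADATA = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xmlns:iX='http://ns.adobe.com/iX/1.0/'>
  <rdf:Description about=''
      xmlns='http://ns.adobe.com/pdf/1.3/'
      xmlns:pdf='http://ns.adobe.com/pdf/1.3/'>
    <pdf:ModDate>2001-07-30T17:32:38-06:00</pdf:ModDate>
    <pdf:CreationDate>2001-07-30T17:32:04-06:00</pdf:CreationDate>
    <pdf:Producer>Acrobat Distiller 4.05 for Windows</pdf:Producer>
  </rdf:Description>
  <rdf:Description about=''
      xmlns='http://ns.adobe.com/xap/1.0/'
      xmlns:xap='http://ns.adobe.com/xap/1.0/'>
    <xap:ModifyDate>2001-07-30T17:32:38-06:00</xap:ModifyDate>
    <xap:CreateDate>2001-07-30T17:32:04-06:00</xap:CreateDate>
  </rdf:Description>
</rdf:RDF>"""


def read_properties(metadata_xml):
    """Return {(namespace, name): value} for each property element.

    Assumes every property is namespace-qualified, as in the sample above.
    """
    root = ET.fromstring(metadata_xml)
    properties = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local-name".
            namespace, _, name = prop.tag[1:].partition("}")
            properties[(namespace, name)] = (prop.text or "").strip()
    return properties


props = read_properties(METADATA)
for (namespace, name), value in sorted(props.items()):
    print(f"{namespace}{name} = {value}")
```

Keying each value by namespace as well as name keeps the two schemas apart, so the `pdf` and `xap` date fields stay distinct even though they carry the same timestamps.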
3. Repurposing Adobe PDF Documents
- See Acrobat 5.0 Help
  - Repurposing Adobe PDF Documents (pages 82-90) and Working with PDF (pages 103-107)
  - Creating tagged Adobe PDF documents (need to do for accessibility anyway).
  - Saving Adobe PDF documents to other formats (RTF and XML). See next slides.
- But still need XML authoring tools and expertise
  - I have done this for lots of EPA documents in my XML Web Services training.
4. Large Legacy Document Collections
- Need industrial-strength XML tools and software platforms for efficient, cost-effective electronic document management solutions (cf.)
  - eXtensible Markup Language (XML) Web Services for Legacy Document Collections, Brand Niemann and David Eng, April 5, 2002, to appear in InfoAccess.
- XML Web Services Training (cf.)
  - Unit 14 Toxics Release Data.
  - Unit 18 Superfund Data (see next slides).
5. Contact Information
- Brand Niemann, Ph.D.
- USEPA Headquarters, EPA West, Room 6143D
- Office of Environmental Information, MC 2822T
- 1200 Pennsylvania Avenue, NW, Washington, DC 20460
- 202-566-1657
- niemann.brand_at_epa.gov
- EPA: http://161.80.70.167
- Outside EPA: http://130.11.44.140