Classification of Business Documents - PowerPoint PPT Presentation

Title:

Classification of Business Documents

Provided by: michae146

Transcript and Presenter's Notes

Title: Classification of Business Documents


1
Classification of Business Documents
  • DITA BusDocs Subcommittee Meeting
  • January 14th, 2008
  • Presentation with Notes from the Meeting

2
Meeting Summary
  • Classification focus group members include Howard
    Schwartz, Eric Severson, Amber Swope, and Michael
    Boses. Howard was not able to attend the meeting
    due to travel
  • Michael presented the enclosed PowerPoint as a
    starting point for the discussion
  • Discussion was captured and incorporated into the
    PowerPoint under the heading, Notes
  • Next steps
  • Eric will work on a preliminary mapping of a
    limited number of document types that illustrate
    the mapping
  • The focus group will present a summary of what we
    have discussed to the full subcommittee during
    the January 21 meeting

3
Introduction - 1
  • The need for a classification system for business
    documents arises from
  • The desire to identify the specific document set
    that is being addressed by the subcommittee, as
    well as the rationale behind that selection
  • The ability to further analyze the document set
    using a refinement of the same characteristics
    used to classify them

4
Introduction - 2
  • What types of characteristics are important?
  • Documents can be classified in many ways. The
    most common way used is a semantic classification
    based upon the textual content of the document
  • The subcommittee approach is different: we want
    to classify documents based upon their structural
    characteristics, since it is the structure of
    business documents that will need to be
    harmonized with DITA

5
Potential Structural Characteristics to Consider
when Classifying
  • Is it a narrative?
  • Narrative complexity
  • Document length
  • Tree depth
  • Tree balance
  • Table frequency
  • Table complexity
  • Graphic frequency
  • XML vocabularies
  • Transclusions
  • Notes: Eric feels that repetitive structures will
    be an important characteristic
  • Amber suggests that whether a document references
    external system data might be important as well
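Two of the characteristics listed above, tree depth and the table and graphic frequencies, can be measured directly from an XML instance. The following is a minimal sketch, not a method from the meeting. It assumes the document is available as XML, and the element names `table` and `image` are hypothetical placeholders for whatever the actual vocabulary uses.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def max_depth(element):
    """Return the depth of the deepest element, counting the root as 1."""
    return 1 + max((max_depth(child) for child in element), default=0)


def tag_counts(root):
    """Count how often each element name occurs in the whole tree."""
    return Counter(el.tag for el in root.iter())


# Hypothetical sample document; the element names are illustrative only.
sample = ET.fromstring(
    "<doc><section><p>x</p><table><row/></table></section><image/></doc>"
)
depth = max_depth(sample)
counts = tag_counts(sample)
print(depth, counts["table"], counts["image"])
```

Dividing a count by the total number of elements, or by the document length, would turn it into a frequency. Which denominator to use is left open here.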

6
First-level Classification
  • Notes: While the concept is good, none of us is
    happy with the terminology. In particular, we
    need to come up with an alternative for Forms.

7
Form-Narrative Scale
Subject Document
  • Metric
  • Ratio of total elements to total words
  • Notes: Eric: What is a form? How do we keep from
    excluding documents with structures that we need
    to address because we called them forms? We need
    something to describe a form that isn't based
    upon its implementation. XML blurs the
    distinction between documents and data
  • A: Elements are structural in nature. We need
    to define what type of elements we will use to
    arrive at the ratio
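The form-narrative metric (ratio of total elements to total words) can be sketched as follows. This is an illustration under stated assumptions, not the subcommittee's definition. It counts every element, although the notes leave open which element types should count, and it treats whitespace-separated tokens as words. The function name `element_word_ratio` is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_word_ratio(xml_text):
    """Ratio of total elements to total words in an XML document.

    Assumption: every element counts, and a word is any
    whitespace-separated token in the text content.
    """
    root = ET.fromstring(xml_text)
    element_count = sum(1 for _ in root.iter())
    word_count = sum(len(text.split()) for text in root.itertext())
    if word_count == 0:
        return float("inf")  # all structure, no words
    return element_count / word_count


form_like = "<form><name>Ann</name><id>7</id><dept>HR</dept></form>"
narrative = "<doc><p>This is a long paragraph of plain narrative text.</p></doc>"
print(element_word_ratio(form_like), element_word_ratio(narrative))
```

Under these assumptions a form-like document scores higher than a narrative one, because it has many elements wrapped around few words.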

8
Most Significant Characteristic?
  • Once we have established that it is a narrative
    document, what is the next most significant
    characteristic to examine?
  • Notes: General agreement with the presentation
    that it would be the tree depth of the document

9
The Need to Quantify Hierarchy
  • The author of the highly nested document is using
    structure to communicate semantics.
  • Hierarchical Scale
  • Ratio of total transitions in hierarchy to total
    elements
  • Notes: General agreement. No specific comments
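The hierarchical scale (ratio of total transitions in hierarchy to total elements) could be computed along these lines. The slides do not define a "transition". This sketch assumes it means a change of nesting depth between two consecutive elements in document order. That definition and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def depths_in_document_order(root):
    """Return the nesting depth of each element, in document order."""
    depths = []
    stack = [(root, 1)]
    while stack:
        element, depth = stack.pop()
        depths.append(depth)
        # Push children in reverse so the first child is visited next.
        for child in reversed(list(element)):
            stack.append((child, depth + 1))
    return depths


def hierarchy_ratio(xml_text):
    """Ratio of depth transitions to total elements (assumed definition)."""
    depths = depths_in_document_order(ET.fromstring(xml_text))
    transitions = sum(1 for a, b in zip(depths, depths[1:]) if a != b)
    return transitions / len(depths)


flat = "<doc><p/><p/><p/><p/></doc>"
nested = "<doc><s><s><p/></s></s><s><p/></s></doc>"
print(hierarchy_ratio(flat), hierarchy_ratio(nested))
```

With this definition a flat list of paragraphs scores low and a deeply nested document scores close to 1.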

10
Qualifying Narrative Density
  • Narrative Density Scale
  • Average paragraph length for paragraphs 100
    characters
  • Notes: No specific comments
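A narrative density measure based on average paragraph length might look like the sketch below. The transcript does not say how the 100-character figure is applied. The sketch assumes it is a lower bound, so that only paragraphs longer than the threshold are averaged, and it exposes the threshold as a parameter. Both choices are assumptions, as is the function name.

```python
def narrative_density(paragraphs, threshold=100):
    """Average length, in characters, of paragraphs longer than threshold.

    Assumption: the threshold is a lower bound that filters out
    headings and other short fragments before averaging.
    """
    counted = [len(p) for p in paragraphs if len(p) > threshold]
    if not counted:
        return 0.0
    return sum(counted) / len(counted)


# Hypothetical input: one short heading and two long paragraphs.
paragraphs = ["Short heading", "a" * 120, "b" * 180]
print(narrative_density(paragraphs))
```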

11
Recap of Characteristic Importance
  • Is it a Narrative?
  • Narrative complexity
  • Document length
  • Tree depth
  • Tree balance
  • Table frequency
  • Table complexity
  • Graphic frequency
  • XML vocabularies
  • Transclusions
  • Notes: Eric: We need to address repetitive
    structures (i.e., topics) and constrained
    structures. What do repetitive structures and
    constrained structures mean to DITA?
  • Michael: The number of paragraphs per section
    seems important, but what is a section?

12
Notes: Additional Discussion
  • Discussion of an SOP as it relates to repeating
    structures
  • One approach to an SOP is for it to be very
    verbose, with only 4-5 structures
  • Another approach is for it to be very terse, with
    20 structures that add semantics to the content.
  • The goal of XML in general, when applied to
    narrative documents, is to imply more and more of
    the semantics through the document structure
  • Document linearity with repeating structures as
    a structural characteristic provides random
    access to the information in the document.
  • Repetitive structures appear to be as important a
    characteristic as the tree depth, if not more.
    Repetitive structures to a degree indicate
    whether the document is a reference or something
    intended to be read end-to-end.
  • Repetitive structures cause a document to
    actually be a collection of mini-documents, each
    of which could stand alone
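One way to put a number on repetitive structures is to measure how many elements share their name with a sibling. This is a rough sketch, not a metric proposed in the meeting. The definition (the share of child elements whose tag occurs more than once under the same parent) and the function name are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repetition_ratio(xml_text):
    """Share of child elements that repeat a sibling's tag (assumed metric)."""
    root = ET.fromstring(xml_text)
    total = 0
    repeated = 0
    for parent in root.iter():
        counts = Counter(child.tag for child in parent)
        total += sum(counts.values())
        repeated += sum(n for n in counts.values() if n > 1)
    return repeated / total if total else 0.0


# Hypothetical SOP with three repeating steps, and a non-repeating document.
sop = (
    "<sop>"
    "<step><title/><action/></step>"
    "<step><title/><action/></step>"
    "<step><title/><action/></step>"
    "</sop>"
)
linear = "<doc><intro/><body/><summary/></doc>"
print(repetition_ratio(sop), repetition_ratio(linear))
```

A higher value suggests a reference-style document made of stand-alone units. A value near zero suggests a document meant to be read end to end.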