Classification of Business Documents - PowerPoint PPT Presentation

About This Presentation
Title:

Classification of Business Documents

Slides: 16
Provided by: michae146
Transcript and Presenter's Notes


1
Classification of Business Documents
  • DITA BusDocs Subcommittee Meeting
  • 21 January 2008
  • Presentation with Notes from the Focus Group
    Meeting of 14 Jan 2008

2
Meeting Summary
  • Classification focus group members include Howard
    Schwartz, Eric Severson, Amber Swope, and Michael
    Boses. Howard was not able to attend the meeting
    due to travel
  • Michael presented the enclosed PowerPoint as a
    starting point for the discussion
  • Discussion was captured and incorporated into the
    PowerPoint under the heading, Notes
  • Next steps
  • Eric will work on a preliminary mapping of a
    limited number of document types that illustrate
    the mapping
  • The focus group will present a summary of what we
    have discussed to the full subcommittee during
    the January 21 meeting

3
Introduction - 1
  • The need for a classification system for business
    documents arises from
  • The desire to identify the specific document set
    that is being addressed by the subcommittee, as
    well as the rationale behind that selection
  • The ability to further analyze the document set
    using a refinement of the same characteristics
    used to classify them

4
Introduction - 2
  • What types of characteristics are important?
  • Documents can be classified in many ways. The
    most common way is a semantic classification
    based upon the textual content of the document
  • The subcommittee approach is different: we
    want to classify documents based upon their
    structural characteristics, since it is the
    structure of business documents that will need to
    be harmonized with DITA

5
Potential Structural Characteristics to Consider
when Classifying
  • Is it a narrative?
  • Narrative complexity
  • Document length
  • Tree depth
  • Tree balance
  • Table frequency
  • Table complexity
  • Graphic frequency
  • XML vocabularies
  • Transclusions
  • Notes: Eric feels that repetitive structures will
    be an important characteristic
  • Amber suggests that whether a document references
    external system data might be important as well
  • Howard: Understanding the business purpose might
    be important as a characteristic.
  • Eric: It could be interesting, but maybe not the
    driver
  • We will capture the information as part of the
    analysis
  • Ann: It's possibly a different level of
    classification
  • Josef: Translation should not in itself change
    the structure, but perhaps what we want to look
    at is documents with variants in them.
  • Howard: Business documents will have different
    challenges than technical publications

6
  • Higher level model
  • Structures that are not linked to semantics that
    can then be correlated to documents for different
    usage
  • The end-game is to say where DITA fits in
  • A semantically neutral way of classifying
  • Apply the general to specific usages later
  • Eric: Concept, task, and reference were
    specializations to begin with; are they even
    meaningful for business documents?
  • Howard: Informational vs. persuasive? Intent or
    purpose: does it correlate to structure, does it
    dictate structure, does it matter for reuse?

7
First-level Classification
  • Notes: While the concept is good, none of us is
    happy with the terminology. In particular, we
    need to come up with an alternative for Forms.
  • The purpose of this slide is to say that there
    are business documents that are out-of-scope.
    This is our first level?

8
Form-Narrative Scale
Subject Document
  • Metric
  • Ratio of total elements to total words
  • Notes: Eric: What is a form? How do we keep from
    excluding documents with structures that we need
    to address because we called them forms? We need
    something to describe a form that isn't based
    upon its implementation. XML blurs the distinction
    between documents and data
  • A: Elements are structural in nature. We need
    to define what type of elements we will use to
    arrive at the ratio
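
A minimal sketch of the slide's metric, assuming the document is available as XML. As the note above says, the group has not yet decided which element types count, so this version simply counts every element and every whitespace-separated word. The function name `element_word_ratio` and the two sample documents are illustrative, not from the presentation.

```python
import xml.etree.ElementTree as ET


def element_word_ratio(xml_text: str) -> float:
    """Return total elements divided by total words in an XML document."""
    root = ET.fromstring(xml_text)
    # Assumption: every element counts, including the root.
    elements = sum(1 for _ in root.iter())
    # Assumption: a word is any whitespace-separated token of text content.
    words = sum(len(text.split()) for text in root.itertext())
    if words == 0:
        return float("inf")  # all structure, no text
    return elements / words


# Hypothetical samples: one form-like, one narrative.
form_like = "<form><name>Ann</name><date>2008-01-14</date></form>"
narrative = (
    "<doc><p>The subcommittee met to discuss how documents "
    "are classified.</p></doc>"
)

print(element_word_ratio(form_like))
print(element_word_ratio(narrative))
```

Under these assumptions a higher ratio suggests a more form-like document and a lower ratio a more narrative one.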

9
Most Significant Characteristic?
  • Once we have established that it is a narrative
    document, what is the next most significant
    characteristic to examine?
  • Notes: General agreement with the presentation
    that it would be the tree depth of the document
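
A minimal sketch of how tree depth could be measured for an XML document. The presentation does not define depth precisely, so this assumes the root element is at depth 0 and that the depth of the document is the depth of its deepest element. The name `tree_depth` is illustrative.

```python
import xml.etree.ElementTree as ET


def tree_depth(xml_text: str) -> int:
    """Return the depth of the deepest element, with the root at depth 0."""
    root = ET.fromstring(xml_text)
    deepest = 0
    # Walk the tree iteratively, tracking the depth of each element.
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        for child in node:
            stack.append((child, depth + 1))
    return deepest


# Hypothetical sample: one section nested inside another.
sample = "<doc><sec><sec><p/></sec></sec><p/></doc>"
print(tree_depth(sample))
```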

10
  • Eric: DITA is trying to apply best practices to
    writing. Is this a fundamental thing about
    writing, or is it just tech pubs?
  • Should there be a more generic task that could be
    specialized into a tech pubs task?
  • Ann: What we have now is a specialization for
    tech docs, and so it fits. It is possible to start
    higher, at a more generalized level
  • It is interesting that paragraphs have topic
    sentences. The topic sentence may be an important
    bridge that allows us to introduce the concept of
    topic-based authoring to the business community
  • Business documents are maturing; are tech docs
    more mature? Tech docs are most often not read
    for pleasure and are random-access information
  • Writing for reuse has a significant impact on how
    content is written; does it invalidate some of our
    common business document structures?

11
  • Types of reuse
  • The ability to flow one person's content into
    another person's content and have it hold up
    contextually
  • The ability to have content presented as a result
    of a query or aggregation and have it hold its
    integrity as a single unit of information
  • Will the message change depending upon how
    someone arrived at it, either in the original
    context or by itself?
  • All this ties back to the maturity model that
    will help organizations move to a best practice
    approach to authoring. This will give us
    something valuable for business and acceptable to
    the DITA community.
  • Now our classification can also correlate to this
    issue.

12
The Need to Quantify Hierarchy
  • The author of the highly nested document is using
    structure to communicate semantics.
  • Hierarchical Scale
  • Ratio of total transitions in hierarchy to total
    elements
  • Notes: General agreement. No specific comments
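
A minimal sketch of the hierarchical scale. The slide does not say what counts as a "transition in hierarchy", so this assumes a transition is any change in nesting depth between two consecutive elements in document order. The function names and the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET


def depths_in_document_order(root):
    """Yield the nesting depth of each element, in document order."""
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        yield depth
        # Push children in reverse so they pop in document order.
        for child in reversed(list(node)):
            stack.append((child, depth + 1))


def hierarchy_ratio(xml_text: str) -> float:
    """Return depth transitions divided by total elements."""
    depths = list(depths_in_document_order(ET.fromstring(xml_text)))
    # Assumption: a transition is a depth change between neighbours.
    transitions = sum(1 for a, b in zip(depths, depths[1:]) if a != b)
    return transitions / len(depths)


# Hypothetical samples: a flat document and a nested one.
flat = "<doc><p/><p/><p/><p/></doc>"
nested = "<doc><sec><sub><p/></sub></sec><sec><p/></sec></doc>"

print(hierarchy_ratio(flat))
print(hierarchy_ratio(nested))
```

With this reading, a flat run of sibling paragraphs scores low and a highly nested document scores close to 1.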

13
Qualifying Narrative Density
  • Narrative Density Scale
  • Average paragraph length for paragraphs > 100
    characters
  • Notes: No specific comments
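
A minimal sketch of the narrative density scale, assuming paragraphs have already been extracted as plain strings and that length means character count. The function name `narrative_density` and the `min_chars` parameter are illustrative; the 100-character threshold comes from the slide.

```python
def narrative_density(paragraphs, min_chars=100):
    """Average length of the paragraphs longer than min_chars characters."""
    long_paragraphs = [p for p in paragraphs if len(p) > min_chars]
    if not long_paragraphs:
        return 0.0  # no paragraph exceeds the threshold
    return sum(len(p) for p in long_paragraphs) / len(long_paragraphs)


# Hypothetical paragraphs of 50, 100, 120, and 200 characters.
paragraphs = ["x" * 50, "a" * 100, "y" * 120, "z" * 200]
print(narrative_density(paragraphs))
```

Short paragraphs such as headings, labels, and list items fall under the threshold and do not pull the average down.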

14
Recap of Characteristic Importance
  • Is it a Narrative?
  • Narrative complexity
  • Document length
  • Tree depth
  • Tree balance
  • Table frequency
  • Table complexity
  • Graphic frequency
  • XML vocabularies
  • Transclusions
  • Notes: Eric: We need to address repetitive
    structures (i.e., topics) and constrained
    structures. What do repetitive structures and
    constrained structures mean to DITA?
  • Michael: The number of paragraphs per section
    seems important, but what is a section?

15
Notes: Additional Discussion
  • Discussion of an SOP as it relates to repeating
    structures
  • One approach to an SOP is for it to be very
    verbose, with only 4-5 structures
  • Another approach is for it to be very terse, with
    20 structures that add semantics to the content.
  • The goal of XML in general when applied to
    narrative documents, is to imply more and more of
    the semantics through the document structure
  • Document linearity with repeating structures as
    a structural characteristic provides random
    access to the information in the document.
  • Repetitive structures appear to be as important a
    characteristic as the tree depth, if not more.
    Repetitive structures to a degree indicate
    whether the document is a reference or something
    intended to be read end-to-end.
  • Repetitive structures cause a document to
    actually be a collection of mini-documents, each
    of which could stand alone
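
The discussion does not propose a metric for repetitive structures. One possible sketch, offered only as an assumption, is the share of child elements that have at least one sibling with the same tag name. The function name `repetition_ratio` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def repetition_ratio(xml_text: str) -> float:
    """Share of child elements whose tag repeats among their siblings."""
    root = ET.fromstring(xml_text)
    repeated = 0
    total = 0
    for parent in root.iter():
        # Count how often each tag occurs among this parent's children.
        counts = Counter(child.tag for child in parent)
        total += sum(counts.values())
        repeated += sum(n for n in counts.values() if n > 1)
    return repeated / total if total else 0.0


# Hypothetical samples: an SOP built from repeated steps, and a
# document whose parts are all different.
sop = (
    "<sop>"
    "<step><title/><body/></step>"
    "<step><title/><body/></step>"
    "<step><title/><body/></step>"
    "</sop>"
)
one_off = "<doc><intro/><body/><summary/></doc>"

print(repetition_ratio(sop))
print(repetition_ratio(one_off))
```

In the SOP sample the three repeated `step` elements are the mini-documents the notes describe; the one-off document has no repetition at all.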