1
Clumps and Runners
Bamboo Workshop, Tucson AZ, January 13, 2009
John Unsworth
2
Exchange of ideas
  • A substantial amount of the work of the
    humanities is carried out in the form of the
    exchange of ideas. -- Anthony Cascardi
  • No ideas but in things -- William Carlos
    Williams

3
Things in the humanities
  • the whole variety of material cultural heritage,
    from images to sound to text to maps to
    architecture, landscape, and all manner of
    carrier media ... including sheep DNA in
    parchment.

4
Representing things
  • ... a problem usually discussed at a high level
    of abstraction, in digital humanities. I'd like
    to discuss it at a very prosaic level, instead,
    with text objects as the example.

5
TEI
  • "The Text Encoding Initiative Consortium is an
    international organization whose mission is to
    develop and maintain guidelines for the digital
    encoding of literary and linguistic texts. The
    Consortium publishes the Text Encoding Initiative
    Guidelines for Electronic Text Encoding and
    Interchange: an international and
    interdisciplinary standard that is widely used by
    libraries, museums, publishers, and individual
    scholars to represent all kinds of textual
    material for online research and teaching."
  • http://www.tei-c.org/About/index.xml

6
Lessons of Google?
  • But why do any mark-up at all? Isn't the lesson
    of Google that all that really matters is the
    word, and tagging is superfluous? Not if you want
    to select out of Google Books:
  • Fiction
  • Books about England
  • Books published in the 1900s
  • Books written by women
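
The kind of selection listed above depends on metadata, not on the words alone. A minimal sketch, assuming a toy catalog: the `Record` fields, the sample entries, and the reading of "the 1900s" as 1900 to 1999 are all invented for illustration and do not describe Google Books or any real catalog.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical metadata fields; a real catalog would have many more.
    title: str
    genre: str
    subject: str
    year: int
    author_gender: str


# Invented sample records, for illustration only.
CATALOG = [
    Record("Toy Novel A", "fiction", "England", 1905, "female"),
    Record("Toy Novel B", "fiction", "France", 1905, "female"),
    Record("Toy History C", "history", "England", 1903, "male"),
    Record("Toy Novel D", "fiction", "England", 1850, "female"),
]


def select(catalog, year_range=None, **criteria):
    """Return records whose fields equal every criterion and whose
    year falls inside the inclusive year_range, if one is given."""
    hits = []
    for rec in catalog:
        if any(getattr(rec, field) != value for field, value in criteria.items()):
            continue
        if year_range is not None and not (year_range[0] <= rec.year <= year_range[1]):
            continue
        hits.append(rec)
    return hits


selected = select(
    CATALOG,
    year_range=(1900, 1999),
    genre="fiction",
    subject="England",
    author_gender="female",
)
print([rec.title for rec in selected])
```

None of these four criteria can be answered from the words of the text by full-text search alone; each one needs a field that someone has recorded.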

7
But structural markup? Really?
  • It turns out that, more often than not,
    paragraphs and chapters, lines and verses, are
    meaningful units of composition, and therefore
    they can be meaningful units of analysis. It can
    also be important to recognize the difference
    between the main text of a work (the chapters in
    a novel, say) and the paratext (table of
    contents, preface, running headers, etc.),
    especially if you're asking statistical questions
    about the text.
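
As a rough illustration of why this matters for statistics, the sketch below counts words in a tiny TEI-like sample twice: once over everything, and once over paragraphs in the body only. The sample text is invented, and it omits the TEI namespace and header for brevity, so treat it as an assumption-laden toy and not as valid TEI.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample: <front> holds paratext, <fw> holds a running header.
SAMPLE = """<text>
  <front>
    <div type="contents"><p>Chapter One. Chapter Two.</p></div>
  </front>
  <body>
    <div type="chapter">
      <fw type="header">A Toy Novel</fw>
      <p>The rain fell on the town.</p>
      <p>The town slept.</p>
    </div>
  </body>
</text>"""


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def all_word_counts(xml_string):
    """Count every word in the document, paratext included."""
    root = ET.fromstring(xml_string)
    return Counter(words("".join(root.itertext())))


def body_word_counts(xml_string):
    """Count only words inside <p> elements under <body>."""
    root = ET.fromstring(xml_string)
    counts = Counter()
    for p in root.findall("./body//p"):
        counts.update(words("".join(p.itertext())))
    return counts


print(all_word_counts(SAMPLE)["chapter"])   # counted from the table of contents
print(body_word_counts(SAMPLE)["chapter"])  # absent from the main text
```

Without the structural tags, "chapter" would look like a word the novel uses, when it occurs only in the table of contents.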

8
Surely we don't need to mess with the words, though...
  • ... at the word level, it can be helpful (in a
    novel, for example) to ignore proper names, so
    as to see more clearly what's going on with
    other kinds of words--but even at the word
    level, the information about the words is
    carried, ultimately, in tagging.
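
A minimal sketch of the point about proper names, assuming the names have already been tagged. `persName`, `placeName`, and `name` are TEI element names, but the sample sentence is invented and the namespace is omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Elements whose content should be skipped when collecting words.
NAME_TAGS = {"name", "persName", "placeName"}

# Invented sample sentence with tagged proper names.
SAMPLE_P = (
    "<p><persName>Emma</persName> walked to "
    "<placeName>Highbury</placeName> and "
    "<persName>Emma</persName> smiled.</p>"
)


def text_without_names(elem):
    """Return the text of elem, dropping the content of name elements
    but keeping the text that follows them (their tail)."""
    parts = []
    if elem.tag not in NAME_TAGS:
        parts.append(elem.text or "")
        for child in elem:
            parts.append(text_without_names(child))
            parts.append(child.tail or "")
    return "".join(parts)


def words_without_names(xml_string):
    """Tokenize the text that remains once tagged names are dropped."""
    text = text_without_names(ET.fromstring(xml_string))
    return re.findall(r"[a-z']+", text.lower())


print(words_without_names(SAMPLE_P))
```

The filter itself is trivial; the work lies in the tagging that tells it which words are names.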

9
Interoperability
  • The TEI community believes ... that "people use
    TEI in many different contexts for many different
    purposes to encode many different kinds of
    material."  But they also believe that this
    somehow, in some universe, achieves the TEI's
    stated goal of interoperability.  It really
    doesn't.  So if people are in fact encoding
    things in all sorts of different ways and for
    different purposes, then why shouldn't I chuck it
    all and roll my own?  You say that it's better
    not to go it alone.
  • (Steve Ramsay, in email)

10
Interoperability?
  • As we are coming to the end of this project --
    and returning to an earlier exchange of views on
    the Monk list about interoperable texts -- I can't
    refrain from pointing to the large amount of
    needless and heedless divergence. There is good
    and bad news about it. The bad news is that it
    has caused a lot of work. The good news is that a
    very high percentage of problems can be solved
    quite satisfactorily by supplementary conventions
    to the content rules of elements. If, for
    instance, the people who slapped the Level 4
    Guidelines together had spent two hours making
    recommendations on what to do or not to do
    about soft hyphens at the end of a line or page
    when you encode a text according to Level 4
    Guidelines, we'd have a lot less grief. And so it
    goes with a lot of other little stuff.
  • (Martin Mueller, in email)
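
To make the soft-hyphen example concrete, here is a sketch of what one such supplementary convention could look like in practice. The convention itself is hypothetical and is not taken from any guidelines: it assumes a soft hyphen at the end of a line is transcribed as U+00AD, while a real hyphen is transcribed as "-".

```python
import re

SOFT_HYPHEN = "\u00ad"

# A soft hyphen, optional trailing spaces, a line break, and any
# indentation on the following line.
_SOFT_BREAK = re.compile(SOFT_HYPHEN + r"[ \t]*\n[ \t]*")


def join_soft_hyphens(text):
    """Rejoin words split by a soft hyphen at a line end.
    Hard hyphens ("-") before a line break are left untouched."""
    return _SOFT_BREAK.sub("", text)


broken = "inter\u00ad\n  operable texts and well-\nknown works"
print(join_soft_hyphens(broken))
```

If every collection followed one agreed rule like this, a single small function would serve all of them. Where each collection invents its own rule, every consumer has to reverse-engineer each one.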

11
Clumps and Runners
  • So, we have data in clumps, in collections that
    are curated and hosted by libraries, publishers,
    and others; what we need are the runners that
    connect those clumps, and what we'll discover
    when we have them is that data doesn't move
    between clumps very successfully. That's a
    problem that nobody is really dealing with, and
    that's a role for Bamboo. I'm not recommending
    that Bamboo become a standards group, but rather
    that Bamboo attend to the actual problems of data
    interchange and interoperability, in actual data
    domains, for actual research projects, and that
    it collect and synthesize the experience of
    actual practice and turn it back to the stewards
    of content, to reduce needless and heedless
    divergence.