Symposium on Best Practice


1
Ensuring that digital data last
  • The priority of archival form over working
    form and presentation form
  • Gary Simons, SIL International

2
A paradox of writing history
  • The more advanced the writing technology, the
    less durable the written product.
  • From most durable to least durable:
    • Clay tablets and stone
    • Vellum
    • Papyrus
    • Paper
    • Digital word processing

3
Storage media are ephemeral
  • Life expectancy of digital storage media:
    • Magnetic tape: 10 to 20 years
    • CD-R (write once)
      • Manufacturers say 100 to 200 years
      • Independent lab says 30 years
    • CD-RW (write many times)
      • Manufacturers say 25 years

4
Hardware devices are ephemeral
  • Advances in removable media on personal computers
    over 25 years:
    • 8-inch floppies
    • 5.25-inch floppies
    • 3.5-inch floppies
    • Zip drives
    • CD-Rs
    • DVD-Rs

5
Software formats are ephemeral
  • Software vendors change file formats and
    functionality with each version.
  • When we use a proprietary, single-vendor format,
    we lose access to the data when the software is
    obsolete.
  • For instance:
    • Microsoft Word files from the 1980s cannot be
      read by current versions of Word.

6
An impending Digital Dark Age
  • Future historians may see our present age as
    another Dark Ages since so much information
    documenting our current civilization is recorded
    digitally and will have vanished.
  • If linguists fail to act in time, our digital
    data records are in danger of dying out before
    the endangered languages we are seeking to
    document.

7
What's a linguist to do?
  • Do two things to ensure that digital data endure
    long into the future:
    • Put the materials into an enduring file format.
    • Deposit the materials with an archive that will
      make a practice of periodically migrating them to
      new storage media as needed.
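
The slides do not say how an archive confirms that a migration preserved the data. One common approach, sketched here as an assumption rather than as the speaker's method, is to compare a checksum taken before the copy with one taken after it. The function names and the file name are hypothetical, and temporary directories stand in for the old and new media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source, destination):
    """Copy a file to new storage and report whether the copy is bit-identical."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    return sha256_of(destination) == before


if __name__ == "__main__":
    # Hypothetical file name; the two temporary directories play the
    # roles of the old and the new storage media.
    with tempfile.TemporaryDirectory() as old_media, \
            tempfile.TemporaryDirectory() as new_media:
        original = Path(old_media) / "dictionary.xml"
        original.write_text("<entry><headword>aha</headword></entry>",
                            encoding="utf-8")
        print(migrate(original, Path(new_media) / "dictionary.xml"))
```

A check like this only shows that the bits survived the move. Whether the bits can still be interpreted depends on the file format, which is the first of the two points above.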

8
Forms contrasted by function
  • Working form
    • The form in which information is stored as it is
      created and edited.
  • Presentation form
    • The form in which information is presented to the
      public.
  • Archival form
    • The form in which information is stored for
      access long into the future.

9
The problem
  • Popular working forms (like Microsoft Word or
    database applications) are not suitable archival
    forms.
  • Popular presentation forms (like dynamic web
    pages) are not suitable archival forms.
  • Linguists tend to focus on working form and
    presentation form; they must look beyond these to
    create enduring work.

10
Unacceptable practice
  • The form that is archived is a binary working
    form that requires a specific piece of software,
    e.g.,
    • .DOC, .XLS, .PPT, .MDB
    • A format supported by homemade software
  • The information will cease to exist when the
    required software ceases to work on the hardware
    in use.

11
Minimally acceptable practice
  • The form that is archived is a presentation form
    based on an open format supported by multiple
    vendors, e.g.,
    • HTML, PDF
  • The good news
    • A snapshot of how you presented the information
      will persist.
  • The bad news
    • It is a dead-end format: the information is not
      repurposeable.

12
Best practice
  • The form that is archived preserves all of the
    information (including its structure) in such a
    way that it is portable and repurposeable.
    • Descriptive XML markup
  • An XML archival form is not a dead end:
    • It may be reloaded into a working form.
    • New presentation forms may be generated from it.

13
A sample presentation form
  • From a dictionary of Sikaiana, Solomon Islands:

aha na the shell tool used for measuring the
spaces between mesh in nets (seu manu, kupena).

ahaa (from PPN afaa) n a cyclone, a tidal
wave.

aaha 1. vt to open up, to push apart, as
in pushing apart branches in order to look
through. 2. vt to open up a new settlement or
start a new garden. 3. vt to start, to begin a
new project or way of life. Tapa mai a koe ko
hano i mua ki aaha te ala o te taina, 'you called
upon me to go first (to school) to open the way
for my brother (MS)'.

14
Unacceptable practice
  • If you archive a .DOC file, this is what future
    generations will see when they open it

15
Minimally acceptable practice
  • If you archive an HTML presentation, this is what
    future generations will see:

<P><B>aha</B> <I>na</I> the shell tool used for
measuring the spaces between mesh in nets (<I>seu
manu, kupena</I>).</P>
<P><B>ahaa</B> (from PPN afaa) <I>n</I> a cyclone,
a tidal wave.</P>
<P><B>aaha</B> 1. <I>vt</I> to open up, to push
apart, as in pushing apart branches in order to
look through. 2. <I>vt</I> to open up a new
settlement or start a new garden. 3. <I>vt</I> to
start, to begin a new project or way of life.
<I>Tapa mai a koe ko hano i mua ki aaha te ala o
te taina,</I> 'you called upon me to go first (to
school) to open the way for my brother (MS)'.</P>
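
One way to see why presentation markup is a dead end is to ask a program what the italics mean. The sketch below is illustrative only: it uses Python's standard html.parser on the first two entries and collects every italic span. The class name ItalicCollector is a hypothetical helper, not something from the slides.

```python
from html.parser import HTMLParser

# The first two entries from the sample, as presentation HTML.
HTML_ENTRIES = (
    "<P><B>aha</B> <I>na</I> the shell tool used for measuring the "
    "spaces between mesh in nets (<I>seu manu, kupena</I>).</P>"
    "<P><B>ahaa</B> (from PPN afaa) <I>n</I> a cyclone, a tidal wave.</P>"
)


class ItalicCollector(HTMLParser):
    """Collect the text content of every <I> element."""

    def __init__(self):
        super().__init__()
        self.in_italic = False
        self.italics = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "i":
            self.in_italic = True
            self.italics.append("")

    def handle_endtag(self, tag):
        if tag == "i":
            self.in_italic = False

    def handle_data(self, data):
        if self.in_italic:
            self.italics[-1] += data


collector = ItalicCollector()
collector.feed(HTML_ENTRIES)
collector.close()
print(collector.italics)  # ['na', 'seu manu, kupena', 'n']
```

The list mixes part-of-speech labels with vernacular words. The markup records only that each span was italic, so a program cannot tell the two kinds apart without guessing.
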
16
Best practice
  • If you archive descriptive XML markup, this is
    what future generations will see
  • Future generations (though they lack our current
    working tools) will be able to:
    • See and understand the information
    • Load it into their own working tools
    • Create modern presentation forms
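
The XML shown on the original slide is not preserved in this transcript. The sketch below therefore uses hypothetical element names (entry, headword, etymology, sense, pos, definition), which are assumptions and not the speaker's schema. It shows the two uses named above: loading the archival form into a working form and generating a presentation form from it.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical descriptive markup for one entry. The element names are
# illustrative assumptions, not the schema used in the original slides.
XML_ENTRY = """
<entry>
  <headword>ahaa</headword>
  <etymology source="PPN">afaa</etymology>
  <sense>
    <pos>n</pos>
    <definition>a cyclone, a tidal wave</definition>
  </sense>
</entry>
"""


def load_entry(xml_text):
    """Reload the archival form into a working form (a plain dict)."""
    root = ET.fromstring(xml_text)
    etym = root.find("etymology")
    return {
        "headword": root.findtext("headword"),
        "etymology": None if etym is None
        else (etym.get("source", ""), etym.text),
        "senses": [
            {"pos": s.findtext("pos"), "definition": s.findtext("definition")}
            for s in root.findall("sense")
        ],
    }


def to_html(entry):
    """Generate one possible presentation form from the working form."""
    parts = ["<p><b>%s</b>" % escape(entry["headword"])]
    if entry["etymology"]:
        parts.append("(from %s %s)"
                     % tuple(escape(x) for x in entry["etymology"]))
    for sense in entry["senses"]:
        parts.append("<i>%s</i> %s."
                     % (escape(sense["pos"]), escape(sense["definition"])))
    return " ".join(parts) + "</p>"


entry = load_entry(XML_ENTRY)
print(entry["senses"][0]["pos"])
print(to_html(entry))
```

Because the markup says what each piece is rather than how it looks, the same archived entry could feed a different layout, a database load, or a search index without any guessing.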

17
Is XML just one more ephemeral format?
  • No! It's as rock solid as ASCII.
  • ASCII was adopted in 1963; 40 years later it is
    at the heart of operating systems, email, and the
    web. It won't change.
  • XML uses ASCII notation to essentially extend
    ASCII by solving two of its inherent limitations:
    • Via Unicode it encodes text in any language
    • Via tags it encodes the structure of information
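
A small sketch of the first point, using Python's standard xml.etree.ElementTree: text in a non-Latin script can be written out as an XML file that contains only ASCII bytes, and a parser recovers the original characters. The element name word is hypothetical, and the sample text is simply the Greek word for "Greek".

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; the text is the Greek word for "Greek",
# chosen only because it contains no ASCII letters.
original_text = "Ελληνικά"
word = ET.Element("word")
word.text = original_text

# Serialize to pure ASCII bytes: characters outside ASCII are written
# as numeric character references to their Unicode code points.
ascii_bytes = ET.tostring(word, encoding="us-ascii")
print(ascii_bytes)

# Every byte is 7-bit ASCII, yet parsing recovers the original text.
print(all(byte < 128 for byte in ascii_bytes))
print(ET.fromstring(ascii_bytes).text == original_text)
```

In practice most XML is stored as UTF-8 rather than as character references, but the example shows that the notation itself never needs more than ASCII.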

18
Is XML just one more theory?
  • No! It has become part of the fabric of the
    global information infrastructure.
  • It's a family of open standards from the
    World Wide Web Consortium.
  • All major vendors (e.g. Microsoft, IBM, Sun,
    Oracle) have embraced it.
  • Hundreds of small vendors and open-source
    projects have developed tools.

19
What's linguistics to do?
  • The community needs to recognize the fleeting
    value of digital presentation forms and embrace
    archival forms.
  • Grants should require best-practice archiving,
    not just dissemination.
  • Reward archival language documentation.
  • Get into league with libraries and archives.
  • Only by taking steps like these can we ensure
    that our digital data will endure.