1
Exploring and Enriching a LR Archive via the Web
Marc Kemps-Snijders, Alex Klassmann, Claus Zinn, Peter Berck, Albert Russel, Peter Wittenburg
MPI for Psycholinguistics
DOBES Endangered Languages Project
2
What is a digital archive?
- Two essential dimensions
- Long-term preservation of all resources and relations
- Accessibility and interpretability
- Why preserve?
- We face the loss of our cultural memory on electronic media
- UNESCO: 80% of the recordings about languages and cultures are highly endangered
- There are no guarantees for preservation, but we can increase the chances of survival
- store everything in a well-organized repository (browsable/searchable)
- take care of redundancy, migration and curation on various dimensions
- establish organizations that take responsibility
- Digital archives are living entities!
- Live Archives concept: allow enrichments (standoff), relations, etc.
3
What is in the MPI's archive?
- Endangered language documentation resources
- Representative record of a language in its cultural context
- Crucial is the active involvement of the community
- May help in maintaining and revitalizing languages
- Therefore a trend towards complementing linguistic information with ontological information in collaborative spaces
- Child language, bilingualism, gesture, sign language, corpus of spoken Dutch, sound corpora, second-language learner corpora, etc.
Mostly annotated audio/video recordings: 30 terabytes, 53,000 AV resources, 24,000 annotation files, 60 million annotations, lexicons, sketch grammars, etc.
All from a large number of depositors
4
DOBES Languages
40 language teams from the DOBES program, documenting about 60 languages and working independently
5
Language Archiving Technology
- LAT supports operations throughout a resource's lifetime
- supports standards where possible
6
LAT Dimensions: Management and Upload
- take care of consistency
- check uploaded formats
- convert where possible
- create presentation formats
- create indexes
- allow access rights definition
- add unique persistent IDs
- take care of distribution
- basis is a robust repository system with reliable mechanisms
(Diagram labels: resources, metadata, repository system, metadata editing)
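A few of the upload steps listed on this slide (format check, conversion where possible, persistent ID, access rights) can be sketched as follows. This is only an illustration under stated assumptions: the format lists, the `Resource` record, the `ingest` function and the `pid:` prefix are all hypothetical and are not taken from LAT itself.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical whitelist of accepted formats (an assumption, not LAT's real list).
ACCEPTED = {".wav", ".mpg", ".eaf", ".imdi"}
# Hypothetical conversions applied on upload when a format can be migrated.
CONVERTIBLE = {".aiff": ".wav"}


@dataclass
class Resource:
    name: str
    pid: str = ""                               # unique persistent identifier
    readers: set = field(default_factory=set)   # users allowed to access it


def extension(name):
    """Return the lower-cased file extension, or '' if there is none."""
    dot = name.rfind(".")
    return name[dot:].lower() if dot != -1 else ""


def ingest(name, readers=()):
    """Check the format, convert where possible, then assign a PID and access rights."""
    ext = extension(name)
    if ext in CONVERTIBLE:
        # Only the file name is rewritten here; a real system would convert the data.
        name = name[: len(name) - len(ext)] + CONVERTIBLE[ext]
        ext = CONVERTIBLE[ext]
    if ext not in ACCEPTED:
        raise ValueError(f"unsupported format: {name}")
    return Resource(name=name, pid=f"pid:{uuid.uuid4()}", readers=set(readers))
```

Rejecting unknown formats at upload time, rather than later, is what keeps the repository consistent in this sketch: every stored resource is guaranteed to have a known format and an identifier.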
7
LAT Dimensions: Complex Access
- access to annotated media or multimedia lexica
- callable via any other web application
8
LAT Dimensions: Customized Views
- fostering the creation of special web-sites by REST interfaces and templates
- fostering GIS presentations by special converters
9
Who are our users?
Stakeholder    Interest
archivists     easy management, easy discovery, consistency, statistics, versioning, ...
researchers    easy visualization, easy discovery, virtual collections, extensions, permissions, ...
communities    semantic exploration, extensions, permissions, ...
journalists    appetizers, easy inspection, ...
students       curiosity, navigation, inspection, ...
Still in a "download first" paradigm, not cyberinfrastructure usage (result of an ESF/NSF workshop)
10
"Download first": problems and disadvantages
- Tool and format updates are propagated to users at a slow rate
- Legacy formats offered to archives pose an increasing burden on archives or tool builders (conversion/migration)
- New techniques spread slowly through the community
- Orchestration between tools becomes much more difficult, if not impossible
- Users need to install tools locally
Can we provide more incentives on the tools side?
11
How to extend LAT?
- Paper dictionaries: limited usefulness in language maintenance and language revival (Manning et al., 2000)
- Linear lexicons are not at all interesting except for linguists
- The speech community may prefer explicit semantic access and links, possibly of a wide variety of types (i.e. beyond formal systems)
- The semantic view is not limited to lexicons, but should include all fragments
Therefore, introduction of conceptual spaces, where concepts are related to others, anchored in language, and illustrated with multimedia
12
ADDIT: Commentary and Relations
- allow authorized people to make arbitrary comments on and relations between object fragments
- visualize them in tools and via VICOS
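One way to picture standoff comments and relations on object fragments is the sketch below. It is not ADDIT's actual data model: the `Fragment`, `Comment`, `Relation` and `StandoffStore` names, the millisecond time spans and the `pid:` identifiers are hypothetical, chosen only to show that enrichments can point at archived resources without modifying them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A time span inside an archived resource, addressed by its persistent ID."""
    resource_pid: str
    start_ms: int
    end_ms: int


@dataclass(frozen=True)
class Comment:
    author: str
    target: Fragment
    text: str


@dataclass(frozen=True)
class Relation:
    author: str
    source: Fragment
    target: Fragment
    kind: str  # free-form relation type, e.g. "same event"


class StandoffStore:
    """Keeps enrichments separate from the resources they refer to."""

    def __init__(self, authorized):
        self.authorized = set(authorized)
        self.items = []

    def add(self, item):
        # Only authorized people may add comments or relations.
        if item.author not in self.authorized:
            raise PermissionError(f"{item.author} may not annotate")
        self.items.append(item)

    def about(self, resource_pid):
        """Return every comment or relation that touches the given resource."""
        hits = []
        for item in self.items:
            fragments = [item.target]
            if isinstance(item, Relation):
                fragments.append(item.source)
            if any(f.resource_pid == resource_pid for f in fragments):
                hits.append(item)
        return hits
```

Because the store only holds references (resource ID plus span), the archived files stay untouched, and a viewer can overlay whatever `about(...)` returns for the resource being displayed.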
13
VICOS: Lexical Relations Navigation
- Allow users to create relations within and across lexicons, across cognate sets, etc.
- Visualize and allow easy navigation in conceptual spaces
- Empower community members to actively describe their LC and to learn from such resources
- Decide which words offer key access to cultural concepts
- Technology needed to link words (and the associations they evoke) to other words and to all sorts of relevant fragments
- Conceptual Spaces: informal ontology of fuzzily-defined concepts and relationships
- But where concepts are anchored in corresponding formal lexicon entries
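A conceptual space as described on this slide can be thought of as a small graph: concepts linked by freely named relations, with each concept optionally anchored in a lexicon entry. The sketch below is a minimal illustration, not VICOS code; the `ConceptualSpace` class, the relation names and the `lex:` entry IDs are all made up for the example.

```python
from collections import defaultdict


class ConceptualSpace:
    """Informal ontology: concepts, free-form relations, optional lexicon anchors."""

    def __init__(self):
        self.anchors = {}                # concept -> lexicon entry ID (or None)
        self.edges = defaultdict(list)   # concept -> [(relation, other concept)]

    def add_concept(self, concept, lexicon_entry=None):
        self.anchors[concept] = lexicon_entry

    def relate(self, source, relation, target):
        # Relation names are not restricted to a fixed formal vocabulary.
        for concept in (source, target):
            self.anchors.setdefault(concept, None)
        self.edges[source].append((relation, target))

    def neighbours(self, concept):
        """Directly related concepts, with the relation that links them."""
        return list(self.edges.get(concept, []))

    def reachable(self, start, max_steps):
        """Concepts reachable from start in at most max_steps navigation steps."""
        seen = {start}
        frontier = [start]
        for _ in range(max_steps):
            next_frontier = []
            for concept in frontier:
                for _, target in self.edges.get(concept, []):
                    if target not in seen:
                        seen.add(target)
                        next_frontier.append(target)
            frontier = next_frontier
        return seen - {start}
```

For example, after `space.relate("canoe", "used_on", "river")` and `space.relate("river", "place_of", "fishing")`, navigating two steps from "canoe" reaches both "river" and "fishing", which is the kind of browsing a visual front end could offer.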
14 (No transcript)
15
Team and Acknowledgements
- LAT Team
- System Managers
- Archive Managers, Digitization
- Software Developers
Acknowledgements: The work was funded by the Volkswagen Foundation, the European Commission, the Dutch Science Organization, the Dutch Institute for Lexicology, the Max Planck Society and the Max Planck Institute for Psycholinguistics.