Title: Aims of the Ethnographic Eresearch project
1 Aims of the Ethnographic Eresearch project
- Nick Thieberger
- University of Melbourne
- Sponsored by the Pacific And Regional Archive for
Digital Sources in Endangered Cultures
(PARADISEC)
- Funded by the Australian Research Council
- Supported by HCSNet
2 ARC Grant SR0566965: Eresearch ethnographic media
- Summary of aims
- 1 Develop a methodology for collaborative
research based on large digital media datasets,
particularly video, using high-speed networks and
large data repositories.
- 2 Ensure that digital research collection
management is able to provide access and ensure
longevity and reusability of research data by
conforming to relevant standards.
- 3 Enable access via authentication to
distributed research datasets using embedded
rights metadata.
- 4 Ensure that the participant testbeds allow
interoperability of data models and provide
training so that future such datasets can
similarly interoperate.
- 5 Provide the means to distribute richly
annotated ethnographic media not only to
researchers but also to its source communities in
rural, regional and remote locations.
3 Participants
- Melbourne: Thieberger (lead), Wigglesworth,
Nordlinger, Evans, Hajek
- Sydney: Simpson, Barwick, Marett, Corn, Foley
- ANU: Rumsey, Bowden, Buchhorn, Hungerford
- Macquarie: Johnston, Schembri
- CSIRO: Pfeiffer
- DSTC/Qld: Hunter
- AIATSIS: McConvell
- School of Oriental and African Studies, London:
Austin
- University of Alaska at Fairbanks: Holton, Alaska
Native Language Center
- University of Texas at Austin: Johnson, Archive
of the Indigenous Languages of Latin America
- Galiwinku Indigenous Knowledge Centre: Gumbula
4 Projects
- 1. Pacific And Regional Archive for Digital
Sources in Endangered Cultures (PARADISEC)
- 2. Aboriginal Child Language Acquisition (ACLA)
- 3. Corpus of grammar and discourse strategies of
deaf native users of Auslan (Australian Sign
Language) (ELDP Auslan)
- 4. Melbourne University Reciprocals Project
(MURP)
- 5. Waima'a (East Timor) documentation project
- 6. National recording project for indigenous
performance in Australia
5 Rationale
- The proposed research will implement and, if
necessary, customise tools for collaboration on and
access to the audiovisual data of the testbed
projects, using software developed by
the ANU Internet Futures project (AccessGrid),
CeNTIE (Centre for Networking Technologies for
the Information Economy) (Annodex) and UQ
(Vannotea). These innovative applications will
take advantage of Australian higher education's
world-class storage and networking capacity
(CeNTIE, GrangeNet, APAC).
- Free and open source software
- Archival data
6 Initial program (as per the application)
- Transcoding of existing annotations in ELAN,
Transcriber, and CLAN schemas (currently used by
participating projects) to the common Continuous
Media Markup Language (CMML).
- Batch transcoding of media stored in APAC
repositories to the open media formats Ogg Vorbis
(audio) and Ogg Theora (video), the streaming
formats addressed by CSIRO's Annodex software.
- Establishment of streaming architecture and
choice of appropriate transport for implementing
streaming of media files amongst participants.
- Testing of appropriate telecollaboration
environments for annotation: Vannotea and Access
Grid.
- Staged implementation of text-based browsing and
streamed excerpt delivery for participating
projects via Annodex.
- Authorisation and authentication of access to
annotation.
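The first step of the program, transcoding time-aligned annotations to CMML, might look roughly like the sketch below. The `Segment` type and the `to_cmml` function are hypothetical names, and the output uses only a simplified subset of CMML (`clip` elements with `start` and `end` times and a `desc` child). That subset is an assumption about the target format and should be checked against the Annodex documentation before use.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class Segment:
    """One time-aligned annotation (hypothetical intermediate form)."""
    start: float  # seconds from the start of the media file
    end: float
    text: str


def to_cmml(segments):
    """Serialise segments as a simplified, CMML-style document (assumed subset)."""
    root = ET.Element("cmml")
    for i, seg in enumerate(segments, start=1):
        clip = ET.SubElement(
            root,
            "clip",
            {
                "id": f"s{i}",
                "start": f"npt:{seg.start:g}",
                "end": f"npt:{seg.end:g}",
            },
        )
        ET.SubElement(clip, "desc").text = seg.text
    return ET.tostring(root, encoding="unicode")


# The sentence and its timing come from the sample fragment on slide 13.
cmml = to_cmml([Segment(0, 2.3, "Ipiatlak nmatu iskei,")])
print(cmml)
```

An ELAN, Transcriber or CLAN reader would only need to produce `Segment` values for the same writer to serve all three source schemas.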
7 Issues
- Location of test datasets (online server)
- Implies persistent identification and location
- Use of standard metadata repositories to enable
location of data
- Interoperability with existing systems (metadata
and data structure)
- Ease of use is critical; otherwise practitioners
won't use it
8 Issues
- Meta-ethnographic observation
- Solutions exist and are widely known in computing
disciplines
- Implementation in a way that practitioners can
access is yet to be achieved
- The aim of this project is to overcome this problem
9 The development of linguistic documents entails
the prior creation of a corpus of primary
material for which relationships need to be
tracked
Grammar
Dictionary
Texts
Media
- audio
- images
- video
transcripts, historical sources, maps, etc.
10 Typical linguistic fieldwork
- Media recordings
- Transcribed in a standard schema (e.g. using
ELAN, CLAN, ITE, or Transcriber)
- Output as time-aligned text to Toolbox for
interlinear annotation
- Output from Toolbox as interlinear version in XML
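As an illustration of the hand-off to Toolbox, the sketch below writes one time-aligned utterance as a backslash-coded record. The marker names (`\ref`, `\start`, `\end`, `\tx`) and the reference `RIRIAL.001` are assumptions made for the example; each project defines its own marker set, so this is not a description of any particular export.

```python
def toolbox_record(ref, start, end, text):
    """Format one time-aligned utterance as a backslash-coded record.

    The marker names used here are assumptions for illustration only.
    """
    lines = [
        f"\\ref {ref}",
        f"\\start {start:.3f}",  # start offset in seconds
        f"\\end {end:.3f}",      # end offset in seconds
        f"\\tx {text}",          # transcription line, ready for interlinearising
    ]
    return "\n".join(lines) + "\n"


# Sentence and timing taken from the sample fragment on slide 13.
record = toolbox_record("RIRIAL.001", 0.0, 2.3, "Ipiatlak nmatu iskei,")
print(record)
```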
11 Ideally
- Ability to have multiple files, each in one of
(e.g.) four schemas, all viewable by the same
browser (i.e. no transcoding)
- Persistent file id allows citation and linking
- Analytical apparatus built on corpus of
annotations
- Concordance
- Playlists
- Copying segments from the corpus with their
citation form (e.g. media name, start time, end
time)
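A minimal sketch of the analytical apparatus listed above, assuming the corpus has already been read into time-aligned utterances. `Utterance`, `citation` and `concordance` are hypothetical names, and the exact citation format is an assumption; the slide only specifies its parts (media name, start time, end time).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    media: str    # persistent identifier of the media file
    start: float  # seconds
    end: float
    text: str

    def citation(self):
        """Citation form: media name, start time, end time."""
        return f"{self.media} {self.start:g}-{self.end:g}"


def concordance(corpus):
    """Map each lower-cased word to the utterances that contain it."""
    index = defaultdict(list)
    for utt in corpus:
        for word in utt.text.lower().split():
            index[word.strip(".,;:!?")].append(utt)
    return index


# Single utterance taken from the sample fragment on slide 13.
corpus = [Utterance("RIRIAL", 0, 2.3, "Ipiatlak nmatu iskei,")]

hits = concordance(corpus)["nmatu"]
playlist = [u.citation() for u in hits]  # a playlist is a list of citable segments
print(playlist)
```

Because every hit carries its own media identifier and time span, a copied segment stays citable and a playlist can be replayed from the archive copy.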
12 Standalone model
- Audiamus allows the user to interact with
linguistic data
- Need to make a more generic tool with the same
functions
13 Sample fragment
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEXT PUBLIC "-//CNRS-LACITO//DTD Archivage//EN"
  "http://lacito.Archivage.vjf.cnrs.fr/archives/All/Archive.dtd">
<TEXT id="ERKRIRIAL" xml:lang="x-sil-ERK">
  <HEADER><TITLE xml:lang="en" />
  <SOUNDFILE href="RIRIAL" /></HEADER>
  <S>
    <FORM kindOf="sentence">Ipiatlak nmatu iskei,</FORM>
    <AUDIO start="0" end="2.3" />
    <W><FORM kindOf="morph">i</FORM>
    <TRANSL xml:lang="en">3sgRS</TRANSL></W>
    <W><FORM kindOf="morph">piatlak</FORM>
    <TRANSL xml:lang="en">have</TRANSL></W>
    <W><FORM kindOf="morph">nmatu</FORM>
    <TRANSL xml:lang="en">woman</TRANSL></W>
    <W><FORM kindOf="morph">iskei</FORM>
    <TRANSL xml:lang="en">one</TRANSL></W>
    <TRANSL>There was this woman</TRANSL></S>
.....
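To show how such a fragment can feed other tools, the sketch below reads it with Python's standard `xml.etree.ElementTree`. The embedded string is an abbreviated copy of the fragment: the DOCTYPE is omitted and a closing `</TEXT>` is added so that it parses on its own. `read_sentences` is a hypothetical helper name, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated, closed copy of the slide's fragment (DOCTYPE omitted, </TEXT> added).
FRAGMENT = """<TEXT id="ERKRIRIAL" xml:lang="x-sil-ERK">
  <HEADER><TITLE xml:lang="en" /><SOUNDFILE href="RIRIAL" /></HEADER>
  <S>
    <FORM kindOf="sentence">Ipiatlak nmatu iskei,</FORM>
    <AUDIO start="0" end="2.3" />
    <W><FORM kindOf="morph">i</FORM><TRANSL xml:lang="en">3sgRS</TRANSL></W>
    <W><FORM kindOf="morph">piatlak</FORM><TRANSL xml:lang="en">have</TRANSL></W>
    <W><FORM kindOf="morph">nmatu</FORM><TRANSL xml:lang="en">woman</TRANSL></W>
    <W><FORM kindOf="morph">iskei</FORM><TRANSL xml:lang="en">one</TRANSL></W>
    <TRANSL>There was this woman</TRANSL>
  </S>
</TEXT>"""


def read_sentences(xml_text):
    """Yield one dict per <S>: media name, time span, text, glosses, free translation."""
    root = ET.fromstring(xml_text)
    media = root.find("HEADER/SOUNDFILE").get("href")
    for s in root.iter("S"):
        audio = s.find("AUDIO")
        yield {
            "media": media,
            "start": float(audio.get("start")),
            "end": float(audio.get("end")),
            "text": s.findtext("FORM"),  # the sentence-level FORM (direct child of S)
            "glosses": [(w.findtext("FORM"), w.findtext("TRANSL")) for w in s.findall("W")],
            "free": s.findtext("TRANSL"),  # the sentence-level TRANSL (direct child of S)
        }


sentences = list(read_sentences(FRAGMENT))
print(sentences[0]["media"], sentences[0]["start"], sentences[0]["end"])
```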
14
- http://lacito.archivage.vjf.cnrs.fr/servlet/
15
- Annotation as a means of establishing
relationships between existing objects
- These objects need to be citable
- Persistent identification
- Persistent location
- Standard cataloging information - metadata
16 Wurm collection, Solomon Islands, 1979.
Digitised cassette tape with page image of
transcript, and Wurm's language map
17 Relationship between objects needs to be tracked
PARADISEC archival objects
Media files
Timecodes
Image file 1
Image file 2
Image file 3
Image file 4
Image file 5
Image file 1 is a transcript of offset points 10
seconds to 2 minutes 30 seconds of media file
SAW2-011
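The relationship in this example can be held as a small record linking two identifiers and a time span. The sketch below is one possible shape, not the archive's actual data model: `Relation`, `covering`, the identifier `image-file-1` and the relation name `transcriptOf` are all hypothetical. Only `SAW2-011` and the span of 10 seconds to 2 minutes 30 seconds (150 seconds) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str    # identifier of the annotating object (e.g. a page image)
    relation: str  # kind of relationship
    target: str    # identifier of the media file
    start: float   # offset into the target, in seconds
    end: float


def covering(relations, target, t):
    """Return the relations whose time span on `target` contains offset t."""
    return [r for r in relations if r.target == target and r.start <= t <= r.end]


relations = [
    # Image file 1 transcribes 0:10 to 2:30 of media file SAW2-011.
    Relation("image-file-1", "transcriptOf", "SAW2-011", 10.0, 150.0),
]

# Which objects describe the recording one minute in?
print(covering(relations, "SAW2-011", 60.0))
```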
18 Relationship between objects needs to be tracked
Field recordings and associated items
Media files
Timecodes
Story 1
Story 2
Story 3
Story 4
Story 5
Each story has interlinear versions, with
relations required to speaker information (name,
age, sex, clan etc.) and other metadata
19 Summary
- Need to
- capture and store relations between objects
(Resource Description Framework, RDF)
- fix objects with persistent identification
- allow users to locate objects via sufficient
metadata
- authenticate authorised users
- capture and store annotations made by authorised
users
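As a rough illustration of storing relations as RDF, the sketch below writes statements in the N-Triples line format using plain string formatting. The namespace `http://example.org/archive/` is a placeholder, and the property names (`transcriptOf`, `label`) and the identifier `image-file-1` are hypothetical; a real deployment would use the archive's persistent identifiers and an agreed vocabulary.

```python
BASE = "http://example.org/archive/"  # placeholder namespace, not a real archive URI


def triple(subject, predicate, obj, literal=False):
    """Format one statement as an N-Triples line.

    Only backslashes and double quotes are escaped in literals, which is
    enough for this example but not a complete N-Triples escaper.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        o = f'"{escaped}"'
    else:
        o = f"<{BASE}{obj}>"
    return f"<{BASE}{subject}> <{BASE}{predicate}> {o} ."


lines = [
    triple("image-file-1", "transcriptOf", "SAW2-011"),
    triple("image-file-1", "label", "Image file 1", literal=True),
]
print("\n".join(lines))
```

Keeping each relation as a separate statement means that annotations made later by authorised users can be added without touching the archived objects themselves.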