quads.esds.ac.uk/squad - PowerPoint PPT Presentation

Title: quads.esds.ac.uk/squad


SMART QUALITATIVE DATA: METHODS AND COMMUNITY TOOLS FOR DATA MARK-UP
THE PROJECT
SQUAD aims to explore methodological and
technical solutions for exposing digital
qualitative data to make them fully shareable and
exploitable. The main objectives are to:
  • specify, test and propose an eXtensible Markup
    Language (XML) schema for storing and marking up
    qualitative data
  • investigate requirements for contextualising
    qualitative data and developing standards for
    data documentation
  • develop semi-automated natural language
    processing (NLP) tools for preparing marked-up
    qualitative data for sharing
  • research tools for publishing and interrogating
    data via the web: Qualitative Data Mark-Up
    Tools (QDMT)

WHAT FEATURES OF TEXT CAN BE MARKED UP?
Spoken interview texts provide the clearest and
most common example of the types of encoding
features that can be marked up. There are three
basic groups of structural features:
  • utterance, specific turn taker, defining
    idiosyncrasies in transcription
  • links to analytic annotation and other data types
    (e.g. thematic codes, concepts, audio or video
    links, researcher annotations)
  • identifying information such as real names,
    company names, place names, occupations, temporal
    information

Example: Italy's business world was rocked by the
announcement last Thursday that Mr. Verdi would
leave his job as vice-president of Music Masters
of Milan, Inc to become operations director of
Arthur Anderson.
DEFINING CONTEXT
Rich context enables informed re-use of data. But
defining how to provide context for raw data to
make it more usable is complex. ESDS Qualidata
has done much to establish informal ways of
documenting raw data. Micro and macro level
features should be considered, including:
  • how the research question was framed
  • the research application process
  • project progress
  • fieldwork situations
  • analysis processes

Fieldwork observations are useful, as are
timelines and political chronologies. Equally,
when undertaking a replication or restudy,
detailed information on sampling procedures,
fieldwork approaches and question guides will be
essential. SQUAD has identified a minimal
generic set of elements that represent a baseline
for contextualising data.

USING NLP TOOLS
Information Extraction (IE) is a sub-field of NLP
which aims to identify key pieces of information
in texts using 'shallow' analysis techniques. A
typical IE system will perform Named Entity
Recognition, where particular kinds of proper
names and terms are identified, classified and
marked up.
This is a means of annotating documents with
semantic metadata, enabling resource discovery
and data exploration. The Edinburgh LT-XML and
CME tools have been used to process the data.
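
As a rough illustration of what marking up named entities means in practice, the sketch below tags the Mr. Verdi example sentence using a tiny hand-written lookup table. It is not the Edinburgh LT-XML or CME pipeline: the gazetteer, the function name mark_up and the choice of TEI-style element names (persName, orgName, placeName) are assumptions made only for this example, and a real Named Entity Recognition system would classify names it has never seen rather than look them up.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical gazetteer: known entity strings mapped to TEI-style element names.
GAZETTEER = {
    "Mr. Verdi": "persName",
    "Music Masters of Milan, Inc": "orgName",
    "Arthur Anderson": "orgName",
    "Italy": "placeName",
}


def mark_up(text, gazetteer):
    """Wrap every gazetteer entry found in text in an inline XML element."""
    # Try longer entries first so a long name is preferred over a shorter overlap.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    pieces, position = [], 0
    for match in pattern.finditer(text):
        # Copy the untagged text before the match, escaping XML special characters.
        pieces.append(escape(text[position:match.start()]))
        tag = gazetteer[match.group(0)]
        pieces.append(f"<{tag}>{escape(match.group(0))}</{tag}>")
        position = match.end()
    pieces.append(escape(text[position:]))
    return "".join(pieces)


SENTENCE = (
    "Italy's business world was rocked by the announcement last Thursday "
    "that Mr. Verdi would leave his job as vice-president of Music Masters "
    "of Milan, Inc to become operations director of Arthur Anderson."
)

print(mark_up(SENTENCE, GAZETTEER))
```

Occupations and temporal expressions ("vice-president", "last Thursday") are left untagged here; covering them would need further entries or rules.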
ANONYMISING DATA TOOL
This tool imports marked-up data from the
Edinburgh pipeline system. Named entities are
highlighted and co-reference chains (e.g.
numerous references to a single person) are
identified.

Names can be anonymised with chosen pseudonyms.
The mapping of names to pseudonyms is saved.
Annotations are explored in an XML format in the
NITE NXT model. NXT uses stand-off annotation,
where annotation is linked to or referenced by
words.

METADATA STANDARDS
The XML schema will specify a reduced set of
Text Encoding Initiative (TEI) elements:
  • core tag set for transcription
  • names, numbers, dates <persName>
  • links and cross references <ref>
  • notes and annotations <note>
  • text structure <body>
  • unique to spoken texts <kinesic>
  • linking, segmentation and alignment <link>
  • advanced pointing - XPointer framework
  • text and AV synchronisation
  • contextual information (participants, setting,
    text)

Interview text with XML tags embedded:
  • <u who="interviewer" xml:id="u1">There's just
    one or two factual things first of all do you
    mind my asking how old you are?</u>
  • <u who="subject" xml:id="u2">49.</u>
  • <u who="interviewer" xml:id="u3">And what
    schools did you go to?</u>
  • <u who="subject" xml:id="u4">
  • <orgName>King Street</orgName>
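
To give a feel for how embedded tags like those in the interview excerpt could support anonymisation, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not the SQUAD anonymising tool. Several things are assumptions added for the example: the wrapping body element, the closing tag on the last utterance (the excerpt breaks off there), the function name pseudonymise, and the pseudonym "School A".

```python
import xml.etree.ElementTree as ET

# The excerpt, wrapped in <body>; the final </u> is an assumption,
# since the original excerpt is cut off after the <orgName> element.
SAMPLE = """<body>
  <u who="interviewer" xml:id="u1">There's just one or two factual things first of all do you mind my asking how old you are?</u>
  <u who="subject" xml:id="u2">49.</u>
  <u who="interviewer" xml:id="u3">And what schools did you go to?</u>
  <u who="subject" xml:id="u4"><orgName>King Street</orgName></u>
</body>"""

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NAME_TAGS = ("persName", "orgName", "placeName")


def pseudonymise(root, pseudonyms):
    """Replace tagged names in place and return a log of each substitution."""
    log = []
    for utterance in root.iter("u"):
        for tag in NAME_TAGS:
            for element in utterance.iter(tag):
                original = element.text
                if original in pseudonyms:
                    element.text = pseudonyms[original]
                    # Keep utterance id, real name and pseudonym together.
                    log.append((utterance.get(XML_ID), original, element.text))
    return log


root = ET.fromstring(SAMPLE)
log = pseudonymise(root, {"King Street": "School A"})
print(log)
print(ET.tostring(root, encoding="unicode"))
```

Returning the substitution log separately from the transcript mirrors the idea of saving the name-to-pseudonym references: the anonymised text can be shared while the log is stored under restricted access.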
TOOLS PROGRESS
  • defined header metadata for a standardised
    transcript
  • defined and tested generic XML models for
    qualitative data
  • tested and refined NLP tools for qualitative data
  • built front end to NLP named entity tools
  • chosen software to enable annotation of data
  • explored export formats for longer-term archiving
  • investigated powerful XML-based indexing tools
    for searching and retrieving data
  • investigated web display of multimedia data and
    pointers to other resources using XML, extending
    the functionality of ESDS Qualidata

DATA EXCHANGE STANDARDS
  • A uniform format for richly encoding qualitative
    research is necessary as it enables preservation
    and re-use of metadata, data and annotation;
    ensures consistency of presentation and
    description of data; supports the development of
    common web-based publishing and search tools; and
    facilitates data interchange and comparison
    among datasets.
  • SQUAD has produced a limited formal definition of
    a common XML vocabulary and DTD based on the TEI
    and tested a new Qualitative Data Interchange
    Format (QDIF).

THE PROJECT TEAM
Claire Grover
Maria Milosavljevic
Louise Corti
Libby Bishop
Mijail Alexandrov Kabadjov

CONTACT
Louise Corti and Claire Grover
UK Data Archive
University of Essex
Colchester, Essex CO4 3SQ
Email: quads_at_esds.ac.uk
Tel: +44 (0)1206 872145
URL: quads.esds.ac.uk/squad