Title: Kirrkirr: Transforming the Representation of Lexical Knowledge
1. Kirrkirr: Transforming the Representation of Lexical Knowledge
- Christopher Manning
- University of Sydney
- http://www.sultry.arts.usyd.edu.au/kirrkirr/
2. Project Objectives
- Aims of the project:
- examining the richness of lexical structure, in particular the connotational and figurative use of words
- providing innovative ways of representing a dictionary, through creative use of the medium of computers
- augmenting dictionaries from corpora
- providing practical, educationally useful programs as a result (at low labor cost)
- Main initial target: an interactive front end for exploring or using the Warlpiri dictionary.
3. Acknowledgements
- Ken Hale, Mary Laughren, Robert Hoogenraad, Jane Simpson, David Nash
- Many Warlpiri (Kay Ross for the audio)
- Kevin Jansz, Nitin Indurkhya, Wee Jim Sng
- Susan Poetsch, Miriam Corris
- and many others
4. Research Program: Lexicon
- A lexicon is not just words but a vast network of associations between words and within and across the concepts represented by words
- The aim of this work is to provide people with a better understanding of this conceptual map.
- Traditional paper dictionaries offer very limited ways of making such networks visible
- On a computer, one can imagine all sorts of ways of bringing out such relationships
5. Research: Computational Lexicography
- Dictionaries on computers are now commonplace
- But there has been little attempt to utilize the potential of the new medium
- Goal: fun dictionary tools that are effective for language learning, browsing, and research
- Special-interest dictionaries for minority languages: here economic, motivational, and user-support reasons all point to an important role for computers.
6. Research: Computational Lexicography
- Dictionaries on computers are now commonplace
- But there has been little attempt to utilise the potential of the new medium
- Most present a plain, search-oriented representation of the paper version
- Goal: fun dictionary tools that are effective for browsing and language learning (cf. Kegl 1995)
7. Research: Computational Lexicography
- Fun dictionary tools
- Like flicking through a paper dictionary, but better
- Innovative ways of representing and linking dictionary information, through creative use of computer software
- Should improve user supports and incidental learning
- Focus: exploration/dissemination, not creation
8. MRD Structure
- The internal structures of current Machine-Readable Dictionaries usually merely mimic the structure of the printed form (Boguraev 1990)
- Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organization (here, organization via synsets related by part, subkind, and opposite links)
- But this research hasn't been taken to users.
9. Research Program: Education
- Dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second-language learners) are not met
- Weiner (1994): "The initial purpose of the OED: to create a record of vocabulary so that English literature could be understood by all. But English scholarship grew up and lexicography grew with it, inevitably parting company with the man in the street."
- The challenge is to avoid this.
10. Dictionary usefulness and usability
- Kegl (1995), Machine-Readable Dictionaries and Education: "Originally, this paper was intended as a survey of educational applications using MRDs. As far as I have been able to determine, no such applications currently exist."
- Standard dictionaries are reference works, ill-suited for use as learning tools
- Studies of American dictionary-skills training show that many tasks achieve little in the way of education (but do teach word lookup!)
11. Educational value of dictionaries
- However, derived lexical information is useful!
- Think of a high school foreign language textbook:
- terminology sets
- pictures with parts named
- vocabulary lists
- word explications
- Major issue:
- Not many people sit around reading dictionaries; we need something fun
12. Data on usability: evaluating a paper dictionary
- Study of paper dictionary usability by Susan Poetsch, tested using the Alawa dictionary (draft by Margaret Sharpe)
- In the community, old people are very concerned to keep the language strong, and help as volunteers in bilingual education. They are keen on the dictionary
- However, they lack the literacy skills to use it
- Susan worked with people aged 25-50
- Since they are volunteers, they probably have better-than-average literacy skills for the community
13. Findings
- Not very literate: a big dictionary is overwhelming to someone with emerging literacy skills
- People knew words are ordered but could not use the ordering effectively (they restart or flick randomly)
- Often around 3 minutes per word lookup
- People regularly lost their place on the page
- An overcrowding of information is confusing
- One-word correspondences are easiest for users, but often linguistically unrealistic
- Subentries were confusing; part of speech was puzzling
14. Findings (2)
- Regular dictionary users (especially compilers!) grossly underestimate the time they have spent becoming familiar with dictionary structure
- If a dictionary is going to be made for a speech community, then the people in that community need to feel confident using it.
- Teachers felt that the draft dictionary was too long and detailed for school use
- Conclusion: these people need a different dictionary (My First Alawa)
- Would probably be used by adults as well as kids
15. Our educational goals
- Aim at school kids
- Information seeking is a complex process which is often not attended to in K-12 education (Wallace et al. 1998)
- Provide learner supports for getting started with dictionaries
- An adaptable interface can cater to different needs
- Support for active reading by allowing note-taking
- An interface where you can see words, but are not required to already know them
16. Kirrkirr: A Warlpiri dictionary browser
- (Jansz 1998; Jansz, Manning and Indurkhya 1999)
- An environment for the interactive exploration of dictionaries.
- The design is general, but our current work has just been with Warlpiri
- Attempts to more fully utilize graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information
- Written in Java, it can either be run over the web (high bandwidth) or run locally (here Java's main advantage is cross-platform support).
17. Specific goals
- An interactive environment that encourages exploration: easy and fun to use
- Reduction of the dependence on alphabetical order: the low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition
- Catering to the needs of different user groups (kids, teachers, professionals)
- Flexible enough to display appropriate information in appropriate ways depending on user level
18. Overview
- Kirrkirr provides various modules:
- Graph layout of word relationships
- Formatted dictionary entries
- Semantic domain browsing
- A notes facility for jotting in the margin
- Multimedia: audio, pictures
- Advanced searching interfaces
- Others in planning: colors, figuration patterns
- These attempt to cater to users with different
competence levels
20. The lexical database
- Original text materials are stored in an ad hoc format using backslash codes (origin: runoff)
- These are converted to XML using an error-correcting stack-based parser (written in Perl)
- The inconsistency and flexibility of dictionary entries made this a surprisingly difficult task.
- Many structural errors, inconsistencies, and typos from years of hand maintenance in text editors and via regular expressions
- Many problems with link consistency
- A heuristic, content-sensitive parser imposes data integrity
- Lots of Information Systems 101
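The stack-based repair described above can be sketched in Python. This is a minimal illustration, not the actual Perl converter: the backslash codes (\me, \eme, \gl, ...) and the XML element names are hypothetical, and continuation lines are simply skipped. Container codes open an element on a stack; an end code closes back through the stack, auto-closing anything left unterminated, and a stray end code is reported rather than aborting the conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical code table (not the real Warlpiri dictionary codes):
# container codes open an element that a matching "\e<code>" line closes;
# leaf codes become simple text elements inside the open container.
CONTAINERS = {"me": "ENTRY", "se": "SUBENTRY"}
LEAVES = {"gl": "GLOSS", "dm": "DOMAIN", "pos": "POS"}


def parse_backslash(lines):
    """Convert backslash-coded lines to XML, repairing bad nesting."""
    root = ET.Element("DICTIONARY")
    stack = [root]            # open elements, innermost last
    warnings = []

    def is_open(tag):
        return any(el.tag == tag for el in stack[1:])

    def close_through(tag, where):
        # Pop until `tag` itself is closed; anything popped on the way was
        # left unterminated in the source, so report it.
        while True:
            el = stack.pop()
            if el.tag == tag:
                return
            warnings.append(f"{where}: auto-closed {el.tag}")

    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line.startswith("\\"):
            continue          # simplification: continuation lines ignored
        code, _, text = line[1:].partition(" ")
        text = text.strip()
        where = f"line {lineno}"
        if code.startswith("e") and code[1:] in CONTAINERS:
            tag = CONTAINERS[code[1:]]
            if is_open(tag):
                close_through(tag, where)
            else:
                warnings.append(f"{where}: stray end code {code!r} ignored")
        elif code in CONTAINERS:
            tag = CONTAINERS[code]
            if is_open(tag):  # e.g. a new \me before the previous \eme
                warnings.append(f"{where}: unterminated {tag} closed")
                close_through(tag, where)
            el = ET.SubElement(stack[-1], tag)
            ET.SubElement(el, "HW").text = text
            stack.append(el)
        elif code in LEAVES:
            ET.SubElement(stack[-1], LEAVES[code]).text = text
        else:
            warnings.append(f"{where}: unknown code {code!r}")
    for el in reversed(stack[1:]):
        warnings.append(f"end of input: auto-closed {el.tag}")
    return root, warnings


SAMPLE = [
    "\\me pangurnu",
    "\\gl digging scoop",
    "\\me pili",              # previous entry never got its \eme
    "\\gl small coolamon/digging scoop",
    "\\eme",
    "\\eme",                  # stray: nothing left to close
]
root, warnings = parse_backslash(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
print(warnings)
```

Each repair is logged rather than silently applied, so the fixes can be reviewed by hand, which matters when the input reflects years of manual editing.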
21. XML
- XML: a descendant of SGML for structured markup of text
- XML separates the structure of the data from its presentation
- Much of the recent enthusiasm for XML has centered on representing simple and rigid structures such as database records
- The rich hierarchical and variable structure of dictionary entries is really more what something like XML excels at!
- Result: the dictionary remains a portable, tangible text file
22. XML indexing
- XML is a middle ground between the structure, indexing, etc. of a database and the freedom of a word processor.
- To improve speed, an ad hoc index to the XML file is built; it is used for rapid headword and gloss lookup, and for indexing which parts of the XML file to process.
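One way such an ad hoc index might work is sketched below, assuming a hypothetical markup with one <ENTRY> per headword and <HW>/<GL> children: a single scan records each entry's byte offsets and the words of its glosses, so a later lookup parses only the slice of the file it needs. A regular-expression scan stands in here for whatever the real indexer does.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: one <ENTRY> per headword with <HW> and <GL> children.
ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.S)
HW_RE = re.compile(rb"<HW>(.*?)</HW>")
GL_RE = re.compile(rb"<GL>(.*?)</GL>")


def build_index(data):
    """One pass over the raw bytes: headword -> (start, end) byte offsets,
    and lower-cased gloss word -> headwords whose gloss contains it."""
    by_headword, by_gloss = {}, {}
    for m in ENTRY_RE.finditer(data):
        entry = m.group()
        hw = HW_RE.search(entry).group(1).decode("utf-8")
        by_headword[hw] = (m.start(), m.end())
        for gloss in GL_RE.findall(entry):
            for word in re.findall(r"[a-z]+", gloss.decode("utf-8").lower()):
                by_gloss.setdefault(word, []).append(hw)
    return by_headword, by_gloss


def load_entry(data, by_headword, hw):
    """Parse only the slice of the file holding one entry."""
    start, end = by_headword[hw]
    return ET.fromstring(data[start:end])


DATA = (b"<DICTIONARY>"
        b"<ENTRY><HW>pangurnu</HW><GL>digging scoop</GL></ENTRY>"
        b"<ENTRY><HW>pili</HW><GL>small coolamon/digging scoop</GL></ENTRY>"
        b"</DICTIONARY>")
by_headword, by_gloss = build_index(DATA)
print(by_gloss["scoop"])                     # ['pangurnu', 'pili']
print(load_entry(DATA, by_headword, "pili").findtext("GL"))
```

In practice the bytes would come from reading (or memory-mapping) the XML file; an in-memory byte string keeps the example self-contained.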
23. Visualization of dictionary information
- For applications with simple textual content behind them, there is little that can be done beyond an on-line reflection of a printed page
- But we want more than just definitions of words: we want to know their relationships to other words, and the patterning in these relationships
- In a computational approach, the software can mediate between the lexical data and the user
- The interface can select information and choose how to present it (according to the user's preferences) in many different ways
24. Previous work
- Current systems present the search-dominated interface of classic Information Retrieval systems: you type a word in a search box
- Results try to mimic, but are generally inferior to, the printed version of the dictionary
- Good feature: rapid searching
- These systems do little to utilize the captivating qualities of computers: interactivity, user control, and adaptability (Brown 1985).
25. Previous work (2)
- Only effective when the user has a clearly specified information need; even here, we are ignoring the distinction between information gained and knowledge sought (Sharpe 1995)
- Lack browsing, and chances for incidental or curiosity-driven learning
- Lack the tangibility and situatedness of paper: ineffective for getting an idea of a collection
- We wish to exploit the essence of hypertext, which is click-to-explore browsing
26. Previous work (3)
- Little research work (in corpus linguistics, visualization, etc.) on dictionary visualization
- WordNet built a rich network of relationships, which fundamentally departed from the paper dictionary tradition, and has been used in many computational projects
- However, very little has been done in the way of interfaces that make these relationships visible and intelligible to users.
- Graphical representations seem particularly important given our target users.
27. MRD Interfaces: WordNet
28. Graph-based visualization
- There is a little previous work on graphical representations of dictionaries
- For instance, the visual-thesaurus by plumbdesign, derived from WordNet
- But it is also a good demonstration of how chaotic and confusing graphical interfaces can become.
29. Perils of visualization
30. Graph-based visualization
- (Jansz 1998; Jansz, Manning and Indurkhya 1999)
- Classic graph layout problem
- Adapts work by Eades et al. (1998) and Huang et al. (1998) on visualization and navigation of WWW document linkages
- Uses the spring algorithm. Its big advantage is that it is an iterative updating algorithm, and so gives easy interactivity: it wiggles and people can play with it.
- Clarity and simplicity of the graph: the software maintains a set of focus nodes to prevent overcrowding
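A minimal sketch of one spring-algorithm iteration, in the style of the classic spring embedder: linked words are joined by springs with logarithmic force c1·log(d/c2), unlinked words repel with force c3/d², and each node moves c4 times its net force. The constant values are commonly quoted defaults, used here as assumptions, and focus-node selection is not shown. Because each call makes only a small move, an interface can run one step per animation frame, which is what produces the "wiggling" described above.

```python
import math
import random

# Spring-embedder constants (commonly quoted defaults; an assumption here).
C1, C2, C3, C4 = 2.0, 1.0, 1.0, 0.1


def spring_step(pos, edges):
    """Move every node once along its net force; updates `pos` in place.

    pos   -- dict: word -> [x, y]
    edges -- set of frozenset({a, b}) for linked words
    """
    nodes = list(pos)
    force = {n: [0.0, 0.0] for n in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9      # avoid dividing by zero
            if frozenset((a, b)) in edges:
                f = C1 * math.log(d / C2)       # spring: attracts when d > C2
            else:
                f = -C3 / (d * d)               # unlinked words push apart
            fx, fy = f * dx / d, f * dy / d     # force on a, pointing at b
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
    for n in nodes:
        pos[n][0] += C4 * force[n][0]
        pos[n][1] += C4 * force[n][1]


# A small star of linked words around one focus word.
random.seed(0)
words = ["pangurnu", "pili", "rdaku", "pangirni"]
edges = {frozenset(("pangurnu", w)) for w in words[1:]}
pos = {w: [random.uniform(0, 3), random.uniform(0, 3)] for w in words}
for _ in range(100):          # an interface would run one step per frame
    spring_step(pos, edges)
print({w: (round(x, 2), round(y, 2)) for w, (x, y) in pos.items()})
```

Dragging a node simply means overwriting its entry in `pos` between steps; the following iterations settle the rest of the graph around it.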
31. Educational advantages
- Alphabetical order is important, but
- A web of words offers other effective opportunities for learning
- A student can opportunistically explore words that are related in various ways
- Important semantic relationships can be understood
32. Kirrkirr network display
33. Kirrkirr network display
34. Formatted dictionary entries
- Are produced automatically from the XML using XSL (a style language)
- XSL allows easy modeling of some user preferences.
- Most trivially, one can leave out information such as part of speech or detailed definitions
- This is useful as many users find information overload quite confusing and demotivating
- Can produce a bilingual or monolingual dictionary
- Opportunities for various output styles, and formats such as RTF or TeX for printing.
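XSL itself is not shown here; as a rough stand-in, the Python sketch below renders one entry under a few user preferences, dropping part of speech or gloss on request and switching between plain text and TeX-style markup. The element names, option names, and the sample entry's part-of-speech tag are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative entry; element names and the POS tag are assumptions.
ENTRY = ET.fromstring(
    "<ENTRY><HW>pangurnu</HW><POS>N</POS><GL>digging scoop</GL></ENTRY>")


def format_entry(entry, show_pos=True, show_gloss=True, output="text"):
    """Render one entry, honouring simple user preferences."""
    head = entry.findtext("HW")
    if output == "tex":
        head = r"\textbf{" + head + "}"   # no TeX escaping in this sketch
    parts = [head]
    pos = entry.findtext("POS")
    if show_pos and pos:
        parts.append(f"({pos})")
    gloss = entry.findtext("GL")
    if show_gloss and gloss:
        parts.append(gloss)
    return " ".join(parts)


print(format_entry(ENTRY))                  # full entry
print(format_entry(ENTRY, show_pos=False))  # simpler view for beginners
print(format_entry(ENTRY, output="tex"))    # markup for printing
```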
35. Formatted dictionary entries
36. Rich typology of link types
- The semantically rich types of linkages present in a dictionary (synonym, antonym, hyponym, subheadword, variant, coverbs, ...) solve one of the major problems of the web: we have many link types with a clear semantic interpretation
- Use consistent color-coded text and edges to show these link types
- Can tell where you are going before clicking
- Dictionary links can be supplemented by links derived from collocational analysis of texts
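Consistent color-coding works best when a single table drives both views. A minimal sketch, with a hypothetical palette and hypothetical function names, in which the formatted-entry hyperlinks and the network-pane edges look up their color from the same link-type table:

```python
import html

# Hypothetical palette: one color per link type. Both the hyperlinks in a
# formatted entry and the edges in the network pane read from this table,
# so a link's color means the same thing everywhere.
LINK_COLORS = {
    "synonym": "#1f77b4",
    "antonym": "#d62728",
    "hyponym": "#2ca02c",
    "subheadword": "#9467bd",
    "variant": "#8c564b",
    "coverb": "#e377c2",
    "collocation": "#7f7f7f",   # links mined from texts, not the dictionary
}
DEFAULT_COLOR = "#000000"


def link_color(link_type):
    return LINK_COLORS.get(link_type, DEFAULT_COLOR)


def entry_link(target, link_type):
    """HTML hyperlink for the formatted-entry pane."""
    word = html.escape(target)
    return f'<a href="#{word}" style="color:{link_color(link_type)}">{word}</a>'


def network_edge(source, target, link_type):
    """Edge record for the graph pane, colored the same way."""
    return {"from": source, "to": target, "type": link_type,
            "color": link_color(link_type)}


print(entry_link("pili", "collocation"))
print(network_edge("pangurnu", "pili", "collocation"))
```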
37. Collocations: an example
- pangurnu: digging scoop
- Collocates of pangurnu:
- pili: small coolamon/digging scoop
- rdaku: hole in the ground
- kaninjarra: downwards
- pangirni: dig, produce cavity
- mulju: soak in soft earth (dig for water)
- karlaja: foot end of sleeping area
- pirrkirni: scrape
- yirrarni: put down
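Collocational links like these can be mined from a corpus. The sketch below shows one common approach, not necessarily the one used for this list: count the words occurring within a fixed window of the target and rank them by pointwise mutual information (PMI). The toy English corpus, window size, and frequency threshold are assumptions.

```python
import math
from collections import Counter


def collocates(sentences, target, window=3, min_count=2):
    """Rank words seen within `window` tokens of `target` by PMI."""
    word_freq = Counter()
    near_freq = Counter()       # co-occurrence counts with the target
    total = 0
    for tokens in sentences:
        word_freq.update(tokens)
        total += len(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    near_freq[tokens[j]] += 1
    # PMI estimate: log2(count(target, w) * N / (count(target) * count(w)))
    scores = {
        w: math.log2(c * total / (word_freq[target] * word_freq[w]))
        for w, c in near_freq.items() if c >= min_count
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy English corpus, purely for illustration.
CORPUS = [s.split() for s in [
    "dig a hole with the scoop",
    "the scoop made a hole",
    "the dog ran to the camp",
]]
result = collocates(CORPUS, "scoop")
print(result)   # 'hole' ranks above 'the'
```

Function words such as "the" still score well on small corpora; a stoplist or a higher minimum count is the usual remedy.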
38. Browsing
- Work (at PARC and elsewhere; Pirolli et al. 1996) has stressed a role for browsing as well as searching in information access
- It provides a context for learning
- We provide browsing in several ways:
- We provide browsing in several ways
- conventional hypertext
- but with rich semantically-interpreted links
- their color-coding matches network edges
- network-based display of words
- browsing through semantic domains
39. Semantic Domains
- Alphabetical order is one indexing strategy, but there are many others
- The most requested feature is the ability to find things by semantic domain, e.g., food, manufactured items.
- Essentially the noun structure of WordNet, or the classical KR ISA hierarchy
- We can exploit the domain information in the dictionary
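Browsing by semantic domain amounts to walking an ISA tree. A minimal sketch, assuming each entry carries a slash-separated domain path (the paths and domain assignments below are hypothetical), builds the tree and collects every headword in a domain and its subdomains:

```python
from collections import defaultdict


def build_domain_tree(entries):
    """entries: (headword, path) pairs, e.g. ('w', 'food/meat')."""
    children = defaultdict(set)   # domain -> its immediate subdomains
    members = defaultdict(list)   # domain -> headwords filed directly in it
    for headword, path in entries:
        parts = path.split("/")
        for depth in range(len(parts)):
            parent = "/".join(parts[:depth]) or "ROOT"
            children[parent].add("/".join(parts[:depth + 1]))
        members[path].append(headword)
    return children, members


def words_under(domain, children, members):
    """All headwords in `domain` and, recursively, its subdomains (ISA)."""
    found = list(members.get(domain, []))
    for sub in sorted(children.get(domain, ())):
        found.extend(words_under(sub, children, members))
    return found


# Hypothetical domain assignments, for illustration only.
ENTRIES = [
    ("pangurnu", "manufactured items/implements"),
    ("pili", "manufactured items/containers"),
]
children, members = build_domain_tree(ENTRIES)
print(sorted(children["ROOT"]))        # top-level domains to browse
print(words_under("manufactured items", children, members))
```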
40. Semantic Domains
41. Other components
- Multimedia (currently pictures and audio)
- Can hear pronunciations: gives a better understanding of pronunciation than phonetic symbols
- Pictures are more intelligible than descriptions
- (Future: videos of Warlpiri sign language?)
- Advanced search page
- search various fields, regular expressions, fuzzy spelling, etc.
- Notes
- one can annotate dictionary entries (to correct or personalise them)
42. Simple features
- Show the alphabet
- The list on the left gives concreteness and tangibility
- People can start with one of those words
- One can just type a few letters and then look at the list: a traditional benefit of a paper dictionary
- English lookup can be helpful when Warlpiri spelling fails
- Fuzzy spelling of Warlpiri: a user support
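Two of these supports can be sketched briefly: type-ahead prefix lookup by binary search over the sorted headword list, and fuzzy lookup that first folds spellings a learner might confuse, then uses the standard library's difflib similarity matching. The folding rules (b to p, d to t except in rd, g to k except in ng, doubled vowels to single) are hypothetical illustrations, not Kirrkirr's actual rules.

```python
import bisect
import difflib
import re

# Headwords taken from the collocation example; kept sorted for bisect.
HEADWORDS = sorted(["pangurnu", "pili", "rdaku", "kaninjarra", "pangirni",
                    "mulju", "karlaja", "pirrkirni", "yirrarni"])


def prefix_matches(words, prefix, limit=10):
    """Type-ahead: words starting with `prefix`, via binary search."""
    i = bisect.bisect_left(words, prefix)
    found = []
    while i < len(words) and words[i].startswith(prefix) and len(found) < limit:
        found.append(words[i])
        i += 1
    return found


def fold(word):
    """Hypothetical spelling folding applied before fuzzy comparison."""
    w = word.lower().replace("b", "p")
    w = re.sub(r"(?<!r)d", "t", w)         # keep the digraph rd
    w = re.sub(r"(?<!n)g", "k", w)         # keep the digraph ng
    return re.sub(r"([aiu])\1", r"\1", w)  # doubled vowels become single


def fuzzy_lookup(words, query, n=5, cutoff=0.75):
    """Headwords whose folded spelling is close to the folded query."""
    by_key = {}
    for w in words:
        by_key.setdefault(fold(w), []).append(w)
    keys = difflib.get_close_matches(fold(query), list(by_key), n=n, cutoff=cutoff)
    return [w for key in keys for w in by_key[key]]


print(prefix_matches(HEADWORDS, "pi"))      # ['pili', 'pirrkirni']
print(fuzzy_lookup(HEADWORDS, "bangurnu"))  # best match first
```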
43. User study
- Mim Corris (Yuendumu, Willowra), Jane Simpson (Lajamanu)
- Observation and testing with primary and (lower) secondary students
- Observation of Warlpiri literacy workers
- Comments from teachers, other adults, etc.
- Purely qualitative observational studies of dictionary use.
- Initial reactions quite enthusiastic
- Could be used as a basis for classroom activities (better with some further development: games and puzzles)
44. A positive anecdote
- One of the introductory Warlpiri literacy students, who had not been very interested in the literacy class, spent nearly 3/4 of an hour looking at Kirrkirr, apparently in absorbed concentration. She wasn't especially interested in the sound and picture possibilities. She moved between words, scrolling along the list, typing in the search, and clicking on the words in the network pane. She wasn't even put off when the dictionary definitions stopped appearing, looking at the networks of words instead. This is quite unlike her attitude to the backslash-coded electronic dictionary (where she lost interest quickly because of the difficulty for her of narrowing down searches). After the Kirrkirr demo she asked if she could have a printed dictionary to take away with her to use in camp to learn the words. I interpret this as a desire to learn words in her own time and place.
45. Endangered language dictionaries
- (Corris, Manning, Poetsch, and Simpson 1999): based on 72 people
- Testing both paper and electronic dictionaries
- Competing goals: documentation dictionaries vs. maintenance/learning dictionaries
- Symbolic vs. practically useful organization
- Lack of training and limited literacy can make paper dictionaries ineffective
- 45-60 minutes for 12 dictionary lookups
- Lack of electricity makes e-dictionaries ineffective in some places (e.g., Indonesia)
- E-dictionaries can solve many usability issues:
- font size, amount of info, infinite space, easy lookup, sound
46. Conclusions
- Kirrkirr is just a prototype of what one can do to develop new ways to visualise lexicons
- We have demonstrated an approach to making dictionary information usable through the creation of an application which mediates between well-structured data and users' needs for searching/browsing and presentation
- While we have focused our research on Warlpiri, the system can easily be applied to other languages: the design is general
47. Conclusions (cont.)
- "... The best future applications of MRDs in education will be those most able to respond to the insights and needs of their users" (Kegl 1995)
- Kirrkirr can be seen as a step towards the future of e-dictionaries