Using resources - PowerPoint PPT Presentation

Slides: 29
Provided by: harold

Title: Using resources


1
Using resources
  • WordNet and the BNC

2
WordNet History
  • 1985: a group of psychologists and linguists
    start to develop a lexical database
  • Princeton University
  • theoretical basis: results from
    psycholinguistics and psycholexicology
  • What are properties of the mental lexicon?

3
Global organisation
  • division of the lexicon into five categories:
  • Nouns
  • Verbs
  • Adjectives
  • Adverbs
  • function words ("probably stored separately as
    part of the syntactic component of language",
    Miller et al.)

4
Global organization
  • nouns: organized as topical hierarchies
  • verbs: entailment relations
  • adjectives: N-dimensional hyperspaces
  • adverbs: N-dimensional hyperspaces
  • Miller et al.: "Each of these lexical
    structures reflects a different way of
    categorizing experience; attempts to impose a
    single organizing principle on all syntactic
    categories would badly misrepresent the
    psychological complexity of lexical knowledge."

5
Basic principles
  • organize lexical information in terms of word
    meaning, rather than word forms
  • "In this respect, WordNet resembles a thesaurus
    more than a dictionary, ..." (Miller et al.)
  • "... a word is a conventional association
    between a lexicalized concept and an utterance
    that plays a syntactic role."
  • word form: refers to the physical utterance or
    inscription
  • word meaning: refers to the lexicalized concept
    that a form can be used to express

6
Lexical semantics
  • How are word meanings represented in WordNet?
  • synsets (synonym sets) as basic units
  • a word meaning is represented by simply listing
    the word forms that can be used to express it
  • example: senses of board
  • a piece of lumber vs. a group of people assembled
    for some purpose
  • synsets as unambiguous designators
  • {board, plank} vs. {board, committee}

7
Synsets
  • synsets are often sufficient for differential
    purposes
  • if an appropriate synonym is not available, a
    short gloss may be used
  • e.g. {board, (a person's meals, provided
    regularly for money)}
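
The synset idea can be sketched in a few lines of Python. This is only an illustration with toy data taken from the board example, not the real WordNet database or its API; the names SYNSETS, senses and designator are made up for this sketch.

```python
# Toy data, not the real WordNet database: each synset is a set of word
# forms, optionally paired with a short gloss when no synonym is available.
SYNSETS = [
    {"forms": frozenset({"board", "plank"}), "gloss": None},
    {"forms": frozenset({"board", "committee"}), "gloss": None},
    {"forms": frozenset({"board"}),
     "gloss": "a person's meals, provided regularly for money"},
]


def senses(word_form):
    """Return every synset that lists the given word form."""
    return [s for s in SYNSETS if word_form in s["forms"]]


def designator(synset):
    """Render a synset as a designator: its forms, plus the gloss if any."""
    forms = ", ".join(sorted(synset["forms"]))
    if synset["gloss"]:
        return "{" + forms + ", (" + synset["gloss"] + ")}"
    return "{" + forms + "}"


# One word form, several meanings: each synset picks out one of them.
for s in senses("board"):
    print(designator(s))
```

The word form board appears in three synsets, so it has three senses here, while plank appears in only one.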

8
Lexical Relations in WordNet
  • WordNet is organized by semantic relations.
  • It is characteristic of semantic relations that
    they are reciprocated:
  • if there is a semantic relation R between meaning
    {x, x', ...} and meaning {y, y', ...}, then there
    is also a relation R' between {y, y', ...} and
    {x, x', ...}.
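
One way to picture reciprocated relations is a store that records the inverse whenever a relation is added. The sketch below is an assumption about how this could be modelled, not how WordNet is implemented; the INVERSE table and add_relation are hypothetical names.

```python
# Assumed inverse table: each relation R is paired with its reciprocal R'.
INVERSE = {
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "meronym": "holonym",
    "holonym": "meronym",
}

# Triples (source synset, relation, target synset), read as
# "source is a <relation> of target".
relations = set()


def add_relation(source, relation, target):
    """Store R(source, target) together with the reciprocal R'(target, source)."""
    relations.add((source, relation, target))
    relations.add((target, INVERSE[relation], source))


maple = frozenset({"maple"})
tree = frozenset({"tree"})
add_relation(maple, "hyponym", tree)

# The reciprocal link was recorded automatically.
print((tree, "hypernym", maple) in relations)
```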

9
Lexical relations: synonymy
  • similarity of meaning
  • Leibniz: two expressions are synonymous if the
    substitution of one for the other never changes
    the truth value of a sentence in which the
    substitution is made
  • such global synonymy is rare (it would be
    redundant)
  • synonymy relative to a context: two expressions
    are synonymous in a linguistic context C if the
    substitution of one for the other in C does not
    alter the truth value
  • consequence of defining synonymy in terms of
    substitutability: words in different syntactic
    categories cannot be synonyms

10
Lexical relations: antonymy
  • the antonym of a word x is sometimes not-x, but not
    always
  • rich and poor are antonyms
  • but "not rich" does not imply "poor"
  • (because many people consider themselves neither
    rich nor poor)
  • antonymy is a lexical relation between word
    forms, not a semantic relation between word
    meanings
  • the meanings {rise, ascend} and {fall, descend} are
    conceptual opposites, but they are not antonyms;
    rise/fall and ascend/descend are pairs of
    antonyms
  • w1, w2 ∈ S1 and w3, w4 ∈ S2: ant(w1, w3) does not
    imply ant(w2, w4)
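
The point that antonymy holds between word forms rather than between synsets can be checked on the rise/fall example. The following is a minimal sketch with toy data; ANTONYM_PAIRS, are_antonyms and synset_level_antonymy are illustrative names, not part of any WordNet interface.

```python
# Toy data: antonymy is stored between word forms, not between synsets.
S1 = frozenset({"rise", "ascend"})
S2 = frozenset({"fall", "descend"})
ANTONYM_PAIRS = {
    frozenset({"rise", "fall"}),
    frozenset({"ascend", "descend"}),
}


def are_antonyms(w1, w2):
    """True if the two word forms are listed as an antonym pair."""
    return frozenset({w1, w2}) in ANTONYM_PAIRS


def synset_level_antonymy(s1, s2):
    """True only if every form in s1 is an antonym of every form in s2."""
    return all(are_antonyms(a, b) for a in s1 for b in s2)


print(are_antonyms("rise", "fall"))        # a listed pair
print(are_antonyms("rise", "descend"))     # opposite in meaning, but not listed
print(synset_level_antonymy(S1, S2))       # so the relation does not lift to synsets
```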

11
Lexical relations: hyponymy
  • hyponymy is a semantic relation between word
    meanings
  • maple is a hyponym of tree
  • inverse: hypernymy
  • tree is a hypernym of maple
  • also called subordination/superordination,
    subset/superset, or the ISA relation
  • test for hyponymy:
  • a native speaker must accept sentences built from
    the frame "An x is a (kind of) y"

12
Lexical relations: meronymy
  • A concept represented by the synset {x, x', ...}
    is a meronym of a concept represented by the
    synset {y, y', ...} if native speakers of English
    accept sentences constructed from such frames as
    "A y has an x (as a part)" or "An x is a part of
    y".
  • inverse relation: holonymy
  • HAS-AS-PART
  • part hierarchy
  • part-of is asymmetric and (with caution)
    transitive

13
Lexical relations: meronymy
  • failures of transitivity are caused by different
    part-whole relations, e.g.
  • A musician has an arm.
  • An orchestra has a musician.
  • but: ? An orchestra has an arm.
  • Types of meronymy in WordNet:
  • component: most frequently found
  • member
  • composition
  • phase process
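
The musician/orchestra example suggests one cautious reading of transitivity: chain part-of links only while they are of the same type. The sketch below assumes that reading; the PART_OF table, its type labels and the extra finger/arm fact are illustrative, not taken from WordNet.

```python
# Toy part-of facts: (part, whole) -> meronymy type.
PART_OF = {
    ("arm", "musician"): "component",
    ("musician", "orchestra"): "member",
    ("finger", "arm"): "component",  # extra fact, added for illustration
}


def has_part(whole, part, kind=None):
    """Follow part-of links from part up to whole.

    Links are chained only while they all share the same meronymy type,
    which blocks inferences that mix, say, component and member.
    """
    for (p, w), k in PART_OF.items():
        if p != part or (kind is not None and k != kind):
            continue
        if w == whole or has_part(whole, w, k):
            return True
    return False


print(has_part("musician", "finger"))   # component + component: chained
print(has_part("orchestra", "arm"))     # component + member: blocked
```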

14
WordNet's noun hierarchy
  • noun hierarchy partitioned into separate
    hierarchies with unique top hypernyms
  • vague abstractions at the top would be semantically
    empty, e.g. {entity} with immediate hyponyms
    {object, thing} and {idea}

15
  • {act, action, activity}
  • {animal, fauna}
  • {artifact}
  • {attribute, property}
  • {body, corpus}
  • {cognition, knowledge}
  • {communication}
  • {event, happening}
  • {feeling, emotion}
  • {food}
  • {group, collection}
  • {location, place}
  • {motive}
  • {natural object}
  • {natural phenomenon}
  • {person, human being}
  • {plant, flora}
  • {possession}
  • {process}
  • {quantity, amount}
  • {relation}
  • {shape}
  • {state, condition}
  • {substance}
  • {time}

16
Nouns in WordNet
  • noun hierarchy as lexical inheritance system
  • "... seldom goes more than ten levels deep, and
    the deepest examples usually contain technical
    levels that are not part of everyday vocabulary."
  • Shetland pony → pony → horse → equid → odd-toed
    ungulate → herbivore → mammal → vertebrate →
    animal

17
Nouns in WordNet
  • man-made artifacts: sometimes six or seven levels
    deep
  • roadster → car → motor vehicle → wheeled
    vehicle → vehicle → conveyance → artifact
  • hierarchy of persons: about three or four levels
  • televangelist → evangelist → preacher → clergyman
    → spiritual leader → person
  • As in all thesaurus structures, words can have
    multiple hypernyms
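
Because a word can have more than one hypernym, the hierarchy is a graph with several upward paths rather than a single chain. The sketch below enumerates those paths and their depth. The televangelist chain is from the slide; the second parent, broadcaster, is a hypothetical link added only to show branching.

```python
# Child -> list of parents. The "broadcaster" links are hypothetical and
# are NOT from the slides; they only illustrate multiple hypernyms.
HYPERNYMS = {
    "televangelist": ["evangelist", "broadcaster"],
    "evangelist": ["preacher"],
    "preacher": ["clergyman"],
    "clergyman": ["spiritual leader"],
    "spiritual leader": ["person"],
    "broadcaster": ["person"],
}


def paths_to_root(word):
    """List every hypernym path from word up to a node without hypernyms."""
    parents = HYPERNYMS.get(word, [])
    if not parents:
        return [[word]]
    return [[word] + path for p in parents for path in paths_to_root(p)]


# Print the number of links and the path itself for each route upwards.
for path in paths_to_root("televangelist"):
    print(len(path) - 1, " -> ".join(path))
```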

18
WordNets for other languages
  • Idea has been widely copied
  • Sometimes by translating Princeton WordNet
  • Lexical relations in general are universal ...
  • But are they in practice?
  • Are synsets universal?
  • EuroWordNet: combining multilingual WordNets to
    include cross-language equivalence
  • Inherent difficulties, as above

19
BNC
  • One of the most widely used corpora (esp. in
    Britain, but also elsewhere)
  • A balanced synchronic text corpus containing 100
    million words (POS tagged)
  • Collected in late 1980s
  • 90% text, 10% transcribed speech
  • Encoded according to TEI standards
  • Associated tools (mainly for searching), but many
    users write their own (e.g. in Perl)
  • http://www.natcorp.ox.ac.uk/
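
A home-made tool for a POS-tagged, XML-encoded corpus can be quite small. The sketch below uses Python rather than the Perl mentioned above, and it runs on a made-up fragment: the element and attribute names (s, w, pos) and the tag values are assumptions for illustration, so check the corpus documentation for the actual mark-up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment in the spirit of word-level mark-up.
# The names s, w and pos and the tag values are assumptions for this sketch.
SAMPLE = """
<s n="1">
  <w pos="DET">The </w><w pos="NOUN">pony </w><w pos="VERB">eats </w><w pos="NOUN">grass</w>
</s>
"""


def words_with_tag(xml_text, tag):
    """Return the text of every <w> element whose pos attribute equals tag."""
    root = ET.fromstring(xml_text.strip())
    return [w.text.strip() for w in root.iter("w") if w.get("pos") == tag]


print(words_with_tag(SAMPLE, "NOUN"))
```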

20
Using the BNC
  • Just looking up words
  • More interesting to construct queries that
    exploit the mark-up (see Allan's slides)
  • Already becoming dated (e.g. numpty)
  • Results often contradict authorities such as
    dictionaries, especially in revealing primary
    senses/uses of words.

23
WWW as a corpus
  • The standard Google search engine used with
    individual words does not always give good word
    collocations; after all, Google is document
    retrieval
  • Try http://labs1.google.com/sets
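
For comparison, this is roughly what counting collocates in a corpus looks like, as opposed to retrieving documents. It is a minimal sketch on an invented toy text; the window size and the function name collocates are arbitrary choices.

```python
from collections import Counter

# Invented toy text standing in for a corpus.
TOY_CORPUS = (
    "strong tea and strong coffee but powerful computer and strong tea"
).split()


def collocates(tokens, node, window=1):
    """Count the words occurring within `window` tokens of each `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Count every neighbour in the window except the node itself.
        counts.update(
            t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
        )
    return counts


print(collocates(TOY_CORPUS, "strong").most_common())
```

A real study would normally add an association measure on top of the raw counts, since frequent function words such as "and" otherwise dominate.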

28
Lexical research
  • Use a corpus resource such as the BNC together with
    WordNet to get interesting results
  • → Allan's slides
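
As a rough idea of what combining the two resources can mean, the sketch below counts how often a concept is mentioned in a text by also counting its hyponyms. Both the hyponym table (built from the pony/horse/equid chain) and the token list are toy stand-ins, not the BNC or WordNet themselves.

```python
from collections import Counter

# Toy stand-ins: a tiny hyponym table and a tiny tokenised "corpus".
HYPONYMS = {"equid": ["horse"], "horse": ["pony"]}
TOKENS = "the horse and the pony saw another horse near a cow".split()


def all_hyponyms(word):
    """Collect direct and indirect hyponyms of word."""
    found = set()
    for child in HYPONYMS.get(word, []):
        found.add(child)
        found |= all_hyponyms(child)
    return found


def concept_frequency(tokens, word):
    """Count tokens that name the concept itself or any of its hyponyms."""
    targets = all_hyponyms(word) | {word}
    return Counter(t for t in tokens if t in targets)


freq = concept_frequency(TOKENS, "equid")
print(freq, sum(freq.values()))
```

The word equid never occurs in the toy text, yet the concept is mentioned three times through horse and pony.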