Computing and Linguistics: A Cognitive Approach, or, Computing As We May Think


1
Computing and Linguistics: A Cognitive Approach
or, Computing As We May Think
  • Steve Pepper
  • pepper.steve_at_gmail.com
  • University of Oslo, 2009-04-21

2
Today's research questions
  • How can linguistics, and in particular cognitive linguistics, inform our work with Topic Maps?
  • Can Topic Maps contribute in any way to the cognitive linguistics project?
  • Plan of action:
    ◦ I tell you about Topic Maps (conceptual model)
    ◦ I draw some parallels with natural language
    ◦ You correct me, elaborate and suggest new directions

3
Relevance to you as linguists
  • As users of the technology
    ◦ organizing data collected in your research
  • As consultants to users of the technology
    ◦ e.g. universities, government agencies, private enterprise
  • As contributors to the standard
    ◦ clarify some of the cognitive issues, establish best practices, help extend the standard
  • As lobbyists to the University of Oslo
    ◦ if you think the new UiO web site should be based on Topic Maps, please make your views known to the project group: http://www.admin.uio.no/prosjekter/nyuioweb/

4
Relevance in general
  • We need to organize information in a new way
    ◦ "The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships." (Vannevar Bush, As We May Think, 1945)
  • We need new ways of managing knowledge
    ◦ "In today's global knowledge economy, knowledge is the key asset in many organizations..."
  • Topic Maps makes major contributions in both areas
    ◦ See the use cases presented at recent Topic Maps conferences: http://www.topicmaps.com

5
What is Topic Maps?
  • An ISO standard for computer-based information and knowledge management
    ◦ Provides the ability to control infoglut and share knowledge by connecting any kind of information from any kind of source based on its meaning
  • A semantic technology
    ◦ Cf. Semantic Web (RDF, OWL)
  • A form of knowledge representation (primitive perhaps, but useful)
  • Widely used for web-based delivery of information
    ◦ Plus Information Integration, eLearning, Business Process Modeling, Product Configuration, Business Rules Management, Asset Management, Knowledge Management, ...

6
The problem with computing...
  • ...is that it's inside-out!
  • People used to think the sun revolved around the earth
    ◦ Copernicus' heliocentric theory turned this idea inside out and revolutionized our understanding of the universe
  • Today we face a similar situation in computing
    ◦ Our computing universe has computers, applications and documents at the centre
    ◦ The concepts that our information is about are somewhere in outer space where they can't be found

7
A subject-centric revolution
  • This is wrong, because it does not reflect how humans think
    ◦ We think in terms of interrelated concepts (or subjects)
    ◦ Subjects are what interest us, not documents or applications
    ◦ And so subjects must be given centre stage
  • We need a subject-centric revolution
    ◦ This has ramifications for every aspect of human-computer interaction, including user interfaces, operating systems, file systems, etc.
    ◦ Consider the typical user desktop...

8
  • Today our desktops are application-centric and
    document-centric
  • Icons represent applications and documents

9
  • Why can't they be subject-centric, with icons that represent the subjects we are interested in?
  • With links between related icons?
  • And with context menus that allow us to find
    everything related to a particular subject?

(Illustration: a subject-centric desktop, with icons for subjects such as gambia, K185, opera, topic maps, LING 2110, OOXML, tm2008, rana, INF 2820, janacek, bantu semantics, keynote, bayreuth and håkon)
10
Computing As We May Think
  • Bush's solution to information overload
    ◦ Organize information "as we may think", i.e. associatively
  • His vision spawned the hypertext movement
    ◦ Doug Engelbart, Ted Nelson, Bill Atkinson, Tim Berners-Lee, ...
    ◦ The World Wide Web is its greatest triumph to date
  • But hypertext does not correspond to how we think
    ◦ Our heads are not full of millions of interlinked documents
    ◦ They are full of interlinked concepts (or subjects)
  • Topic Maps provides a close approximation to this
    ◦ It is a technology that is based on cognitive principles

11
Background to Topic Maps
  • Emerged from the SGML community in the 1990s
    ◦ Use case: how to merge (digital) back-of-book indexes
    ◦ Some input from library science
    ◦ No input from linguists
    ◦ Precious little input from computer scientists before 2001
    ◦ Most of the SGML community came from the humanities
  • ISO 13250 first published in 2000 (recently revised)
    ◦ A model for representing knowledge organization structures (indexes, glossaries, thesauri, encyclopedias)
    ◦ Plus interchange syntax, query language, constraint language, ...
  • Widely adopted in Norway (esp. public sector)
    ◦ And gaining ground elsewhere

12
The TAO of Topic Maps
(Illustration: a back-of-book index)
    Callas, Maria ... 42
    Cavalleria Rusticana ... 71, 203-204
    Mascagni, Pietro
        Cavalleria Rusticana ... 71, 203-204
    Pavarotti, Luciano ... 45
    Puccini, Giacomo ... 23, 26-31
        Tosca ... 65, 201-202
    Rustic Chivalry, see Cavalleria Rusticana
    singers ... 39-52
        baritone ... 46
        bass ... 46-47
        soprano ... 41-42, 337
        tenor ... 44-45
        see also Callas, Pavarotti
    Tosca ... 65, 201-202
  • The core concepts are derived from the back-of-book index
    ◦ Extended and generalized for use with digital information
  • Consider a two-layer model consisting of
    ◦ a set of information resources (below)
    ◦ a knowledge map (above)
  • This is like the division of a book into content and index

(Diagram: a knowledge layer (INDEX) above an information layer (CONTENT))
13
(1) The information layer
  • The lower layer contains the content
    ◦ usually digital, but need not be
    ◦ can be in any format or notation or location
    ◦ can be text, graphics, video, audio, whatever
  • This is like the content of the book to which the back-of-book index belongs

(Diagram: the information layer (CONTENT))
14
(2) The knowledge layer
  • The upper layer consists of (typed) topics and associations
  • Topics represent the subjects that the information is about
    ◦ Like the list of topics that forms a back-of-book index
  • Associations represent relationships between those subjects
    ◦ Like "see also" relationships in a back-of-book index

(Diagram: the knowledge layer (INDEX) for the domain Italian opera. Tosca and Madame Butterfly are each linked to Puccini by "composed by"; Puccini is linked to Lucca by "born in".)
15
Occurrences link the layers
  • Occurrences represent relationships between information resources and the subjects that they are about
  • The links (or locators) are like page numbers in a back-of-book index
  • Occurrences can also be typed (e.g. bio, map, synopsis)

16
Summary of core concepts
Let's look at some TAOs in the Omnigator
  • The TAO of Topic Maps: Topics, Associations and Occurrences

Plus topic types, association types and occurrence types, each of which is represented by a topic...
17
About the Omnigator
  • A free topic map browser from Ontopia
    ◦ Download from http://www.ontopia.net (part of OKS Samplers)
    ◦ Java-based, runs on any computer
  • Completely generic
    ◦ Not optimized for any particular ontology
    ◦ Display and navigate any conforming topic map
  • A teaching aid
    ◦ Not designed for end-users (no attempt to hide technical jargon)
    ◦ Also used for prototyping and debugging
  • Not to be used for most real-world applications!
    ◦ These require custom interfaces based on a specific ontology
    ◦ (see http://www.topicmaps.com for a good example)

18
Omnigator interface
a typical topic page
Demo
19
Typing topics revisited
  • Basic building blocks of the TAO model are:
    ◦ Topics, e.g. Puccini, Lucca, Tosca
    ◦ Associations, e.g. "Puccini was born in Lucca"
    ◦ Occurrences, e.g. http://www.opera.net/puccini/bio.html is a biography of Puccini
  • Each of these constructs can be typed
    ◦ Topic types: 'composer', 'city', 'opera'
    ◦ Association types: 'born in', 'composed by'
    ◦ Occurrence types: 'biography', 'street map', 'synopsis'
  • All such types are also topics
    ◦ The set of typing topics constitutes an ontology
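The TAO model on this slide can be sketched as a small in-memory data structure. The class names, field names and topic ids below are illustrative assumptions, not the ISO 13250 data model. What the sketch is meant to show is that every type (topic type, association type, role type, occurrence type) is itself a topic, so collecting the types yields the ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    name: str
    types: list = field(default_factory=list)  # ids of this topic's topic types


@dataclass
class Association:
    type: str    # id of the association-type topic
    roles: dict  # role-type id -> id of the topic playing that role


@dataclass
class Occurrence:
    type: str     # id of the occurrence-type topic
    topic: str    # id of the topic that the resource is about
    locator: str  # address of the information resource


# Hypothetical ids; the typing topics are ordinary topics too.
topics = {t.id: t for t in [
    Topic("puccini", "Puccini", ["composer"]),
    Topic("lucca", "Lucca", ["city"]),
    Topic("tosca", "Tosca", ["opera"]),
    Topic("composer", "composer"), Topic("city", "city"), Topic("opera", "opera"),
    Topic("born-in", "born in"), Topic("composed-by", "composed by"),
    Topic("person", "person"), Topic("place", "place"), Topic("work", "work"),
    Topic("biography", "biography"),
]}

associations = [
    Association("born-in", {"person": "puccini", "place": "lucca"}),
    Association("composed-by", {"composer": "puccini", "work": "tosca"}),
]

occurrences = [
    Occurrence("biography", "puccini", "http://www.opera.net/puccini/bio.html"),
]


def ontology(topics, associations, occurrences):
    """Return the ids of all topics that are used as a type anywhere in the map."""
    typing = set()
    for t in topics.values():
        typing.update(t.types)
    for a in associations:
        typing.add(a.type)
        typing.update(a.roles)  # iterating over a dict yields its keys: the role types
    for o in occurrences:
        typing.add(o.type)
    return typing


print(sorted(ontology(topics, associations, occurrences)))
```

Note that "composer" turns up both as a topic type and as a role type. Nothing in the model prevents this.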

20
Capabilities of the TAO model (1)
  • Represent subjects explicitly
    ◦ Topics represent the things users are interested in
  • Capture relationships between subjects
    ◦ Associations provide user-friendly navigation paths to information ("navigation as we may think")
    ◦ Associations also promote serendipitous knowledge discovery through browsing
  • Make information findable
    ◦ Topics provide a one-stop shop for everything that is known about a subject (collocation of information and knowledge)
    ◦ Occurrences allow information about a common subject to be aggregated across multiple systems, irrespective of location

21
Capabilities of the TAO model (2)
  • Represent taxonomies and thesauri
    ◦ Associations can (also) represent hierarchical relationships
    ◦ With Topic Maps you can have multiple, interlinked hierarchies and faceted classification
  • Transcend simple hierarchies
    ◦ Rich associative structures capture the complexity of knowledge and reflect the way people think
  • Manage knowledge
    ◦ The topic map is the embodiment of organizational memory
    ◦ Provides a structured way to capture people's knowledge of things, events, relationships, etc.

22
Beyond the TAO
For more details, see Pepper 2009
  • Formal data model
    ◦ Topic maps can be queried, e.g.
    ◦ "Give me all composers that composed operas that were based on plays that were written by Shakespeare"
  • Interchange syntax
    ◦ Topic maps can be interchanged
    ◦ Increased reuse, added value
  • Robust identity model
    ◦ Topic maps can be merged
    ◦ Potential to federate knowledge
  • Scope
    ◦ Topic maps can capture context
  • Reification
    ◦ Topic maps can express different levels of detail
    ◦ Similar to scaling in cartography
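To make the query example concrete, here is a sketch that answers it by chaining three association lookups over a toy map. A real topic map would be queried with a dedicated query language. The function names and ids are hypothetical, and so are the "written-by" role types (author, work). The "composed" (composer, work) and "based on" (source, result) role types follow those used in the Italian Opera Topic Map.

```python
from dataclasses import dataclass


@dataclass
class Association:
    type: str
    roles: dict  # role type -> id of the topic playing that role


associations = [
    Association("written-by", {"author": "shakespeare", "work": "othello"}),
    Association("written-by", {"author": "sardou", "work": "la-tosca"}),
    Association("based-on", {"source": "othello", "result": "otello-verdi"}),
    Association("based-on", {"source": "othello", "result": "otello-rossini"}),
    Association("based-on", {"source": "la-tosca", "result": "tosca"}),
    Association("composed", {"composer": "verdi", "work": "otello-verdi"}),
    Association("composed", {"composer": "rossini", "work": "otello-rossini"}),
    Association("composed", {"composer": "puccini", "work": "tosca"}),
]

# topic id -> topic type (the two operas named "Otello" are distinct topics)
topic_type = {
    "othello": "play", "la-tosca": "play",
    "otello-verdi": "opera", "otello-rossini": "opera", "tosca": "opera",
    "verdi": "composer", "rossini": "composer", "puccini": "composer",
    "shakespeare": "person", "sardou": "person",
}


def hop(assocs, assoc_type, from_role, from_players, to_role):
    """Follow associations of assoc_type from players of from_role to players of to_role."""
    return {a.roles[to_role] for a in assocs
            if a.type == assoc_type and a.roles.get(from_role) in from_players}


def of_type(ids, wanted):
    """Keep only the topics whose type is `wanted`."""
    return {i for i in ids if topic_type.get(i) == wanted}


def composers_of_operas_based_on_plays_by(author):
    plays = of_type(hop(associations, "written-by", "author", {author}, "work"), "play")
    operas = of_type(hop(associations, "based-on", "source", plays, "result"), "opera")
    return of_type(hop(associations, "composed", "work", operas, "composer"), "composer")


print(sorted(composers_of_operas_based_on_plays_by("shakespeare")))
```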

23
Break: any questions so far?
  • After the break
    ◦ Topic Maps and natural language: towards a linguistic perspective

24
Parallels with natural language
  • Basic grammatical classes ↔ TAO model
    ◦ Nouns and verbs ↔ Topics and associations
    ◦ Nominals and nouns ↔ Topics and their types
    ◦ Clauses and verbs ↔ Associations and their types
  • Valency ↔ Arity
  • Semantic roles ↔ Association roles
  • Categories and schemas ↔ Typing topics
  • Hyponymy ↔ Type hierarchies
  • Synonymy and homonymy ↔ Naming
  • Nominalization ↔ Reification
  • Grounding / co-reference ↔ Subject identity / collocation
  • Information structure ↔ Navigation

25
Basic principles, basic classes
  • "In elementary school, I was taught that a noun is the name of a person, place, or thing. In college, I was taught the basic linguistic doctrine that a noun can only be defined in terms of grammatical behavior, conceptual definitions of grammatical classes being impossible. Here, several decades later, I demonstrate the inexorable progress of grammatical theory by claiming that a noun is the name of a thing." (Langacker 2008)
  • The basic grammatical classes are nouns and verbs
    ◦ They prototypically profile things and relationships
    ◦ They correspond to topics and associations

26
Grounding
Langacker 2008: 259ff (esp. 264)
  • "Grounding is characteristic of the structure referred to in CG as nominals and finite clauses. More specifically, a nominal or a finite clause profiles a grounded instance of a thing or process type."
  • A noun designates a type of thing, and a verb a type of process.
    ◦ A nominal or a finite clause profiles a grounded instance of a thing or process type.
  • Nominal grounding (determiners and quantifiers)
    ◦ the, this, that, some, a, each, every, no, any
  • Clausal grounding (mood and tense)
    ◦ -s, -ed, may, will, should

27
Nouns and nominals
  • Topic types represent classes of topics
    ◦ Conceptual groupings of things, e.g. composer, opera, city, ...
    ◦ They correspond to Langacker's nouns (types of thing)
    ◦ However, topics can have multiple names
    ◦ (This is how we handle synonymy and multilingualism)
    ◦ In one sense it is topic names that correspond to nouns
  • Topic instances represent individual subjects
    ◦ They correspond to Langacker's nominals (instances of types)
    ◦ Their names are typically proper nouns, e.g. Puccini, Tosca, Lucca

28
Verbs and clauses
  • Association types represent classes of relationships
    ◦ They correspond to Langacker's verbs (types of process)
    ◦ (Often named accordingly, e.g. born in, composed by, killed by, ...)
  • Individual associations represent specific relationships
    ◦ They correspond to Langacker's clauses (instances of processes)
    ◦ e.g. "Puccini was born in Lucca", "Tosca was composed by Puccini"
  • Langacker distinguishes processes (temporal) and non-processual relationships (non-temporal). The latter are (prototypically) profiled by adjectives, adverbs, prepositions, and participles. This distinction is not made explicitly in Topic Maps.
  • Note: there are two predefined association types
    ◦ type-instance (the relationship between a topic and its type)
    ◦ supertype-subtype (a relationship between types, see Hyponymy)

29
Valency
  • Associations can involve one, two or more topics
    ◦ Binary associations, e.g. "Puccini composed Tosca", are most common and correspond to transitive verbs
    ◦ Ternary associations, e.g. "Tosca killed Scarpia with a knife", can correspond to ditransitive verbs
    ◦ Unary associations, e.g. "Turandot was unfinished", correspond (sort of) to intransitive verbs (or binary properties)
  • The arity of an association
    ◦ Corresponds to the valency of a verb
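Arity is simply the number of roles in an association, so the correspondence with valency takes only a few lines to show. In this sketch, the role types (killer, victim, instrument and so on) and the ids are hypothetical labels chosen for the slide's example sentences.

```python
# Hypothetical role types and ids for the slide's example sentences.
examples = {
    "Puccini composed Tosca":
        ("composed", {"composer": "puccini", "work": "tosca"}),
    "Tosca killed Scarpia with a knife":
        ("killed", {"killer": "tosca-character", "victim": "scarpia", "instrument": "knife"}),
    "Turandot was unfinished":
        ("unfinished", {"work": "turandot"}),
}

VERB_CLASS = {1: "intransitive", 2: "transitive", 3: "ditransitive"}


def arity(association):
    """The arity of an association is the number of roles it has."""
    _assoc_type, roles = association
    return len(roles)


for sentence, assoc in examples.items():
    n = arity(assoc)
    print(f"{sentence}: arity {n}, cf. {VERB_CLASS[n]} verb")
```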

30
Semantic roles
  • An association does not have directionality
    ◦ Instead of direction, Topic Maps uses roles
  • Roles are classified by type
    ◦ Role types specify the nature of each topic's involvement in the relationship. They correspond to semantic roles.
    ◦ (Role types are also topics)
  • Role types are different from topic types...

(Diagram: the relationship between Puccini and Tosca as represented in RDF and in Topic Maps, the latter with the roles composer and work)
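Because an association has roles rather than a direction, one lookup finds the relationship from either player. This is a minimal sketch using hypothetical tuples and ids. In RDF, the same statement would have a fixed subject and object, so reading it the other way needs a different query pattern or an inverse property. Here both ends are treated symmetrically.

```python
# Each association: (association type, {role type: player}); no subject or object.
associations = [
    ("composed", {"composer": "puccini", "work": "tosca"}),
    ("born-in", {"person": "puccini", "place": "lucca"}),
]


def related(topic, assocs):
    """List (association type, own role, other role, other player) for every
    association the topic takes part in, whichever role it plays."""
    found = []
    for assoc_type, roles in assocs:
        for role, player in roles.items():
            if player != topic:
                continue
            for other_role, other in roles.items():
                if other_role != role:
                    found.append((assoc_type, role, other_role, other))
    return found


print(related("tosca", associations))
print(related("puccini", associations))
```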
31
Roles and types
(Diagram: the topics (T) Puccini and Tosca, connected through the association (A) composed, in which they play the roles (R) composer and work)
  • The role type can be
    ◦ the same as the role-playing topic's topic type (composer = composer)
    ◦ a supertype of the topic type (work > opera)
    ◦ a subtype of the topic type (teacher < person)
    ◦ a subtype of the topic type's supertype (source < work)
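Given supertype-subtype relations between the typing topics, the four cases can be told apart mechanically. This sketch assumes a hypothetical SUPERTYPE table built from the slide's examples: opera and source under work, teacher under person. It approximates the last case as "the role type and the topic type share a supertype".

```python
# Hypothetical subtype -> supertype table, built from the slide's examples.
SUPERTYPE = {"opera": "work", "source": "work", "teacher": "person"}


def ancestors(t):
    """All supertypes of t, nearest first."""
    chain = []
    while t in SUPERTYPE:
        t = SUPERTYPE[t]
        chain.append(t)
    return chain


def role_vs_topic_type(role_type, topic_type):
    """Classify how a role type relates to the type of the topic playing it."""
    if role_type == topic_type:
        return "same"
    if role_type in ancestors(topic_type):
        return "supertype of the topic type"
    if topic_type in ancestors(role_type):
        return "subtype of the topic type"
    if set(ancestors(role_type)) & set(ancestors(topic_type)):
        return "subtype of the topic type's supertype"
    return "unrelated"


for role, ttype in [("composer", "composer"), ("work", "opera"),
                    ("teacher", "person"), ("source", "opera")]:
    print(role, ttype, role_vs_topic_type(role, ttype))
```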

32
Association roles and semantic roles
  • Italian Opera Topic Map
    ◦ composed: composer, work
    ◦ born in: person, place
    ◦ appears in: character, work
    ◦ based on: source, result
    ◦ revision of: source, result
    ◦ part of: part, whole
    ◦ exponent of: person, style
    ◦ located in: container, containee
    ◦ pupil of: teacher, pupil
  • Association roles tend to be much more specific
    ◦ Variable practice: as yet no established conventions
    ◦ Might (cognitive) linguists have something to offer here?
  • (Frawley 1992)
    ◦ (logical actors) agent, author, instrument
    ◦ (logical recipients) patient, experiencer, benefactive
    ◦ (spatial roles) theme, source, goal
    ◦ (non-participant roles) locative, reason, purpose

33
Naming of associations
  • Intuitive naming requires flexibility
    ◦ i.e. multiple association type (AT) names that change depending on the direction of the association
    ◦ "Puccini was born in Lucca"
    ◦ "Lucca was the birthplace of Puccini"
  • Alternative CG view
    ◦ Naming should be based on whether the agent or the theme is in focus
    ◦ The focus becomes the trajector
    ◦ Point of focus = current topic
  • Some strategies...
    ◦ Voice-based: active / passive forms of the verb, e.g. composed(V, active) / composed(V, passive) by. Works well in SVO languages; less satisfactory with SOV.
    ◦ Role-based: teacher(N) of / pupil(N) of
    ◦ Nominalization: composition. Tends to be used by Japanese, Koreans (and Germans??)
    ◦ Combinations: born(V) in / birthplace(N) of; part(N) of / consists(V) of
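One way to support direction-dependent naming is to store one name per combination of association type and role. The application then chooses the name according to the role played by the current topic (the point of focus). This sketch uses hypothetical ids and names and handles binary associations only. The voice-based and role-based strategies differ only in which strings go into the table.

```python
# Hypothetical names, one per (association type, role played by the focus topic).
AT_NAMES = {
    ("born-in", "person"): "was born in",
    ("born-in", "place"): "was the birthplace of",
    ("pupil-of", "pupil"): "was a pupil of",
    ("pupil-of", "teacher"): "was the teacher of",
}

NAMES = {"puccini": "Puccini", "lucca": "Lucca", "ponchielli": "Ponchielli"}


def describe(focus, assoc_type, roles):
    """Verbalize a binary association from the point of view of the focus (current) topic."""
    focus_role = next(r for r, p in roles.items() if p == focus)
    other = next(p for r, p in roles.items() if r != focus_role)
    return f"{NAMES[focus]} {AT_NAMES[(assoc_type, focus_role)]} {NAMES[other]}"


birth = {"person": "puccini", "place": "lucca"}
print(describe("puccini", "born-in", birth))  # Puccini was born in Lucca
print(describe("lucca", "born-in", birth))    # Lucca was the birthplace of Puccini
```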

34
Categories and prototypes
  • Topic types define categories of things
    ◦ But are they Aristotelian or prototypical categories?
  • Aristotelian
    ◦ Category membership is binary
    ◦ All instances are equally representative. No standard notion of similarity.
  • Prototypical
    ◦ Not defined by necessary and sufficient conditions (cf. OWL)
    ◦ The decision is up to the conceptualizer (a.k.a. topic map author)
  • A topic can have more than one type
    ◦ Boïto is a composer and a librettist
  • The same topic can be a topic type and a role type
    ◦ e.g. Puccini is a composer; Puccini plays the role of composer in ...
  • Should we establish conventions for goodness of example?
    ◦ Could be useful in automated classification

35
Schemas and constraints
  • Other types can also be said to define categories
    ◦ association types (occurrence types, name types, role types)
    ◦ But these are more schematic (in the CG sense)
  • Schemas are "abstract templates obtained by reinforcing the commonality inherent in a set of instances" (Langacker 2008, p. 23, in the context of grammatical rules)
  • Rules can be defined as templates and constraints

"Puccini composed Tosca" = "The composer Puccini plays the role of composer in the composition relationship in which the role of work is played by the opera Tosca."
36
Hyponymy
  • Topic Maps has two predefined association types
    ◦ type-instance (relationship between a topic and its type)
    ◦ supertype-subtype (relationship between the denotations of a hyponym and its hyperonym)
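Together, the two predefined association types let an application work out every type a topic has. This minimal sketch assumes the usual reading that an instance of a subtype is also an instance of its supertypes. The person and work supertypes are hypothetical additions to the map.

```python
# (type, instance) pairs and (supertype, subtype) pairs; ids are hypothetical.
TYPE_INSTANCE = {
    ("composer", "puccini"),
    ("composer", "boito"), ("librettist", "boito"),
    ("opera", "tosca"),
}
SUPERTYPE_SUBTYPE = {
    ("person", "composer"), ("person", "librettist"), ("work", "opera"),
}


def all_types(topic):
    """Direct types of a topic plus every supertype reachable from them."""
    result = {t for t, i in TYPE_INSTANCE if i == topic}
    frontier = list(result)
    while frontier:
        sub = frontier.pop()
        for sup, s in SUPERTYPE_SUBTYPE:
            if s == sub and sup not in result:
                result.add(sup)
                frontier.append(sup)
    return result


print(sorted(all_types("boito")))
```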

37
Synonymy and homonymy
  • Synonyms
    ◦ One subject, multiple names
    ◦ In thesauri: USE and USED FOR
  • TMs are subject-centric
    ◦ A topic can have multiple names
    ◦ Names can be typed; typical name types: nickname, synonym, alternate name
    ◦ Context can be expressed using scope, typically for names in different natural languages: composer, komponist, ???, ...
    ◦ Names can also have variants, often used to capture orthographic variation: Tchaikovsky, Чайковский, Tsjajkovskij, Tschaikowski
    ◦ Also useful for sort names, pronunciation, etc.
  • Homonyms
    ◦ One name, multiple subjects
    ◦ In thesauri: problematic
  • TMs are based on identifiers
    ◦ The same name can be used by more than one topic
    ◦ Disambiguation in the UI is left to the application
  • Two main disambiguation strategies
    ◦ Default: qualify by type, e.g. Tosca (opera) vs. Tosca (character)
    ◦ Fallback: qualify by some other relationship, e.g. Paris (France) vs. Paris (Texas); La Bohème (Puccini) vs. La Bohème (Leoncavallo)
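Both mechanisms can be illustrated with a toy model: scoped names handle synonymy and multilingualism, and qualified labels handle homonyms. The sketch makes two simplifying assumptions. First, scope is reduced to a single language code, whereas real scope is a set of topics. Second, the fallback qualifier comes from a hypothetical table rather than being derived from associations.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    type: str
    names: dict = field(default_factory=dict)  # scope (language code, "" = unscoped) -> name


def name_in(topic, lang):
    """Pick the name scoped to lang, falling back to the unscoped name."""
    return topic.names.get(lang, topic.names.get(""))


def label(topic, topics, lang, fallback):
    """Disambiguate homonyms: qualify by type by default, else by a related topic."""
    name = name_in(topic, lang)
    clashes = [t for t in topics if t.id != topic.id and name_in(t, lang) == name]
    if not clashes:
        return name
    if all(t.type != topic.type for t in clashes):
        return f"{name} ({topic.type})"
    return f"{name} ({fallback[topic.id]})"


topics = [
    Topic("composer", "topic-type", {"": "composer", "no": "komponist"}),
    Topic("tosca-opera", "opera", {"": "Tosca"}),
    Topic("tosca-character", "character", {"": "Tosca"}),
    Topic("boheme-puccini", "opera", {"": "La Bohème"}),
    Topic("boheme-leoncavallo", "opera", {"": "La Bohème"}),
]
# Hypothetical fallback qualifiers (in practice, e.g. each opera's composer).
fallback = {"boheme-puccini": "Puccini", "boheme-leoncavallo": "Leoncavallo"}

for t in topics:
    print(label(t, topics, "no", fallback))
```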

38
Nominalization
Derivation of nouns from other words, including verbs, adjectives etc., e.g. meet(V) → meeting(N)
  • (A topic map consists of assertions about subjects)
  • Assertions are made using statements
    ◦ names, e.g. a certain subject has the name "Tosca"
    ◦ associations, e.g. "Tosca is set in Rome"
    ◦ occurrences, e.g. http://en.wikipedia.org/wiki/Rome is a web page about Rome
  • Any statement can be reified
    ◦ Reification results in a topic that has the same referent as the reified statement
    ◦ e.g. "Tosca is set in Rome" (association) → "The setting of Tosca in Rome" (topic)
    ◦ The (new) reifying topic can have names and occurrences, and it can play roles in associations
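Here is a minimal sketch of reification using hypothetical classes. The reifying topic and the association point to each other, and the new topic can then carry occurrences and play roles like any other topic. The "set-in" and "discussed-in" types, their role types, the essay topic and the example.org locator are all placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Topic:
    names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)  # (occurrence type, locator) pairs
    reifies: object = None  # the statement this topic stands for, if any


@dataclass(eq=False)
class Association:
    type: str
    roles: dict  # role type -> Topic
    reifier: Optional[Topic] = None


def reify(statement, name):
    """Return the topic that reifies statement, creating it on first use."""
    if statement.reifier is None:
        statement.reifier = Topic(names=[name], reifies=statement)
    return statement.reifier


tosca, rome = Topic(names=["Tosca"]), Topic(names=["Rome"])
setting = Association("set-in", {"work": tosca, "place": rome})

topic = reify(setting, "The setting of Tosca in Rome")
# The reifying topic is a full topic: it can have occurrences...
topic.occurrences.append(("commentary", "https://example.org/tosca-setting"))
# ...and play roles in further associations (hypothetical essay topic).
essay = Topic(names=["An essay on Tosca"])
discussed = Association("discussed-in", {"subject": topic, "document": essay})
```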

39
Subjects and topics
  • Topics represent subjects
    ◦ the topic is the representation
    ◦ the subject is the referent
  • Or, in Saussure's terms
    ◦ signifiant and signifié
  • A subject can be anything
    ◦ "A subject is any thing whatsoever, whether or not it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever."
  • Is the topic/subject pairing a symbolic assembly?

40
Co-reference and collocation
  • Grounding singles out referents and enables co-reference
    ◦ between speaker and listener
    ◦ across a sequence of utterances
  • In Topic Maps the central objective is collocation
    ◦ By definition, each topic represents a single subject (one subject per topic)
    ◦ A topic is intended to be a point of collocation for everything that is known about a particular subject
    ◦ Therefore the goal is to have only one topic per subject
  • To achieve that we need to know which subject a topic represents
    ◦ (This is sometimes referred to as the "intentionality" of the relation between a symbol and its referent.)
    ◦ We call it subject identity.

41
Subject identity
(Diagram: SUBJECTS and the TOPICS that represent them)
  • The identity of a subject is expressed using
    globally unique identifiers called subject
    identifiers
  • If two topics share a subject identifier, they
    are deemed to represent the same subject and must
    be merged

42
Subject identifiers
  • The subject is identified by a URL
    ◦ The URL is called a subject identifier

Machines use the identifier: the link is not resolved. Instead, simple lexical comparison is used. If the strings are identical, the subject is deemed to be the same and the topics are merged.
  • The URL is the address of a web page
    ◦ The web page describes the subject such that a human can know what subject is referred to
    ◦ This web page is called a subject descriptor

Humans use the descriptor: by inspecting the web page, the person responsible for assigning the identifier can be sure that it does not refer to, say, Giacomo's grandfather Domenico (who was also a composer of operas).
Is the subject identifier / subject descriptor pairing a symbolic assembly?
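The machine-side rule described above can be sketched as follows: identical identifier strings mean the same subject, and the URL is never dereferenced. The identifiers under psi.example.org and other.example.net are hypothetical. The sketch merges on subject identifiers only and ignores any other criteria the standard may define, such as subject locators. Merging is transitive: a topic that shares an identifier with each of two others pulls all three into one.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    identifiers: set = field(default_factory=set)  # subject identifiers (URL strings)
    names: set = field(default_factory=set)


def merge_topics(topics):
    """Merge topics that share a subject identifier, directly or via other topics."""
    merged = []  # invariant: no two topics in this list share an identifier
    for t in topics:
        ids, names, keep = set(t.identifiers), set(t.names), []
        for m in merged:
            # Simple lexical comparison of identifier strings; nothing is resolved.
            if m.identifiers & t.identifiers:
                ids |= m.identifiers
                names |= m.names
            else:
                keep.append(m)
        merged = keep + [Topic(ids, names)]
    return merged


PSI = "http://psi.example.org/"  # hypothetical identifier namespace
GIACOMO, DOMENICO = PSI + "giacomo-puccini", PSI + "domenico-puccini"
OTHER = "http://other.example.net/puccini"  # hypothetical second identifier

topics = [
    Topic({GIACOMO}, {"Puccini"}),                 # from one topic map
    Topic({DOMENICO}, {"Puccini"}),                # same name, different subject
    Topic({GIACOMO, OTHER}, {"Giacomo Puccini"}),  # from a second map
    Topic({OTHER}, {"Puccini, Giacomo"}),          # from a third map
]

for t in merge_topics(topics):
    print(sorted(t.identifiers), sorted(t.names))
```

The two topics named "Puccini" stay separate because their identifiers differ. The three topics for Giacomo collapse into one.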
43
Information structure
  • Intuitive navigation is a key feature of Topic Maps
    ◦ But what is its cognitive basis?
    ◦ I claim that it corresponds to the way we think (i.e., associatively)
    ◦ Can linguistics back up this claim?
  • topic vs. comment in linguistics (Bussmann, 487)
    ◦ "Analysis of sentences according to communicative criteria into the topic (what is being talked about) and the comment (what is being said about the topic)"
    ◦ "Analysis of utterances according to the communicative criteria of given/known information vs. new information"
    ◦ Cf. theme vs. rheme in Halliday's functional grammar
  • Consider our earlier tour of Italian opera...

44
Navigation as narrative
  • Giacomo Puccini was a composer. He was born in Lucca in 1858.
  • Lucca is a city, located in Italy. It was the birthplace of Puccini and Catalani.
  • Catalani was a composer who composed 5 operas. He died in Milan.
  • Milan is the home of La Scala, which was the venue for many premières, including that of Madame Butterfly.
  • Madame Butterfly is set in Nagasaki, which is located in Japan.
  • Japan is (also) the setting for Iris, which is an opera which was composed by Mascagni, who was a pupil of Ponchielli, who was (also) the teacher of Puccini...

(Figure: the same narrative as continuous text, annotated for information structure: each THEME marked as a new or a continuing theme, and each RHEME as a predicate with a potential new theme)
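A narrative like this can be generated mechanically from the map. The procedure verbalizes an association from the point of view of the current topic (the theme), then makes the other player, introduced in the rheme, the next theme. This minimal sketch uses a hypothetical three-association map, hand-written phrases and binary associations only.

```python
# Hypothetical map: (association type, {role type: player}).
NAMES = {"puccini": "Puccini", "lucca": "Lucca", "catalani": "Catalani", "milan": "Milan"}
ASSOCIATIONS = [
    ("born-in", {"person": "puccini", "place": "lucca"}),
    ("born-in", {"person": "catalani", "place": "lucca"}),
    ("died-in", {"person": "catalani", "place": "milan"}),
]
# Hypothetical phrasing, chosen by the role the theme plays.
PHRASES = {
    ("born-in", "person"): "was born in",
    ("born-in", "place"): "was the birthplace of",
    ("died-in", "person"): "died in",
    ("died-in", "place"): "was the place of death of",
}


def narrate(start):
    """Walk the map from start; each sentence's rheme supplies the next theme."""
    used, theme, sentences = set(), start, []
    while True:
        step = next(((i, a) for i, a in enumerate(ASSOCIATIONS)
                     if i not in used and theme in a[1].values()), None)
        if step is None:
            return sentences
        i, (assoc_type, roles) = step
        used.add(i)
        role = next(r for r, p in roles.items() if p == theme)
        other = next(p for r, p in roles.items() if r != role)
        sentences.append(f"{NAMES[theme]} {PHRASES[(assoc_type, role)]} {NAMES[other]}.")
        theme = other


for sentence in narrate("puccini"):
    print(sentence)
```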
45
Discussion
  • Questions, comments, corrections?
  • What have I missed? Where else should I look?
  • What might linguists contribute?
    ◦ A better understanding of the nature of roles?
    ◦ Approaches to representing temporal knowledge?
    ◦ ...
  • Can Topic Maps inform linguistics?
    ◦ After all, it is a technology that captures (some degree of) (some form of) knowledge
    ◦ It seems to have a reasonable cognitive basis
    ◦ It emerged through usage (librarians, indexers, etc.)
    ◦ And last but not least, it works!

46
References
  • Bussmann, H. Routledge Dictionary of Language and Linguistics (London 1996)
  • Frawley, W. Linguistic Semantics (Hillsdale 1992)
  • Langacker, R. Cognitive Grammar (Oxford 2008)
  • Pepper, S. Italian Opera Topic Map
    ◦ http://www.ontopedia.net/ItalianOpera
  • Pepper, S. "Topic Maps", in Bates, M.J. and Maack, M.N. (eds) Encyclopedia of Library and Information Sciences (CRC Press, forthcoming 2009)
    ◦ http://www.ontopedia.net/pepper/papers/ELIS-TopicMaps.pdf

47
Questions
48
Notes