Title: Computing and Linguistics: A Cognitive Approach, or, Computing As We May Think
1. Computing and Linguistics: A Cognitive Approach
or, Computing As We May Think
- Steve Pepper
- pepper.steve_at_gmail.com
- University of Oslo, 2009-04-21
2. Today's research questions
- How can linguistics, and in particular cognitive linguistics, inform our work with Topic Maps?
- Can Topic Maps contribute in any way to the cognitive linguistics project?
- Plan of action
  - I tell you about Topic Maps (the conceptual model)
  - I draw some parallels with natural language
  - You correct me, elaborate and suggest new directions
3. Relevance to you as linguists
- As users of the technology
  - organizing data collected in your research
- As consultants to users of the technology
  - e.g. universities, government agencies, private enterprise
- As contributors to the standard
  - clarify some of the cognitive issues, establish best practices, help extend the standard
- As lobbyists to the University of Oslo
  - if you think the new UiO web site should be based on Topic Maps, please make your views known to the project group: http://www.admin.uio.no/prosjekter/nyuioweb/
4. Relevance in general
- We need to organize information in a new way
  - "The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships." (Vannevar Bush, As We May Think, 1945)
- We need new ways of managing knowledge
  - "In today's global knowledge economy, knowledge is the key asset in many organizations..."
- Topic Maps makes major contributions in both areas
  - See the use cases presented at recent Topic Maps conferences: http://www.topicmaps.com
5. What is Topic Maps?
- An ISO standard for computer-based information and knowledge management
- Provides the ability to control infoglut and share knowledge by connecting any kind of information from any kind of source based on its meaning
- A semantic technology
  - Cf. Semantic Web (RDF, OWL)
- A form of knowledge representation (primitive perhaps, but useful)
- Widely used for web-based delivery of information
  - Plus Information Integration, eLearning, Business Process Modeling, Product Configuration, Business Rules Management, Asset Management, Knowledge Management
6. The problem with computing...
- ...is that it's inside-out!
- People used to think the sun revolved around the earth
  - Copernicus' heliocentric theory turned this idea inside out and revolutionized our understanding of the universe
- Today we face a similar situation in computing
  - Our computing universe has computers, applications and documents at the centre
  - The concepts that our information is about are somewhere in outer space where they can't be found
7. A subject-centric revolution
- This is wrong, because it does not reflect how humans think
  - We think in terms of interrelated concepts (or subjects)
  - Subjects are what interest us, not documents or applications
  - And so subjects must be given centre stage
- We need a subject-centric revolution
  - This has ramifications for every aspect of human-computer interaction, including user interfaces, operating systems, file systems, etc.
  - Consider the typical user desktop...
8.
- Today our desktops are application-centric and document-centric
  - Icons represent applications and documents
9.
- Why can't they be subject-centric, with icons that represent the subjects we are interested in?
  - With links between related icons?
  - And with context menus that allow us to find everything related to a particular subject?
[Figure: mock-up of a subject-centric desktop, with icons labelled gambia, K185, opera, topic maps, LING 2110, OOXML, tm2008, rana, INF 2820, janacek, bantu semantics, keynote, bayreuth, håkon]
10. Computing As We May Think
- Bush's solution to information overload
  - Organize information "as we may think", i.e. associatively
- His vision spawned the hypertext movement
  - Doug Engelbart, Ted Nelson, Bill Atkinson, Tim Berners-Lee, ...
  - The World Wide Web is its greatest triumph to date
- But hypertext does not correspond to how we think
  - Our heads are not full of millions of interlinked documents
  - They are full of interlinked concepts (or subjects)
- Topic Maps provides a close approximation to this
  - It is a technology that is based on cognitive principles
11. Background to Topic Maps
- Emerged from the SGML community in the 1990s
  - Use case: how to merge (digital) back-of-book indexes
  - Some input from library science
  - No input from linguists
  - Precious little input from computer scientists before 2001
  - Most of the SGML community came from the humanities
- ISO 13250 first published in 2000 (recently revised)
  - A model for representing knowledge organization structures (indexes, glossaries, thesauri, encyclopedias)
  - Plus interchange syntax, query language, constraint language, ...
- Widely adopted in Norway (esp. public sector)
  - And gaining ground elsewhere
12. The TAO of Topic Maps
[Figure: an example back-of-book index]
Callas, Maria ... 42
Cavalleria Rusticana ... 71, 203-204
Mascagni, Pietro
  Cavalleria Rusticana ... 71, 203-204
Pavarotti, Luciano ... 45
Puccini, Giacomo ... 23, 26-31
  Tosca ... 65, 201-202
Rustic Chivalry, see Cavalleria Rusticana
singers ... 39-52
  baritone ... 46
  bass ... 46-47
  soprano ... 41-42, 337
  tenor ... 44-45
  see also Callas, Pavarotti
Tosca ... 65, 201-202
- The core concepts are derived from the back-of-book index
  - Extended and generalized for use with digital information
- Consider a two-layer model consisting of
  - a set of information resources (below)
  - a knowledge map (above)
- This is like the division of a book into content and index
[Figure: the knowledge layer (INDEX) above the information layer (CONTENT)]
13. (1) The information layer
- The lower layer contains the content
  - usually digital, but need not be
  - can be in any format or notation or location
  - can be text, graphics, video, audio, whatever
- This is like the content of the book to which the back-of-book index belongs
[Figure: the information layer (CONTENT)]
14. (2) The knowledge layer
- The upper layer consists of (typed) topics and associations
- Topics represent the subjects that the information is about
  - Like the list of topics that forms a back-of-book index
- Associations represent relationships between those subjects
  - Like "see also" relationships in a back-of-book index
[Figure: the knowledge layer (INDEX) for the domain Italian opera: Tosca and Madame Butterfly are each "composed by" Puccini, who was "born in" Lucca]
15. Occurrences link the layers
- Occurrences represent relationships between information resources and the subjects that they are about
- The links (or locators) are like page numbers in a back-of-book index
- Occurrences can also be typed (e.g. bio, map, synopsis)
16. Summary of core concepts
Let's look at some TAOs in the Omnigator
Plus topic types, association types and occurrence types, each of which is represented by a topic...
17. About the Omnigator
- A free topic map browser from Ontopia
  - Download from http://www.ontopia.net (part of the OKS Samplers)
  - Java-based, runs on any computer
- Completely generic
  - Not optimized for any particular ontology
  - Displays and navigates any conforming topic map
- A teaching aid
  - Not designed for end-users (no attempt to hide technical jargon)
  - Also used for prototyping and debugging
- Not to be used for most real-world applications!
  - These require custom interfaces based on a specific ontology
  - (see http://www.topicmaps.com for a good example)
18. Omnigator interface
[Figure: a typical topic page]
[Demo]
19. Typing topics revisited
- The basic building blocks of the TAO model are
  - Topics, e.g. Puccini, Lucca, Tosca
  - Associations, e.g. Puccini was born in Lucca
  - Occurrences, e.g. http://www.opera.net/puccini/bio.html is a biography of Puccini
- Each of these constructs can be typed
  - Topic types: composer, city, opera
  - Association types: born in, composed by
  - Occurrence types: biography, street map, synopsis
- All such types are also topics
  - The set of typing topics constitutes an ontology
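The TAO building blocks above can be sketched as a few Python data classes. This is a minimal illustration under stated assumptions, not the ISO 13250 data model: the class and field names (`Topic`, `Association`, `Occurrence`, `about`) are hypothetical, and role types are simplified to dictionary keys. Every type (composer, born-in, biography, and the role types person, place, work) is itself a `Topic`, and `about()` collocates everything the map says about one subject.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so each topic is a distinct, hashable object
class Topic:
    id: str
    names: list = field(default_factory=list)
    types: list = field(default_factory=list)  # typing topics, e.g. composer


@dataclass(eq=False)
class Association:
    type: Topic   # association type, itself a topic
    roles: dict   # role type (a Topic) -> role-playing topic


@dataclass(eq=False)
class Occurrence:
    type: Topic   # occurrence type, itself a topic
    topic: Topic  # the subject the resource is about
    locator: str  # plays the part of a page number in a book index


# Typing topics: together they make up the ontology
composer, city, opera = Topic("composer"), Topic("city"), Topic("opera")
born_in, composed_by = Topic("born-in", ["born in"]), Topic("composed-by", ["composed by"])
person, place, work = Topic("person"), Topic("place"), Topic("work")
biography = Topic("biography")

# Instance topics
puccini = Topic("puccini", ["Puccini"], [composer])
lucca = Topic("lucca", ["Lucca"], [city])
tosca = Topic("tosca", ["Tosca"], [opera])

associations = [
    Association(born_in, {person: puccini, place: lucca}),
    Association(composed_by, {composer: puccini, work: tosca}),
]
occurrences = [Occurrence(biography, puccini, "http://www.opera.net/puccini/bio.html")]


def about(topic):
    """Collocate what the map says about one subject: names, occurrences, associations."""
    occs = [(o.type.id, o.locator) for o in occurrences if o.topic is topic]
    assocs = [a.type.id for a in associations if any(p is topic for p in a.roles.values())]
    return list(topic.names), occs, assocs
```

Calling `about(puccini)` returns his name, the biography link and the types of both associations he takes part in.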
20. Capabilities of the TAO model (1)
- Represent subjects explicitly
  - Topics represent the things users are interested in
- Capture relationships between subjects
  - Associations provide user-friendly navigation paths to information ("navigation as we may think")
  - Associations also promote serendipitous knowledge discovery through browsing
- Make information findable
  - Topics provide a one-stop shop for everything that is known about a subject (collocation of information and knowledge)
  - Occurrences allow information about a common subject to be aggregated across multiple systems, irrespective of location
21. Capabilities of the TAO model (2)
- Represent taxonomies and thesauri
  - Associations can (also) represent hierarchical relationships
  - With Topic Maps you can have multiple, interlinked hierarchies and faceted classification
- Transcend simple hierarchies
  - Rich associative structures capture the complexity of knowledge and reflect the way people think
- Manage knowledge
  - The topic map is the embodiment of organizational memory
  - Provides a structured way to capture people's knowledge of things, events, relationships, etc.
22. Beyond the TAO
For more details, see Pepper 2009
- Formal data model
  - Topic maps can be queried, e.g.
  - "Give me all composers that composed operas that were based on plays that were written by Shakespeare"
- Interchange syntax
  - Topic maps can be interchanged
  - Increased reuse and added value
- Robust identity model
  - Topic maps can be merged
  - Potential to federate knowledge
- Scope
  - Topic maps can capture context
- Reification
  - Topic maps can express different levels of detail
  - Similar to scaling in cartography
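The Shakespeare query above can be expressed as a chain of association lookups. Topic Maps has its own query language for this; the sketch below instead hand-codes the traversal in plain Python. The association types (`written-by`, `based-on`, `composed-by`), role names, string topic identifiers and sample data are hypothetical, and topic types (play, opera) are not checked.

```python
# Hypothetical associations: (association type, {role type: player})
associations = [
    ("composed-by", {"composer": "Verdi", "work": "Otello"}),
    ("composed-by", {"composer": "Verdi", "work": "Macbeth"}),
    ("composed-by", {"composer": "Puccini", "work": "Tosca"}),
    ("based-on", {"source": "Othello", "result": "Otello"}),
    ("based-on", {"source": "Macbeth (play)", "result": "Macbeth"}),
    ("based-on", {"source": "La Tosca", "result": "Tosca"}),
    ("written-by", {"author": "Shakespeare", "work": "Othello"}),
    ("written-by", {"author": "Shakespeare", "work": "Macbeth (play)"}),
    ("written-by", {"author": "Sardou", "work": "La Tosca"}),
]


def players(assoc_type, known_role, known_player, wanted_role):
    """Players of wanted_role in associations of assoc_type where known_player plays known_role."""
    return {roles[wanted_role] for t, roles in associations
            if t == assoc_type and roles.get(known_role) == known_player}


def composers_of_operas_based_on_plays_by(author):
    """Follow written-by, then based-on, then composed-by."""
    plays = players("written-by", "author", author, "work")
    operas = set().union(*(players("based-on", "source", p, "result") for p in plays))
    return set().union(*(players("composed-by", "work", o, "composer") for o in operas))
```

Each hop moves from one role to the other role of the same association, which is how navigation in a topic map works as well.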
23. Break: any questions so far?
- After the break
  - Topic Maps and natural language: towards a linguistic perspective
24. Parallels with natural language
Basic grammatical classes ↔ TAO model
- Nouns and verbs ↔ Topics and associations
- Nominals and nouns ↔ Topics and their types
- Clauses and verbs ↔ Associations and their types
- Valency ↔ Arity
- Semantic roles ↔ Association roles
- Categories and schemas ↔ Typing topics
- Hyponymy ↔ Type hierarchies
- Synonymy and homonymy ↔ Naming
- Nominalization ↔ Reification
- Grounding / co-reference ↔ Subject identity / collocation
- Information structure ↔ Navigation
25. Basic principles, basic classes
- "In elementary school, I was taught that a noun is the name of a person, place, or thing. In college, I was taught the basic linguistic doctrine that a noun can only be defined in terms of grammatical behavior, conceptual definitions of grammatical classes being impossible. Here, several decades later, I demonstrate the inexorable progress of grammatical theory by claiming that a noun is the name of a thing." (Langacker 2008)
- The basic grammatical classes are nouns and verbs
  - They prototypically profile things and relationships
  - They correspond to topics and associations
26. Grounding
(Langacker 2008, p. 259ff., esp. p. 264)
- "Grounding is characteristic of the structure referred to in CG as nominals and finite clauses. More specifically, a nominal or a finite clause profiles a grounded instance of a thing or process type."
- A noun designates a type of thing, and a verb a type of process.
- A nominal or a finite clause profiles a grounded instance of a thing or process type.
- Nominal grounding (determiners and quantifiers)
  - the, this, that, some, a, each, every, no, any
- Clausal grounding (mood and tense)
  - -s, -ed, may, will, should
27. Nouns and nominals
- Topic types represent classes of topics
  - Conceptual groupings of things, e.g. composer, opera, city, ...
  - They correspond to Langacker's nouns (types of thing)
  - However, topics can have multiple names
    - (This is how we handle synonymy and multilingualism)
  - In one sense, it is topic names that correspond to nouns
- Topic instances represent individual subjects
  - They correspond to Langacker's nominals (instances of types)
  - Their names are typically proper nouns, e.g. Puccini, Tosca, Lucca
28. Verbs and clauses
- Association types represent classes of relationships
  - They correspond to Langacker's verbs (types of process)
  - (Often named accordingly, e.g. born in, composed by, killed by, ...)
- Individual associations represent specific relationships
  - They correspond to Langacker's clauses (instances of processes)
  - e.g. Puccini was born in Lucca; Tosca was composed by Puccini
- Langacker distinguishes processes (temporal) and non-processual relationships (non-temporal). The latter are (prototypically) profiled by adjectives, adverbs, prepositions, and participles. This distinction is not made explicitly in Topic Maps.
- Note: There are two predefined association types
  - type-instance (the relationship between a topic and its type)
  - supertype-subtype (a relationship between types, see Hyponymy)
29. Valency
- Associations can involve one, two or more topics
  - Binary associations, e.g. Puccini composed Tosca, are the most common and correspond to transitive verbs
  - Ternary associations, e.g. Tosca killed Scarpia with a knife, can correspond to ditransitive verbs
  - Unary associations, e.g. Turandot was unfinished, correspond (sort of) to intransitive verbs (or binary properties)
- The arity of an association
  - Corresponds to the valency of a verb
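To make the arity/valency parallel concrete, here is a small sketch in which an association is a type plus a mapping from role types to players, so its arity is simply the number of roles. The role names (`killer`, `victim`, `instrument`) and the valency labels are illustrative assumptions, not part of the standard.

```python
# Hypothetical examples: sentence -> (association type, {role type: player})
ASSOCIATIONS = {
    "Turandot was unfinished": ("unfinished", {"work": "Turandot"}),
    "Puccini composed Tosca": ("composed", {"composer": "Puccini", "work": "Tosca"}),
    "Tosca killed Scarpia with a knife": (
        "killed", {"killer": "Tosca", "victim": "Scarpia", "instrument": "knife"}),
}

# Rough verb-valency analogues, as suggested on the slide
VALENCY_ANALOGUE = {1: "intransitive (sort of)", 2: "transitive", 3: "ditransitive"}


def arity(association):
    """The arity of an association is the number of roles it has."""
    _assoc_type, roles = association
    return len(roles)


for sentence, assoc in ASSOCIATIONS.items():
    n = arity(assoc)
    print(f"{sentence}: arity {n}, cf. {VALENCY_ANALOGUE.get(n, 'n/a')} verb")
```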
30. Semantic roles
- An association does not have directionality
  - Instead of direction, Topic Maps uses roles
- Roles are classified by type
  - Role types specify the nature of each topic's involvement in the relationship. They correspond to semantic roles.
  - (Role types are also topics)
  - Role types are different from topic types...
[Figure: the relationship between Puccini and Tosca in RDF compared with Topic Maps, where Puccini plays the role composer and Tosca plays the role work]
31. Roles and types
[Figure: an association (A) of type composed, with roles (R) of type composer and work played by the topics (T) Puccini and Tosca; the types composer, composed and work are themselves topics (T)]
- The role type can be
  - the same as the role-playing topic's topic type (composer = composer)
  - a supertype of the topic type (work > opera)
  - a subtype of the topic type (teacher < person)
  - a subtype of the topic type's supertype (source < work)
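The four cases above can be told apart mechanically once supertype-subtype links are available. This sketch assumes a hypothetical, acyclic `SUPERTYPE` table in which each type has at most one supertype, and compares a role type with the type of the topic playing the role.

```python
# Hypothetical supertype-subtype links: subtype -> supertype (assumed acyclic)
SUPERTYPE = {"opera": "work", "play": "work", "source": "work",
             "teacher": "person", "composer": "person"}


def ancestors(t):
    """All supertypes of t, following supertype-subtype links transitively."""
    result = []
    while t in SUPERTYPE:
        t = SUPERTYPE[t]
        result.append(t)
    return result


def role_vs_topic_type(role_type, topic_type):
    """Classify how a role type relates to the role player's topic type."""
    if role_type == topic_type:
        return "same"
    if role_type in ancestors(topic_type):
        return "supertype"
    if topic_type in ancestors(role_type):
        return "subtype"
    if set(ancestors(role_type)) & set(ancestors(topic_type)):
        return "subtype of a supertype"
    return "unrelated"
```

For example, a play acting as the `source` of an opera gives `"subtype of a supertype"`, since both `source` and `play` are subtypes of `work`.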
32. Association roles vs. semantic roles
- Italian Opera Topic Map
  - composed: composer, work
  - born in: person, place
  - appears in: character, work
  - based on: source, result
  - revision of: source, result
  - part of: part, whole
  - exponent of: person, style
  - located in: container, containee
  - pupil of: teacher, pupil
- Association roles tend to be much more specific
  - Variable practice: as yet no established conventions
- Might (cognitive) linguists have something to offer here?
  - (Frawley 1992)
  - (logical actors) agent, author, instrument
  - (logical recipients) patient, experiencer, benefactive
  - (spatial roles) theme, source, goal
  - (non-participant roles) locative, reason, purpose
33. Naming of associations
- Intuitive naming requires flexibility
  - i.e. multiple association type names that change depending on the direction of the association
  - Puccini was born in Lucca
  - Lucca was the birthplace of Puccini
- Alternative CG view
  - Naming should be based on whether the agent or the theme is in focus
  - The focus becomes the trajector
  - Point of focus = current topic
- Some strategies...
  - Voice-based
    - Active / passive forms of the verb
    - composed (active verb) / composed (passive verb) by
    - Works well in SVO languages; less satisfactory with SOV
  - Role-based
    - teacher (noun) of / pupil (noun) of
  - Nominalization
    - composition
    - Tends to be used by Japanese, Koreans (and Germans??)
  - Combinations
    - born (verb) in / birthplace (noun) of
    - part (noun) of / consists (verb) of
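Because an association has roles rather than a direction, an application can pick the name from the role played by the topic in focus (the trajector). The sketch below assumes a hypothetical name table keyed by association type and focus role, mixing the voice-based and nominalization strategies listed above; `describe()` is an illustrative helper.

```python
# Hypothetical associations: no direction, only role types and players
born_in = {"type": "born-in", "roles": {"person": "Puccini", "place": "Lucca"}}
composed = {"type": "composed", "roles": {"composer": "Puccini", "work": "Tosca"}}

# Hypothetical name table: (association type, role of the focus topic) -> phrase
NAMES = {
    ("born-in", "person"): "was born in",            # verb-based
    ("born-in", "place"): "was the birthplace of",   # nominalization
    ("composed", "composer"): "composed",            # active voice
    ("composed", "work"): "was composed by",         # passive voice
}


def describe(assoc, focus):
    """Render an association from the point of view of the focus (current) topic."""
    roles = assoc["roles"]
    focus_role = next(r for r, p in roles.items() if p == focus)
    others = [p for r, p in roles.items() if r != focus_role]
    return f"{focus} {NAMES[(assoc['type'], focus_role)]} {', '.join(others)}"
```

The same stored association yields "Puccini was born in Lucca" on Puccini's page and "Lucca was the birthplace of Puccini" on Lucca's page.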
34. Categories and prototypes
- Topic types define categories of things
  - But are they Aristotelian or prototypical categories?
- Aristotelian
  - Category membership is binary
  - All instances are equally representative; no standard notion of similarity
- Prototypical
  - Not defined by necessary and sufficient conditions (cf. OWL)
  - The decision is up to the conceptualizer (a.k.a. topic map author)
- A topic can have more than one type
  - Boïto is a composer and a librettist
- The same topic can be a topic type and a role type
  - e.g. Puccini is a composer; Puccini plays the role of composer in ...
- Should we establish conventions for goodness of example?
  - Could be useful in automated classification
35. Schemas and constraints
- Other types can also be said to define categories
  - association types (occurrence types, name types, role types)
- But these are more schematic (in the CG sense)
  - "Schemas are abstract templates obtained by reinforcing the commonality inherent in a set of instances" (Langacker 2008, p. 23, in the context of grammatical rules)
- Rules can be defined as templates and constraints
Example: Puccini composed Tosca, i.e. the composer Puccini plays the role of composer in the composition relationship in which the role of work is played by the opera Tosca.
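A schema in this sense can be approximated as a template that both validates associations and spells them out. The sketch below is a hypothetical constraint check, not the ISO constraint language: `SCHEMA` states which role types a `composition` association must have and which topic type must play each role, and `expand()` regenerates the long form of "Puccini composed Tosca". The tables and function names are assumptions for illustration.

```python
# Hypothetical schema: association type -> {role type: required type of the player}
SCHEMA = {"composition": {"composer": "composer", "work": "opera"}}

# Hypothetical type-instance data
TOPIC_TYPES = {"Puccini": {"composer"}, "Tosca": {"opera"}, "Lucca": {"city"}}


def violations(assoc_type, roles):
    """Check one association (role type -> player) against the schema."""
    expected = SCHEMA.get(assoc_type)
    if expected is None:
        return [f"unknown association type {assoc_type!r}"]
    problems = []
    if set(roles) != set(expected):
        problems.append(f"roles {sorted(roles)} do not match {sorted(expected)}")
    for role, player in roles.items():
        wanted = expected.get(role)
        if wanted is not None and wanted not in TOPIC_TYPES.get(player, set()):
            problems.append(f"{player} plays {role} but is not a {wanted}")
    return problems


def expand(assoc_type, roles):
    """Spell out a binary association in the long form shown on the slide."""
    (r1, p1), (r2, p2) = roles.items()
    t1, t2 = sorted(TOPIC_TYPES[p1])[0], sorted(TOPIC_TYPES[p2])[0]
    return (f"The {t1} {p1} plays the role of {r1} in the {assoc_type} relationship "
            f"in which the role of {r2} is played by the {t2} {p2}.")
```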
36. Hyponymy
- Topic Maps has two predefined association types
  - type-instance (the relationship between a topic and its type)
  - supertype-subtype (the relationship between the denotations of a hyponym and its hyperonym)
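The two predefined association types combine naturally: an instance of a hyponym also counts as an instance of its hyperonyms. This sketch treats supertype-subtype as transitive and uses a hypothetical singer/musician hierarchy; the tables and function names are assumptions for illustration.

```python
# Hypothetical supertype-subtype data: subtype -> set of direct supertypes
SUPERTYPES = {"soprano": {"singer"}, "tenor": {"singer"},
              "singer": {"musician"}, "composer": {"musician"}}

# Hypothetical type-instance data: instance -> set of direct types
TYPES = {"Callas": {"soprano"}, "Pavarotti": {"tenor"}, "Puccini": {"composer"}}


def all_supertypes(t):
    """Every direct or indirect supertype of t (the seen-set guards against cycles)."""
    seen, stack = set(), [t]
    while stack:
        for sup in SUPERTYPES.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_instance_of(instance, t):
    """True if t is a direct type of instance or a supertype of one."""
    return any(t == direct or t in all_supertypes(direct) for direct in TYPES.get(instance, ()))


def instances_of(t):
    return sorted(i for i in TYPES if is_instance_of(i, t))
```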
37. Synonymy and homonymy
- Synonyms
  - One subject, multiple names
  - In thesauri: USE and USED FOR
- TMs are subject-centric
  - A topic can have multiple names
  - Names can be typed
    - Typical name types: nickname, synonym, alternate name
  - Context can be expressed using scope
    - Typically names in different natural languages
    - composer, komponist, ...
  - Names can also have variants
    - Often used to capture orthographic variation
    - Tchaikovsky, Чайковский, Tsjajkovskij, Tschaikowski
    - Also useful for sort names, pronunciation, etc.
- Homonyms
  - One name, multiple subjects
  - In thesauri: problematic
- TMs are based on identifiers
  - The same name can be used by more than one topic
  - Disambiguation in the UI is left to the application
- Two main disambiguation strategies
  - Default: qualify by type, e.g.
    - Tosca (opera) vs. Tosca (character)
  - Fallback: qualify by some other relationship, e.g.
    - Paris (France) vs. Paris (Texas)
    - La Bohème (Puccini) vs. La Bohème (Leoncavallo)
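Scoped names and the two disambiguation strategies fit in a few lines. In this sketch the scope key `"no"` (Norwegian), the `context` field used by the fallback strategy, and the topic ids are all hypothetical; a real application would derive the qualifier from the topic map's own associations.

```python
from collections import Counter

# Hypothetical topics: names keyed by scope (None = unconstrained scope)
TOPICS = {
    "tosca-opera": {"type": "opera", "names": {None: "Tosca"}},
    "tosca-character": {"type": "character", "names": {None: "Tosca"}},
    "paris-fr": {"type": "city", "names": {None: "Paris"}, "context": "France"},
    "paris-tx": {"type": "city", "names": {None: "Paris"}, "context": "Texas"},
    "composer": {"type": "topic type", "names": {None: "composer", "no": "komponist"}},
}


def name(topic_id, scope=None):
    """Pick the name in the requested scope (e.g. a language), else the unscoped name."""
    names = TOPICS[topic_id]["names"]
    return names.get(scope, names[None])


def labels(topic_ids, scope=None):
    """Display labels: qualify homonyms by type, or by context if the types clash too."""
    base = {t: name(t, scope) for t in topic_ids}
    name_counts = Counter(base.values())
    type_counts = Counter((base[t], TOPICS[t]["type"]) for t in topic_ids)
    result = {}
    for t in topic_ids:
        n, typ = base[t], TOPICS[t]["type"]
        if name_counts[n] == 1:
            result[t] = n                                     # no homonym in view
        elif type_counts[(n, typ)] == 1:
            result[t] = f"{n} ({typ})"                        # default: qualify by type
        else:
            result[t] = f"{n} ({TOPICS[t].get('context', t)})"  # fallback
    return result
```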
38. Nominalization
Derivation of nouns from other words, including verbs, adjectives, etc., e.g. meet (verb) → meeting (noun)
- (A topic map consists of assertions about subjects)
- Assertions are made using statements
  - names, e.g. a certain subject has the name "Tosca"
  - associations, e.g. Tosca is set in Rome
  - occurrences, e.g. http://en.wikipedia.org/wiki/Rome is a web page about Rome
- Any statement can be reified
  - Reification results in a topic that has the same referent as the reified statement
  - e.g. Tosca is set in Rome (association) → The setting of Tosca in Rome (topic)
  - The (new) reifying topic can have names and occurrences, and it can play roles in associations
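Reification can be modelled as a link from a statement to a topic with the same referent. The minimal sketch below assumes hypothetical `Topic` and `Association` classes with a `reifier` field; once created, the reifying topic is an ordinary topic that can carry occurrences and play roles. The `discussed-in` association type, the "programme note" player and the example.org URL are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Topic:
    names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)


@dataclass(eq=False)
class Association:
    type: str
    roles: dict                       # role type -> player
    reifier: Optional[Topic] = None   # topic with the same referent as this statement


def reify(statement, name):
    """Return the topic reifying `statement`, creating and naming it if needed."""
    if statement.reifier is None:
        statement.reifier = Topic(names=[name])
    return statement.reifier


set_in = Association("set-in", {"work": "Tosca", "place": "Rome"})
setting = reify(set_in, "The setting of Tosca in Rome")

# The reifying topic is an ordinary topic: it can have occurrences...
setting.occurrences.append("http://example.org/tosca-in-rome")  # hypothetical URL
# ...and it can play a role in another (hypothetical) association
about_setting = Association("discussed-in", {"subject": setting, "source": "a programme note"})
```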
39. Subjects and topics
- Topics represent subjects
  - the topic is the representation
  - the subject is the referent
- Or, in Saussure's terms
  - signifiant and signifié
- A subject can be anything
  - "A subject is any thing whatsoever, whether or not it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever."
- Is the topic/subject pairing a symbolic assembly?
40. Co-reference and collocation
- Grounding singles out referents and enables co-reference
  - between speaker and listener
  - across a sequence of utterances
- In Topic Maps the central objective is collocation
  - By definition, each topic represents a single subject (one subject per topic)
  - A topic is intended to be a point of collocation for everything that is known about a particular subject
  - Therefore the goal is to have only one topic per subject
- To achieve that we need to know which subject a topic represents
  - (This is sometimes referred to as the intentionality of the relation between a symbol and its referent.)
  - We call it subject identity.
41. Subject identity
[Figure: SUBJECTS and the TOPICS that represent them]
- The identity of a subject is expressed using globally unique identifiers called subject identifiers
- If two topics share a subject identifier, they are deemed to represent the same subject and must be merged
42. Subject identifiers
- The subject is identified by a URL
  - The URL is called a subject identifier
Machines use the identifier: the link is not resolved. Instead, simple lexical comparison is used. If the strings are identical, the subject is deemed to be the same and the topics are merged.
- The URL is the address of a web page
  - The web page describes the subject such that a human can know what subject is referred to
  - This web page is called a subject descriptor
Humans use the descriptor: by inspecting the web page, the person responsible for assigning the identifier can be sure that it does not refer to, say, Giacomo's grandfather Domenico (who was also a composer of operas).
Is the subject identifier/subject descriptor pairing a symbolic assembly?
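The merging rule can be sketched as follows: topics are merged when any of their subject identifier strings are identical, and URLs are never resolved. The identifiers are hypothetical `psi.example.org` addresses and `merge_by_identifier()` is an illustrative helper. The two topics named "Puccini" stay separate (a homonym, not a synonym), and an identifier that differs only by a trailing slash counts as a different subject, because the comparison is purely lexical.

```python
def merge_by_identifier(topics):
    """Merge topics that share any subject identifier.

    Each topic is a dict with "identifiers" (set of URL strings) and "names" (set).
    Identifiers are compared as plain strings; no URL is ever dereferenced.
    """
    merged = []  # invariant: no two topics in here share an identifier
    for topic in topics:
        new = {"identifiers": set(topic["identifiers"]), "names": set(topic["names"])}
        kept = []
        for existing in merged:
            if existing["identifiers"] & new["identifiers"]:
                # same subject: absorb the existing topic into the new one
                new["identifiers"] |= existing["identifiers"]
                new["names"] |= existing["names"]
            else:
                kept.append(existing)
        merged = kept + [new]
    return merged


# Hypothetical subject identifiers, as if taken from two different topic maps
map_a = [{"identifiers": {"http://psi.example.org/giacomo-puccini"}, "names": {"Puccini"}}]
map_b = [
    {"identifiers": {"http://psi.example.org/giacomo-puccini"}, "names": {"Giacomo Puccini"}},
    {"identifiers": {"http://psi.example.org/domenico-puccini"}, "names": {"Puccini"}},
    # differs only by a trailing slash, so lexical comparison sees another subject
    {"identifiers": {"http://psi.example.org/giacomo-puccini/"}, "names": {"G. Puccini"}},
]
result = merge_by_identifier(map_a + map_b)
```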
43. Information structure
- Intuitive navigation is a key feature of Topic Maps
  - But what is its cognitive basis?
  - I claim that it corresponds to the way we think (i.e., associatively)
  - Can linguistics back up this claim?
- Topic vs. comment in linguistics (Bussmann 1996, p. 487)
  - Analysis of sentences according to communicative criteria into the topic (what is being talked about) and the comment (what is being said about the topic)
  - Analysis of utterances according to the communicative criteria of given/known information vs. new information
  - Cf. theme vs. rheme in Halliday's functional grammar
- Consider our earlier tour of Italian opera...
44. Navigation as narrative
- Giacomo Puccini was a composer. He was born in Lucca in 1858.
- Lucca is a city, located in Italy. It was the birthplace of Puccini and Catalani.
- Catalani was a composer who composed 5 operas. He died in Milan.
- Milan is the home of La Scala, which was the venue for many premiere performances, including that of Madame Butterfly.
- Madame Butterfly is set in Nagasaki, which is located in Japan.
- Japan is (also) the setting for Iris, which is an opera which was composed by Mascagni, who was a pupil of Ponchielli, who was (also) the teacher of Puccini...
[Figure: the same narrative as running text, with each THEME (new theme or continuing theme) and each RHEME (predicate with potential new theme) highlighted]
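The theme/rheme pattern can be mimicked by walking a navigation path: each sentence's rheme introduces the topic that becomes the next theme. The link table is a hypothetical fragment of the Italian opera example, and `narrate()` is an illustrative helper, not a feature of any Topic Maps tool.

```python
# Hypothetical association index: topic -> list of (phrase from this topic's view, other topic)
LINKS = {
    "Puccini": [("was born in", "Lucca")],
    "Lucca": [("is located in", "Italy"), ("was the birthplace of", "Catalani")],
    "Catalani": [("died in", "Milan")],
    "Milan": [("is the home of", "La Scala")],
}


def narrate(path):
    """Render a navigation path as sentences: each rheme introduces the next theme."""
    sentences = []
    for theme, next_theme in zip(path, path[1:]):
        phrase = next(p for p, other in LINKS[theme] if other == next_theme)
        sentences.append(f"{theme} {phrase} {next_theme}.")
    return " ".join(sentences)


print(narrate(["Puccini", "Lucca", "Catalani", "Milan", "La Scala"]))
```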
45. Discussion
- Questions, comments, corrections?
- What have I missed? Where else should I look?
- What might linguists contribute?
  - A better understanding of the nature of roles?
  - Approaches to representing temporal knowledge?
  - ...
- Can Topic Maps inform linguistics?
  - After all, it is a technology that captures (some degree of) (some form of) knowledge
  - It seems to have a reasonable cognitive basis
  - It emerged through usage (librarians, indexers, etc.)
  - And last but not least, it works!
46. References
- Bussmann, H. Routledge Dictionary of Language and Linguistics (London 1996)
- Frawley, W. Linguistic Semantics (Hillsdale 1992)
- Langacker, R. Cognitive Grammar (Oxford 2008)
- Pepper, S. Italian Opera Topic Map
  - http://www.ontopedia.net/ItalianOpera
- Pepper, S. "Topic Maps", in Bates, M.J. and Maack, M.N. (eds) Encyclopedia of Library and Information Sciences (CRC Press, forthcoming 2009)
  - http://www.ontopedia.net/pepper/papers/ELIS-TopicMaps.pdf
47. Questions
48. Notes