1
CSA4080: Adaptive Hypertext Systems II
Topic 6: Information and Knowledge Representation
  • Dr. Christopher Staff
  • Department of Computer Science & AI
  • University of Malta

2
Aims and Objectives
  • Models of Information Retrieval
  • Vector Space Model
  • Probabilistic Model
  • Relevance Feedback
  • Query Reformulation

3
Aims and Objectives
  • Dealing with General Knowledge
  • Programs that reason
  • Conceptual Graphs
  • Intelligent Tutoring Systems

4
Background
  • We've talked about how user information can be
    represented
  • We need to be able to represent information about
    the domain so that we can reason about what the
    user's interests are, etc.
  • We covered the difference between data,
    information, and knowledge in CSA3080...

5
Background
  • In 1945, Vannevar Bush writes "As We May Think"
  • Gives rise to seeking intelligent solutions to
    information retrieval, etc.
  • In 1949, Warren Weaver writes that if Chinese is
    a codification of English, then machine translation
    should be possible
  • Leads to surface-based/statistical techniques

6
Background
  • Even today, nearly 60 years later, there is
    significant effort in both directions
  • For years, intelligent solutions were hampered by
    the lack of fast enough hardware and software
  • That doesn't seem to be an issue any longer, and the
    Semantic Web may be testimony to that
  • But there are sceptics

7
Background
  • Take IR as an example
  • At the dumb end we have reasonable generic
    systems; at the other end, systems are domain
    specific and more expensive, but do they give
    better results?

8
Background
  • At what point does it cease to be cost effective
    to attempt more intelligent solutions to the IR
    problem?

9
Background
  • Is "Information Retrieval" a misnomer?
  • Consider your favourite Web-based IR system...
    does it retrieve information?
  • Can you ask "Find me information about all
    flights between Malta and London"?
  • And what would you get back?
  • Can you ask "Who was the first man on the moon?"

10
Background
  • With many IR systems that we use, the
    intelligence is firmly rooted in the user
  • We must learn how to construct our queries so
    that we get the information we seek
  • We sift through relevant and non-relevant
    documents in the results list
  • What we can hope for is that patterns can be
    identified to make life easier for us - e.g.,
    recommender systems

11
Background
  • Surface-based techniques tend to look for and
    re-use patterns as heuristics, without attempting
    to encode meaning
  • The Semantic Web, and other intelligent
    approaches, try to encode meaning so that it can
    be reasoned with and about
  • Cynics/sceptics/opponents believe that there is
    more success to be had in giving users more
    support, than to encode meaning into documents to
    support automation

12
However...
  • We will cover both surface-based and some
    knowledge-based approaches to supporting the user
    in his or her task

13
Information Retrieval
  • We will discuss two IR models...
  • Vector Space Model
  • Probabilistic Model
  • ... and surface-based techniques that can improve
    their usability
  • Relevance Feedback
  • Query Reformulation
  • Question-Answering

14
Knowledge
  • Conceptual graphs support the encoding and
    matching of concepts
  • Conceptual graphs are more intelligent and can
    be used to overcome some problems like the
    Vocabulary Problem

15
Reasoning on the Web
  • REWERSE (FP6 NoE) is an attempt to represent
    meaning contained in documents and to reason with
    and about it so that a single high-level user
    request may be carried out even if it contains
    several sub-tasks
  • E.g., "Find me information about cheap flights
    between Malta and London"

16
Vector-Space Model
  • Recommended Reading
  • p18-wong (Generalised Vector Space Model).pdf -
    look at refs 1,2,3 for original work

17
Vector-Space Model
  • Documents are represented as m-dimensional
    vectors or bags of words
  • m is the size of the vocabulary
  • wk 1, indicates term is present in document
  • wk 0, indicates term is absent
  • dj lt1,0,0,1,...,0,0gt
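As a toy illustration of these bullets (hypothetical five-term vocabulary, not from the slides), a binary document vector can be built like this:

```python
# Minimal sketch: a binary bag-of-words vector over a fixed vocabulary.
vocabulary = ["cheap", "flight", "london", "malta", "moon"]  # m = 5

def to_binary_vector(text, vocabulary):
    """Return dj = <w1, ..., wm> with wk = 1 iff vocabulary term k occurs."""
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]

print(to_binary_vector("cheap flight from Malta", vocabulary))
# -> [1, 1, 0, 1, 0]
```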

18
Vector-Space Model
19
Vector-Space Model
  • The query is then plotted into m-dimensional
    space and the nearest neighbours are the most
    relevant
  • However, the results set is usually presented as
    a list ranked by similarity to the query

20
Vector-Space Model
  • Cosine Similarity Measure (from IR vector space
    model.pdf)
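A minimal sketch of the measure (plain Python, independent of the weighting scheme):

```python
import math

def cosine_similarity(q, d):
    """sim(q, dj) = (q . dj) / (|q| |dj|); 0.0 if either vector is all zeros."""
    dot = sum(wq * wd for wq, wd in zip(q, d))
    norm_q = math.sqrt(sum(w * w for w in q))
    norm_d = math.sqrt(sum(w * w for w in d))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

print(cosine_similarity([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]))  # ~0.816
```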

21
Vector-Space Model
  • Calculating term weights
  • Term weights may be binary, integers, or reals
  • Binary values are thresholded, rather than simply
    indicating presence or absence
  • Integers or reals will be measure of relative
    significance of term in document
  • Usually, term weight is TFxIDF

22
Vector-Space Model
  • Steps in calculating term weights
  • Remove stop words
  • Stem remaining words
  • Count term frequency (TF)
  • Count number of documents containing term (DF)
  • Invert it (log(C/DF)), where C is total number of
    documents in collection
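A compact sketch of these steps (toy stop list and a deliberately crude stemmer; a real system would use e.g. the Porter stemmer):

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "between", "from", "on"}  # toy stop list

def stem(word):
    """Crude stand-in for a real stemmer: strip a trailing 's'."""
    return word[:-1] if word.endswith("s") else word

def preprocess(text):
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf_vectors(docs):
    """One {term: TF * log(C/DF)} mapping per document."""
    tokenised = [preprocess(d) for d in docs]
    C = len(docs)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    return [{t: tf * math.log(C / df[t]) for t, tf in Counter(tokens).items()}
            for tokens in tokenised]

docs = ["cheap flights from Malta",
        "flights between Malta and London",
        "the first man on the moon"]
for vector in tf_idf_vectors(docs):
    print(vector)
```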

23
Vector-Space Model
  • Normalising weights for vector length
  • Documents with longer vectors have a better
    chance of being retrieved than short ones (simply
    because there are a larger number of terms that
    they will match in a query)
  • IR should treat all relevant documents as
    important for retrieval purposes
  • Solution , where w is weight of term t

24
Vector-Space Model
  • Why does this work?
  • Term discrimination
  • Assumes that terms with high TF and low DF are
    good discriminators of relevant documents
  • Because documents are ranked, documents do not
    need to contain precisely the terms expressed in
    the query
  • We cannot say anything (in VSM) about terms that
    occur in relevant and non-relevant documents -
    though we can in probabilistic IR

25
Vector-Space Model
  • Vector-Space Model is also used by Recommender
    Systems to index user profiles and product, or
    item, features
  • Apart from ranking documents, results lists can
    be controlled (to list top n relevant documents),
    and query can be automatically reformulated based
    on relevance feedback

26
Relevance Feedback
  • When a user is shown a list of retrieved
    documents, user can give relevance judgements
  • System can take original query and relevance
    judgements and re-compute the query
  • Rocchio...

27
Relevance Feedback
  • Basic Assumptions
  • Similar docs are near each other in vector space
  • Starting from some initial query, the query can
    be reformulated to reflect subjective relevance
    judgements given by the user
  • By reformulating the query we can move the query
    closer to more relevant docs and further away
    from nonrelevant docs

28
Relevance Feedback
  • In VSM, reformulating query means re-weighting
    terms in query
  • Not failsafe: it may move the query towards
    nonrelevant docs!

29
Relevance Feedback
  • The Ideal Query
  • If we knew the answer set Rel, then the ideal
    query would be
  • Qideal = (1/|Rel|) Σ_{d in Rel} d - (1/|NonRel|) Σ_{d in NonRel} d

30
Relevance Feedback
  • In reality, a typical interaction will be
  • User formulates query Q and submits it
  • IR system retrieves set of documents
  • User selects relevant set R and nonrelevant set N
  • Q' = αQ + (β/|R|) Σ_{d in R} d - (γ/|N|) Σ_{d in N} d
  • where 0 < α, β, γ < 1 (and the vector magnitudes
    1/|R| and 1/|N| are usually dropped...)

31
Relevance Feedback
  • What are the values of α, β and γ?
  • α is typically given a value of 0.75, but this
    can vary. Also, after a number of iterations, the
    original weights of terms can be highly reduced
  • If β and γ have equal weight, then relevant and
    nonrelevant docs make equal contributions to the
    reformulated query
  • If β = 1, γ = 0, then only relevant docs are used
    in the reformulated query
  • Usually, use β = 0.75, γ = 0.25

32
Relevance Feedback
  • Example
  • Q = (5, 0, 3, 0, 1)
  • R = (2, 1, 2, 0, 0); N = (1, 0, 0, 0, 2)
  • α = 0.75, β = 0.50, γ = 0.25
  • Q' = 0.75Q + 0.5R - 0.25N
  • = 0.75(5, 0, 3, 0, 1) + 0.5(2, 1, 2, 0, 0) -
    0.25(1, 0, 0, 0, 2)
  • = (4.5, 0.5, 3.25, 0, 0.25)

33
Relevance Feedback
  • How many docs to use in R and N?
  • Use all docs selected by user
  • Use all rel docs and highest ranking nonrel docs
  • Usually, user selects only relevant docs...
  • Should entire document vector be used?
  • Really want to identify the significant terms...
  • Use terms with high-frequency/weight
  • Use terms in doc adjacent to terms from query
  • Use only common terms in R (and N)

34
Automatic Relevance Feedback
  • Users tend not to select nonrelevant documents,
    and rarely choose more than one relevant document
    (http://www.dlib.org/dlib/november95/11croft.html)
  • This makes it difficult to use relevance feedback
  • Current research uses automatic relevance
    feedback techniques...

35
Automatic Relevance Feedback
  • Two main approaches
  • To improve precision
  • To improve recall

36
Automatic Relevance Feedback
  • Reasons for low precision
  • Documents contain query terms, but documents are
    not about the concept or topic the user is
    interested in
  • E.g., user wants documents in which a cat chases
    a dog, but the query <cat, chase, dog> also
    retrieves docs in which dogs chase cats
  • Term ambiguity

37
Automatic Relevance Feedback
  • Improving precision (see the sketch below)
  • Want to promote relevant documents in the results
    list
  • Assume that the top-n (typically 20) documents are
    relevant, and assume docs ranked 500-1000 are
    nonrelevant
  • Choose co-occurring discriminatory terms
  • Re-rank docs ranked 21-499 using a (modified)
    Rocchio method
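A rough sketch of the blind-feedback step under those assumptions (dict-based term vectors; helper names are mine, and the method in p206-mitra.pdf is considerably more refined):

```python
def centroid(vectors):
    """Mean of a list of {term: weight} vectors."""
    totals = {}
    for v in vectors:
        for term, w in v.items():
            totals[term] = totals.get(term, 0.0) + w
    return {t: w / len(vectors) for t, w in totals.items()} if vectors else {}

def blind_feedback_query(query, ranked_vectors, top_n=20, nr_from=500,
                         nr_to=1000, alpha=1.0, beta=0.75, gamma=0.25):
    """Assume ranks 1..top_n are relevant and nr_from..nr_to are nonrelevant,
    then build a Rocchio-style expanded query (all vectors are dicts)."""
    rel = centroid(ranked_vectors[:top_n])
    nonrel = centroid(ranked_vectors[nr_from:nr_to])
    terms = set(query) | set(rel) | set(nonrel)
    new_q = {t: alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0)
                - gamma * nonrel.get(t, 0.0) for t in terms}
    # Keep positive weights; docs ranked 21..499 are then re-scored against new_q.
    return {t: w for t, w in new_q.items() if w > 0}
```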

p206-mitra.pdf
38
Automatic Relevance Feedback
  • Improving precision
  • Does improve precision, by 6-13% at precision
    cutoffs P-21 to P-100
  • But remember that precision is to do with the
    ratio of relevant to nonrelevant documents
    retrieved
  • There may be many relevant documents that were
    never retrieved (i.e., low recall)

39
Automatic Relevance Feedback
  • Reasons for low recall
  • The concept or topic that the user is interested in
    can be described using terms additional to those
    expressed by the user in the query
  • E.g., think of all the different ways in which
    you can express "car", including manufacturers'
    names (e.g., Ford, Vauxhall, etc.)
  • There is only a small probability that user and
    author use the same term to describe the same
    concept

40
Automatic Relevance Feedback
  • Reasons for low recall
  • Imprudent query term expansion improves
    recall, simply because more documents are
    retrieved, but hurts precision!

41
Automatic Relevance Feedback
  • Improving recall
  • A manually or automatically generated thesaurus is
    used to expand query terms before the query is
    submitted (see the sketch below)
  • We're currently working on other techniques to
    pick synonyms that are likely to be relevant
  • The Semantic Web attempts to encode semantic meaning
    into documents
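A toy sketch of thesaurus-based expansion (hypothetical thesaurus; down-weighting the added synonyms is one common way to limit the damage to precision):

```python
# Toy thesaurus (assumption: in practice WordNet or a corpus-derived one).
THESAURUS = {
    "car": ["automobile", "vehicle", "ford", "vauxhall"],
    "cheap": ["inexpensive", "budget", "low-cost"],
}

def expand_query(terms, synonym_weight=0.5):
    """Add synonyms at reduced weight so original terms still dominate."""
    expanded = {t: 1.0 for t in terms}
    for t in terms:
        for synonym in THESAURUS.get(t, []):
            expanded.setdefault(synonym, synonym_weight)
    return expanded

print(expand_query(["cheap", "car"]))
```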

p61-voorhees.pdf, qiu94improving.pdf,
MandalaSigir99EvComboWordNet.pdf
42
Indexing Documents
  • Obviously, comparing a query vector to each
    document vector to determine the similarity is
    expensive
  • So how can we do it efficiently, especially for
    gigantic document collections, like the Web?

43
Indexing Documents
  • Inverted indices
  • An inverted index is a list of terms in the
    vocabulary together with a postings list for each
    term
  • A postings list is a list of documents containing
    the term

44
Indexing Documents
  • Inverted index
  • Several pieces of information can be stored in
    the postings list
  • term weight
  • location of the term in the document (to support
    proximity operators)
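A minimal sketch of such an index (the term frequency, a simple weight, can be derived from the stored positions):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, positions);
    len(positions) gives the term frequency in that document."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        positions = defaultdict(list)
        for pos, term in enumerate(text.lower().split()):
            positions[term].append(pos)
        for term, locs in positions.items():
            index[term].append((doc_id, locs))
    return index

index = build_inverted_index(["malta to london", "flights to malta"])
print(index["malta"])   # [(0, [0]), (1, [2])]
# The results set for a query is then obtained with set
# intersection/union over the postings lists.
```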

45
Indexing Documents
  • Results set is obtained using set operators
  • Once documents in results set are known, their
    vectors can be retrieved to perform ranking
    operations on them
  • The document vectors also allow automatic query
    reformulation to occur following relevance
    feedback
  • See brin.pdf and p2-arasu.pdf

46
Probabilistic IR
  • VSM assumes that a document that contains some
    term x is about that term
  • PIR compares the probability of seeing term x in
    a relevant document as opposed to a nonrelevant
    document
  • Binary Independence Retrieval Model proposed by
    Robertson & Sparck Jones, 1976

robertson97simple.pdf, SparckJones98.pdf
47
BIR
  • BIR Fundamentals
  • Given a user query there is a set of documents
    which contains exactly the relevant documents and
    no other
  • the ideal answer set
  • Given the ideal answer set, a query can be
    constructed that retrieves exactly this set
  • Assumes that relevant documents are clustered,
    and that terms used adequately discriminate
    against non-relevant documents

48
BIR
  • We do not know, in general, the properties of the
    ideal answer set
  • All we know is that documents have terms which
    capture semantic meaning
  • When the user submits a query, guess what might be
    the ideal answer set
  • Allow the user to interact, to refine the
    probabilistic description of the ideal answer set
    (by marking docs as relevant/non-relevant)

49
BIR
  • Probabilistic Principle Assumption
  • Given a user query q and a document dj in the
    collection
  • Estimate the probability that the user will find
    dj relevant to q
  • Rank documents in order of their probability of
    relevance to the query (Probability Ranking
    Principle)

50
BIR
  • Model assumes that probability of relevance
    depends on q and doc representations only
  • Assumes that there is an ideal answer set!
  • Assumes that terms are distributed differently in
    relevant and non-relevant documents

51
BIR
  • Whether or not a document x is retrieved depends
    on
  • Pr(rel|x): the probability that x is relevant
  • Pr(nonrel|x): ... that x isn't relevant

52
BIR
  • Document Ranking Function: document x will be
    retrieved if
  • a2 · Pr(rel|x) ≥ a1 · Pr(nonrel|x)
  • where a2 is the cost of not retrieving a
    relevant document, and a1 is the cost of
    retrieving a non-relevant document
  • If we knew Pr(rel|x) (or Pr(nonrel|x)), the
    solution would be trivial, but...

53
BIR
  • Use Bayes' Theorem to rewrite Pr(rel|x):
  • Pr(rel|x) = Pr(x|rel) · Pr(rel) / Pr(x)
  • Pr(x): the probability of observing x
  • Pr(rel): the a priori probability of relevance (i.e.,
    the probability of observing a set of relevant
    documents)
  • Pr(x|rel): the probability that x is in the given set
    of relevant docs

54
BIR
  • Can do the same for Pr(nonrel|x):
  • Pr(nonrel|x) = Pr(x|nonrel) · Pr(nonrel) / Pr(x)

55
BIR
  • The document ranking function can be rewritten as
  • a2 · Pr(x|rel) · Pr(rel) ≥ a1 · Pr(x|nonrel) · Pr(nonrel)
  • and, since the costs and priors are constant across
    documents, simplified to ranking by
    Pr(x|rel) / Pr(x|nonrel)
  • Pr(x|rel) and Pr(x|nonrel) are still unknown, so we
    will express them in terms of the keywords in the
    document!

56
BIR
  • We assume that terms occur independently in
    relevant and non-relevant docs...
  • pi = Pr(xi = 1 | rel): the probability that term xi
    is present in a document randomly selected from the
    ideal answer set
  • qi = Pr(xi = 1 | nonrel): the probability that term
    xi is present in a document randomly selected from
    outside the ideal answer set

57
BIR
  • Considering document x = (d1, ..., dm), where di is
    the weight of term i,
  • Pr(x|rel) = Π_i pi^di · (1 - pi)^(1 - di)
  • where pi is the probability that a relevant
    document contains term xi (similarly qi for a
    nonrelevant document)

58
BIR
  • When di = 0 we want the contribution of term i to
    g(x) to be 0

59
BIR
  • The term relevance weight of term xi is
  • tri = log( pi(1 - qi) / (qi(1 - pi)) )
  • The weight of term i in document j is then
  • wij = dij · tri

60
BIR
  • Estimation of term occurrence probability
  • Given a query, a document collection can be
    partitioned into a relevant and non-relevant set
  • The importance of a term j is its discriminatory
    power in distinguishing between relevant and
    nonrelevant documents

61
BIR
  • With complete information about the relevant and
    non-relevant document sets we can estimate pj and
    qj
  • Approximation: pj = rj / R and qj = (nj - rj) / (N - R)
  • where N is the number of docs in the collection,
    nj the document frequency of term j, R the number
    of relevant docs, and rj the number of relevant
    docs containing term j

62
BIR
  • Term Occurrence Probability Without Relevance
    Information
  • What do we do, given that we don't know rj?
  • qj ≈ nj / N, since most docs are nonrelevant
  • pj = 0.5 (arbitrary)
  • does this remind you of anything?

63
BIR
  • Reminder... Ranking Function
  • g(x) = Σ_i di · log( pi(1 - qi) / (qi(1 - pi)) )
  • where,
  • pi = Pr(xi = di | rel)
  • qi = Pr(xi = di | nonrel)
  • and di is the weight of term i

64
Relevance Feedback in BIR
  • Want to add more terms to the query so the query
    will resemble documents marked as relevant (note
    difference from VSM)
  • How do we select which terms to add to the query?

65
Relevance Feedback in BIR
  • Rank terms in the marked documents and add the
    first m terms (see the sketch below)
  • wi = log( (ri / (R - ri)) / ((ni - ri) / (N - ni - R + ri)) )
  • where
  • N = no. of docs in the collection
  • ni = document frequency of term i
  • R = no. of relevant docs selected
  • ri = no. of docs in R containing term i
  • Compares the frequency of occurrence of a term in R
    with its document frequency
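A sketch of this term-ranking weight (the +0.5 smoothing is my addition to avoid division by zero and log of zero; the slide's formula is the unsmoothed version):

```python
import math

def term_relevance_weight(N, n_i, R, r_i, k=0.5):
    """Robertson/Sparck Jones-style relevance weight for term selection."""
    return math.log(((r_i + k) / (R - r_i + k)) /
                    ((n_i - r_i + k) / (N - n_i - R + r_i + k)))

# A term in 8 of 10 relevant docs but only 50 of 10000 docs overall
# is a strong discriminator and gets a high weight.
print(term_relevance_weight(N=10000, n_i=50, R=10, r_i=8))
```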

66
Question-Answering on the Web
  • Two aspects to IR
  • Coverage (find all relevant documents)
  • Question-Answering (find the answer to specific
    query)
  • In QA we want one answer to our question
  • How much NLP do we need to use to answer
    fact-based questions?
  • Answers that require reasoning are much harder!

67
Question Answering
  • Most IR tasks assume that user can predict what
    terms a relevant document will contain
  • But sometimes what we want is the answer to a
    direct question
  • "Who was the first man on the moon?"
  • Do we really want a list of millions of documents
    that contain "first", "man", "moon"?
  • And do we really want to have to read them to
    find the answer?

68
Question Answering
  • All we want is one document, or one statement,
    that contains the answer
  • Can we take advantage of IR on the Web to do
    this?
  • Taking advantage of redundancy on the Web
  • E.g., Mulder, Dumais

69
Mulder
  • Uses Web as collection of answers to factual
    questions
  • Who was the first man on the moon?
  • What is the capital of Italy?
  • Where is the Taj Mahal?

kwok01scaling.pdf
70
Mulder
  • Three parts to a QA system
  • Retrieval Engine
  • Indexes documents in a collection and retrieves
    them
  • Query Formulator
  • Converts NL question into formal query
  • Answer Extractor
  • Locates answer in text

71
Mulder
  • Six parts to Mulder
  • Question Parsing
  • Question Classification
  • Query Formulation
  • Search Engine
  • Answer Extraction
  • Answer Selection

72
Dumais et al
  • Takes advantage of multiple, differently phrased,
    answer occurrences on Web
  • Doesn't need to find all answer phrases
  • Just the ones that match the query pattern
  • Rules for converting questions and finding answers
    are mostly handwritten

p291-dumais
73
Dumais et al
  • Steps (see the sketch below)
  • Rewrite the question into weighted query patterns
  • Use a POS tagger and lexicon to seek alternative
    word forms
  • Search
  • Mine N-grams in summaries
  • Filter and re-weight N-grams
  • Tile N-grams to yield longer answers
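A heavily simplified sketch of the rewrite-and-mine idea (toy rewrite rule and canned snippets; answer-type filtering and n-gram tiling are omitted):

```python
from collections import Counter

def rewrite_question(question):
    """Toy rewrite rule: 'Who was X?' -> weighted search patterns."""
    q = question.rstrip("?")
    if q.lower().startswith("who was "):
        rest = q[8:]
        return [(f'"{rest} was"', 5), (f'"was {rest}"', 3), (rest, 1)]
    return [(q, 1)]

def mine_ngrams(snippets, weight, n=3):
    """Count n-grams in result snippets, scaled by the pattern's weight."""
    counts = Counter()
    for snippet in snippets:
        words = snippet.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += weight
    return counts

snippets = ["neil armstrong was the first man on the moon",
            "the first man on the moon was neil armstrong"]
votes = Counter()
for pattern, weight in rewrite_question("Who was the first man on the moon?"):
    # A real system would submit each pattern to a search engine here;
    # we reuse canned snippets for all patterns.
    votes += mine_ngrams(snippets, weight)
print(votes.most_common(2))
```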

74
Azzopardi
  • Joel Azzopardi, 2004, "Template-Based Fact
    Finding on the Web", FYP report, CSAI
  • Can find factoids about a series of queries
    relating to a particular topic, using majority
    polling (voting) to decide amongst competing
    answers
  • A series of topic-sensitive query patterns is
    stored in a template

75
Azzopardi
  • The template is learned by comparing a sample of
    documents about a topic
  • Commonly occurring phrases (trigrams) are extracted
    and turned into partial queries in the template,
    together with the answer type

76
Azzopardi
  • When the user wants information regarding a topic,
    use the appropriate template together with the
    subject (e.g., a person's name)
  • The subject is appended to the partial queries in
    the template - queries are submitted to Google
  • Top-n documents are retrieved and processed to
    identify candidate answers
  • Uses voting to decide on the most frequently
    occurring answer (see the sketch below)
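The voting step itself is a straightforward majority poll, e.g.:

```python
from collections import Counter

def majority_answer(candidates):
    """Pick the most frequently occurring candidate answer."""
    return Counter(candidates).most_common(1)[0][0]

# Candidate answers extracted from the top-n retrieved documents (toy data).
print(majority_answer(["Rome", "Rome", "Milan", "Rome", "Turin"]))  # Rome
```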

77
Summary
  • We've discussed a couple of popular models of IR
    that are "more intelligent" than plain old
    Extended Boolean Information Retrieval
  • They still treat terms as atoms that are
    representative of the semantic meaning of the
    document

78
Summary
  • But word order is generally insignificant ("bag of
    words")
  • Cannot distinguish between "dog chased cat" and
    "cat chased dog"
  • unless phrase matching is also used, but then we
    cannot tell that "cat chased dog" and "dog was
    chased by cat" are semantically equivalent
  • What about information extraction?
  • E.g., George W. Bush -> President of the United
    States of America

79
Summary
  • More intelligent approaches have been used
  • And more "intelligence" is being put into the
    Web
  • Personalisation and user-adaptivity also require
    high accuracy in determining which documents are
    relevant to a user

80
Summary
  • Sowa's conceptual graphs and McCarthy's
    "Generality in AI"/"Notes on Contextual Reasoning"
    are seminal works that underpin much that is
    happening in the Semantic Web
  • CGs represent the semantic content of utterances in
    an interchangeable format (KIF)
  • McCarthy claims that it is hard to make correct
    inferences in the absence of contextual
    information

81
Summary
  • Because of the expense of CGs, they are still
    very much domain specific
  • The SemWeb hopes that by bringing massive numbers of
    people together there will be a proliferation of
    ontologies to make it happen
  • Guha did his PhD, "Contexts: A Formalization and
    Some Applications", at Stanford, under John
    McCarthy. His work on Cyc underpins RDF, DAML+OIL

82
Dealing with General Knowledge
  • Why did Mary hit the piggy bank with a hammer?

83
Dealing with General Knowledge
  • Do computer systems need general knowledge?
  • How do computer systems represent general
    knowledge?

84
Dealing with General Knowledge
  • Do we need general knowledge?
  • How do we represent general knowledge?

85
Dealing with General Knowledge
  • As usual, this has its roots in philosophy
    (epistemology)
  • Early (i.e., Greek) epistemology revolved around
    Absolute and Universal Ideas and Forms (Plato)
  • Aristotle: Logic for representing and reasoning
    about knowledge

http://pespmc1.vub.ac.be/EPISTEMI.html
86
Dealing with General Knowledge
  • Following Renaissance, two main schools of
    thought
  • Empiricists
  • Knowledge as product of sensory perception
  • Rationalists
  • Product of rational reflection

87
Dealing with General Knowledge
  • Kantian Synthesis of empiricism and reflectionism
  • Knowledge results from the organization of
    perceptual data on the basis of inborn cognitive
    structures, called "categories".
  • Categories include space, time, objects and
    causality.
  • (viz. Chomsky's Universal Grammar)

88
Dealing with General Knowledge
  • Pragmatism
  • Knowledge consists of models that attempt to
    represent the environment to simplify
    problem-solving
  • Assumption: models are rich, but no model can ever
    hope to capture all relevant information; even if
    such a complete model did exist, it would be too
    complicated to use in any practical way.

89
Dealing with General Knowledge
  • Pragmatism (contd.)
  • The model which is to be chosen depends on the
    problems that are to be solved (the context).
  • But see also discussions on pragmatic vs.
    cognitive contexts! (Topic 3)
  • Basic criterion: the model should produce correct
    (or approximately correct, testable) predictions or
    problem-solutions, and be as simple as possible.
  • This is the approach mainly used in CS/AI today

90
Dealing with General Knowledge
  • "The first theories of knowledge stressed its
    absolute, permanent character, whereas the later
    theories put the emphasis on its relativity or
    situation-dependence, its continuous development
    or evolution, and its active interference with
    the world and its subjects and objects. The whole
    trend moves from a static, passive view of
    knowledge towards a more and more adaptive and
    active one."

http://pespmc1.vub.ac.be/EPISTEMI.html
91
Dealing with General Knowledge
  • We'll look at four overviews of and approaches to
    knowledge in computer systems
  • McCarthy (1959, mcc.pdf)
  • Sowa (1979, p79-1010.pdf)
  • McCarthy (1987, p1030-mccarthy.pdf)
  • Brézillon & Pomerol (2001, is-context-a-kind.pdf)

92
Dealing with General Knowledge
  • McCarthy, J. 1959. "Programs with Common Sense"
  • "a program has common sense if it automatically
    deduces for itself a sufficiently wide class of
    immediate consequences of anything it is told and
    what it already knows."

93
Dealing with General Knowledge
  • Objective: to make programs that learn from
    their experience as effectively as humans do
  • To learn to improve how they learn
  • And to do it in logic, using a logical
    representation

94
Dealing with General Knowledge
  • Minimum features required of a machine that can
    evolve intelligence approaching that of humans
  • Representation of all behaviours
  • Interesting changes in behaviour must be
    expressible
  • All aspects of behaviour must be improvable
  • Must have notion of partial success
  • System must be able to create/learn subroutines

95
Dealing with General Knowledge
  • Bar-Hillel's biggest complaint (in my opinion) is:
  • "A deductive argument, where you have first to
    find out what are the relevant premises, is
    something which many humans are not always able
    to carry out successfully. I do not see the
    slightest reason to believe that at present
    machines should be able to perform things that
    humans find trouble in doing"
  • We'll return to this in "Closed vs. Open World
    Assumption"

96
Dealing with General Knowledge
  • Sowa, J. 1979. "Semantics of Conceptual Graphs"
  • Logic is used by McCarthy as a representation of
    statements about the world, as well as a theorem
    prover to infer/deduce new knowledge
    (assumptions) about the world
  • Sowa uses CGs as a language for representing
    knowledge and as patterns for constructing models

97
Dealing with General Knowledge
  • Sowa proposes CGs as a better alternative to
    semantic networks and predicate calculus
  • SemNets have no well-defined semantics
  • PC is adequate for describing mathematical
    theories with a closed set of axioms... But the
    real world is messy, incompletely explored, and
    full of unexpected surprises

98
Dealing with General Knowledge
  • CGs serve two purposes
  • They can be used as canonical representations of
    meaning in Natural Language
  • They can be used to construct abstract structures
    that serve as models in the model-theoretic sense
    (e.g., microtheories)

99
Dealing with General Knowledge
  • To understand a sentence
  • Convert the utterance to a CG
  • Join the CG to graphs that help resolve ambiguities
    and incorporate background information
  • The resulting graph is the nucleus for constructing
    models (of worlds) in which the utterance is true
  • Laws of the world block illegal extensions
  • If the model could be extended infinitely, the
    result would be a complete standard model

100
Dealing with General Knowledge
  • "Mary hit the piggy bank with a hammer"

101
Dealing with General Knowledge
  • Linearizing the conceptual graph
  • [PERSON: Mary] -> (AGNT) -> [HIT: c1] <- (INST) <-
    [HAMMER]
  • [HIT: c1] <- (PTNT) <- [PIGGY-BANK: i22103]
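One possible in-memory representation of such a graph (a sketch of my own, not Sowa's notation or any particular CG toolkit):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    type: str            # e.g. "PERSON", "HIT"
    referent: str = "*"  # individual marker, or "*" for a generic concept

@dataclass
class ConceptualGraph:
    arcs: list = field(default_factory=list)  # (source, relation, target)

    def add(self, source, relation, target):
        self.arcs.append((source, relation, target))

# "Mary hit the piggy bank with a hammer"
hit = Concept("HIT", "c1")
g = ConceptualGraph()
g.add(Concept("PERSON", "Mary"), "AGNT", hit)
g.add(Concept("HAMMER"), "INST", hit)
g.add(Concept("PIGGY-BANK", "i22103"), "PTNT", hit)
print(g.arcs)
```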

102
Dealing with General Knowledge
  • Context-sensitive logical operators
  • Allow building models of possible worlds and
    checking their consistency
  • Def: A sequent is a collection of conceptual
    graphs divided into two sets, called the
    conditions u1, ..., un and the assertions v1, ...,
    vm. It is written u1, ..., un -> v1, ..., vm.

103
Dealing with General Knowledge
  • Cases of sequents (classified in the sketch below)
  • simple assertion: no conditions, one assertion
    (-> v)
  • disjunction: no conditions, one or more
    assertions
  • (-> v1, ..., vm)
  • simple denial: one condition, no assertions (u ->)
  • compound denial: 2 or more conditions, no
    assertions (u1, ..., un ->)
  • conditional assertion: u1, ..., un -> v1, ..., vm
  • empty clause: ->
  • Horn clause: anything with at most one assertion
    (inc. 0)
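These cases follow purely from the counts on each side, as this sketch shows:

```python
def classify_sequent(conditions, assertions):
    """Classify u1, ..., un -> v1, ..., vm by the counts on each side."""
    n, m = len(conditions), len(assertions)
    if n == 0 and m == 0:
        return "empty clause"
    if n == 0:
        return "simple assertion" if m == 1 else "disjunction"
    if m == 0:
        return "simple denial" if n == 1 else "compound denial"
    return "conditional assertion"

def is_horn(conditions, assertions):
    """Horn clause: at most one assertion (including none)."""
    return len(assertions) <= 1

print(classify_sequent([], ["v"]))          # simple assertion
print(classify_sequent(["u1", "u2"], []))   # compound denial
print(is_horn(["u1"], []))                  # True
```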

104
Dealing with General Knowledge
  • McCarthy, J. 1987. "Generality in Artificial
    Intelligence" (1971 Turing Award Lecture)
  • "no one knows how to make a general database of
    commonsense knowledge that could be used by any
    program that needed the knowledge"
  • Examples: robots moving things around, what we
    know about families, buying and selling...

105
Dealing with General Knowledge
  • "In my opinion, getting a language [my italics]
    for expressing general commonsense knowledge for
    inclusion in a general database is the key
    problem of generality in AI."

106
Dealing with General Knowledge
  • How can we write programs that can learn to
    modify their own behaviour, including improving
    the way they learn?
  • Friedberg (A Learning Machine, c. 1958)
  • Newell, Simon, Shaw (General Problem Solver, c.
    1957-1969)
  • Newell, Simon (Production Machines, 1950-1972)
  • McCarthy (Logical Representation, c. 1958)
  • McCarthy (Formalising Context, 1987)

107
Dealing with General Knowledge
  • A Learning Machine
  • Learns by making random modifications to a
    program
  • Discards flawed programs
  • Learnt to move a bit from one memory cell to
    another
  • In 1987, it was demonstrated to be inferior to
    simply re-writing the entire program

108
Dealing with General Knowledge
  • General Problem Solver
  • Represent problems of some class as problems of
    transforming one expression into another using a
    set of allowed rules
  • First system to separate problem structure from
    the domain
  • McCarthy notes the problem of representing
    commonsense knowledge as transformations

109
Dealing with General Knowledge
  • Production (Expert) Systems
  • Represent knowledge as facts and rules (see the
    sketch below)
  • Facts contain no variables or quantifiers
  • New facts are produced by inference, observation
    and user input
  • Rules are usually coded by a programmer/expert
  • Rules are usually not learnt or generated by the
    system (but see data mining)
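A minimal sketch of such a production machine (ground facts only, matching the "no variables or quantifiers" restriction; the rules are hypothetical):

```python
def forward_chain(facts, rules):
    """Production system: repeatedly fire If-Then rules until no new facts.
    Rules are (antecedents, consequent) pairs over ground facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in facts and all(a in facts for a in antecedents):
                facts.add(consequent)   # new fact produced by inference
                changed = True
    return facts

rules = [({"squeal of tyres"}, "expect a bang"),
         ({"expect a bang", "no bang heard"}, "surprise")]
print(forward_chain({"squeal of tyres", "no bang heard"}, rules))
```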

110
Dealing with General Knowledge
  • Logical Representation
  • Representing information declaratively
  • Although Prolog can represent facts in logical
    representation and reason using logic, it cannot
    do universal generalization, and so cannot modify
    its own behaviour enough
  • So McCarthy built Lisp...

111
Dealing with General Knowledge
  • Logical Representation
  • McCarthy's dream is that the commonsense knowledge
    possessed by humans could be written as logical
    sentences and stored in a db
  • Facts about the effects of actions are essential
    (when we hear the squeal of tyres we expect a
    bang...)
  • It is necessary to say that an action changes only
    features of the situation to which it refers

112
Dealing with General Knowledge
  • Context
  • We understand under-qualified utterances because
    we understand them in context
  • "The book is on the table"
  • Where is the book?

113
Dealing with General Knowledge
  • Context
  • "Can you fetch me the book, please?"
  • Up until the last utterance, the physical
    location of the book was not significant, and we
    were able to have a short dialogue about it
  • Fully qualified utterances are too unwieldy to
    use in conversation
  • Occasionally gives rise to misunderstandings...

114
Dealing with General Knowledge
  • Context
  • "The book is on the table" is valid for a large
    number of different contexts, in which the
    specific book and the specific table, and perhaps
    even the location of the specific table, can be
    significant and can also change over time
  • Utterances are understood in context

115
Dealing with General Knowledge
  • Is Context a ... collective Tacit Knowledge?
  • How does data become knowledge?

116
Dealing with General Knowledge
  • Is Context a ... collective Tacit Knowledge?
  • "Context is the collection of relevant conditions
    and surrounding influences that make a situation
    unique and comprehensible"

117
Dealing with General Knowledge
  • Where is context?

118
Dealing with General Knowledge
  • Closed World vs. Open World assumption
  • Closed World
  • I assume that anything I don't know the truth of
    is false; I know everything that is true
  • Open World
  • I assume that anything I don't know the truth of
    is unknown; some things I don't know may be true;
    I don't know everything

119
Dealing with General Knowledge
  • Prolog, for instance, will return "false" for
    any fact that is missing from its database, or
    for which it cannot derive a truth-value
  • A three-valued logic permits assertions to be
    true, false, or unknown (see the sketch below)
  • However, reasoning and truth-maintenance become
    expensive in the open world
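A minimal sketch contrasting the two assumptions with a three-valued lookup (toy knowledge base):

```python
TRUE, FALSE, UNKNOWN = "true", "false", "unknown"

def query(kb, fact, closed_world=False):
    """Three-valued lookup: the closed world maps missing facts to FALSE,
    the open world maps them to UNKNOWN."""
    if fact in kb:
        return TRUE if kb[fact] else FALSE
    return FALSE if closed_world else UNKNOWN

kb = {"flight(malta, london)": True}
print(query(kb, "flight(malta, rome)", closed_world=True))   # false (Prolog-style)
print(query(kb, "flight(malta, rome)", closed_world=False))  # unknown
```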

120
Dealing with General Knowledge
  • The Web is an open world, so the Semantic Web
    needs to reason within an open world (perhaps
    even across ontologies)
  • This doesn't mean that, to solve some problems, the
    SW cannot temporarily assume a closed world (within
    an agreed ontology)

ekaw2004.pdf
121
Teaching Knowledge
  • Intelligent Tutoring Systems need to model both
    the user and the domain to create a learning path
    based on the student's prior knowledge and goals,
    and to monitor the student's progress
  • AHSs developed partly through using hypertext
    systems as domain representations for ITSs -
    basically, when intelligent tutoring moved to the Web

122
Intelligent Tutoring Systems
  • Overview
  • Modern ITS development began in 1987, after a
    review by Wenger
  • Wenger, E. (1987). Artificial Intelligence and
    Tutoring Systems: Computational and Cognitive
    Approaches to the Communication of Knowledge. Los
    Altos, CA: Morgan Kaufmann Publishers, Inc.
  • This was the first attempt to examine the
    implicit and explicit goals of ITS designers

123
Intelligent Tutoring Systems
  • Wenger described ITS as a part of "knowledge
    communication" and his review focused on
    cognitive and learning aspects as well as the AI
    issues

124
Intelligent Tutoring Systems
  • "... consider again the example of books they
    have certainly outperformed people in the
    precision and permanence of their memory, and the
    reliability of their patience. For this reason,
    they have been invaluable to humankind. Now
    imagine active books that can interact with the
    reader to communicate knowledge at the
    appropriate level, selectively highlighting the
    interconnectedness and ramifications of items,
    recalling relevant information, probing
    understanding, explaining difficult areas in more
    depth, skipping over seemingly known material ...
    intelligent knowledge communication systems are
    indeed an attractive dream." (p. 6).

125
Intelligent Tutoring Systems
  • Motivations underlying ITSs (and education in
    general)
  • to teach about something (abstract)
  • to teach how to do something (practical)

126
Intelligent Tutoring Systems
  • How can learning be achieved?
  • By rote
  • By mimicry (observation)
  • By application

127
Intelligent Tutoring Systems
  • When student performs task correctly, assume
    student understands concept and/or its
    application
  • When student performs task incorrectly, how can
    the tutor help?
  • Simply tell the student the correct answer
  • Tell student the correct answer and state why
    it's correct
  • Explain to the student why his/her answer is
    incorrect

128
Intelligent Tutoring Systems
  • Explanation-based correction is HARD!
  • Tutor must first understand why the student gave
    the incorrect answer
  • Student lacks knowledge
  • Incorrect application of correct procedure
  • Misinterpretation of task
  • Misconception of principle

129
Intelligent Tutoring Systems
  • How to tutor?
  • Originally Computer-Aided Instruction (CAI) used
    non-interactive "classroom" techniques.
  • All students were taught in the same manner
    (e.g., through flash cards) and then assessed.
  • If a student failed, student had to work through
    the same material again, to "learn it better"
  • Access to human tutor to address difficulties
  • This type of learning, although self-paced, is
    ineffective

130
Intelligent Tutoring Systems
  • The goal of an ITS
  • A student learns from ITS by solving problems.
  • The ITS selects a problem and compares its
    solution with that of the student
  • It performs a diagnosis based on the differences.
  • After giving feedback, the system reassesses and
    updates the student skills model, and the entire
    cycle is repeated.

131
Intelligent Tutoring Systems
  • The goal of an ITS (continued)
  • As the system assesses what the student knows, it
    also considers what the student needs to know,
    which part of the curriculum is to be taught
    next, and how to present the material.
  • It then selects the next problem(s).

132
Intelligent Tutoring Systems
  • Basic issues in knowledge communication

133
Intelligent Tutoring Systems
  • Domain Expertise
  • Rather than being represented by chunks of
    information, the domain should be represented
    using a model and a set of rules which allows the
    system to "reason"
  • Typical domain model representations (make closed
    world assumption!)
  • If - Then Rules
  • If - Then Rules with uncertainty measures
  • Semantic Networks
  • Frame based representations

134
Intelligent Tutoring Systems
  • Student Model
  • According to Wenger, student models have three
    tasks. They must
  • Gather information about the student (implicitly
    or explicitly)
  • Create a representation of the student's
    knowledge and learning process (often as buggy
    models)
  • Perform a diagnosis to determine what the student
    knows and to determine how the student should be
    taught and to identify misconceptions

135
Intelligent Tutoring Systems
  • Student model architectures
  • Overlay student models
  • Differential student models
  • Perturbation student models
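For instance, an overlay model treats the student's knowledge as a subset of the domain model; a sketch with hypothetical concept names:

```python
DOMAIN_CONCEPTS = {"vectors", "tf-idf", "cosine", "rocchio", "bir"}

class OverlayStudentModel:
    """Overlay model: student knowledge = subset of the expert domain model."""
    def __init__(self):
        self.known = set()

    def observe_success(self, concept):
        self.known.add(concept)          # evidence the concept is mastered

    def gaps(self):
        return DOMAIN_CONCEPTS - self.known  # what remains to be taught

model = OverlayStudentModel()
model.observe_success("vectors")
print(model.gaps())
```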

136
Intelligent Tutoring Systems
  • Student model diagnosis
  • Performance measuring
  • Model tracing
  • Issue tracing
  • Expert systems

137
Intelligent Tutoring Systems
  • Pedagogical expertise
  • Used to decide how to
  • present/sequence information
  • answer questions/give explanations
  • provide help/guidance/remediation

138
Intelligent Tutoring Systems
  • According to Wenger, when "learning is viewed as
    successive transitions between knowledge states,
    the purpose of teaching is accordingly to
    facilitate the student's traversal of the space
    of knowledge states." (p. 365)
  • The ITS must model the student's current
    knowledge and support the transition to a new
    knowledge state.

139
Intelligent Tutoring Systems
  • ITSs must alternate between diagnostic and
    didactic support.
  • Diagnostic support
  • Information about a student's state is inferred
    on 3 levels
  • Behavioural - ignores learner's knowledge, and
    concentrates on observed behaviour
  • Epistemic - attempts to infer learner's knowledge
    state based on learner's behaviour
  • Individual - cognitive model of learner's state,
    attitudes (to self, world, ITS), motivation

140
Intelligent Tutoring Systems
  • Didactic support
  • Concerned with the "delivery" aspect of teaching

141
Intelligent Tutoring Systems
  • Interface
  • The interface is the layer through which the
    learner and ITS communicate
  • The design of an interface which enhances
    learning is essential
  • Web-based ITSs tend to rely on the Web browser to
    provide the interface
  • Hypermedia-based ITSs in general must provide
    adaptive presentation and adaptive navigation
    facilities, if they are to extend beyond
    knowledge exploration environments

142
Intelligent Tutoring Systems