1. CSA4080 Adaptive Hypertext Systems II
Topic 6: Information and Knowledge Representation
- Dr. Christopher Staff
- Department of Computer Science & AI
- University of Malta
2. Aims and Objectives
- Models of Information Retrieval
- Vector Space Model
- Probabilistic Model
- Relevance Feedback
- Query Reformulation
3. Aims and Objectives
- Dealing with General Knowledge
- Programs that reason
- Conceptual Graphs
- Intelligent Tutoring Systems
4. Background
- We've talked about how user information can be represented
- We need to be able to represent information about the domain so that we can reason about what the user's interests are, etc.
- We covered the difference between data, information, and knowledge in CSA3080...
5. Background
- In 1945, Vannevar Bush writes "As We May Think"
- Gives rise to seeking intelligent solutions to information retrieval, etc.
- In 1949, Warren Weaver writes that if Chinese is "English codification", then machine translation should be possible
- Leads to surface-based/statistical techniques
6. Background
- Even today, nearly 60 years later, there is significant effort in both directions
- For years, intelligent solutions were hampered by the lack of fast enough hardware and software
- This doesn't seem to be an issue any longer, and the Semantic Web may be testimony to that
- But there are sceptics
7. Background
- Take IR as an example
- At the "dumb" end we have reasonable generic systems; at the other end, systems are domain specific and more expensive, but do they give better results?
8. Background
- At what point does it cease to be cost-effective to attempt more intelligent solutions to the IR problem?
9. Background
- Is "Information Retrieval" a misnomer?
- Consider your favourite Web-based IR system... does it retrieve information?
- Can you ask "Find me information about all flights between Malta and London"?
- And what would you get back?
- Can you ask "Who was the first man on the moon?"
10. Background
- With many IR systems that we use, the intelligence is firmly rooted in the user
- We must learn how to construct our queries so that we get the information we seek
- We sift through relevant and non-relevant documents in the results list
- What we can hope for is that patterns can be identified to make life easier for us, e.g., recommender systems
11. Background
- Surface-based techniques tend to look for and re-use patterns as heuristics, without attempting to encode meaning
- The Semantic Web, and other intelligent approaches, try to encode meaning so that it can be reasoned with and about
- Cynics/sceptics/opponents believe that there is more success to be had in giving users more support than in encoding meaning into documents to support automation
12. However...
- We will cover both surface-based and some knowledge-based approaches to supporting the user in his or her task
13. Information Retrieval
- We will discuss two IR models...
- Vector Space Model
- Probabilistic Model
- ...and surface-based techniques that can improve their usability
- Relevance Feedback
- Query Reformulation
- Question-Answering
14. Knowledge
- Conceptual graphs support the encoding and matching of concepts
- Conceptual graphs are more "intelligent" and can be used to overcome some problems like the Vocabulary Problem
15. Reasoning on the Web
- REWERSE (FP6 NoE) is an attempt to represent meaning contained in documents and to reason with and about it so that a single high-level user request may be carried out even if it contains several sub-tasks
- E.g., "Find me information about cheap flights between Malta and London"
16. Vector-Space Model
- Recommended Reading
- p18-wong (Generalised Vector Space Model).pdf - look at refs 1, 2, 3 for original work
17. Vector-Space Model
- Documents are represented as m-dimensional vectors or "bags of words"
- m is the size of the vocabulary
- w_k = 1 indicates term k is present in the document
- w_k = 0 indicates term k is absent
- d_j = <1, 0, 0, 1, ..., 0, 0>
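A minimal sketch of the binary bag-of-words representation in Python (the vocabulary and document here are invented for illustration; a real system would first remove stop words and stem, as described a few slides on):

    # Build binary m-dimensional vectors over a fixed vocabulary.
    vocabulary = ["cat", "chase", "dog", "hammer", "moon"]  # m = 5

    def to_binary_vector(text, vocab=vocabulary):
        """Return <w_1, ..., w_m> with w_k = 1 iff term k occurs in text."""
        tokens = set(text.lower().split())
        return [1 if term in tokens else 0 for term in vocab]

    to_binary_vector("the dog chased the cat")  # -> [1, 0, 1, 0, 0]
    # Note: "chased" does not match "chase" here; stemming would fix that.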
18. Vector-Space Model
19. Vector-Space Model
- The query is then plotted into m-dimensional space and the nearest neighbours are the most relevant
- However, the results set is usually presented as a list ranked by similarity to the query
20. Vector-Space Model
- Cosine Similarity Measure (from "IR vector space model.pdf"): sim(q, d_j) = (q . d_j) / (|q| |d_j|)
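A direct implementation of the cosine measure, assuming plain Python lists as vectors:

    import math

    def cosine_similarity(q, d):
        """sim(q, d) = (q . d) / (|q| |d|): 1.0 = same direction, 0.0 = no shared terms."""
        dot = sum(qi * di for qi, di in zip(q, d))
        norm_q = math.sqrt(sum(qi * qi for qi in q))
        norm_d = math.sqrt(sum(di * di for di in d))
        return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

    # Ranking a collection: sort doc ids by similarity to the query, best first.
    # ranked = sorted(docs, key=lambda j: cosine_similarity(q, docs[j]), reverse=True)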
21. Vector-Space Model
- Calculating term weights
- Term weights may be binary, integers, or reals
- Binary values are thresholded, rather than simply indicating presence or absence
- Integers or reals will be a measure of the relative significance of a term in a document
- Usually, the term weight is TFxIDF
22. Vector-Space Model
- Steps in calculating term weights
- Remove stop words
- Stem remaining words
- Count term frequency (TF)
- Count number of documents containing term (DF)
- Invert it: IDF = log(C/DF), where C is the total number of documents in the collection
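The steps above as a small Python sketch; it assumes stop-word removal and stemming have already been applied to the token lists:

    import math
    from collections import Counter

    def tfidf_vectors(docs):
        """docs: {doc_id: [tokens]} -> {doc_id: {term: TF * log(C/DF)}}."""
        C = len(docs)
        df = Counter()
        for tokens in docs.values():
            df.update(set(tokens))  # DF counts documents containing the term
        vectors = {}
        for doc_id, tokens in docs.items():
            tf = Counter(tokens)
            vectors[doc_id] = {t: tf[t] * math.log(C / df[t]) for t in tf}
        return vectors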
23. Vector-Space Model
- Normalising weights for vector length
- Documents with longer vectors have a better chance of being retrieved than short ones (simply because there are a larger number of terms that they will match in a query)
- IR should treat all relevant documents as equally important for retrieval purposes
- Solution: divide each weight by the vector length, w'_t = w_t / sqrt(sum_t w_t^2), where w_t is the weight of term t
24. Vector-Space Model
- Why does this work?
- Term discrimination
- Assumes that terms with high TF and low DF are good discriminators of relevant documents
- Because documents are ranked, documents do not need to contain precisely the terms expressed in the query
- We cannot say anything (in VSM) about terms that occur in relevant and non-relevant documents, though we can in probabilistic IR
25. Vector-Space Model
- The Vector-Space Model is also used by Recommender Systems to index user profiles and product, or item, features
- Apart from ranking documents, results lists can be controlled (to list the top n relevant documents), and the query can be automatically reformulated based on relevance feedback
26. Relevance Feedback
- When a user is shown a list of retrieved documents, the user can give relevance judgements
- The system can take the original query and the relevance judgements and re-compute the query
- Rocchio...
27. Relevance Feedback
- Basic Assumptions
- Similar docs are near each other in vector space
- Starting from some initial query, the query can be reformulated to reflect the subjective relevance judgements given by the user
- By reformulating the query we can move the query closer to more relevant docs and further away from non-relevant docs
28. Relevance Feedback
- In VSM, reformulating the query means re-weighting the terms in the query
- Not failsafe: it may move the query towards non-relevant docs!
29. Relevance Feedback
- The Ideal Query
- If we know the answer set Rel, then the ideal query is the difference of the two centroids: Q_opt = (1/|Rel|) sum_{d in Rel} d - (1/(C - |Rel|)) sum_{d not in Rel} d
30. Relevance Feedback
- In reality, a typical interaction will be
- User formulates query and submits it
- IR system retrieves set of documents
- User selects R (relevant docs) and N (non-relevant docs)
- Q' = αQ + (β/|R|) sum_{d in R} d - (γ/|N|) sum_{d in N} d
- where 0 < α, β, γ < 1 (and the vector magnitude is usually dropped...)
31. Relevance Feedback
- What are the values of α, β and γ?
- α is typically given a value of 0.75, but this can vary. Also, after a number of iterations, the original weights of terms can be highly reduced
- If β and γ have equal weight, then relevant and non-relevant docs make an equal contribution to the reformulated query
- If β = 1, γ = 0, then only relevant docs are used in the reformulated query
- Usually, use β = 0.75, γ = 0.25
32. Relevance Feedback
- Example
- Q = (5, 0, 3, 0, 1)
- R = (2, 1, 2, 0, 0); N = (1, 0, 0, 0, 2)
- α = 0.75, β = 0.50, γ = 0.25
- Q' = 0.75Q + 0.5R - 0.25N
- = 0.75(5, 0, 3, 0, 1) + 0.5(2, 1, 2, 0, 0) - 0.25(1, 0, 0, 0, 2)
- = (4.5, 0.5, 3.25, 0, 0.25)
33. Relevance Feedback
- How many docs to use in R and N?
- Use all docs selected by the user
- Use all relevant docs and the highest-ranking non-relevant docs
- Usually, the user selects only relevant docs...
- Should the entire document vector be used?
- Really want to identify the significant terms...
- Use terms with high frequency/weight
- Use terms in the doc adjacent to terms from the query
- Use only common terms in R (and N)
34. Automatic Relevance Feedback
- Users tend not to select non-relevant documents, and rarely choose more than one relevant document (http://www.dlib.org/dlib/november95/11croft.html)
- This makes it difficult to use relevance feedback
- Current research uses automatic relevance feedback techniques...
35. Automatic Relevance Feedback
- Two main approaches
- To improve precision
- To improve recall
36. Automatic Relevance Feedback
- Reasons for low precision
- Documents contain the query terms, but the documents are not about the concept or topic the user is interested in
- E.g., the user wants documents in which a cat chases a dog, but the query <cat, chase, dog> also retrieves docs in which dogs chase cats
- Term ambiguity
37. Automatic Relevance Feedback
- Improving precision
- Want to promote relevant documents in the results list
- Assume that the top-n (typically 20) documents are relevant, and assume docs ranked 500-1000 are non-relevant
- Choose co-occurring discriminatory terms
- Re-rank docs ranked 21-499 using a (modified) Rocchio method
p206-mitra.pdf
38. Automatic Relevance Feedback
- Improving precision
- Does improve precision, by 6-13% at P@21 to P@100
- But remember that precision is to do with the ratio of relevant to non-relevant documents retrieved
- There may be many relevant documents that were never retrieved (i.e., low recall)
39. Automatic Relevance Feedback
- Reasons for low recall
- The concept or topic that the user is interested in can be described using terms additional to those expressed by the user in the query
- E.g., think of all the different ways in which you can express "car", including manufacturers' names (e.g., Ford, Vauxhall, etc.)
- There is only a small probability that user and author use the same term to describe the same concept
40. Automatic Relevance Feedback
- Reasons for low recall
- Imprudent query term expansion improves recall, simply because more documents are retrieved, but hurts precision!
41. Automatic Relevance Feedback
- Improving recall
- A manually or automatically generated thesaurus is used to expand query terms before the query is submitted
- We're currently working on other techniques to pick synonyms that are likely to be relevant
- The Semantic Web attempts to encode semantic meaning into documents
p61-voorhees.pdf, qiu94improving.pdf, MandalaSigir99EvComboWordNet.pdf
42. Indexing Documents
- Obviously, comparing the query vector to each document vector to determine similarity is expensive
- So how can we do it efficiently, especially for gigantic document collections, like the Web?
43. Indexing Documents
- Inverted indices
- An inverted index is a list of the terms in the vocabulary together with a postings list for each term
- A postings list is a list of the documents containing the term
44. Indexing Documents
- Inverted index
- Several pieces of information can be stored in the postings list
- term weight
- location of the term in the document (to support proximity operators)
45. Indexing Documents
- The results set is obtained using set operators
- Once the documents in the results set are known, their vectors can be retrieved to perform ranking operations on them
- The document vectors also allow automatic query reformulation to occur following relevance feedback (a small sketch follows)
- See brin.pdf and p2-arasu.pdf
46. Probabilistic IR
- VSM assumes that a document that contains some term x is about that term
- PIR compares the probability of seeing term x in a relevant document as opposed to a non-relevant document
- Binary Independence Retrieval Model, proposed by Robertson & Sparck Jones, 1976
robertson97simple.pdf, SparckJones98.pdf
47. BIR
- BIR Fundamentals
- Given a user query, there is a set of documents which contains exactly the relevant documents and no others: the "ideal answer set"
- Given the ideal answer set, a query can be constructed that retrieves exactly this set
- Assumes that relevant documents are clustered, and that the terms used adequately discriminate against non-relevant documents
48. BIR
- We do not know what are, in general, the properties of the ideal answer set
- All we know is that documents have terms which capture semantic meaning
- When the user submits a query, guess what might be the ideal answer set
- Allow the user to interact, to refine the probabilistic description of the ideal answer set (by marking docs as relevant/non-relevant)
49. BIR
- Probabilistic Principle Assumption
- Given a user query q and a document dj in the collection
- Estimate the probability that the user will find dj relevant to q
- Rank documents in order of their probability of relevance to the query (Probability Ranking Principle)
50. BIR
- The model assumes that the probability of relevance depends on q and the document representations only
- Assumes that there is an ideal answer set!
- Assumes that terms are distributed differently in relevant and non-relevant documents
51. BIR
- Whether or not a document x is retrieved depends on
- Pr(rel|x): the probability that x is relevant
- Pr(nonrel|x): ... that x isn't relevant
52. BIR
- Document Ranking Function: document x will be retrieved if a2 Pr(rel|x) > a1 Pr(nonrel|x)
- where a2 is the cost of not retrieving a relevant document, and a1 is the cost of retrieving a non-relevant document
- If we knew Pr(rel|x) (or Pr(nonrel|x)), the solution would be trivial, but...
53. BIR
- Use Bayes' Theorem to rewrite Pr(rel|x) = Pr(x|rel) Pr(rel) / Pr(x)
- Pr(x): the probability of observing x
- Pr(rel): the a priori probability of relevance (i.e., the probability of observing a set of relevant documents)
- Pr(x|rel): the probability that x is in the given set of relevant docs
54. BIR
55. BIR
- Using Bayes' Theorem, the document ranking function can be rewritten as the odds Pr(x|rel) Pr(rel) / (Pr(x|nonrel) Pr(nonrel)), and simplified, since Pr(rel) and Pr(nonrel) are constant for a given query, to ranking by Pr(x|rel) / Pr(x|nonrel)
- Pr(x|rel) and Pr(x|nonrel) are still unknown, so we will express them in terms of the keywords in the document!
56. BIR
- We assume that terms occur independently in relevant and non-relevant docs...
- p_i: the probability that term x_i is present in a document randomly selected from the ideal answer set
- q_i: the probability that term x_i is present in a document randomly selected from outside the ideal answer set
57. BIR
- Considering document x = (d_1, ..., d_m), where d_i is the (binary) weight of term i, the ranking function becomes g(x) = sum_i d_i log[ p_i (1 - q_i) / (q_i (1 - p_i)) ] plus a constant that does not depend on the document's terms
- where p_i is the probability that a relevant document contains term x_i (similarly q_i for non-relevant documents)
58. BIR
- When d_i = 0 we want the contribution of term i to g(x) to be 0
59. BIR
- The term relevance weight of term x_i is TRW_i = log[ p_i (1 - q_i) / (q_i (1 - p_i)) ]
- The weight of term i in document j is then w_ij = d_ij x TRW_i
60. BIR
- Estimation of term occurrence probability
- Given a query, a document collection can be partitioned into a relevant and a non-relevant set
- The importance of a term j is its discriminatory power in distinguishing between relevant and non-relevant documents
61. BIR
- With complete information about the relevant and non-relevant document sets we can estimate p_j and q_j:
- p_j = r_j / R and q_j = (n_j - r_j) / (N - R), where N is the number of docs in the collection, R the number of relevant docs, n_j the number of docs containing term j, and r_j the number of relevant docs containing term j
- Approximation (to avoid zero counts): p_j = (r_j + 0.5) / (R + 1), q_j = (n_j - r_j + 0.5) / (N - R + 1)
62. BIR
- Term Occurrence Probability Without Relevance Information
- What do we do, given that we don't know r_j?
- q_j = n_j / N, since most docs are non-relevant
- p_j = 0.5 (arbitrary)
- Does this remind you of anything? (With these estimates the term weight behaves like log(N/n_j), i.e., IDF)
63. BIR
- Reminder... Ranking Function: g(x) = sum_i d_i log[ p_i (1 - q_i) / (q_i (1 - p_i)) ]
- where
- p_i = Pr(x_i = 1 | rel)
- q_i = Pr(x_i = 1 | nonrel)
- and d_i is the weight of term i
64. Relevance Feedback in BIR
- Want to add more terms to the query so that the query will resemble the documents marked as relevant (note the difference from VSM)
- How do we select which terms to add to the query?
65. Relevance Feedback in BIR
- Rank the terms in the marked documents and add the first m terms, weighting each term by w_i = log[ ((r_i + 0.5) / (R - r_i + 0.5)) / ((n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)) ]
- where
- N = no. of docs in the collection
- n_i = document frequency of term i
- R = no. of relevant docs selected
- r_i = no. of docs in R containing term i
- Compares the frequency of occurrence of a term in R with its document frequency
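The symbols on the slide map directly into code; a sketch assuming the standard Robertson/Sparck Jones relevance weight with 0.5 smoothing, which matches the symbol definitions above:

    import math

    def term_selection_weight(N, n_i, R, r_i):
        """N: docs in collection, n_i: doc frequency of term i,
        R: relevant docs selected, r_i: docs in R containing term i."""
        return math.log(((r_i + 0.5) / (R - r_i + 0.5)) /
                        ((n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)))

    # A term in 8 of 10 relevant docs but only 50 of 10,000 docs overall
    # scores far higher than one that also occurs in 5,000 docs overall:
    term_selection_weight(10000, 50, 10, 8)    # ~6.7
    term_selection_weight(10000, 5000, 10, 8)  # ~1.2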
66. Question-Answering on the Web
- Two aspects to IR
- Coverage (find all relevant documents)
- Question-Answering (find the answer to a specific query)
- In QA we want one answer to our question
- How much NLP do we need to use to answer fact-based questions?
- Answers that require reasoning are much harder!
67. Question Answering
- Most IR tasks assume that the user can predict the terms that a relevant document will contain
- But sometimes what we want is the answer to a direct question
- "Who was the first man on the moon?"
- Do we really want a list of millions of documents that contain "first", "man", "moon"?
- And do we really want to have to read them to find the answer?
68. Question Answering
- All we want is one document, or one statement, that contains the answer
- Can we take advantage of IR on the Web to do this?
- Taking advantage of redundancy on the Web
- E.g., Mulder, Dumais
69. Mulder
- Uses the Web as a collection of answers to factual questions
- "Who was the first man on the moon?"
- "What is the capital of Italy?"
- "Where is the Taj Mahal?"
kwok01scaling.pdf
70. Mulder
- Three parts to a QA system
- Retrieval Engine
- Indexes documents in a collection and retrieves them
- Query Formulator
- Converts an NL question into a formal query
- Answer Extractor
- Locates the answer in the text
71. Mulder
- Six parts to Mulder
- Question Parsing
- Question Classification
- Query Formulation
- Search Engine
- Answer Extraction
- Answer Selection
72. Dumais et al.
- Takes advantage of multiple, differently phrased, answer occurrences on the Web
- Doesn't need to find all answer phrases
- Just the ones that match the query pattern
- Rules for converting questions and finding answers are mostly handwritten
p291-dumais
73. Dumais et al.
- Steps
- Rewrite the question into weighted query patterns
- Use a POS tagger and lexicon to seek alternative word forms
- Search
- Mine n-grams in the summaries
- Filter and re-weight n-grams
- Tile n-grams to yield longer answers (the last three steps are sketched below)
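A much-simplified sketch of n-gram mining and tiling in Python; the real system also filters candidates by expected answer type and re-weights them by the query pattern that produced each snippet:

    from collections import Counter

    def mine_ngrams(snippets, n_max=3):
        """Count every 1..n_max-gram across retrieved snippets (candidate answers)."""
        counts = Counter()
        for text in snippets:
            tokens = text.lower().split()
            for n in range(1, n_max + 1):
                for i in range(len(tokens) - n + 1):
                    counts[tuple(tokens[i:i + n])] += 1
        return counts

    def tile(a, b):
        """Merge overlapping n-grams into a longer answer, e.g.
        ('neil', 'armstrong') + ('armstrong', 'was') -> ('neil', 'armstrong', 'was')."""
        for k in range(min(len(a), len(b)), 0, -1):
            if a[-k:] == b[:k]:
                return a + b[k:]
        return None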
74. Azzopardi
- Joel Azzopardi, 2004, "Template-Based Fact Finding on the Web", FYP report, CSAI
- Can find factoids about a series of queries relating to a particular topic, using majority polling (voting) to decide amongst competing answers
- A series of topic-sensitive query patterns is stored in a template
75. Azzopardi
- The template is learned by comparing a sample of documents about a topic
- Commonly occurring phrases (trigrams) are extracted and turned into partial queries in the template, together with the answer type
76. Azzopardi
- When the user wants information regarding a topic, use the appropriate template together with the subject (e.g., a person's name)
- The subject is appended to the partial queries in the template; the queries are submitted to Google
- The top-n documents are retrieved and processed to identify candidate answers
- Uses voting to decide on the most frequently occurring answer (see the sketch below)
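The majority-polling step is easy to express with a counter (the candidate strings here are invented):

    from collections import Counter

    def majority_poll(candidates):
        """Return the candidate answer extracted most often, with its vote count."""
        answer, votes = Counter(candidates).most_common(1)[0]
        return answer, votes

    majority_poll(["1969", "1969", "20 July 1969", "1968", "1969"])  # -> ("1969", 3)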
77. Summary
- We've discussed a couple of popular models of IR that are more "intelligent" than plain old Extended Boolean Information Retrieval
- They still treat terms as atoms that are representative of the semantic meaning of the document
78. Summary
- But word order is generally insignificant ("bag of words")
- Cannot distinguish between "dog chased cat" and "cat chased dog"
- Unless phrase matching is also used; but then we cannot tell that "cat chased dog" and "dog was chased by cat" are semantically equivalent
- What about information extraction?
- George W. Bush: President of the United States of America
79. Summary
- More intelligent approaches have been used
- And more "intelligence" is being put into the Web
- Personalisation and user-adaptivity also require high accuracy in determining which documents are relevant to a user
80. Summary
- Sowa's conceptual graphs and McCarthy's "Generality in AI"/"Notes on Contextual Reasoning" are seminal works that underpin much that is happening in the Semantic Web
- CGs represent the semantic content of utterances in an interchangeable format (KIF)
- McCarthy claims that it is hard to make correct inferences in the absence of contextual information
81. Summary
- Because of the expense of CGs, they are still very much domain specific
- The SemWeb hopes that by bringing massive numbers of people together there will be a proliferation of ontologies to make it happen
- Guha did his PhD, "Contexts: A Formalisation and Some Applications", at Stanford, under John McCarthy. His work on Cyc underpins RDF and DAML+OIL
82. Dealing with General Knowledge
- Why did Mary hit the piggy bank with a hammer?
83. Dealing with General Knowledge
- Do computer systems need general knowledge?
- How do computer systems represent general knowledge?
84. Dealing with General Knowledge
- Do we need general knowledge?
- How do we represent general knowledge?
85. Dealing with General Knowledge
- As usual, this has its roots in philosophy (epistemology)
- Early (i.e., Greek) epistemology revolved around Absolute and Universal Ideas and Forms (Plato)
- Aristotle: logic for representing and reasoning about knowledge
http://pespmc1.vub.ac.be/EPISTEMI.html
86. Dealing with General Knowledge
- Following the Renaissance, two main schools of thought
- Empiricists
- Knowledge as a product of sensory perception
- Rationalists
- Knowledge as a product of rational reflection
87. Dealing with General Knowledge
- Kantian synthesis of empiricism and rationalism
- Knowledge results from the organization of perceptual data on the basis of inborn cognitive structures, called "categories"
- Categories include space, time, objects and causality
- (cf. Chomsky's Universal Grammar)
88. Dealing with General Knowledge
- Pragmatism
- Knowledge consists of models that attempt to represent the environment in a way that simplifies problem-solving
- Assumption: models are necessarily partial. No model can ever hope to capture all relevant information, and even if such a complete model existed, it would be too complicated to use in any practical way
89. Dealing with General Knowledge
- Pragmatism (contd.)
- The model which is to be chosen depends on the problems that are to be solved (context)
- But see also the discussions on pragmatic vs. cognitive contexts! (Topic 3)
- Basic criterion: a model should produce correct (or approximate) (testable) predictions or problem-solutions, and be as simple as possible
- This is the approach mainly used in CS/AI today
90. Dealing with General Knowledge
- "The first theories of knowledge stressed its absolute, permanent character, whereas the later theories put the emphasis on its relativity or situation-dependence, its continuous development or evolution, and its active interference with the world and its subjects and objects. The whole trend moves from a static, passive view of knowledge towards a more and more adaptive and active one."
http://pespmc1.vub.ac.be/EPISTEMI.html
91. Dealing with General Knowledge
- We'll look at four overviews of, and approaches to, knowledge in computer systems
- McCarthy (1959, mcc.pdf)
- Sowa (1979, p79-1010.pdf)
- McCarthy (1987, p1030-mccarthy.pdf)
- Brézillon & Pomerol (2001, is-context-a-kind.pdf)
92. Dealing with General Knowledge
- McCarthy, J. 1959. "Programs with Common Sense"
- "a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows."
93. Dealing with General Knowledge
- Objective: to make programs that learn from their experience as effectively as humans do
- To learn to improve how to learn
- And to do it in logic, using a logical representation
94. Dealing with General Knowledge
- Minimum features required of a machine that can evolve intelligence approaching that of humans
- Representation of all behaviours
- Interesting changes in behaviour must be expressible
- All aspects of behaviour must be improvable
- Must have a notion of partial success
- The system must be able to create/learn subroutines
95. Dealing with General Knowledge
- Bar-Hillel's biggest complaint (in my opinion) is
- "A deductive argument, where you have first to find out what are the relevant premises, is something which many humans are not always able to carry out successfully. I do not see the slightest reason to believe that at present machines should be able to perform things that humans find trouble in doing."
- We'll return to this in Closed vs. Open World Assumption
96. Dealing with General Knowledge
- Sowa, J. 1979. "Semantics of Conceptual Graphs"
- Logic was used by McCarthy as a representation of statements about the world, as well as a theorem prover to infer/deduce new knowledge (assumptions) about the world
- Sowa uses CGs as a language for representing knowledge and as patterns for constructing models
97. Dealing with General Knowledge
- Sowa proposes CGs as a better alternative to semantic networks and predicate calculus
- SemNets have no well-defined semantics
- PC is adequate for describing mathematical theories with a closed set of axioms... but the real world is messy, incompletely explored, and full of unexpected surprises
98. Dealing with General Knowledge
- CGs serve two purposes
- They can be used as canonical representations of meaning in Natural Language
- They can be used to construct abstract structures that serve as models in the model-theoretic sense (e.g., microtheories)
99. Dealing with General Knowledge
- To understand a sentence
- Convert the utterance to a CG
- Join the CG to graphs that help resolve ambiguities and incorporate background information
- The resulting graph is the nucleus for constructing models (of worlds) in which the utterance is true
- Laws of the world block illegal extensions
- If the model could be extended infinitely, the result would be a complete standard model
100. Dealing with General Knowledge
- "Mary hit the piggy bank with a hammer"
101. Dealing with General Knowledge
- Linearizing the conceptual graph
- [PERSON: Mary] -> (AGNT) -> [HIT: c1] <- (INST) <- [HAMMER]
- [HIT: c1] <- (PTNT) <- [PIGGY-BANK: i22103]
102. Dealing with General Knowledge
- Context-sensitive logical operators
- Allow building models of possible worlds and checking their consistency
- Def: A sequent is a collection of conceptual graphs divided into two sets, called the conditions u1, ..., un and the assertions v1, ..., vm. It is written u1, ..., un -> v1, ..., vm.
103. Dealing with General Knowledge
- Cases of sequents (classified in the sketch below)
- simple assertion: no conditions, one assertion (-> v)
- disjunction: no conditions, one or more assertions (-> v1, ..., vm)
- simple denial: one condition, no assertions (u ->)
- compound denial: 2 or more conditions, no assertions (u1, ..., un ->)
- conditional assertion: u1, ..., un -> v1, ..., vm
- empty clause: ->
- Horn clause: anything with at most one assertion (inc. 0)
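The cases follow mechanically from the sizes of the two sets; a small Python sketch (a one-assertion, no-condition sequent is reported as a simple assertion rather than a degenerate disjunction):

    def classify_sequent(conditions, assertions):
        """Label a sequent u1,...,un -> v1,...,vm using the cases above."""
        n, m = len(conditions), len(assertions)
        if n == 0 and m == 0:
            return "empty clause"
        if n == 0 and m == 1:
            return "simple assertion"
        if n == 0:
            return "disjunction"
        if m == 0:
            return "simple denial" if n == 1 else "compound denial"
        return "conditional assertion"

    def is_horn(conditions, assertions):
        return len(assertions) <= 1   # at most one assertion, including none

    classify_sequent(["u1", "u2"], [])  # -> "compound denial"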
104. Dealing with General Knowledge
- McCarthy, J. 1987. "Generality in Artificial Intelligence" (1971 Turing Award Lecture)
- "no one knows how to make a general database of commonsense knowledge that could be used by any program that needed the knowledge"
- Examples: robots moving things around, what we know about families, buying and selling...
105. Dealing with General Knowledge
- "In my opinion, getting a language [my italics] for expressing general commonsense knowledge for inclusion in a general database is the key problem of generality in AI."
106. Dealing with General Knowledge
- How can we write programs that can learn to modify their own behaviour, including improving the way they learn?
- Friedberg ("A Learning Machine", c. 1958)
- Newell, Simon, Shaw (General Problem Solver, c. 1957-1969)
- Newell, Simon (Production Machines, 1950-1972)
- McCarthy (Logical Representation, c. 1958)
- McCarthy (Formalising Context, 1987)
107. Dealing with General Knowledge
- A Learning Machine
- Learns by making random modifications to a program
- Discards flawed programs
- Learnt to move a bit from one memory cell to another
- In 1987, this was noted to be inferior to simply re-writing the entire program
108. Dealing with General Knowledge
- General Problem Solver
- Represents problems of some class as problems of transforming one expression into another using a set of allowed rules
- First system to separate the problem structure from the domain
- McCarthy sees a problem in representing commonsense knowledge as transformations
109. Dealing with General Knowledge
- Production (Expert) Systems
- Represent knowledge as facts and rules (a minimal sketch follows)
- Facts contain no variables or quantifiers
- New facts are produced by inference, observation and user input
- Rules are usually coded by a programmer/expert
- Rules are usually not learnt or generated by the system (but see data mining)
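A minimal forward-chaining sketch of the facts-and-rules idea; the fact and rule names are invented, echoing the tyres/bang example on slide 111:

    # Facts are plain symbols (no variables or quantifiers); a rule fires when
    # all of its conditions are already facts.
    facts = {"squeal-of-tyres"}
    rules = [({"squeal-of-tyres"}, "expect-bang")]   # (conditions, conclusion)

    changed = True
    while changed:                      # forward chaining to a fixed point
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # inference produces a new fact
                changed = True

    print(facts)  # {'squeal-of-tyres', 'expect-bang'}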
110. Dealing with General Knowledge
- Logical Representation
- Representing information declaratively
- Although Prolog can represent facts in a logical representation and reason using logic, it cannot do universal generalization, and so cannot modify its own behaviour enough
- So McCarthy built Lisp...
111. Dealing with General Knowledge
- Logical Representation
- McCarthy's dream is that the commonsense knowledge possessed by humans could be written as logical sentences and stored in a db
- Facts about the effects of actions are essential (when we hear the squeal of tyres we expect a bang...)
- It is necessary to say that an action changes only features of the situation to which it refers
112. Dealing with General Knowledge
- Context
- We understand under-qualified utterances because we understand them in context
- "The book is on the table"
- Where is the book?
113. Dealing with General Knowledge
- Context
- "Can you fetch me the book, please?"
- Up until the last utterance, the physical location of the book was not significant, and we were able to have a short dialogue about it
- Fully qualified utterances are too unwieldy to use in conversation
- Occasionally gives rise to misunderstandings...
114. Dealing with General Knowledge
- Context
- "The book is on the table" is valid for a large number of different contexts, in which the specific book and the specific table, and perhaps even the location of the specific table, can be significant and can also change over time
- Utterances are understood in context
115. Dealing with General Knowledge
- Is Context a ... collective Tacit Knowledge?
- How does data become knowledge?
116. Dealing with General Knowledge
- Is Context a ... collective Tacit Knowledge?
- "Context is the collection of relevant conditions and surrounding influences that make a situation unique and comprehensible"
117. Dealing with General Knowledge
118. Dealing with General Knowledge
- Closed World vs. Open World assumption
- Closed World
- I assume that anything I don't know the truth of is false; I know everything that is true
- Open World
- I assume that anything I don't know the truth of is unknown; some things I don't know may be true; I don't know everything
119. Dealing with General Knowledge
- Prolog, for instance, will return "false" for any fact that is missing from its database, or for which it cannot derive a truth-value
- A three-valued logic permits assertions to be true, false, or unknown
- However, reasoning and truth-maintenance become expensive in the open world
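A toy illustration of the difference in Python, using None for "unknown" (the knowledge-base contents are invented):

    kb = {"flight(malta, london)": True}

    def holds_open_world(fact, kb):
        """Open world: a missing fact is unknown (None), not false."""
        return kb.get(fact)

    def holds_closed_world(fact, kb):
        """Closed world (cf. Prolog's negation as failure): missing means false."""
        return kb.get(fact, False)

    holds_open_world("flight(malta, rome)", kb)    # -> None (unknown)
    holds_closed_world("flight(malta, rome)", kb)  # -> False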
120. Dealing with General Knowledge
- The Web is an open world, so the Semantic Web needs to reason within an open world (perhaps even across ontologies)
- This doesn't mean that, to solve some problems, the SW cannot temporarily assume a closed world (within an agreed ontology)
ekaw2004.pdf
121. Teaching Knowledge
- Intelligent Tutoring Systems need to model both the user and the domain to create a learning path based on the student's prior knowledge and goals, and to monitor the student's progress
- AHSs developed partly by using hypertext systems as domain representations for ITSs - basically, when intelligent tutoring moved to the Web
122. Intelligent Tutoring Systems
- Overview
- Modern ITS development began in 1987, after a review by Wenger
- Wenger, E. (1987). Artificial Intelligence and Tutoring Systems: Computational and Cognitive Approaches to the Communication of Knowledge. Los Altos, CA: Morgan Kaufmann Publishers, Inc.
- This was the first attempt to examine the implicit and explicit goals of ITS designers
123. Intelligent Tutoring Systems
- Wenger described ITS as a part of "knowledge communication", and his review focused on cognitive and learning aspects as well as the AI issues
124. Intelligent Tutoring Systems
- "... consider again the example of books: they have certainly outperformed people in the precision and permanence of their memory, and the reliability of their patience. For this reason, they have been invaluable to humankind. Now imagine active books that can interact with the reader to communicate knowledge at the appropriate level, selectively highlighting the interconnectedness and ramifications of items, recalling relevant information, probing understanding, explaining difficult areas in more depth, skipping over seemingly known material ... intelligent knowledge communication systems are indeed an attractive dream." (p. 6)
125. Intelligent Tutoring Systems
- Motivations underlying ITSs (and education in general)
- to teach about something (abstract)
- to teach how to do something (practical)
126. Intelligent Tutoring Systems
- How can learning be achieved?
- By rote
- By mimicry (observation)
- By application
127. Intelligent Tutoring Systems
- When a student performs a task correctly, assume the student understands the concept and/or its application
- When a student performs a task incorrectly, how can the tutor help?
- Simply tell the student the correct answer
- Tell the student the correct answer and state why it's correct
- Explain to the student why his/her answer is incorrect
128. Intelligent Tutoring Systems
- Explanation-based correction is HARD!
- The tutor must first understand why the student gave the incorrect answer
- Student lacks knowledge
- Incorrect application of a correct procedure
- Misinterpretation of the task
- Misconception of a principle
129. Intelligent Tutoring Systems
- How to tutor?
- Originally, Computer-Aided Instruction (CAI) used non-interactive "classroom" techniques
- All students were taught in the same manner (e.g., through flash cards) and then assessed
- If a student failed, the student had to work through the same material again, to "learn it better"
- Access to a human tutor to address difficulties
- This type of learning, although self-paced, is ineffective
130. Intelligent Tutoring Systems
- The goal of an ITS
- A student learns from an ITS by solving problems
- The ITS selects a problem and compares its solution with that of the student
- It performs a diagnosis based on the differences
- After giving feedback, the system reassesses and updates the student skills model, and the entire cycle is repeated
131. Intelligent Tutoring Systems
- The goal of an ITS (continued)
- As the system assesses what the student knows, it also considers what the student needs to know, which part of the curriculum is to be taught next, and how to present the material
- It then selects the next problem/s
132. Intelligent Tutoring Systems
- Basic issues in
- knowledge
- communication
133. Intelligent Tutoring Systems
- Domain Expertise
- Rather than being represented by chunks of information, the domain should be represented using a model and a set of rules which allow the system to "reason"
- Typical domain model representations (these make the closed world assumption!)
- If-Then Rules
- If-Then Rules with uncertainty measures
- Semantic Networks
- Frame-based representations
134. Intelligent Tutoring Systems
- Student Model
- According to Wenger, student models have three tasks. They must
- Gather information about the student (implicitly or explicitly)
- Create a representation of the student's knowledge and learning process (often as "buggy models")
- Perform a diagnosis to determine what the student knows, to determine how the student should be taught, and to identify misconceptions
135. Intelligent Tutoring Systems
- Student model architectures (a small sketch of an overlay model follows)
- Overlay student models
- Differential student models
- Perturbation student models
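A toy overlay model: the student's knowledge is recorded as a subset of the domain model, and the next concepts to teach are those whose prerequisites are already mastered (the concept names and prerequisite structure are invented for illustration):

    # Domain model: concept -> set of prerequisite concepts.
    domain = {
        "vectors": set(),
        "tf-idf": {"vectors"},
        "rocchio": {"tf-idf"},
    }
    student = {"vectors"}   # overlay: the subset the student has mastered

    def ready_to_learn(domain, student):
        """Concepts not yet mastered whose prerequisites are all mastered."""
        return [c for c, prereqs in domain.items()
                if c not in student and prereqs <= student]

    ready_to_learn(domain, student)  # -> ['tf-idf']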
136. Intelligent Tutoring Systems
- Student model diagnosis
- Performance measuring
- Model tracing
- Issue tracing
- Expert systems
137. Intelligent Tutoring Systems
- Pedagogical expertise
- Used to decide how to
- present/sequence information
- answer questions/give explanations
- provide help/guidance/remediation
138. Intelligent Tutoring Systems
- According to Wenger, when "learning is viewed as successive transitions between knowledge states, the purpose of teaching is accordingly to facilitate the student's traversal of the space of knowledge states." (p. 365)
- The ITS must model the student's current knowledge and support the transition to a new knowledge state.
139. Intelligent Tutoring Systems
- ITSs must alternate between diagnostic and didactic support
- Diagnostic support
- Information about a student's state is inferred on 3 levels
- Behavioural: ignores the learner's knowledge, and concentrates on observed behaviour
- Epistemic: attempts to infer the learner's knowledge state based on the learner's behaviour
- Individual: cognitive model of the learner's state, attitudes (to self, world, ITS), motivation
140. Intelligent Tutoring Systems
- Didactic support
- Concerned with the "delivery" aspect of teaching
141. Intelligent Tutoring Systems
- Interface
- The interface is the layer through which the learner and the ITS communicate
- The design of an interface which enhances learning is essential
- Web-based ITSs tend to rely on the Web browser to provide the interface
- Hypermedia-based ITSs in general must provide adaptive presentation and adaptive navigation facilities if they are to extend beyond knowledge exploration environments
142. Intelligent Tutoring Systems