1
CSA4080: Adaptive Hypertext Systems II
Topic 6: Information and Knowledge Representation
  • Dr. Christopher Staff
  • Department of Computer Science & AI
  • University of Malta

2
Aims and Objectives
  • Models of Information Retrieval
  • Vector Space Model
  • Probabilistic Model
  • Relevance Feedback
  • Query Reformulation

3
Aims and Objectives
  • Dealing with General Knowledge
  • Programs that reason
  • Conceptual Graphs
  • Intelligent Tutoring Systems

4
Background
  • We've talked about how user information can be
    represented
  • We need to be able to represent information about
    the domain so that we can reason about what the
    user's interests are, etc.
  • We covered the difference between data,
    information, and knowledge in CSA3080...

5
Background
  • In 1945, Vannevar Bush writes "As We May Think"
  • Gives rise to seeking intelligent solutions to
    information retrieval, etc.
  • In 1949, Warren Weaver writes that if Chinese is
    a codification of English, then machine translation
    should be possible
  • Leads to surface-based/statistical techniques

6
Background
  • Even today, nearly 60 years later, there is
    significant effort in both directions
  • For years, intelligent solutions were hampered by
    the lack of fast enough hardware and software
  • That doesn't seem to be an issue any longer, and the
    Semantic Web may be testimony to that
  • But there are sceptics

7
Background
  • Take IR as an example
  • At the dumb end we have reasonable generic
    systems; at the other end, systems are domain
    specific and more expensive, but do they give
    better results?

8
Background
  • At what point does it cease to be cost effective
    to attempt more intelligent solutions to the IR
    problem?

9
Background
  • Is "Information Retrieval" a misnomer?
  • Consider your favourite Web-based IR system...
    does it retrieve information?
  • Can you ask "Find me information about all
    flights between Malta and London"?
  • And what would you get back?
  • Can you ask "Who was the first man on the moon?"

10
Background
  • With many IR systems that we use, the
    intelligence is firmly rooted in the user
  • We must learn how to construct our queries so
    that we get the information we seek
  • We sift through relevant and non-relevant
    documents in the results list
  • What we can hope for is that patterns can be
    identified to make life easier for us - e.g.,
    recommender systems

11
Background
  • Surface-based techniques tend to look for and
    re-use patterns as heuristics, without attempting
    to encode meaning
  • The Semantic Web, and other intelligent
    approaches, try to encode meaning so that it can
    be reasoned with and about
  • Cynics/sceptics/opponents believe that there is
    more success to be had in giving users more
    support, than to encode meaning into documents to
    support automation

12
However...
  • We will cover both surface-based and some
    knowledge-based approaches to supporting the user
    in his or her task

13
Information Retrieval
  • We will discuss two IR models...
  • Vector Space Model
  • Probabilistic Model
  • ... and surface-based techniques that can improve
    their usability
  • Relevance Feedback
  • Query Reformulation
  • Question-Answering

14
Knowledge
  • Conceptual graphs support the encoding and
    matching of concepts
  • Conceptual graphs are more intelligent and can
    be used to overcome some problems like the
    Vocabulary Problem

15
Reasoning on the Web
  • REWERSE (FP6 NoE) is an attempt to represent
    meaning contained in documents and to reason with
    and about it so that a single high-level user
    request may be carried out even if it contains
    several sub-tasks
  • E.g., "Find me information about cheap flights
    between Malta and London"

16
Vector-Space Model
  • Recommended Reading
  • p18-wong (Generalised Vector Space Model).pdf -
    look at refs 1,2,3 for original work

17
Vector-Space Model
  • Documents are represented as m-dimensional
    vectors or bags of words
  • m is the size of the vocabulary
  • wk 1, indicates term is present in document
  • wk 0, indicates term is absent
  • dj lt1,0,0,1,...,0,0gt
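As a toy illustration of these bullets (hypothetical five-term vocabulary, not from the slides), a binary document vector can be built like this:

```python
# Minimal sketch: a binary bag-of-words vector over a fixed vocabulary.
vocabulary = ["cheap", "flight", "london", "malta", "moon"]  # m = 5

def to_binary_vector(text, vocabulary):
    """Return dj = <w1, ..., wm> with wk = 1 iff vocabulary term k occurs."""
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]

print(to_binary_vector("cheap flight from Malta", vocabulary))
# -> [1, 1, 0, 1, 0]
```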

18
Vector-Space Model
19
Vector-Space Model
  • The query is then plotted into m-dimensional
    space and the nearest neighbours are the most
    relevant
  • However, the results set is usually presented as
    a list ranked by similarity to the query

20
Vector-Space Model
  • Cosine Similarity Measure (from IR vector space
    model.pdf)
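A minimal sketch of the measure (plain Python, independent of the weighting scheme):

```python
import math

def cosine_similarity(q, d):
    """sim(q, dj) = (q . dj) / (|q| |dj|); 0.0 if either vector is all zeros."""
    dot = sum(wq * wd for wq, wd in zip(q, d))
    norm_q = math.sqrt(sum(w * w for w in q))
    norm_d = math.sqrt(sum(w * w for w in d))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0

print(cosine_similarity([1, 1, 0, 1, 0], [1, 0, 0, 1, 0]))  # ~0.816
```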

21
Vector-Space Model
  • Calculating term weights
  • Term weights may be binary, integers, or reals
  • Binary values are thresholded, rather than simply
    indicating presence or absence
  • Integers or reals will be measure of relative
    significance of term in document
  • Usually, term weight is TFxIDF

22
Vector-Space Model
  • Steps in calculating term weights
  • Remove stop words
  • Stem remaining words
  • Count term frequency (TF)
  • Count number of documents containing term (DF)
  • Invert it (log(C/DF)), where C is total number of
    documents in collection
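A compact sketch of these steps (toy stop list and a deliberately crude stemmer; a real system would use e.g. the Porter stemmer):

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "between", "from", "on"}  # toy stop list

def stem(word):
    """Crude stand-in for a real stemmer: strip a trailing 's'."""
    return word[:-1] if word.endswith("s") else word

def preprocess(text):
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf_vectors(docs):
    """One {term: TF * log(C/DF)} mapping per document."""
    tokenised = [preprocess(d) for d in docs]
    C = len(docs)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    return [{t: tf * math.log(C / df[t]) for t, tf in Counter(tokens).items()}
            for tokens in tokenised]

docs = ["cheap flights from Malta",
        "flights between Malta and London",
        "the first man on the moon"]
for vector in tf_idf_vectors(docs):
    print(vector)
```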

23
Vector-Space Model
  • Normalising weights for vector length
  • Documents with longer vectors have a better
    chance of being retrieved than short ones (simply
    because there are a larger number of terms that
    they will match in a query)
  • IR should treat all relevant documents as
    important for retrieval purposes
  • Solution , where w is weight of term t

24
Vector-Space Model
  • Why does this work?
  • Term discrimination
  • Assumes that terms with high TF and low DF are
    good discriminators of relevant documents
  • Because documents are ranked, documents do not
    need to contain precisely the terms expressed in
    the query
  • We cannot say anything (in VSM) about terms that
    occur in relevant and non-relevant documents -
    though we can in probabilistic IR

25
Vector-Space Model
  • Vector-Space Model is also used by Recommender
    Systems to index user profiles and product, or
    item, features
  • Apart from ranking documents, results lists can
    be controlled (to list top n relevant documents),
    and query can be automatically reformulated based
    on relevance feedback

26
Relevance Feedback
  • When a user is shown a list of retrieved
    documents, user can give relevance judgements
  • System can take original query and relevance
    judgements and re-compute the query
  • Rocchio...

27
Relevance Feedback
  • Basic Assumptions
  • Similar docs are near each other in vector space
  • Starting from some initial query, the query can
    be reformulated to reflect subjective relevance
    judgements given by the user
  • By reformulating the query we can move the query
    closer to more relevant docs and further away
    from nonrelevant docs

28
Relevance Feedback
  • In VSM, reformulating query means re-weighting
    terms in query
  • Not failsafe: it may move the query towards
    nonrelevant docs!

29
Relevance Feedback
  • The Ideal Query
  • If we knew the answer set Rel, then the ideal
    query would be
  • Qideal = (1/|Rel|) Σ_{d in Rel} d - (1/|NonRel|) Σ_{d in NonRel} d

30
Relevance Feedback
  • In reality, a typical interaction will be
  • User formulates query Q and submits it
  • IR system retrieves set of documents
  • User selects relevant set R and nonrelevant set N
  • Q' = αQ + (β/|R|) Σ_{d in R} d - (γ/|N|) Σ_{d in N} d
  • where 0 < α, β, γ < 1 (and the vector magnitudes
    1/|R| and 1/|N| are usually dropped...)

31
Relevance Feedback
  • What are the values of α, β and γ?
  • α is typically given a value of 0.75, but this
    can vary. Also, after a number of iterations, the
    original weights of terms can be highly reduced
  • If β and γ have equal weight, then relevant and
    nonrelevant docs make equal contributions to the
    reformulated query
  • If β = 1, γ = 0, then only relevant docs are used
    in the reformulated query
  • Usually, use β = 0.75, γ = 0.25

32
Relevance Feedback
  • Example
  • Q = (5, 0, 3, 0, 1)
  • R = (2, 1, 2, 0, 0); N = (1, 0, 0, 0, 2)
  • α = 0.75, β = 0.50, γ = 0.25
  • Q' = 0.75Q + 0.5R - 0.25N
  • = 0.75(5, 0, 3, 0, 1) + 0.5(2, 1, 2, 0, 0) -
    0.25(1, 0, 0, 0, 2)
  • = (4.5, 0.5, 3.25, 0, 0.25)

33
Relevance Feedback
  • How many docs to use in R and N?
  • Use all docs selected by user
  • Use all rel docs and highest ranking nonrel docs
  • Usually, user selects only relevant docs...
  • Should entire document vector be used?
  • Really want to identify the significant terms...
  • Use terms with high-frequency/weight
  • Use terms in doc adjacent to terms from query
  • Use only common terms in R (and N)

34
Automatic Relevance Feedback
  • Users tend not to select nonrelevant documents,
    and rarely choose more than one relevant document
    (http://www.dlib.org/dlib/november95/11croft.html)
  • This makes it difficult to use relevance feedback
  • Current research uses automatic relevance
    feedback techniques...

35
Automatic Relevance Feedback
  • Two main approaches
  • To improve precision
  • To improve recall

36
Automatic Relevance Feedback
  • Reasons for low precision
  • Documents contain query terms, but documents are
    not about the concept or topic the user is
    interested in
  • E.g., user wants documents in which a cat chases
    a dog, but the query <cat, chase, dog> also
    retrieves docs in which dogs chase cats
  • Term ambiguity

37
Automatic Relevance Feedback
  • Improving precision (see the sketch below)
  • Want to promote relevant documents in the results
    list
  • Assume that the top-n (typically 20) documents are
    relevant, and assume docs ranked 500-1000 are
    nonrelevant
  • Choose co-occurring discriminatory terms
  • Re-rank docs ranked 21-499 using a (modified)
    Rocchio method
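A rough sketch of the blind-feedback step under those assumptions (dict-based term vectors; helper names are mine, and the method in p206-mitra.pdf is considerably more refined):

```python
def centroid(vectors):
    """Mean of a list of {term: weight} vectors."""
    totals = {}
    for v in vectors:
        for term, w in v.items():
            totals[term] = totals.get(term, 0.0) + w
    return {t: w / len(vectors) for t, w in totals.items()} if vectors else {}

def blind_feedback_query(query, ranked_vectors, top_n=20, nr_from=500,
                         nr_to=1000, alpha=1.0, beta=0.75, gamma=0.25):
    """Assume ranks 1..top_n are relevant and nr_from..nr_to are nonrelevant,
    then build a Rocchio-style expanded query (all vectors are dicts)."""
    rel = centroid(ranked_vectors[:top_n])
    nonrel = centroid(ranked_vectors[nr_from:nr_to])
    terms = set(query) | set(rel) | set(nonrel)
    new_q = {t: alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0)
                - gamma * nonrel.get(t, 0.0) for t in terms}
    # Keep positive weights; docs ranked 21..499 are then re-scored against new_q.
    return {t: w for t, w in new_q.items() if w > 0}
```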

p206-mitra.pdf
38
Automatic Relevance Feedback
  • Improving precision
  • Does improve precision, by 6-13% at precision
    cutoffs P-21 to P-100
  • But remember that precision is to do with the
    ratio of relevant to nonrelevant documents
    retrieved
  • There may be many relevant documents that were
    never retrieved (i.e., low recall)

39
Automatic Relevance Feedback
  • Reasons for low recall
  • The concept or topic that the user is interested in
    can be described using terms additional to those
    expressed by the user in the query
  • E.g., think of all the different ways in which
    you can express "car", including manufacturers'
    names (e.g., Ford, Vauxhall, etc.)
  • There is only a small probability that user and
    author use the same term to describe the same
    concept

40
Automatic Relevance Feedback
  • Reasons for low recall
  • Imprudent query term expansion improves
    recall, simply because more documents are
    retrieved, but hurts precision!

41
Automatic Relevance Feedback
  • Improving recall
  • A manually or automatically generated thesaurus is
    used to expand query terms before the query is
    submitted (see the sketch below)
  • We're currently working on other techniques to
    pick synonyms that are likely to be relevant
  • The Semantic Web attempts to encode semantic meaning
    into documents
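A toy sketch of thesaurus-based expansion (hypothetical thesaurus; down-weighting the added synonyms is one common way to limit the damage to precision):

```python
# Toy thesaurus (assumption: in practice WordNet or a corpus-derived one).
THESAURUS = {
    "car": ["automobile", "vehicle", "ford", "vauxhall"],
    "cheap": ["inexpensive", "budget", "low-cost"],
}

def expand_query(terms, synonym_weight=0.5):
    """Add synonyms at reduced weight so original terms still dominate."""
    expanded = {t: 1.0 for t in terms}
    for t in terms:
        for synonym in THESAURUS.get(t, []):
            expanded.setdefault(synonym, synonym_weight)
    return expanded

print(expand_query(["cheap", "car"]))
```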

p61-voorhees.pdf, qiu94improving.pdf,
MandalaSigir99EvComboWordNet.pdf
42
Indexing Documents
  • Obviously, comparing a query vector to each
    document vector to determine the similarity is
    expensive
  • So how can we do it efficiently, especially for
    gigantic document collections, like the Web?

43
Indexing Documents
  • Inverted indices
  • An inverted index is a list of terms in the
    vocabulary together with a postings list for each
    term
  • A postings list is a list of documents containing
    the term

44
Indexing Documents
  • Inverted index
  • Several pieces of information can be stored in
    the postings list
  • term weight
  • location of the term in the document (to support
    proximity operators)
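A minimal sketch of such an index (the term frequency, a simple weight, can be derived from the stored positions):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, positions);
    len(positions) gives the term frequency in that document."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        positions = defaultdict(list)
        for pos, term in enumerate(text.lower().split()):
            positions[term].append(pos)
        for term, locs in positions.items():
            index[term].append((doc_id, locs))
    return index

index = build_inverted_index(["malta to london", "flights to malta"])
print(index["malta"])   # [(0, [0]), (1, [2])]
# The results set for a query is then obtained with set
# intersection/union over the postings lists.
```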

45
Indexing Documents
  • Results set is obtained using set operators
  • Once documents in results set are known, their
    vectors can be retrieved to perform ranking
    operations on them
  • The document vectors also allow automatic query
    reformulation to occur following relevance
    feedback
  • See brin.pdf and p2-arasu.pdf

46
Probabilistic IR
  • VSM assumes that a document that contains some
    term x is about that term
  • PIR compares the probability of seeing term x in
    a relevant document as opposed to a nonrelevant
    document
  • Binary Independence Retrieval Model proposed by
    Robertson & Sparck Jones, 1976

robertson97simple.pdf, SparckJones98.pdf
47
BIR
  • BIR Fundamentals
  • Given a user query there is a set of documents
    which contains exactly the relevant documents and
    no other
  • the ideal answer set
  • Given the ideal answer set, a query can be
    constructed that retrieves exactly this set
  • Assumes that relevant documents are clustered,
    and that terms used adequately discriminate
    against non-relevant documents

48
BIR
  • We do not know, in general, the properties of the
    ideal answer set
  • All we know is that documents have terms which
    capture semantic meaning
  • When the user submits a query, guess what might be
    the ideal answer set
  • Allow the user to interact, to refine the
    probabilistic description of the ideal answer set
    (by marking docs as relevant/non-relevant)

49
BIR
  • Probabilistic Principle Assumption
  • Given a user query q and a document dj in the
    collection
  • Estimate the probability that the user will find
    dj relevant to q
  • Rank documents in order of their probability of
    relevance to the query (Probability Ranking
    Principle)

50
BIR
  • Model assumes that probability of relevance
    depends on q and doc representations only
  • Assumes that there is an ideal answer set!
  • Assumes that terms are distributed differently in
    relevant and non-relevant documents

51
BIR
  • Whether or not a document x is retrieved depends
    on
  • Pr(rel|x): the probability that x is relevant
  • Pr(nonrel|x): ... that x isn't relevant

52
BIR
  • Document Ranking Function: document x will be
    retrieved if
  • a2 · Pr(rel|x) ≥ a1 · Pr(nonrel|x)
  • where a2 is the cost of not retrieving a
    relevant document, and a1 is the cost of
    retrieving a non-relevant document
  • If we knew Pr(rel|x) (or Pr(nonrel|x)), the
    solution would be trivial, but...

53
BIR
  • Use Bayes' Theorem to rewrite Pr(rel|x):
  • Pr(rel|x) = Pr(x|rel) · Pr(rel) / Pr(x)
  • Pr(x): the probability of observing x
  • Pr(rel): the a priori probability of relevance (i.e.,
    the probability of observing a set of relevant
    documents)
  • Pr(x|rel): the probability that x is in the given set
    of relevant docs

54
BIR
  • Can do the same for Pr(nonrel|x):
  • Pr(nonrel|x) = Pr(x|nonrel) · Pr(nonrel) / Pr(x)

55
BIR
  • The document ranking function can be rewritten as
  • a2 · Pr(x|rel) · Pr(rel) ≥ a1 · Pr(x|nonrel) · Pr(nonrel)
  • and, since the costs and priors are constant across
    documents, simplified to ranking by
    Pr(x|rel) / Pr(x|nonrel)
  • Pr(x|rel) and Pr(x|nonrel) are still unknown, so we
    will express them in terms of the keywords in the
    document!

56
BIR
  • We assume that terms occur independently in
    relevant and non-relevant docs...
  • pi = Pr(xi = 1 | rel): the probability that term xi
    is present in a document randomly selected from the
    ideal answer set
  • qi = Pr(xi = 1 | nonrel): the probability that term
    xi is present in a document randomly selected from
    outside the ideal answer set

57
BIR
  • Considering document x = (d1, ..., dm), where di is
    the weight of term i,
  • Pr(x|rel) = Π_i pi^di · (1 - pi)^(1 - di)
  • where pi is the probability that a relevant
    document contains term xi (similarly qi for a
    nonrelevant document)

58
BIR
  • When di = 0 we want the contribution of term i to
    g(x) to be 0

59
BIR
  • The term relevance weight of term xi is
  • tri = log( pi(1 - qi) / (qi(1 - pi)) )
  • The weight of term i in document j is then
  • wij = dij · tri

60
BIR
  • Estimation of term occurrence probability
  • Given a query, a document collection can be
    partitioned into a relevant and non-relevant set
  • The importance of a term j is its discriminatory
    power in distinguishing between relevant and
    nonrelevant documents

61
BIR
  • With complete information about the relevant and
    non-relevant document sets we can estimate pj and
    qj
  • Approximation: pj = rj / R and qj = (nj - rj) / (N - R)
  • where N is the number of docs in the collection,
    nj the document frequency of term j, R the number
    of relevant docs, and rj the number of relevant
    docs containing term j

62
BIR
  • Term Occurrence Probability Without Relevance
    Information
  • What do we do, given that we don't know rj?
  • qj ≈ nj / N, since most docs are nonrelevant
  • pj = 0.5 (arbitrary)
  • does this remind you of anything?

63
BIR
  • Reminder... Ranking Function
  • g(x) = Σ_i di · log( pi(1 - qi) / (qi(1 - pi)) )
  • where,
  • pi = Pr(xi = di | rel)
  • qi = Pr(xi = di | nonrel)
  • and di is the weight of term i

64
Relevance Feedback in BIR
  • Want to add more terms to the query so the query
    will resemble documents marked as relevant (note
    difference from VSM)
  • How do we select which terms to add to the query?

65
Relevance Feedback in BIR
  • Rank terms in the marked documents and add the
    first m terms (see the sketch below)
  • wi = log( (ri / (R - ri)) / ((ni - ri) / (N - ni - R + ri)) )
  • where
  • N = no. of docs in the collection
  • ni = document frequency of term i
  • R = no. of relevant docs selected
  • ri = no. of docs in R containing term i
  • Compares the frequency of occurrence of a term in R
    with its document frequency
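A sketch of this term-ranking weight (the +0.5 smoothing is my addition to avoid division by zero and log of zero; the slide's formula is the unsmoothed version):

```python
import math

def term_relevance_weight(N, n_i, R, r_i, k=0.5):
    """Robertson/Sparck Jones-style relevance weight for term selection."""
    return math.log(((r_i + k) / (R - r_i + k)) /
                    ((n_i - r_i + k) / (N - n_i - R + r_i + k)))

# A term in 8 of 10 relevant docs but only 50 of 10000 docs overall
# is a strong discriminator and gets a high weight.
print(term_relevance_weight(N=10000, n_i=50, R=10, r_i=8))
```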

66
Question-Answering on the Web
  • Two aspects to IR
  • Coverage (find all relevant documents)
  • Question-Answering (find the answer to specific
    query)
  • In QA we want one answer to our question
  • How much NLP do we need to use to answer
    fact-based questions?
  • Answers that require reasoning are much harder!

67
Question Answering
  • Most IR tasks assume that user can predict what
    terms a relevant document will contain
  • But sometimes what we want is the answer to a
    direct question
  • "Who was the first man on the moon?"
  • Do we really want a list of millions of documents
    that contain "first", "man", "moon"?
  • And do we really want to have to read them to
    find the answer?

68
Question Answering
  • All we want is one document, or one statement,
    that contains the answer
  • Can we take advantage of IR on the Web to do
    this?
  • Taking advantage of redundancy on the Web
  • E.g., Mulder, Dumais

69
Mulder
  • Uses Web as collection of answers to factual
    questions
  • Who was the first man on the moon?
  • What is the capital of Italy?
  • Where is the Taj Mahal?

kwok01scaling.pdf
70
Mulder
  • Three parts to a QA system
  • Retrieval Engine
  • Indexes documents in a collection and retrieves
    them
  • Query Formulator
  • Converts NL question into formal query
  • Answer Extractor
  • Locates answer in text

71
Mulder
  • Six parts to Mulder
  • Question Parsing
  • Question Classification
  • Query Formulation
  • Search Engine
  • Answer Extraction
  • Answer Selection

72
Dumais et al
  • Takes advantage of multiple, differently phrased,
    answer occurrences on Web
  • Doesn't need to find all answer phrases
  • Just the ones that match the query pattern
  • Rules for converting questions and finding answers
    are mostly handwritten

p291-dumais
73
Dumais et al
  • Steps (see the sketch below)
  • Rewrite the question into weighted query patterns
  • Use a POS tagger and lexicon to seek alternative
    word forms
  • Search
  • Mine N-grams in summaries
  • Filter and re-weight N-grams
  • Tile N-grams to yield longer answers
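A heavily simplified sketch of the rewrite-and-mine idea (toy rewrite rule and canned snippets; answer-type filtering and n-gram tiling are omitted):

```python
from collections import Counter

def rewrite_question(question):
    """Toy rewrite rule: 'Who was X?' -> weighted search patterns."""
    q = question.rstrip("?")
    if q.lower().startswith("who was "):
        rest = q[8:]
        return [(f'"{rest} was"', 5), (f'"was {rest}"', 3), (rest, 1)]
    return [(q, 1)]

def mine_ngrams(snippets, weight, n=3):
    """Count n-grams in result snippets, scaled by the pattern's weight."""
    counts = Counter()
    for snippet in snippets:
        words = snippet.lower().split()
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += weight
    return counts

snippets = ["neil armstrong was the first man on the moon",
            "the first man on the moon was neil armstrong"]
votes = Counter()
for pattern, weight in rewrite_question("Who was the first man on the moon?"):
    # A real system would submit each pattern to a search engine here;
    # we reuse canned snippets for all patterns.
    votes += mine_ngrams(snippets, weight)
print(votes.most_common(2))
```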

74
Azzopardi
  • Joel Azzopardi, 2004, "Template-Based Fact
    Finding on the Web", FYP report, CSAI
  • Can find factoids about a series of queries
    relating to a particular topic, using majority
    polling (voting) to decide amongst competing
    answers
  • A series of topic-sensitive query patterns is
    stored in a template

75
Azzopardi
  • The template is learned by comparing a sample of
    documents about a topic
  • Commonly occurring phrases (trigrams) are extracted
    and turned into partial queries in the template,
    together with the answer type

76
Azzopardi
  • When the user wants information regarding a topic,
    use the appropriate template together with the
    subject (e.g., a person's name)
  • The subject is appended to the partial queries in
    the template - queries are submitted to Google
  • Top-n documents are retrieved and processed to
    identify candidate answers
  • Uses voting to decide on the most frequently
    occurring answer (see the sketch below)
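The voting step itself is a straightforward majority poll, e.g.:

```python
from collections import Counter

def majority_answer(candidates):
    """Pick the most frequently occurring candidate answer."""
    return Counter(candidates).most_common(1)[0][0]

# Candidate answers extracted from the top-n retrieved documents (toy data).
print(majority_answer(["Rome", "Rome", "Milan", "Rome", "Turin"]))  # Rome
```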

77
Summary
  • We've discussed a couple of popular models of IR
    that are "more intelligent" than plain old
    Extended Boolean Information Retrieval
  • They still treat terms as atoms that are
    representative of the semantic meaning of the
    document

78
Summary
  • But word order is generally insignificant ("bag of
    words")
  • Cannot distinguish between "dog chased cat" and
    "cat chased dog"
  • unless phrase matching is also used, but then we
    cannot tell that "cat chased dog" and "dog was
    chased by cat" are semantically equivalent
  • What about information extraction?
  • E.g., George W. Bush -> President of the United
    States of America

79
Summary
  • More intelligent approaches have been used
  • And more "intelligence" is being put into the
    Web
  • Personalisation and user-adaptivity also require
    high accuracy in determining which documents are
    relevant to a user

80
Summary
  • Sowa's conceptual graphs and McCarthy's
    "Generality in AI"/"Notes on Contextual Reasoning"
    are seminal works that underpin much that is
    happening in the Semantic Web
  • CGs represent the semantic content of utterances in
    an interchangeable format (KIF)
  • McCarthy claims that it is hard to make correct
    inferences in the absence of contextual
    information

81
Summary
  • Because of the expense of CGs, they are still
    very much domain specific
  • The SemWeb hopes that by bringing massive numbers of
    people together there will be a proliferation of
    ontologies to make it happen
  • Guha did his PhD, "Contexts: A Formalization and
    Some Applications", at Stanford, under John
    McCarthy. His work on Cyc underpins RDF, DAML+OIL

82
Dealing with General Knowledge
  • Why did Mary hit the piggy bank with a hammer?

83
Dealing with General Knowledge
  • Do computer systems need general knowledge?
  • How do computer systems represent general
    knowledge?

84
Dealing with General Knowledge
  • Do we need general knowledge?
  • How do we represent general knowledge?

85
Dealing with General Knowledge
  • As usual, this has its roots in philosophy
    (epistemology)
  • Early (i.e., Greek) epistemology revolved around
    Absolute and Universal Ideas and Forms (Plato)
  • Aristotle: Logic for representing and reasoning
    about knowledge

http://pespmc1.vub.ac.be/EPISTEMI.html
86
Dealing with General Knowledge
  • Following Renaissance, two main schools of
    thought
  • Empiricists
  • Knowledge as product of sensory perception
  • Rationalists
  • Product of rational reflection

87
Dealing with General Knowledge
  • Kantian Synthesis of empiricism and reflectionism
  • Knowledge results from the organization of
    perceptual data on the basis of inborn cognitive
    structures, called "categories".
  • Categories include space, time, objects and
    causality.
  • (viz. Chomsky's Universal Grammar)

88
Dealing with General Knowledge
  • Pragmatism
  • Knowledge consists of models that attempt to
    represent the environment to simplify
    problem-solving
  • Assumption: models are rich, but no model can ever
    hope to capture all relevant information; even if
    such a complete model did exist, it would be too
    complicated to use in any practical way.

89
Dealing with General Knowledge
  • Pragmatism (contd.)
  • The model which is to be chosen depends on the
    problems that are to be solved (the context).
  • But see also discussions on pragmatic vs.
    cognitive contexts! (Topic 3)
  • Basic criterion: the model should produce correct
    (or approximately correct, testable) predictions or
    problem-solutions, and be as simple as possible.
  • This is the approach mainly used in CS/AI today

90
Dealing with General Knowledge
  • "The first theories of knowledge stressed its
    absolute, permanent character, whereas the later
    theories put the emphasis on its relativity or
    situation-dependence, its continuous development
    or evolution, and its active interference with
    the world and its subjects and objects. The whole
    trend moves from a static, passive view of
    knowledge towards a more and more adaptive and
    active one."

http://pespmc1.vub.ac.be/EPISTEMI.html
91
Dealing with General Knowledge
  • We'll look at four overviews of and approaches to
    knowledge in computer systems
  • McCarthy (1959, mcc.pdf)
  • Sowa (1979, p79-1010.pdf)
  • McCarthy (1987, p1030-mccarthy.pdf)
  • Brézillon & Pomerol (2001, is-context-a-kind.pdf)

92
Dealing with General Knowledge
  • McCarthy, J. 1959. "Programs with Common Sense"
  • "a program has common sense if it automatically
    deduces for itself a sufficiently wide class of
    immediate consequences of anything it is told and
    what it already knows."

93
Dealing with General Knowledge
  • Objective: to make programs that learn from
    their experience as effectively as humans do
  • To learn to improve how they learn
  • And to do it in logic, using a logical
    representation

94
Dealing with General Knowledge
  • Minimum features required of a machine that can
    evolve intelligence approaching that of humans
  • Representation of all behaviours
  • Interesting changes in behaviour must be
    expressible
  • All aspects of behaviour must be improvable
  • Must have notion of partial success
  • System must be able to create/learn subroutines

95
Dealing with General Knowledge
  • Bar-Hillel's biggest complaint (in my opinion) is:
  • "A deductive argument, where you have first to
    find out what are the relevant premises, is
    something which many humans are not always able
    to carry out successfully. I do not see the
    slightest reason to believe that at present
    machines should be able to perform things that
    humans find trouble in doing"
  • We'll return to this in "Closed vs. Open World
    Assumption"

96
Dealing with General Knowledge
  • Sowa, J. 1979. "Semantics of Conceptual Graphs"
  • Logic is used by McCarthy as a representation of
    statements about the world, as well as a theorem
    prover to infer/deduce new knowledge
    (assumptions) about the world
  • Sowa uses CGs as a language for representing
    knowledge and as patterns for constructing models

97
Dealing with General Knowledge
  • Sowa proposes CGs as a better alternative to
    semantic networks and predicate calculus
  • SemNets have no well-defined semantics
  • PC is adequate for describing mathematical
    theories with a closed set of axioms... But the
    real world is messy, incompletely explored, and
    full of unexpected surprises

98
Dealing with General Knowledge
  • CGs serve two purposes
  • They can be used as canonical representations of
    meaning in Natural Language
  • They can be used to construct abstract structures
    that serve as models in the model-theoretic sense
    (e.g., microtheories)

99
Dealing with General Knowledge
  • To understand a sentence
  • Convert the utterance to a CG
  • Join the CG to graphs that help resolve ambiguities
    and incorporate background information
  • The resulting graph is the nucleus for constructing
    models (of worlds) in which the utterance is true
  • Laws of the world block illegal extensions
  • If the model could be extended infinitely, the
    result would be a complete standard model

100
Dealing with General Knowledge
  • "Mary hit the piggy bank with a hammer"

101
Dealing with General Knowledge
  • Linearizing the conceptual graph
  • [PERSON: Mary] -> (AGNT) -> [HIT: c1] <- (INST) <-
    [HAMMER]
  • [HIT: c1] <- (PTNT) <- [PIGGY-BANK: i22103]
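One possible in-memory representation of such a graph (a sketch of my own, not Sowa's notation or any particular CG toolkit):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    type: str            # e.g. "PERSON", "HIT"
    referent: str = "*"  # individual marker, or "*" for a generic concept

@dataclass
class ConceptualGraph:
    arcs: list = field(default_factory=list)  # (source, relation, target)

    def add(self, source, relation, target):
        self.arcs.append((source, relation, target))

# "Mary hit the piggy bank with a hammer"
hit = Concept("HIT", "c1")
g = ConceptualGraph()
g.add(Concept("PERSON", "Mary"), "AGNT", hit)
g.add(Concept("HAMMER"), "INST", hit)
g.add(Concept("PIGGY-BANK", "i22103"), "PTNT", hit)
print(g.arcs)
```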

102
Dealing with General Knowledge
  • Context-sensitive logical operators
  • Allow building models of possible worlds and
    checking their consistency
  • Def: A sequent is a collection of conceptual
    graphs divided into two sets, called the
    conditions u1, ..., un and the assertions v1, ...,
    vm. It is written u1, ..., un -> v1, ..., vm.

103
Dealing with General Knowledge
  • Cases of sequents (classified in the sketch below)
  • simple assertion: no conditions, one assertion
    (-> v)
  • disjunction: no conditions, one or more
    assertions
  • (-> v1, ..., vm)
  • simple denial: one condition, no assertions (u ->)
  • compound denial: 2 or more conditions, no
    assertions (u1, ..., un ->)
  • conditional assertion: u1, ..., un -> v1, ..., vm
  • empty clause: ->
  • Horn clause: anything with at most one assertion
    (inc. 0)
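These cases follow purely from the counts on each side, as this sketch shows:

```python
def classify_sequent(conditions, assertions):
    """Classify u1, ..., un -> v1, ..., vm by the counts on each side."""
    n, m = len(conditions), len(assertions)
    if n == 0 and m == 0:
        return "empty clause"
    if n == 0:
        return "simple assertion" if m == 1 else "disjunction"
    if m == 0:
        return "simple denial" if n == 1 else "compound denial"
    return "conditional assertion"

def is_horn(conditions, assertions):
    """Horn clause: at most one assertion (including none)."""
    return len(assertions) <= 1

print(classify_sequent([], ["v"]))          # simple assertion
print(classify_sequent(["u1", "u2"], []))   # compound denial
print(is_horn(["u1"], []))                  # True
```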

104
Dealing with General Knowledge
  • McCarthy, J. 1987. "Generality in Artificial
    Intelligence" (1971 Turing Award Lecture)
  • "no one knows how to make a general database of
    commonsense knowledge that could be used by any
    program that needed the knowledge"
  • Examples: robots moving things around, what we
    know about families, buying and selling...

105
Dealing with General Knowledge
  • "In my opinion, getting a language [my italics]
    for expressing general commonsense knowledge for
    inclusion in a general database is the key
    problem of generality in AI."

106
Dealing with General Knowledge
  • How can we write programs that can learn to
    modify their own behaviour, including improving
    the way they learn?
  • Friedberg (A Learning Machine, c. 1958)
  • Newell, Simon, Shaw (General Problem Solver, c.
    1957-1969)
  • Newell, Simon (Production Machines, 1950-1972)
  • McCarthy (Logical Representation, c. 1958)
  • McCarthy (Formalising Context, 1987)

107
Dealing with General Knowledge
  • A Learning Machine
  • Learns by making random modifications to a
    program
  • Discards flawed programs
  • Learnt to move a bit from one memory cell to
    another
  • In 1987, it was demonstrated to be inferior to
    simply re-writing the entire program

108
Dealing with General Knowledge
  • General Problem Solver
  • Represent problems of some class as problems of
    transforming one expression into another using a
    set of allowed rules
  • First system to separate problem structure from
    the domain
  • McCarthy notes the problem of representing
    commonsense knowledge as transformations

109
Dealing with General Knowledge
  • Production (Expert) Systems
  • Represent knowledge as facts and rules (see the
    sketch below)
  • Facts contain no variables or quantifiers
  • New facts are produced by inference, observation
    and user input
  • Rules are usually coded by a programmer/expert
  • Rules are usually not learnt or generated by the
    system (but see data mining)
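A minimal sketch of such a production machine (ground facts only, matching the "no variables or quantifiers" restriction; the rules are hypothetical):

```python
def forward_chain(facts, rules):
    """Production system: repeatedly fire If-Then rules until no new facts.
    Rules are (antecedents, consequent) pairs over ground facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in facts and all(a in facts for a in antecedents):
                facts.add(consequent)   # new fact produced by inference
                changed = True
    return facts

rules = [({"squeal of tyres"}, "expect a bang"),
         ({"expect a bang", "no bang heard"}, "surprise")]
print(forward_chain({"squeal of tyres", "no bang heard"}, rules))
```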

110
Dealing with General Knowledge
  • Logical Representation
  • Representing information declaratively
  • Although Prolog can represent facts in logical
    representation and reason using logic, it cannot
    do universal generalization, and so cannot modify
    its own behaviour enough
  • So McCarthy built Lisp...

111
Dealing with General Knowledge
  • Logical Representation
  • McCarthy's dream is that the commonsense knowledge
    possessed by humans could be written as logical
    sentences and stored in a db
  • Facts about the effects of actions are essential
    (when we hear the squeal of tyres we expect a
    bang...)
  • It is necessary to say that an action changes only
    features of the situation to which it refers

112
Dealing with General Knowledge
  • Context
  • We understand under-qualified utterances because
    we understand them in context
  • "The book is on the table"
  • Where is the book?

113
Dealing with General Knowledge
  • Context
  • "Can you fetch me the book, please?"
  • Up until the last utterance, the physical
    location of the book was not significant, and we
    were able to have a short dialogue about it
  • Fully qualified utterances are too unwieldy to
    use in conversation
  • Occasionally gives rise to misunderstandings...

114
Dealing with General Knowledge
  • Context
  • "The book is on the table" is valid for a large
    number of different contexts, in which the
    specific book and the specific table, and perhaps
    even the location of the specific table, can be
    significant and can also change over time
  • Utterances are understood in context

115
Dealing with General Knowledge
  • Is Context a ... collective Tacit Knowledge?
  • How does data become knowledge?

116
Dealing with General Knowledge
  • Is Context a ... collective Tacit Knowledge?
  • "Context is the collection of relevant conditions
    and surrounding influences that make a situation
    unique and comprehensible"

117
Dealing with General Knowledge
  • Where is context?

118
Dealing with General Knowledge
  • Closed World vs. Open World assumption
  • Closed World
  • I assume that anything I don't know the truth of
    is false; I know everything that is true
  • Open World
  • I assume that anything I don't know the truth of
    is unknown; some things I don't know may be true;
    I don't know everything

119
Dealing with General Knowledge
  • Prolog, for instance, will return "false" for
    any fact that is missing from its database, or
    for which it cannot derive a truth-value
  • A three-valued logic permits assertions to be
    true, false, or unknown (see the sketch below)
  • However, reasoning and truth-maintenance become
    expensive in the open world
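A minimal sketch contrasting the two assumptions with a three-valued lookup (toy knowledge base):

```python
TRUE, FALSE, UNKNOWN = "true", "false", "unknown"

def query(kb, fact, closed_world=False):
    """Three-valued lookup: the closed world maps missing facts to FALSE,
    the open world maps them to UNKNOWN."""
    if fact in kb:
        return TRUE if kb[fact] else FALSE
    return FALSE if closed_world else UNKNOWN

kb = {"flight(malta, london)": True}
print(query(kb, "flight(malta, rome)", closed_world=True))   # false (Prolog-style)
print(query(kb, "flight(malta, rome)", closed_world=False))  # unknown
```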

120
Dealing with General Knowledge
  • The Web is an open world, so the Semantic Web
    needs to reason within an open world (perhaps
    even across ontologies)
  • This doesn't mean that, to solve some problems, the
    SW cannot temporarily assume a closed world (within
    an agreed ontology)

ekaw2004.pdf
121
Teaching Knowledge
  • Intelligent Tutoring Systems need to model both
    the user and the domain to create a learning path
    based on the student's prior knowledge and goals,
    and to monitor the student's progress
  • AHSs developed partly through using hypertext
    systems as domain representations for ITSs -
    basically, when intelligent tutoring moved to the Web

122
Intelligent Tutoring Systems
  • Overview
  • Modern ITS development began in 1987, after a
    review by Wenger
  • Wenger, E. (1987). Artificial Intelligence and
    Tutoring Systems: Computational and Cognitive
    Approaches to the Communication of Knowledge. Los
    Altos, CA: Morgan Kaufmann Publishers, Inc.
  • This was the first attempt to examine the
    implicit and explicit goals of ITS designers

123
Intelligent Tutoring Systems
  • Wenger described ITS as a part of "knowledge
    communication" and his review focused on
    cognitive and learning aspects as well as the AI
    issues

124
Intelligent Tutoring Systems
  • "... consider again the example of books they
    have certainly outperformed people in the
    precision and permanence of their memory, and the
    reliability of their patience. For this reason,
    they have been invaluable to humankind. Now
    imagine active books that can interact with the
    reader to communicate knowledge at the
    appropriate level, selectively highlighting the
    interconnectedness and ramifications of items,
    recalling relevant information, probing
    understanding, explaining difficult areas in more
    depth, skipping over seemingly known material ...
    intelligent knowledge communication systems are
    indeed an attractive dream." (p. 6).

125
Intelligent Tutoring Systems
  • Motivations underlying ITSs (and education in
    general)
  • to teach about something (abstract)
  • to teach how to do something (practical)

126
Intelligent Tutoring Systems
  • How can learning be achieved?
  • By rote
  • By mimicry (observation)
  • By application

127
Intelligent Tutoring Systems
  • When student performs task correctly, assume
    student understands concept and/or its
    application
  • When student performs task incorrectly, how can
    the tutor help?
  • Simply tell the student the correct answer
  • Tell student the correct answer and state why
    it's correct
  • Explain to the student why his/her answer is
    incorrect

128
Intelligent Tutoring Systems
  • Explanation-based correction is HARD!
  • Tutor must first understand why the student gave
    the incorrect answer
  • Student lacks knowledge
  • Incorrect application of correct procedure
  • Misinterpretation of task
  • Misconception of principle

129
Intelligent Tutoring Systems
  • How to tutor?
  • Originally Computer-Aided Instruction (CAI) used
    non-interactive "classroom" techniques.
  • All students were taught in the same manner
    (e.g., through flash cards) and then assessed.
  • If a student failed, student had to work through
    the same material again, to "learn it better"
  • Access to human tutor to address difficulties
  • This type of learning, although self-paced, is
    ineffective

130
Intelligent Tutoring Systems
  • The goal of an ITS
  • A student learns from ITS by solving problems.
  • The ITS selects a problem and compares its
    solution with that of the student
  • It performs a diagnosis based on the differences.
  • After giving feedback, the system reassesses and
    updates the student skills model, and the entire
    cycle is repeated.

131
Intelligent Tutoring Systems
  • The goal of an ITS (continued)
  • As the system assesses what the student knows, it
    also considers what the student needs to know,
    which part of the curriculum is to be taught
    next, and how to present the material.
  • It then selects the next problem(s).

132
Intelligent Tutoring Systems
  • Basic issues in knowledge communication

133
Intelligent Tutoring Systems
  • Domain Expertise
  • Rather than being represented by chunks of
    information, the domain should be represented
    using a model and a set of rules which allows the
    system to "reason"
  • Typical domain model representations (make closed
    world assumption!)
  • If - Then Rules
  • If - Then Rules with uncertainty measures
  • Semantic Networks
  • Frame based representations

134
Intelligent Tutoring Systems
  • Student Model
  • According to Wenger, student models have three
    tasks. They must
  • Gather information about the student (implicitly
    or explicitly)
  • Create a representation of the student's
    knowledge and learning process (often as buggy
    models)
  • Perform a diagnosis to determine what the student
    knows and to determine how the student should be
    taught and to identify misconceptions

135
Intelligent Tutoring Systems
  • Student model architectures
  • Overlay student models
  • Differential student models
  • Perturbation student models
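For instance, an overlay model treats the student's knowledge as a subset of the domain model; a sketch with hypothetical concept names:

```python
DOMAIN_CONCEPTS = {"vectors", "tf-idf", "cosine", "rocchio", "bir"}

class OverlayStudentModel:
    """Overlay model: student knowledge = subset of the expert domain model."""
    def __init__(self):
        self.known = set()

    def observe_success(self, concept):
        self.known.add(concept)          # evidence the concept is mastered

    def gaps(self):
        return DOMAIN_CONCEPTS - self.known  # what remains to be taught

model = OverlayStudentModel()
model.observe_success("vectors")
print(model.gaps())
```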

136
Intelligent Tutoring Systems
  • Student model diagnosis
  • Performance measuring
  • Model tracing
  • Issue tracing
  • Expert systems

137
Intelligent Tutoring Systems
  • Pedagogical expertise
  • Used to decide how to
  • present/sequence information
  • answer questions/give explanations
  • provide help/guidance/remediation

138
Intelligent Tutoring Systems
  • According to Wenger, when "learning is viewed as
    successive transitions between knowledge states,
    the purpose of teaching is accordingly to
    facilitate the student's traversal of the space
    of knowledge states." (p. 365)
  • The ITS must model the student's current
    knowledge and support the transition to a new
    knowledge state.

139
Intelligent Tutoring Systems
  • ITSs must alternate between diagnostic and
    didactic support.
  • Diagnostic support
  • Information about a student's state is inferred
    on 3 levels
  • Behavioural - ignores learner's knowledge, and
    concentrates on observed behaviour
  • Epistemic - attempts to infer learner's knowledge
    state based on learner's behaviour
  • Individual - cognitive model of learner's state,
    attitudes (to self, world, ITS), motivation

140
Intelligent Tutoring Systems
  • Didactic support
  • Concerned with the "delivery" aspect of teaching

141
Intelligent Tutoring Systems
  • Interface
  • The interface is the layer through which the
    learner and ITS communicate
  • The design of an interface which enhances
    learning is essential
  • Web-based ITSs tend to rely on the Web browser to
    provide the interface
  • Hypermedia-based ITSs in general must provide
    adaptive presentation and adaptive navigation
    facilities, if they are to extend beyond
    knowledge exploration environments

142
Intelligent Tutoring Systems