BURC: Bootstrapping Using ResearchCyc

1
BURC: Bootstrapping Using ResearchCyc
  • By Kino Coursey

2
Introduction to the Problem
  • Goal: To extend Cyc's knowledge base using
    relationships implied to be possible, normal or
    commonplace in the world
  • Prior work with Cyc knowledge entry has been
    manually oriented
  • How will we collect commonsense without a body
    and manual labor?
  • Read, Parse, Mine!
  • Proposal: Read text, Parse into a database,
    Extract relations between words, Propose
    hypothetical relations between concepts

3
Basic Analogy
  • The Shotgun approach to the Human Genome
  • Extract millions of fragments then knit them back
    together by finding commonalities
  • Will it work for the Human Menome?

4
What is Cyc?
  • the world's largest and most complete general
    knowledge base and commonsense reasoning engine
  • Started in the mid-1980s ("should take only 10
    years")
  • Logic based
  • LISP oriented
  • For WordNet users: each Concept ≈ Synset
  • Available from http://www.opencyc.org
  • http://researchcyc.cyc.com
  • Big (ResearchCyc v0.8):
  • Constants: 89,379
  • Assertions: 968,985
  • Deductions: 361,185
  • Sample collection extents:
  • EnglishWord: 18,007
  • Event: 6,050
  • PartiallyTangible: 24,387
  • Microtheory: 1,688

5
Example of what Cyc currently knows about fingers
  • Collection: Finger
  • GAF Arg: 1
  • Mt: UniversalVocabularyMt; isa:
    AnimalBodyPartType
  • quotedIsa: DensoOntologyConstant
  • genls: Digit-AnatomicalPart
  • comment: "The collection of all digits of all
    Hands (q.v.). Fingers are (typically) flexibly
    jointed and are necessary to enabling the hand
    (and its owner) to perform grasping and
    manipulation actions."
  • Mt: BaseKB; definingMt:
    AnimalPhysiologyVocabularyMt
  • Mt: AnimalPhysiologyMt; properPhysicalPartTypes:
    Fingernail
  • Mt: WordNetMappingMt
    (synonymousExternalConcept Finger
    WordNet-Version2_0 "N05247839")
    (synonymousExternalConcept Finger
    WordNet-1997Version "N04312497")
  • GAF Arg: 2
  • Mt: UniversalVocabularyMt
    (genls LittleFinger Finger)
    (genls IndexFinger Finger)
    (genls Thumb Finger)
    (genls RingFinger Finger)
    (genls MiddleFinger Finger)
  • Mt: HumanActivitiesMt (bodyPartsUsed-TypeType
    Typing Finger)
  • Mt: HumanSocialLifeMt (bodyPartsUsed-TypeType
    PointingAFinger Finger)

6
Example of what Cyc currently knows about fingers
- 2
  • GAF Arg: 3
  • Mt: HumanPhysiologyMt (relationAllExists
    anatomicalParts HomoSapiens Finger)
  • Mt: VertebratePhysiologyMt
    (relationAllExistsCount physicalParts Hand
    Finger 5)
  • Mt: UniversalVocabularyMt (relationAllOnly
    wornOn Ring-Jewelry Finger)
  • Mt: AnimalPhysiologyMt (relationExistsAll
    physicalParts Hand Finger)
  • GAF Arg: 4
  • Mt: GeneralEnglishMt (denotation Finger-TheWord
    CountNoun 0 Finger)
  • Mt: AnimalPhysiologyMt
    (conceptuallyRelated Fingernail Finger)
    (properPhysicalPartTypes Hand Finger)
    (relationAllInstance age Finger
    (YearsDuration 0 200))
    (relationAllInstance widthOfObject Finger
    (Meter 0.001 0.2))
    (relationAllInstance heightOfObject Finger
    (Meter 0.001 0.2))
    (relationAllInstance lengthOfObject Finger
    (Meter 0.01 0.5))
    (relationAllInstance massOfObject Finger
    (Kilogram 0.001 1))

7
Bootstrapping with ResearchCyc
  • Cyc has vocabulary about objects in the world and
    relationships
  • Cyc could still use more common relationships
  • BURC uses what Cyc already has, plus lots of
    parsed text, to create new Cyc entries for common
    relationships found in the text
  • Lenat's Bootstrap Hypothesis: once Cyc reaches a
    certain level/scale it can help in its own
    development and start using NLP to augment its
    knowledge base
  • BURC should help test this hypothesis

8
The BURC Process: From Seeds to Hypothe-seeds
  • Use the link grammar parser for bulk parsing of
    text, primarily narratives based in worlds like
    ours. Other text styles could be included.
  • Operates in two directions
  • Forward from text to CycL
  • Backwards from existing CycL to the text to find
    new forward patterns

9
BURC Process - 2
  • Load the link fragments into a database (1- and
    2-link fragments), and compute the frequency of
    fragment occurrences. The database will be
    SQL-based so that multiple queries can be formed
    dynamically.
  • Using Cyc knowledge as a starting point (the
    seeds), extract knowledge for use in Cyc
  • Given a set of seed facts in Cyc, identify how
    those facts are represented as link fragments in
    the database
  • Generate conjectures as to new knowledge AND new
    knowledge extraction patterns using the fragment
    patterns.
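
The fragment-loading step above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table name LGPTable and the columns Term1, Link1, Term2 and NumLinks echo the queries shown on later slides, while Link2, Term3, Freq, the function name build_fragment_db and the sample fragments are assumptions made for the example, not the actual BURC schema.

```python
import sqlite3

def build_fragment_db(fragments):
    """Load link fragments into an in-memory table, counting repeats.

    Each fragment is (term1, link1, term2, link2, term3); a 1-link
    fragment leaves link2 and term3 as empty strings (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE LGPTable ("
        " Term1 TEXT, Link1 TEXT, Term2 TEXT, Link2 TEXT, Term3 TEXT,"
        " NumLinks INTEGER, Freq INTEGER)"
    )
    for t1, l1, t2, l2, t3 in fragments:
        # Try to bump the count of an existing identical fragment first.
        cur = conn.execute(
            "UPDATE LGPTable SET Freq = Freq + 1"
            " WHERE Term1=? AND Link1=? AND Term2=? AND Link2=? AND Term3=?",
            (t1, l1, t2, l2, t3),
        )
        if cur.rowcount == 0:
            # First sighting: insert with frequency 1.
            num_links = 2 if l2 else 1
            conn.execute(
                "INSERT INTO LGPTable VALUES (?, ?, ?, ?, ?, ?, 1)",
                (t1, l1, t2, l2, t3, num_links),
            )
    conn.commit()
    return conn

# Invented sample fragments, for illustration only.
sample = [
    ("white.a", "a", "blouse.n", "", ""),
    ("white.a", "a", "blouse.n", "", ""),
    ("finger.n", "ss", "push.v", "os", "button.n"),
]
conn = build_fragment_db(sample)
rows = conn.execute(
    "SELECT Term1, Term2, NumLinks, Freq FROM LGPTable ORDER BY Freq DESC"
).fetchall()
```

Keeping one row per distinct fragment with a running Freq column means later queries can read the counts directly instead of re-aggregating raw parser output.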

10
BURC Process - 3
  • Use Cyc knowledge directly to conjecture new
    statements
  • Cyc has lexical knowledge, which can be used as
    templates against the DB to form new statements
  • For example, common adjectives applied to noun
    classes
  • Cyc knows WhiteColor and Blouse but does not
    know that white is a common blouse color,
    although it becomes apparent after reading some
    text
  • Optionally, gather supporting background
    statistics for hypothesis verification using
    other sources
  • Perhaps Google Desktop, with a corpus larger than
    the fully parsed one
  • Perhaps check against answer extraction engines

11
KNEXT (KNowledge EXtraction from Text)
  • Deriving general world knowledge from texts and
    taxonomies
  • http://www.cs.rochester.edu/~schubert/projects/world-knowledge-mining.html
  • Lenhart K. Schubert and Matthew Tong, "Extracting
    and evaluating general world knowledge from the
    Brown Corpus", Proc. of the HLT-NAACL Workshop on
    Text Meaning, May 31, 2003, Edmonton, Alberta,
    pp. 7-13.
  • System extracts commonsense relationships from
    text
  • Limited to the pre-parsed Penn Treebank
  • Generated 117,326 propositions (about 2 per
    sentence)
  • About 60% judged reasonable by any given judge

12
KNEXT (Example)
  • (BLANCHE KNEW 0 SOMETHING MUST BE CAUSING STANLEY
    'S NEW, STRANGE BEHAVIOR BUT SHE NEVER ONCE
    CONNECTED IT WITH KITTI WALKER.)
  • A FEMALE-INDIVIDUAL MAY KNOW A PROPOSITION.
  • SOMETHING MAY CAUSE A BEHAVIOR.
  • A MALE-INDIVIDUAL MAY HAVE A BEHAVIOR.
  • A BEHAVIOR CAN BE NEW.
  • A BEHAVIOR CAN BE STRANGE.
  • A FEMALE-INDIVIDUAL MAY CONNECT A
    THING-REFERRED-TO WITH A FEMALE-INDIVIDUAL.
  • ((:I (:Q DET FEMALE-INDIVIDUAL) KNOW[V] (:Q DET
    PROPOS))
  • (:I (:F K SOMETHING[N]) CAUSE[V] (:Q THE
    BEHAVIOR[N]))
  • (:I (:Q DET MALE-INDIVIDUAL) HAVE[V] (:Q DET
    BEHAVIOR[N]))
  • (:I (:Q DET BEHAVIOR[N]) NEW[A])
  • (:I (:Q DET BEHAVIOR[N]) STRANGE[A])
  • (:I (:Q DET FEMALE-INDIVIDUAL) CONNECT[V] (:Q
    DET THING-REFERRED-TO)
    (:P WITH[P] (:Q DET FEMALE-INDIVIDUAL))))

13
Other Extraction Pattern Research
  • Towards Terascale Knowledge Acquisition (Pantel,
    Ravichandran and Hovy, 2004)
  • Learning Surface Text Patterns for a Question
    Answering System (Ravichandran & Hovy, 2002)
  • Defined Pattern Precision: P = Ca/Co
  • Ca = total number of patterns with answer term
    present
  • Co = total number of patterns with any term
    present
  • DIRT: Discovery of Inference Rules from Text
    (Lin & Pantel, 2001)
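
The pattern precision measure above can be computed with a few lines of code. This is a minimal sketch under one reading of the definition: for a single pattern, Co counts every match where some term fills the answer slot and Ca counts the matches where that term is the expected answer. The function name pattern_precision and the sample terms are hypothetical.

```python
def pattern_precision(slot_fillers, answer_term):
    """Return P = Ca / Co for one extraction pattern.

    slot_fillers: the terms found in the pattern's answer slot,
                  one entry per match in the corpus (Co = their count).
    answer_term:  the expected answer (Ca = matches equal to it).
    """
    co = len(slot_fillers)
    ca = sum(1 for term in slot_fillers if term == answer_term)
    # A pattern that never matched has no evidence; report 0.0.
    return ca / co if co else 0.0

# Invented example: four matches, three of them with the right answer.
p = pattern_precision(["finger", "finger", "toe", "finger"], "finger")
```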

14
Other Lexical Knowledge Research
  • VerbOcean (Chklovski & Pantel): collecting pairs
    and searching to verify relationships
  • Lexical Acquisition via Constraint Solving
    (Pedersen & Chen): acquiring syntactic and
    semantic classification rules of unknown words
    for LGP
  • Information Extraction Using Link Grammar papers
  • Automatic Meaning Discovery Using Google

15
The General Backwards Model
  • Given some Cyc relation Pred(?X,?Y)
  • Create SQL search query
  • Look up the Cyc lexical entries for X, Y → LX, LY
  • Select * from LGPTable where Term1="<LX>" and
    Term3="<LY>"
  • System returns records: LX Link1 Term2
    Link2 LY (Freq)
  • Generate new hypothetical extraction patterns
  • Select * from LGPTable where Link1="<L1>" and
    Link2="<L2>" and Term2="<T2>"
  • L1 T2 L2 → generate hypothetical record
    (Pred ?S1 ?S3)
  • Frequency information is propagated forward
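
The two-step backwards model above can be sketched as follows. This is an illustrative sketch, not the BURC implementation: the table layout follows the slide's queries with an assumed Freq column, the function name mine_backwards is hypothetical, and the fragments and link labels in the sample rows are invented.

```python
import sqlite3

def mine_backwards(conn, pred, seed_pairs):
    """Find link patterns joining known seed pairs, then reuse them.

    Step 1: for each seed (LX, LY) collect (Link1, Term2, Link2).
    Step 2: for each pattern collect other (Term1, Term3) pairs and
            propose (pred S1 S3), carrying the frequency forward.
    """
    patterns = {}
    for lx, ly in seed_pairs:
        query = ("SELECT Link1, Term2, Link2, Freq FROM LGPTable"
                 " WHERE Term1=? AND Term3=?")
        for l1, t2, l2, freq in conn.execute(query, (lx, ly)):
            patterns[(l1, t2, l2)] = patterns.get((l1, t2, l2), 0) + freq

    seeds = set(seed_pairs)
    hypotheses = {}
    for l1, t2, l2 in patterns:
        query = ("SELECT Term1, Term3, Freq FROM LGPTable"
                 " WHERE Link1=? AND Term2=? AND Link2=?")
        for s1, s3, freq in conn.execute(query, (l1, t2, l2)):
            if (s1, s3) not in seeds:  # skip what is already known
                key = (pred, s1, s3)
                hypotheses[key] = hypotheses.get(key, 0) + freq
    return patterns, hypotheses

# Invented 2-link fragments for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LGPTable (Term1 TEXT, Link1 TEXT, Term2 TEXT,"
             " Link2 TEXT, Term3 TEXT, Freq INTEGER)")
conn.executemany("INSERT INTO LGPTable VALUES (?, ?, ?, ?, ?, ?)", [
    ("ring.n", "Mp", "on", "Js", "finger.n", 5),
    ("glove.n", "Mp", "on", "Js", "hand.n", 3),
    ("hat.n", "Mp", "on", "Js", "head.n", 7),
    ("ring.n", "Ss", "fell.v", "MVp", "floor.n", 2),
])
patterns, hypotheses = mine_backwards(conn, "wornOn",
                                      [("ring.n", "finger.n")])
```

With a single seed the one recovered pattern is very general, which is why the ambiguity classes and frequency thresholds discussed on the following slides matter.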

16
The General Backwards Model - 2
  • Optional: Search Cyc for ?PRED(X,Y) and use the
    set to form a local ambiguity class to reduce
    search labor and identify ambiguity. One rule →
    multiple relations.
  • Stored as SQLTemplate \ Pattern \
    Pred1/Pred2/.../PredN
  • Need to explore (canidateBinaryPred ARG1 ARG2
    RELN)
  • Optional: Form more specific patterns for
    Pred(X,_) and Pred(_,Y)

17
Update the LGParser's CycL Rules
  • <rule>
  • <pattern> Link1 Term2 Link2</pattern>
  • <define>?ITEMr </define>
  • <body>(is-node ?ITEMr "R")</body>
  • <define>?ITEMl </define>
  • <body>(is-node ?ITEMl "R")</body>
  • <body>(?PRED1 ?ITEMl ?TERMr)</body>
  • ...
  • <body>(?PREDN ?ITEMl ?TERMr)</body>
  • </rule>
  • There are rules for translation of LGP output
    into CycL
  • If the frequency information warrants it then we
    can generate new LGP rules
  • Results in expanded parser precision

18
Forward Mining Adjective Relations
  • There are 1941 GAFs on adjSemTrans, the primary
    lexical adjective predicate
  • Find applicable fragments and use definitions
  • Select * from LGPTable where NumLinks=1 and
    Link1='a' and Term1 like '%.a' and Term2 like
    '%.n'
  • Returns records: Term1.a a Term2.n
  • Potentially test using either an internal or
    search-engine-based relevancy metric
  • Query Cyc for (adjSemTrans <term1>-TheWord ?N
    RegularAdjFrame (?Pred NOUN ?Val))
  • Generate (plausiblePredValueOfType <term2> <?Pred>
    <?Val>)
  • Possibly generate parsing rule

19
Mining Adjective Knowledge Example
  • "white blouse" as factoid
  • white.a a blouse.n
  • Potentially test using an internal or search
    engine relevancy metric GC70400
  • (adjSemTrans White-TheWord 11 RegularAdjFrame
    (mainColorOfObject NOUN WhiteColor))
  • Hypothesis: (plausiblePredValueOfType Blouse
    mainColorOfObject WhiteColor)
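
The adjective-mining step illustrated above can be sketched as follows. This is a rough illustration only: the two dictionaries stand in for Cyc queries (adjSemTrans and denotation lookups) and contain just the white/blouse entries from the slide, and the function name mine_adjectives and the fragment frequencies are assumptions.

```python
# Stand-ins for Cyc lookups; only the slide's example is included.
ADJ_SEM_TRANS = {"white": [("mainColorOfObject", "WhiteColor")]}
NOUN_DENOTATION = {"blouse": "Blouse"}

def mine_adjectives(fragments):
    """Turn adjective-noun fragments into hypothesis strings.

    fragments: iterable of (term1, link, term2, freq), where an
    adjective-noun fragment looks like ("white.a", "a", "blouse.n", n).
    Returns a list of (freq, hypothesis) pairs.
    """
    hypotheses = []
    for term1, link, term2, freq in fragments:
        # Keep only 1-link 'a' fragments joining an adjective to a noun.
        if link != "a" or not term1.endswith(".a") or not term2.endswith(".n"):
            continue
        adjective, noun = term1[:-2], term2[:-2]
        concept = NOUN_DENOTATION.get(noun)
        if concept is None:
            continue  # the noun has no known Cyc concept
        for pred, value in ADJ_SEM_TRANS.get(adjective, []):
            hypotheses.append(
                (freq, f"(plausiblePredValueOfType {concept} {pred} {value})")
            )
    return hypotheses

# Invented fragment counts for illustration.
result = mine_adjectives([
    ("white.a", "a", "blouse.n", 12),
    ("finger.n", "ss", "push.v", 3),
])
```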

20
Update the LGParser's CycL Rules - 2
  • <rule>
  • <pattern> Term1.a a </pattern>
  • <define>?ITEMr </define>
  • <body>(is-node ?ITEMr "R")</body>
  • <define>?ITEMl </define>
  • <body>(?PRED ?ITEMr ?VAL)</body>
  • </rule>
  • There are rules for translation of LGP output
    into CycL
  • We can use the adjSemTrans data to generate new
    translation rules
  • Results in expanded parser precision
  • <rule>
  • <pattern> white.a a </pattern>
  • <define>?ITEMr </define>
  • <body>(is-node ?ITEMr "R")</body>
  • <define>?ITEMl </define>
  • <body>(mainColorOfObject ?ITEMr
    WhiteColor)</body>
  • </rule>
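
Generating such a rule from an adjSemTrans entry could look roughly like this. The sketch uses Python's standard xml.etree.ElementTree and simply mirrors the element order shown on the slide; the function name make_adjective_rule is hypothetical, and the real rule format may carry details not visible here.

```python
import xml.etree.ElementTree as ET

def make_adjective_rule(adjective, pred, value):
    """Build a translation rule for one adjective, as an XML string."""
    rule = ET.Element("rule")
    ET.SubElement(rule, "pattern").text = f"{adjective}.a a"
    ET.SubElement(rule, "define").text = "?ITEMr"
    ET.SubElement(rule, "body").text = '(is-node ?ITEMr "R")'
    ET.SubElement(rule, "define").text = "?ITEMl"
    # The predicate and value come from the adjective's adjSemTrans entry.
    ET.SubElement(rule, "body").text = f"({pred} ?ITEMr {value})"
    return ET.tostring(rule, encoding="unicode")

xml_rule = make_adjective_rule("white", "mainColorOfObject", "WhiteColor")
```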

21
Mined Finger Descriptions
  • 000010(plausiblePredValueOfType Finger
    feelsSensation (PositiveAmountFn
    LevelOfSoreness))
  • 000037(plausiblePredValueOfType Finger
    forceCapacity Strong)
  • 000025(plausiblePredValueOfType Finger
    forceCapacity Strong)
  • 000025(plausiblePredValueOfType Finger
    hardnessOfObject Hard)
  • 000037(plausiblePredValueOfType Finger
    hardnessOfObject (MediumToVeryHighAmountFn
    Hardness))
  • 000037(plausiblePredValueOfType Finger
    hardnessOfObject (MediumToVeryHighAmountFn
    Hardness))
  • 000002(plausiblePredValueOfType Finger
    hasEvaluativeQuantity (MediumToVeryHighAmountFn
    Goodness-Generic))
  • 000002(plausiblePredValueOfType Finger
    hasPhysicalAttractiveness GoodLooking)
  • 000047(plausiblePredValueOfType Finger isa
    (LeftObjectOfPairFn REPLACE))
  • 000015(plausiblePredValueOfType Finger isa
    (RightObjectOfPairFn REPLACE))
  • 000155(plausiblePredValueOfType Finger
    lengthOfObject (RelativeGenericValueFn
    lengthOfObject REPLACE highAmountOf))
  • 000155(plausiblePredValueOfType Finger
    lengthOfObject (RelativeGenericValueFn
    lengthOfObject REPLACE highToVeryHighAmountOf))
  • 000003(plausiblePredValueOfType Finger
    mainColorOfObject BlackColor)
  • 000010(plausiblePredValueOfType Finger
    mainColorOfObject LightYellowishBrown-Color)
  • 000010(plausiblePredValueOfType Finger
    mainColorOfObject ModerateYellowishBrown-Color)
  • 000010(plausiblePredValueOfType Finger
    mainColorOfObject SunTan-FleshColor)
  • 000002(plausiblePredValueOfType Finger
    possessiveRelation SuddenChange)

22
Mined Finger Descriptions - 2
  • 000006(plausiblePredValueOfType Finger
    possessiveRelation (HighAmountFn Speed))
  • 000094(plausiblePredValueOfType Finger
    rigidityOfObject (HighAmountFn Rigidity))
  • 000060(plausiblePredValueOfType Finger
    sizeParameterOfObject (RelativeGenericValueFn
    sizeParameterOfObject REPLACE highAmountOf))
  • 000052(plausiblePredValueOfType Finger
    sizeParameterOfObject (RelativeGenericValueFn
    sizeParameterOfObject REPLACE
    highToVeryHighAmountOf))
  • 000060(plausiblePredValueOfType Finger
    sizeParameterOfObject (RelativeGenericValueFn
    sizeParameterOfObject REPLACE
    highToVeryHighAmountOf))
  • 000285(plausiblePredValueOfType Finger
    sizeParameterOfObject (RelativeGenericValueFn
    sizeParameterOfObject REPLACE
    veryLowToLowAmountOf))
  • 000074(plausiblePredValueOfType Finger
    sizeParameterOfObject (RelativeGenericValueFn
    sizeParameterOfObject REPLACE
    veryLowToLowAmountOf))
  • 000029(plausiblePredValueOfType Finger
    speedOfObject-Underspecified (LowAmountFn
    Speed))
  • 000138(plausiblePredValueOfType Finger
    surfaceFeatureOfObj Slippery)
  • 000074(plausiblePredValueOfType Finger
    temperatureOfObject Warm)
  • 000004(plausiblePredValueOfType Finger
    textureOfObject Rough)
  • 000168(plausiblePredValueOfType Finger
    thicknessOfObject (RelativeGenericValueFn
    thicknessOfObject REPLACE highAmountOf))
  • 000168(plausiblePredValueOfType Finger
    thicknessOfObject (RelativeGenericValueFn
    thicknessOfObject REPLACE
    highToVeryHighAmountOf))
  • 000182(plausiblePredValueOfType Finger
    wetnessOfObject Wet)

23
Verb Semantic Filtering - 1: Discovering what a
finger can do
  • A similar process can be used to find information
    based on verb semantic parsing frames
  • For each potential <NOUNWORD>-<VERB> pair, query
    Cyc to find basic relationships using the verb
    semantic templates
  • (and
  • (denotation <NOUNWORD> ?NOUNTYPE ?N ?CYCTERM)
  • (wordForms ?WORD ?PRED "<VERB>")
  • (speechPartPreds ?POS ?PRED)
  • (semTransPredForPOS ?POS ?SEMTRANSPRED)
  • (?SEMTRANSPRED ?WORD ?NUM ?FRAME ?TEMPLATE))
  • Verify for each potential relationship (<SPRED>
    <VERBTERM> <CYCTERM>) derivable from ?TEMPLATE
    that it makes sense in the ontology
  • (and
  • (arg1Isa <SPRED> ?VTYP)
  • (arg2Isa <SPRED> ?CTYP)
  • (genls <CYCTERM> ?CTYP)
  • (genls <VERBTERM> ?VTYP))
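
The ontology check above can be sketched as a small type filter. This is a toy illustration: the GENLS and ARG_ISA tables are hand-made stand-ins for Cyc queries, and apart from the performedBy constraints and Finger's genls link quoted elsewhere in this deck, their contents are invented and say nothing about what Cyc actually asserts. The function names are hypothetical.

```python
# Toy taxonomy: child collection -> direct parent collections (invented).
GENLS = {
    "Finger": ["Digit-AnatomicalPart"],
    "Digit-AnatomicalPart": ["PartiallyTangible"],
    "MovementEvent": ["Event"],
    "ChangeOfResidence": ["Action"],
}
# Argument constraints: predicate -> (arg1Isa, arg2Isa).
ARG_ISA = {
    "performedBy": ("Action", "Agent-Generic"),
    "primaryObjectMoving": ("Event", "PartiallyTangible"),
}

def is_spec_of(term, ancestor):
    """True if term equals ancestor or reaches it via genls links."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(GENLS.get(current, []))
    return False

def makes_sense(spred, verbterm, cycterm):
    """Check (spred verbterm cycterm) against the argument constraints."""
    if spred not in ARG_ISA:
        return False
    vtyp, ctyp = ARG_ISA[spred]
    return is_spec_of(verbterm, vtyp) and is_spec_of(cycterm, ctyp)

keep = makes_sense("primaryObjectMoving", "MovementEvent", "Finger")
drop = makes_sense("performedBy", "ChangeOfResidence", "Finger")
```

Under this toy taxonomy the first candidate passes and the second is rejected, because Finger is not listed as a kind of Agent-Generic.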

24
Verb Semantic Filtering - 2: Templates of Movement
  • (verbSemTrans Move-TheWord 0 IntransitiveVerbFrame
    (and (isa ACTION MovementEvent)
    (primaryObjectMoving ACTION SUBJECT)))
  • (verbSemTrans Move-TheWord 1 IntransitiveVerbFrame
    (and (isa ACTION ChangeOfResidence)
    (performedBy ACTION SUBJECT)))
  • (verbSemTrans Move-TheWord 2 TransitiveNPFrame
    (and (isa ACTION
    CausingAnotherObjectsTranslationalMotion)
    (objectActedOn ACTION OBJECT)
    (doneBy ACTION SUBJECT)))
  • (arg1Isa performedBy Action)
  • (arg2Isa performedBy Agent-Generic)

25
Verb Semantic Filtering - 3
  • BURC can use Cyc's knowledge of what things can
    perform what actions or have what attributes to
    filter out implausible relationships.
  • (behaviorCapableOf Finger
    CausingAnotherObjectsTranslationalMotion doneBy)
  • (behaviorCapableOf Finger ChangeOfResidence
    performedBy)
  • (behaviorCapableOf Finger Inspecting
    performedBy)
  • (behaviorCapableOf Finger
    Movement-TranslationEvent primaryObjectMoving)
  • (behaviorCapableOf Finger MovementEvent
    primaryObjectMoving)
  • (behaviorCapableOf Finger PushingAnObject
    providerOfMotiveForce)
  • (behaviorCapableOf Finger Sliding-Generic
    objectMoving)
  • (behaviorCapableOf Finger Sliding-Generic
    primaryObjectMoving)
  • (behaviorCapableOf Finger Slipping
    objectMoving)
  • (behaviorCapableOf Finger Slipping
    primaryObjectMoving)
  • Cyc can help in its own knowledge entry process.
    62% of generated hypotheses were filtered out
    using semantic role filtering.

26
Other Direct Extraction Rules
  • Some underspecified patterns exist based only
    on the links
  • This could be used to extract ConceptNet-like
    output directly from link records
  • Examples:
  • <obj1> ss <act>.v os <obj2> → capableOf(<obj1>,
    <act> <obj2>)
  • <act>.v os <obj> → CapableOfReceivingAction(<obj>,
    <act>)
  • <obj> s <act>.v → capableOf(<obj>, <act>)
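
The three link-only rules above can be sketched as a small matcher over link records. This is a minimal illustration: the record layout (a tuple alternating terms and link labels), the function name extract_relation and the sample records are assumptions, and the relation name uses the spelling CapableOfReceivingAction.

```python
def extract_relation(record):
    """Map one link record to a ConceptNet-like relation string.

    A 2-link record is (obj1, link1, act.v, link2, obj2); a 1-link
    record is (left, link, right). Returns None if no rule applies.
    """
    if len(record) == 5:
        obj1, link1, act, link2, obj2 = record
        if link1 == "ss" and link2 == "os" and act.endswith(".v"):
            return f"capableOf({obj1}, {act[:-2]} {obj2})"
    elif len(record) == 3:
        left, link, right = record
        if link == "os" and left.endswith(".v"):
            return f"CapableOfReceivingAction({right}, {left[:-2]})"
        if link == "s" and right.endswith(".v"):
            return f"capableOf({left}, {right[:-2]})"
    return None

# Invented records for illustration.
r1 = extract_relation(("finger", "ss", "push.v", "os", "button"))
r2 = extract_relation(("push.v", "os", "button"))
r3 = extract_relation(("finger", "s", "move.v"))
```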

27
Quest for Metrics
  • Percentage of hypotheses that make sense to a
    panel of judges
  • Percentage of hypotheses that are already known
    to Cyc
  • Percentage of hypotheses that are known in other
    knowledge sources (WordNet, SUMO/MILO, VerbOcean,
    MIT OpenMind)
  • Number of hypotheses generated vs. number of
    records
  • What percentage of relations in Cyc can be found
    in the fragment pool
  • The Pattern Precision measure
  • Maybe compare against KNEXT, but need to see if
    they report real numbers
  • Unfortunately we don't know all possible
    knowledge (otherwise we wouldn't be doing this);
    if we did, we could measure recall and
    precision.
  • Simple space estimate: 2.3K binary predicates ×
    85K constants × 85K constants = 16.6175 T
    simple possibilities
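
The space estimate above is a straightforward product, shown here as a quick arithmetic check. The variable names are illustrative; the rounded figures (2.3K predicates, 85K constants) are the ones given on the slide.

```python
binary_predicates = 2_300   # "2.3K binary predicates"
constants = 85_000          # "85K constants"

# Every binary predicate paired with every ordered pair of constants.
possibilities = binary_predicates * constants * constants
in_trillions = possibilities / 1e12
```

The product comes to roughly 16.6 trillion candidate binary assertions, which is why exhaustive enumeration is not an option and recall cannot be measured directly.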

28
Desired Outputs
  • Version of link grammar for bulk reading and
    generating fragments
  • Database control program to queue texts, monitor
    their processing, and merge the fragment results
  • The database of fragments with fragment counts
    for some corpus
  • The hypothesis set generated by the system
  • Optionally, an OpenMind/ConceptNet-like set of
    commonsense factoids
  • Open enough that others could duplicate the work

29
Did any of that make sense?
  • Comments?
  • Questions?
  • Suggestions?