1
Artificial Intelligence
Paula Matuszek
2
What Is Artificial Intelligence?
  • Definitions
  • The science and engineering of making intelligent
    machines, especially intelligent computer
    programs. It is related to the similar task of
    using computers to understand human intelligence,
    but AI does not have to confine itself to methods
    that are biologically observable. (McCarthy,
    2002)
  • The exciting new effort to make computers think
    ... machines with minds, in the full and literal
    sense (Haugeland, 1985)
  • The automation of activities that we associate
    with human thinking, activities such as
    decision-making, problem solving, learning ...
    (Bellman, 1978)
  • Strong AI and Weak AI
  • Turing Test

3
What Methods Does AI Use?
  • AI can also be defined in terms of what kinds of
    methods it uses
  • Search
  • Knowledge Representation
  • Inference
  • Logic
  • Pattern recognition
  • Machine Learning

4
Typical AI Domains
  • Games
  • Natural Language Processing
  • Planning
  • Perception
  • Robotics
  • Expert Systems
  • Intelligent Agents

5
So when WILL we decide that computers are
intelligent?
6
How Do We Know When We're There?
  • Some requirements I think any test we use must
    meet
  • Whatever test we use must not exclude the
    majority of adult humans. I can't play chess at
    a grand master level!
  • Whatever test we use must produce an observable
    result. "Isn't intelligent because it doesn't
    have a mind" is perhaps a topic for interesting
    philosophical debate, but it's not of any
    practical help.

7
What can AI systems do?
  • Here are some example applications:
  • Computer vision: face recognition from a large
    set
  • Robotics: autonomous (mostly) car
  • Natural language processing: simple machine
    translation
  • Expert systems: medical diagnosis in a narrow
    domain
  • Spoken language systems: 1000-word continuous
    speech
  • Planning and scheduling: Hubble Telescope
    experiments
  • Learning: text categorization into 1000 topics
  • User modeling: Bayesian reasoning in Windows help
  • Games: Grand Master level in chess (world
    champion), checkers, etc.

8
What can't AI systems do yet?
  • Understand natural language robustly (e.g., read
    and understand articles in a newspaper)
  • Surf the web
  • Interpret an arbitrary visual scene
  • Learn a natural language
  • Play Go well
  • Construct plans in dynamic real-time domains
  • Refocus attention in complex environments
  • Perform life-long learning

9
AI Uses in Information Science
  • Retrieval
  • Ontologies
  • Intelligent Agents
  • Text Mining

10
Challenges and Possibilities
  • Information overload. There's too much. We
    would like:
  • Better retrieval
  • Help with handling documents we have
  • Help finding specific pieces of information
    without having to read documents
  • What might help?
  • Statistical techniques
  • Natural language processing techniques
  • Knowledge domain based techniques

11
Retrieval
  • Find correct documents, with high precision and
    high recall.
  • AI is used extensively for:
  • Determining relevance: heuristic rules capture
    human intuition about importance; improves
    precision
  • Using domain models: domain models/ontologies
    with synonyms and classes improve recall (a
    sketch of this idea follows below)
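
A minimal sketch of the recall idea: expand the query with synonyms from a domain model before matching. The synonym table and documents below are invented for illustration.

```python
# Minimal sketch: expanding a query with ontology synonyms to improve recall.
# The synonym table and documents are invented for illustration.

SYNONYMS = {
    "cat": {"feline", "kitty", "tabby"},
    "car": {"automobile", "vehicle"},
}

DOCUMENTS = [
    "Adopting a feline: what new owners should know",
    "Used automobile prices continue to climb",
    "Caring for your cat in winter",
]

def expand(query_terms):
    """Return the query terms plus any known synonyms."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded

def retrieve(query):
    terms = expand(query.lower().split())
    # A document matches if it shares any term with the expanded query.
    return [doc for doc in DOCUMENTS
            if terms & set(doc.lower().split())]

print(retrieve("cat"))   # finds both the "feline" and the "cat" documents
```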

12
Retrieval: Some Current Directions
  • Intelligent spiders
  • Can't cover all of the web; it's too big!
  • Determine relevance as documents are retrieved;
    spider only those with high relevance
  • Goal is to improve precision AND recall
  • Intelligent disambiguation
  • When you search for "bank", do you mean the
    financial institution or the side of a river?
  • Use ontologies to find multiple meanings
  • Scan for related words to choose the meaning (a
    sketch of this idea follows below)
  • Semantic web
  • Add meta-information as you create web pages.
    Intelligent data instead of intelligent tools.
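
A minimal sketch of the disambiguation idea: each sense of an ambiguous word carries a set of related words, and the sense that overlaps most with the surrounding context wins. The tiny sense inventory is invented for illustration.

```python
# Minimal sketch: choose a word sense by overlap with surrounding context words.
# The tiny sense inventory for "bank" is invented for illustration.

SENSES = {
    "bank/financial": {"money", "loan", "deposit", "account", "interest"},
    "bank/river":     {"river", "water", "shore", "fishing", "mud"},
}

def disambiguate(context):
    """Pick the sense of 'bank' whose related words best match the context."""
    words = set(context.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("she opened a deposit account at the bank"))   # bank/financial
print(disambiguate("we fished from the bank of the river"))       # bank/river
```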

13
Ontologies
  • Definition: An ontology is a formal description
    or specification of the concepts and
    relationships in a domain.
  • Synonyms, hierarchy of terms, richer relations.
  • Example: cat (see the sketch below)
  • Synonyms: pussy, feline, kitty
  • Is-a: mammal, pet
  • Subclasses: Persian, Siamese, tabby
  • Has characteristics: carnivorous, purrs
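
One possible way to represent the "cat" entry in code; the field names are an illustrative choice, not a standard ontology format.

```python
# Minimal sketch of an ontology entry for "cat"; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    synonyms: set = field(default_factory=set)
    is_a: set = field(default_factory=set)           # broader classes
    subclasses: set = field(default_factory=set)     # narrower classes
    characteristics: set = field(default_factory=set)

cat = Concept(
    name="cat",
    synonyms={"pussy", "feline", "kitty"},
    is_a={"mammal", "pet"},
    subclasses={"Persian", "Siamese", "tabby"},
    characteristics={"carnivorous", "purrs"},
)

# A retrieval system can now treat "feline" in a query as matching "cat".
print("feline" in cat.synonyms)   # True
```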

14
Ontology: Another Example
  • Example: Panadol
  • Broader term: chemical drug substance
  • Narrower term: acetaminophen tablet
  • Synonyms: Tylenol, acetaminophen, paracetamol
  • Preferred term: paracetamol
  • Trademarked-in-country: UK, US, EU
  • Company-holding-trademark: SmithKline
  • Ingredient-in: Contac
  • USAN: acetaminophen
  • BAN: paracetamol
  • Therapeutic class: analgesic agent, antipyretic
    agent

15
Intelligent Agents
  • Definition: a software program that autonomously
    gathers information or performs some task for a
    user.
  • Communicative
  • Capable
  • Autonomous
  • Adaptive

16
Some Current Intelligent Agent Tasks
  • Screen out junk mail
  • Understand what makes mail junk: hand-built
    rules or machine learning (a rule-based sketch
    follows below)
  • Shopbots: find the best price for X
  • Know about and access shopping sites
  • Know about and understand costing: price for
    items, discounts, shipping fees
  • News and mail alerts
  • Understand what I am interested in
  • Watch relevant sources to find those things and
    bring them to my attention
  • Recommender systems
  • What movies or books might I be interested in?
  • Collaborative systems, faceted or
    characteristic-based systems.
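
A minimal sketch of the hand-built-rules version of a junk-mail screen; the rules and sample message are invented for illustration. A learned filter would replace the rule list with a trained classifier, as in the categorization sketch on a later slide.

```python
# Minimal sketch: hand-built rules for screening junk mail.
# The rules and the sample message are invented for illustration.

JUNK_RULES = [
    lambda msg: "free money" in msg["subject"].lower(),
    lambda msg: msg["sender"].endswith(".example-spam.com"),
    lambda msg: msg["body"].lower().count("click here") >= 2,
]

def is_junk(message):
    """A message is junk if any hand-built rule fires."""
    return any(rule(message) for rule in JUNK_RULES)

msg = {"sender": "offers@deals.example-spam.com",
       "subject": "FREE MONEY waiting for you",
       "body": "Click here now! Click here!"}
print(is_junk(msg))   # True
```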

17
Intelligent Agents: The Vision
  • Lucy calls her brother Pete "Mom needs to see a
    specialist and then has to have a series of
    physical therapy sessions. I'm going to have my
    agent set up the appointments." Pete agrees to
    share driving.
  • At the MD office, Lucy instructs her agent
    through her handheld browser. The agent
  • retrieves information about Mom's prescribed
    treatment from the doctor's agent
  • looks up several lists of providers
  • checks for the ones in-plan for Mom's insurance
    within a 20-mile radius of her home and with a
    rating of excellent or very good on trusted
    rating services
  • finds a match between available appointment times
    (supplied by the agents of individual providers
    through their Web sites) and Pete's and Lucy's
    busy schedules.
  • The agent presents a plan. Pete doesn't like it
    (too much driving, and at rush hour) and has his
    agent redo the search with stricter preferences
    about location and time. Lucy's agent, having
    complete trust in Pete's agent in the context of
    the present task, supplies the data it has
    already sorted through.
  • A new plan is presented: a closer clinic and
    earlier times, with warning notes.
  • Pete will have to reschedule a couple of his less
    important appointments.
  • The insurance company's list does not include
    this provider under physical therapists: "Service
    type and insurance plan status securely verified
    by other means. (Details?)"
  • Lucy and Pete agree and the agent makes the
    appointments.
  • Pete asks his agent to explain how it had found
    that provider even though it wasn't on the proper
    list.
  • Example taken from the Scientific American article
    on the Semantic Web, May 2001.
    http://www.scientificamerican.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2

18
Text Mining
  • Common theme: information exists, but in
    unstructured text.
  • Text mining is the general term for a set of
    techniques for analyzing unstructured text in
    order to process it better
  • Document-based
  • Content-based

19
Document-Based
  • Techniques which are concerned with documents as
    a whole, rather than details of the contents
  • Document retrieval: find documents
  • Document categorization: sort documents into
    known groups
  • Document classification: cluster documents into
    similar classes which are not predefined
  • Visualization: visually display relationships
    among documents

20
Document Categorization
  • Document categorization
  • Assign documents to pre-defined categories
  • Examples
  • Process email into work, personal, junk
  • Process documents from a newsgroup into
    interesting, not interesting, spam and
    flames
  • Process transcripts of bugged phone calls into
    relevant and irrelevant
  • Issues
  • Real-time?
  • How many categories/document? Flat or
    hierarchical?
  • Categories defined automatically or by hand?

21
Categorization -- Automatic
  • Statistical approaches, similar to a search engine
  • Set of training documents defines categories
  • Underlying representation of a document is a bag
    of words (BOW): looking at frequencies, not at
    order
  • Category description is created using neural
    nets, regression trees, or other machine learning
    techniques (a sketch follows below)
  • Individual documents are categorized by the net or
    inferred rules
  • Requires relatively little effort to create
    categories
  • Accuracy is heavily dependent on "good" training
    examples
  • Typically limited to flat, mutually exclusive
    categories
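
A minimal sketch of the statistical approach, assuming scikit-learn is available; the training documents and categories are invented for illustration.

```python
# Minimal sketch: bag-of-words categorization with a learned model.
# Assumes scikit-learn is installed; training data is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = [
    "quarterly budget meeting moved to friday",
    "project status report attached for review",
    "win a free cruise click now",
    "lowest prices on watches click here",
]
train_labels = ["work", "work", "junk", "junk"]

# Bag of words: term frequencies only, word order is discarded.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_docs)

model = MultinomialNB()
model.fit(X, train_labels)

new_doc = ["free watches for the whole project team"]
print(model.predict(vectorizer.transform(new_doc)))
```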

22
Categorization -- Manual
  • Natural language/linguistic techniques
  • Categories are defined by people
  • Underlying representation of a document is a
    stream of tokens
  • Category description contains:
  • an ontology of terms and relations
  • pattern-matching rules
  • Individual documents are categorized by
    pattern matching (a sketch follows below)
  • Defining categories can be very time-consuming
  • Typically takes some experimentation to "get it
    right"
  • Can handle much more complex structures
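
A minimal sketch of the hand-built, pattern-matching style of categorization; the rules and categories are invented for illustration. Note that a document can land in several categories, unlike the flat, mutually exclusive scheme above.

```python
# Minimal sketch: hand-written pattern rules assign documents to categories.
# Patterns and categories are invented for illustration.
import re

CATEGORY_RULES = {
    "finance": [r"\bdividend(s)?\b", r"\bstock price\b", r"\bCEO\b"],
    "biology": [r"\benzyme(s)?\b", r"\bphenotype\b", r"\bmutation(s)?\b"],
}

def categorize(text):
    """Return every category whose patterns match the document text."""
    matches = []
    for category, patterns in CATEGORY_RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            matches.append(category)
    return matches or ["uncategorized"]

print(categorize("The CEO announced higher dividends this quarter."))
print(categorize("A point mutation changed the enzyme's phenotype."))
```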

23
Document Classification
  • Document classification
  • Cluster documents based on similarity
  • Examples
  • Group samples of writing in an attempt to
    determine author(s)
  • Look for hot spots in customer feedback
  • Find new trends in a document collection
    (outliers, hard to classify)
  • Getting into areas where we don't know ahead of
    time what we will have: true mining

24
Document Classification -- How
  • Typical process is:
  • Describe each document
  • Assess similarities among documents
  • Establish a classification scheme which creates
    optimal "separation"
  • One typical approach:
  • Document is represented as a term vector
  • Cosine similarity for measuring association
  • Bottom-up pairwise combining of documents to get
    clusters (a sketch follows below)
  • Assumes you have the corpus in hand
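
A minimal sketch of the typical approach: term-frequency vectors, cosine similarity, and bottom-up (single-link) merging. The documents are invented for illustration; a real system would remove stop words and weight terms, e.g. with tf-idf.

```python
# Minimal sketch: term vectors, cosine similarity, and bottom-up clustering.
# Documents are invented for illustration.
import math
from collections import Counter

docs = [
    "the cat chased the kitten",
    "a cat and a kitten played",
    "stock prices rose sharply today",
    "stock markets fell sharply on monday",
]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = [Counter(d.split()) for d in docs]

# Bottom-up: repeatedly merge the two most similar clusters (single linkage).
clusters = [[i] for i in range(len(docs))]
while len(clusters) > 2:                      # stop at two clusters for the demo
    i, j = max(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: max(cosine(vectors[a], vectors[b])
                           for a in clusters[ij[0]] for b in clusters[ij[1]]))
    clusters[i] += clusters[j]
    del clusters[j]

print(clusters)   # the two "cat" documents and the two "stock" documents group together
```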

25
Document Clustering
  • Approaches vary a great deal in:
  • Document characteristics used to describe the
    document (linguistic or semantic? bag of words?)
  • Methods used to define "similar"
  • Methods used to create clusters
  • Other relevant factors
  • Number of clusters to extract is variable
  • Often combined with visualization tools based on
    similarity and/or clusters
  • Sometimes important that approach be incremental
  • Useful approach when you don't have a handle on
    the domain or it's changing

26
Document Visualization
  • Visualization
  • Visually display relationships among documents
  • Examples
  • Hyperbolic viewer based on document similarity:
    browse a field of scientific documents
  • Map-based techniques showing peaks, valleys,
    outliers
  • Faceted search results showing document counts
    for different categorizations, with browsing
  • Highly interactive, intended to aid a human in
    finding interrelationships and new knowledge in
    the document set.

27
Content-Based Text Mining
  • Methods which focus on a specific document rather
    than a corpus of documents
  • Document summarization: summarize the document
  • Feature extraction: find specific features
  • Information extraction: find detailed
    information
  • Often not interested in the document itself

28
Document Summarization
  • Document Summarization
  • Provide meaningful summary for each document
  • Examples
  • Search tool returns context
  • Monthly progress reports from multiple projects
  • Summaries of news articles on the human genome
  • Often part of a document retrieval system, to
    enable the user to judge documents better
  • Surprisingly hard to make sophisticated

29
Document Summarization -- How
  • Two general approaches:
  • Extract representative sentences/clauses:
    extractive
  • Capture the document in a generic representation
    and generate a summary: abstractive
  • Extractive
  • If in response to a search, use the keywords.
    Easy, effective
  • Otherwise, term frequency, position, etc. (a
    sketch follows below)
  • Broadly applicable, gets the "general feel".
    Current state of the art.
  • Abstractive
  • Create a "template" or "frame"
  • NL processing to fill in the frame
  • Generation based on the template
  • Good if well-defined domain, clear-cut
    information needs. Hard.
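
A minimal sketch of extractive summarization scored by term frequency and sentence position; the weights and sample text are invented for illustration.

```python
# Minimal sketch: extractive summarization by term frequency and sentence position.
# Scoring weights and the sample text are invented for illustration.
from collections import Counter

def summarize(text, n_sentences=2):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Term frequencies over the whole document (crude: no stemming or stop list).
    freqs = Counter(w.lower() for s in sentences for w in s.split())

    def score(index_sentence):
        index, sentence = index_sentence
        words = sentence.lower().split()
        tf_score = sum(freqs[w] for w in words) / len(words)
        position_bonus = 1.0 if index == 0 else 0.0   # favor the opening sentence
        return tf_score + position_bonus

    ranked = sorted(enumerate(sentences), key=score, reverse=True)
    chosen = sorted(ranked[:n_sentences])             # restore original order
    return ". ".join(s for _, s in chosen) + "."

doc = ("The Hubble telescope schedule is planned automatically. "
       "Planning systems assign telescope time to experiments. "
       "Weather does not affect the telescope. "
       "Automatic planning saves astronomers many hours.")
print(summarize(doc))
```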

30
Feature Extraction
  • Group individual terms into more complex entities
    (which then become tokens)
  • Examples (a regex sketch follows below):
  • Dates, times, names, places
  • URLs, HREFs and IMG tags
  • Relationships like "X is president of Y"
  • Can involve quite high-level features: language
  • Enables more sophisticated queries
  • "Show me all the people mentioned in the news
    today"
  • "Show me every mention of New York"
  • Also refers to extracting aspects of a document
    which somehow characterize it: length, vocabulary,
    etc.
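
A minimal sketch of regex-based feature extraction for a few of the feature types above; the patterns are deliberately crude and the sample text is invented for illustration.

```python
# Minimal sketch: regular expressions group raw characters into feature tokens.
# The patterns are simplistic and invented for illustration.
import re

PATTERNS = {
    "date":   r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "url":    r"https?://\S+",
    "person": r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",     # very rough: two capitalized words
}

def extract_features(text):
    """Return each feature type with the strings that matched it."""
    return {name: re.findall(pattern, text)
            for name, pattern in PATTERNS.items()}

text = ("On 5/12/2001 Paula Matuszek posted the slides at "
        "http://www.example.edu/ai-slides for the seminar in New York.")
print(extract_features(text))
```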

31
Information Extraction
  • Retrieve some specific information which is
    located somewhere in this set of documents.
  • Don't want the document itself, just the info.
  • Information may occur multiple times in many
    documents, but we just need to find it once
  • Often what is really wanted from a web search.
  • Tools are not typically designed to be interactive;
    not fast enough for interactive processing of a
    large number of documents
  • Often the first step in creating a more structured
    representation of the information

32
Some Examples of Information Extraction
  • Financial Information
  • Who is the CEO/CTO of a company?
  • What were the dividend payments for stocks I'm
    interested in for the last five years?
  • Biological Information
  • Are there known inhibitors of enzymes in a
    pathway?
  • Are there chromosomally located point mutations
    that result in a described phenotype?
  • Other typical questions:
  • Who is familiar with or working on a domain?
  • What patent information is available?

33
Information Extraction -- How
  • Create a model of information to be extracted
  • Create knowledge base of rules for extraction
  • concepts
  • relations among concepts
  • Find information (a template sketch follows below)
  • Word-matching templates: "Open door"
  • Shallow parsing: simple syntax. "Open door with
    key"
  • Deep parsing: produce a parse tree from the
    document
  • Process information (into database, for instance)
  • Involves some level of domain modeling and
    natural language processing
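
A minimal sketch of word-matching template extraction, using a "CEO of" relation like the financial example on the previous slide; the pattern and documents are invented for illustration.

```python
# Minimal sketch: a word-matching template fills slots in a simple extraction frame.
# The pattern and the sample documents are invented for illustration.
import re

# Template: "<person> is the CEO of <company>"
CEO_TEMPLATE = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) is the CEO of (?P<company>[A-Z][A-Za-z]+)")

def extract_ceos(documents):
    """Scan documents and return filled frames, one per matched template."""
    frames = []
    for doc in documents:
        for match in CEO_TEMPLATE.finditer(doc):
            frames.append({"relation": "CEO-of",
                           "person": match.group("person"),
                           "company": match.group("company")})
    return frames

docs = [
    "Analysts noted that Jane Smith is the CEO of Acme and plans to expand.",
    "The weather in Philadelphia was mild this week.",
]
print(extract_ceos(docs))
# Shallow parsing would be needed so variations like
# "Acme's CEO, Jane Smith, ..." also fill the same frame.
```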

34
Why Text Is Hard
  • Natural language processing is AI-Complete.
  • Abstract concepts are difficult to represent
  • LOTS of possible relationships among concepts
  • Many ways to represent similar concepts
  • Tens or hundreds or thousands of
    features/dimensions
  • http://www.sims.berkeley.edu/hearst/talks/dm-talk/

35
Text is Hard
  • I saw Pathfinder on Mars with a telescope.
  • Pathfinder photographed Mars.
  • The Pathfinder photograph mars our perception of
    a lifeless planet.
  • The Pathfinder photograph from Ford has arrived.
  • The Pathfinder forded the river without marring
    its paint job.

36
Why Text is Easy
  • Highly redundant when you have a lot of it
  • Many relatively crude methods provide fairly good
    results
  • Pull out important phrases
  • Find meaningfully related words
  • Create summary from document
  • grep
  • Evaluating results is not easy; you need to know
    the question!