Semantic Web In Industry - PowerPoint PPT Presentation

Provided by: Guha8
Transcript and Presenter's Notes

1
Semantic Web In Industry
  • R. Guha

2
Two Levels of the Semantic Web
  • Deep Semantic Web
  • Intelligent agents performing inference
  • Semantic Web as distributed AI
  • Small problem: the AI problem is not yet solved
  • Shallow Semantic Web: using SW/Knowledge
    Representation techniques for
  • Data integration
  • Search
  • Is starting to see traction in industry

3
Integration: The new buzzword in business
  • Huge explosion in the number of new databases,
    applications, documents, in the 90s
  • Lots of redundancy, duplication -> high
    inefficiency
  • Economic pressures forcing consolidation and
    efforts to reduce inefficiency
  • Two aspects to integration: Process and Data
  • Process integration depends on data integration

4
Data Integration for Science
  • Many experimental fields will generate more data
    in the next 2 years than exists today
  • Large part of research consists of writing
    programs to analyze data, e.g., NASA
  • Tools to normalize, share, integrate data are stuck
    in the 80s (ftp, perl, ...)
  • Semantic Web could create a web of data that
    changes all this.
  • Example of the Internet Observatory

5
Varieties of Data Integration: Data Transformation
  • Data Transformation Example
  • Contact Information in SAP, Siebel, PeopleSoft, ...
  • We want to reflect updates in one data source
    into another

[Diagram: Clarify, Siebel, and PeopleSoft linked through an App. Server (XSLT, etc.)]

6
Varieties of Data Integration: Data Aggregation
  • Data Aggregation Example
  • Clinical trial data at Stanford, UCSF, Mayo
  • We want to give a Meta-analyst a uniform view of
    data from these different clinical trials
  • Example of how this would have helped recent meta
    studies such as the estrogen study
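
The uniform view for the meta-analyst can be sketched with ordinary relational views. This is only an illustration: the table names, column names, units, and sample rows are all made up for the example and do not come from any of the trials mentioned.

```python
import sqlite3

# Hypothetical per-site tables; real trial schemas will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stanford_trial (patient_id TEXT, weight_kg REAL);
CREATE TABLE ucsf_trial (pid TEXT, weight_lb REAL);
CREATE TABLE mayo_trial (subject TEXT, wt_kg REAL);

INSERT INTO stanford_trial VALUES ('s1', 70.0);
INSERT INTO ucsf_trial VALUES ('u1', 220.0);
INSERT INTO mayo_trial VALUES ('m1', 82.5);

-- One view normalizes column names and units across the three sites.
CREATE VIEW all_patients AS
  SELECT 'Stanford' AS site, patient_id AS patient, weight_kg AS weight_kg
    FROM stanford_trial
  UNION ALL
  SELECT 'UCSF', pid, weight_lb * 0.45359237 FROM ucsf_trial
  UNION ALL
  SELECT 'Mayo', subject, wt_kg FROM mayo_trial;
""")


def patients_over(min_weight_kg):
    """Query the uniform view without knowing any site-specific schema."""
    rows = conn.execute(
        "SELECT site, patient, weight_kg FROM all_patients "
        "WHERE weight_kg > ? ORDER BY site",
        (min_weight_kg,),
    )
    return [(site, patient, round(weight, 1)) for site, patient, weight in rows]
```

The meta-analyst only ever sees `all_patients`; the unit conversion and renaming live in the view definition, which is the part someone still has to write by hand for each source.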

[Diagram: Meta-Analyst -> DBMS (Relational Views) -> Stanford, UCSF, Mayo]

7
Data Integration Layers
  • Coping with software from different vendors
  • Oracle vs. DB2 vs. SQL Server: this is a solved
    problem
  • Coping with different formats
  • Relational vs. XML vs. ISAM: this too is a solved
    problem
  • Coping with different schemas
  • Solved for the small case where one person
    understands all the schemas
  • No products for the case where it is truly
    distributed
  • We know how to do it in theory, but lots of
    practical problems
  • Coping with data from unknown sources
  • Wide open: lots of unsolved problems

8
Typical Data Integration Methodology
  • Use a common namespace of terms for the concepts
    in the domain of the data sources being
    integrated, e.g., Employee, Customer, Patient,
    weight, height, bodyTemperature, ...
  • Mappings relate data items in data sources to
    terms in namespace
  • Transformation algorithms map queries in terms of
    common namespace into corresponding queries in
    terms of data source vocabularies
  • Background knowledge about terms is essential for
    transformations, e.g., Employee subClassOf
    Person; two people with the same last name, first
    name and street address are likely to be the
    same. I.e., the common namespace is really an
    Ontology
  • Mappings and common namespace are the workhorse
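
As a rough sketch of the methodology above, a query phrased in the common namespace can be rewritten into each source's vocabulary by looking terms up in a mapping table. The source names (`DB1`, `DB2`) and all table and column names here are hypothetical, and real transformation algorithms handle far more than renaming.

```python
# Common-namespace term -> name used inside each data source.
# Every table and column name here is an assumption for illustration.
MAPPINGS = {
    "DB1": {"Patient": "patient", "weight": "wt", "height": "ht"},
    "DB2": {"Patient": "subjects", "weight": "body_weight", "height": "body_height"},
}


def rewrite(concept, properties, source):
    """Turn a query over common terms into SQL over one source's schema."""
    mapping = MAPPINGS[source]
    columns = ", ".join(f"{mapping[prop]} AS {prop}" for prop in properties)
    return f"SELECT {columns} FROM {mapping[concept]}"


if __name__ == "__main__":
    for source in MAPPINGS:
        print(source, "->", rewrite("Patient", ["weight", "height"], source))
```

Because the result columns are aliased back to the common terms, answers from different sources line up under the same names.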

9
Role of Semantic Web in Data Integration
  • The XML stack (XML, XSD, XPath, XQuery, ...) does
    not have the concepts (objects, classes,
    properties, ...) required for representing
    ontologies
  • RDF/S does
  • Neither of them has a language for
    expressing mappings
  • But RDF/S, being closer to logic, has more of the
    machinery that is required
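
To make the class machinery concrete, here is a minimal sketch of the one inference that `rdfs:subClassOf` licenses, written over plain Python tuples rather than a real RDF library. The `ex:` names are assumptions for the example; only "Employee subClassOf Person" echoes the earlier slide.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# A tiny hand-written graph of (subject, predicate, object) triples.
triples = {
    ("ex:Employee", SUBCLASS_OF, "ex:Person"),
    ("ex:Manager", SUBCLASS_OF, "ex:Employee"),  # hypothetical extra class
    ("ex:alice", TYPE, "ex:Manager"),
}


def superclasses(cls, graph):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in graph:
            if subj == current and pred == SUBCLASS_OF and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def types_of(resource, graph):
    """Asserted types plus the types implied by the class hierarchy."""
    direct = {obj for subj, pred, obj in graph if subj == resource and pred == TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return inferred
```

A query for every `ex:Person` would now find `ex:alice` even though no source ever stated that directly, which is the kind of background knowledge the XML stack has no place to put.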

10
Kinds of Mappings
  • Simple structural
  • DB1.patient.weight corresponds to Patient's
    weight
  • Conditional structural
  • If DB1.patient.type equals Outpatient then
    DB1.patient.foo corresponds to Patient's visit
    duration
  • Term mappings
  • CA in DB1 corresponds to California in domain
    namespace
  • Object with ssn 7687667 in database 1 corresponds
    to object with id aksdks in database 2
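
The mapping kinds listed above can be sketched as one small function applied to a row from DB1. The row layout and the output property names are assumptions for illustration; the `CA` and `7687667` values are the ones used on the slide.

```python
# Term mapping: source code -> term in the domain namespace.
TERM_MAP = {"CA": "California"}

# Object mapping: identifier in database 1 -> identifier in database 2.
OBJECT_MAP = {"7687667": "aksdks"}


def map_record(row):
    """Translate one DB1.patient row (a dict) into common-namespace terms."""
    out = {}
    # Simple structural: the column corresponds directly to a property.
    out["weight"] = row["weight"]
    # Conditional structural: the meaning of 'foo' depends on another column.
    if row["type"] == "Outpatient":
        out["visitDuration"] = row["foo"]
    # Term mapping: fall back to the raw value when no mapping is known.
    out["state"] = TERM_MAP.get(row["state"], row["state"])
    # Object mapping: None when the object is unknown in database 2.
    out["id"] = OBJECT_MAP.get(row["ssn"])
    return out
```

For example, `map_record({"weight": 81, "type": "Outpatient", "foo": 45, "state": "CA", "ssn": "7687667"})` yields a record with `visitDuration` set, `state` expanded to `California`, and `id` translated to `aksdks` (the weight and duration values are made up).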

11
Challenges and non-challenges in data integration
  • Non-challenge: algorithms for doing the
    transformations (ISI, MCC, SU, AT&T)
  • Engineering Challenges
  • Creating large, useful ontologies that are shared
    by many
  • Creating mappings
  • Research Challenges
  • Semantic Drift
  • Fuzzy terms, probabilistic mappings
  • Trust

12
Engineering Challenges
  • Creating large, detailed ontologies is complex
    and expensive
  • But it is happening: CrossWorlds for business
    concepts, MAGE, etc. for medicine
  • Danger: some of them might turn out to be
    proprietary
  • Creating mappings is tedious and time consuming
  • Object mappings pose special challenges
  • Mappings need to be dynamic and constantly updated

13
Research Challenges with mappings
  • Semantic Drift
  • The meaning of terms, as interpreted by different
    members of a community, could drift over time
  • Cyc experience shows that Description Logic
    mechanisms are not adequate for either detecting
    or fixing these
  • Fuzzy mappings
  • E.g., Walmart's concept of chair is similar to
    but not the same as MoMA's concept of chair
  • Probabilistic mappings
  • There is an 82% likelihood that Michael Jordan in
    database 1 is the same as Michael Jordan in
    database 2
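
One way such a likelihood could be computed is a Fellegi-Sunter-style record linkage score, sketched here. This is a swapped-in standard technique, not something the slides specify, and the field names and the (m, u) parameters are hypothetical.

```python
def match_probability(rec_a, rec_b, prior, field_params):
    """Probability that two records denote the same object.

    field_params maps a field name to (m, u), where
      m = P(field values agree | records are the same object)
      u = P(field values agree | records are different objects)
    Fields are treated as independent, which is a simplifying assumption.
    """
    odds = prior / (1.0 - prior)
    for field, (m, u) in field_params.items():
        if rec_a.get(field) == rec_b.get(field):
            odds *= m / u  # agreement is evidence for a match
        else:
            odds *= (1.0 - m) / (1.0 - u)  # disagreement is evidence against
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical parameters: names agree often by chance, birth years less so.
    params = {"name": (0.9, 0.3), "birth_year": (0.95, 0.05)}
    a = {"name": "Michael Jordan", "birth_year": 1963}
    b = {"name": "Michael Jordan", "birth_year": 1956}
    print(match_probability(a, b, prior=0.5, field_params=params))
```

The output is a mapping with a weight attached, so downstream queries need some policy for how to treat matches that are likely but not certain.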

14
Other data web related challenges
  • Trust: How should the program know whether to
    trust some new data source?
  • Without this, we will only have closed systems
  • Options: centralized approaches like UDDI or
    decentralized approaches like WOTs
  • Inverse trust: how can I trust you not to
    indiscriminately distribute my data? A big issue
    with fresh scientific data
  • Systems challenges
  • Caching
  • Preventing accidental DoS attacks
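
The two systems challenges tend to be handled together on the client side. The following is a minimal sketch, assuming a caller-supplied `fetch(url)` function; the class name, the default interval, and the cache lifetime are all hypothetical choices.

```python
import time
from urllib.parse import urlparse


class PoliteFetcher:
    """Caches responses and spaces out requests to the same host."""

    def __init__(self, fetch, min_interval=1.0, ttl=300.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch              # callable: url -> response body
        self._min_interval = min_interval
        self._ttl = ttl
        self._clock = clock
        self._sleep = sleep
        self._cache = {}                 # url -> (timestamp, body)
        self._last_request = {}          # host -> timestamp of last fetch

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]             # fresh enough: no network traffic
        host = urlparse(url).netloc
        last = self._last_request.get(host)
        if last is not None:
            wait = self._min_interval - (now - last)
            if wait > 0:
                self._sleep(wait)        # throttle instead of hammering the host
        body = self._fetch(url)
        stamp = self._clock()
        self._last_request[host] = stamp
        self._cache[url] = (stamp, body)
        return body
```

An agent that follows links across many data sources would route every request through one such object, so a tight loop in the agent cannot turn into a flood against any single source.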

15
Forecast for SW and Data Integration
  • We already have a number of data integration
    tools on the market
  • We are seeing the first generation of ontology-based
    data integration tools from small companies
  • At least some of the big players will probably
    have some offerings for doing data integration
    based on Semantic Web concepts in the near future
  • Whether they use Semantic Web formats and
    acronyms is an open question
  • These common vocabularies will exhibit very
    strong network effects

16
Semantic Web for Search: Going beyond search as Location Bar
  • Keywords -> a particular page
  • Typically a home page or well known hub page
  • United airlines -> www.united.com
  • Unix -> gnu.org, linux.org, freebsd.org
  • Search as a smarter location bar
  • PageRank is ideally suited for this
  • This is largely a solved problem

17
Varieties of Search: Research searches
  • User is searching for info about something
  • Could be directed: user is looking for a
    particular property
  • Price of something, location of some event, ...
  • Or undirected: user is looking for some general
    class of properties
  • Reviews/feedback on product, info on person or
    country
  • If there is no hub page on the thing, existing
    search engines perform very poorly
  • New focus is on this class of searches

18
Semantic Web for Search
  • Keyword-based approaches haven't made significant
    advances since PageRank
  • Improvements may be gained by adding a modicum
    of understanding about the object denoted by
    the search query
  • Improvements not just in search itself but also
    in the relevance of search related advertising

19
Basic Issues
  • Need database of potential objects user may be
    referring to, along with some properties of the
    object, e.g., its type
  • Too many objects to manually construct DB
  • At least 300 million distinct object references
    on Web
  • If the search engine does know something more about
    the search term's denotation (e.g., it denotes a
    musician), how can it do better?

20
Building the Web KB
  • Many different automated approaches
  • Simple natural language processing (Riloff, TAP,
    ...)
  • Scrapers
  • Machine Learning
  • Most commercial efforts lead to proprietary KBs
  • Huge opportunity for wider SW community
  • Collaborate to actually create the KB

21
Using the KB
  • Word Sense Disambiguation, e.g., MSN Search,
    Teoma
  • Incorporating data feeds into search results.
    E.g., MSN with popular musicians
  • Incorporating object type specific actions. E.g.,
    Google with addresses and stock symbols
  • Coming soon: KB construction driven by ads
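
A toy sketch of how a KB could drive the first and third uses above: pick a sense for an ambiguous term from the other query words, then look up an action by the object's type. The KB entries, cue words, and action strings are invented for the example and do not describe how any of the named search engines work.

```python
# Hypothetical KB: term -> candidate senses, each with a type and cue words.
KB = {
    "jaguar": [
        {"type": "Animal", "cues": {"cat", "habitat", "zoo"}},
        {"type": "CarMaker", "cues": {"dealer", "price", "xj"}},
    ],
}

# Hypothetical type-specific actions.
ACTIONS = {
    "Animal": "show encyclopedia summary",
    "CarMaker": "show dealer listings",
}


def interpret(query):
    """Return (term, type, action) for the first KB term in the query, else None."""
    words = query.lower().split()
    for i, word in enumerate(words):
        senses = KB.get(word)
        if not senses:
            continue
        context = set(words[:i] + words[i + 1:])
        # Disambiguate by overlap with context; ties go to the first listed sense.
        best = max(senses, key=lambda sense: len(sense["cues"] & context))
        return word, best["type"], ACTIONS[best["type"]]
    return None
```

Queries with no KB term fall through to ordinary keyword search, so the KB only has to add value where it actually knows something.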

22
Conclusions
  • Please help Eric Miller