Title: Semantic Web In Industry
1Semantic Web In Industry
2Two Levels of the Semantic Web
- Deep Semantic Web
- Intelligent agents performing inference
- Semantic Web as distributed AI
- Small problem: the AI problem is not yet solved
- Shallow Semantic Web: using SW/Knowledge Representation techniques for
- Data integration
- Search
- Is starting to see traction in industry
3Integration: The new buzzword in business
- Huge explosion in the number of new databases, applications, and documents in the 90s
- Lots of redundancy, duplication -> high inefficiency
- Economic pressures forcing consolidation and efforts to reduce inefficiency
- Two aspects to integration: Process and Data
- Process integration depends on data integration
4Data Integration for Science
- Many experimental fields will generate more data in the next 2 years than exists today
- Large part of research consists of writing programs to analyze data, e.g., NASA
- Tools to normalize, share, integrate data stuck in the 80s (ftp, perl, ...)
- Semantic Web could create a web of data that changes all this
- Example of the Internet Observatory
5Varieties of Data Integration: Data Transformation
- Data Transformation Example
- Contact Information in SAP, Siebel, PeopleSoft, ...
- We want to reflect updates in one data source into another
[Diagram: Clarify, Siebel, and PeopleSoft linked through an App. Server (XSLT, etc.)]
6Varieties of Data Integration: Data Aggregation
- Data Aggregation Example
- Clinical trial data at Stanford, UCSF, Mayo
- We want to give a Meta-analyst a uniform view of data from these different clinical trials
- Example of how this would have helped recent meta studies such as the estrogen study
[Diagram: Meta-Analyst using relational views in a DBMS over data from UCSF, Stanford, and Mayo]
7Data Integration Layers
- Coping with software from different vendors
- Oracle vs. DB2 vs. SQL Server: this is a solved problem
- Coping with different formats
- Relational vs. XML vs. ISAM: this too is a solved problem
- Coping with different schemas
- Solved for the small case where one person understands all the schemas
- No products for the case where it is truly distributed
- We know how to do it in theory, but lots of practical problems
- Coping with data from unknown sources
- Wide open: lots of unsolved problems
8Typical Data Integration Methodology
- Use a common namespace of terms for the concepts in the domain of the data sources being integrated, e.g., Employee, Customer, Patient, weight, height, bodyTemperature, ...
- Mappings relate data items in data sources to terms in namespace
- Transformation algorithms map queries in terms of common namespace into corresponding queries in terms of data source vocabularies
- Background knowledge about terms essential for transformations, e.g., Employee subClassOf Person; 2 people with the same last name, first name and street address are likely to be the same. I.e., common namespace is really an Ontology
- Mappings and common namespace are the workhorse
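
The methodology above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any product: the source names (DB1, DB2), their table and column names, and the translate helper are all hypothetical.

```python
# Hypothetical mappings from common-namespace terms to each source's vocabulary.
MAPPINGS = {
    "DB1": {"table": "patient", "columns": {"weight": "wt_kg", "height": "ht_cm"}},
    "DB2": {"table": "pat_record", "columns": {"weight": "body_weight", "height": "body_height"}},
}


def translate(source, terms):
    """Rewrite a query over common terms into SQL text for one data source."""
    mapping = MAPPINGS[source]
    # Look up each common term; a KeyError signals a missing mapping.
    columns = [f"{mapping['columns'][t]} AS {t}" for t in terms]
    return f"SELECT {', '.join(columns)} FROM {mapping['table']}"


if __name__ == "__main__":
    # The same common-namespace query, expressed in each source's vocabulary.
    for source in MAPPINGS:
        print(source, "->", translate(source, ["weight", "height"]))
```

The caller only ever names common terms such as weight; the per-source vocabulary stays inside the mapping table, which is why the mappings carry most of the work.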
9Role of Semantic Web in Data Integration
- The XML stack (XML, XSD, XPath, XQuery, ...) does not have the concepts (objects, classes, properties, ...) required for representing ontologies
- RDF/S does
- Neither of them has a language for expressing mappings
- But RDF/S, being closer to logic, has more of the machinery that is required
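
As a rough sketch of the kind of machinery meant here, the snippet below stores facts as subject/predicate/object triples and follows subClassOf links to infer extra types. It uses plain Python tuples rather than an RDF library; Employee subClassOf Person comes from the earlier slide, while Manager and alice are made up for illustration.

```python
# Triples in the style of RDF: (subject, predicate, object).
# "Manager" and "alice" are illustrative names, not from the talk.
TRIPLES = {
    ("Employee", "subClassOf", "Person"),
    ("Manager", "subClassOf", "Employee"),
    ("alice", "type", "Manager"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(instance, triples):
    """Asserted types plus the types implied by the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == instance and p == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


if __name__ == "__main__":
    print(sorted(types_of("alice", TRIPLES)))
```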
10Kinds of Mappings
- Simple structural
- DB1.patient.weight corresponds to Patient's weight
- Conditional structural
- If DB1.patient.type equals Outpatient then DB1.patient.foo corresponds to Patient's visit duration
- Term mappings
- CA in DB1 corresponds to California in domain namespace
- Object with ssn 7687667 in database 1 corresponds to object with id aksdks in database 2
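
The first three kinds of mapping can be illustrated with a small Python sketch. The record layout, the state field, and the output key names are assumptions made for the example; only the weight, type, foo, and CA/California cases come from the slide.

```python
# Hypothetical record from DB1; the "state" field is an assumption.
record = {"type": "Outpatient", "weight": 70, "foo": 45, "state": "CA"}

# Term mapping: source codes to terms in the domain namespace.
TERM_MAP = {"CA": "California"}


def to_common(rec):
    """Apply simple, conditional, and term mappings to one DB1 patient record."""
    out = {}
    # Simple structural: DB1.patient.weight -> Patient's weight.
    out["weight"] = rec["weight"]
    # Conditional structural: foo means visit duration only for outpatients.
    if rec["type"] == "Outpatient":
        out["visitDuration"] = rec["foo"]
    # Term mapping: unknown codes pass through unchanged.
    out["state"] = TERM_MAP.get(rec["state"], rec["state"])
    return out


if __name__ == "__main__":
    print(to_common(record))
```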
11Challenges and non-challenges in data integration
- Non-challenge algorithms for doing the
transformations (ISI, MCC, SU ATT) - Engineering Challenges
- Creating large, useful ontologies that are shared
by many - Creating mappings
- Research Challenges
- Semantic Drift
- Fuzzy terms, probabilistic mappings
- Trust
12Engineering Challenges
- Creating large, detailed ontologies is complex and expensive
- But it is happening: CrossWorlds for business concepts, MAGE, etc. for medicine
- Danger: some of them might turn out to be proprietary
- Creating mappings is tedious and time consuming
- Object mappings pose special challenges
- Mappings need to be dynamic and constantly updated
13Research Challenges with mappings
- Semantic Drift
- The meaning of terms, as interpreted by different members of a community, could drift over time
- Cyc experience shows that Description Logic mechanisms are not adequate for either detecting or fixing these
- Fuzzy mappings
- E.g., Walmart's concept of chair is similar to but not the same as MOMA's concept of chair
- Probabilistic mappings
- There is an 82% likelihood that Michael Jordan in database 1 is the same as Michael Jordan in database 2
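
One way a probabilistic object mapping could be computed is to combine per-field agreement into a match probability, in the style of classical record linkage. The sketch below is an assumption-laden illustration, not the method used in the talk: the field names, the m/u probabilities, and the prior are invented, and the fields are treated as conditionally independent.

```python
# Assumed per-field probabilities (not from the source):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_WEIGHTS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "birth_year": (0.98, 0.02),
}


def match_probability(a, b, prior=0.001):
    """Combine field agreements into P(a and b denote the same object)."""
    odds = prior / (1.0 - prior)
    for field, (m, u) in FIELD_WEIGHTS.items():
        if a[field] == b[field]:
            odds *= m / u                  # agreement raises the odds
        else:
            odds *= (1.0 - m) / (1.0 - u)  # disagreement lowers them
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Made-up records that share a name but nothing else.
    db1 = {"name": "Michael Jordan", "city": "Chicago", "birth_year": 1963}
    db2 = {"name": "Michael Jordan", "city": "Berkeley", "birth_year": 1956}
    print(round(match_probability(db1, db2), 4))
    print(round(match_probability(db1, dict(db1)), 4))
```

A shared name alone leaves the probability low under these assumed weights; agreement on several fields is what pushes it toward 1.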
14Other data web related challenges
- Trust: How should the program know whether to trust some new data source?
- Without this, we will only have closed systems
- Options: centralized approaches like UDDI or decentralized approaches like WOTs (webs of trust)
- Inverse trust: how can I trust you not to indiscriminately distribute my data? A big issue in fresh scientific data
- Systems challenges
- Caching
- Preventing accidental DOS attacks
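
The two systems challenges are related: a program that re-fetches the same source for every query can overload it by accident, and a cache is one simple guard. A minimal sketch follows, assuming the caller supplies the fetch function; the class name PoliteCache and its parameters are hypothetical.

```python
import time


class PoliteCache:
    """Cache fetched data and space out repeat requests to each source."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic):
        self._fetch = fetch              # callable: url -> data (supplied by caller)
        self._min_interval = min_interval
        self._clock = clock              # injectable so tests can control time
        self._cache = {}                 # url -> (fetched_at, data)

    def get(self, url):
        now = self._clock()
        entry = self._cache.get(url)
        # Serve from cache if the source was contacted too recently.
        if entry is not None and now - entry[0] < self._min_interval:
            return entry[1]
        data = self._fetch(url)
        self._cache[url] = (now, data)
        return data


if __name__ == "__main__":
    cache = PoliteCache(lambda url: f"data from {url}", min_interval=60.0)
    print(cache.get("http://example.org/feed"))
    print(cache.get("http://example.org/feed"))  # answered from the cache
```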
15Forecast for SW and Data Integration
- We already have a number of data integration tools on the market
- We are seeing the first generation of ontology based data integration tools from small companies
- At least some of the big players will probably have some offerings for doing data integration based on Semantic Web concepts in the near future
- Whether they use Semantic Web formats and acronyms is an open question
- These common vocabularies will exhibit very strong network effects
16Semantic Web for Search: Going beyond search as Location Bar
- Keywords -> a particular page
- Typically a home page or well known hub page
- United airlines -> www.united.com
- Unix -> gnu.org, linux.org, freebsd.org
- Search as a smarter location bar
- Page rank is ideally suited for this
- This is largely a solved problem
17Varieties of Search: Research searches
- User is searching for info about something
- Could be directed: user is looking for a particular property
- Price of something, location of some event, ...
- Or undirected: user is looking for some general class of properties
- Reviews/feedback on product, info on person or country
- If there is no hub page on the thing, existing search engines perform very poorly
- New focus is on this class of searches
18Semantic Web for Search
- Keyword based approaches haven't made significant advances since PageRank
- Improvements may be gained by adding a modicum of understanding about the object denoted by the search query
- Improvements not just in search itself but also in the relevance of search related advertising
19Basic Issues
- Need database of potential objects user may be referring to, along with some properties of the object, e.g., its type
- Too many objects to manually construct DB
- At least 300 million distinct object references on Web
- If it does know something more about the search term's denotation (e.g., it denotes a musician), how can the search engine do better?
20Building the Web KB
- Many different automated approaches
- Simple natural language processing (Riloff, TAP, ...)
- Scrapers
- Machine Learning
- Most commercial efforts lead to proprietary KBs
- Huge opportunity for wider SW community
- Collaborate to actually create the KB
21Using the KB
- Word Sense Disambiguation, e.g., MSN Search, Teoma
- Incorporating data feeds into search results, e.g., MSN with popular musicians
- Incorporating object type specific actions, e.g., Google with addresses and stock symbols
- Coming soon: KB construction driven by ads
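
A toy version of these uses might look like the following. It is a sketch under stated assumptions: the KB entries, type names, and action strings are all hypothetical, and a real KB would be far larger and built by the automated approaches listed earlier.

```python
# A toy KB mapping query strings to candidate objects with a type.
# All entries and action names are hypothetical.
KB = {
    "jaguar": [
        {"id": "jaguar_cat", "type": "Animal"},
        {"id": "jaguar_cars", "type": "Company"},
    ],
    "yo-yo ma": [{"id": "yo_yo_ma", "type": "Musician"}],
}

# Actions a search engine could attach to results, keyed by object type.
ACTIONS = {
    "Musician": ["show discography", "show tour dates"],
    "Company": ["show stock quote"],
}


def enrich(query):
    """Return candidate senses for a query and type-specific actions for each."""
    senses = KB.get(query.lower().strip(), [])
    return [
        {"id": s["id"], "type": s["type"], "actions": ACTIONS.get(s["type"], [])}
        for s in senses
    ]


if __name__ == "__main__":
    for q in ("Jaguar", "Yo-Yo Ma", "something unknown"):
        print(q, "->", enrich(q))
```

More than one returned sense signals that disambiguation is needed; a single sense with a known type lets the engine attach a data feed or a type-specific action.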
22Conclusions