Title: Information Integration the Web Way
1Information Integration the Web Way
- Andrew Schain, Kendall Clark
2Background
- It all started innocently enough: I was at a
conference 8 months ago and heard about a problem
Jeanne was having.
- It is an interesting problem.
3The Problem
- It is virtually impossible to discover critical
information that you did not know existed and
extremely difficult to find relevant information
you are aware of.
- Our data problem exists within at least 5
dimensions: size, complexity, diversity, rate of
growth, and trust.
- Use-case scenarios and requirements change all
the time.
- We cannot anticipate in advance what the next
collection of information elements needs to be or
for what purpose!
4The challenge
- Integrate information from disjoint data sources,
ad hoc, to meet customer needs
- Without upsetting delicate info-ecologies (data
owners, curators, extant policies and procedures)
- Without requiring major investment in time or
5The inspiration...
6The Goal
- Alleviate NASA's data management problem by
making information discoverable with machine
assistance and retrievable as an integrated
response across different databases,
repositories, sources, and systems.
7The design principles
8- Aggregate and federate information
- Deploy a service that makes the whole
infrastructure smarter
- Leverage public standards
- Innovate in the user interface
- Formalize information models
9Info Federation
- Must leave data in situ, close to those who know
it best
- Must not upset delicate info-niches by alienating
curators or owners, or by violating policies or politics
- Use a sufficiently expressive federation
technology (in this case, W3C's RDF)
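As a rough illustration of why a triple model suits federation, the sketch below treats each source's exported data as a set of (subject, predicate, object) triples and merges them by set union. The source names, the `ex:` identifiers, and predicates such as `ex:worksAt` are hypothetical, and no real RDF library is used; a production system would rely on an actual RDF store.

```python
# Hypothetical triples exported by two independently curated sources.
# The data stays with its owners; only an RDF-style view is shared.
hr_source = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "ex:worksAt", "ex:facilityA"),
}
project_source = {
    ("ex:alice", "ex:memberOf", "ex:projectX"),
    ("ex:projectX", "ex:requiresSkill", "ex:thermalAnalysis"),
}


def federate(*sources):
    """Merge triple sets; shared identifiers link the sources together."""
    merged = set()
    for triples in sources:
        merged |= triples
    return merged


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


graph = federate(hr_source, project_source)
alice_facts = describe(graph, "ex:alice")
```

Because both sources use the same identifier for Alice, the merged view answers questions neither source could answer alone, without moving or restructuring the underlying data.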
10Deploy a service
- POPS is an expertise locator app, but...
- Also a service, deployed in the fabric of NASA
application infrastructure
- Thus, the POPS data is reusable by other apps
- Lower barriers to ad hoc reuse
11Leverage public standards
- Why? Humility, laziness, tiny budget!
- Promotes reuse, cohesion with existing
technologies
- Open Source software is our friend
- Return on Investment
- They work! :)
12Some candidate public standards
- HTTP, SOAP, WSDL, SPARQL Protocol for RDF
- XML, RDF, RDFS, JSON, OWL
- SPARQL Query Language for RDF
- FOAF, DOAP
- Atom Syndication Format
13Innovative UI
- Be different? Look-feel-and-act different.
- JSpace, a Polyarchical Visual Query Builder for
Federated RDF Stores
- Social Network visualizations
- What the hell is a polyarchy?
14JSpace
- A polyarchy is a means of interacting with
multiple intersecting hierarchies
- Which is precisely what many information
integration problems are (people, orgs,
projects, skills)
- The backend is only half the problem
15Visual Query Builder?
- Folks can learn a QL, but why?
- Get the machine to build queries based on regular
and customary user input and browsing
- Browsing is better than searching
- Add query-by-example (find another thing like
this thing except with this difference...)
- Propinquity!!!
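One way to picture a machine-built query is sketched below: facets a user clicks while browsing are turned into SPARQL triple patterns, and query-by-example copies an existing selection and overrides one facet. This is not JSpace's implementation; the function names, the `ex:` vocabulary, and the `http://example.org/` namespace are all hypothetical.

```python
PREFIX = "PREFIX ex: <http://example.org/>"  # hypothetical namespace


def build_query(selections):
    """Turn {predicate: value} browsing selections into a SPARQL SELECT."""
    patterns = [
        f"  ?person {predicate} {value} ."
        for predicate, value in sorted(selections.items())
    ]
    body = "\n".join(patterns)
    return f"{PREFIX}\nSELECT ?person WHERE {{\n{body}\n}}"


def query_by_example(example, changes):
    """Copy an example's selections, then override the differing facets."""
    return build_query({**example, **changes})


# Facets a user might have clicked while browsing (hypothetical vocabulary).
example = {"ex:hasSkill": "ex:thermalAnalysis", "ex:worksAt": "ex:facilityA"}

# "Find someone like this, except at a different facility."
query = query_by_example(example, {"ex:worksAt": "ex:facilityB"})
print(query)
```

The user never sees the query text; each click only adds or changes one entry in the selection dictionary.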
16Formalizing Information
- Use OWL (Web Ontology Language) to formalize the
problem domain(s)
- Why?
- Correctness, create shared understanding,
regulatory compliance (DRM)
- To prepare for the eventual semantic upshift
17Other stuff we probably need
- Model Libraries
- Data Access agreements
- Data assurance
- Model assurance
- Good go-to application models
- Desire and commitment
- Lots more
18What's next for POPS?
- We have buy-in and a project plan with the OCE. We
will validate our agreement (plan) for
implementation within the NEN and have it done
before I go on vacation in August.
- Continue working with Clark & Parsia (Kendall,
Bijan), Jen Golbeck, Mike Grove, Chris Shenton, and
others, including folks on my SAIC team, to build
a similar service at HQ for EA as-builts.
19What about the rest of us?
- Let's throw a party!
- For our comrades who are current practitioners
- Give them a blank piece of paper to write down
stuff that would make things easier and stuff
that makes things really hard
- Invite some folks we want to make friends with
and work the list together
20Examples
21Models in Federated Libraries
- Domain-specific references that can be used by
developers
- Domain-specific information representations
(complete with logic, cardinality, etc.) that can
be used to form queryable information that cuts
across sources
- Code repositories
- Web Services repositories so that task-orientated
computing services can be discovered, assessed,
choreographed, and orchestrated
22Data access agreements
- Between who and who (and who is keeping track?)
- Valid?
- Has it gone through a validity checker (y/n)?
- Current?
- Is it fresh? (It may not need to be, but we need to
know.)
- Provenance?
- Who is the responsible person for the system and
for the data?
- Access Permissions?
- Given the set of data required, does the access
permission change?
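The questions above could be captured in a machine-checkable record. The sketch below is one possible shape, not an existing NASA schema: the class name, field names, sample parties, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class AccessAgreement:
    provider: str                 # who shares the data
    consumer: str                 # who receives it
    steward: str                  # responsible person (provenance)
    validated: bool               # has it passed a validity checker?
    last_refreshed: date          # when the data was last updated
    max_age_days: Optional[int]   # None means freshness is not required
    permitted_fields: frozenset   # fields the consumer may read

    def is_current(self, today):
        """Fresh enough, or freshness not required for this agreement."""
        if self.max_age_days is None:
            return True
        return today - self.last_refreshed <= timedelta(days=self.max_age_days)

    def allows(self, requested_fields):
        """Access depends on the exact set of data being requested."""
        return set(requested_fields) <= self.permitted_fields


# Hypothetical agreement between an HR system and POPS.
agreement = AccessAgreement(
    provider="HR system", consumer="POPS", steward="J. Doe",
    validated=True, last_refreshed=date(2006, 3, 1), max_age_days=30,
    permitted_fields=frozenset({"name", "skills"}),
)
today = date(2006, 3, 20)
ok = (agreement.validated
      and agreement.is_current(today)
      and agreement.allows({"name"}))
```

Keeping the checks as small methods makes it easy to answer each question separately and to report which one failed.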
23But really to
- Articulate the goal
- Develop the planning, including gathering
requirements, prioritizing tasks, identifying
resources, and setting up a road map for the next
few years.
- Some of you will be invited, if not to the first
one (April), then to the lollapalooza InterOp
24Backups
25Mathematics of the who-knows-who relationship
visualization
Given a set of people, $P$, and a set of
relationships, $R$, that connect people and
entities, we define five types of relationships:
1) same facility, 2) same department, 3) same
skill and department, 4) same skill and project,
5) same skill, project, and facility. Call these
$r^{1}$ through $r^{5}$. $r^{i}_{x,y}$ indicates a
relationship of type $i$ between person $x$ ($p_x$)
and person $y$ ($p_y$).

There is a direct connection between users $p_u$
and $p_s$ if there exists an $r^{m}_{u,s}$. If there
is not a direct connection, we search for a path
from $p_u$ to $p_s$ by finding $p_a$ such that there
exist $r^{m}_{u,a}$ and $r^{n}_{a,s}$. Then, we add
$(p_u, p_s, p_a, r^{m}_{u,a}, r^{n}_{a,s})$ to the
graph.

For example, if Alice is the user and Bob is the
selected person, we will look for a direct
relationship between them, such as Alice and Bob
both working in the same department (i.e., find
$r^{m}_{alice,bob}$). If the direct relationship
does not exist, we look at all the people Alice has
relationships with and check whether any of them
also have relationships with Bob. For example,
Alice may work in the same facility as Chuck
($r^{1}_{alice,chuck}$). Chuck, in turn, may have
the same skill and work on the same project as Bob
($r^{4}_{chuck,bob}$). Chuck then becomes a
connection between Alice and Bob. All three people
and their relationships are added to the graph.
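The procedure just described can be sketched in a few lines. This is a minimal illustration, not the visualization's actual code: the function names and the dictionary layout are hypothetical, and relationships are assumed to be symmetric (which follows from their "same facility", "same department" wording).

```python
# Relationship types 1..5 from the slide, keyed by type number.
REL_TYPES = {
    1: "same facility",
    2: "same department",
    3: "same skill and department",
    4: "same skill and project",
    5: "same skill, project, and facility",
}


def relations(rels, x, y):
    """Return the relationship types that hold between x and y (symmetric)."""
    return sorted(rels.get(frozenset((x, y)), set()))


def connect(people, rels, user, selected):
    """Return graph entries linking user to selected.

    Direct:   [(user, selected, type), ...]
    Indirect: [(user, selected, via, type_user_via, type_via_selected), ...]
    """
    direct = relations(rels, user, selected)
    if direct:
        return [(user, selected, m) for m in direct]
    entries = []
    for via in sorted(people - {user, selected}):
        for m in relations(rels, user, via):
            for n in relations(rels, via, selected):
                entries.append((user, selected, via, m, n))
    return entries


# The Alice / Bob / Chuck example from the slide.
people = {"alice", "bob", "chuck"}
rels = {
    frozenset(("alice", "chuck")): {1},  # same facility
    frozenset(("chuck", "bob")): {4},    # same skill and project
}
path = connect(people, rels, "alice", "bob")
```

As on the slide, the search stops at one intermediary; longer chains would need a breadth-first search over the same relationship table.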