Title: The Niagara Project
1The Niagara Project
- I have avoided networking like the plague. I am
terrified of getting a connection because its
like drinking from Niagara Falls. - - Arthur C. Clarke
2Who is working on Niagara?
- Professors DeWitt and Naughton _at_ UW, Maier _at_ OGI
- Students Lots of them!
- See http//www.cs.wisc.edu/niagara
3Goals of the Niagara Project
- In broadest terms, to
- improve the precision of Internet searching
- allow queries over the whole Internet (the FROM
clause) - work over streams as well as static files
- monitor the Internet for changes
- Not finished yet...
4Current status
- Completed three java prototypes
- A text-in-context XML search engine.
- An XML-QL query engine.
- An XML-QL trigger engine.
- Doing the same thing (again) in C, maybe with
Quilt as query language. - Finding (solving?) interesting research problems
along the way...
5Text-in-Context XML SE
- Rather than ask
- What are all the documents that contain the
string Montreal? - We can ask
- What are all the documents that contain ship
departure information for a ship whose name is
Montreal?
6How it works
- Locate documents by crawling the web or using
explicit input from user. - Build local index on these docs that supports
fast evaluation of Search Engine Query Language
(SEQL) queries. - Return URLs of documents that satisfy SEQL
queries. - Two uses stand alone, or part of XML-QL
7XML-QL Query Engine
- Evaluates queries expressed in XML-QL.
- Result is XML
- Different from Search Engine
Instead of asking Find all files with ship
departure events where the ships name
is Montreal? We can ask What is a list of
departure dates for ships named Montreal?
8Ex Fragment of XML file...
Electrical Engineering
Robertson
Pedro
6988086 Robertson.Pedr
o_at_foo.edu 660 lty
9XML-QL Query...
WHERE
"Electrical Engineering"
v2
v3
content_as v4 CONSTRUCT
v4
10Important Question
- Which documents should be consulted to answer an
XML-QL query? We support three approaches - explicitly listed documents (in foo.xml)
- documents conforming to DTD (conforms to
some_dtd.xml) - documents that satisfy search engine predicates
extracted from query
11Example of third approach
- Given the previous XML-QL query finding first and
last names of EE faculty members, the system will
extract this Search Engine query
department CONTAINS (deptname IS "Electrical
Engineering" AND faculty CONTAINS
name CONTAINS (lastname AND
firstname))
12Control Flow for Typical Query
- So full flow of typical XML-QL query
- user submits XML-QL query
- system extracts SEQL query from XML-QL, passes it
to search engine - search engine evaluates SEQL query, returns list
of URLs to XML-QL query engine - XML-QL engine fetches documents from URL list,
evaluates query - Answer returned to the user.
13XML-QL Trigger Engine
- Goal
- allow users to define triggers on XML files
using XML-QL predicates. - Scale to huge numbers of triggers by exploiting
commonality among sets of triggers.
14Some research topics...
- Semantics and impl. of queries over streams?
- Use RDBMS for anything at all?
- How smart should the search engine be?
- Can you use caching anywhere?
- Query optimization plan space, stats?
- How should you index (cached?) XML?
- What do you do with queryable sources?
- How do you handle huge s of triggers?
- Performance, performance, performance.
15A Petabyte in your Pocket
- David DeWitt,
- Dave Maier _at_ OGI,
- Jeff Naughton
16Title of NSF ITR Project
- What does it mean?
- Goal is to have, available from a PDA, your
evolving and customized view of all the on-line
digital data that exists anywhere. - Goal is not to develop holographic memory
technology or DNA-based storage units.
17What the PetDB is
- An example of what can be done with new software
infrastructure termed Net Data Managers (NDMs.) - NDMs
- focus on data movement as well as storage
- store and query data of arbitrary types without a
schema having been defined - execute queries and triggers over tens of
thousands of information sites
18Connection with Niagara...
- Niagara is a very early prototype of a simple
NDM. - Project goal
- continue developing Niagara and working on
research problems that arise - prototype a simple NDM application using Niagara
to see if we are on the right track
19For more information...
- Web site http//www.cs.wisc.edu/niagara
- Talk to me or any other Niagara project member...