Title: R-GMA: a new API
1. R-GMA: a new API
Andy Cooke / Heriot-Watt University <ceeawc_at_macs.hw.ac.uk>
2. A new, improved API
- Why did we need to improve R-GMA's API?
  - The old API had flaws:
    - Local buffers, so it couldn't guarantee not to lose tuples.
    - Static queries: execute() didn't make sense.
    - Databases couldn't be cleaned.
- My work since October
  - Started implementing some of the new API.
  - Lots of servlet refactoring (at last!).
  - Two new producers: LatestProducer and ResilientStreamProducer.
4. API Features: APIBase
- disconnect() / reconnect()
  - For APIs that are used infrequently: their machines can now be switched off!
- setAutoInsertTimeStampEnabled()
  - Every tuple must have a timestamp, but what does it mean: the time the tuple was inserted, or the time the measurement was made?
- setTerminationInterval() and showSignOfLife()
  - The API must send heartbeats to its servlet in order to stay alive, or to stay registered (GRRP protocols).
- setTupleChecking()
  - In case we don't trust that the tuples are correct!
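The heartbeat idea behind setTerminationInterval() and showSignOfLife() can be sketched as follows. This is an illustration only, not R-GMA's code: the class ServletSideInstance, the is_alive() method and the explicit "now" arguments are assumptions, chosen so that the example runs without a real clock.

```python
class ServletSideInstance:
    """Hypothetical servlet-side record of one API instance."""

    def __init__(self, termination_interval, now):
        self.termination_interval = termination_interval  # seconds
        self.last_sign_of_life = now

    def set_termination_interval(self, seconds):
        self.termination_interval = seconds

    def show_sign_of_life(self, now):
        # A heartbeat simply resets the timer.
        self.last_sign_of_life = now

    def is_alive(self, now):
        # The instance is dropped once it has been silent for too long.
        return now - self.last_sign_of_life <= self.termination_interval


instance = ServletSideInstance(termination_interval=60, now=0)
instance.show_sign_of_life(now=50)
print(instance.is_alive(now=100))  # 50 s since the last heartbeat
print(instance.is_alive(now=200))  # 150 s since the last heartbeat
```

A longer termination interval is what would let an infrequently used API disconnect and switch its machine off without being dropped.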
5. API Features: Declarable, Insertable, Cleanable
- Declarable: declareTable() / undeclareTable()
  - Declarables are publishers that register their views.
  - Note: it's possible (soon!) for declarables to introduce new tables to the schema.
- Insertable: insert()
  - Insertables are stream publishers that insert streams of tuples.
  - These tuples take the form INSERT INTO cpuLoad VALUES
  - Now a vector of tuples may be inserted in one go: if the method returns, then R-GMA's servlet received them safely!
- Cleanable: declareTable(..., cleanUpPredicate, cleanUpInterval)
  - Cleanables are stream publishers that are connected to servlets that store tuples locally using a database: DatabaseProducers (a history producer) and LatestProducers.
  - The servlet starts a thread that cleans the database periodically according to the policy.
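One clean-up pass of a Cleanable might look like the sketch below. It is not R-GMA's implementation: SQLite stands in for the servlet's database, the cpuLoad columns (host, load_avg, ts) and the function clean_once() are made up for illustration, and the predicate is assumed to be a trusted SQL condition supplied at declareTable() time.

```python
import sqlite3


def clean_once(conn, table, clean_up_predicate):
    """Delete every tuple matching the predicate; return how many were removed."""
    # The table name and predicate come from the publisher, so they are trusted here.
    cursor = conn.execute(f"DELETE FROM {table} WHERE {clean_up_predicate}")
    conn.commit()
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpuLoad (host TEXT, load_avg REAL, ts INTEGER)")
conn.executemany(
    "INSERT INTO cpuLoad VALUES (?, ?, ?)",
    [("a", 0.5, 10), ("b", 0.9, 50), ("c", 0.1, 90)],
)

# Policy: drop tuples with a timestamp older than 60.
removed = clean_once(conn, "cpuLoad", "ts < 60")
remaining = [row[0] for row in conn.execute("SELECT host FROM cpuLoad ORDER BY host")]
```

The servlet's cleaning thread would repeat such a pass once every cleanUpInterval.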
6. API Features: Archivers and Consumers
- Consumer
  - Three query types are supported: history, continuous and latest snapshot.
  - Answers are returned as a stream.
- Archiver
  - An Archiver is a republisher that poses a continuous query, and publishes the answer.
  - They can now be constructed with an Insertable:
    - StreamProducer for answering stream queries.
    - DatabaseProducer for answering history queries.
    - LatestProducer for answering latest queries.
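The difference between the three query types can be shown on a small stream of tuples. The sketch assumes tuples of the form (host, load_avg, ts) with host as the primary key; the function names and the registered_at parameter are illustrative and not part of the R-GMA API.

```python
def history(tuples):
    """History query: every tuple stored so far."""
    return list(tuples)


def latest(tuples):
    """Latest snapshot query: the newest tuple for each primary key (host)."""
    newest = {}
    for host, load_avg, ts in tuples:
        if host not in newest or ts >= newest[host][2]:
            newest[host] = (host, load_avg, ts)
    return sorted(newest.values())


def continuous(tuples, registered_at):
    """Continuous query: only tuples published after the query was registered."""
    return [t for t in tuples if t[2] > registered_at]


stream = [("a", 0.5, 1), ("b", 0.7, 2), ("a", 0.9, 3)]
all_tuples = history(stream)
snapshot = latest(stream)
new_tuples = continuous(stream, registered_at=2)
```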
7. New Producer: LatestProducer
- A stream producer that supports latest snapshot queries.
- Offers up-to-date values for each primary key (previously, R-GMA tables had no primary keys).
- Implementation
  - When declareTable() is called, the servlet creates a new MySQL database containing that table.
  - When a tuple is received, the servlet first tries to update the table. If this fails, the tuple is inserted.
  - Snapshot queries are simply passed on to the database.
- Pros and cons
  - Pro: possibility of being able to process join queries locally.
  - Pro: can handle huge numbers of tuples (servlets probably couldn't).
  - Con: the tuples aren't available to continuous or history queries!
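The update-then-insert step can be sketched as below. SQLite is used in place of MySQL so that the example is self-contained, and the cpuLoad columns and the function store_latest() are assumptions made for illustration.

```python
import sqlite3


def store_latest(conn, host, load_avg, ts):
    """Keep one row per primary key: try an UPDATE first, INSERT if no row matched."""
    cursor = conn.execute(
        "UPDATE cpuLoad SET load_avg = ?, ts = ? WHERE host = ?",
        (load_avg, ts, host),
    )
    if cursor.rowcount == 0:  # no existing row for this key
        conn.execute("INSERT INTO cpuLoad VALUES (?, ?, ?)", (host, load_avg, ts))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpuLoad (host TEXT PRIMARY KEY, load_avg REAL, ts INTEGER)")

for tuple_ in [("a", 0.5, 1), ("b", 0.7, 2), ("a", 0.9, 3)]:
    store_latest(conn, *tuple_)

# A latest snapshot query is then just an ordinary SELECT on the table.
latest_snapshot = conn.execute("SELECT * FROM cpuLoad ORDER BY host").fetchall()
```

Because only the newest row per key survives, earlier values are gone, which is why these tuples cannot serve continuous or history queries.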
8. New Producer: ResilientStreamProducer
- A StreamProducer that can answer continuous queries, and is resilient to crashes.
  - The servlet keeps a log of changes made to a producer's state (by serializing a Command object).
  - Periodically (when?!) snapshots are taken, and the InstanceTracker (a hash map of producers kept by the servlet) is serialized and stored on disc.
- Recovering from failure
  - When the servlet restarts, the last snapshot of the InstanceTracker is retrieved, and state changes are re-applied.
  - Then the registry is consulted, and any producers that had timed out are re-registered.
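The log-plus-snapshot recovery can be sketched as follows. This is not the servlet's code: JSON strings stand in for serialized Command objects and for the file on disc, the InstanceTracker is reduced to a dict from producer id to declared tables, and the two command types are invented for the example. Consulting the registry is left out.

```python
import json


def apply_command(tracker, command):
    """Apply one logged state change to the tracker."""
    if command["op"] == "create_producer":
        tracker[command["producer"]] = []
    elif command["op"] == "declare_table":
        tracker[command["producer"]].append(command["table"])
    else:
        raise ValueError("unknown command: " + command["op"])


class ResilientServlet:
    def __init__(self, snapshot="{}", log=()):
        # Recovery: load the last snapshot, then re-apply the logged changes.
        self.snapshot = snapshot
        self.log = list(log)
        self.instance_tracker = json.loads(snapshot)
        for raw in self.log:
            apply_command(self.instance_tracker, json.loads(raw))

    def execute(self, command):
        self.log.append(json.dumps(command))  # log first, then change state
        apply_command(self.instance_tracker, command)

    def take_snapshot(self):
        self.snapshot = json.dumps(self.instance_tracker)
        self.log.clear()  # changes before the snapshot no longer need replaying


servlet = ResilientServlet()
servlet.execute({"op": "create_producer", "producer": "p1"})
servlet.take_snapshot()
servlet.execute({"op": "declare_table", "producer": "p1", "table": "cpuLoad"})

# After a crash only the snapshot and the log survive.
recovered = ResilientServlet(servlet.snapshot, servlet.log)
```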
9. New Producer: lossless streaming (to do)
A protocol for lossless streaming between servlets.
(1) producer servlet crashes
(2) servlet recovers
(2) producer re-registers (maybe)
(3) now tuples are never discarded
(4) Consumer told of producer (how?)
(4) Consumer requests streaming
(5) producer recognises consumer, so sends all tuples since last tuple sent
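Step (5) could work roughly as sketched below. Since the protocol is still to do, everything here is an assumption: the class ProducerServlet, the per-consumer position table and the method stream_to() are hypothetical, and the sketch takes for granted that the tuple store and the positions survive a crash.

```python
class ProducerServlet:
    """Hypothetical producer servlet that never discards tuples."""

    def __init__(self):
        self.tuples = []     # every tuple ever inserted, in order
        self.last_sent = {}  # consumer id -> number of tuples already sent

    def insert(self, tuple_):
        self.tuples.append(tuple_)

    def stream_to(self, consumer_id):
        # A recognised consumer resumes where it left off; a new one starts at 0.
        start = self.last_sent.get(consumer_id, 0)
        batch = self.tuples[start:]
        self.last_sent[consumer_id] = len(self.tuples)
        return batch


producer = ProducerServlet()
producer.insert("t1")
producer.insert("t2")
first_batch = producer.stream_to("consumer-1")

# Tuples that arrive while the consumer is disconnected are kept, not dropped.
producer.insert("t3")
producer.insert("t4")
second_batch = producer.stream_to("consumer-1")
```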
10. Summary: then what next?
- Lots and lots and lots of coding!
  - Still have to finish the Consumer API.
  - Still have to finish the C and C++ APIs.
  - Not quite finished the ResilientStreamProducer (my job).
  - Tests are failing just now, and are hard to write for the API.
- Demo for EU review very soon
  - But nervousness about whether R-GMA is robust enough and fast enough!! (900 tuples in 30 sec. required)
  - Talk of focus on robustness, not new functionality!
- But what next for us? (new functionality)
  - Joins? How?
  - Query optimisation using views?
  - (which is what we're funded to research).