Title: Having Your Cake and Eating It Too
1Having Your Cake and Eating It Too
- With Apache OODT and Apache Solr
Andrew F. Hart, Paul M. Ramirez
2About Myself
- Software Engineer
- NASA Jet Propulsion Laboratory
- Data Management
- Committer
- OODT, SIS, Gora, Streams (Incubating)
- Mentor: Streams (Incubating)
3What We'll Cover
- Overview of OODT & Solr Projects
- Strategies for Combining OODT and Solr
- Detailed Deployment/Config. Example
- Where to Learn More & Participate
4Apache OODT
- Object Oriented Data Technology
- Origin in NASA mission data systems
- Components for
- Information integration
- Data cataloging and archiving
- Configurable workflow processing
5Apache OODT
- OODT @ Apache
- Incubation 2010, Graduation 2011
- 29 Committers
- Latest Release 0.5 (Dec. 26, 2012)
6Apache OODT
- Karoo Array Telescope (KAT-7)
7Apache OODT
- Virtual Pediatric Intensive Care Unit
8Apache OODT
- Regional Climate Model Evaluation System
9Apache OODT
- Commonalities between systems
- Lots of data
- Defined processing steps / algorithms
- Archives important (& search important)
10Apache OODT
- Strengths of OODT for the above use cases
- Loosely coupled components
- Standard protocols, well-defined interfaces
- Highly configurable
- Vetted, reliable code
11Apache Solr
- Search Web Services
- Powerful features
- Flexible formats
- Highly configurable
12Apache Solr
13Apache Solr
14Apache Solr
- NASA Planetary Data System
15OODT & Solr
- Why use these projects together?
- Archives often need search capability
- Similarities / Compatibilities
- XML-based configuration
- Environment (Java, Tomcat)
16Example Integration
Standard Data Archive Pipeline
17Example Integration
Standard Data Archive Pipeline + Search
18OODT Products
- Typically 1-1 with Files
- Each uniquely identifiable (GUID)
- Support for higher-level ProductType
- A way to define collections
19OODT Metadata
- Annotations for products
- Key => Val / Multi-val
- Common across all OODT components
- Two general classes
- System
- User
20OODT Metadata
- System Metadata
- Added automatically by OODT Components
- Used to track state
- Used to encode relationships between data
21OODT Metadata
- User Metadata
- Specified as policy
- Can be product-level, or productType-level
- Used to extract & persist information from files as they are ingested (become products)
22OODT Metadata
- Metadata (Policy) Example
- (external)
23Solr Schema
- XML document
- Define what will be indexed (Fields)
- Provide high-level context hints
- Data type, behavior, pre-processing
- Extremely flexible, extensible
24Solr Schema
- Solr Schema Example
- (external)
25Making the Connection
- SolrIndexer Tool
- Part of the File Manager component tools
- Map OODT Metadata to Solr Fields
- Create Solr documents from OODT products
- Note: only talking about metadata
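To make the mapping step concrete, here is a minimal sketch (not the SolrIndexer's actual code) that turns multi-valued product metadata into the XML `<add><doc>` message Solr accepts on its update handler. The class name, the Solr field names `id` and `keyword`, and the sample metadata keys are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: build a Solr XML <add> message from product metadata. */
final class SolrDocSketch {

    /** Escape the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Metadata keys that have no entry in fieldMap are skipped. Each value of a
     * multi-valued key becomes its own <field> element.
     */
    static String toSolrXml(Map<String, List<String>> metadata,
                            Map<String, String> fieldMap) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String solrField = fieldMap.get(e.getKey());
            if (solrField == null) {
                continue; // no mapping: leave this key out of the index
            }
            for (String value : e.getValue()) {
                xml.append("<field name=\"").append(escape(solrField)).append("\">")
                   .append(escape(value)).append("</field>");
            }
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical product metadata and mapping, for illustration only.
        Map<String, List<String>> met = new LinkedHashMap<>();
        met.put("CAS.ProductId", List.of("19bcb4b8-7999-11e1-b581-8b771498975d"));
        met.put("Keyword", List.of("climate", "model"));
        Map<String, String> fieldMap = Map.of("CAS.ProductId", "id", "Keyword", "keyword");
        System.out.println(toSolrXml(met, fieldMap));
    }
}
```

Only metadata travels to Solr here; the product files themselves stay in the archive.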
26SolrIndexer Tool
- org.apache.oodt.cas.filemgr.tools
- Available since 0.4 Release
- Recommend using 0.5, as some stability improvements were added
- Several modes of operation
27SolrIndexer Tool
28SolrIndexerTool
- Invocation example: ingest all products from the specified File Manager instance
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --all \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
29SolrIndexerTool
- Invocation example: ingest all products from the specified ProductType(s)
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --types urn:some:ProductType \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
30SolrIndexerTool
- Invocation example: ingest a single product by its unique product id
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --product 19bcb4b8-7999-11e1-b581-8b771498975d \
      --delete \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
31SolrIndexerTool
- Invocation example: force optimization of the Solr index
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --optimize \
      --solrUrl http://localhost:8080/solr
32Indexer.properties
- Configuration file for the SolrIndexer
- Specify mapping between OODT product metadata and Solr fields
- Additional pre-processing features
33Indexer.properties
- Example indexer.properties file
- (external)
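Since the example file itself is external, the following is a rough sketch of how a properties-based field mapping could be read. The `map.` key prefix, the direction of the mapping (Solr field on the left, OODT metadata key on the right), and the sample keys are assumptions for illustration; consult the indexer.properties shipped with the File Manager for the real syntax.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Sketch only: read a Solr-field to metadata-key mapping from properties text. */
final class IndexerMappingSketch {

    /** Hypothetical prefix marking mapping entries. */
    static final String PREFIX = "map.";

    /** Return Solr field -> OODT metadata key for every entry that starts with PREFIX. */
    static Map<String, String> readMapping(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> mapping = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                mapping.put(key.substring(PREFIX.length()), props.getProperty(key).trim());
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Hypothetical file contents, for illustration only.
        String text = "map.id = CAS.ProductId\n"
                    + "map.title = ProductName\n"
                    + "other.setting = ignored\n";
        readMapping(text).forEach((field, metKey) ->
                System.out.println(field + " <- " + metKey));
    }
}
```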
34Use Case I
- Building a searchable data archive
- Long-term / Lights-out archive
- Products & metadata immutable
- Many NASA mission data systems use this model
- Want to make it easily searchable
35Use Case I
Standard Data Archive Pipeline + Search
36Use Case II
- Building an interactively editable, searchable data archive
- Data and metadata mutable
- Want to dynamically select product(s) to edit based on metadata
37Use Case II
Interactively Editable Data Archive Pipeline + Search
38Use Case II
Interactively Editable Data Archive Pipeline + Search
Solr catalog out of sync!
39Synchronization
- Two ways (at least) to solve this
- Modify the OODT Curator Services
- Treat OODT Curator Services as a black box and write a wrapper service that invokes Curator Services AND updates Solr (via a scripted call to SolrIndexer, for example)
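A scripted call to the SolrIndexer could be assembled along these lines. This is a sketch under assumptions: the class and method names are hypothetical, and the flags simply mirror the single-product invocation shown earlier in the deck (including its `--delete` flag).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: launch the SolrIndexer for one product from Java. */
final class SolrIndexerCommandSketch {

    /** Assemble the argument list for re-indexing a single product. */
    static List<String> reindexCommand(String configPath, String libDir,
                                       String productId, String fmUrl, String solrUrl) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-DSOLR_INDEXER_CONFIG=" + configPath);
        cmd.add("-Djava.ext.dirs=" + libDir);
        cmd.add("org.apache.oodt.cas.filemgr.tools.SolrIndexer");
        cmd.add("--product");
        cmd.add(productId);
        cmd.add("--delete");
        cmd.add("--fmUrl");
        cmd.add(fmUrl);
        cmd.add("--solrUrl");
        cmd.add(solrUrl);
        return cmd;
    }

    /** Run the command, forwarding its output, and return the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        List<String> cmd = reindexCommand(
                "/path/to/indexer.properties",
                "/path/to/cas/filemgr/lib/",
                "19bcb4b8-7999-11e1-b581-8b771498975d",
                "http://localhost:9000",
                "http://localhost:8080/solr");
        // Print only; call run(cmd) once the paths and URLs point at a real deployment.
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list avoids shell quoting problems when a path or id contains spaces.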
40Modify Curator Services
- Services implemented in JAX-RS
- /curator/src/main/java/org/apache/oodt/cas/curation/service
- curator_url/services/metadata/update
- Options
- Utilize Solr Java API
- Wrap call to OODT SolrIndexer tool
41Use Case II-A
Modified Curator Services to Simultaneously
update Solr
42Example
- Interactive event tagging
43Wrap Curator Services
- Curator Service/API is black box
- Develop custom service that
- Issues POST request to Curator service
- Updates Solr index via, e.g.
- Utilize Solr Java API
- Wrap call to OODT SolrIndexer tool
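The wrapper approach can be sketched with plain HTTP calls. Assumptions to note: the class and method names are hypothetical, the Curator request body and its content type are placeholders (the deck does not specify them), and the Solr side posts an XML message to the standard `/update` handler with `commit=true` rather than going through the Solr Java API.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch only: update the Curator service, then bring Solr back in sync. */
final class CuratorSolrWrapperSketch {

    /** POST to the Curator metadata update service. Body format is a placeholder. */
    static HttpRequest curatorUpdate(String curatorUrl, String formBody) {
        return HttpRequest.newBuilder(URI.create(curatorUrl + "/services/metadata/update"))
                .header("Content-Type", "application/x-www-form-urlencoded") // assumption
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    /** POST an XML update message to Solr's update handler and commit. */
    static HttpRequest solrUpdate(String solrUrl, String addXml) {
        return HttpRequest.newBuilder(URI.create(solrUrl + "/update?commit=true"))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(addXml))
                .build();
    }

    /** Update the Curator first; touch Solr only if that call succeeded. */
    static boolean updateBoth(HttpClient client, HttpRequest curator, HttpRequest solr)
            throws IOException, InterruptedException {
        HttpResponse<String> first = client.send(curator, HttpResponse.BodyHandlers.ofString());
        if (first.statusCode() / 100 != 2) {
            return false; // archive unchanged, so leave the index alone
        }
        HttpResponse<String> second = client.send(solr, HttpResponse.BodyHandlers.ofString());
        return second.statusCode() / 100 == 2;
    }

    public static void main(String[] args) {
        // Hypothetical URLs and payloads; nothing is sent here.
        HttpRequest curator = curatorUpdate("http://localhost:8080/curator", "id=abc");
        HttpRequest solr = solrUpdate("http://localhost:8080/solr",
                "<add><doc><field name=\"id\">abc</field></doc></add>");
        System.out.println(curator.method() + " " + curator.uri());
        System.out.println(solr.method() + " " + solr.uri());
    }
}
```

Ordering the calls this way means a failed Curator update never changes the index, though a failed Solr update after a successful Curator update still leaves the two out of sync and would need a retry.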
44Use Case II-B
Wrapping OODT Curation Services with Custom UI
Services
45Example
46Lessons
- Solr complements OODT File Manager
- RESTful interfaces (Solr & OODT Curator) allow for great flexibility in designing services and UI
- Best approach depends on situation
47Next Steps
- Develop SolrCatalog for OODT File Manager?
- Pros: Reduction in moving parts
- Cons: Restrictive?
- Implement Use Case II-A as optional mode for
Curator web service layer
48Learning More
- Solr
- http://lucene.apache.org/solr
- solr-user@lucene.apache.org
- OODT
- http://oodt.apache.org
- https://cwiki.apache.org/confluence/display/OODT/Home
- oodt-user@apache.org
49Thanks!