Having Your Cake and Eating It Too: With Apache OODT and Apache Solr. Andrew F. Hart, Paul M. Ramirez


1
Having Your Cake and Eating It Too
  • With Apache OODT and Apache Solr

Andrew F. Hart, Paul M. Ramirez
2
About Myself
  • Software Engineer
  • NASA Jet Propulsion Laboratory
  • Data Management
  • Committer: OODT, SIS, Gora, Streams (Incubating)
  • Mentor: Streams (Incubating)

3
What We'll Cover
  • Overview of the OODT and Solr Projects
  • Strategies for Combining OODT and Solr
  • Detailed Deployment/Config. Example
  • Where to Learn More and Participate

4
Apache OODT
  • Object Oriented Data Technology
  • Origin in NASA mission data systems
  • Components for
  • Information integration
  • Data cataloging and archiving
  • Configurable workflow processing

5
Apache OODT
  • OODT @ Apache
  • Incubation 2010, Graduation 2011
  • 29 Committers
  • Latest Release 0.5 (Dec. 26, 2012)

6
Apache OODT
  • Karoo Array Telescope (KAT-7)

7
Apache OODT
  • Virtual Pediatric Intensive Care Unit

8
Apache OODT
  • Regional Climate Model Evaluation System

9
Apache OODT
  • Commonalities between systems
  • Lots of data
  • Defined processing steps / algorithms
  • Archives important (search important)

10
Apache OODT
  • Strengths of OODT for the above use cases
  • Loosely coupled components
  • Standard protocols, well-defined interfaces
  • Highly configurable
  • Vetted, reliable code

11
Apache Solr
  • Search Web Services
  • Powerful features
  • Flexible formats
  • Highly configurable

12
Apache Solr
  • The White House

13
Apache Solr
  • Netflix

14
Apache Solr
  • NASA Planetary Data System

15
OODT and Solr
  • Why use these projects together?
  • Archives often need search capability
  • Similarities / Compatibilities
  • XML-based configuration
  • Environment (Java, Tomcat)

16
Example Integration
Standard Data Archive Pipeline
17
Example Integration
Standard Data Archive Pipeline with Search
18
OODT Products
  • Typically 1-1 with Files
  • Each uniquely identifiable (GUID)
  • Support for higher-level ProductType
  • A way to define collections

19
OODT Metadata
  • Annotations for products
  • Key/value pairs; a key may hold multiple values
  • Common across all OODT components
  • Two general classes
  • System
  • User
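The key-to-many-values shape described on this slide can be sketched as follows. This is a toy stand-in, not OODT's actual Metadata class: the class name, its methods, and the example keys are all assumptions made for illustration.

```python
from collections import defaultdict


class Metadata:
    """Toy stand-in for a key -> multiple-values container (not the OODT API)."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, key, value):
        # Append rather than overwrite, so one key can carry several values.
        self._values[key].append(value)

    def get_all(self, key):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._values.get(key, []))

    def get_first(self, key):
        values = self._values.get(key)
        return values[0] if values else None


# Hypothetical keys: one single-valued, one multi-valued.
md = Metadata()
md.add("ProductId", "19bcb4b8-7999-11e1-b581-8b771498975d")
md.add("Instrument", "camera-a")
md.add("Instrument", "camera-b")
```

Treating every key as multi-valued keeps the single-value case simple (a list of length one) while letting repeated annotations accumulate.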

20
OODT Metadata
  • System Metadata
  • Added automatically by OODT Components
  • Used to track state
  • Used to encode relationships between data

21
OODT Metadata
  • User Metadata
  • Specified as policy
  • Can be product-level, or productType-level
  • Used to extract and persist information from files
    as they are ingested (become products)

22
OODT Metadata
  • Metadata (Policy) Example
  • (external)

23
Solr Schema
  • XML document
  • Define what will be indexed (Fields)
  • Provide high-level context hints
  • Data type, behavior, pre-processing
  • Extremely flexible, extensible

24
Solr Schema
  • Solr Schema Example
  • (external)

25
Making the Connection
  • SolrIndexer Tool
  • Part of the File Manager component tools
  • Map OODT Metadata to Solr Fields
  • Create Solr documents from OODT products
  • Note: only talking about metadata
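The mapping step above (OODT metadata keys to Solr fields, one Solr document per product) might look roughly like this. It is a sketch of the idea only, not the SolrIndexer's implementation; the metadata keys, the Solr field names, and the function name are hypothetical.

```python
def to_solr_document(metadata, field_map):
    """Build a Solr-style document (a plain dict) from product metadata.

    metadata:  dict of metadata key -> list of values
    field_map: dict of metadata key -> Solr field name (hypothetical mapping)
    """
    doc = {}
    for key, solr_field in field_map.items():
        values = metadata.get(key)
        if not values:
            continue  # the product does not carry this key
        # One value becomes a scalar; repeated keys stay lists (multi-valued fields).
        doc[solr_field] = values[0] if len(values) == 1 else list(values)
    return doc


product_metadata = {
    "ProductId": ["19bcb4b8-7999-11e1-b581-8b771498975d"],
    "ProductType": ["SomeProductType"],
    "Keyword": ["ocean", "temperature"],
    "FileLocation": ["/archive/data"],  # not in the mapping, so not indexed
}
field_map = {"ProductId": "id", "ProductType": "type", "Keyword": "keywords"}
solr_doc = to_solr_document(product_metadata, field_map)
```

Only mapped keys reach the index, which matches the slide's point that the indexer deals with metadata, not with the product files themselves.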

26
SolrIndexer Tool
  • org.apache.oodt.cas.filemgr.tools
  • Available since the 0.4 release
  • Recommend using 0.5, which added some stability
    improvements
  • Several modes of operation

27
SolrIndexer Tool
28
SolrIndexer Tool
  • Invocation Examples: Ingest all products from the
    specified File Manager instance

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
     -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
     org.apache.oodt.cas.filemgr.tools.SolrIndexer \
     --all \
     --fmUrl http://localhost:9000 \
     --solrUrl http://localhost:8080/solr
29
SolrIndexer Tool
  • Invocation Examples: Ingest all products from the
    specified ProductType(s)

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
     -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
     org.apache.oodt.cas.filemgr.tools.SolrIndexer \
     --types urn:some:ProductType \
     --fmUrl http://localhost:9000 \
     --solrUrl http://localhost:8080/solr
30
SolrIndexer Tool
  • Invocation Examples: Ingest a single product by
    its unique product id

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
     -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
     org.apache.oodt.cas.filemgr.tools.SolrIndexer \
     --product 19bcb4b8-7999-11e1-b581-8b771498975d \
     --delete \
     --fmUrl http://localhost:9000 \
     --solrUrl http://localhost:8080/solr
31
SolrIndexer Tool
  • Invocation Examples: Force optimization of the
    Solr index

java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
     -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
     org.apache.oodt.cas.filemgr.tools.SolrIndexer \
     --optimize \
     --solrUrl http://localhost:8080/solr
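When the indexer is driven from a script, the invocations on these slides can be assembled programmatically. The sketch below only builds the argument list; the helper name and its parameters are assumptions, while the class name, system properties, and flags are the ones shown on the slides (with the standard `-Dname=value` and `http://host:port` forms).

```python
import shlex


def solr_indexer_command(solr_url, action_args, fm_url=None,
                         config="/path/to/indexer.properties",
                         lib_dir="/path/to/cas/filemgr/lib/"):
    """Return the argv list for one SolrIndexer run (does not execute it)."""
    cmd = [
        "java",
        f"-DSOLR_INDEXER_CONFIG={config}",
        f"-Djava.ext.dirs={lib_dir}",
        "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
    ]
    cmd += list(action_args)          # e.g. ["--all"] or ["--optimize"]
    if fm_url is not None:            # --optimize needs no File Manager URL
        cmd += ["--fmUrl", fm_url]
    cmd += ["--solrUrl", solr_url]
    return cmd


all_cmd = solr_indexer_command("http://localhost:8080/solr", ["--all"],
                               fm_url="http://localhost:9000")
optimize_cmd = solr_indexer_command("http://localhost:8080/solr", ["--optimize"])
print(shlex.join(all_cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the tool; building a list instead of a shell string avoids quoting problems in paths and ids.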
32
Indexer.properties
  • Configuration file for the SolrIndexer
  • Specify mapping between OODT product metadata and
    Solr fields
  • Additional pre-processing features

33
Indexer.properties
  • Example Indexer.properties file
  • (external)

34
Use Case I
  • Building a searchable data archive
  • Long-term / Lights-out archive
  • Products and metadata immutable
  • Many NASA mission data systems use this model
  • Want to make it easily searchable

35
Use Case I
Standard Data Archive Pipeline with Search
36
Use Case II
  • Building an interactively editable, searchable
    data archive
  • Data and metadata mutable
  • Want to dynamically select product(s) to edit
    based on metadata

37
Use Case II
Interactively Editable Data Archive Pipeline with Search
38
Use Case II
Interactively Editable Data Archive Pipeline with Search
Solr catalog out of sync!
39
Synchronization
  • Two ways (at least) to solve this
  • Modify the OODT Curator Services
  • Treat OODT Curator Services as black box and
    write wrapper service to invoke Curator
    Services AND update Solr (via scripted call to
    SolrIndexer, for example)

40
Modify Curator Services
  • Services implemented in JAX-RS
  • /curator/src/main/java/org/apache/oodt/cas/curation/service
  • curator_url/services/metadata/update
  • Options
  • Utilize Solr Java API
  • Wrap call to OODT SolrIndexer tool
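The slide names the Solr Java API as one option. As a language-neutral illustration of the same step, the sketch below prepares a JSON update over plain HTTP instead of using SolrJ. It assumes Solr's JSON update handler is reachable at `/update` under the given base URL (the exact path depends on Solr version and configuration), and the `event_tag` field is hypothetical.

```python
import json
import urllib.request


def build_solr_update_request(solr_url, docs, commit=True):
    """Prepare (but do not send) a JSON update request for Solr."""
    url = solr_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"   # make the change visible to searches right away
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


docs = [{"id": "19bcb4b8-7999-11e1-b581-8b771498975d", "event_tag": "flare"}]
req = build_solr_update_request("http://localhost:8080/solr", docs)
```

Sending it is a single `urllib.request.urlopen(req)` call; keeping construction separate from sending makes the request easy to inspect before it touches a live index.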

41
Use Case II-A
Modified Curator Services to Simultaneously
update Solr
42
Example
  • Interactive event tagging

43
Wrap Curator Services
  • Curator Service/API is black box
  • Develop custom service that
  • Issues POST request to Curator service
  • Updates Solr index via, e.g.
  • Utilize Solr Java API
  • Wrap call to OODT SolrIndexer tool
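The wrapper approach above can be sketched as a small function that calls the Curator first and refreshes Solr only if that succeeds. Both callables are hypothetical stand-ins: in a real deployment one would POST to the Curator's metadata update endpoint and the other would update Solr directly or run the SolrIndexer for the product.

```python
def update_and_reindex(product_id, new_metadata, post_to_curator, reindex_product):
    """Update metadata through the Curator, then refresh the Solr entry.

    post_to_curator(product_id, new_metadata) -> bool  (hypothetical callable)
    reindex_product(product_id) -> None                (hypothetical callable)
    """
    if not post_to_curator(product_id, new_metadata):
        # Curator rejected the change: leave Solr untouched so both stay consistent.
        return False
    # Curator accepted the change: bring the search index back in sync.
    reindex_product(product_id)
    return True


# Stand-ins that record calls instead of contacting real services.
calls = []


def fake_curator(pid, metadata):
    calls.append(("curator", pid))
    return True


def fake_reindex(pid):
    calls.append(("solr", pid))


pid = "19bcb4b8-7999-11e1-b581-8b771498975d"
ok = update_and_reindex(pid, {"EventTag": ["flare"]}, fake_curator, fake_reindex)
```

Ordering the calls this way means a failed Curator update never leaves Solr advertising metadata the archive does not hold; a failure in the reindex step would still need a retry or a later full reindex.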

44
Use Case II-B
Wrapping OODT Curation Services with Custom UI
Services
45
Example
46
Lessons
  • Solr complements OODT File Manager
  • RESTful interfaces (Solr and OODT Curator) allow
    for great flexibility in designing services and
    UI
  • Best approach depends on situation

47
Next Steps
  • Develop SolrCatalog for OODT File Manager?
  • Pros: Reduction in moving parts
  • Cons: Restrictive?
  • Implement Use Case II-A as optional mode for
    Curator web service layer

48
Learning More
  • Solr
  • http://lucene.apache.org/solr
  • solr-user@lucene.apache.org
  • OODT
  • http://oodt.apache.org
  • https://cwiki.apache.org/confluence/display/OODT/Home
  • oodt-user@apache.org

49
Thanks!
  • Questions?