Distributed Software Systems: Cyberinfrastructure and Geoinformatics, Chaitan Baru

1
Distributed Software Systems: Cyberinfrastructure
and Geoinformatics
Chaitan Baru
  • San Diego Supercomputer Center

2
Integrated Cyberinfrastructure System
Source: Dr. Deborah Crawford, Chair, NSF CI Working
Committee
  • Applications
  • Geosciences
  • Environmental Sciences
  • Neurosciences
  • High Energy Physics

Development Tools and Libraries
Education and Training
Discovery and Innovation
Middleware Services
Hardware
3
Community Cyberinfrastructure Projects
Friendly Work-Facilitating Portals: Authentication,
Authorization, Auditing, Workflows,
Visualization, Analysis
Adapted from Prof. Mark Ellisman, UC San Diego
Development Tools and Libraries
Ecological Observatories (NEON)
Biomedical Informatics (BIRN)
High Energy Physics (GriPhyN)
Ocean Observing (ORION)
Geosciences (GEON)
Earthquake Engineering (NEES)
Middleware Services
Hardware
Distributed Computing, Instruments and Data
Resources
4
Data, Tools, Computation
  • Data
  • Field observations
  • Laboratory analyses
  • Sensor-based data (land, airborne, satellite)
  • Tools
  • QA/QC, simple transformations and analyses
  • Complex models
  • Computation
  • Community codes
  • Access to high-performance computing
  • Data Intensive Computing

5
Variety of Geoinformatics Efforts
  • Data collection
  • Digital data collection in the field
  • When does it become cyberinfrastructure?
  • Database curation
  • E.g. EarthChem, Paleobiology, MorphoBank, Paleo
    Pollen, etc.
  • When does it become tools and community codes?
  • Software Development
  • Tools: gravity and magnetics, paleogeography,
    geochemistry, seismic data products, ...
  • Community codes: SCEC-CME, CIG, ...

6
Variety of Geoinformatics Efforts
  • High Performance Computing
  • LiDAR data management
  • Seismic analyses
  • Petascale initiative
  • Data Integration
  • E.g. CUAHSI HIS
  • Also, a pressing need in projects like EarthScope

7
Cyberinfrastructure: The Common Platform Across
Distributed Projects
Cyberinfrastructure: to provide access to
all of these resources and support
interoperability among them
Data Management And Curation
Modeling and Integration
Data Collection
Tool Development
8
Example: USArray Data Flow
  • Deploy field sensor arrays
  • Across US
  • Collect data from sensor arrays and perform QA/QC
  • One of the sites is SIO, San Diego
  • Archive data for community access
  • IRIS, Seattle

EarthScope/USArray: single project, multiple
participants.
9
Example: LiDAR Workflow (courtesy Chris Crosby,
ASU)
Workflow stages: Survey, Point Cloud (x, y, z, ...),
Interpolate / Grid, Analyze / Do Science
D. Harding, NASA
Single goal: multiple projects, multiple
participants, e.g. NCALM, GEON, ASU, NASA, USGS, ...
10
GEON Cyberinfrastructure
  • Funded by NSF IT Research program
  • Multi-institution collaboration between IT and
    Earth Science researchers
  • GEON Cyberinfrastructure provides
  • Authenticated access to data and Web services
  • Registration of data sets, tools, and services
    with metadata
  • Search for data, tools, and services, using
    ontologies
  • Scientific workflow environment and access to HPC
  • Data and map integration capability
  • Scientific data visualization and GIS mapping

11
Key Informatics Areas
  • Portals
  • Authenticated, role-based access to cyber
    resources: data, tools, models, model outputs,
    collaboration spaces, ...
  • Data Integration
  • Search, discovery and integration of data from
    heterogeneous information sources (mediation
    and semantic integration)
  • Use of workflow systems, and access to HPC
  • Ability to program at a higher level of
    abstraction
  • Sharing of models, along with provenance
    information
  • Gateways to HPC environments
  • Management of Geospatial Information
  • Using GIS capabilities, map services, geospatial
    data integration
  • Visualization of 3D, 4D geospatial data and
    information

12
Distributed System Definition
  • A Distributed System is
  • one in which the hardware and software components
    in networked computers communicate and coordinate
    their activities only by passing messages, e.g.
    the Internet
  • A Distributed Database System is
  • one in which data is stored at several sites,
    each managed by a database system (DBMS) that can
    run independently

13
Distributed System Models
  • Client Server
  • Peer to Peer

14
Remote Service Invocation
  • TCP/IP
  • Basic Internet protocol for computer
    communications
  • Platform for building a number of other open or
    proprietary, higher-level communications
    protocols
  • Communication at a higher level of abstraction
  • HTTP
  • Open protocol based on TCP/IP for the Web
  • Fixed set of verbs (actions) used to transfer
    HTML documents
  • CORBA, Java RMI
  • Protocols based on an object model

15
SDSC Storage Resource Broker: Virtualizing
Storage

[Figure labels: User; Resource, Mthd, User; Metadata Extraction; User Defined; Remote Proxies; MCAT; Dublin Core; DataCutter; Application Metadata]
http://www.sdsc.edu/srb
16
SRB Client/Server Model
Data are requested using an SRB ID and a file
abstraction (open, close, read, write)
SRB Client
Network
SRB Server
17
OpenDAP
  • Client/Server model

18
OpenDAP
From Peter Cornillon and Jim Gallagher,
http://www.opendap.org/support/stennis_tutorial.html
19
OpenDAP Data Request
  • Data are requested with a URL.
  • http://www.cdc.noaa.gov/cgi-bin/nph-nc/datasets/Reynolds_sst
  • URL parts: protocol, machine name, OPeNDAP server,
    directory, file name
  • The user can impose a constraint on the data to be
    acquired from a data set by appending a
    constraint expression to the end of the URL
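As a rough illustration of appending a constraint expression, the sketch below builds a request URL in the DAP2 style, where a response suffix (such as .ascii) and a "?" precede the constraint. The function name, the example.org URL, and the variable name sst are assumptions made for illustration, not taken from the slides.

```python
def with_constraint(dataset_url, projections=(), selections=(), suffix="ascii"):
    """Append a DAP2-style constraint expression to a dataset URL.

    projections: variable names, optionally with [start:stride:stop] hyperslabs.
    selections:  relational clauses such as "time>100", each joined with "&".
    """
    constraint = ",".join(projections)
    for clause in selections:
        constraint += "&" + clause
    url = f"{dataset_url}.{suffix}"
    return f"{url}?{constraint}" if constraint else url


# Hypothetical dataset URL and variable name, for illustration only.
example = with_constraint(
    "http://example.org/opendap/sst.nc",
    projections=["sst[0:1:0][10:1:20]"],
)
```

With no projections or selections the function returns the plain response URL, which asks the server for the whole data set.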

20
Remote Service Invocation with Web Services
  • Web Services provide a simple protocol for invoking
    remote services on the Web. A Web Service is
  • A network endpoint, i.e. a server, that
    implements one or more ports.
  • Each port is defined by the message types that
    it accepts and the messages it returns.
  • Specified by a Web Services Description Language
    (WSDL) XML document.
  • Given the WSDL for a web service, you know all you
    need to interact with it.
  • Web Service standards also exist for security,
    policy, reliability, addressing, notification,
    choreography and workflow.
  • It is the basis for MS .NET, IBM WebSphere, Sun,
    Oracle, BEA, HP, ...
  • It is the basis for the new Grid standards like
    WSRF and OGSA.
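To make the idea of XML message exchange concrete, here is a minimal sketch that builds a SOAP 1.1 request envelope with Python's standard library. The operation name GetRockTypes, the namespace urn:example:geology, and the Period parameter are hypothetical; a real service would define these in its WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation, service_ns, params):
    """Return a SOAP 1.1 envelope (as text) invoking one operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The operation element and its parameters live in the service's namespace.
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{service_ns}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and namespace, for illustration only.
request_xml = build_soap_request(
    "GetRockTypes", "urn:example:geology", {"Period": "Jurassic"}
)
```

The resulting text would normally be sent in an HTTP POST to the service endpoint named in the WSDL.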

21
Web Site vs Web Service
From: Building Grid Applications and Portals: An
Approach Based on Components, Web Services and
Workflow Tools, Gannon et al., Euro-Par 2004
  • Web Site
  • Designed to pass HTTP GET/POST/PUT requests
    between a browser and a web server.
  • Google has a web site.
  • Web Service
  • Designed for services to talk to other services
    by exchanging XML messages
  • Google also provides a web service so Google may
    be used in distributed apps

[Diagram labels: Web Server; Web Service (x3)]
22
Grid Services
From: Building Grid Applications and Portals: An
Approach Based on Components, Web Services and
Workflow Tools, Gannon et al., Euro-Par 2004
  • Grid: a distributed, heterogeneous set of
    resources
  • Integrated by a pervasive layer of services
  • Goal: allow users to view it as a single system
  • More than the Internet (which forms part of the
    resource layer)
  • Builds on the Web by building on web services

Web Services Resource Framework, Web Services
Notification
Physical Resource Layer
23
Access Interfaces and Levels of Access
  • Web service, native application program
    interface, ODBC/JDBC, filesystem

  • Web service: SOAP server stack, WSDL and SOAP
    (an application can also be wrapped as a Web Service)
  • Native application program interface: SRB, OpenDAP, etc.
  • Web server stack: URLs and HTTP
  • DBMS: expose ODBC/JDBC interface (and full SQL)
  • Filesystem: mount remote filesystems
24
Authentication
  • Client Server models

Network
Server 1
Client A
Client-side authentication?
Server-side authentication?
25
Common Authentication
Certificate Authority
Obtain Credentials
Verify Credentials
Client
Server 1
Server 2
Server 3
Invoke with Credentials
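The obtain / verify / invoke flow can be sketched as follows. This is a toy stand-in, not how a real certificate authority works: actual CAs issue public-key (e.g. X.509) certificates, whereas this sketch substitutes a shared-secret HMAC so that it stays self-contained. All class and method names are hypothetical.

```python
import hashlib
import hmac


class ToyAuthority:
    """Stand-in for a certificate authority, using a shared-secret HMAC."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def _sign(self, user: str) -> str:
        return hmac.new(self._secret, user.encode(), hashlib.sha256).hexdigest()

    def issue(self, user: str) -> str:
        # "Obtain credentials": the user name plus a signature over it.
        return f"{user}:{self._sign(user)}"

    def verify(self, credential: str) -> bool:
        # "Verify credentials": recompute the signature and compare.
        user, _, signature = credential.rpartition(":")
        return hmac.compare_digest(signature, self._sign(user))


class Server:
    def __init__(self, name: str, authority: ToyAuthority):
        self.name = name
        self.authority = authority

    def invoke(self, credential: str, request: str) -> str:
        # "Invoke with credentials": every server checks with the same authority.
        if not self.authority.verify(credential):
            raise PermissionError(f"{self.name}: invalid credential")
        return f"{self.name} handled {request}"


authority = ToyAuthority(b"demo-secret")
servers = [Server(f"Server {i}", authority) for i in (1, 2, 3)]
credential = authority.issue("alice")
replies = [s.invoke(credential, "query") for s in servers]
```

The point of the sketch is that one credential, issued once, is accepted by all three servers because they share a common verifier.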
26
Grid Account Management Architecture (GAMA)
Single sign-on in GEON (also used in a number of
other projects)
Karan Bhatia, Kurt Mueller, Choonhan Youn, Sandeep Chandra
gama
create user
DB
gridportlets
GridSphere
import user
OGSA Grid services wrapper
retrieve credential
Servlet container
Java keystore
Portal server 1
retrieve credential
Portal server 2
Servlet container
Java keystore
GAMA server
Stand-alone applications
27
Systems Issues
  • Load Balancing, Failover, Replication

Server 1
Multiple servers for load balancing, failover
Server 2
Client
Server 3
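One simple way to combine load balancing and failover on the client side is round-robin selection that skips servers which fail. The sketch below assumes each server is a plain callable that raises ConnectionError when unreachable; the class name and that convention are illustrative assumptions, and replication is not modeled.

```python
class FailoverClient:
    """Round-robin over servers; on failure, try the next one."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._next = 0  # index of the server to try first on the next call

    def call(self, request):
        count = len(self._servers)
        for offset in range(count):
            index = (self._next + offset) % count
            try:
                result = self._servers[index](request)
            except ConnectionError:
                continue  # failover: move on to the next server
            # Load balancing: start the next call at the following server.
            self._next = (index + 1) % count
            return result
        raise ConnectionError(f"all {count} servers failed")


def healthy(name):
    return lambda request: f"{name}:{request}"


def unreachable(request):
    raise ConnectionError("server down")


client = FailoverClient([healthy("server1"), unreachable, healthy("server3")])
answers = [client.call(r) for r in ("a", "b", "c")]
```

Here the second request lands on server3 because the middle server is down, and the third request wraps around to server1.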
28
Distributed Data Access
  • What is the issue?
  • Ability to access data stored in multiple,
    different databases using a single request, e.g.
  • Get geologic information from multiple geologic
    databases
  • Get employee information from all branches
  • Ability to update data stored in multiple
    databases, e.g.
  • Transfer salary amount from University to my bank
    account
  • Transfer funds from Visa account to vendor's
    account

29
Distributed data access
Sources may be data repositories or metadata
catalogs
Client
How about creating a cached local copy?
Homogeneous: MySQL, MySQL, MySQL
Heterogeneous: MySQL, Oracle, DB2
Database 1
Database 2
Database 3
30
Data Warehousing
But warehouse data could be stale, i.e. out of
sync with source data
Client
Data Warehouse (common schema)
1. Load data from sources to warehouse
Data Source 1
Data Source 2
Data Source 3
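The load step, and the staleness it can lead to, can be sketched with in-memory SQLite databases standing in for the sources and the warehouse. Table and column names (samples, all_samples, and so on) are hypothetical.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory source database with a small samples table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE samples (sample_id TEXT, rock_type TEXT)")
    db.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    return db


def load_warehouse(warehouse, sources):
    """Full reload: copy every source row into the common warehouse schema."""
    warehouse.execute("DELETE FROM all_samples")
    for name, source in sources.items():
        rows = source.execute("SELECT sample_id, rock_type FROM samples").fetchall()
        warehouse.executemany(
            "INSERT INTO all_samples VALUES (?, ?, ?)",
            [(name, sample_id, rock_type) for sample_id, rock_type in rows],
        )
    warehouse.commit()


def count_rows(warehouse):
    return warehouse.execute("SELECT COUNT(*) FROM all_samples").fetchone()[0]


sources = {
    "source1": make_source([("A1", "granite")]),
    "source2": make_source([("B1", "basalt")]),
}
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE all_samples (source TEXT, sample_id TEXT, rock_type TEXT)"
)

load_warehouse(warehouse, sources)
# A source changes after the load; the warehouse does not see it yet.
sources["source2"].execute("INSERT INTO samples VALUES ('B2', 'shale')")
stale_count = count_rows(warehouse)
load_warehouse(warehouse, sources)
fresh_count = count_rows(warehouse)
```

Between the source update and the second load, queries against the warehouse return the older, smaller result.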
31
Data integration via middleware
Client
Data integration Middleware (aka Mediator)
Database 1
Database 2
Database 3
32
Warehousing vs Mediation
  • Warehousing: use ETL to massage local data to
    fit into a common, global warehouse schema
  • Mediation: modify the user query to match the schemas
    exported by each source
  • But, which schema does the user query?
  • The Integrated View Schema
  • Sources export a view (the export schema)
  • Federated databases
  • Local sources belong to different administrative
    domains, i.e. different owners.
  • Local autonomy

33
The Canonical Mediator / Wrapper Architecture
Client Application
Wrapper processes could execute at sources, at
mediator, or elsewhere
Q1
Export view in mediator data model
Local view in local data model
34
Example: A Relational Mediator
Client Application
Mediator (Relational data model)
Wrapper
Wrapper
Shape file
Relational DBMS e.g. PostGIS
35
Example: A Shape-file Based Mediator
Client Application
Mediator (Shape file-based data model)
Wrapper
Wrapper
Shape file
Relational DBMS e.g. PostGIS
36
Example: An XML Mediator
User / Applications
Mediator (XML-based data model, e.g. GML)
Wrapper
Wrapper
Wrapper
Shape file
XML file e.g. ArcXML
Relational DBMS e.g. PostGIS
37
User Authentication and Access Control
How about using GAMA for authentication?
1. User authenticates to system
Client Application
2. User connects to mediator (passes credentials
to mediator)
Mediator
3. Mediator connects to sources
  • Using original user credentials
  • Or, mapped credentials (role-based access)

Wrapper
Wrapper
4. Need to define users or roles in sources
Data source 1
Data source 2
38
Different types of heterogeneity in data
integration
  • Platform heterogeneity: different OS platforms
  • DBMS heterogeneity: different database systems,
    e.g. SQL Server, MySQL, DB2
  • Data type heterogeneity
  • Schema heterogeneity
  • Heterogeneity in units, accuracy, resolution
  • Semantic heterogeneity

39
Schema Integration
  • A long standing Computer Science problem
  • Simple case
  • Mediator View
  • (SampleID varchar, Rock_Type varchar, Age int)
  • In Source2 Table, map Age to int

Wrapper
Source 1 Table
Source 2 Table
Wrapper: converts between int and varchar for Age
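A minimal sketch of such a wrapper, assuming Source 2 stores Age as text while the mediator view expects an integer. The function names and the row layout (SampleID, Rock_Type, Age) are illustrative assumptions.

```python
def source2_row_to_mediator(row):
    """Convert a Source 2 row (Age as varchar) to the mediator view (Age as int)."""
    sample_id, rock_type, age_text = row
    return (sample_id, rock_type, int(age_text.strip()))


def mediator_age_to_source2(age):
    """Convert an integer Age from a mediator query into Source 2's varchar form."""
    return str(age)


converted = source2_row_to_mediator(("S1", "basalt", " 150 "))
pushed_down = mediator_age_to_source2(150)
```

The conversion runs in both directions: results flow up as integers, and query constants flow down as strings.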
40
Another integration scenario
Source 1 Table
(Sample ID varchar, Rock type varchar, Eon varchar,
Era varchar, Period varchar)
Source 2 Table
(Sample ID varchar, Rock type varchar, Age varchar)
Example Age value: Phanerozoic/mesozoicjur
  • Mediator View
  • (SampleID varchar, Rock_Type varchar, Age
    varchar, Era varchar, Period varchar)
  • In Source 2 Table, parse Age to obtain
    sub-components of the field
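Parsing the Age field might look like the sketch below. It assumes the field is a slash-delimited path of the form eon/era/period; the example value on the slide is garbled in this transcript, so the delimiter, the three-level layout, and the sample strings used here are assumptions.

```python
def split_age(age):
    """Split an 'eon/era/period' string into named sub-components.

    Missing trailing levels are returned as None.
    """
    parts = [part.strip().capitalize() or None for part in age.split("/")]
    parts += [None] * (3 - len(parts))
    eon, era, period = parts[:3]
    return {"Eon": eon, "Era": era, "Period": period}


# Hypothetical Age values, for illustration only.
full = split_age("Phanerozoic/mesozoic/jurassic")
partial = split_age("Phanerozoic/mesozoic")
```

A wrapper could apply this to each Source 2 row so that the mediator sees separate columns instead of one combined field.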

41
A more advanced integration scenario
Source 1 Table
(Sample ID varchar, Rock type varchar, Eon varchar,
Era varchar, Period varchar)
150
  • Mediator View (SampleID varchar, Rock_Type
    varchar, Eon varchar, Era varchar, Period
    varchar)
  • Same as Source1 table schema
  • Query: Get rock types for all rocks from the
    Jurassic period

42
Doing the integration
  • Query sent to mediator
  • SELECT DISTINCT(Rock_Type) FROM Mediator_View
    WHERE Period = 'Jurassic'
  • Query to Source 1
  • SELECT DISTINCT(Rock_Type) FROM Source1_Table
    WHERE Period = 'Jurassic'
  • For Source 2, need to map Period = 'Jurassic' to Age
    values

43
Query fragment sent to Source 2
  • SELECT DISTINCT (S2.Rock_Type)
  • FROM
  • Source2_Table S2,
  • Geologic_Time_Table GT
  • WHERE
  • GT.Period = 'Jurassic' AND
  • (S2.Age > GT.Min) AND
  • (S2.Age < GT.Max)

Where is the Geologic_Time_Table stored?
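The Source 2 query fragment can be tried out with an in-memory SQLite database, as sketched below. The sample rows are invented, the period boundaries are rounded, approximate values in millions of years, and the columns Min and Max are renamed Min_Ma and Max_Ma here to avoid confusion with the SQL functions of the same name. The sketch also simply places Geologic_Time_Table next to the Source 2 data, which is only one possible answer to where that table lives.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE Source2_Table (SampleID TEXT, Rock_Type TEXT, Age REAL);
    CREATE TABLE Geologic_Time_Table (Period TEXT, Min_Ma REAL, Max_Ma REAL);

    -- Invented sample rows; Age is a numeric age in millions of years.
    INSERT INTO Source2_Table VALUES
      ('S1', 'limestone', 150), ('S2', 'sandstone', 180),
      ('S3', 'limestone', 160), ('S4', 'granite', 100),
      ('S5', 'shale', 230);

    -- Rounded, approximate period boundaries (illustrative only).
    INSERT INTO Geologic_Time_Table VALUES
      ('Cretaceous', 66, 145), ('Jurassic', 145, 201), ('Triassic', 201, 252);
    """
)


def rock_types_for_period(conn, period):
    """Map a period name to an age range via a join, then filter Source 2."""
    rows = conn.execute(
        """
        SELECT DISTINCT S2.Rock_Type
        FROM Source2_Table S2, Geologic_Time_Table GT
        WHERE GT.Period = ?
          AND S2.Age > GT.Min_Ma
          AND S2.Age < GT.Max_Ma
        ORDER BY S2.Rock_Type
        """,
        (period,),
    ).fetchall()
    return [row[0] for row in rows]


jurassic_rocks = rock_types_for_period(db, "Jurassic")
```

If the time table were held at the mediator instead, the mediator would first look up the age range and then send Source 2 a plain range predicate on Age.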
44
Data Integration Carts
  • Integrating data sets without explicitly creating
    views
  • An example request
  • Plot all gravity data points that fall within
    the spatial extent of rocks of a given type, in
    the Rocky Mountain testbed region
  • Use GEONsearch to find all gravity and geologic
    data using bounding box for Rocky Mountain
    testbed region
  • Need gazetteer / spatial ontology to determine
    Rocky Mountain region
  • Need to know classification of datasets (as
    gravity and geology)
  • Intersect extent of gravity and geologic datasets
    (from metadata) with extent of Rocky Mountain
    region
  • Plot gravity point data that fall within polygons
    of rocks of given type
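The steps above can be sketched in plain Python: filter registered data sets by bounding-box overlap with the region, then keep the gravity points that fall inside polygons of the requested rock type. The dictionary layout for data sets, the coordinates, and all function names are hypothetical; point-in-polygon uses the standard ray-casting test on simple polygons and ignores map projections.

```python
def boxes_intersect(a, b):
    """Boxes are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def gravity_points_in_rock(region_box, datasets, rock_type):
    """Filter data sets by extent, then gravity points by rock polygons."""
    in_region = [d for d in datasets if boxes_intersect(d["extent"], region_box)]
    polygons = [
        polygon
        for d in in_region if d["kind"] == "geology"
        for kind, polygon in d["polygons"] if kind == rock_type
    ]
    points = [p for d in in_region if d["kind"] == "gravity" for p in d["points"]]
    return [p for p in points if any(point_in_polygon(p, poly) for poly in polygons)]


# Invented data sets; "kind" stands in for the registered classification.
region = (0, 0, 10, 10)
datasets = [
    {"kind": "geology", "extent": (0, 0, 6, 6), "polygons": [
        ("granite", [(0, 0), (4, 0), (4, 4), (0, 4)]),
        ("basalt", [(4, 4), (6, 4), (6, 6), (4, 6)]),
    ]},
    {"kind": "gravity", "extent": (1, 1, 9, 9), "points": [(2, 2), (5, 5), (8, 8)]},
    {"kind": "gravity", "extent": (20, 20, 30, 30), "points": [(25, 25)]},
]
granite_points = gravity_points_in_rock(region, datasets, "granite")
```

The extent check works only on metadata, so data sets outside the region are discarded before any point data is read.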

45
Ad hoc integration
Search Metadata Catalog: geologic and
gravity data in Rocky Mountains
GEONsearch
Data Integration Cart
46
Data Registration
Item Registration (Schema registration)
Item Detail Registration
48
Another complex query
  • Query: Get rock types for all rocks from the
    Mesozoic era
  • Easy to do for Source 1: Era = 'Mesozoic'
  • For Source 2
  • Need to find numeric age range for Mesozoic
  • Find age range across all subclasses of Mesozoic
    (Cretaceous, Jurassic, Triassic)
  • Select all Source 2 Table records whose age range
    falls within the Mesozoic age range
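A small sketch of the Source 2 steps, assuming numeric ages in millions of years. The lookup table uses rounded, approximate boundaries, and the sample records and function names are invented for illustration.

```python
# Rounded, approximate boundaries in millions of years (illustrative only).
GEOLOGIC_TIME = {
    "Mesozoic": {
        "Triassic": (201, 252),
        "Jurassic": (145, 201),
        "Cretaceous": (66, 145),
    },
}


def era_age_range(era):
    """Age range of an era, taken across all of its periods (subclasses)."""
    spans = list(GEOLOGIC_TIME[era].values())
    return min(low for low, _ in spans), max(high for _, high in spans)


def rock_types_in_era(records, era):
    """Rock types of records (sample_id, rock_type, age) whose age is in the era."""
    low, high = era_age_range(era)
    return sorted({rock for _, rock, age in records if low <= age <= high})


# Invented Source 2 records.
source2_records = [
    ("S1", "limestone", 150), ("S4", "granite", 100), ("S5", "shale", 230),
    ("S6", "coal", 310), ("S7", "gravel", 2),
]
mesozoic_range = era_age_range("Mesozoic")
mesozoic_rocks = rock_types_in_era(source2_records, "Mesozoic")
```

Taking the minimum and maximum across the periods is what lets a single range predicate stand in for the whole era.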