The Gedeon Project: Data, Metadata and Databases
Yves DENNEULIN, LIG laboratory, Grenoble

1
LIP6 laboratory
ACI MD
2
Context and goals
  • Heterogeneous metadata management on grids
      • Clusters of clusters
  • High-level queries using metadata
  • Easy and flexible deployment and configuration
  • Minimal overhead
  • Various interfaces
  • Initial target application domains
      • Biocomputing (lots of metadata, few data)
      • Microscopic imaging (lots of data, few metadata)

3
The Gedeon middleware
  • Metadata management on lightweight grids
      • Records of (attribute, value) pairs stored in files
  • Flexible requests
      • Can be combined through scripting
  • Various interfaces
      • Command line (tools)
      • Libraries
      • Virtual FS (support for legacy applications)
  • Deployment à la carte
      • Composition of various data sources
  • Performance
      • Dedicated I/O library
      • Semantic caching

4
Outline
  • General architecture
  • Gedeon internal structure
  • Composition of various data sources
  • Practical use
  • Dual cache
  • Conclusion

5
Example of a deployment
[Figure: a client sends queries through a query interface (API, FS, GUI, ...) backed by a cache and a local proxy. Servers "close" to the client and remote storage sites, each with its own caches and local proxy, are linked through interconnect middleware.]
6
Gedeon components
  • Gedeon kernel
      • fuple (I/O library)
      • Evaluates the queries
  • lowerG
      • Operators to compose bases
      • Remote access
  • Interface
      • API lowerG
      • Virtual FS
  • Cache
[Figure: a local proxy with its cache and lowerG.]
7
What is inside the sources?
  • Records of attribute/value pairs

Example record:
  Id          457
  classifA    Bacteria
  classifB    Clostridia
  taille      26
  ref
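A record can be modelled as a mapping from attribute names to values. The slides do not show how records are laid out inside the files. The parser below is therefore a minimal sketch that assumes a hypothetical plain-text layout: one "attribute value" pair per line, with a blank line between records. The name parse_records is illustrative and is not part of Gedeon.

```python
from typing import Dict, List

Record = Dict[str, str]

def parse_records(text: str) -> List[Record]:
    """Parse records from a HYPOTHETICAL layout: one 'attribute value' pair
    per line, records separated by blank lines."""
    records: List[Record] = []
    current: Record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if current:  # a blank line closes the record being built
                records.append(current)
                current = {}
            continue
        attr, _, value = line.partition(" ")
        current[attr] = value.strip()  # an attribute with no value maps to ""
    if current:  # last record, when the text does not end with a blank line
        records.append(current)
    return records
```

With this layout, the record above becomes {"Id": "457", "classifA": "Bacteria", "classifB": "Clostridia", "taille": "26", "ref": ""}. Note that "taille" is French for "size".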
8
Example of composition of sources
[Figure: a client accesses sites S1, S2 and S3 through a composition of the J (join) and RR (round robin) operators.]
  • Metadata can be local or copied
9
Union
[Figure: union of source A (records A1, A2, A3, A4, ...) and source B (records B1, B2, B3, B4, ...); the result holds the records of both sources.]
  • Unify storage space
  • Parallel evaluation
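The union operator can be sketched as follows. The sketch assumes each source is an in-memory list of records and a query is a Python predicate. A thread pool stands in for evaluating the query in parallel on each site. The names evaluate and union are hypothetical. Keeping duplicates (a bag union) is also an assumption, since the slide does not say how repeated records are handled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]

def evaluate(source: List[Record], pred: Predicate) -> List[Record]:
    """Evaluate a selection on one source (stands in for a remote site)."""
    return [r for r in source if pred(r)]

def union(sources: List[List[Record]], pred: Predicate) -> List[Record]:
    """Evaluate `pred` on every source in parallel, then concatenate the
    partial results in source order (duplicates are kept)."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        partials = list(pool.map(lambda src: evaluate(src, pred), sources))
    return [r for part in partials for r in part]
```

ThreadPoolExecutor.map returns results in the order of its inputs, so the output is deterministic even though the sources are queried concurrently.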
10
Round Robin: fault tolerance
[Figure: a client accesses Source 1 and Source 2 through an RR operator.]
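For fault tolerance, RR can be read as trying replicated sources in turn until one of them answers. This minimal sketch assumes a hypothetical source interface: a callable that takes a query string and raises ConnectionError when its site is unreachable. The function name query_with_failover is illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, str]
# Hypothetical source interface: evaluates a query string and raises
# ConnectionError when its site cannot be reached.
Source = Callable[[str], List[Record]]

def query_with_failover(replicas: Sequence[Source], q: str) -> List[Record]:
    """Send `q` to the replicas in order and return the first answer."""
    last_error: Optional[ConnectionError] = None
    for replica in replicas:
        try:
            return replica(q)
        except ConnectionError as err:
            last_error = err  # this site is down: try the next replica
    raise ConnectionError("no replica could answer the query") from last_error
```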
11
Round Robin: load balancing
[Figure: two clients share Source 1 and Source 2 through an RR operator.]
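For load balancing, the RR operator can instead send successive queries to the replicas in rotation. The following is a minimal single-threaded sketch using itertools.cycle. It assumes a hypothetical source interface (query string in, list of records out), and the class name RoundRobinBalancer is illustrative.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Sequence

Record = Dict[str, str]
# Hypothetical source interface: query string in, matching records out.
Source = Callable[[str], List[Record]]

class RoundRobinBalancer:
    """Dispatch successive queries to the replicas in turn (not thread-safe)."""

    def __init__(self, replicas: Sequence[Source]) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._turn: Iterator[Source] = itertools.cycle(list(replicas))

    def query(self, q: str) -> List[Record]:
        # Each call goes to the next replica in the cycle.
        return next(self._turn)(q)
```

A real deployment would combine this rotation with the failover behaviour of the previous slide, skipping a replica that does not answer.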
12
Join operator
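[Figure: join (J) of two sources on the Id attribute]
  Source 1:  (Id 457, A1 v1, A2 v2)  (Id 458, A1 v4, A2 v5, ...)
  Source 2:  (Id 457, A3 v3, ..., An vAn1)  (Id 458, A3 v6, ..., An vAn2)
  Result:    (Id 457, A1 v1, A2 v2, A3 v3, ..., An vAn1)
             (Id 458, A1 v4, A2 v5, A3 v6, ..., An vAn2)
  • Enrich a source with another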
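The join shown above can be sketched as a hash equi-join on Id. It indexes one source by key, then merges every matching pair of records.

The slide does not say whether J drops records with no partner. The sketch therefore defaults to an inner join, and keep_unmatched=True gives a left-outer behaviour, closer to "enriching" a source. Both the parameter and the rule that right-hand values win on a clash are assumptions.

```python
from typing import Dict, List

Record = Dict[str, str]

def join(left: List[Record], right: List[Record], key: str = "Id",
         keep_unmatched: bool = False) -> List[Record]:
    """Enrich each record of `left` with the attributes of the `right`
    records that share the same `key` value (hash equi-join)."""
    # Build side: index the right-hand source by key value.
    by_key: Dict[str, List[Record]] = {}
    for rec in right:
        if key in rec:
            by_key.setdefault(rec[key], []).append(rec)
    result: List[Record] = []
    # Probe side: look every left record up in the index.
    for rec in left:
        matches = by_key.get(rec[key], []) if key in rec else []
        for match in matches:
            merged = dict(rec)
            merged.update(match)  # assumption: on a clash the right-hand value wins
            result.append(merged)
        if not matches and keep_unmatched:
            result.append(dict(rec))  # left-outer behaviour: keep the record as is
    return result
```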
13
Outline
  • General architecture
  • Gedeon internal structure
  • Composition of various data sources
  • Practical use
  • Dual cache
  • Conclusion

14
Tools 1/2
  • Libraries
  • CLI
  • Operations
      • sort
      • projection
      • select
      • index
      • ...
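The select, projection and sort operations can be sketched as plain functions over lists of records. They can then be chained the way the CLI tools are piped together. The function names and the numeric flag of sort_by are hypothetical; the slides do not give the library's actual signatures.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

def select(records: List[Record], pred: Callable[[Record], bool]) -> List[Record]:
    """Keep the records that satisfy `pred`."""
    return [r for r in records if pred(r)]

def project(records: List[Record], attrs: List[str]) -> List[Record]:
    """Keep only the listed attributes of each record (absent ones are skipped)."""
    return [{a: r[a] for a in attrs if a in r} for r in records]

def sort_by(records: List[Record], attr: str, numeric: bool = False) -> List[Record]:
    """Sort on one attribute; records lacking it are placed last."""
    def key(r: Record):
        if attr not in r:
            return (1, 0.0 if numeric else "")
        return (0, float(r[attr]) if numeric else r[attr])
    return sorted(records, key=key)
```

For example, sort_by(select(recs, lambda r: "taille" in r), "taille", numeric=True) sorts the records that have a size attribute by numeric size.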

15
Tools 2/2
  • Examples
      • sort: cat mesmeta.g | fsort 'taille' > trie_taille.g
        (library call: sort(attr='taille'))
      • index: create_idx(attr='Id'), then search_idx('Id', 'P0123');
        the index is stored in the file .Id.idx
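The index example can be sketched as follows. create_idx maps each value of an attribute to the positions of the records that hold it and saves that map to a file named after the attribute, as with .Id.idx on the slide. search_idx then reads the file instead of scanning every record.

The function names follow the slide, but these signatures are assumptions: records are passed explicitly, there is a directory argument, and the file is stored as JSON.

```python
import json
import os
from typing import Dict, List

Record = Dict[str, str]

def _idx_path(attr: str, directory: str) -> str:
    # Mirrors the '.Id.idx' file name shown on the slide for attribute Id.
    return os.path.join(directory, f".{attr}.idx")

def create_idx(records: List[Record], attr: str, directory: str = ".") -> str:
    """Index `attr`: map each value to the positions of the records holding it,
    and save the map as JSON (the on-disk format is an assumption)."""
    index: Dict[str, List[int]] = {}
    for pos, rec in enumerate(records):
        if attr in rec:
            index.setdefault(rec[attr], []).append(pos)
    path = _idx_path(attr, directory)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return path

def search_idx(records: List[Record], attr: str, value: str,
               directory: str = ".") -> List[Record]:
    """Look `value` up in the saved index instead of scanning every record."""
    with open(_idx_path(attr, directory), encoding="utf-8") as f:
        index: Dict[str, List[int]] = json.load(f)
    return [records[pos] for pos in index.get(value, [])]
```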
16
Request language
  • Simple expressions (type control with the operators)
  • Regular expressions
  • Second-order expressions

17
Select expression
  Source records:
    (Id 457, classifA Bacteria, classifB Clostridia, taille 26)
    (Id 459, classifB Bacteria, taille 47)
    (Id 460, classifA Firmicutes)
  Query: Select Id>459
  Result:
    (Id 460, classifA Firmicutes)
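A simple select expression such as Id>459 can be sketched as a tiny parser that turns "attribute operator literal" into a predicate.

The slides only mention "type control with the operators". Here that is modelled by one assumed rule: a numeric literal matches only numeric values, and a non-numeric literal falls back to string comparison. The syntax accepted and the names compile_condition and select are hypothetical.

```python
import operator
import re
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]

_OPS = {"<=": operator.le, ">=": operator.ge, "!=": operator.ne,
        "<": operator.lt, ">": operator.gt, "=": operator.eq}
# Two-character operators are listed first so that "<=" is not read as "<".
_COND = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.+?)\s*$")

def _as_number(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None

def compile_condition(expr: str) -> Callable[[Record], bool]:
    """Turn 'attr OP literal' (e.g. 'Id>459') into a predicate on records."""
    m = _COND.match(expr)
    if m is None:
        raise ValueError(f"cannot parse condition: {expr!r}")
    attr, op_text, literal = m.groups()
    op = _OPS[op_text]
    wanted = _as_number(literal)

    def pred(rec: Record) -> bool:
        if attr not in rec:
            return False  # a record without the attribute never matches
        if wanted is None:
            return op(rec[attr], literal)  # non-numeric literal: string comparison
        value = _as_number(rec[attr])
        # Type control (assumed rule): a numeric literal only matches numeric values.
        return value is not None and op(value, wanted)

    return pred

def select(records: List[Record], expr: str) -> List[Record]:
    pred = compile_condition(expr)
    return [r for r in records if pred(r)]
```

On the three source records above, select(records, "Id>459") keeps only record 460, as on the slide.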
18
Select using regexp
  Source records:
    (Id 457, classifA Bacteria, classifB Clostridia, taille 26)
    (Id 459, classifB Bacteria, taille 47)
    (Id 460, classifA Firmicutes)
  Query: Select classifB/.a/
  Result:
    (Id 457, classifA Bacteria, classifB Clostridia, taille 26)
    (Id 459, classifB Bacteria, taille 47)
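The regular-expression select can be sketched with Python's re module. The result on the slide (records 457 and 459) is consistent with an unanchored search. "Clostridia" contains "ia" and "Bacteria" contains "Ba". Anchored matching with re.match would drop record 457, because "Clostridia" does not start with a character followed by "a". The choice of re.search is an assumption about Gedeon's semantics, and select_regex is an illustrative name.

```python
import re
from typing import Dict, List

Record = Dict[str, str]

def select_regex(records: List[Record], attr: str, pattern: str) -> List[Record]:
    """Keep records whose `attr` value contains a match for `pattern`."""
    rx = re.compile(pattern)
    # re.search looks for the pattern anywhere in the value (unanchored);
    # records without the attribute are skipped.
    return [r for r in records if attr in r and rx.search(r[attr]) is not None]
```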
19
Select using 2nd order logic
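  Source records:
    (Id 457, classifA Bacteria, classifB Clostridia, taille 26)
    (Id 459, classifB Bacteria, taille 47)
    (Id 460, classifA Firmicutes)
  Query: Select /classif[AB]/Bacteria taille>36
    (some attribute whose name matches /classif[AB]/ has the value Bacteria, and taille > 36)
  Result:
    (Id 459, classifB Bacteria, taille 47)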
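What makes this query second-order is that the pattern ranges over attribute names rather than values. The following minimal sketch assumes two things: the name pattern is /classif[AB]/, and the two conditions are combined with AND. Both readings fit the result shown, since record 457 has classifA Bacteria but taille 26. The helper names are hypothetical.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]

def some_attr_equals(name_pattern: str, value: str) -> Predicate:
    """Second-order condition: the pattern ranges over attribute *names*;
    true if some attribute whose name matches holds `value`."""
    rx = re.compile(name_pattern)
    return lambda rec: any(rx.search(name) is not None and v == value
                           for name, v in rec.items())

def greater_than(attr: str, bound: float) -> Predicate:
    """First-order numeric condition; missing or non-numeric values fail."""
    def pred(rec: Record) -> bool:
        try:
            return float(rec[attr]) > bound
        except (KeyError, ValueError):
            return False
    return pred

def select_all(records: List[Record], *conditions: Predicate) -> List[Record]:
    """Conjunction: keep records that satisfy every condition."""
    return [r for r in records if all(cond(r) for cond in conditions)]
```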
20
Virtual FS interface
  • Just a specific file-oriented interface
  • Data and metadata can be anywhere in the grid
  • Definition of logical directories
      • Ex: cd 'classifB.a'
      • Operators between directories
  • One file name = the value of a metadata attribute (logical view)
      /fs_virt/classifB.a> ls
      457  459
      /fs_virt/classifB.a> cat > /tmp/mater
      /fs_virt/classifB.a>
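The logical view can be illustrated with an in-memory model. Each directory is a stored query, and each file name is the Id value of a matching record, which reproduces the ls output above. This sketch does not mount anything and is not Gedeon's Virtual FS. The class VirtualFS, its methods, and the choice of Id as the file-name attribute are hypothetical.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]
Predicate = Callable[[Record], bool]

class VirtualFS:
    """Toy logical view: each directory is a stored query and each file name
    is the value of one metadata attribute (here Id) of a matching record."""

    def __init__(self, records: List[Record], name_attr: str = "Id") -> None:
        self._records = records
        self._name_attr = name_attr
        self._dirs: Dict[str, Predicate] = {}

    def mkdir_regex(self, name: str, attr: str, pattern: str) -> None:
        """Define a logical directory selecting records whose `attr` matches `pattern`."""
        rx = re.compile(pattern)
        self._dirs[name] = lambda r: attr in r and rx.search(r[attr]) is not None

    def _members(self, directory: str) -> List[Record]:
        pred = self._dirs[directory]  # KeyError if the directory was never defined
        return [r for r in self._records if self._name_attr in r and pred(r)]

    def ls(self, directory: str) -> List[str]:
        return [r[self._name_attr] for r in self._members(directory)]

    def cat(self, directory: str, filename: str) -> str:
        """Render the record behind `filename` as 'attribute value' lines."""
        for r in self._members(directory):
            if r[self._name_attr] == filename:
                return "\n".join(f"{a} {v}" for a, v in r.items())
        raise FileNotFoundError(f"{directory}/{filename}")
```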

21
Outline
  • General architecture
  • Gedeon internal structure
  • Composition of various data sources
  • Practical use
  • Dual cache
  • Conclusion

22
Dual cache (1)
  • Two cooperative caches
      • Cache of requests (R, id, ...) -> saves computing power
      • Cache of data (id, attr, ...) -> saves bandwidth
  • Semantic cache
      • Can evaluate a query using the data in the cache
      • Can generate a remainder query to complement the cached data

23
Example
  • Refinement of a request
      • 'OC/Eukaryota/' -> (R, Lid = {id1, id2, ...})
      • 'OC/Eukaryota/ year>1998' is evaluated as Select(Lid, 'year>1998')
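The dual cache and the refinement example can be sketched together. The model is simplified and rests on several assumptions:
- A query is a conjunction of conditions.
- "~" means regular-expression search and ">" means numeric comparison.
- Every record has an Id.

The request cache maps a query to the Ids it returned, and the data cache maps an Id to its record. Semantic reuse covers only the case in the example: a new query adds conditions to a cached one, so it can be evaluated locally on the cached records. A full semantic cache would also build a remainder query for answers the cache cannot provide. Here that is approximated only by fetching the records missing from the data cache. All class and method names are hypothetical.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

Record = Dict[str, str]
Cond = Tuple[str, str, str]   # (attribute, operator, literal); operators "~" (regex) and ">"
Query = FrozenSet[Cond]       # a query is a conjunction of conditions

def holds(rec: Record, cond: Cond) -> bool:
    attr, op, lit = cond
    if attr not in rec:
        return False
    if op == "~":
        return re.search(lit, rec[attr]) is not None
    if op == ">":
        try:
            return float(rec[attr]) > float(lit)
        except ValueError:
            return False
    raise ValueError(f"unsupported operator {op!r}")

class Server:
    """Stand-in for the remote sources; counts the work it is asked to do."""

    def __init__(self, records: List[Record]) -> None:
        self.records = {r["Id"]: r for r in records}
        self.queries_evaluated = 0
        self.records_shipped = 0

    def evaluate(self, q: Query) -> List[str]:
        self.queries_evaluated += 1
        return [i for i, r in self.records.items() if all(holds(r, c) for c in q)]

    def fetch(self, ids: List[str]) -> List[Record]:
        self.records_shipped += len(ids)
        return [self.records[i] for i in ids]

class DualCache:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.requests: Dict[Query, List[str]] = {}  # cache of requests: query -> ids
        self.data: Dict[str, Record] = {}           # cache of data: id -> record

    def _records(self, ids: List[str]) -> List[Record]:
        # Only the ids missing from the data cache are fetched (saves bandwidth).
        missing = [i for i in ids if i not in self.data]
        if missing:
            for rec in self.server.fetch(missing):
                self.data[rec["Id"]] = rec
        return [self.data[i] for i in ids]

    def query(self, q: Query) -> List[Record]:
        if q in self.requests:
            return self._records(self.requests[q])
        for cached_q, ids in list(self.requests.items()):
            # q = cached_q AND extra, so its answer is a subset of the cached
            # one and can be computed locally (saves computing power).
            if cached_q <= q:
                extra = q - cached_q
                hits = [r for r in self._records(ids) if all(holds(r, c) for c in extra)]
                self.requests[q] = [r["Id"] for r in hits]
                return hits
        ids = self.server.evaluate(q)  # no usable cached request: ask the sources
        self.requests[q] = ids
        return self._records(ids)
```

In the example, the first query 'OC/Eukaryota/' goes to the sources. The refined query adding year>1998 is then answered from the cache without contacting them again.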

24
Dual cache (2)
  • Distributed semantic cache
      • Typically used inside communities
          • Lots of common requests
      • No location constraints
          • Members of a community can be geographically scattered
  • Distributed data cache
      • Minimizes transfer time and the volume of data transferred
      • Cooperation between sites that are close from a topological point of view
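Cooperation between topologically close sites can be sketched as a lookup that asks the nearest peers first and falls back to the origin storage site. The slides do not define the distance. This sketch assumes a hypothetical hop count per peer, and the names Site and lookup are illustrative.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

class Site:
    """A cooperating site with its own data cache."""

    def __init__(self, name: str, hops: int) -> None:
        self.name = name
        self.hops = hops  # hypothetical topological distance (e.g. network hops)
        self.cache: Dict[str, Record] = {}

def lookup(record_id: str, peers: List[Site],
           origin: Dict[str, Record]) -> Tuple[Record, str]:
    """Serve a record from the topologically closest peer that caches it,
    falling back to the origin storage site."""
    for site in sorted(peers, key=lambda s: s.hops):
        if record_id in site.cache:
            return site.cache[record_id], site.name
    return origin[record_id], "origin"
```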

25
Dual cache (3)
26
Dual cache (4)
  • Work in progress on the notion of distance
      • Finding geographical proximity
      • Finding common interests between communities
          • Creating hybrid communities based on their requests
      • Could be used to change the cache parameters
          • Manually and/or automatically
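The slides leave the request-based distance open. One simple way to quantify the common interests of two communities is the Jaccard similarity of their request sets. This measure is swapped in here for illustration and is not stated to be Gedeon's. The merge threshold of 0.5 is an arbitrary, hypothetical parameter.

```python
from typing import Set

def request_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two communities' request sets."""
    union = a | b
    if not union:
        return 0.0  # two communities with no requests share nothing measurable
    return len(a & b) / len(union)

def should_merge(a: Set[str], b: Set[str], threshold: float = 0.5) -> bool:
    """Hypothetical rule for forming a hybrid community: merge when the
    request sets overlap at least `threshold`."""
    return request_similarity(a, b) >= threshold
```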

27
Conclusion
  • A data integration middleware
      • Handling of metadata
      • Distributed and modular
      • Deployment according to architectural/organisational constraints
  • Definition of a dual cache infrastructure
      • Reflects organisational use
  • Prototype in use
      • Packaging and documentation still needed

28
Questions?