Title: Semantic Research Grid
- Open Grid Forum Web 2.0 Workshop, OGF21
- Seattle, Washington
- October 15, 2007
- Geoffrey Fox, Aurel Cami, Ahmet Fatih Mustacoglu,
Ahmet E. Topcu
- Community Grids Laboratory,
- Indiana University, Bloomington, IN 47404
- gcf_at_indiana.edu, http://www.infomall.org
Semantic Scholars Grid
[Figure: Semantic Scholars Grid. A MASHUP layer with an Integration/Enhancement User Interface combines Web 2.0 and community tools (MySpace, Del.icio.us, CiteULike, Connotea, Bibsonomy, Biolicious), existing document-based tools (Google Scholar, Windows Live Academic Search, Citeseer, Science.gov, PubMed, PubChem, CMT Conference Management, Manuscript Central), generic document tools, Export (RSS, BibTeX, EndNote etc.) and Traditional Grid/Cyberinfrastructure into new document-enhanced research tools alongside the existing user interface.]
Delicious Semantic Web/Grid
- http://del.icio.us, purchased by Yahoo for $30M
- http://www.CiteULike.org
- http://www.connotea.org (Nature)
- Associate metadata with bookmarks specified by URLs or DOIs (Digital Object Identifiers)
- Users add comments and keywords (called tags)
- Users are linked together into groups
(communities)
- Information such as title and authors extracted
automatically from some sites (PubMed, ACM, IEEE,
Wiley etc.)
- BibTeX-like additional information in CiteULike
- This is perhaps the de facto Semantic Web, remarkable for its simplicity
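The bookmark model described above can be sketched as a small data structure. Each bookmark has a URL or DOI, user-added tags, and an owner who belongs to a community. This is a minimal illustration built on a hypothetical `Bookmark` record and `by_tag` helper. It is not the actual schema of del.icio.us, CiteULike or Connotea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bookmark:
    """A social-bookmark record: a URL or DOI plus user-supplied metadata."""
    user: str
    url: Optional[str] = None
    doi: Optional[str] = None
    title: str = ""
    authors: list = field(default_factory=list)
    tags: set = field(default_factory=set)
    comment: str = ""


def by_tag(bookmarks, tag, group=None):
    """Return bookmarks carrying `tag` (case-insensitive), optionally only from users in `group`."""
    wanted = tag.lower()
    return [
        b for b in bookmarks
        if wanted in {t.lower() for t in b.tags}
        and (group is None or b.user in group)
    ]


if __name__ == "__main__":
    collection = [
        Bookmark("alice", doi="10.1000/example", title="Programming the Cell processor",
                 tags={"Cell", "parallel"}),
        Bookmark("bob", url="http://example.org/mpi", title="MPI tuning", tags={"parallel"}),
    ]
    community = {"alice", "bob"}  # a hypothetical user group
    for b in by_tag(collection, "cell", community):
        print(b.title, b.doi)
```

Tags are compared case-insensitively because different users rarely agree on capitalization.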
Example
- Parallel Computing collection selected on the Cell tag
- So far no clear winner in tagging space
- Maybe CiteULike, with its different metadata, is better
- How do I preserve investment?
General Document Semantic Analysis
- Citeseer and Google Scholar scour the Internet
and analyze documents for incidental metadata
- Title, author and institution of documents
- Citations with their own metadata allowing one to
match to other documents
- These capabilities are sure to become more
powerful and to be extended
- Give a citation index in real time
- Tell you all authors of all papers that cite a paper that cites you etc. (note it's a small world, so don't go too far in link analysis)
- Tell you all citations of all papers in a
workshop
- Help a journal editor by suggesting referees based on document analysis, or by performing a plagiarism analysis that scores the document's similarity to other Internet documents
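The query "all authors of all papers that cite a paper that cites you" is a bounded breadth-first walk over the reversed citation graph. The minimal sketch below assumes the citations have already been extracted into plain dictionaries (the hypothetical `citations` and `authors` maps). Here `max_hops` is the cut-off the slide recommends for a small-world graph.

```python
from collections import deque


def authors_within(citations, authors, paper, max_hops=2):
    """Authors of papers that reach `paper` through at most `max_hops` citation links.

    citations: paper id -> set of paper ids it cites
    authors:   paper id -> list of author names
    """
    # Invert "cites" into "cited by" so we can walk backwards from `paper`.
    cited_by = {}
    for citing, cited_ids in citations.items():
        for cited in cited_ids:
            cited_by.setdefault(cited, set()).add(citing)

    found, seen = set(), {paper}
    frontier = deque([(paper, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_hops:
            continue  # small world: the reachable set explodes beyond a few hops
        for citing in cited_by.get(current, ()):
            if citing not in seen:
                seen.add(citing)
                found.update(authors.get(citing, []))
                frontier.append((citing, depth + 1))
    return found


if __name__ == "__main__":
    cites = {"B": {"A"}, "C": {"B"}, "D": {"C"}}  # B cites A, C cites B, D cites C
    names = {"A": ["You"], "B": ["Bob"], "C": ["Carol"], "D": ["Dan"]}
    print(sorted(authors_within(cites, names, "A")))  # ['Bob', 'Carol']
```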
Possible challenges
- Use of Web 2.0 tools in science (and business) is
very promising but adoption is currently small
- Which of many tools will be popular with your
colleagues?
- What happens if the tool you chose is not adopted or, worse, simply disappears in an industry shake-up?
- How best to integrate web-tagged documents with Word and LaTeX citations?
- Need to tag URIs (e.g. database entries), not just URLs (this was done for a journal control system)
- Is the current security model sufficient?
- Can we link the virtual organization of a tagging system with those of other Cyberinfrastructure/Web 2.0 subsystems?
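One way to bridge web-tagged documents and LaTeX citations is to export each tagged record as a BibTeX entry. The sketch below uses a hypothetical `to_bibtex` helper. It does no escaping of braces or special characters, so it is an illustration rather than a complete exporter.

```python
def to_bibtex(key, entry_type="article", **fields):
    """Render one tagged record as a BibTeX entry that LaTeX can \\cite by `key`.

    List values (e.g. authors) are joined with "and", as BibTeX expects.
    Braces and special characters are NOT escaped in this sketch.
    """
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = " and ".join(value)
        if value not in (None, ""):  # skip empty fields
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_bibtex(
        "fox2007srg", "inproceedings",
        author=["Geoffrey Fox", "Aurel Cami", "Ahmet Fatih Mustacoglu", "Ahmet E. Topcu"],
        title="Semantic Research Grid",
        booktitle="Open Grid Forum Web 2.0 Workshop (OGF21)",
        year=2007,
        keywords=", ".join(sorted({"web2.0", "tagging"})),  # user tags become keywords
    ))
```

Mapping tags to the `keywords` field is one reasonable choice. BibTeX styles ignore unknown fields, so the tags travel with the entry without affecting the bibliography.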
Roughly what we are doing
- We are NOT building a new tagging or search
system
- We are building tools integrating and adding
value to existing systems
- We built a mashup linking to del.icio.us, CiteULike and Connotea, allowing tags to be exchanged between sites and with local repositories
- Repositories also link to local sources (PubsOnline), Google Scholar (GS) and Windows Live Academic (WLA)
- GS provides citation counts.
- WLA provides the Digital Object Identifier (DOI)
- We implement a rather more powerful access
control mechanism
- We build heuristic tools to mine web lists for
citations
- We have an event-based architecture (consistency model) allowing change actions to be preserved and selectively changed
- Supports integrating different inconsistent views
of a given document and its updates on different
tagging systems
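Integrating inconsistent views of the same document needs a merge policy. For example, the title might come from CiteULike, a citation count from GS and a DOI from WLA. The minimal sketch below assumes a hypothetical source-priority rule: for each field, the first trusted source that has a value wins. Tags are unioned, and the origin of every field is recorded. SRG's actual policy may differ.

```python
def merge_views(views, priority):
    """Merge per-source metadata for one document into a single record.

    views:    source name -> dict of fields reported by that source
    priority: source names, most trusted first; the first non-empty value wins.
    Tags are unioned across all sources. Returns (record, provenance).
    """
    record, provenance, tags = {}, {}, set()
    for source in priority:
        fields = views.get(source, {})
        tags.update(fields.get("tags", ()))
        for name, value in fields.items():
            if name == "tags" or value in (None, ""):
                continue
            if name not in record:
                record[name] = value
                provenance[name] = source
    record["tags"] = sorted(tags)
    return record, provenance


if __name__ == "__main__":
    views = {  # hypothetical values
        "citeulike": {"title": "Semantic Research Grid", "tags": ["grid", "web2.0"]},
        "gs": {"title": "The Semantic Research Grid", "cited_by": 3},
        "wla": {"doi": "10.1000/example"},
    }
    record, origin = merge_views(views, ["citeulike", "wla", "gs"])
    print(record)
    print(origin)
```

Keeping the provenance map means a later update from one site can be traced back to, and reconciled with, the field it replaced.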
del.icio.us Tags
Semantic Research Grid (SRG) Architecture
Key Concepts of System Architecture
- Digital Entity (DE): a digital collection of metadata for a citation
- Event: a time-stamped action on a digital entity. Our event-based model consists of
- Major Events
- Insertion or deletion of a digital entity
- Minor Events
- Modifications to an existing digital entity
- Dataset
- Collection of major and minor events
- Service-based Framework (SOAP over HTTP)
6/24/2009
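The event-based model can be sketched as a log of time-stamped events that is replayed to rebuild Digital Entities. Replaying only up to an earlier time gives a simple form of history and rollback. The names (`Kind`, `Event`, `replay`) are assumptions for illustration. So is the rule that minor events on a missing DE are ignored.

```python
from dataclasses import dataclass, field
from enum import Enum
import time


class Kind(Enum):
    INSERT = "insert"  # major event
    DELETE = "delete"  # major event
    MODIFY = "modify"  # minor event


@dataclass
class Event:
    de_id: str
    kind: Kind
    changes: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

    @property
    def is_major(self):
        return self.kind in (Kind.INSERT, Kind.DELETE)


def replay(dataset, until=None):
    """Rebuild Digital Entities from a dataset (list of events) in time order.

    Passing `until` ignores later events, which rolls the view back to that time.
    """
    entities = {}
    for ev in sorted(dataset, key=lambda e: e.timestamp):
        if until is not None and ev.timestamp > until:
            break
        if ev.kind is Kind.INSERT:
            entities[ev.de_id] = dict(ev.changes)
        elif ev.kind is Kind.DELETE:
            entities.pop(ev.de_id, None)
        elif ev.de_id in entities:  # minor events only touch existing DEs
            entities[ev.de_id].update(ev.changes)
    return entities
```

Because the log is never rewritten, rolling back is non-destructive. A later replay without `until` restores the newest state.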
Example Subsystem
[Figure: Example subsystem. Components: CiteULike, Delicious and Connotea annotation tools, Core Web Services, and three Research Databases.]
- Transfer
- Download/Upload
- Modify Digital Entity (DE)
- Share DE with other users
- Add/Get more info on a DE
- History (as a set of events) of a DE and rollback
SRG System Modules I
- Digital Entity (DE) Management Service
- Manual entry of a DE into the system
- DE history
- DE versioning and flexible choices (rollback)
- Editing and more info tools for a DE (Update
Model)
- Session and Event Management Services
- Event and dataset management
- DE view options
- User credentials (username/password) -
cookie-based
- Annotation Tools Service
- Transfer Service
- Download service
- Upload Service
- Extract DEs and tags from web lists
SRG System Modules II
- Search Tools Services
- Google Scholar/Windows Live Academic
- Google Scholar Advanced
- Local Database Search
- Via integrated PubsOnline Tool from Indiana
University
- My Research Database
- My Research Database Advanced
- Authentication and Authorization Services
- Login and Logout service
- DE Access rights management
- Database access rights management
- Administrative tools
- Other Services
- User Registration
- Username and password recovery
- User Profile Management
- DE metadata view options
Technical Issues
- Event-based model
- Manipulating data and metadata
- How to build the event-based model?
- Major and Minor events
- Datasets (collection of minor events)
- How to apply the event-based model?
- How to apply modifications to a record (Digital Entity)?
- Keep them in the user's session and let the user apply them
- Or apply them automatically to a DE
- How to merge the metadata fields of an Event and a Digital Entity?
- Identification of metadata fields as dynamic or static fields
- How to apply a service-based framework as a wrapper?
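The static/dynamic distinction suggests one way to merge an event's fields into a DE. Dynamic fields are applied directly. Static bibliographic fields are either applied automatically or held back in the user's session to be confirmed. The field classification and the `merge_event` helper are hypothetical.

```python
# Hypothetical classification: static fields are bibliographic facts that
# should rarely change; every other field is treated as dynamic.
STATIC_FIELDS = {"title", "authors", "year", "doi"}


def merge_event(de, event_fields, auto_apply=False):
    """Merge an event's metadata into a copy of a Digital Entity.

    Dynamic fields are applied immediately (tags are unioned). Static fields are
    applied only if auto_apply is True; otherwise they are returned as pending
    changes for the user to accept or reject from their session.
    """
    updated, pending = dict(de), {}
    for name, value in event_fields.items():
        if name == "tags":
            updated["tags"] = sorted(set(updated.get("tags", [])) | set(value))
        elif name in STATIC_FIELDS and not auto_apply:
            pending[name] = value
        else:
            updated[name] = value
    return updated, pending
```

The two branches correspond to the two options listed above: keep changes in the session for the user, or apply them automatically.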
Some Recent Features of SRG
- Hybrid Consistency Framework Implementation
- Data-centric strict consistency model
- Implements primary-copy based consistency
protocol
- Pull-based
- Time-based consistency approach.
- Communicates with Annotation Tools to collect
updates periodically
- Push-based
- Updates are distributed to Annotation Tools immediately once they occur on the primary copy
- Periodic Search Tools Implementation
- Search, compare and apply the updates made to a
Digital Entity (DE) in the system.
- Unique 128-bit UUID assignment for each Digital Entity
- User Tags view in the system
- Displays all tags belonging to a user
- Allows easy updates or more-info requests on a Digital Entity by its tags
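The sketch below shows a primary-copy scheme with both push and pull propagation, plus per-entity UUIDs, in a single process. The class names are hypothetical. A real deployment would need to:

- call the annotation tools over the network;
- drive `poll` from a timer;
- propagate deletions;
- handle failures and concurrency.

None of this is shown.

```python
import uuid


class PrimaryCopy:
    """All writes go to the primary copy; each write gets a new version number."""

    def __init__(self):
        self.records = {}       # de_id -> (version, fields)
        self.version = 0
        self.push_targets = []  # callbacks for push-based tools

    def new_entity(self, fields):
        de_id = str(uuid.uuid4())  # unique 128-bit identifier for the DE
        self.write(de_id, fields)
        return de_id

    def write(self, de_id, fields):
        self.version += 1
        self.records[de_id] = (self.version, dict(fields))
        for notify in self.push_targets:  # push: propagate immediately
            notify(de_id, dict(fields))

    def changes_since(self, version):
        """Pull support: every record written after `version`."""
        return {k: dict(f) for k, (v, f) in self.records.items() if v > version}


class PullReplica:
    """A tool-side copy that catches up by polling the primary copy."""

    def __init__(self, primary):
        self.primary = primary
        self.seen_version = 0
        self.records = {}

    def poll(self):
        # In practice this would run periodically (time-based); here it is manual.
        self.records.update(self.primary.changes_since(self.seen_version))
        self.seen_version = self.primary.version


if __name__ == "__main__":
    primary = PrimaryCopy()
    pushed = []
    primary.push_targets.append(lambda de_id, fields: pushed.append(de_id))
    replica = PullReplica(primary)
    de = primary.new_entity({"title": "Semantic Research Grid"})
    print(len(pushed), len(replica.records))  # pushed at once; replica not yet updated
    replica.poll()
    print(replica.records[de])
```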
Hybrid Consistency Framework for Semantic Research Grid
Tool Updating Database from Web Page
Metadata Collection from CGL web pages
- The aims are to
- Eliminate duplicate data entry across different web platforms.
- Build richer metadata in SRG, starting from the Digital Entities collected from web pages.
- Share new Digital Entities with other tools and users in SRG
- Push newly collected Digital Entities to other communities using Web 2.0 features
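Collected Digital Entities can be pushed to other communities through an RSS feed, which most Web 2.0 tools can consume. Below is a minimal RSS 2.0 sketch using Python's standard `xml.etree.ElementTree`. It assumes each DE is a dict with `title`, `link` and `tags`. Mapping tags to RSS `<category>` elements is a design choice, not necessarily what SRG does.

```python
import xml.etree.ElementTree as ET


def rss_feed(title, link, entities):
    """Build a minimal RSS 2.0 document from Digital Entities (dicts with title/link/tags)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Digital Entities collected by {title}"
    for de in entities:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = de["title"]
        ET.SubElement(item, "link").text = de["link"]
        for tag in de.get("tags", []):
            ET.SubElement(item, "category").text = tag  # tags travel as categories
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(rss_feed("CGL publications", "http://example.org/cgl",  # hypothetical URLs
                   [{"title": "Semantic Research Grid", "link": "http://example.org/srg",
                     "tags": ["grid", "web2.0"]}]))
```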
Methodology for Collection
- Collect
- Digital Entities from the Community Grids Lab (CGL) publication web pages.
- Analyze
- Use a heuristic methodology to extract the metadata fields of the Digital Entities for CGL publications
- Build
- RSS objects using collected Digital Entities.
- New tags using collected Digital Entities.
- Compare
- Collected Digital Entities from CGL web pages
with the existing Digital Entities in SRG.
- If they are
- different: store the new Digital Entities in SRG storage.
- the same: offer the option to update tags and other fields.
- Share
- New Digital Entities with other Tools using
SRG.
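The collect/analyze/compare steps can be illustrated with a regular-expression heuristic over one line of a publication list. Two things are assumptions: the line format (authors, quoted title, venue, year) and the title-normalization rule that decides "same" versus "different". Real CGL pages would likely need several such patterns.

```python
import re

# Hypothetical line format on a publication list page:
#   Authors, "Title", venue details, year.
CITATION = re.compile(r'^(?P<authors>[^"]+?),\s*"(?P<title>[^"]+)",?\s*(?P<rest>.*)$')
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract(line):
    """Heuristically split one list entry into DE metadata fields, or return None."""
    m = CITATION.match(line.strip())
    if not m:
        return None  # leave for manual entry
    year = YEAR.search(m.group("rest"))
    return {
        "authors": [a.strip() for a in re.split(r",|\band\b", m.group("authors")) if a.strip()],
        "title": m.group("title").strip().rstrip(","),
        "venue": m.group("rest").strip(" ."),
        "year": year.group(0) if year else None,
    }


def normalise(title):
    """Case- and punctuation-insensitive key used to decide 'same' vs 'different'."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def compare_and_store(store, line):
    de = extract(line)
    if de is None:
        return "unparsed"
    key = normalise(de["title"])
    if key in store:
        return "same"  # caller may offer to update tags and other fields
    store[key] = de    # different: keep as a new Digital Entity
    return "new"
```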
Security Model
- Security in Web 2.0 can be limited
- We implement a simple but more powerful security
model around local tools that wrap Web 2.0
systems
- We used an access-control matrix model to
provide security for our information system
- Supports multiple groups and multiple users for
each object.
- Similar to the UNIX file system
- The UNIX rwx bits correspond to Read, Write, and Execute operations on each file and directory.
- In SRG, a DE (Digital Entity) corresponds to a file and a folder corresponds to a directory.
- For each DE and folder, there are three types of access rights defined in the system: Read, Write, and Delete.
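An access-control matrix can be sketched with Python's `enum.Flag`. It holds Read, Write and Delete rights for each DE or folder, granted to users and groups. The class and method names are hypothetical, and inheritance of rights from folders to DEs is deliberately left out.

```python
from enum import Flag, auto


class Right(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    DELETE = auto()


class AccessMatrix:
    """Rows are subjects (users or groups), columns are objects (DEs or folders)."""

    def __init__(self):
        self.cells = {}  # (subject, obj) -> Right

    def grant(self, subject, obj, rights):
        self.cells[(subject, obj)] = self.cells.get((subject, obj), Right.NONE) | rights

    def revoke(self, subject, obj, rights):
        self.cells[(subject, obj)] = self.cells.get((subject, obj), Right.NONE) & ~rights

    def allowed(self, user, groups, obj, right):
        # A user holds a right if it is granted to them directly or to any of their groups.
        held = Right.NONE
        for subject in [user, *groups]:
            held |= self.cells.get((subject, obj), Right.NONE)
        return right in held
```

Storing one `Flag` value per matrix cell mirrors the compact rwx bits of UNIX. Multiple groups per object come for free, because a user's effective rights are the union over all of their subjects.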
Security Model II
- We have a security model that supports
- Levels of authorization
- Roles are Super Administrator (SA), Group Administrator (GA), and User (U)
- The system allows more than one SA.
- An existing SA can add other SAs to the system.
- An SA can make any U a GA, and remove a GA from a group.
- Each group should have at least one GA. A GA can add Us to, or remove them from, the group.
- User profile
- Share user profile between Web 2.0 sites.
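The SA/GA/U rules above can be written down as a few checks. This sketch uses a hypothetical in-memory `Roles` class. It raises `PermissionError` whenever a rule would be broken, including an attempt to remove a group's last GA.

```python
class Roles:
    """Sketch of the SA / GA / U rules; names and error handling are illustrative."""

    def __init__(self, first_sa):
        self.super_admins = {first_sa}
        self.groups = {}  # group -> {"admins": set(), "users": set()}

    def add_sa(self, actor, new_sa):
        self._require(actor in self.super_admins, "only an SA can add an SA")
        self.super_admins.add(new_sa)

    def make_ga(self, actor, group, user):
        self._require(actor in self.super_admins, "only an SA can appoint a GA")
        g = self.groups.setdefault(group, {"admins": set(), "users": set()})
        g["admins"].add(user)
        g["users"].add(user)

    def remove_ga(self, actor, group, user):
        self._require(actor in self.super_admins, "only an SA can remove a GA")
        admins = self.groups[group]["admins"]
        self._require(admins != {user}, "a group must keep at least one GA")
        admins.discard(user)

    def add_user(self, actor, group, user):
        self._require(actor in self.groups[group]["admins"], "only a GA of this group can add users")
        self.groups[group]["users"].add(user)

    def remove_user(self, actor, group, user):
        g = self.groups[group]
        self._require(actor in g["admins"], "only a GA of this group can remove users")
        self._require(user not in g["admins"], "use remove_ga for administrators")
        g["users"].discard(user)

    @staticmethod
    def _require(condition, message):
        if not condition:
            raise PermissionError(message)
```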
Current Usage of the Semantic Research Grid Project
- We have used and tested the Semantic Research Grid (SRG) prototype on the published research papers of the Community Grids Lab (CGL) at Indiana University
- In CGL, 20 students, post-docs and faculty members are testing it
- They are using the prototype to collect publications, upload/download them and share them with other users
Summary
- Integration
- We have successfully integrated the Google Scholar and Windows Live Academic search tools with the CiteULike, Delicious, and Connotea annotation tools, providing a system that allows dynamic publication
- Flexibility and Extensibility
- We provide flexibility, allowing integration of different tools that share common metadata
- Services are easy to add and extend
- Management and Consistency Scheme of Digital
Entities
- Allows the manipulation of a digital entity
- Applies an event-based model built on the concepts of
- Major events
- Minor events
- Datasets
- Provides a rollback feature to
- Support a history tool for a DE
- Merge and change the content of a digital entity
- A service-based framework for using existing
annotation tools through web services
- Prototype project web site: http://gf6.ucs.indiana.edu:58080/SRGrid
Domain-Specific Semantic Document Analysis
- It is natural to develop core document Services, such as those used in Citeseer/Google Scholar, but applied to your documents of interest that may not have been processed yet
- e.g. papers just submitted to a conference
- These tools can help form useful lists, such as the authors of all papers cited by, or submitted to, a journal
- OSCAR3 (from Peter Murray-Rust's group at Cambridge) augments the application-independent core metadata (title, authors, institutions, citations) with a list of all chemical terms
- This tool is a Service that can be applied to your document or to a set of documents harvested in some fashion
- Luis Rocha has developed related ideas for
Biology
- Other fields have natural application-specific metadata, and OSCAR-like tools can be developed for them
- This is another Semantic Scholar Grid Tool
OSCAR3 Chemistry Document Analysis
- It detects magic chemical strings in text and
then
- Stores them as metadata associated with document
- Queries ChemInformatics repositories to tell you
lots of information about identified compounds
- Tells you which other documents have this compound
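A dictionary lookup feeding an inverted index can illustrate two of these steps: detecting and storing compound names, and answering "which other documents have this compound". This is only a toy stand-in. OSCAR3 uses its own, much richer recognition methods, and the four-word `LEXICON` is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical four-word lexicon; OSCAR3 uses its own, much richer recognition.
LEXICON = {"benzene", "toluene", "ethanol", "acetone"}


def find_compounds(text):
    """Return lexicon terms that occur as whole words in `text`."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    return {w for w in words if w in LEXICON}


def build_index(docs):
    """docs: doc id -> text. Returns compound -> set of doc ids that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for compound in find_compounds(text):
            index[compound].add(doc_id)
    return index


if __name__ == "__main__":
    papers = {"p1": "Benzene and toluene were mixed.", "p2": "Toluene dissolved in ethanol."}
    index = build_index(papers)
    print(sorted(index["toluene"]))  # other documents mentioning the same compound
```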
Initial Results from OSCAR on PubMed
- We have a small sample of 100 full-text chemistry papers, selected at random from 15 years of PubMed, which holds over 5 million abstracts
- OSCAR3 finds on average 4.17 compound names per abstract
- and 36.7 compound names per full text
- 555,007 PubMed abstracts from 2005 and part of 2006 were used for the abstract figures (processed on Big Red)
- Illustrates how much knowledge journal publishers
are hiding from us
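A back-of-the-envelope reading of these figures shows that full texts yield roughly nine times as many compound names as abstracts. The extrapolation to all 555,007 abstracts assumes that the ratio from the 100-paper sample holds across the whole corpus. The slide does not claim this, so treat the totals as illustration only.

```python
abstracts = 555_007      # PubMed abstracts processed on Big Red
per_abstract = 4.17      # average compound names per abstract
per_full_text = 36.7     # average compound names per full-text paper (100-paper sample)

ratio = per_full_text / per_abstract
names_from_abstracts = abstracts * per_abstract
# Assumption: every abstract had an accessible full text with the sample's density.
names_if_full_text = abstracts * per_full_text

print(f"full text / abstract: {ratio:.1f}x")
print(f"~{names_from_abstracts:,.0f} names from abstracts")
print(f"~{names_if_full_text:,.0f} names if full texts were available")
```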
CICC: Chemical Informatics Cyberinfrastructure Collaboratory
[Figure: Integrating document (OSCAR) and conventional services on the IU Big Red Supercomputer. Components: PubMed Database, OSCAR Text Analysis, NIH PubChem Database, Toxicity Filtering, Cluster Grouping, Initial 3D Structure Calculation, Molecular Mechanics Calculations, Quantum Mechanics Calculations, Docking, MOAD Database, IU's Varuna Database, POV-Ray Parallel Rendering.]
- Product databases are wrapped with Web service interfaces and are suitable for inclusion in Taverna workflows.