The Web as a Science Accelerator (Het Web als wetenschapsversneller)

1
The Web as a Science Accelerator
  • Frank van Harmelen
  • Vrije Universiteit Amsterdam

Data wants to be free
2
Accelerator?
3
Outline
  • Stuff you all know: the scientists' problem
  • The general idea: a Web of Data
  • What must be done to realise this
  • How far away is this?
  • Why is this relevant for you?
  • Next steps, dos and don'ts

4
The Scientists' Problem
  • Too much unintegrated data
  • from a variety of incompatible sources
  • no standard naming convention
  • each with a custom browsing and querying
    mechanism (no common interface)
  • and poor interaction with other data sources

Esther Jansma
Henk den Heijer
5
What are the Data Sources?
  • Flat Files
  • URLs
  • Proprietary Databases
  • Public Databases
  • Spreadsheets
  • Emails

Ewoud Sanders?
6
In which disciplines?
One dataset per site
a new database each month
  • Archeology
  • Chemistry
  • Genomics, proteomics, ... (bio/life-sciences)
  • Communication science
  • Social history
  • Linguistics
  • Bio-diversity
  • Environmental sciences (climate studies)
  • ....
  • Libraries (KB), archives (Beeld en Geluid)

Willem Bouten
historical data
laymen data
laymen data
international data
7
Outline
  • The general idea: a Web of Data
  • What must be done to realise this
  • How far away is this?
  • Why is this relevant for you?
  • Next steps, dos and don'ts

8
Impact of the Web
  • The Web has changed
  • how we read the news
  • how we shop
  • how we interact with friends and family
  • how we search and find information
  • ... how we do science?
  • accessing literature, yes, but
  • doing science?

9
The Web of Data
  • a.k.a. the "Semantic Web" (Tim Berners-Lee)
  • recipe: expose databases on the web, use
    standard data formats, integrate
  • meta-data from
  • expressing DB schema semantics in machine
    interpretable ways
  • enable integration and unexpected re-use

10
A short history of the WWW
  • Web 1.0: a network of pictures and text
    (by people, for people)
  • Web 2.0: a network of communities
    (by groups of people, for groups of people)
  • Web 3.0: a network of data
    (by computers, for computers, useful for people)

11
The Current Web of text and pictures
  • a web page in English about Frank
  • and another web page about Frank
  • This page is about the Vrije Universiteit
  • And this page is about LarKC
  • And this page is about Stefano

linked web pages, written by people, written
for people, used only by people...
Many of these pages already come from data that
is usable by computers!
But we can't link the data....

The Future Web of Data
linked data, usable by computers! useful for
people!
12
Outline
  • The general idea: a Web of Data
  • What must be done to realise this
  • How far away is this?
  • Why is this relevant for you?
  • Next steps, dos and don'ts

13
Machine-accessible meaning (what it's like
to be a machine)
META-DATA
14
What is meta-data?
  • it's just data
  • it's data describing other data
  • it's meant for machine consumption

15
Required are
  • one or more shared vocabularies
  • so data producers and data consumers all speak
    the same language
  • a standard syntax
  • so meta-data can be recognised as such
  • lots of resources with meta-data attached
  • mechanisms for attribution and trust
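
The first requirement, a shared vocabulary, can be illustrated with a minimal sketch. It assumes two hypothetical data producers (`lab_a`, `lab_b`) that use different local field names, and a made-up vocabulary at `http://example.org/vocab/`; none of these names come from the slides.

```python
# Hypothetical shared vocabulary: every name and URL here is illustrative only.
SHARED = "http://example.org/vocab/"

# Each producer maps its own local field names onto shared vocabulary terms.
MAPPINGS = {
    "lab_a": {"organism": SHARED + "species", "len_mm": SHARED + "lengthMm"},
    "lab_b": {"soort": SHARED + "species", "lengte": SHARED + "lengthMm"},
}


def to_shared(source, record):
    """Rename a record's local keys to the shared vocabulary terms."""
    mapping = MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items()}


records = [
    to_shared("lab_a", {"organism": "Parus major", "len_mm": 140}),
    to_shared("lab_b", {"soort": "Parus major", "lengte": 138}),
]

# A consumer can now read both sources through one shared term.
lengths = [record[SHARED + "lengthMm"] for record in records]
print(lengths)
```

Once both producers speak the same vocabulary, the consumer no longer needs to know which source a record came from.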

16
1. Shared vocabularies
BioMed
  • MeSH
  • Medical Subject Headings, National Library of
    Medicine
  • 22,000 descriptions
  • EMTREE
  • Commercial (Elsevier), drugs and diseases
  • 45,000 terms, 190,000 synonyms
  • UMLS
  • Integrates 100 different vocabularies
  • SNOMED
  • 200,000 concepts, College of American
    Pathologists
  • Gene Ontology
  • 15,000 terms in molecular biology
  • NCBI Cancer Ontology
  • 17,000 classes (about 1M definitions)

17
2. A standard syntax
Semantic Web data model: RDF
things and relations between things
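
As a rough illustration of "things and relations between things", the sketch below models a triple as a plain Python tuple and adds a tiny pattern matcher. The `http://example.org/` identifiers and the relation names are assumptions made for the example; a real system would use an RDF library and published vocabularies.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # the thing being described
    predicate: str  # the relation
    obj: str        # the thing (or value) it is related to


EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    Triple(EX + "Frank", EX + "affiliatedWith", EX + "VU"),
    Triple(EX + "VU", EX + "locatedIn", EX + "Amsterdam"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Everything said about one thing, and everything using one relation.
about_vu = match(triples, s=EX + "VU")
located = match(triples, p=EX + "locatedIn")
```

Because every statement has the same three-part shape, data from unrelated sources can be pooled into one list and queried with the same function.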
18
RDF Triples in Life Sciences
19
Web of Data: anybody can say anything about
anything
  • All identifiers are URLs (on the Web)
  • Allows total decoupling of
  • data
  • vocabulary
  • meta-data

<x> IsOfType <T>
x
T
<prince>
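
The decoupling of data, vocabulary and meta-data can be sketched as three separately published sets of triples that only meet through shared URLs. Everything below is an assumption made for illustration: the three `example.org` hosts, the class names `Artist` and `Person`, the `subClassOf` and `describedBy` relations, and the record identifier.

```python
# Three hypothetical publishers, each with its own namespace.
DATA = "http://data.example.org/"
VOCAB = "http://vocab.example.org/"
META = "http://meta.example.org/"

# Published independently, possibly by different parties.
data = {(DATA + "prince", VOCAB + "isOfType", VOCAB + "Artist")}
vocabulary = {(VOCAB + "Artist", VOCAB + "subClassOf", VOCAB + "Person")}
metadata = {(DATA + "prince", META + "describedBy", META + "record1")}

# Because identifiers are URLs, merging is just a set union.
web_of_data = data | vocabulary | metadata


def types_of(graph, thing):
    """Collect the direct types of `thing` plus all their superclasses."""
    found = {o for s, p, o in graph
             if s == thing and p == VOCAB + "isOfType"}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for s, p, o in graph:
            if s == cls and p == VOCAB + "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


direct_only = types_of(data, DATA + "prince")
with_vocabulary = types_of(web_of_data, DATA + "prince")
```

With the data alone, only the directly stated type is known; adding the independently published vocabulary lets a machine conclude the broader type as well.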
20
Outline
  • The general idea: a Web of Data
  • What must be done to realise this
  • How far away is this?
  • Why is this relevant for you?
  • Next steps, dos and don'ts

21
How far away is this?
  • Stable data formats
  • Lots of shared vocabularies (and ways to convert
    them)
  • Lots of data sources (and ways to convert them)
  • Lots of tools
  • convert, construct, edit (data, vocabularies)
  • store, search, query, reason
  • interlink
  • visualise
  • ...

22
How far away is this?
  • Not very far away!

rapidly growing Linked Open Data cloud:
already many billions of facts and rules
every book sold by Amazon
any CD ever recorded (almost)
life-science databases
hierarchical dictionaries (UK, FR, NL)
basic facts on every country on the planet
common-sense rules and facts (100,000s)
scientific bibliographies
names of artists and art works (10,000s)
geographic names (millions)
encyclopedia
It gets bigger every month
23
Outline
  • The general idea: a Web of Data
  • What must be done to realise this
  • How far away is this?
  • Why is this relevant for you?
  • Next steps, dos and don'ts

24
Next steps
Can you get famous by sharing data?
  • hunt for shared vocabularies
  • try to avoid building them
  • wrap legacy data sources
  • your own
  • from others
  • link wrapped sources
  • publish linked data on the web
  • make noise
  • reconstruct some old results
  • discover new results
  • get famous

in-use systems in communication science, KB,
Beeld en Geluid, Europeana
papers in oncology, in communication science
dedicated conferences in chemistry,
earth sciences, life sciences, humanities
funding opportunities in humanities, social
sciences, life sciences
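
The steps "wrap legacy data sources" and "link wrapped sources" can be sketched roughly as follows. The two CSV snippets, the namespaces and the `sameAs` relation are invented for the example, and matching on identical labels is a deliberately naive linking strategy; it is an assumption here, not a method taken from the slides.

```python
import csv
import io

# Hypothetical namespaces and legacy sources, for illustration only.
SOURCE_A = "http://source-a.example.org/"
SOURCE_B = "http://source-b.example.org/"
VOCAB = "http://vocab.example.org/"

LEGACY_A = "id,name\n1,Amsterdam\n2,Utrecht\n"
LEGACY_B = "code,label\nAMS,Amsterdam\nRTM,Rotterdam\n"


def wrap(csv_text, base, id_column):
    """Wrap a CSV table as triples: one subject URL per row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = base + row[id_column]
        for column, value in row.items():
            if column != id_column:
                triples.append((subject, VOCAB + column, value))
    return triples


def link(triples_a, prop_a, triples_b, prop_b):
    """Link subjects from two sources whose chosen properties have equal values."""
    index = {}
    for s, p, o in triples_b:
        if p == prop_b:
            index.setdefault(o, []).append(s)
    return [
        (s, VOCAB + "sameAs", other)
        for s, p, o in triples_a if p == prop_a
        for other in index.get(o, [])
    ]


wrapped_a = wrap(LEGACY_A, SOURCE_A, "id")
wrapped_b = wrap(LEGACY_B, SOURCE_B, "code")
links = link(wrapped_a, VOCAB + "name", wrapped_b, VOCAB + "label")
```

The legacy files stay untouched; only the wrapper and the resulting links need to be published.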
25
Example: communication science
  • Read digital newspapers
  • Annotate (who said what about whom)
  • triples, RDF (supercomputer instead of students)
  • Store annotations in RDF
  • Publish on the Web
  • Integrate with other datasets
  • National studies possible on much larger
    datasets
  • Internationally comparative studies possible
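
One way to picture "who said what about whom" stored as RDF is the sketch below, which writes each annotation as three lines in N-Triples syntax. The namespace, the property names (`saidBy`, `about`, `text`) and the sample annotation are all hypothetical; the slides do not say which vocabulary the actual project used.

```python
EX = "http://example.org/news/"  # hypothetical namespace


def iri(url):
    """Format a URL as an N-Triples IRI term."""
    return f"<{url}>"


def literal(text):
    """Format a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


def annotate(statement_id, speaker, target, quote):
    """Describe one annotated statement: who said what about whom."""
    s = iri(EX + statement_id)
    return [
        f"{s} {iri(EX + 'saidBy')} {iri(EX + speaker)} .",
        f"{s} {iri(EX + 'about')} {iri(EX + target)} .",
        f"{s} {iri(EX + 'text')} {literal(quote)} .",
    ]


# Invented sample annotation, not real newspaper data.
lines = annotate("statement1", "politicianA", "politicianB",
                 'The plan is "unrealistic".')
print("\n".join(lines))
```

Each annotation becomes a small node of its own, so annotations from different newspapers or countries can be concatenated into one file and integrated with other datasets.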

26
Scenario: science dynamics
  • Now: citation patterns, co-author networks
  • But datasets are small and not representative
  • Scientists do more than publish and
    cite
  • Harvest datasets from
  • Funding agencies (EU, NSF) (NWO?)
  • Conferences, programme committees
  • Email lists, blogs, Twitter
  • Find more up-to-date and more accurate patterns

In RDF: integration, semantic analysis
27
So
  • There are uniform data models
  • There are overarching vocabularies
  • There is data-publication technology
  • There are tools for
  • Storage
  • Visualisation
  • Query

28
Questions and discussion
  • Frank.van.Harmelen_at_cs.vu.nl
  • http://www.cs.vu.nl/frankh/popularising.html