Harpers.org: a Semantic Web(ish) site for Harper's Magazine

1
Harpers.org: a Semantic Web(ish) site for
Harper's Magazine
  • Paul Ford
  • Associate Web Editor, Harpers.org
  • ford_at_harpers.org

2
Harper's is
  • A magazine of literature, politics, culture, and
    the arts published continuously from 1850
  • A small non-profit

3
Available content
  • The Weekly Review, an emailed summary of world
    events, from 2000
  • The Harper's Index, a statistical portrait of the
    world, from 1998
  • Public domain, scanned-in archives from
    1850-1982
  • Readings
  • Occasional features

4
And that's it.
  • Maybe full text of issues will be offered
    someday, but not soon. So
  • How do we get more value out of limited content?

5
Solution
  • Hack up what we have into bits by content
    type, then
  • Reassemble it according to link targets
  • Which are arranged in a taxonomy
  • Creating a very small Semantic Web for
    Harpers.org

6
A quick demo

7
How it works
  • Simple set of ontological relationships (partOf,
    supervisorOf)
  • Taxonomy of content
  • Narrative content that is split into smaller
    pieces, with links into the taxonomy
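
The relationships above can be pictured with a minimal sketch. Only the relation name partOf comes from the slide; the topic names and the `PART_OF` table are hypothetical, and the site itself was built with XSLT rather than Python.

```python
# Hypothetical taxonomy: each topic points to its parent via a partOf relation.
PART_OF = {
    "Baghdad": "Iraq",
    "Iraq": "Middle East",
    "Middle East": "World",
}


def ancestors(topic, part_of=PART_OF):
    """Return every topic that `topic` is transitively part of, nearest first."""
    chain = []
    seen = {topic}
    while topic in part_of:
        topic = part_of[topic]
        if topic in seen:  # guard against accidental cycles in the data
            break
        seen.add(topic)
        chain.append(topic)
    return chain
```

A piece of content linked to "Baghdad" could then also be listed on the pages for each of its ancestors.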

8
Markup
  • Text: Country Y announced that it had cut off
    relations with country Z. On Wednesday, something
    happened to persons X and Y.

9
Markup
  • Country Y announced that it had cut off
    relations with country Z.
  • On Wednesday, something happened to persons X and
    Y.

10
Markup
  • Country Y announced that it had cut off
    relations with country Z.

11
Markup
  • Country Y announced
    that it had cut off relations with country Z
    (link target: toCountryZ).

12
Conditionals
  • Some text required conditional markup
  • Text: Country Y announced that it had cut off
    relations with country Z, and on Wednesday,
    something happened to persons X and Y.

13
Conditionals: ugly, but simple
  • Country Y announced that it had cut off relations
    with country Z
  • ", and" (narrative) or "." (timeline)
  • "on" (narrative) or "On" (timeline)
  • Wednesday, something happened to persons X and
    Y.

14
Conditionals: ugly, but simple
  • Narrative version
  • Country Y announced that it had cut off relations
    with country Z, and on Wednesday, something
    happened to persons X and Y.
  • Timeline-friendly version
  • Country Y announced that it had cut off relations
    with country Z.
  • On Wednesday, something happened to persons X and
    Y.
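
A minimal sketch of how one marked-up source could yield both versions. The element names `event`, `narrative`, and `timeline` are assumptions made for illustration (the slides' actual markup did not survive in this transcript), and Python stands in for the XSLT the site used.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <narrative> and <timeline> wrap the conditional bits.
SOURCE = """<text>
<event>Country Y announced that it had cut off relations with country Z<narrative>, and </narrative><timeline>.</timeline></event>
<event><narrative>on</narrative><timeline>On</timeline> Wednesday, something happened to persons X and Y.</event>
</text>"""

CONDITIONAL_TAGS = ("narrative", "timeline")


def render(element, mode):
    """Flatten an element to text, keeping only conditionals that match `mode`."""
    parts = [element.text or ""]
    for child in element:
        # Skip a conditional element whose tag is not the requested mode.
        if child.tag not in CONDITIONAL_TAGS or child.tag == mode:
            parts.append(render(child, mode))
        # Text following the child always belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


events = ET.fromstring(SOURCE).findall("event")
narrative = "".join(render(e, "narrative") for e in events)  # one sentence
timeline = [render(e, "timeline") for e in events]  # one entry per event
```

Here `narrative` is the single run-on sentence and `timeline` holds two standalone sentences, matching the two versions on the slide.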

15
All of it gets slurped up
  • And turned into a set of triples
  • Then processed in-memory
  • With HTML pages spit out as a result
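
The pipeline above can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the `review`, `event`, and `link` element names, the `id` and `to` attributes, the `mentions` and `text` predicates, and the sample sentences. The real site did this work in XSLT 2.0.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from html import escape

# Hypothetical source document with events that link into the taxonomy.
SOURCE = """<review id="review-1">
<event id="e1"><link to="country-y">Country Y</link> announced that it had cut off relations with <link to="country-z">country Z</link>.</event>
<event id="e2">On Wednesday, something happened in <link to="country-z">country Z</link>.</event>
</review>"""


def to_triples(root):
    """Yield (subject, predicate, object) triples for events and their links."""
    for event in root.iter("event"):
        yield (event.get("id"), "partOf", root.get("id"))
        yield (event.get("id"), "text", "".join(event.itertext()))
        for link in event.iter("link"):
            yield (event.get("id"), "mentions", link.get("to"))


def build_pages(triples):
    """Group event text by link target and render one HTML fragment per target."""
    triples = list(triples)
    text = {s: o for s, p, o in triples if p == "text"}
    by_target = defaultdict(list)
    for s, p, o in triples:
        if p == "mentions" and s not in by_target[o]:
            by_target[o].append(s)
    return {
        target: "<ul>"
        + "".join(f"<li>{escape(text[e])}</li>" for e in events)
        + "</ul>"
        for target, events in by_target.items()
    }


pages = build_pages(to_triples(ET.fromstring(SOURCE)))
```

In this sketch `pages["country-z"]` collects both events, which is the remixing effect: one source document feeds several topic pages.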

16
Hard, then easy
  • Hard to get started (lots of events, facts, and
    links)
  • Easy to keep going, if you don't mind the markup
    and use a good text editor

17
Tools used
  • Emacs, vi, BBEdit
  • XSLT 2.0 (Saxon)
  • CVS

18
Why not RDF?
  • Not right for redundant content and conditionals
  • Easy enough to transform arbitrary structured XML
    into RDF with XSLT, as needed
  • (Or into RSS 1.0, RSS 2.0, Atom, etc.)
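
To illustrate how small the step from internal triples to RDF can be, here is a sketch that writes simple triples as N-Triples lines. The slides describe doing this with XSLT; the Python version, the `http://example.org/` namespace, and the predicate names are placeholders, not the site's real identifiers.

```python
BASE = "http://example.org/"  # placeholder namespace, not a real Harper's URI


def literal(value):
    """Quote a string as an N-Triples literal, escaping the required characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples, literal_predicates=("text",)):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        # Objects of literal predicates become strings; all others become URIs.
        obj = literal(o) if p in literal_predicates else f"<{BASE}{o}>"
        lines.append(f"<{BASE}{s}> <{BASE}{p}> {obj} .")
    return "\n".join(lines)
```

Calling `to_ntriples([("e1", "mentions", "country-z")])` yields one line with three URIs and a terminating dot.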

19
For free
  • From 300 individual pages
  • To 1,100 pages of remixed content, all unique
    and relevant
  • And Google-friendly

20
And also for free
  • Semantically relevant in-site advertising, if we
    want it
  • Topic-sorted, reusable content
  • Permanent, readable URIs

21
Do people get it?
  • Some do, and others just navigate the site as
    usual
  • Harper's was fine with the learning curve
  • "Odd but useful" (Gawker)

22
Results
  • Uptick in traffic and subscription revenues
  • Low cost of maintenance
  • Ever-increasing database of facts and events:
    adding one Weekly Review adds value to 50
    different pages
  • Happy client

23
Why the SemWeb(ish) framework?
  • Leaves plenty of room to grow
  • Web-only content
  • Full text of issues
  • Subscriber services
  • Etc.
  • Take advantage of new SemWeb tools
  • Incorporate RDF sources into the taxonomy
  • Anticipate Semantic Web browsers

24
Next?

25
Make it pretty
  • Redesign
  • Hide some of the navigation
  • Turn links on and off

26
Make it scale
  • Currently maxes out at about 20-30 megs of
    content, due to limits of in-memory DOM
    representation (10-12x XML document size)
  • Use a publicly available storage layer (Kowari,
    Jena, etc.)
  • Go triple-crazy
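
As a back-of-the-envelope check on the limit above, using only the figures from the slide (20-30 MB of content, a 10-12x DOM overhead); the function name is made up for illustration.

```python
def dom_memory_range_mb(xml_mb_low, xml_mb_high, factor_low=10, factor_high=12):
    """Estimate in-memory DOM size (MB) from XML size, per the 10-12x rule of thumb."""
    return xml_mb_low * factor_low, xml_mb_high * factor_high


low, high = dom_memory_range_mb(20, 30)
print(f"{low}-{high} MB of DOM for 20-30 MB of XML")
```

That puts the in-memory footprint at roughly 200-360 MB, which helps explain why a separate storage layer is on the list.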

27
Make it easy to query and navigate
  • "Show me everything related to George Bush and
    Iraq."
  • or
  • "Show me everything related to politicians and
    the Middle East."
  • New navigation
  • ?
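
The two example queries could work along these lines, as a sketch under stated assumptions: the `PARENT` taxonomy, the `EVENT_TOPICS` index, and the event ids are invented for illustration, and no such query feature is described as existing yet.

```python
# Hypothetical data: topic -> parent category, and event -> topics it links to.
PARENT = {
    "George Bush": "politicians",
    "Iraq": "Middle East",
}
EVENT_TOPICS = {
    "e1": {"George Bush", "Iraq"},
    "e2": {"George Bush"},
    "e3": {"Iraq"},
}


def matches(topic, wanted, parent=PARENT):
    """True if `topic` is `wanted` or sits anywhere beneath it in the taxonomy."""
    seen = set()
    while topic is not None and topic not in seen:
        if topic == wanted:
            return True
        seen.add(topic)
        topic = parent.get(topic)
    return False


def related_to(*wanted, event_topics=EVENT_TOPICS):
    """Return events that link to every wanted topic or to one of its descendants."""
    return sorted(
        event
        for event, topics in event_topics.items()
        if all(any(matches(t, w) for t in topics) for w in wanted)
    )
```

With this data, `related_to("George Bush", "Iraq")` and `related_to("politicians", "Middle East")` both select the same event, because the broader query walks up the taxonomy from each linked topic.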
