Title: From Documents to Knowledge Models
1 From Documents to Knowledge Models
- Max Völkel, voelkel_at_fzi.de
- Forschungszentrum Informatik an der Universität Karlsruhe (TH)
2 Personal Knowledge Management
- Definition of knowledge cues (Haller):
  - any kind of symbol, pattern or artefact which evokes some knowledge in a person's mind when viewed or used.
- Knowledge cues can be stored and retrieved on a computer, while knowledge itself may or may not be.
  - OK, in fact you store bits (signals).
3 What is a Document?
A team of 50 French researchers discussed this question.
4 Definition of Document
- A team of 50 French researchers could agree on three views:
- Document as form
  - A document is a container which assembles and structures the content to make it easier for the reader to understand.
- Document as sign
  - Emphasises the argumentative structure of the content.
  - A document can be referenced → it acts as a sign for its meaning.
- Document as medium
  - Reading contract: the author's intention or assumption about what will happen with the document.
5 Document (my definition) I/II
- A document consists of information atoms.
  - An information atom is the smallest unit of content which can be interpreted without the document's context (although it still requires background knowledge). For text, these atoms are single words.
- Packaging establishes a context.
- Reference-ability: a reference to a published document can act as a placeholder for the content expressed within it.
- Process metadata should be sent along,
  - such as authors, audience, and goal.
[Figure: a document accompanied by its metadata: author, audience, goal]
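The definition above can be made concrete as a small data structure. The following Python sketch is hypothetical: the `Document` class, its fields and the example values are illustrative and not from the original. It assumes that every word matched by the regular expression `\w+` is one information atom and that process metadata is a plain dictionary sent along with the atoms.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical model: a packaged sequence of information atoms plus process metadata."""
    doc_id: str                                              # stands in for reference-ability
    atoms: list[str]                                         # information atoms in reading order
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. author, audience, goal

    @classmethod
    def from_text(cls, doc_id: str, text: str, **metadata: str) -> "Document":
        # Assumption: for text, every word (a run of \w characters) is one information atom.
        return cls(doc_id, re.findall(r"\w+", text), dict(metadata))


doc = Document.from_text(
    "doc-1",
    "Knowledge cues can be stored on a computer.",
    author="A. Author", audience="researchers", goal="explain knowledge cues",
)
print(doc.atoms)
print(doc.metadata["audience"])
```

Packaging is modelled here only as "atoms kept together in one object"; a richer model would also record the layers described on the next slide.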
6 Document (my definition) II/II
- A document is a knowledge artefact consisting of several layers:
Content Semantics
- The content means something.
- Building upon the logical and argumentative structure, the author encodes statements about a domain within the content.
Argumentative Structure
- Helps to convey the content to the reader.
- Argumentative structures appear on all scales. A typical structure is the Introduction, Related Work, Contribution, Conclusion pattern of scientific articles. On smaller scales, patterns like claim-proof and question-answer are used.
Logical Structure
- Makes it possible to reference smaller parts within a document,
  - e.g. paragraphs, headlines, footnotes, citations, and the title
Visual Structure
- Guides the reader informally
- Typesetting (e.g. bold, italics, different font styles and sizes), placement of figures, and pages carry additional information
Linearity
- A defined order
- for navigating through all information items
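The logical structure and the linearity layer are closely related: if the logical structure is a tree of titles, sections and paragraphs, one defined reading order is a depth-first, pre-order walk of that tree. The Python sketch below illustrates this under that assumption; `Node`, `linearize` and the example outline are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the logical structure: title, section, paragraph, footnote, ..."""
    kind: str
    text: str
    children: list["Node"] = field(default_factory=list)


def linearize(node: Node) -> list[str]:
    """Return a reading order as a depth-first, pre-order walk of the logical structure."""
    order = [f"{node.kind}: {node.text}"]
    for child in node.children:
        order.extend(linearize(child))
    return order


outline = Node("title", "From Documents to Knowledge Models", [
    Node("section", "Introduction", [Node("paragraph", "Documents are everywhere.")]),
    Node("section", "Conclusion", [Node("paragraph", "Use knowledge models.")]),
])
for line in linearize(outline):
    print(line)
```

The visual layer (fonts, figure placement, pages) is deliberately left out; it would annotate nodes rather than change their order.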
7 Ted Nelson
- I propose a different document agenda
- I believe we need new electronic documents which
are transparent, public, principled, and freed
from the traditions of hierarchy and paper.
8 What do people want?
→ Why?
9 What is a Wiki? What's new compared to a CMS?
- Easy contribution → shorter time-to-publication
  - Wiki pages can be created and edited by any user quickly and easily.
- Easy writing
  - Simple text formatting without the need to learn HTML → wiki syntax
- Easy linking
  - Automatic linking converts written names of pages, images and websites into links.
- Recent changes
  - See what has happened → awareness
  - A diff function shows the latest changes.
  - Easily check whether changes are OK.
- Full-text search for page titles and text
- A backlink function shows which pages link to the current page.
  - Find the context of this page.
- Link directly deep into a wiki using readable names.
Wikis were the first deployed, collaborative hypertext authoring environments → people want more links.
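Automatic linking and backlinks can be sketched in a few lines. This Python sketch assumes the classic WikiWord convention (two or more capitalised word parts written together, such as `FrontPage`) as the rule for "written names of pages"; real wiki engines use various syntaxes, and the `/wiki/` URL prefix, `auto_link` and `backlinks` are hypothetical.

```python
import re

# Assumed WikiWord convention: two or more capitalised parts joined together.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def auto_link(text: str) -> str:
    """Turn every WikiWord in the text into an HTML link to the page of that name."""
    return WIKI_WORD.sub(lambda m: f'<a href="/wiki/{m.group(0)}">{m.group(0)}</a>', text)


def backlinks(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each page name to the set of pages whose text links to it."""
    result: dict[str, set[str]] = {name: set() for name in pages}
    for source, text in pages.items():
        for target in WIKI_WORD.findall(text):
            result.setdefault(target, set()).add(source)
    return result


pages = {
    "FrontPage": "Start at KnowledgeModel.",
    "KnowledgeModel": "Back to FrontPage.",
    "RecentChanges": "See FrontPage and KnowledgeModel.",
}
print(auto_link(pages["FrontPage"]))
print(backlinks(pages)["FrontPage"])
```

Backlinks are computed by inverting the forward links once, which is why a wiki can show "what links here" cheaply.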
10 My definition, based on the OMG metamodel MOF
What is a Model? Typed entities and typed relations
[Figure: levels from top to bottom: types TypeA2, TypeB2, TypeC2; (meta-)modelling; types TypeA1, TypeB1, TypeC1; modelling; entities EntityX, EntityY and artifacts ArtifactX, ArtifactY in the real world, from the viewpoint of the individual]
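"Typed entities and typed relations" across modelling levels can be sketched as follows. This is a hypothetical Python illustration, not the MOF specification: it assumes the levels are numbered like the suffixes in the figure (2 for the meta-model, 1 for the model) with entities at level 0, and that a type must sit exactly one level above what it types. `Element` and `Relation` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str
    level: int                          # assumed numbering: 2 = meta-model, 1 = model, 0 = entities
    type: Optional["Element"] = None    # the element one level up that types this one

    def is_well_typed(self) -> bool:
        # A type must sit exactly one modelling level above the element it types.
        return self.type is None or self.type.level == self.level + 1


@dataclass(frozen=True)
class Relation:
    source: Element
    target: Element
    type: Element                       # relations are typed, too

    def is_well_typed(self) -> bool:
        # Assumption: a relation type lives one level above the elements it connects.
        return self.type.level == self.source.level + 1 == self.target.level + 1


type_a2, type_b2, type_c2 = (Element(n, 2) for n in ("TypeA2", "TypeB2", "TypeC2"))
type_a1 = Element("TypeA1", 1, type_a2)
type_b1 = Element("TypeB1", 1, type_b2)
type_c1 = Element("TypeC1", 1, type_c2)   # used here as a relation type (assumption)
entity_x = Element("EntityX", 0, type_a1)
entity_y = Element("EntityY", 0, type_b1)
link = Relation(entity_x, entity_y, type_c1)

print(all(e.is_well_typed() for e in (type_a1, type_b1, type_c1, entity_x, entity_y)))
print(link.is_well_typed())
```

The same `Element` class serves on every level, which mirrors the idea that one modelling mechanism is reused for meta-modelling.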
11 What is a Knowledge Model?
Feature | Document | Ontology | Knowledge Model
Information atoms | Text (paragraphs, images, multimedia resources) | Concepts | Items (text, images, other binary resources)
Text | Short (headlines) and longer (paragraphs) | Short labels | Anything from short labels to structured documents
Order | Strict linear order | - | Yes, may be partial and have cycles
Hierarchy | Yes (chapters, sections, paragraphs, sentences) | Yes | Yes, may be partial and have cycles
Annotations | Yes (footnotes) | Yes | Yes
Tagging (annotation with keywords) | - | - | Yes
Typing (incl. inferencing) | - | Yes | Yes
Hyperlinks | Yes (internal references and external citations) | - | Yes, don't have to occur inside text
Visual layout | Yes | - | -
12 From Documents to Knowledge Models
- From analogue to digital documents:
  - smaller content granularity
  - more interconnected content
  - more explicit structures
- → Knowledge models
  - very small information atoms, such as single words
  - richly connected items
  - explicit semantics for the links
- Definition
  - A knowledge model is a superset of documents and formal ontologies.
  - Annotated documents, stored together with their annotations, can be seen as a knowledge model.
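The last point of the definition, an annotated document stored together with its annotations, can be sketched as items plus relations. In this hypothetical Python sketch, words are the atoms, annotations are `(start, end, label)` spans over atom positions (end exclusive), and the relation names `after`, `detail` and `annotation` are borrowed from the CDS relations introduced later; `annotated_model` is an illustrative name.

```python
import re


def annotated_model(text: str, spans: list[tuple[int, int, str]]):
    """Store a document and its annotations as items plus (subject, relation, object) triples."""
    atoms = re.findall(r"\w+", text)
    items = {f"atom{i}": atom for i, atom in enumerate(atoms)}
    # Linearity of the document becomes explicit ordering relations between atoms.
    relations = [(f"atom{i}", "after", f"atom{i + 1}") for i in range(len(atoms) - 1)]
    for start, end, label in spans:
        span_id = f"span{start}-{end}"
        items[span_id] = " ".join(atoms[start:end])
        items[label] = label                       # the label itself becomes an item
        relations += [(span_id, "detail", f"atom{i}") for i in range(start, end)]
        relations.append((span_id, "annotation", label))
    return items, relations


items, relations = annotated_model("The wiki links pages", [(1, 2, "Tool")])
print(items["span1-2"], relations[-1])
```

Nothing of the original document is lost (the atoms and their order are still there), while the annotations become first-class, linkable items.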
13 What is a CDS? Conceptual Data Structures
M. Völkel and H. Haller. Conceptual Data Structures (CDS): Towards an Ontology for Semi-Formal Articulation of Personal Knowledge. In Proc. of the 14th International Conference on Conceptual Structures 2006, Aalborg University, Denmark, July 2006.
14 What is a CDS-based Knowledge Model?
- A set of addressable items (text, images, maybe even multimedia elements)
- Relations between items, classified into four types:
  - Source/target: the generic, directed hyperlink
  - Before/after: ordering relations, linear navigation
  - Context/detail: hierarchical relations, document and concept hierarchies
  - Annotation/annotationMember: annotations, giving the ability to type items and relations; items are used as types → meta-modeling
- Knowledge models must be able to capture work-in-progress.
  - CDS is not strict: you can have cycles, untyped items, paradoxical orderings, etc.
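A CDS-style model as described above can be sketched as a store of items plus relations that are always recorded together with their inverse. This Python sketch is a hypothetical illustration, not the CDS reference implementation; the direction convention (`relate(a, "detail", b)` means "b is a detail of a") and the class and method names are assumptions.

```python
from collections import defaultdict

# The four CDS relation pairs from the slide, each mapped to its inverse.
INVERSE = {
    "source": "target", "target": "source",
    "before": "after", "after": "before",
    "context": "detail", "detail": "context",
    "annotation": "annotationMember", "annotationMember": "annotation",
}


class KnowledgeModel:
    """Hypothetical CDS-style store: addressable items plus typed, invertible relations.

    Deliberately not strict: cycles, untyped items and contradictory orderings are allowed.
    """

    def __init__(self) -> None:
        self.items: dict[str, str] = {}
        self.edges: defaultdict = defaultdict(set)   # (item, relation) -> set of items

    def add_item(self, item_id: str, content: str) -> None:
        self.items[item_id] = content

    def relate(self, a: str, relation: str, b: str) -> None:
        """Record `a --relation--> b` and, automatically, `b --inverse--> a`."""
        self.edges[(a, relation)].add(b)
        self.edges[(b, INVERSE[relation])].add(a)

    def follow(self, item_id: str, relation: str) -> set:
        """Items reached from `item_id` via `relation` (empty set if none)."""
        return set(self.edges.get((item_id, relation), ()))


km = KnowledgeModel()
for item_id, content in [("intro", "Introduction"), ("claim", "Documents are costly"),
                         ("proof", "Cost model"), ("Claim", "Claim")]:
    km.add_item(item_id, content)
km.relate("intro", "detail", "claim")        # claim is a detail of intro
km.relate("claim", "after", "proof")         # proof comes after claim ...
km.relate("proof", "after", "claim")         # ... and claim after proof: tolerated work-in-progress
km.relate("claim", "annotation", "Claim")    # the item "Claim" serves as a type

print(km.follow("claim", "context"))
print(km.follow("Claim", "annotationMember"))
```

Storing inverses eagerly makes navigation in both directions (e.g. from a type to its members) a simple lookup.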
15 CDS: A Hierarchy of Relations
[Figure: hierarchy of CDS relations, ordered from informal (top) to formal (bottom). Legend: each box shows the relation type and its relation/inverse names]
- Undirected relation: related/related
- Equivalency: equivalent
- Directed linking: source/target
- Labelled links: /-inverse
- Order: before/after
- Hierarchy: detail/context
- Annotation: annotation/annotationMember
- Subclassing: is-a/superclass-of
- Task priority
- Tagging: tag/tagMember
- Document order
- Instantiation: type/instance
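A hierarchy of relations allows simple inference: a statement using a specific relation also holds for every more general relation above it. The exact tree of the figure is not recoverable from the text, so the parent map in this Python sketch is a hypothetical fragment (tag and type specialise annotation; order, hierarchy and annotation specialise directed linking; everything specialises the undirected `related`).

```python
from typing import Optional

# Hypothetical fragment of the hierarchy: each relation -> its more general super-relation.
SUPER: dict[str, Optional[str]] = {
    "related": None,           # most general, informal, undirected
    "target": "related",       # directed linking
    "after": "target",         # order
    "detail": "target",        # hierarchy
    "annotation": "target",    # annotation
    "tag": "annotation",       # tagging
    "type": "annotation",      # instantiation
}


def generalisations(relation: str) -> list[str]:
    """The relation itself followed by all its super-relations, most specific first."""
    chain = [relation]
    while SUPER[chain[-1]] is not None:
        chain.append(SUPER[chain[-1]])
    return chain


def infer(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add every statement implied by the relation hierarchy."""
    result = set(triples)
    for subject, relation, obj in triples:
        for general in generalisations(relation):
            result.add((subject, general, obj))
            if general == "related":            # undirected: holds in both directions
                result.add((obj, general, subject))
    return result


for triple in sorted(infer({("paper", "tag", "semweb")})):
    print(triple)
```

Because inference only walks upwards, an informally entered `related` link stays valid when a user later refines it into a more formal relation.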
16 Motivation
17 Examples for Knowledge Models
18 How Do Writing and Reading Work?
- Writing / sending
  - Write down ideas
  - Group them
  - Structure them
  - Add argumentation structures
  - Add references to literature
  - Link pieces in a first draft
  - Add introduction and conclusion
  - Repeat until there is a coherent flow
  - Publish the document
- Reading / receiving
  - Visualise the structure graphically
  - Connect new structures with one's own existing structures
Source: Esselborn 2004, Von der Idee zum Text ("From Idea to Text")
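Turning linked draft pieces into a linear document is, under one simplifying assumption, a topological sort: each piece lists the pieces that must come before it. The Python sketch below uses the standard library's `graphlib`; the draft pieces and `order_pieces` are hypothetical. A cycle raises `CycleError`, which corresponds to the "repeat until there is a coherent flow" step.

```python
from graphlib import CycleError, TopologicalSorter


def order_pieces(pieces: dict[str, set[str]]) -> list[str]:
    """Linearize draft pieces; `pieces` maps each piece to the pieces that must precede it."""
    return list(TopologicalSorter(pieces).static_order())


draft = {
    "introduction": set(),
    "claim": {"introduction"},
    "evidence": {"claim"},
    "related work": {"introduction"},
    "conclusion": {"evidence", "related work"},
}
print(order_pieces(draft))

try:
    order_pieces({"claim": {"evidence"}, "evidence": {"claim"}})
except CycleError:
    print("No coherent flow yet: restructure the draft and try again.")
```

Any topological order is acceptable here; the author still chooses among the valid orders, which is the creative part the sort cannot do.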
19 The Tool Chains Break
- Create a new slide show out of three old presentations plus one from your colleague
  - Why not have the content in smaller, more logical chunks?
- Re-use the motivation part of an old paper for a new one
  - If you find a misspelling, why should you have to fix it twice?
- Search a stack of paper notes with good ideas
  - Why are those not in your computer?
- Search email archives to find out what the high-level architecture for the new authentication system is
  - Why not browse your PKM and see the relations?
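The misspelling example suggests storing content once in small chunks and letting documents reference (transclude) those chunks instead of copying them. A minimal Python sketch of that idea, assuming a plain dictionary as a hypothetical chunk store and documents as lists of chunk identifiers:

```python
# Hypothetical chunk store: every chunk of content exists exactly once.
chunks = {
    "motivation": "Documents are a costly bottelneck.",   # contains a misspelling
    "old-results": "Results of the old paper.",
    "new-results": "Results of the new paper.",
}

# Documents only reference chunks (transclusion) instead of copying their text.
old_paper = ["motivation", "old-results"]
new_paper = ["motivation", "new-results"]


def render(document: list[str]) -> str:
    """Assemble a readable document from the chunks it references."""
    return "\n".join(chunks[chunk_id] for chunk_id in document)


# Fixing the misspelling once fixes it in every document that reuses the chunk.
chunks["motivation"] = chunks["motivation"].replace("bottelneck", "bottleneck")
print(render(old_paper))
print(render(new_paper))
```

The same mechanism covers the slide-show example: a new presentation is just a new list of existing chunk identifiers.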
20 Technological Developments
- → accelerated distribution by many orders of magnitude
- → lower costs
[Figure: communication speed and cost over time, from analog to digital: written language, printing press, internet]
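21 Cost of Communication: Data transmission is cheap now
- Total cost of communication to send content to n people:

$$C_{\text{total}}(n) = C_{\text{choose}} + C_{\text{encode}} + C_{\text{order}} + n \left( C_{\text{transmit}} + C_{\text{read}} + C_{\text{decode}} + C_{\text{network}} + C_{\text{integrate}} \right)$$

where $C_{\text{choose}}$ is the cost of choosing the relevant parts of the personal model, $C_{\text{encode}}$ of encoding model parts in document parts, $C_{\text{order}}$ of ordering the document parts strictly linearly/hierarchically, $C_{\text{transmit}}$ of data transmission, $C_{\text{read}}$ of linear reading of the document, $C_{\text{decode}}$ of decoding model parts from document parts, $C_{\text{network}}$ of creating a networked model out of the model parts, and $C_{\text{integrate}}$ of integrating the new model into the existing model.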
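The cost formula separates costs paid once by the author from costs paid by each of the n recipients. The Python sketch below evaluates it with made-up numbers (all values are hypothetical, in arbitrary units, with transmission set to 0 because "data transmission is cheap now"). Dividing by n shows why the author's fixed costs dominate the cost per recipient when n is small.

```python
from dataclasses import dataclass


@dataclass
class CommunicationCost:
    # Paid once by the author (hypothetical, arbitrary units).
    choose_parts: float
    encode_parts: float
    order_parts: float
    # Paid once per recipient.
    transmit: float
    read_linearly: float
    decode_parts: float
    build_network: float
    integrate: float

    def total(self, n: int) -> float:
        """Total cost of sending the content to n recipients, as in the formula above."""
        fixed = self.choose_parts + self.encode_parts + self.order_parts
        per_recipient = (self.transmit + self.read_linearly + self.decode_parts
                         + self.build_network + self.integrate)
        return fixed + n * per_recipient


costs = CommunicationCost(2.0, 5.0, 3.0, 0.0, 2.0, 2.0, 1.0, 1.0)
for n in (1, 3, 100):
    print(n, costs.total(n), costs.total(n) / n)   # recipients, total, cost per recipient
```

With these illustrative values the fixed part (choose, encode, order) is 10 and the per-recipient part is 6, so for one recipient most of the cost is the author's encoding work.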
22 Cost of Communication: Where can we save if n is small?
23 Cost of Communication
24 Current process culture is document-centric
[Figure: process from author to recipient(s), annotated with cost]
25 Ideal process: What if knowledge models, rather than documents, were exchanged between people?
[Figure: process from author to recipient(s), annotated with cost]
26 Realistic (improved) process: use both
[Figure: process from author to recipient(s), annotated with cost]
27 Information Management Problems → Solution: Knowledge Models
- Under-utilisation of the interlinked nature of information (Oren) → the fine-grained nature of knowledge models allows precise and effective linking and browsing
- People have problems using strict hierarchies (Oren) → classification methods like tagging and non-strict taxonomies
- Keep the context (Oren) → the networked nature of a knowledge model is better suited to representing contextual links than a set of documents
- Granularity → represent more than the content of just one document
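A non-strict taxonomy lets an item or category have several parents, unlike a strict folder hierarchy. The hypothetical Python sketch below (`Taxonomy`, `add`, `ancestors` and `members` are illustrative names) stores parent links and finds everything classified under a category, directly or indirectly; the `seen` set keeps the traversal safe even if the taxonomy accidentally contains a cycle.

```python
from collections import defaultdict


class Taxonomy:
    """Non-strict taxonomy: any item or category may have several parents (tags)."""

    def __init__(self) -> None:
        self.parents: defaultdict = defaultdict(set)   # node -> set of parent nodes

    def add(self, child: str, *parents: str) -> None:
        self.parents[child].update(parents)

    def ancestors(self, node: str) -> set:
        """All direct and indirect parents of `node`; tolerates cycles."""
        seen: set = set()
        stack = list(self.parents.get(node, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.parents.get(parent, ()))
        return seen

    def members(self, category: str) -> set:
        """Everything classified under `category`, directly or indirectly."""
        return {node for node in list(self.parents) if category in self.ancestors(node)}


tax = Taxonomy()
tax.add("semantic wiki", "wiki", "semantic web")   # a category with two parents
tax.add("wiki", "hypertext")
tax.add("notes-2006.txt", "semantic wiki", "todo") # an item with two tags
print(sorted(tax.members("hypertext")))
print(tax.members("todo"))
```

Because no item is forced into exactly one place, users do not have to decide on a single "correct" hierarchy before filing something.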
28 When to use Knowledge Models?
Fixed domain
- Use domain-specific tools and languages
  - Standardised representation formalisms
  - Established data exchange processes
Open domain or multiple domains
- Use personal knowledge models
  - Unstructured, semi-structured, semi-formal and formal parts
  - Ad-hoc formalisation
  - Cheaper to create, easier to integrate
- Use documents
  - Costly to create
  - Cheap to read → sometimes the best solution
  - Hard to integrate
[Figure axis: audience, from myself, my team and my community to a broad audience]
29 Related Work in Semantic Authoring
- Initial ideas, although the term was not used, can already be found in the work of V. Bush and D. Engelbart.
- ABCDE format by Anita de Waard
- Semantically annotated LaTeX (SALT) by Tudor Groza
- Systems allowing end-users to construct ontologies out of their linked information objects
- L. Ludwig sees redundancy within and among documents as a hurdle to efficient information usage. The traditional notion of a document is replaced by virtual documents, which render parts of the knowledge base as an interactive tree.
- Bernstein describes Tinderbox, a "personal content management assistant", which offers sophisticated HTML generation via templates.
- The Gnowsis system by Sauermann allows users to link desktop objects and integrates with a wiki.
- iMapping: semantic concept maps by Haller
- Same direction in the fields of the semantic desktop and semantic wikis
- Semantic Web Content Repository (swecr)
30 Conclusion
- Documents
  - The document-centred culture is a costly legacy artefact and a bottleneck for our society.
- Personal knowledge models
  - Superset of documents and ontologies
  - Integrate with the semantic desktop
  - Make knowledge workers happier and more productive
- Authoring is the bottleneck
  - We should bring the power of modeling to the end-user.
  - Don't break the tool chain.
  - Focus on work-in-progress.
Thank you very much for your attention!
Contact: Max Völkel, voelkel_at_fzi.de