The Future of Metadata - PowerPoint PPT Presentation

1
The Future of Metadata
  • Denise Bedford
  • World Bank
  • Presentation to Fall Metadata Forum
  • November 2, 2005
  • Department of Homeland Security

2
Meta-Future
  • Most of our information use and access today is
    based on an anonymous access model
  • It is increasingly clear that anonymous access to
    information and the packaging of information for
    single-use contexts is neither sufficient for
    users nor an efficient use of
    development/engineering resources
  • We need to think in terms of contextualization
    and sensitization of information so that it can
    be used in any context where it pertains
  • In the future, information will flow;
    information, not the systems in which it lives or
    was created, will be our focus
  • Information needs to be agile and mobile: it
    needs to be sensitized to the contexts in which
    it might be used, to the interests of those who
    might use it, and to the applications that might
    consume it

3
Meta-Future
  • Envision a future like that described in the
    Netcentric Information Models formulated by the
    Dept. of Defense
  • Information is created, tagged, posted and shared
  • Any application or user can, according to
    security privileges, use any information they
    can find, in any application they need to use to
    do their work
  • Technology becomes increasingly invisible but
    more logic-based
  • More and different kinds of information, such as
    reference sources, need to be managed and
    maintained
  • This meta-future is heavily dependent upon the
    existence of rich, conceptual, sensitized,
    meaningful metadata
  • This future is now; it is simply a practical
    view of the Semantic Web

4
The problem with metadata
  • This future sounds wonderful and the
    contextualization vision is exciting, but there's
    just one problem: metadata
  • Metadata:
    • Is expensive and time-consuming to create
    • Is sometimes subjective and not granular enough
    • Doesn't always address the ways that users and
      systems think about the information it describes
    • May not tell us enough about the information to
      trust it
    • May address only one context: the context for
      which it was created
    • May live only in the source application where it
      was created
    • May not be as accessible as the information asset
  • If a Meta-Future depends on metadata, we have to
    solve these problems

5
The problem with technologies
  • Many of the tools are so tightly integrated that
    you might generate rich metadata, but it will not
    make your information agile or mobile
  • Statistical clustering engines do not get us to
    persistent meaning or contextualization.
    Clustering engines are great for thresholding or
    pattern tracing, but they will not generate the
    kind of metadata we need to realize this future
  • We need semantic engines at the base of all our
    metadata efforts, and these engines need to be
    available in multiple languages, because
    semantics vary by language
  • Magic black box approaches are neither meaningful
    nor sustainable; you need access to the programs
    through a user-friendly interface so you can
    adapt them to your environment without needing
    programming knowledge
  • You need several different kinds of
    technologies, not just one tool, to do what I'm
    going to describe today

6
Understanding the Dimensions of Contextualization
[Diagram with three dimensions]
  • Content Dimension: Content Metadata, Region
    Scheme, Ideas, Tacit Knowledge, Country Scheme,
    Content Elements Structure (XML), Collection
    Development Policy, Bank's Business Language,
    Topic Thesaurus, Content Quality Management,
    Business Activity Scheme, Programmatic Metadata
    Capture, Metadata Management, Topic Scheme,
    Concept Extraction
  • Context Dimension: Anonymous Access (Context
    Free), Information Diffusion (Context Sensitive,
    Group), Information Gathering and Transformation
    (Context Sensitive, Person)
  • User Dimension: Social Groups, Institutional
    Roles, Social Group Profiles, Client Profiles,
    Partner Profiles, Organizational Entities,
    Institutional Profiles, Individual Profiles,
    Communities of Practice
  • Other elements shown in the diagram: Business
    Process Awareness, Searching, Browsing,
    Translation Systems, Concept Filtering, Sense
    Making, Collaborative Filtering, Parametric
    Searching, Searching by Tools, Publishing,
    Results Clustering, Content Aggregation,
    Individual Discovery, User-User Profile Matching,
    Knowledge Sharing, QA Systems, Results Sorting,
    Searching by Source, Text Classification,
    Syndication Engines, Directories of Expertise,
    Task Filtering, Workflow Management, Community
    Building, Individual Learning, Social Group SDI,
    Authentication Rules, Social Filtering, Threshold
    Filtering, Centralized Collections, Task Oriented
    SDI, Personal SDI, Advisory Services, Online
    Training, Authorization Rules, Recommender
    Engines, Communities SDI, Content Repurposing
7
Vision of Contextualization
  • We need to address metadata challenges not in a
    traditional way but in the future context, with
    the idea that metadata is contextualizable and
    sensitized to support information agility and
    mobility
  • To achieve contextualization, you need to have
    extreme metadata
  • Metadata about the information
  • Metadata about the user
  • Metadata about the context
  • Rich metadata designed to meet many functional
    requirements
  • Metadata in multiple languages
  • Metadata needs to be interpretable for and in a
    context
  • Reference sources not only for traditional
    metadata but for all of the relationships and
    logic that are present in an ontology (simply
    different kinds of taxonomy representations)
  • Metadata must reflect any context or interest
    that a user might express
  • We still need some control over metadata in
    order to make it understandable in different
    contexts

8
New View of Ontology
[Diagram elements: Content, Content Entity,
Content Parts, Content Elements; User, Profile,
Skill Sets/Competencies; Context, Business Rule,
Rule Logic, Contextual Logic, Contextual Matrix
Sensing; Metadata; Topic Class Scheme, Business
Process Scheme, Thesaurus, Country Names, Region
Names, Standard Statistical Variables, People
Referenced, Orgs Referenced. Relationship labels:
uses, has, has values, has relationship to, has
meaning in, contains. Taxonomy forms: hierarchy,
flat, network, faceted, and ring taxonomies]
9
Getting to Rich Metadata
  • Given the future demand for rich,
    contextualizable metadata, and all of the
    traditional drawbacks, how will we achieve this
    future?
  • We need to look for a different model for
    creating and sustaining metadata and reference
    sources
  • We need to teach technologies how to capture the
    metadata we need and how to maintain our
    reference sources
  • I'd like to show you an example of how we might
    achieve that future
  • Please keep in mind that I'm showing you an
    example of what is possible: Enterprise Search,
    Authority Control/Entity Discovery

10
Fueling Semantic Search With Metadata
  • Or, If Metadata Is Dead, Semantic Web and
    Semantic Search Are Dead

11
[Figure: examples of a ring taxonomy, a flat
taxonomy, a hierarchical taxonomy, and a faceted
taxonomy labeled for fielded search]
12
[Figure: ring taxonomy, metadata, and network
taxonomy]
13
More explicit View of faceted taxonomy
14
Building and Maintaining Taxonomies
  • Moving towards automated metadata generation
    means that catalogers shift their effort to
    reviewing the metadata generated and to more
    fully developing and maintaining subject
    headings/thesauri and classification schemes as
    part of a suite of categorization tools
  • The level of effort shifts to training and
    developing the tools and away from original
    cataloging and metadata capture
  • Continue to work closely with subject experts to
    define the controlled vocabularies and
    classification schemes
  • It means that you have to have a metadata
    infrastructure that looks something like that
    ontology we just reviewed
  • There is no silver bullet ontology tool out there
    that will do this work for you; your knowledge
    and skills are critical

15
Metadata Capture Methods
  • Capture methods: Human Capture, Programmatic
    Capture, Extrapolate from Business Rules, Inherit
    from System Context
  • Functions supported: Identification/Distinction,
    Compliant Document Management, Search/Browse, Use
    Management
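The four capture methods named on this slide (human capture, programmatic capture, extrapolation from business rules, and inheritance from system context) can feed a single record. The following is a minimal Python sketch under stated assumptions: the field names, the rule table, and all helper functions are hypothetical, and `programmatic_capture` is only a placeholder for a real semantic engine. It applies the methods as successive layers, so later layers override earlier ones and human input has the final say.

```python
from typing import Dict, Optional

Record = Dict[str, str]

def inherit_from_system_context(context: Record) -> Record:
    """Copy fields the hosting system already knows (hypothetical field names)."""
    return {k: context[k] for k in ("source_system", "collection") if k in context}

def extrapolate_from_business_rules(partial: Record) -> Record:
    """Derive fields from a rule table keyed on what is already known (hypothetical rules)."""
    rules = {"Project Documents": {"doc_type": "Project Appraisal Document"}}
    return dict(rules.get(partial.get("collection", ""), {}))

def programmatic_capture(text: str) -> Record:
    """Placeholder for a semantic engine: use the first non-empty line as the title."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {"title": lines[0]} if lines else {}

def capture_metadata(text: str, context: Record, human: Optional[Record] = None) -> Record:
    """Apply the capture methods as layers; each later layer overrides the earlier ones."""
    record: Record = {}
    record.update(inherit_from_system_context(context))
    record.update(extrapolate_from_business_rules(record))
    record.update(programmatic_capture(text))
    record.update(human or {})  # human capture (e.g. a reviewer's correction) wins
    return record

if __name__ == "__main__":
    print(capture_metadata(
        "Rural Roads Project\nBackground ...",
        {"source_system": "IRIS", "collection": "Project Documents"},
        human={"title": "Rural Roads Project, Phase II"},
    ))
```

Putting human input last is only one possible precedence policy; a repository could instead keep every layer's value and flag disagreements for review.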
16
Smart Use of Technologies
  • Sample structure: Bank Topics Classification
    Scheme (hierarchical taxonomy)
  • Oracle data classes are used to represent the
    Topic Classification Scheme
  • The hierarchical taxonomy is the reference source
    for the attribute Topic
  • The taxonomy is used for Browse, Search, Content
    Syndication, and Personalization
  • The first challenge is to architect the hierarchy
    correctly
  • Three distinct data classes, not a tree structure
    with inheritance
  • This allows you to use the three data classes for
    distinct functions across systems but still
    enforce relationships across the classes
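To illustrate three distinct data classes whose relationships are still enforced, here is a sketch that uses SQLite from Python's standard library as a stand-in for the Oracle data classes. The table names, column names, and sample topic labels are hypothetical; only the design idea (separate classes linked by enforced references, rather than one inherited tree) comes from the slide.

```python
import sqlite3
from typing import Optional, Tuple

def build_topic_schema(conn: sqlite3.Connection) -> None:
    # SQLite leaves foreign-key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Three separate classes (tables) rather than one self-referencing tree table;
    # each level points only to its parent through an enforced foreign key.
    conn.executescript("""
        CREATE TABLE topic (
            topic_id INTEGER PRIMARY KEY,
            label    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE subtopic (
            subtopic_id INTEGER PRIMARY KEY,
            topic_id    INTEGER NOT NULL REFERENCES topic(topic_id),
            label       TEXT NOT NULL
        );
        CREATE TABLE subsubtopic (
            subsubtopic_id INTEGER PRIMARY KEY,
            subtopic_id    INTEGER NOT NULL REFERENCES subtopic(subtopic_id),
            label          TEXT NOT NULL
        );
    """)

def topic_path(conn: sqlite3.Connection, subsubtopic_id: int) -> Optional[Tuple[str, str, str]]:
    # Join the three classes to recover the Topic > Subtopic > Subsubtopic path.
    return conn.execute("""
        SELECT t.label, s.label, ss.label
        FROM subsubtopic AS ss
        JOIN subtopic AS s ON s.subtopic_id = ss.subtopic_id
        JOIN topic AS t ON t.topic_id = s.topic_id
        WHERE ss.subsubtopic_id = ?
    """, (subsubtopic_id,)).fetchone()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_topic_schema(conn)
    conn.execute("INSERT INTO topic VALUES (1, 'Transport')")
    conn.execute("INSERT INTO subtopic VALUES (10, 1, 'Roads and Highways')")
    conn.execute("INSERT INTO subsubtopic VALUES (100, 10, 'Rural Roads')")
    print(topic_path(conn, 100))  # ('Transport', 'Roads and Highways', 'Rural Roads')
    try:
        conn.execute("INSERT INTO subtopic VALUES (11, 99, 'Orphan')")  # there is no topic 99
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

Because each level is its own class, a system that only needs the top-level Topic list can use that table alone, while the foreign keys keep every Subtopic and Subsubtopic attached to a valid parent.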

17
3 Oracle Data classes
Relationships across data classes
18
Topic data class
19
Subtopic Data Class
20
Subsubtopic Data class
21
Categorizing and Indexing Content
  • Let's look at how we're categorizing our content
    to this structure automatically
  • Topic classification, geographical region
    assignment, keywording examples
  • Can apply this approach to any kind of content
  • Enables us to build a robust metadata repository
    model, with strong metadata quality, to move
    towards SI at the functional level
  • Also note that we can do this across many
    languages

22
Semantic Analysis Using The Technologies to Best Advantage
  • Semantic analysis tools which support concept
    extraction, categorization, summarization and
    pattern matching rules engines
  • Teragram works in 23 languages
  • Use categorization to capture Topics, Business
    Activities, Regions, Sectors, Themes, etc.
  • Use Concept Extraction to capture keywords
  • Use Rules Engine to capture Loan, Credit,
    Project ID, Trust Fund, etc.
  • Use Summarization to generate a gist of the
    content

23
How does semantic analysis work?
24
Semantic Analysis Basics
  • Once you have made some sense of the sentence
    (decompose), reconstruct entities for information
    extraction (compose)
  • Identify names and other fixed-form expressions:
    people, organizations, actions, relationships,
    places
  • Identify basic noun groups, verb groups,
    formatting elements, logic statements
  • Construct complex noun groups and verb groups
  • Identify event structures
  • Identify common elements and associate them
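A very small Python sketch of the "identify names and other fixed-form expressions" step: the text is decomposed into tokens, and multi-word names are recomposed by longest match against a gazetteer. The gazetteer entries and entity types are hypothetical, and this is not how the Teragram engine works internally; a real engine also builds noun groups, verb groups, and event structures, which this sketch does not attempt.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercase token sequence -> entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("world", "bank"): "ORGANIZATION",
    ("department", "of", "homeland", "security"): "ORGANIZATION",
    ("kenya",): "PLACE",
    ("sub-saharan", "africa"): "PLACE",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def tokenize(sentence: str) -> List[str]:
    """Decompose: split into word tokens, keeping internal hyphens."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)

def find_entities(sentence: str) -> List[Tuple[str, str]]:
    """Compose: at each position, rebuild the longest known multi-word expression."""
    tokens = tokenize(sentence)
    lowered = [t.lower() for t in tokens]
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:  # no gazetteer entry starts here; move on one token
            i += 1
    return entities

if __name__ == "__main__":
    print(find_entities("The World Bank supports rural roads in Kenya."))
```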

25
Leveraging the Topic Structure
  • Each subtopic is a knowledge domain (hierarchical
    taxonomy)
  • Each subtopic has an extensive concept-level
    definition (1,000 to 5,000 concepts)
  • Concepts are controlled vocabularies in their raw
    form (flat taxonomy)
  • Concepts with relationships (extensive, per the
    new Z39.19 standard) comprise a semantic network
    (network taxonomy)
  • Categorization tools work with the topic structure
    and concept definitions to categorize and index
    content
  • The following screen illustrates how that same
    structure is embedded into the Teragram profile to
    support categorization
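The mechanism described on this slide, subtopics defined by concept lists and content categorized against those definitions, can be sketched in a few lines of Python. This is a simple hit-counting scheme under stated assumptions, not the Teragram categorization algorithm: the two subtopics, their concepts, the naive plural handling, and the `min_hits` threshold are all hypothetical, and a real profile would hold thousands of concepts per subtopic.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical fragment of a topic structure: subtopic -> parent topic, and
# subtopic -> its concept list (a flat, controlled vocabulary).
TOPIC_OF: Dict[str, str] = {"Rural Roads": "Transport", "Water Supply": "Water"}
CONCEPTS: Dict[str, List[str]] = {
    "Rural Roads": ["rural road", "feeder road", "road maintenance", "gravel"],
    "Water Supply": ["water supply", "piped water", "borehole", "water tariff"],
}

def concept_hits(text: str, concepts: List[str]) -> int:
    """Count whole-phrase occurrences of the concepts, case-insensitively.
    The optional trailing 's' is a naive stand-in for real morphological analysis."""
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(c) + r"s?\b", lowered)) for c in concepts)

def categorize(text: str, min_hits: int = 2) -> List[Tuple[str, str, int]]:
    """Score every subtopic, keep those at or above the threshold, and roll each
    one up to its parent topic. Returns (topic, subtopic, hits), best first."""
    scores = Counter({sub: concept_hits(text, cs) for sub, cs in CONCEPTS.items()})
    return [(TOPIC_OF[sub], sub, n) for sub, n in scores.most_common() if n >= min_hits]

if __name__ == "__main__":
    doc = "The project upgrades rural roads and feeder roads, with routine road maintenance."
    print(categorize(doc))  # [('Transport', 'Rural Roads', 3)]
```

Assigning at the subtopic level and rolling up to the topic keeps a single set of concept definitions serving both levels of the hierarchy.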

26
Subtopics
Domain concepts or controlled vocabulary
27
Extensive operators allow us to write grammatical
rules to manage typical semantic problems
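As an illustration of what proximity and exclusion operators make possible, the sketch below encodes one grammatical rule in plain Python: "bank" counts as the financial-institution concept only when no blocking word such as "river" occurs within a few words of it. The operator, the blocker list, and the window size are hypothetical; this is not the Teragram rule syntax.

```python
import re
from typing import Iterable, List

def words(text: str) -> List[str]:
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def not_near(tokens: List[str], i: int, blockers: Iterable[str], k: int) -> bool:
    """Exclusion operator: True if no blocker occurs within k tokens of position i."""
    window = tokens[max(0, i - k): i + k + 1]
    return not any(b in window for b in blockers)

def matches_financial_bank(text: str, blockers=("river", "snow"), k: int = 3) -> bool:
    """Rule: at least one occurrence of 'bank' is NOT NEAR any blocker word."""
    tokens = words(text)
    return any(t == "bank" and not_near(tokens, i, blockers, k) for i, t in enumerate(tokens))

if __name__ == "__main__":
    print(matches_financial_bank("The bank approved the loan."))
    print(matches_financial_bank("They walked along the river bank."))
```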
28
Concept based rules engine allows us to define
patterns to capture other kinds of data
29
Example of using Authority Control to capture
country names as written but extract the
authorized version of each country name
Example of using a gazetteer concept extraction
rules engine to support semantic
interoperability
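Authority control of this kind can be sketched as a variant-to-authorized-form lookup: every recognized variant is captured from the text, but the authorized name is what gets stored. The authority entries below, and the choice of which form counts as authorized, are hypothetical examples, not the Bank's actual country authority file.

```python
import re
from typing import Dict, List

# Hypothetical authority file: lowercase variant -> authorized form.
AUTHORITY: Dict[str, str] = {
    "côte d'ivoire": "Côte d'Ivoire",
    "cote d'ivoire": "Côte d'Ivoire",
    "ivory coast": "Côte d'Ivoire",
    "republic of korea": "Korea, Republic of",
    "south korea": "Korea, Republic of",
}

# Longest variants first, so a longer name wins over any shorter overlapping one.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in sorted(AUTHORITY, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def extract_countries(text: str) -> List[str]:
    """Return authorized country names in order of first appearance, without duplicates."""
    text = text.replace("\u2019", "'")  # normalize typographic apostrophes
    found: List[str] = []
    for match in _PATTERN.finditer(text):
        name = AUTHORITY[match.group(1).lower()]
        if name not in found:
            found.append(name)
    return found

if __name__ == "__main__":
    print(extract_countries("Missions to Ivory Coast and South Korea, then Côte d\u2019Ivoire again."))
```

Storing only the authorized form is what makes the captured value interoperable: every system that consumes the metadata sees the same name regardless of how the source text spelled it.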
30
Use of the concept extraction rules engine to
capture Loan, Credit, and Project ID
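A concept-based rules engine for identifiers is, at its simplest, a set of named patterns. The Python sketch below uses regular expressions; the identifier formats (a "P" plus six digits for a Project ID, four digits plus a short uppercase suffix for Loan and Credit numbers) are hypothetical placeholders, to be replaced by the institution's actual numbering conventions.

```python
import re
from typing import Dict, List

# Hypothetical identifier patterns; a capture group, when present, marks the value to keep.
PATTERNS: Dict[str, re.Pattern] = {
    "project_id": re.compile(r"\bP\d{6}\b"),
    "loan": re.compile(r"\bLoan\s+(?:No\.\s*)?(\d{4}-[A-Z]{2,3})\b"),
    "credit": re.compile(r"\bCredit\s+(?:No\.\s*)?(\d{4}-[A-Z]{2,3})\b"),
}

def extract_ids(text: str) -> Dict[str, List[str]]:
    """Apply every pattern and collect unique values per field, in order of appearance."""
    result: Dict[str, List[str]] = {}
    for field, pattern in PATTERNS.items():
        values: List[str] = []
        for m in pattern.finditer(text):
            value = m.group(1) if pattern.groups else m.group(0)
            if value not in values:
                values.append(value)
        result[field] = values
    return result

if __name__ == "__main__":
    print(extract_ids("Project P074067 is financed by Loan No. 4567-IND and Credit 3456-BD."))
```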
31
(No Transcript)
32
(No Transcript)
33
Overview of Process Tools
34
Enterprise Profile Creation and Maintenance
  • Enterprise Metadata Profile
    • Concept Extraction Technology: Country,
      Organization Name, People Name, Series
      Name/Collection Title, Author/Creator, Title,
      Publisher, Standard Statistical Variable,
      Version/Edition
    • Categorization Technology: Topic, Business
      Function, Region, Sector, and Theme
      Categorization

[Diagram elements: UCM Service Requests;
Update/Change Requests; Data Governance Process
for Topics, Business Function, Country, Region,
Keywords, People, Organizations, Project ID;
e-CDS Reference Sources for Country, Region,
Topics, Business Function, Keywords, Project ID,
People, Organization; Enterprise Profile
Development and Maintenance; Teragram Team;
JOLIS E-Journals; Factiva; ISP; TK240 Client;
IRIS; ImageBank]
35
Enterprise Metadata Capture Functional Reference Model
[Diagram elements: Content Owners; Dedicated
Server running the Teragram Semantic Engine
(Concept Extraction, Categorization, Clustering,
Rule-Based Engine, Language Detection); APIs
Integration and APIs Technical Integration; ISP
Integration; IRIS Integration and IRIS Functional
Team; ImageBank Integration; Business Analyst;
Enterprise Metadata Capture Strategy; TK240 Client
XML Output; Content Capture; XML-Wrapped Metadata;
Enterprise Profile Development and Maintenance;
Factiva Metadata Database; e-CDS Reference
Sources; IDU Indexers; SITRC Librarians]
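The reference model has content capture emitting XML-wrapped metadata. A minimal sketch of that wrapping step with Python's standard `xml.etree.ElementTree` follows; the element and attribute names (`metadata`, `field`, `docId`, `name`) are hypothetical and do not reflect the actual TK240 or e-CDS output format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

def wrap_metadata(doc_id: str, fields: Dict[str, List[str]]) -> str:
    """Serialize captured metadata as one XML record, with one <field> element per value."""
    root = ET.Element("metadata", attrib={"docId": doc_id})
    for name, values in fields.items():
        for value in values:
            # Field names go in an attribute, so labels such as "Business Function"
            # need not be valid XML element names.
            ET.SubElement(root, "field", attrib={"name": name}).text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(wrap_metadata("doc-001", {"Topic": ["Transport"], "Country": ["Kenya", "Uganda"]}))
```

Repeating the `field` element for multi-valued attributes (several countries, several topics) keeps the record flat and easy for downstream systems to parse.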
36
Impacts and Outcomes
  • Information Access impacts
    • Increased precision of search
    • Better control over recall
    • Searching like we talk
    • Exact-match and known-item searching will work
      better
    • Metadata-based searching now begins to resemble
      full-text searching, but with all the advantages
      of structure and context and a significant
      reduction in the amount of noise
  • Productivity improvements
    • Can now assign deep metadata to all kinds of
      content
    • Remove the human review aspect from metadata
      capture
    • Reduce unit times where human review is still
      used
  • Information Quality impacts
    • All metadata carries the information architecture
      with it
    • Apply quality metrics at the metadata level to
      eliminate the need to build fuzzy search
      architectures, which rarely scale or improve in
      performance
    • Use the technologies to identify and fix problems
      with our data

37
In Progress Impacts
  • Same methodology can be leveraged to develop a
    structure of lines of business, entities
    prominent in particular domains, relationships
    among entities in a domain, standard statistical
    variables, etc.
  • The richer the metadata and the more fully
    elaborated the reference structures, the closer
    we come to understanding at a system level what
    is happening in a particular domain at any point
    in time
  • It is this overall structure which can then be
    leveraged in other contexts, perhaps even a
    counter-terrorism context, to threshold events
  • Without metadata, though, no information asset
    can be secured and still have its importance known
  • Without metadata, no information is agile or
    mobile

38
Thank You.
  • Questions and Discussion