Facet-based Knowledge and Records Management

1
Facet-based Knowledge and Records Management
  • Charlie Arp
  • Battelle Memorial Institute
  • Manager, Enterprise Content Management
  • John Lontos
  • UrsaNav EDRM Solutions
  • Senior Director
  • 5 March 2009

2
Agenda
  • Problem Statement
  • The Importance of Metadata
  • History of ECM at Battelle
  • What has been done
  • Next steps
  • Summary
  • Issues
  • Q&A

3
Problem Statement
  • Effective information management requires more
    metadata than users are willing to create

Metadata fields shown on the slide: Description, Relation, Location,
Creator, Format, Disposition Schedule, Project, Title, Revision,
Date Created, Source, Class, Owner, Category, Date Registered, Author,
Identifier, Hold(s), Type
4
Data Entry Example 1
5
Data Entry Example 2
6
Purpose of Metadata
  • Resource Description
  • Information Retrieval
  • Management of Information Resources
  • Documenting Ownership and Authenticity of Digital
    Resources
  • Enabling Interoperability

Excerpts from the Chartered Institute of
Library and Information Professionals
7
Kinds of Metadata
  • Administrative metadata: managing the object
    • Ownership
    • Provenance - chain of custody
    • Digital preservation metadata
  • Descriptive metadata: finding the object
    • Key concepts
    • Taxonomic terms
    • Terms extracted from the digital object

8
Kinds of Metadata
  • Free form (uncontrolled) metadata
    • Social computing tagging
    • Notes
    • Searching full text
  • Controlled thesaurus metadata
    • Terms are pre-defined
    • Putting a square peg in a round hole
    • Searching known terms/categories

9
Metadata Drivers - Standards
  • Knowledge sharing in support of research
  • DOE-STD-4001-2000
  • DOE Directive, O 243.1, RM Program
  • DOE Directive, O 243.2, Vital Records
  • DOE Directive, O 200.2, Information Collection
    Management Program
  • ISO 15489, Records Management
  • DoD Discovery Metadata Specification
  • PREMIS, OCLC Preservation metadata

10
What Users Want
  • Minimal data entry (i.e., minimal metadata)
  • Easy ways to add content

Social Drivers
Business Drivers
11
Impact of poor metadata
  • Where's that Dilbert?
  • I desperately wanted to paste an old Dilbert
    strip about a "boss stalker" in reply to Noella's
    post. But I just couldn't seem to find it! How do
    you search for old cartoon strips? There's no
    metadata associated with these image files. I
    searched Google (web, images) for "dilbert boss
    stalker" without luck.1
  • Electronic Basements

1. http://mannu.livejournal.com/87255.html
12
Information retrieval - Full text search vs.
Metadata enhanced search
  • Full text
    • Returns an overwhelming amount of unfiltered
      information
    • The user has to employ search strategies
      • Boolean and/or proximity to cull through the
        information
  • Metadata
    • The information has already been filtered into
      meaningful groups
    • The user is searching on known attributes
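
The contrast on this slide can be illustrated with a minimal Python sketch. The `Record` class, the sample records, and both search functions are hypothetical and written only for illustration; they are not part of TRIM, SharePoint, or any product named in the deck. Full-text search returns every record that mentions the term, while the metadata search filters on known attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical record: free text plus a dict of metadata attributes."""
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


def full_text_search(records, term):
    """Return every record whose text mentions the term, unfiltered."""
    needle = term.lower()
    return [r for r in records if needle in r.text.lower()]


def metadata_search(records, **attributes):
    """Return records whose metadata matches every requested attribute."""
    return [
        r for r in records
        if all(r.metadata.get(k) == v for k, v in attributes.items())
    ]


# Illustrative data only (invented for this sketch).
records = [
    Record("Trip report", "Visited the lab in Guinea to review samples.",
           {"Country": "Guinea", "Type": "Report"}),
    Record("Animal study", "The guinea pig cohort showed no effect.",
           {"Category": "Toxicology", "Type": "Report"}),
    Record("Meeting notes", "Discussed the Guinea site visit schedule.",
           {"Country": "Guinea", "Type": "Notes"}),
]

text_hits = full_text_search(records, "guinea")
meta_hits = metadata_search(records, Country="Guinea", Type="Report")
print([r.title for r in text_hits])
print([r.title for r in meta_hits])
```

The full-text query leaves the user to weed out the guinea pig study by hand; the metadata query never returns it because the filtering was done when the attributes were assigned.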

13
Origin of Battelle ECM Office
  • Battelle ECM was established in 2006 as Records
    Management moved from Legal into the newly formed
    Knowledge Management unit
  • KM was tasked to create a knowledge-rich
    environment to enable Battelle to develop,
    acquire, manage, and leverage knowledge assets.

14
Origin of Battelle ECM Office
  • RMO/ECM was always viewed as the foundation of KM
  • To efficiently create, capture, find, manage and
    share knowledge (records) within the flow of
    normal activities
  • Make using the ECM as easy as possible

15
Battelle's ECM Goal
  • Facilitate management of electronic records
    through intuitive searching and the automatic
    generation of useful metadata
  • Simplify the search experience for users
    • Knowledge workers spend 3.5 hours per week
      searching for but not finding the information
      they need (IDC, 2005)
    • If the users cannot find what they put into the
      RMA they will never use it
  • We needed to make it easier for the users to
    submit records to TRIM
    • 50% of users said they would not use the RMA
      because it was too difficult and time consuming
      (NHPRC RMA project in MI, 2000)

16
What has been done
  • Implementation of RMA (HP TRIM)
  • 1 seat to 105 seats in 5 years
  • Digital preservation (e.g., fixity check utilities)
  • Collecting and maintaining permanent digital
    objects
  • Portals
  • SharePoint project and business sites accessing
    the RMA
  • Text analytics
  • Categorization tools, Taxonomies
  • Faceted search tools
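
As a rough illustration of what a faceted search tool computes, the sketch below counts the values of each facet across a result set and then narrows the results by one chosen value. The record dicts, field names, and both functions are assumptions made for this example; they do not reflect the SharePoint or TRIM APIs.

```python
from collections import Counter


def _as_list(value):
    """Treat a single value and a list of values uniformly."""
    return value if isinstance(value, list) else [value]


def facet_counts(records, facet_fields):
    """Count how many records carry each value of each facet field."""
    counts = {name: Counter() for name in facet_fields}
    for rec in records:
        for name in facet_fields:
            if name in rec:
                counts[name].update(_as_list(rec[name]))
    return counts


def refine(records, facet, value):
    """Keep only the records that have the chosen facet value."""
    return [
        rec for rec in records
        if facet in rec and value in _as_list(rec[facet])
    ]


# Illustrative records only (invented for this sketch).
records = [
    {"Title": "Report A", "Country": "Guinea", "Company": ["Battelle"]},
    {"Title": "Report B", "Country": "Guinea",
     "Company": ["Battelle", "UrsaNav"]},
    {"Title": "Report C", "Country": "France"},
]

counts = facet_counts(records, ["Country", "Company"])
narrowed = refine(records, "Company", "UrsaNav")
print(counts["Country"].most_common())
print([rec["Title"] for rec in narrowed])
```

Each facet count becomes a clickable filter in the search interface, which is why the quality of the underlying metadata matters so much.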

17
Current ECM activities
  • Integrated software packages using the
    strengths of different applications
  • SharePoint for ease of use
  • Clear Forest for automated creation of metadata
  • TRIM for records keeping functionality
  • Reliable and authentic records

18
Department SharePoint Sites - Current uses
  • Upload reports, then search for and retrieve these
    reports
  • Authentic official version of the report
    remains in TRIM
  • Two sites being used
  • 10 users
  • 2,000 reports (?)

19
Project SharePoint Sites - Proposed usage
  • Selected document libraries are connected to
    TRIM
  • Users submitting, searching for and retrieving
    files in TRIM through SharePoint
  • Working files and Project reports
  • Not the drafts library
  • No lists: no events, links, tasks, announcements,
    or contacts
  • Nothing from the discussion board or survey
    portions of the site

20
Sample Project Site
21
Sample Site
22
(No Transcript)
23
(No Transcript)
24
Clear Forest
  • It extracts or creates metadata from structured
    or unstructured content
  • Can be used on digital objects containing text -
    documents, e-mail, databases, web sites, Excel
  • Uses semantic/linguistics and statistical
    analysis
  • Server based application

25
Clear Forest
  • Entity extraction: Identifies and tags metadata
    based on grammatical rules
  • Lexicons: Identifies and tags pre-defined word
    lists
  • Key Concepts: Identifies, tags and prioritizes
    noun phrases found in documents
  • Categorization: Assigns subject headings
    (taxonomic terms) based on terms (key words)
    defined by training sets

26
Entity Extraction
  • Results
    • Negative
      • No control: you get whatever is in the
        document
      • It will give you odd results from time to time
    • Positive
      • Easy and quick, we can run a large set of
        documents almost immediately
      • Can give you surprisingly good results

27
Entity Extraction
28
Entity Extraction
  • Technology
  • Products
  • Company
  • Location
  • Country
  • City
  • Region or state

29
Lexicons
  • Results
    • Negative
      • Creating the word lists can be time-consuming
      • Quicker than a key word search, but is there a
        difference?
    • Positive
      • Very dependable; can use phrases
      • Can be very specific (good and bad)
      • Easy and quick
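
A minimal sketch of lexicon-style tagging, assuming a lexicon is simply a mapping from a tag name to a pre-defined list of words or phrases. This is not Clear Forest's implementation or API; the function names and the sample lexicon are made up for illustration, and the matching is plain case-insensitive regular expressions.

```python
import re


def build_lexicon_pattern(terms):
    """Compile one pattern for a word list; longer phrases are tried first."""
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_with_lexicon(text, lexicon):
    """Return {tag: sorted list of lexicon terms found in the text}."""
    tags = {}
    for tag, terms in lexicon.items():
        pattern = build_lexicon_pattern(terms)
        found = {match.group(0).lower() for match in pattern.finditer(text)}
        if found:
            tags[tag] = sorted(found)
    return tags


# Hypothetical lexicon built from names that appear in this deck.
lexicon = {
    "Standards": ["ISO 15489", "DOE-STD-4001-2000", "PREMIS"],
    "Products": ["SharePoint", "TRIM"],
}
text = "Records in TRIM must comply with DOE-STD-4001-2000 and ISO 15489."
tags = tag_with_lexicon(text, lexicon)
print(tags)
```

The sketch also shows the "good and bad" of being very specific: only listed terms are ever tagged, and a case-insensitive match on "TRIM" would also tag the ordinary verb "trim".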

30
Key Concepts
  • Results
    • Negative
      • Some of the noun phrases are inaccurate
      • There will always be some throwaway phrases
      • Need at least 5 noun phrases
    • Positive
      • Easy to use
      • Produces a good look into the document

31
Categorization
  • Results
    • Negative
      • Must create a taxonomy
      • Categories (taxonomy) must be distinct
      • Creating training sets is time consuming and
        difficult
    • Positive
      • Great results when it is done well
      • Best metadata to enable enhanced search

32
Categorization
  • Assign a set of documents (known as a training
    set) to a category (subject heading)
  • Uncategorized documents are assigned to a
    category based on frequency of category
    keywords identified within the document
  • Number of categories is unlimited
  • Defining a taxonomy for the organization
  • Can be difficult
  • Is time consuming

33
Categorization
  • For each category Clear Forest defines a set of
    positive documents and a set of negative documents
    • Positive docs: those in the category of interest
    • Negative docs: docs in other categories
  • Words are scored based on frequency of
    appearance in positive category
  • Words are scored based on frequency of
    appearance in negative category

Example from the slide graphic:
  Positive docs: Smoke Obscurants
  Negative docs: Combat Effectiveness, Toxicology, Counter-proliferation,
  Counter-terrorism, and 17 other categories
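
The positive/negative scoring idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not Clear Forest's algorithm: here a word's score for a category is its relative frequency in that category's training documents minus its relative frequency in all other categories' documents, and a new document goes to the category with the highest summed score. The training texts are invented; only the two category names come from the slide.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_rates(docs):
    """Relative frequency of each word across a set of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    total = sum(counts.values()) or 1
    return {word: n / total for word, n in counts.items()}


def train(training_sets):
    """Build {category: {word: positive rate minus negative rate}}."""
    model = {}
    for category, positive_docs in training_sets.items():
        negative_docs = [
            doc
            for other, docs in training_sets.items() if other != category
            for doc in docs
        ]
        pos = word_rates(positive_docs)
        neg = word_rates(negative_docs)
        model[category] = {
            word: pos.get(word, 0.0) - neg.get(word, 0.0)
            for word in set(pos) | set(neg)
        }
    return model


def categorize(text, model):
    """Assign the category whose word scores sum highest for this text."""
    words = tokenize(text)
    totals = {
        category: sum(scores.get(word, 0.0) for word in words)
        for category, scores in model.items()
    }
    return max(totals, key=totals.get)


# Invented training sets; real ones would be curated per category.
training_sets = {
    "Smoke Obscurants": [
        "smoke screen obscurants reduce visibility",
        "obscurants and smoke generation in the field",
    ],
    "Toxicology": [
        "toxicology study of inhalation dose and exposure",
        "exposure dose response in toxicology",
    ],
}
model = train(training_sets)
first = categorize("inhalation exposure dose measurements", model)
second = categorize("smoke generation test", model)
print(first, "/", second)
```

Words that appear in both sets, such as "and" or "in", score near zero, which is one reason the slides stress that categories must be distinct and training sets carefully chosen.
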
34

35
(No Transcript)
36
How it Works
Usable Metadata
Metadata lists shown in the slide graphic:
  • Title, Author, Creator, Date Created, Date Registered, Project,
    Contract, Award Date, Project Mgr
  • Title, Author, Creator, Date Created, Date Registered, Project,
    Contract, Award Date, Project Mgr, Categories, Key Concepts,
    Technology, Products, Company, City, Country, Region or State
  • Title, Author, Date Created

1) Document is uploaded to Project SharePoint Site
2) Document is transferred to the TRIM repository (HP TRIM)
3) A copy of the document is passed to ClearForest
4) The TRIM record is updated with ClearForest output
5) New attributes are exposed via SharePoint faceted search
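
The five steps can be sketched as a small pipeline. Everything below is a hypothetical stand-in: `TrimRepository` is an in-memory dictionary rather than the HP TRIM API, and `extract_metadata` is a placeholder for the text analytics step rather than a ClearForest call. The sketch only shows the order of operations and how the record's metadata grows.

```python
from dataclasses import dataclass, field


@dataclass
class TrimRecord:
    record_id: int
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


class TrimRepository:
    """In-memory stand-in for the records repository (not the TRIM API)."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def register(self, title, content, metadata):
        record = TrimRecord(self._next_id, title, content, dict(metadata))
        self._records[record.record_id] = record
        self._next_id += 1
        return record

    def update_metadata(self, record_id, extra):
        self._records[record_id].metadata.update(extra)

    def get(self, record_id):
        return self._records[record_id]


def extract_metadata(content):
    """Placeholder for step 3; a real system would call text analytics."""
    known_countries = ["Guinea", "France"]  # hypothetical word list
    found = [name for name in known_countries if name in content]
    return {"Country": found} if found else {}


def ingest(repo, title, content, user_metadata):
    """Run steps 1 to 5 for one uploaded document."""
    record = repo.register(title, content, user_metadata)  # steps 1 and 2
    extracted = extract_metadata(record.content)           # step 3
    repo.update_metadata(record.record_id, extracted)      # step 4
    return repo.get(record.record_id)                      # step 5: searchable


repo = TrimRepository()
record = ingest(repo, "Trip report", "Site visit to Guinea.",
                {"Author": "A. User"})
print(record.metadata)
```

The user supplies only a title and an author; the country attribute is added without any extra data entry, which is the point of the design.
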
37
SharePoint Portal
38
(No Transcript)
39
(No Transcript)
40
Records Management Application
  • Compliant with DOE-STD-4001-2000
  • Interfaces to Microsoft Applications
  • Connectors to non-Microsoft data sources

41
Next Steps
  • TRIM/SharePoint integration
    • Broader deployment
    • Analysis of ongoing pilot projects - governance
      document
    • Harvesting V2 sites: ingest into TRIM
  • Faceted search
    • Fully automate Clear Forest updates
    • Refinement of entity extraction
    • Refinement of search facets
  • Digital Preservation
    • Automated Fixity Checks on digital objects
    • Migrate from PDF to PDF/A
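
An automated fixity check generally means recomputing a checksum for each stored object and comparing it with the value recorded at ingest. The sketch below uses SHA-256 from Python's standard `hashlib`; the manifest format (a dict of path to expected digest) and the function names are assumptions for illustration, not the utilities Battelle used.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Return the paths that are missing or whose digest has changed."""
    failures = []
    for path, expected in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != expected:
            failures.append(path)
    return failures


# Demonstration with a temporary file standing in for a digital object.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "report.pdf")
with open(path, "wb") as handle:
    handle.write(b"original content")

manifest = {path: sha256_of(path)}   # digest recorded at ingest
before = check_fixity(manifest)      # nothing has changed yet

with open(path, "wb") as handle:
    handle.write(b"altered content")
after = check_fixity(manifest)       # the altered file is reported
print(before, after)
```

Run on a schedule, a check like this flags silent corruption or unauthorized changes before the only good copy is lost.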

42
Issues
  • TRIM/SharePoint integration
    • Support (TRIM and SharePoint expertise)
    • 32 bit vs. 64 bit
  • Faceted search
    • Clear Forest will create erroneous metadata
      • Guinea the country vs. Guinea the pig
    • Programming support for Clear Forest
    • Dial4j
    • Just getting started: production issues

43
Summary
  • Microsoft SharePoint
    • Enterprise Portal
    • Project and Team Sites
    • Collaboration Document Authoring
    • Faceted Searching
  • HP TRIM
    • Unified Records Management Platform
    • Vital Records Features
    • Physical and Electronic Records Management
  • Clear Forest
    • Auto Categorization
    • Entity Extraction
    • Key Concept Tagging
  • Benefits
    • Easier for users to add content
    • Easier for users to find information
    • Improved service to customers
    • Enhanced business intelligence
    • Enhanced regulatory compliance
    • Improved e-Discovery response
  • Features
    • Automatic application of rich metadata
    • Streamlined user experience
    • Seamless and fully automated integration

44
Contact Information
Charlie Arp, Manager, Enterprise Content Management, Battelle Memorial
Institute, (614) 424-7897, arpc_at_battelle.org
John Lontos, Senior Director, UrsaNav EDRM Solutions, (703) 625-9821,
jlontos_at_ursanav.com