Will XML and Information Retrieval Make Society Transparent - PowerPoint PPT Presentation

Slides: 17
Provided by: Woo91
Transcript and Presenter's Notes

1
Will XML and Information Retrieval Make Society
Transparent?
  • Gregory B. Newby
    School of Information and Library Science
    University of North Carolina at Chapel Hill
  • http://ils.unc.edu/gbnewby

2
Basic Premise
  • Information retrieval will be facilitated by XML
    because of the additional structure that XML
    adds.
  • This will result in better IR abilities compared
    to plain text or HTML

3
IR is Not Database Retrieval
  • Bibliographic retrieval: controlled vocabulary
  • Database query: structured data
  • Natural language: (semi-)structured or unstructured

4
Information Retrieval in One Slide
  • IR is about matching information to information needs
  • Information may be contained in documents,
    extracts, document surrogates, or newly-created
    documents
  • Information needs may be poorly defined,
    changeable, and context-specific
  • We evaluate IR systems by the numbers of relevant
    documents they identify
  • Recall: the proportion of all relevant documents that
    are retrieved
  • Precision: the proportion of retrieved documents that
    are judged relevant
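
A minimal sketch of the two evaluation measures, assuming that both the retrieved results and the relevance judgments are available as sets of document identifiers. The function name and the identifiers are illustrative and do not come from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    # Guard against empty sets so an empty result list does not divide by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents retrieved, three documents judged relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the four retrieved documents are relevant, and two of the three relevant documents were found, so the two measures differ even though the number of hits is the same.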

5
Why IR Sucks
  • Human language is ambiguous
  • Polysemy: The same word can mean different things
  • Synonymy: Different words can mean the same thing
  • The topic or aboutness of a document is hard to
    assess
  • Queries are short and ambiguous
  • Information needs are moving and vague targets

6
Things that help IR
  • Structure: matching based on known types of
    content (e.g., a list vs. discourse)
  • Relationships: knowing how groups of documents
    are related
  • Metadata: terms or phrases that are of assuredly
    high importance
  • User knowledge: context, user models, history
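
One way structure and metadata could help is by letting a system weight a match differently depending on where it occurs. The sketch below is an illustration under stated assumptions, not a method from the slides: the element names (`title`, `keywords`, `body`), the weights, and the two sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical weights: a match in metadata counts more than one in body text.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 5.0, "body": 1.0}

# Two made-up documents that both mention the ambiguous word "jaguar".
DOCS = [
    "<doc id='a'><title>Jaguar habitats</title>"
    "<keywords>animal wildlife</keywords>"
    "<body>The jaguar is a large cat found in the Americas.</body></doc>",
    "<doc id='b'><title>Car review</title>"
    "<keywords>automobile</keywords>"
    "<body>The new jaguar is a fast car, and this jaguar handles well.</body></doc>",
]


def score(doc_xml, query_terms):
    """Return (doc_id, score), summing weighted term counts per field."""
    root = ET.fromstring(doc_xml)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        element = root.find(field)
        if element is None or not element.text:
            continue
        tokens = element.text.lower().split()
        for term in query_terms:
            total += weight * tokens.count(term.lower())
    return root.get("id"), total


def search(query):
    """Rank all documents for a whitespace-separated query, best first."""
    terms = query.split()
    scored = [score(doc, terms) for doc in DOCS]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = search("jaguar animal")
print(results)
```

Here the keyword metadata separates the animal sense from the car sense, which the body text alone does not do. With plain text or HTML, a system would have no reliable way to know which terms the author marked as important.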

7
Transparency through Information Access (utopic
view)
  • What if organizations (government, corporations,
    etc.) are less able to hide their actions?
  • What if individuals' information is readily
    accessible to all?
  • What if nearly all information that is generated
    is available to all seekers?

8
Inequity through Information Access (dystopic
view)
  • Organizations share their data only when and with
    whom they choose
  • Individuals' information is hoarded by
    businesses, government and the people themselves
  • Information is available on a fee and authority
    basis

9
XML can't make societal decisions
  • But XML brings about the opportunity for such
    decisions to be made
  • If information is readily available to all, XML
    will help make it more searchable
  • If information is only available to the
    privileged, XML will make them more powerful

10
XML Uncertainties
  • Will XML be used for markup? Or only at the back
    end?
  • Will standards such as Z39.50 or EDI make it
    easier for sharing XML data? Or will translation
    mapping be difficult?
  • What sort of variety will exist in DTDs? How
    difficult will it be for IR and database systems
    to map between DTDs?
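
To make the mapping question concrete, here is a minimal sketch of the easy case, in which two sources describe the same kind of record with different element names. The source names, element names, and sample records are all hypothetical. Real DTDs can also differ in nesting, cardinality, and meaning, which a flat lookup table like this one does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup tables from each source's element names to shared fields.
MAPPINGS = {
    "library": {"ti": "title", "au": "author"},
    "vendor": {"name": "title", "creator": "author"},
}


def normalize(xml_text, source):
    """Convert one flat XML record into a dict keyed by shared field names."""
    mapping = MAPPINGS[source]
    record = {}
    for child in ET.fromstring(xml_text):
        field = mapping.get(child.tag)
        if field is not None:  # elements without a mapping are dropped
            record[field] = (child.text or "").strip()
    return record


a = normalize("<rec><ti>Example Title</ti><au>A. Author</au></rec>", "library")
b = normalize(
    "<item><name>Example Title</name><creator>A. Author</creator>"
    "<price>10</price></item>",
    "vendor",
)
print(a == b)
```

Note that the vendor's `price` element is silently lost because the shared vocabulary has no place for it. That kind of loss is one reason mapping between DTDs may be harder than it first appears.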

11
XML stakeholders: Big organizations
  • Organizations with lots of internal data
  • (The IRS, Time-Warner, others big and small)
  • These organizations will benefit from XML IR by
    being able to match database-type items with
    IR-type information needs.
  • E.g., for people who purchase these products,
    what email and chat messages have they exchanged?

12
XML stakeholders: Organizations who share
  • Organizations who broker, repackage or resell
    information will benefit from XML IR
  • (Credit bureaus, investigative services)
  • XML will make it easier to submit IR queries
    against multiple datasets and merge the results
  • E.g., see what this person's public Web pages say
    before deciding whether to hire him or her.
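
Merging results from several datasets raises a practical problem: each system scores on its own scale. The sketch below shows one simple approach, chosen here for illustration and not taken from the slides. It divides each score by the top score in its own list, then keeps the best normalized score per document. The dataset contents are hypothetical.

```python
def merge_results(result_lists):
    """Merge per-dataset lists of (doc_id, score) into one ranked list."""
    merged = {}
    for results in result_lists:
        if not results:
            continue
        top = max(score for _, score in results)
        for doc_id, score in results:
            # Scale each list so its best hit scores 1.0.
            normalized = score / top if top > 0 else 0.0
            # A document found in several datasets keeps its best score.
            merged[doc_id] = max(merged.get(doc_id, 0.0), normalized)
    # Sort by score, best first; break ties by identifier for stable output.
    return sorted(merged.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical results from two datasets that score on different scales.
dataset_a = [("a1", 8.0), ("shared", 4.0)]
dataset_b = [("b1", 0.9), ("shared", 0.6)]
merged = merge_results([dataset_a, dataset_b])
print([doc_id for doc_id, _ in merged])
```

Scaling by the top score assumes that the best hit in each dataset is equally good, which may not hold in practice.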

13
XML stakeholders: Individuals
  • Ultimately, lots of the most valuable information
    is by or about individuals
  • (Lifestyle, health, purchasing, travel)
  • IR systems that understand us better will be able
    to serve us better
  • E.g., recommend a book based on my past reading,
    movies and available time to read.

14
What we know, revisited
  • IR sucks, but is better to the extent that
    language is unambiguous and structure is present
  • People have information needs, but have trouble
    expressing those needs
  • Documents can address some needs, but often
    real-world information needs are better met by
    assembling answers from diverse sources

15
What we don't know, revisited
  • XML: in the background or the foreground?
  • How will organizations share XML data (will
    they?)
  • What external forces might make data in all forms
    more accessible across organizations and to
    individuals?

16
XML IR
  • Despite problems, IR has continued to make good
    progress
  • Despite problems, XML appears to be making a
    strong contribution to storing, organizing and
    presenting data of all types
  • With IR, XML will be more searchable for a
    variety of purposes
  • With XML, IR will gain better precision and
    ability to serve the needs of individuals and
    organizations