Title: TOWARDS INFORMATIONAL SCIENCE Indexing and Analyzing the Knowledge of Scientific Communities
1TOWARDS INFORMATIONAL SCIENCE: Indexing and Analyzing the Knowledge of Scientific Communities
Bruce R. Schatz
CANIS Laboratory, Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
www.canis.uiuc.edu, schatz_at_uiuc.edu
Workshop on The Transformation of Science, Max Planck Society, Elmau, Germany, June 1, 1999
2Informational Science
- Towards the Fourth Branch of Science
- Computer Science -> Computational Science
- Information Science -> Informational Science
- Correlation of Knowledge across Sources
- Distributed Community Repositories
- Semantic Indexing of Community Knowledge
- Analysis Environments on the Net
3The Distributed World
- Community Repositories in the Interspace
- Every Person performs Every Role
USER request
LIBRARIAN reference
INDEXER classify
PUBLISHER quality
AUTHOR generate
4Community Systems
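[Diagram: kinds of community information and the systems that carry them, arranged between Formal and Informal]
- data (database management)
- results (electronic mail)
- knowledge (hypertext annotations)
- literature (information retrieval)
- news (bulletin boards)
browse and share all the knowledge of a community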
5Worm Community System
- WCS Information
- Literature: BIOSIS, MEDLINE, newsletters, meetings
- Data: genes, maps, sequences, strains, people
- WCS Functionality
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing
- WCS: 250 users at 50 labs across the Internet (1991)
6WCS
7THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
8CONCEPT SPACES
- from Objects to Concepts
- from Syntax to Semantics
- Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories
9LEVELS OF INDEXES
10SCALABLE SEMANTICS
- Automatic indexing
- Domain-Independent indexing
- Statistical clustering
- Compute Context of
- concepts within documents
- documents within repositories
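The "context of concepts within documents" bullet can be illustrated with a small co-occurrence computation. This is a minimal sketch, not the system described in the talk: the toy documents, the function names `build_concept_space` and `related`, and the asymmetric weight (co-occurrence count divided by the document frequency of the source term) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Return {term: {related_term: weight}} from co-occurrence counts.

    docs is a list of term lists, one per document (hypothetical input;
    a real pipeline would first extract terms from the text).
    Assumed weight: weight(a -> b) = docs containing a and b / docs containing a.
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    space = defaultdict(dict)
    for (a, b), n in pair_freq.items():
        space[a][b] = n / doc_freq[a]
        space[b][a] = n / doc_freq[b]
    return dict(space)


def related(space, term, k=3):
    """Top-k related terms for `term`, strongest first (ties broken by name)."""
    neighbours = space.get(term, {})
    return sorted(neighbours, key=lambda t: (-neighbours[t], t))[:k]


# Toy documents, each reduced to its terms (illustrative only).
docs = [
    ["arthritis", "analgesic", "aspirin"],
    ["arthritis", "aspirin", "bleeding"],
    ["analgesic", "ibuprofen"],
]
space = build_concept_space(docs)
print(related(space, "arthritis"))  # ['aspirin', 'analgesic', 'bleeding']
```

The weights are purely statistical, so nothing in the sketch depends on the subject domain, which is the sense in which such indexing is automatic and domain-independent.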
11CROSS-OVERS IN SEMANTIC INDEXING
12COMPUTING CONCEPTS
1992: 4,000 (molecular biology)
1993: 40,000 (molecular biology)
1995: 400,000 (electrical engineering)
1996: 4,000,000 (engineering)
1998: 40,000,000 (medicine)
13SIMULATING A NEW WORLD
- Obtain discipline-scale collection
- MEDLINE from NLM, 10M bibliographic abstracts
- human classification: Medical Subject Headings (MeSH)
- Partition discipline into Community Repositories
- 4 core terms per abstract for MeSH classification
- 32K nodes with core terms (classification tree)
- Community is all abstracts classified by core term
- 40M abstracts containing 280M concepts
- concept spaces took 2 days on NCSA Origin 2000
- Simulating World of Medical Communities
- 10K repositories with > 1K abstracts (1K with > 10K)
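The partition step above can be sketched as a grouping of abstracts by their core terms. This is an illustrative sketch under stated assumptions: the function name `partition_by_core_terms`, the abstract identifiers, and the three-abstract toy input are hypothetical. Because an abstract is copied into the repository of each of its core terms, roughly 4 core terms per abstract would turn 10M abstracts into about 40M repository entries, which appears to be the source of the 40M figure.

```python
from collections import defaultdict


def partition_by_core_terms(abstracts):
    """Map each core term to the list of abstract ids classified under it.

    abstracts: {abstract_id: [core_term, ...]} (hypothetical toy input).
    An abstract with several core terms lands in several repositories.
    """
    repositories = defaultdict(list)
    for abstract_id, core_terms in abstracts.items():
        for term in core_terms:
            repositories[term].append(abstract_id)
    return dict(repositories)


# Toy input; identifiers and term assignments are illustrative only.
abstracts = {
    "a1": ["Arthritis, Rheumatoid", "Analgesics"],
    "a2": ["Analgesics", "Gastrointestinal Hemorrhage"],
    "a3": ["Arthritis, Rheumatoid"],
}
repos = partition_by_core_terms(abstracts)
total_entries = sum(len(ids) for ids in repos.values())
print(len(repos), total_entries)  # 3 5

# Scale arithmetic using the figures on the slide.
slide_abstracts = 10_000_000
core_terms_per_abstract = 4
print(slide_abstracts * core_terms_per_abstract)  # 40000000
```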
14COMMUNITY PROCESSING
15INTERSPACE NAVIGATION
- Semantic Indexes for Community Repositories
- Navigating Abstractions within Repository
- concept space
- category map
- Interactive browsing by Community experts
16Interspace Remote Access Client
17Navigation in MEDSPACE
- For a patient with Rheumatoid Arthritis
- Find a drug that reduces the pain (analgesic)
- but does not cause stomach (gastrointestinal) bleeding
Choose Domain
18Concept Search
19Concept Navigation
20Retrieve Document
21Navigate Document
22Retrieve Document
24Category Map
25Category Navigation
26Concept Navigation
27SWITCHING
- In the Interspace
- each Community maintains its own repository
- Switching is navigating Across repositories
- use your vocabulary to search another specialty
28Medicine Session
29Categories and Concepts
30Concept Switching
31Document Retrieval
32CONCEPT SWITCHING
- Concept versus Term
- set of semantically equivalent terms
- Concept switching
- region to region (set to set) match
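A region-to-region match can be sketched with plain sets. In this hedged example a concept's region is taken to be a term plus its related terms in one community's concept space, and regions are compared by Jaccard overlap. The slides do not name a similarity measure, so Jaccard, the function names `region` and `switch_concept`, and both toy concept spaces are assumptions for illustration.

```python
def region(space, term):
    """A concept's region: the term plus its related terms in one space."""
    return {term} | set(space.get(term, ()))


def switch_concept(source_space, target_space, term, k=2):
    """Rank target-space terms by overlap between their regions and the
    source term's region (set-to-set match). Jaccard is an assumed measure."""
    src = region(source_space, term)
    scored = []
    for candidate in target_space:
        tgt = region(target_space, candidate)
        overlap = len(src & tgt) / len(src | tgt)
        if overlap > 0:
            scored.append((overlap, candidate))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:k]]


# Hypothetical toy concept spaces: term -> related terms.
rheumatology = {
    "painkiller": ["aspirin", "ibuprofen", "inflammation"],
}
pharmacology = {
    "analgesic": ["aspirin", "ibuprofen", "opioid"],
    "anticoagulant": ["warfarin", "bleeding"],
    "nsaid": ["aspirin", "ibuprofen", "inflammation", "bleeding"],
}
print(switch_concept(rheumatology, pharmacology, "painkiller"))
# ['nsaid', 'analgesic']
```

Matching sets rather than single strings is what lets a searcher's own vocabulary ("painkiller") reach another specialty's terms even when the exact word never appears in the target repository.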
33Building Your Interspace
- Gather the Information Sources
- external bibliographic and community documents
- community meta-data and specialty data
- Generate the Community Repositories
- concept spaces (terms), category maps (documents)
- Construct the Analysis Environment
- concept switching and community links
- Evolve Community Interspace
- concept navigation and object sharing
34THE NET OF THE 21st CENTURY
- Beyond Objects to Concepts
- Beyond Search to Analysis
- Every Community has its own special library
- Every Community does semantic indexing
- Problem Solving via Cross-Correlating Concepts Across the Interspace
35The Zen of the Net