Tang E. K., Tiun S. T., Abdullah R. - PowerPoint PPT Presentation

1

Enriching ontology using WordNet
  • By Tang E. K., Tiun S. T., Abdullah R.
  • School of Computer Sciences (Pusat Pengajian Sains Komputer)
  • Universiti Sains Malaysia

2
Smart Product Information Search (system diagram)
1. Product Info Cataloging/Indexing
  • Sources: product catalogues, product info databases
  • Language processor: full-text indexing and indexing of selected contents
  • Product Indexing Database: indexing info for products 1 to n (product name, manufacturer, price, ...)
2. Product Concepts Relevancy/Categorization
  • Resources: dictionaries, WordNet, thesaurus, concepts category hierarchy
  • Product concepts categorization and the Product Concepts Relevancy Network
3. Searching
  • Input: the user's query (keywords, expressions) and search parameters, plus user profiles (access level, roles, tasks)
  • Interpreting/expanding the user's query, then SEARCH
  • Search by keywords and other constraints
  • Results sorted by relevancy and field range
  • Customizable search results presentation
Also labeled in the diagram: information brokering.
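The query-expansion and relevancy-ranking steps of the search stage can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synonym table SYNONYMS, the product index PRODUCTS, and the scoring rule (original query terms weighted 2, added synonyms weighted 1) are all hypothetical, since the slide does not say how relevancy is computed.

```python
# Minimal sketch: keyword search with synonym-based query expansion.
# SYNONYMS, PRODUCTS, and the weights are hypothetical illustrations.

# Stand-in for a dictionary/WordNet/thesaurus lookup (hypothetical data).
SYNONYMS = {
    "laptop": {"notebook"},
    "computer": {"pc", "machine"},
}

# Hypothetical product index: product name -> indexed keywords.
PRODUCTS = {
    "Acme Notebook 14": {"notebook", "portable", "computer"},
    "Acme Desktop PC": {"pc", "desktop"},
    "Acme Mouse": {"mouse", "usb"},
}


def expand_query(terms):
    """Return (original terms, synonyms added by expansion)."""
    original = {t.lower() for t in terms}
    added = set()
    for t in original:
        added |= SYNONYMS.get(t, set())
    return original, added - original


def search(terms, original_weight=2, expanded_weight=1):
    """Rank products by weighted keyword overlap; drop products with no match."""
    original, added = expand_query(terms)
    scored = []
    for name, keywords in PRODUCTS.items():
        score = (original_weight * len(original & keywords)
                 + expanded_weight * len(added & keywords))
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties broken alphabetically by product name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```

For example, search(["laptop", "computer"]) expands the query with notebook, pc, and machine and returns [("Acme Notebook 14", 3), ("Acme Desktop PC", 1)]; the mouse matches nothing and is dropped. Weighting the user's own terms above the expansions is one common way to keep expansion from swamping the original query.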
3
Ontology → Extended Ontology
Yahoo categories, tagged with WordNet senses:
  • Computers and Internet (computer1, Internet1)
  • Arts_and_Humanities (art1, humanities1)
Related words from WordNet:
  • computer1: data-processor1, electronic-computer1, digital-computer1, machine1
  • Internet1: cyberspace1, computer-network1
  • How: use an external linguistic database (WordNet) with its synonym, hyponym/hypernym, and meronym/holonym relationships.
  • Why: to solve the out-of-vocabulary (OOV) problem.

4
Stemming and word sense tagging
(input) Yahoo! > Computers and Internet > Security and Encryption
Stemming process → computer internet security encryption
Sense tagging process → computer1 internet1 security4 encryption1 (output)
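The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: the stopword list, the crude plural-stripping rule (a stand-in, not the Porter stemmer), and the sense table SENSE_TABLE are hypothetical. The slides do not say how sense numbers such as security4 are chosen, so the table only reproduces the slide's example.

```python
# Minimal sketch of stemming and sense tagging for Yahoo category names.
# STOPWORDS, SENSE_TABLE, and the suffix rule are hypothetical stand-ins.
import re

# Connector words dropped before stemming (hypothetical list).
STOPWORDS = {"and", "or", "the", "of"}

# Reproduces the slide's tags only; the slides do not say how senses are chosen,
# so a real system would need a word sense disambiguation step here.
SENSE_TABLE = {"computer": 1, "internet": 1, "security": 4, "encryption": 1}


def stem(word):
    """Crude plural stripping: a stand-in, not the Porter stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_path(categories):
    """Lower-case, split on non-letters, drop stopwords, and stem each token."""
    stems = []
    for name in categories:
        for token in re.findall(r"[a-z]+", name.lower()):
            if token not in STOPWORDS:
                stems.append(stem(token))
    return stems


def sense_tag(stems):
    """Append a sense number from SENSE_TABLE, defaulting to sense 1."""
    return [f"{s}{SENSE_TABLE.get(s, 1)}" for s in stems]
```

sense_tag(stem_path(["Computers and Internet", "Security and Encryption"])) returns ["computer1", "internet1", "security4", "encryption1"], matching the slide's output. Splitting on non-letters also handles underscored labels such as Security_and_Encryption.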
5
Obtain related words from WordNet
computer1 (word from concept), looked up in WordNet:
  • SYNONYM → data-processor1, electronic-computer1
  • HYPERNYM / HYPONYM → machine1, digital-computer1
  • HOLONYM / MERONYM → null
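The lookup on this slide can be sketched with a tiny hand-built stand-in for WordNet. MINI_WORDNET is hypothetical and holds only the relations the slide shows for computer1. With the real database, NLTK's wordnet corpus reader offers comparable Synset methods (lemma_names(), hypernyms(), hyponyms(), and the part/member/substance meronym and holonym methods), but that requires the WordNet data to be downloaded.

```python
# Minimal sketch: collecting related words for a sense-tagged concept.
# MINI_WORDNET is a hypothetical stand-in holding only the slide's example.
MINI_WORDNET = {
    "computer1": {
        "synonym": ["data-processor1", "electronic-computer1"],
        "hypernym": ["machine1"],
        "hyponym": ["digital-computer1"],
        "holonym": [],   # null on the slide
        "meronym": [],   # null on the slide
    },
}

RELATIONS = ("synonym", "hypernym", "hyponym", "holonym", "meronym")


def related_words(sense, relations=RELATIONS):
    """Return {relation: words}; unknown senses or relations give empty lists."""
    entry = MINI_WORDNET.get(sense, {})
    return {rel: list(entry.get(rel, [])) for rel in relations}


def flatten(related):
    """Merge all relations into one de-duplicated list, keeping first-seen order."""
    merged = []
    for words in related.values():
        for w in words:
            if w not in merged:
                merged.append(w)
    return merged
```

flatten(related_words("computer1")) gives the four words that extend the concept: data-processor1, electronic-computer1, machine1, and digital-computer1. Passing a narrower relations tuple restricts the expansion, e.g. to synonyms only.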
6
Extended Yahoo directory path
Yahoo (root) → null
Computers and Internet (computer1, Internet1)
  • computer1 → data-processor1, electronic-computer1, digital-computer1, machine1
  • Internet1 → cyberspace1, computer-network1
Security_and_Encryption (security4, encryption1)
  • security4 → security-reason1, precaution1, safeguard1
  • encryption1 → coding1, compression1
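One way to hold this extended path in code is a list of nodes, each carrying its sense-tagged concepts and their WordNet-related words. The PathNode class and its vocabulary() method are hypothetical names for illustration; the data is copied from the slide.

```python
# Minimal sketch: an extended Yahoo directory path as a list of nodes.
# PathNode and vocabulary() are hypothetical; the data comes from the slide.
from dataclasses import dataclass, field


@dataclass
class PathNode:
    category: str                                # Yahoo category label
    concepts: list                               # sense-tagged words from the label
    related: dict = field(default_factory=dict)  # concept -> WordNet-related words

    def vocabulary(self):
        """Concepts plus all of their related words, without duplicates."""
        vocab = []
        for concept in self.concepts:
            for word in [concept, *self.related.get(concept, [])]:
                if word not in vocab:
                    vocab.append(word)
        return vocab


# The root "Yahoo" has no concepts (null on the slide).
EXTENDED_PATH = [
    PathNode("Yahoo", []),
    PathNode("Computers and Internet", ["computer1", "Internet1"], {
        "computer1": ["data-processor1", "electronic-computer1",
                      "digital-computer1", "machine1"],
        "Internet1": ["cyberspace1", "computer-network1"],
    }),
    PathNode("Security_and_Encryption", ["security4", "encryption1"], {
        "security4": ["security-reason1", "precaution1", "safeguard1"],
        "encryption1": ["coding1", "compression1"],
    }),
]
```

Each node's vocabulary() is what a page's keywords can later be matched against: two concept words become eight matchable words for Computers and Internet, which is how the extension addresses the OOV problem.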
7
General overview: Web page topic identification
Web page → Keywords extraction → Mapping (Yahoo ontology, extended ontology) → Optimization → Topic
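The mapping step can be sketched as scoring each category node by how many page keywords appear in its extended vocabulary. This is an assumption for illustration: the slides do not say how keywords are mapped, and counting overlaps is only one simple choice. NODE_VOCAB is hypothetical data, with the Computers and Security vocabularies taken from the extended path above.

```python
# Minimal sketch of the mapping step: score ontology nodes against page keywords.
# NODE_VOCAB and the overlap-count scoring are hypothetical assumptions.
import re

NODE_VOCAB = {
    "Computers and Internet": ["computer1", "data-processor1", "electronic-computer1",
                               "digital-computer1", "machine1", "Internet1",
                               "cyberspace1", "computer-network1"],
    "Security_and_Encryption": ["security4", "security-reason1", "precaution1",
                                "safeguard1", "encryption1", "coding1", "compression1"],
    "Arts_and_Humanities": ["art1", "humanities1"],
}


def plain(word):
    """Drop the trailing sense number and lower-case: 'Internet1' -> 'internet'."""
    return re.sub(r"\d+$", "", word).lower()


def map_keywords(keywords, node_vocab=NODE_VOCAB):
    """Return {category: number of distinct page keywords found in its vocabulary}."""
    words = {k.lower() for k in keywords}
    return {
        node: len(words & {plain(w) for w in vocab})
        for node, vocab in node_vocab.items()
    }
```

With the keywords ["Encryption", "safeguard", "internet", "download"], Security_and_Encryption scores 2, Computers and Internet scores 1, and Arts_and_Humanities scores 0. Note that safeguard only matches because of the WordNet extension.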
8
Web page keywords extraction
Keywords are extracted from the text according to HTML tags:
  • Words within the title tag: <title>..</title>
  • Words used as hyperlinks: <a href=...>..</a>
  • Highlighted words:
    - Bold: <b>..</b>
    - Italics: <i>..</i>
    - Enlarged: <hD>..</hD>, where D is the heading level
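A minimal sketch of tag-based extraction using Python's standard html.parser module. Two assumptions are not from the slide: the heading tags are taken to be h1 through h6, and the extractor keeps every word without stopword filtering. The class and function names are hypothetical.

```python
# Minimal sketch: collect words that appear inside the keyword-bearing tags.
# Assumes headings are h1-h6; no stopword filtering is applied.
import re
from html.parser import HTMLParser

# Title, hyperlinks, bold, italics, and headings (tag names arrive lower-cased).
KEYWORD_TAGS = {"title", "a", "b", "i", "h1", "h2", "h3", "h4", "h5", "h6"}


class KeywordExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside at least one keyword tag
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag in KEYWORD_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        # Guard against stray closing tags in messy real-world HTML.
        if tag in KEYWORD_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.keywords.extend(re.findall(r"[a-z]+", data.lower()))


def extract_keywords(html):
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    return parser.keywords
```

On a page whose title is "Network Security", with "encryption" in bold, a link labeled "firewalls", and an h2 heading "Safeguards", the result is ["network", "security", "encryption", "firewalls", "safeguards"]; ordinary paragraph text is ignored. The depth counter lets nested highlights (bold inside a link) count once per word.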

9
Topic node identification
Tree → Optimized tree → Single path → Topic node
Domain: Web page directories (Yahoo)
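The Tree → Optimized tree → Single path → Topic node sequence can be sketched as below. The slides name the stages but not their rules, so two assumptions are made and labeled: optimization prunes every subtree whose total keyword score is zero, and the single path is a greedy top-down descent that follows the best-scoring child. The tree, scores, and function names are hypothetical.

```python
# Minimal sketch: tree -> optimized tree -> single path -> topic node.
# TREE, SCORES, the pruning rule, and the greedy descent are assumptions.
TREE = {
    "Yahoo": ["Computers and Internet", "Arts_and_Humanities"],
    "Computers and Internet": ["Security_and_Encryption", "Hardware"],
    "Arts_and_Humanities": [],
    "Security_and_Encryption": [],
    "Hardware": [],
}
SCORES = {"Computers and Internet": 1, "Security_and_Encryption": 2,
          "Arts_and_Humanities": 0, "Hardware": 0}


def subtree_score(node, tree, scores):
    """Score of a node plus the scores of all its descendants."""
    return scores.get(node, 0) + sum(
        subtree_score(child, tree, scores) for child in tree.get(node, []))


def optimize(node, tree, scores):
    """Assumed pruning rule: drop every child subtree whose total score is 0."""
    kept = [c for c in tree.get(node, []) if subtree_score(c, tree, scores) > 0]
    pruned = {node: kept}
    for child in kept:
        pruned.update(optimize(child, tree, scores))
    return pruned


def single_path(root, tree, scores):
    """Greedy top-down descent: always follow the best-scoring child subtree."""
    path = [root]
    while tree.get(path[-1]):
        children = tree[path[-1]]
        path.append(max(children, key=lambda c: subtree_score(c, tree, scores)))
    return path


def topic_node(root, tree, scores):
    """Topic node = last node of the single path through the optimized tree."""
    return single_path(root, optimize(root, tree, scores), scores)[-1]
```

Here the optimized tree keeps only Yahoo → Computers and Internet → Security_and_Encryption, which is also the single path, so the topic node is Security_and_Encryption. Because the descent never revisits a choice, a wrong pick near the root cannot be corrected further down.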
10
The experiment and its result
Precision = hits / (hits + mistakes)
Result (precision, %):
  • Optimized tree: 69.7
  • Single path: 51.9
  • Topic node: 29.7
11
We conclude:
  • Our approach is simple yet comparable to other works (others obtained accuracy results in the range of 30 to 50%).
  • Problems that caused poor results:
    - Text extraction: heterogeneity of web documents and their poor quality.
    - Domain vocabulary is still insufficient.
    - Top-down model: mistakes are unrecoverable (choosing the wrong node at the top leads to the wrong topic node).
  • Related work:
    - Use a learning method with feature extraction from each category node's concept.