Work at TACOLA Lab

Slides: 42
Provided by: HOD98
Learn more at: https://www.infitt.org

1
Work at TACOLA Lab
  • Team Members
      • T.V.Geetha, Ranjani Parthasarathi, Madhan Karky
      • E.UmaMaheswari, J.Balaji, Subalalitha,
        Elanchezhiyan.K, Karthika, Thenmalar,
        Radhakrishnan, Kandasamy, Padmavathi, Aruna,
        Vijayavani

2
Tamil Language Processing
  • Tamil Language Processing
      • Morphological analyser
          • Normal Words, Compound Words, Colloquial Words
      • Parser
          • Simple, Complex and Compound Sentences
      • Semantic analysis based on UNL
  • Language Technology
      • Blog Mining
      • Ontology Based Information Extraction
      • Personalized Search
      • Parallelization for NLP Processing
      • Emotion detection from text
  • Carnatic Music Processing
      • Raga Modelling
      • Singer, Genre Identification
      • Music Emotion Recognition
  • Tamil Language Oriented Tools
      • Dictionary
      • Text Compaction
  • UNL Based Work
      • UNL for semantic representation
      • Nested UNL
      • Concept based Search
      • Bilingual Search
      • Event Processing
      • Discourse Analysis
      • Summarization
      • Question answering
      • Thirukural Search
  • Lyric Oriented Processing
      • Lyric Mining
      • Lyrics for Tunes
      • Pleasantness

3
Papers for TIC 2011
  • Tamil Language Oriented Tools
      • Agaraadhi: A Novel Online Dictionary Framework
      • An Efficient Tamil Text Compaction System
        (Surukkupai)
      • Kuralagam: A Concept Relation Based Search
        Framework for Thirukural
      • Popularity Based Scoring Model for Tamil Word
        Games
  • Tamil Language Processing
      • Template Based Multilingual Summary Generation
      • On Emotion Detection from Tamil Text
      • Tamil Summary Generation for Cricket Match
  • Lyric Oriented Processing
      • Lyric Mining: Word, Rhyme and Concept
        Co-occurrence Analysis
      • Special Indices for the LaaLaLaa Lyric Analysis
        and Generation Framework

4
AGARAADHI: A NOVEL ONLINE DICTIONARY FRAMEWORK
  • Elanchezhiyan.K
  • Karthikeyan.S
  • T.V.Geetha
  • Ranjani Parthasarathi
  • Madhan Karky

5
OBJECTIVES
  • Agaraadhi, a dictionary framework for indexing
    and retrieving Tamil words, their meanings,
    analyses and related information.
  • A framework that incorporates various unique
    features designed to give users additional
    information about the word they query.

6
INTRODUCTION
  • The Agaraadhi dictionary has more than 3 lakh
    (300,000) words in various domains, such as
  • General
  • Literature
  • Medical
  • Engineering
  • Computer Science
  • Bird names, and more
  • Agaraadhi is a Tamil-English bilingual
    dictionary.

7
INTRODUCTION CONT
  • Agaraadhi is a Tamil-English bilingual
    dictionary with 20 features, such as
  • morphological analysis,
  • morphological generation,
  • word usage statistics,
  • word pleasantness analysis,
  • spell checking,
  • similar word finder,
  • word usage in literature,
  • picture dictionary,
  • number to text conversion,
  • phonetic transliteration,
  • live usage analysis from microblogs, and more.

8
AGARAADHI FRAMEWORK CONT
9
AGARAADHI FEATURES
  • Morphological Analyser
  • Gives the morphological features of the query
    word, such as root word, part of speech, gender,
    tense and count.
  • For the query word padithaan ("he studied"), the
    Morphological Analyser returns padi as the root,
    masculine gender, past tense, and so on.
  • Morphological Generator
  • The Tamil morphological generator handles
    different syntactic categories such as nouns,
    verbs, postpositions, adjectives and adverbs.
  • The generator is used to generate the possible
    morphological variants of the query word.
  • Spell Checker
  • Checks the spelling of Tamil words and provides
    alternative suggestions for misspelt words.
  • If the root word is not in the dictionary, it
    generates all possible suggestions with minimum
    variation from the given word.
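The spell checker's fallback step (suggestions with minimum variation from the given word) can be sketched with Levenshtein edit distance. This is an assumption, not necessarily Agaraadhi's method: the lexicon, the `max_distance` cutoff and the romanized sample words are hypothetical, and a real Tamil checker would probably compare Tamil letters (consonant plus vowel-sign clusters) rather than raw characters.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep a match)
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: set[str], max_distance: int = 2) -> list[str]:
    """Return the lexicon words at the smallest edit distance from `word`."""
    if word in lexicon:
        return [word]  # correctly spelt: nothing to correct
    scored = [(levenshtein(word, w), w) for w in lexicon]
    scored = [(d, w) for d, w in scored if d <= max_distance]
    if not scored:
        return []
    best = min(d for d, _ in scored)
    return sorted(w for d, w in scored if d == best)


# Hypothetical romanized lexicon, for illustration only.
LEXICON = {"padi", "padam", "pazham", "mazhai", "pookkal"}

if __name__ == "__main__":
    print(suggest("mazai", LEXICON))
    print(suggest("pazam", LEXICON))
```

Keeping every word at the minimum distance, rather than a single best match, mirrors the slide's "all the possible suggestions": "pazam" is one edit away from both "padam" and "pazham", so both are offered.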

10
AGARAADHI FEATURES
  • Word Suggestions
  • Gives a list of equivalent or related words for
    the query word.
  • Word Pleasantness
  • A score generator that indicates how easy the
    word is to pronounce.
  • Word Popularity Score
  • Shows how widely the word is used on the web,
    based on its frequency distribution across
    popular blogs, news articles, social networks,
    etc.
  • Word Usage Statistics
  • Shows the usage of the word in social networks
    over the past week.
  • Word Usage in Literature
  • Finds usages of the word in popular literature
    such as Thirukural, Bharathiyar Padalgal and
    Avvai's songs, as well as in Tamil film song
    lyrics.
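The popularity score is described only as being based on the word's frequency distribution across blogs, news articles and social networks. One plausible formulation, assumed here rather than taken from the paper, is a weighted average of the word's relative frequency in each source; the source names, weights and counts below are hypothetical.

```python
from __future__ import annotations


def popularity_score(word: str,
                     counts: dict[str, dict[str, int]],
                     weights: dict[str, float]) -> float:
    """Weighted mean of the word's relative frequency in each source.

    counts[source][w] = occurrences of w in that source's crawled text.
    weights[source]   = assumed importance of that source.
    A source with no counts contributes 0 but still carries its weight.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    score = 0.0
    for source, weight in weights.items():
        source_counts = counts.get(source, {})
        total_tokens = sum(source_counts.values())
        if total_tokens:
            score += weight * source_counts.get(word, 0) / total_tokens
    return score / total_weight


if __name__ == "__main__":
    counts = {  # hypothetical crawl statistics
        "blogs": {"mazhai": 30, "padi": 70},
        "news": {"mazhai": 10, "padi": 90},
    }
    print(popularity_score("mazhai", counts, {"blogs": 1.0, "news": 1.0}))
```

With equal weights, mazhai scores (0.3 + 0.1) / 2 = 0.2; normalising by each source's total keeps a large source such as news from dominating purely by volume.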

11
AGARAADHI FEATURES
  • Word of the Day
  • A rare word is chosen at random and displayed on
    the opening page so that users can learn a new
    word every day.
  • Number to Text Converter
  • Converts a number to its Tamil word equivalent as
    well as to English text. For example, in Tamil,
    100 million is oru Arpputham, 10 billion is
    Kumbam, and so on up to Anniyan for one zilli
  • Picture Dictionary
  • Pictures, photos or line drawings depicting
    popular words are included in the dictionary to
    help children learn efficiently with this tool.
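The English half of the number-to-text converter can be sketched with the usual grouping algorithm: split the number into groups of three digits and name each group with its scale word. The sketch assumes the short scale (million, billion, trillion); the Tamil table holds only the two exact values named on the slide, because the full Tamil naming scheme is not given here, and all function names are hypothetical.

```python
from __future__ import annotations

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Short-scale names, largest first.
SCALES = [(10**12, "trillion"), (10**9, "billion"), (10**6, "million"),
          (10**3, "thousand")]

# Only the two exact values named on the slide; not a full Tamil scheme.
TAMIL_NAMES = {10**8: "oru Arpputham", 10**10: "Kumbam"}


def below_thousand(n: int) -> str:
    """Words for 1 <= n <= 999."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    elif rest:
        parts.append(ONES[rest])
    return " ".join(parts)


def to_english(n: int) -> str:
    """English words for a non-negative integer, three digits at a time."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return "zero"
    parts = []
    for value, name in SCALES:
        count, n = divmod(n, value)
        if count:
            parts.append(f"{to_english(count)} {name}")
    if n:
        parts.append(below_thousand(n))
    return " ".join(parts)


def tamil_name(n: int) -> str | None:
    """Tamil name for n if it is one of the slide's examples, else None."""
    return TAMIL_NAMES.get(n)


if __name__ == "__main__":
    for number in (1234, 10**8, 10**10):
        print(number, to_english(number), tamil_name(number))
```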

12
RESULTS
  • Query word pookkal (பூக்கள்)
  • http://www.agaraadhi.com/dict/OD.jsp?w=%E0%AE%AA%E0%AF%82%E0%AE%95%E0%AF%8D%E0%AE%95%E0%AE%B3%E0%AF%8D&ln=ta&Submit.x=8&Submit.y=7
  • Query word mazhai (மழை)
  • http://www.agaraadhi.com/dict/OD.jsp?w=%E0%AE%AE%E0%AE%B4%E0%AF%88&ln=ta&Submit.x=21&Submit.y=4
  • Query word fruit
  • http://www.agaraadhi.com/dict/OD.jsp?w=fruit&ln=en
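The example query URLs carry the Tamil word as a percent-encoded UTF-8 parameter. The sketch below reproduces that encoding with Python's standard `urllib.parse`; the parameter names `w` and `ln` are inferred from the example URLs, the helper names are hypothetical, and the site may no longer answer such requests.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://www.agaraadhi.com/dict/OD.jsp"


def query_url(word: str, lang: str) -> str:
    """Build a dictionary query URL; urlencode percent-encodes the word as UTF-8."""
    return f"{BASE}?{urlencode({'w': word, 'ln': lang})}"


def query_word(url: str) -> str:
    """Recover the query word from such a URL (parse_qs decodes the UTF-8 bytes)."""
    return parse_qs(urlsplit(url).query)["w"][0]


if __name__ == "__main__":
    url = query_url("பூக்கள்", "ta")  # pookkal
    print(url)
    print(query_word(url))
```

Each Tamil letter in this block occupies three UTF-8 bytes, which is why the seven characters of பூக்கள் become 21 `%XX` escapes in the URL.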

13
FUTURE WORK
  • Providing APIs for programmers and developing
    mobile apps for the Agaraadhi framework would
    offer a useful platform for researchers and
    developers working in Tamil computing.

14
REFERENCE
  • Anandan, R. Parthasarathi, and Geetha,
    Morphological Analyser for Tamil. ICON 2002,
    2002.
  • Anandan, R. Parthasarathi, and Geetha,
    Morphological Generator for Tamil. Tamil Inayam,
    Malaysia, 2001.
  • J. Jai Hari Raju, P. IndhuReka, Dr. Madhan Karky,
    Statistical Analysis and Visualization of Tamil
    Usage in Live Text Streams, Tamil Internet
    Conference, Coimbatore, 2010.

15
An Efficient Tamil Text Compaction System
  • N.M.Revathi
  • G.P.Shanthi
  • Elanchezhiyan.K
  • T V Geetha
  • Ranjani Parthasarathi
  • Madhan Karky

16
OBJECTIVES
  • Why compaction?
  • Limited message length on blog sites and the tiny
    user interfaces of mobile phones.
  • Saves online storage space and hence reduces
    cost.
  • The paper proposes
  • a text compaction system for Tamil, the first of
    its kind.
  • Idea of compaction
  • There is no specific rule for obtaining the
    shortest form of a word; the main aim is that it
    stays understandable.
  • A compact form can be obtained by omitting
    letters, or by replacing prefixes and suffixes
    with suitable symbols and numbers.

17
FRAMEWORK ARCHITECTURE

18
FRAMEWORK CONT..
  • Input Processing
  • The morphological analyzer removes the suffix (if
    present) added to the word and delivers the root
    word (RW).
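Root extraction in the input-processing step can be illustrated with longest-match suffix stripping. This is only a toy stand-in for the Tamil morphological analyser, which is considerably richer; the romanized suffix table, the root list and the function name are hypothetical.

```python
from __future__ import annotations

# Hypothetical romanized suffix table: suffix -> grammatical features it signals.
SUFFIXES = {
    "thaan": {"tense": "past", "gender": "masculine"},  # padi + thaan
    "thaal": {"tense": "past", "gender": "feminine"},   # padi + thaal
    "kkal": {"number": "plural"},                        # poo + kkal
    "kal": {"number": "plural"},
}
ROOTS = {"padi", "poo"}  # hypothetical root dictionary


def analyse(word: str, roots: set[str]) -> tuple[str, dict[str, str]]:
    """Strip the longest known suffix that leaves a known root.

    Returns (root word, features); an unanalysable word comes back unchanged
    with no features.
    """
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            root = word[: -len(suffix)]
            if root in roots:
                return root, dict(SUFFIXES[suffix])
    return word, {}


if __name__ == "__main__":
    for w in ("padithaan", "pookkal", "mazhai"):
        print(w, analyse(w, ROOTS))
```

Trying longer suffixes first matters: "pookkal" also ends in "kal", but only stripping "kkal" leaves the known root "poo".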

19
FRAMEWORK CONT..
  • Identification of the category and extraction of
    the compact word
  • Three categories of words: common Tamil words,
    abbreviations/acronyms, and numbers.
  • Abbreviations/acronyms are identified by comparing
    the root word with the keys of a hashmap.
  • With the help of the hash key and a mapping
    algorithm, the compact word is retrieved.
  • Otherwise, the word is either a common Tamil word
    or a number.
  • For numbers, a numerical analyser performs
    text-to-number conversion.
  • Output Processing
  • The Tamil Morphological Generator adds the
    suitable suffix back so that the output follows
    the rules of the language.
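The pipeline above (strip the suffix, classify the root, compact it, re-attach the suffix) can be sketched end to end. Everything in the sketch is hypothetical: the romanized tables stand in for the real hashmap and numerical analyser, vowel dropping is an assumed rule for common words, and simple string functions replace the morphological analyser and generator.

```python
from __future__ import annotations

# All tables are hypothetical, romanized examples for illustration only.
SUFFIXES = ("kkal", "kal", "il")          # e.g. plural / locative markers
ABBREVIATIONS = {"thamizhnaadu": "TN"}    # hashmap consulted for acronyms
NUMBERS = {"onru": "1", "irandu": "2", "moonru": "3", "nooru": "100"}
VOWELS = set("aeiou")


def split_suffix(word: str) -> tuple[str, str]:
    """Stand-in for the morphological analyser: return (root word, suffix)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def compact_root(root: str) -> str:
    """Identify the root's category and return its compact form."""
    if root in ABBREVIATIONS:      # abbreviation / acronym
        return ABBREVIATIONS[root]
    if root in NUMBERS:            # number word -> digits
        return NUMBERS[root]
    # Common word (assumed rule): keep the first letter, drop later vowels.
    return root[0] + "".join(c for c in root[1:] if c not in VOWELS)


def compact_word(word: str) -> str:
    root, suffix = split_suffix(word)
    # Stand-in for the morphological generator: re-attach the suffix unchanged.
    return compact_root(root) + suffix


def compact_text(text: str) -> str:
    return " ".join(compact_word(w) for w in text.split())


if __name__ == "__main__":
    print(compact_text("thamizhnaadu mazhai nooru veettil"))
```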

20
RESULT AND ANALYSIS
  • Tested with over 10,000 words.
  • The compacted text is reduced to 40% of the
    original text.

21
REFERENCES
  • Anandan, R. Parthasarathi, and Geetha,
    Morphological Analyser for Tamil. ICON 2002,
    2002.
  • Fung, L. M. (2005). SMS short form identification
    and codec. Unpublished master's thesis, National
    University of Singapore, Singapore.
  • Larkey, L. S., Ogilvie, P., Price, M. A., and
    Tamilio, B. (2000). Acrophile, a system that
    automatically searches for acronym-expansion pairs.
  • Beasley, R. E. Short Message Service (SMS) Texting
    Symbols: A Functional Analysis of 10,000 Cellular
    Phone Text Messages. Franklin College.

22
Kuralagam - Concept Relation based Search Engine
for Thirukkural
  • Elanchezhiyan.K
  • T.V.Geetha
  • Ranjani Parthasarathi
  • Madhan Karky

23
Objectives
  • Kuralagam is a conceptual search framework for
    Thirukkural based on the UNL framework.
  • Keyword search over kurals and their
    interpretations
  • Concept-based search using CoReX, a UNL-based
    conceptual indexing technique
  • Bilingual search (English and Tamil)
  • Showing relationships between concepts

24
Kuralagam Framework
25
Offline Processing
  • Web Crawler
  • A Thirukkural statistics crawler
  • crawls news and blog documents to find the usage
    of each individual Thirukkural.
  • The recorded usage is used to compute a
    popularity score for each Thirukkural.
  • Enconversion based on UNL
  • Indexing based on the CoReX framework

26
UNL Enconversion
  • UNL is an intermediate language for processing
    knowledge across language barriers.
  • It captures semantics by converting the natural
    language terms in a document to concepts.
  • Concepts are connected to other concepts through
    UNL relations; there are 46 UNL relations,
  • e.g. plf (place from), plt (place to), tmf (time
    from), tmt (time to), etc.
  • The process of converting a natural language text
    to a UNL graph is known as Enconversion.
  • The reverse process is known as Deconversion.

27
An Example speaks more...
  • Example: John was playing in the garden
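A UNL graph can be stored as a list of binary relations between universal words. The hand-written graph for the example sentence is illustrative and is not output from the Tamil to UNL enconverter: it uses the standard UNL relations agt (agent) and plc (place) and the attributes @entry, @past, @progress and @def, while the class and helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One UNL binary relation: name(source universal word, target universal word)."""
    name: str
    source: str
    target: str

    def __str__(self) -> str:
        return f"{self.name}({self.source}, {self.target})"


# Hand-written, illustrative graph for "John was playing in the garden".
PLAY = "play(icl>do).@entry.@past.@progress"
GRAPH = [
    Relation("agt", PLAY, "John(iof>person)"),        # agent: who is playing
    Relation("plc", PLAY, "garden(icl>place).@def"),  # place: where the playing happens
]


def concepts(relations: list[Relation]) -> set[str]:
    """The graph's nodes as bare universal words, with @-attributes stripped."""
    return {uw.split(".@")[0] for r in relations for uw in (r.source, r.target)}


if __name__ == "__main__":
    for relation in GRAPH:
        print(relation)
```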

28
Indexer
  • The Kuralagam Indexer is based on CoReX
    techniques.
  • The Indexer stores and manages the UNL graphs in
    two different indices.
  • Concept only index (C index), and
  • Concept-Relation-Concept index (CRC index)
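The two indices can be pictured as inverted indices keyed on different units of a UNL graph: single concepts for the C index and whole concept-relation-concept triples for the CRC index. This is a minimal sketch, assuming each kural's graph is already available as (concept, relation, concept) triples; the toy graphs and function name are hypothetical and not CoReX's actual storage format.

```python
from __future__ import annotations

from collections import defaultdict

# (concept, relation, concept), read as relation(concept, concept) in UNL.
Triple = tuple[str, str, str]


def build_indices(graphs: dict[int, list[Triple]]
                  ) -> tuple[dict[str, set[int]], dict[Triple, set[int]]]:
    """Build a concept-only (C) index and a concept-relation-concept (CRC) index.

    `graphs` maps a document id (here, a kural number) to its UNL relations.
    """
    c_index: defaultdict[str, set[int]] = defaultdict(set)
    crc_index: defaultdict[Triple, set[int]] = defaultdict(set)
    for doc_id, triples in graphs.items():
        for c1, rel, c2 in triples:
            crc_index[(c1, rel, c2)].add(doc_id)
            c_index[c1].add(doc_id)
            c_index[c2].add(doc_id)
    return dict(c_index), dict(crc_index)


# Hypothetical toy graphs keyed by kural number.
GRAPHS = {
    11: [("sustain", "agt", "rain"), ("sustain", "obj", "world")],
    12: [("produce", "agt", "rain"), ("produce", "obj", "food")],
}

if __name__ == "__main__":
    c_index, crc_index = build_indices(GRAPHS)
    print(sorted(c_index["rain"]))
```

A CRC lookup is more selective than a C lookup because it matches how two concepts are related, not just that they occur.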

29
Online Processing
  • Query Translation and Expansion
  • Converts the user query to a UNL graph.
  • Uses the CoReX CRC (Concept-Relation-Concept)
    indices to fetch a similarity thesaurus and a
    co-occurrence list, which populate the multi-list
    data structure.
  • Search and Ranking
  • Fetches the Thirukkural number and its details.
  • Thirukkurals for a given query are fetched using
    the two types of concept indices, namely CRC and
    C.
  • The query concept is expanded using related CRC
    indices pointing to the query concept.
  • This retrieves many Thirukkurals that are
    conceptually related to the query, which is not
    possible with keyword-based Thirukkural search
    engines.
  • The ranking is based on
  • priority of the indices in the order CRC > C
  • usage score
  • frequency of occurrence of the query concept
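The three ranking criteria can be combined in several ways. The sketch assumes a lexicographic order (index priority first, then usage score, then concept frequency) and a simple expansion in which concepts linked to the query concept by a CRC entry pull in further kurals at the lowest tier; both choices, the data and the function name are hypothetical.

```python
from __future__ import annotations

Triple = tuple[str, str, str]


def rank(query: str,
         c_index: dict[str, set[int]],
         crc_index: dict[Triple, set[int]],
         usage: dict[int, float],
         freq: dict[int, dict[str, int]]) -> list[int]:
    """Rank kural ids for one query concept (assumed lexicographic ordering)."""
    crc_hits: set[int] = set()
    related: set[str] = set()
    for (c1, _rel, c2), docs in crc_index.items():
        if c2 == query:            # CRC entry pointing to the query concept
            crc_hits |= docs
            related.add(c1)        # concept used to expand the query
    c_hits = c_index.get(query, set())
    expanded: set[int] = set()
    for concept in related:
        expanded |= c_index.get(concept, set())

    def key(doc: int) -> tuple[int, float, int]:
        tier = 0 if doc in crc_hits else 1 if doc in c_hits else 2  # CRC > C > expansion
        return (tier,
                -usage.get(doc, 0.0),                # higher usage score first
                -freq.get(doc, {}).get(query, 0))    # more occurrences first

    return sorted(crc_hits | c_hits | expanded, key=key)


if __name__ == "__main__":
    c_index = {"rain": {1, 2, 3}, "cloud": {3, 4}}
    crc_index = {("cloud", "src", "rain"): {3}}
    usage = {1: 0.2, 2: 0.9, 3: 0.1, 4: 0.5}
    freq = {1: {"rain": 2}, 2: {"rain": 1}, 3: {"rain": 1}}
    print(rank("rain", c_index, crc_index, usage, freq))
```

Kural 4 never mentions the query concept; it is reached only through the related concept "cloud", which is the kind of conceptual match a keyword engine would miss.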

30
Tab Layout
31
Performance Evaluation
  • The accuracy of the Thirukkural search engine was
    measured using average precision (AP) and mean
    average precision (MAP).
  • Concept-based search and keyword-based search were
    compared using average precision.
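For reference, the standard non-interpolated definitions are AP = (1/R) × Σ P(k), summed over the ranks k at which a relevant kural appears, where P(k) is the precision of the top k results and R is the number of relevant kurals for the query; MAP is the mean of AP over all test queries. The sketch implements these textbook definitions; whether the evaluation used exactly this variant is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def average_precision(ranked: list[int], relevant: set[int]) -> float:
    """AP: mean of precision@k over the ranks k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision of the top k results
    return precision_sum / len(relevant)  # relevant items never retrieved count as 0


def mean_average_precision(runs: list[tuple[list[int], set[int]]]) -> float:
    """MAP: mean of AP over all (ranked results, relevant set) query runs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [([1, 2, 3, 4], {1, 3}), ([5, 6], {6})]
    print(mean_average_precision(runs))
```

In the usage example, the first query scores (1/1 + 2/3) / 2 = 5/6 and the second scores 1/2, so MAP = 2/3.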

32
Average Precision
33
Reference
  • 1. Subalalitha, T. V. Geetha, Ranjani Parthasarathy
    and Madhan Karky Vairamuthu. CoReX: A Concept
    Based Semantic Indexing Technique. In SWM-08,
    India, 2008.
  • 2. UNDL Foundation. The Universal Networking
    Language (UNL) Specifications, Version 3,
    Edition 3. UNL Center, UNDL Foundation,
    December 2004.
  • 3. Anandan, R. Parthasarathi, and Geetha,
    Morphological Analyser for Tamil. ICON 2002,
    2002.
  • 4. T. Dhanabalan, K. Saravanan, and T. V. Geetha.
    Tamil to UNL Enconverter. ICUKL, Goa, India,
    2002.
  • 5. Turpin, A. and Scholer, F. User performance
    versus precision measures for simple search
    tasks. In 29th Annual International ACM SIGIR
    Conference on Research and Development in
    Information Retrieval, 2006, Seattle, Washington,
    USA.

34
Template Based MultiLingual Summary Generation
  • Subalalitha C.N
  • E.Umamaheswari
  • T V Geetha
  • Ranjani Parthasarathi
  • Madhan Karky

35
Aim
  • To generate a multilingual summary based on the
    Universal Networking Language (UNL) framework

36
The Architecture
37
Multi Lingual Summary Generation using UNL
  • Template based Information Extraction
  • Seven tourism-specific templates have been
    designed and used.
  • Templates are filled using semantic information
    inherent in the input UNL graphs.
  • Template information is language independent and
    can be used with any desired language.

38
Example Templates for Tourism Domain
Template             Semantics inherited from UNL
God                  iof>god, iof>goddess, icl>god
Food                 icl>food, icl>fruit
Flora and Fauna      icl>animal, icl>reptile, icl>mammal, icl>plant
Boarding facility    icl>facility
Transport facility   icl>transport
Place                icl>place, iof>place, iof>city, iof>country
Distance             icl>unit, icl>number
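Filling a template amounts to checking each universal word's restriction (icl>..., iof>...) against the restrictions listed for that template. This is a minimal sketch that copies the template table; it assumes simplified universal words of the form head(restriction), and the sample words and function name are hypothetical. Real UNL restrictions can be longer chains (e.g. icl>fruit>thing) that this exact-match sketch would miss.

```python
from __future__ import annotations

import re

# Template -> UNL restrictions, copied from the tourism template table.
TEMPLATES = {
    "God": {"iof>god", "iof>goddess", "icl>god"},
    "Food": {"icl>food", "icl>fruit"},
    "Flora and Fauna": {"icl>animal", "icl>reptile", "icl>mammal", "icl>plant"},
    "Boarding facility": {"icl>facility"},
    "Transport facility": {"icl>transport"},
    "Place": {"icl>place", "iof>place", "iof>city", "iof>country"},
    "Distance": {"icl>unit", "icl>number"},
}

# Assumed simplified universal word: head(restriction[, restriction...])
UW_PATTERN = re.compile(r"^(?P<head>[^(]+)\((?P<restr>[^)]*)\)")


def fill_templates(uws: list[str]) -> dict[str, list[str]]:
    """Map each template to the head words whose restrictions match it."""
    filled: dict[str, list[str]] = {name: [] for name in TEMPLATES}
    for uw in uws:
        match = UW_PATTERN.match(uw)
        if not match:
            continue  # not in the assumed head(restriction) form
        restrictions = {r.strip() for r in match.group("restr").split(",")}
        for name, wanted in TEMPLATES.items():
            if restrictions & wanted:
                filled[name].append(match.group("head"))
    return {name: heads for name, heads in filled.items() if heads}


if __name__ == "__main__":
    sample = ["Meenakshi(iof>goddess)", "Madurai(iof>city).@def",
              "mango(icl>fruit)", "elephant(icl>mammal)", "train(icl>transport)"]
    print(fill_templates(sample))
```

Because the filled templates hold UNL head words rather than Tamil or English text, the same result can feed summary generation in any language, as the slide notes.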
39
Summary Generation
  • The template information is converted to the
    target language using the respective UNL-to-target
    language dictionaries.
  • These dictionaries contain root words.
  • The natural language term is obtained from the
    root word using target-language information such
    as case suffixes, and language technology tools
    such as the morphological generator.
  • When the converted template information is fitted
    into target-language-specific dynamic sentence
    patterns, a summary is generated.
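The generation step (look up root words, inflect them, fit them into sentence patterns) can be sketched for English, where root words need no case suffixes. The dictionary, sentence patterns and function names are hypothetical, and the `inflect` parameter is only a placeholder where a Tamil morphological generator would add case suffixes; none of this is the system's actual pattern set.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical UNL-to-English dictionary: universal word -> root word.
UNL_TO_EN = {
    "Madurai(iof>city)": "Madurai",
    "mango(icl>fruit)": "mango",
    "jackfruit(icl>fruit)": "jackfruit",
}

# Hypothetical English sentence patterns, one per template.
PATTERNS_EN = {
    "Place": "{0} is a tourist destination.",
    "Food": "Popular foods include {0}.",
}


def join_list(words: list[str]) -> str:
    """'a', 'a and b', 'a, b and c'."""
    if len(words) <= 1:
        return "".join(words)
    return ", ".join(words[:-1]) + " and " + words[-1]


def realize(filled: dict[str, list[str]],
            dictionary: dict[str, str],
            patterns: dict[str, str],
            inflect: Callable[[str], str] = lambda w: w) -> str:
    """Turn filled templates (template -> universal words) into summary text.

    `inflect` stands in for a morphological generator; the identity default
    suits English root words.
    """
    sentences = []
    for template, uws in filled.items():
        pattern = patterns.get(template)
        if pattern is None:
            continue  # no sentence pattern for this template in this language
        words = [inflect(dictionary.get(uw, uw)) for uw in uws]
        sentences.append(pattern.format(join_list(words)))
    return " ".join(sentences)


if __name__ == "__main__":
    filled = {"Place": ["Madurai(iof>city)"],
              "Food": ["mango(icl>fruit)", "jackfruit(icl>fruit)"]}
    print(realize(filled, UNL_TO_EN, PATTERNS_EN))
```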

40
Performance Evaluation
  • Tested with 33,000 Tamil and English text
    documents enconverted to UNL graphs.
  • The proposed methodology was evaluated using
    human judgement.
  • The generated summaries achieved an accuracy of
    90%.
  • Further Enhancements
  • Query-specific summaries
  • Comparing performance with human-generated
    summaries

41
References
  • 1. Elanchezhiyan K, T V Geetha, Ranjani
    Parthasarathi, Madhan Karky. CoRe: Concept Based
    Query Expansion. Tamil Internet Conference,
    Coimbatore, 2010.
  • 2. Alkesh Patel, Tanveer Siddiqui, U. S. Tiwary.
    A language independent approach to multilingual
    text summarization. Conference RIAO2007,
    Pittsburgh, PA, U.S.A., May 30-June 1, 2007.
  • 3. David Kirk Evans. Identifying Similarity in
    Text: Multi-Lingual Analysis for Summarization.
    Doctor of Philosophy thesis, Graduate School of
    Arts and Sciences, Columbia University, 2005.
  • 4. Radev, Allison, Blair-Goldensohn et al.
    (2004). MEAD: a platform for multidocument
    multilingual text summarization.
  • 5. The Universal Networking Language (UNL)
    Specifications, Version 3, Edition 3. UNL Center,
    UNDL Foundation, December 2004.
  • 6. Jagadeesh J, Prasad Pingali, Vasudeva Varma.
    Sentence Extraction Based Single Document
    Summarization. Workshop on Document
    Summarization, March 2005, IIIT Allahabad.
  • 7. Naresh Kumar Nagwani, Dr. Shrish Verma. A
    Frequent Term and Semantic Similarity based
    Single Document Text Summarization Algorithm.
    International Journal of Computer Applications
    (0975-8887), Volume 17, No. 2, March 2011.
  • 8. Prof. R. Nedunchelian. Centroid Based
    Summarization of Multiple Documents Implemented
    using Timestamps. First International Conference
    on Emerging Trends in Engineering and Technology,
    IEEE, 2008.