What is the expected category of contribution: CLIR



1
Title: Cross Lingual Information Retrieval for Indian Languages
Proposer: CDAC Noida
Name of the company: CDAC Noida
Language/Language pair: English, Hindi, Punjabi and Marathi
What is the expected category of contribution: CLIR
2
Strength of CDAC Noida
  • Technical Capabilities: NLP Lab equipped with necessary software
    and 50 trained software professionals
  • Manpower Involved
  • Technical: K K Arora, Vijay Kumar, Manish Kumar, Pragdeeshvaran
  • Linguistic Support: Prof K K Goswami, Prof Thakur Das, Mrs Ragini,
    Prof Jagannathan, Mr Chandra Mohan and others

3
Previous collaborations with universities/R&D institutions
  • IIT Kanpur; CHD, Delhi; CSIO, Chandigarh
  • ISI Kolkata; Kendriya Hindi Sansthan
  • DRDO, Delhi; Delhi Press Prakashan; Pustak Mahal
  • IISc Bangalore; CSTT, New Delhi; COCOSDA, Japan
  • Sahitya Akademi; IIT Roorkee; ELDA, France
  • MGAHV, Wardha; Abbyy, Russia; W3C
  • Jamia-Milia; GKV, Hardwar; BITS Pilani
  • GBPUAT, PantNagar; Kumaon Univ, Nainital; Banasthali Vidyapeeth

4
Previous work done in this or similar areas
  • Machine Translation
  • Parallel Corpus for Indian languages
  • Dictionaries / Terminologies like Shabdika, Lexicon for MAT,
    IT Terminology
  • Prototype development for CLIR

5
Need for CLIR in the Indian Context
  • Availability of Content on Internet in multiple
    Indian languages
  • People in India are generally familiar with more than one
    language
  • Users should be able to retrieve related information available in
    any of the languages they know by querying in just one of them

6
Block diagram for CLIR
7
Sample entry in Dictionary
8
Language Resources Needed for CLIR
  • Dictionary of words in English, Hindi, Punjabi
    and Marathi
  • Root words dictionaries
  • Database of phrases and collocations
  • Stop word lists in Hindi, Marathi and Punjabi
  • List of hyphenated words
  • Proper Names database
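
As a rough illustration of how these resources might fit together, the sketch below translates a Hindi query into English terms using a stop word list, a root word dictionary and a bilingual dictionary. It is only a minimal sketch under stated assumptions: the tiny word lists, the names `HINDI_STOP_WORDS`, `ROOT_WORDS`, `HI_EN_DICTIONARY` and the function `translate_query` are all hypothetical and are not taken from the CDAC Noida system.

```python
# Hypothetical toy resources; real ones would be far larger and curated.
HINDI_STOP_WORDS = {"का", "की", "के", "में", "है"}

# Root word dictionary: inflected form -> root form (assumed sample entries).
ROOT_WORDS = {"नदियों": "नदी", "नदियाँ": "नदी"}

# Bilingual dictionary: Hindi root -> list of English equivalents.
HI_EN_DICTIONARY = {
    "भारत": ["India"],
    "नदी": ["river"],
    "इतिहास": ["history"],
}

SAMPLE_QUERY = "गंगा नदी का इतिहास"


def translate_query(query):
    """Return (translated_terms, untranslated_terms) for a Hindi query.

    Stop words are dropped, remaining tokens are reduced to their root
    form, and roots are looked up in the bilingual dictionary. Tokens
    with no dictionary entry (often proper names) are returned separately
    so that a transliteration step could handle them.
    """
    translated, untranslated = [], []
    for token in query.split():
        if token in HINDI_STOP_WORDS:
            continue
        root = ROOT_WORDS.get(token, token)
        senses = HI_EN_DICTIONARY.get(root)
        if senses:
            # Take the first listed equivalent; a real system might
            # keep all senses and disambiguate later.
            translated.append(senses[0])
        else:
            untranslated.append(token)
    return translated, untranslated


if __name__ == "__main__":
    print(translate_query(SAMPLE_QUERY))
```

In this sketch the word with no dictionary entry is kept aside rather than discarded, which is where a proper names database or a transliteration routine would plausibly come in.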

9
Language Tools Needed for CLIR
  • Inflator/Stemmer routines for English, Hindi,
    Punjabi and Marathi
  • Transliteration routine for language pairs of the targeted
    languages. This will be based on a hybrid of rule-based and
    dictionary-based approaches
  • N-gram Analyzer tool
  • Spell variant generator
  • Proper name extractor from Parallel corpus
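
The hybrid transliteration idea listed above can be sketched as a dictionary lookup with a rule-based fallback. The example below is an assumption-laden illustration, not the proposed implementation: the exception dictionary, the small character tables and the function names `rule_based` and `transliterate` are hypothetical, the direction (Devanagari to Latin) is chosen only for convenience, and the rules cover just a handful of letters.

```python
# Hypothetical exception dictionary for names whose accepted spelling
# differs from what the character rules would produce.
NAME_DICTIONARY = {"दिल्ली": "Delhi"}

# Small, incomplete character tables (assumed for illustration only).
CONSONANTS = {
    "क": "k", "ग": "g", "त": "t", "द": "d", "न": "n", "प": "p",
    "म": "m", "र": "r", "ल": "l", "स": "s", "ह": "h",
}
INDEPENDENT_VOWELS = {"अ": "a", "आ": "a", "इ": "i", "उ": "u", "ए": "e", "ओ": "o"}
VOWEL_SIGNS = {
    "\u093e": "a",  # vowel sign AA
    "\u093f": "i",  # vowel sign I
    "\u0940": "i",  # vowel sign II
    "\u0941": "u",  # vowel sign U
    "\u0942": "u",  # vowel sign UU
    "\u0947": "e",  # vowel sign E
    "\u094b": "o",  # vowel sign O
}
VIRAMA = "\u094d"    # suppresses the inherent vowel
ANUSVARA = "\u0902"  # nasal mark, rendered here simply as "n"


def rule_based(word):
    """Transliterate character by character using the tables above."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1  # consume the vowel sign
            elif nxt == VIRAMA:
                i += 1  # consume the virama, no vowel added
            elif nxt is not None:
                out.append("a")  # inherent vowel, dropped at word end
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch == ANUSVARA:
            out.append("n")
        else:
            out.append(ch)  # pass unknown characters through unchanged
        i += 1
    return "".join(out)


def transliterate(word):
    """Hybrid approach: dictionary entry if present, else rules."""
    return NAME_DICTIONARY.get(word) or rule_based(word)


if __name__ == "__main__":
    for w in ["दिल्ली", "गंगा", "कमल"]:
        print(w, "->", transliterate(w))
```

The dictionary step handles established spellings that no simple rule would recover, while the rules give a reasonable guess for words the dictionary has never seen.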

10
  • Thank You