What is the expected category of contribution: CLIR presentation



Slides: 11

Provided by: tdilM



1
Title: Cross-Lingual Information Retrieval for Indian Languages
Proposer: CDAC Noida
Name of the company: CDAC Noida
Language/Language pair: English, Hindi, Punjabi and Marathi
Expected category of contribution: CLIR
2

Strength of CDAC Noida
Technical Capabilities: NLP Lab equipped with necessary software and 50 trained software professionals
Manpower Involved
Technical:
K K Arora
Vijay Kumar
Manish Kumar
Pragdeeshvaran
Linguistic Support:
Prof K K Goswami
Prof Thakur Das
Mrs Ragini
Prof Jagannathan, Mr Chandra Mohan and others

3
List of previous collaborations with universities/R&D institutions

IIT Kanpur
CHD, Delhi
CSIO, Chandigarh
ISI Kolkata
Kendriya Hindi Sansthan
DRDO, Delhi
Delhi Press Prakashan
Pustak Mahal
IISc Bangalore
CSTT, New Delhi
COCOSDA, Japan
Sahitya Akademi
IIT Roorkee
ELDA, France
MGAHV, Wardha
Abbyy, Russia
W3C
Jamia-Milia
GKV, Hardwar
BITS Pilani
GBPUAT, PantNagar
Kumaon Univ, Nainital
Banasthali Vidyapeeth

Previous work done in this or similar areas
Machine Translation
Parallel Corpus for Indian languages
Dictionaries / Terminologies like Shabdika, Lexicon for MAT, IT Terminology
Prototype development for CLIR

5
Need for CLIR in the Indian Context

Content is available on the Internet in multiple Indian languages
People in India are generally familiar with more than one language
CLIR lets a user query in one of these languages and retrieve related information available in any of the languages the user knows
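
The idea of querying in one language and retrieving in several can be sketched with a dictionary lookup. This is only an illustration, not the proposed system: the dictionary contents, the language codes and the function name translate_query are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical toy dictionary: English headword -> translations per language.
# Language codes follow ISO 639-1 (hi = Hindi, pa = Punjabi, mr = Marathi).
BILINGUAL_DICT: Dict[str, Dict[str, List[str]]] = {
    "water": {"hi": ["पानी", "जल"], "pa": ["ਪਾਣੀ"], "mr": ["पाणी"]},
    "school": {"hi": ["विद्यालय"], "pa": ["ਸਕੂਲ"], "mr": ["शाळा"]},
}


def translate_query(query: str, targets: List[str]) -> Dict[str, List[str]]:
    """Expand an English query into one term list per target language."""
    expanded: Dict[str, List[str]] = {"en": query.lower().split()}
    for lang in targets:
        terms: List[str] = []
        for word in expanded["en"]:
            # Words missing from the dictionary are passed through unchanged;
            # a fuller system would hand them to a transliteration routine.
            terms.extend(BILINGUAL_DICT.get(word, {}).get(lang, [word]))
        expanded[lang] = terms
    return expanded
```

Each per-language term list could then be run against the index for that language and the results merged for the user.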

6
Block diagram for CLIR
7
Sample entry in Dictionary
8
Language Resources Needed for CLIR

Dictionary of words in English, Hindi, Punjabi and Marathi
Root word dictionaries
Database of phrases and collocations
Stop word lists in Hindi, Marathi and Punjabi
List of hyphenated words
Proper names database
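
As a rough illustration of how two of these resources might be used together, the sketch below drops stop words from a Hindi query and maps the remaining tokens to root forms. The word lists are tiny hypothetical samples and the function name preprocess is an assumption, not part of the proposal.

```python
from typing import Dict, List, Set

# Hypothetical sample resources; real lists would be far larger.
HINDI_STOP_WORDS: Set[str] = {"का", "की", "के", "में", "है", "और", "से"}
HINDI_ROOTS: Dict[str, str] = {"किताबें": "किताब", "नदियों": "नदी"}


def preprocess(query: str) -> List[str]:
    """Drop stop words, then replace inflected forms with dictionary roots."""
    terms: List[str] = []
    for token in query.split():
        if token in HINDI_STOP_WORDS:
            continue  # stop words carry little retrieval value
        # Tokens without a root entry are kept in their surface form.
        terms.append(HINDI_ROOTS.get(token, token))
    return terms
```

Splitting on whitespace is a simplification; phrases, collocations and hyphenated words would need the other resources listed above.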

9
Language Tools Needed for CLIR

Inflator/Stemmer routines for English, Hindi, Punjabi and Marathi
Transliteration routines for the language pairs of the targeted languages, based on a hybrid of rule-based and dictionary-based approaches
N-gram Analyzer tool
Spell variant generator
Proper name extractor from Parallel corpus
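
A hybrid transliteration routine of the kind mentioned above could look roughly like the following sketch for Devanagari to Gurmukhi. It is an assumption-laden illustration, not the proposers' design: the exception dictionary is a one-entry hypothetical sample, and the rule-based step relies only on the fact that the Devanagari (U+0900 to U+097F) and Gurmukhi (U+0A00 to U+0A7F) Unicode blocks are laid out broadly in parallel.

```python
import unicodedata
from typing import Dict

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
GURMUKHI_OFFSET = 0x0100  # the Gurmukhi block starts at U+0A00

# Hypothetical exception dictionary for words where the character-level rule
# gives the wrong spelling (the rule would produce a dental n here).
EXCEPTIONS: Dict[str, str] = {"पानी": "ਪਾਣੀ"}


def rule_based(word: str) -> str:
    """Map each Devanagari character to the parallel Gurmukhi code point."""
    out = []
    for ch in word:
        code = ord(ch)
        if DEVANAGARI_START <= code <= DEVANAGARI_END:
            candidate = chr(code + GURMUKHI_OFFSET)
            # Keep the original character when the target slot is unassigned.
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)


def transliterate(word: str) -> str:
    """Dictionary lookup first, rule-based mapping as the fallback."""
    return EXCEPTIONS.get(word, rule_based(word))
```

The dictionary handles known words whose spelling differs between the two languages, while the rule covers proper names and other unseen words. A production routine would need many more rules than a fixed code point offset.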

Thank You
