CENTRAL INSTITUTE OF INDIAN LANGUAGES - PowerPoint PPT Presentation

About This Presentation
Title:

CENTRAL INSTITUTE OF INDIAN LANGUAGES

Slides: 25
Provided by: thedirecto

Transcript and Presenter's Notes


1
1st International Conference
In association with
CIIL-Mysore, IIT-Mumbai, IIIT-Hyderabad
2
Words unite people. Words can divide nations:
they indulge in a war of words
  • Word-smiths fashion texts
  • Word-mongers talk nineteen to the dozen
  • Word-lords don't tell you that they
    double-speak
  • Word-poets open the inner abyss of lanes and
    bye-lanes of meaning
  • And so do WordNets

Which is why we are all here!
3
Welcome to the 1st Global WordNet Conference
MY ADDRESS HAS TWO PARTS
  • First, I shall tell you a little about what the
    Indian linguistic scene is like, and what we at
    CIIL have been doing
  • Then, we will offer our suggestions on what we in
    India could do in WordNet

4
CENTRAL INSTITUTE OF INDIAN LANGUAGES
  • Institute of Indian Languages, Department of Education, Government of India
  • Initiatives in
  • LANGUAGE TECHNOLOGY

5
CIIL in the first three decades
  • Equipping language teachers and analysts
    technologically

6
1. An Apex Institution under Languages Division,
MHRD
  • Completed 32 years in July 2001
  • This 287-person institution works for the
    development of Indian languages.
  • CIIL has five Centers with Research Groups (16)
    and Service Groups (6).
  • 7 Regional Language Centers are at Bhubaneswar,
    Guwahati, Lucknow, Mysore, Patiala, Pune,
    Solan.

7
2. Four Main Objectives
  • 1. Develops languages by creating content,
    corpus, techniques and technologies.
  • 2. Protects and documents minority and tribal
    languages
  • 3. Creates linguistic harmony by teaching 15
    Indian tongues to non-native learners.
  • 4. Above all, advises both Central and State
    governments on matters related to language.

8
3. Functionality and Multi-disciplinarity
  • Although the mainstay is Indian languages and
    linguistics, the focus of all projects and
    programmes is on developing materials and products
    in print, audio, video and computational form.
  • In addition, there is enough interest in Comp.
    Lit., Education, Language Technology and NLP,
    Folklore, Geography, Statistics,
    Psychology, Sociology and Translation

9
4. Coverage of CIIL - sizable
  • Archived data from 118 lgs
  • Creating Voice Corpora
  • Studied 80 Tribal lgs
  • 35 grammars on-line soon
  • Published 490 books
  • Cassette Courses in Assamese, Urdu, Bengali,
    Kashmiri and Marathi
  • Radio courses in Hindi through Kannada

10
5. Major Publications: 490 books, all produced
in-house
  • 22 Grammars
  • 30 Intensive Courses
  • 24 2nd Lg Textbooks
  • 5 Common Vocab.
  • 18 Dictionaries
  • 49 Apni Boli (KVS)
  • 15 Pictorial Glossaries
  • 16 Literacy Books
  • 12 Folklore
  • 9 Bibliographies
  • 12 Rhymes/Lg Games
  • 16 Proceedings

11
6. The Challenge before CIIL: Enormous
12
A truly plural world of languages
  • 1,576 rationalized mother-tongues
  • 1,796 other mother-tongues
  • 114 languages with more than 10,000 speakers
  • Large variation: from Hindi (337 m) to Maram of
    Manipur with 10,144
  • Large non-scheduled lgs - Bhili (6 m) and Santali
    (5 m)
  • 146 radio lgs / 69 school lgs / 35 lg dailies.

13
7. Programs - Modes of Delivery
  • 10-month L2 teaching: 8000 teachers trained
  • Distance Courses in Tamil/Telugu/Bengali/Urdu
  • On-line Programs in 15 Indian languages
  • Kannada for officials in Karnataka
  • Radio courses with AIR's collaboration
  • 3-month Courses in Communication
  • Orientation for Mother-tongue teachers
  • Refresher Courses in Linguistics
  • NLP Training modules

14
8. Language Technology: Further Goals
  • Enlargement of 3-million word Corpora
  • 100 m word corpora for Hindi-Urdu
  • Multilingual, multidirectional E-Dictionaries
  • On-line Administrative Glossaries
  • Lexical databases for MT Programs
  • Tagging and Corpus Tools
  • E-Zines and E-Journals
  • Language Information Services
  • Anukriti: Web-based Translation services

15
9. Indian Lgs and IT at CIIL
  • 132-node LAN set up
  • V-SAT through STPI
  • Browsing centre
  • Has 2400 E-Journals and 350 paper journals.
  • Collaborating with Schoolnet for electronic
    materials
  • New generation Lg Labs
  • Focus: Visual Phonetics

16
10. LIS-India Website
  • Type Language Name
  • Type Area Name
  • Home or http://www.ciil.org/
  • General Information
  • Language/ Area Profile
  • Geolinguistic, Sociolinguistic, Cultural,
    Literary
  • Language/Area History
  • Genealogical, Archaeological, Cultural,
    Textual
  • Language Vitality
  • Attitudinal, Utilitarian, Socio-political,
    Referential
  • Grammatical Information
  • Phonetic, Graphemic, Phonological,
    Morphological, Lexical
  • Syntactic, Semantic, Stylistic
  • Biblio search

17
11. Anukriti: A Translation with NBT/SA
  • Electronic lexicon
  • Corpus tools
  • Parallel corpora
  • Cultural Glossaries
  • Thesauri
  • Word finders
  • WordNets
  • WEB-BASED SERVICE SITE called ANUKRITI.
  • To be maintained with NBT/Sahitya Akademi
  • E-journals
  • Technological Tools

18
12. Bhasha Bharati Project
To be set up in collaboration with
  • Sahitya Akademi
  • Sangeet Natak Akademi
  • All India Radio
  • Doordarshan
  • National Library
  • National Archives
  • National Book Trust
  • Major TV Channels
  • Films Division
  • Major Newspaper houses
  • Numerous Foundations
  • Individual writers
  • Heirs of writers
  • Personal libraries
  • Little magazines
  • This rich manuscriptorium will display the plural
    literary and linguistic landscape of India.

19
13. Doctoral Programs under planning
  • Already available through 22 Universities
  • Linguistics and Psychology
  • Now being planned in
  • NLP
  • Folklore/Communication
  • Translation
  • Indian Gram. Tradition

20
14. Future Programs
  • Diploma in Experimental Phonetics
  • Masters by Research in Field Linguistics
  • Courses in Statistical Linguistics
  • Diploma in Translation Studies
  • Diploma in Folklore/Comp. Lit. and Semiotics
  • Internship in Linguistic Geography
  • Internship in NLP and Corpus Linguistics

21
WHAT COULD WE DO TO CREATE AN
22
India has already had a strong lexicographical
tradition
  • Working on WordNet, therefore, should come
    naturally to us.
  • Efforts have already begun as we see in Hindi,
    Tamil, Oriya and a few other languages.
  • There does not seem to be any academic
    coordination, however.
  • Early 20th century Indian linguistics was
    dominated by studies on sound-system and
    etymologies
  • Mid-20th C focussed on word-formation patterns
  • Late 20th C emphasized syntax

23
We haven't so far worked seriously on Lexical
Semantics
  • While Sociolinguistics was a favourite, serious
    Psycholinguistics was almost absent
  • Formal Syntax was highly valued, but intricacies
    of Semantics were not so attractive.
  • Making of Dictionaries continued throughout, but
    major concerted efforts in each language were
    highly individualistic or had happened long ago.
  • While writing software or applying it means
    money, and is hence a crowded field, Language
    Technology has so far been neglected.

24
So, what do we need to do now?
  • Create an Indian WordNet Association
  • Work in a coordinated manner
  • Remember to focus on areal semantic features
    because, with so much linguistic and cultural
    diversity, India is ideal to test and validate
    the concept of WordNet.