Title: language technology ciil
1 Language Technology at CIIL
Prof. Udaya Narayan Singh, Director
2 Central Institute of Indian Languages
Set up on July 17, 1969
Located in Mysore, Karnataka
3 Overall Structure
- Functions under the Department of Secondary and Higher Education, Ministry of Human Resource Development
- Guided by a Governing Committee chaired by the Hon'ble HRM
- Headed by a Director
- Assisted by seven Deputy Directors
- Supported by seven Principals of RLCs
- Administered with the help of an Assistant Director (Administration)
4 Main Objectives
- Advises and assists both Central and State Govts in the matter of language
- Promotes all Indian languages by creating content and corpus
- Protects and documents minor, minority and tribal languages
5
- CCCK program for officials in Karnataka
- Radio courses in Hindi for listeners
- Offers 3-month courses in communication
- Orientation courses for mother-tongue teachers
- Refresher courses under Academic Staff College
- Organizes more than 100 international and national seminars/workshops
6 Regional Language Centres
Promote linguistic harmony by teaching 15 Indian languages to non-native learners
- 10-month L2 teaching; 8000 teachers trained
- National Integration Camps and Refresher courses
- Distance Courses in Tamil/Telugu/Bengali/Urdu
- Originally conceived as only four RLCs in four corners of India, with the following aims
- NRLC at Patiala to handle Kashmiri, Urdu and Panjabi
7 Regional Language Centres
- SRLC, Mysore to handle all four Dravidian languages
- WRLC at Pune to handle Marathi, Sindhi and Gujarati
- ERLC to handle Oriya, Bengali and Assamese
- Later two more were added: UTRC at Solan in 1973 and UTRC at Lucknow in 1981
- Latest addition is the NERLC at Gauhati, 1999
8 Human Resource
- Language Specialists 88
- Information Scientists 12
- Hardware Persons 05
- Software Persons 21
- Engineers/LLTs 07
- Supporting Staff 125
9 Own printing press with all the facilities
Published 515 books
- 22 Grammars
- 30 Intensive Courses
- 24 L2-Textbooks
- 5 Common Vocab.
- 18 Dictionaries
- 49 Apni Boli (for KVS)
- 15 Pict. Glossaries
- 16 Literacy
- 12 Folklore
- 12 Rhymes/Lg-Games
- 18 Proceedings
- 9 Bibliographies, etc.
10 Some other achievements
- Archived data of 118 languages
- Studied 80 Tribal/Border languages
- Cassette Courses in four languages
- Kashmiri on the net (link)
- Radio courses in Hindi through Kannada
11 Hardware
- 150-node LAN set up at CIIL and separate 10-node LANs at NRLC and ERLC
- Itanium Web server and database server at CIIL for launching sites
- High-speed V-SAT connection through STPI
- Analog audiotick computerized lab at SRLC and ERLC
- Digital audiotick computerized labs at NRLC
- 2400 Electronic Journals acquired for CIIL and RLCs
- Browsing section in the library
12 Web based language resources: Spoken language corpus
Speech Science lab has the following hardware and software
Computerized Speech Lab, Model 4100, developed by Kay Elemetrics Corp., Lincoln Park, N.J. 07035-1488
Software (dependent on CSL hardware)
13 Web based language resources: Spoken language corpus
1. Computerized Speech Lab Main Programme, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Multi-Dimensional Voice Program, Model 5105, Version 2.5.2
5. Multi-Dimensional Voice Program Advanced, Model 5105, Version 2.5.2
6. Real-Time Pitch, Model 5121, Version 2.5.2
7. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
14 Web based language resources: Spoken language corpus
Software (without any hardware dependency)
1. Multi-Speech Signal Analysis Workstation, Model 3700, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Real-Time Pitch, Model 5121, Version 2.5.2
5. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
CD-ROM: Speech Production and Perception (developed by Sensimetrics)
15 Web based language resources: Spoken language corpus
Branches of study in Speech Science
- Articulatory Phonetics
- Experimental Phonetics
- Biological and Clinical Linguistics
- Speech Technology
- Forensic Phonetics
16 Web based language resources: Spoken language corpus
Phonetic Readers
Angami, Ao-Naga, Balti, Bengali, Brokskat, Gojri, Gujarati, Kashmiri, Khasi, Kota, Kurux, Kuvi, Ladakhi, Lotha, Manipuri, Mishmi, Mundari, Sema, Shina, Tangkhul-Naga, Thaadou, Tripuri
17 Web based language resources: Spoken language corpus
Major Events
- International institute of phonetics
- Seminar-cum-Workshop on Voice Modulation and Culture
- Workshop on Aspiration
- Seminar on Voice Quality
- Workshop on Nasalization
- Workshop on Multilingual Speech Analysis and Synthesis
- Instrumental Analysis of Phonetic Features Across Major Indian Languages
- Analysis of Retroflex Sounds, etc.
18 Web based language resources: Spoken language corpus
Training/orientation programmes in phonetics for teachers from
- Tamil Nadu
- Uttar Pradesh
- Arunachal Pradesh
- Bihar
- Haryana
- Himachal Pradesh
- Jammu and Kashmir
- Madhya Pradesh
- Rajasthan
www.ciil-spokencorpus.net
19 Web based language resources
Text corpora in major and minor Indian languages
http://www.ciilcorpora.net
Web based Indian Languages Grammars
http://www.ciilgrammars.org
Web based Indian Language Courses
http://www.bangla-online.info/
Web based books and journals
http://www.ciil-ebooks.net/
20 Web based Translation services
http://www.anukriti.net/
In collaboration with Sahitya Akademi and NBT
Electronic journal (Translation Today) and tools for translation
- Electronic dictionaries
- Annotated corpus tools
- Parallel corpora
- Translational dictionaries
- Cultural Glossaries
- Thesauri
- Word finders
- Technical terminologies
21 Linguistic Data Consortium for Indian Languages (LDC-IL)
Takes advantage of the giant strides in Information Technology
Model: Linguistic Data Consortium (LDC) hosted by the University of Pennsylvania, USA.
Budget: One crore per year and ten crore for ten years.
Funded by the Ministry of Human Resource Development
Preliminary discussion held at the International Workshop on Creation of Linguistic Data Consortium for Indian Languages on August 16-17, 2003.
Meeting of the lead institutions to create LDC-IL on August 18, 2003 at IISc, Bangalore.
22 LDC-IL will focus on
- Becoming a repository of linguistic resources in all Indian languages in the form of text, speech and lexical corpora.
- Facilitating creation of such databases by different member organizations.
- Setting standards for data collection and storage of corpora for different research and development activities.
- Supporting development and sharing of tools for data collection and management.
23 LDC-IL
- Facilitating training through workshops, seminars etc. in technical as well as process-related issues.
- Creating and maintaining the LDC-IL website that would be the primary gateway for accessing LDC-IL resources.
- Designing or providing help in creation of appropriate language technology for mass use.
- Providing the necessary linkages between academic institutions, individual researchers and the masses.
24 LDC-IL
Major areas of languages covered
- Speech corpora
- Handwritten corpora
- Text corpora including parallel corpora
- Natural Language Processing
- Several by-products like lexicon, thesauri, etc.
25 LDC-IL
Participating Institutions
- Indian Institute of Science, Bangalore
- Indian Institute of Technology, Bombay
- Indian Institute of Technology, Madras
- International Institute of Information Technology, Hyderabad
- ISI Calcutta
- TIFR Mumbai
- HP Labs India
- BM
- C-DOT
- C-DAC
- Tata InfoTech
- All other IITs
- KHS
- NCPUL
- Rashtriya Sanskrit Sansthan
- TDIL, MIT
26 LDC-IL
All academic institutes, research organizations and corporate R&D groups from India and abroad working on Indian languages will be encouraged to participate in LDC-IL.
Different Indian universities with major departments of Linguistics and Computer Science/Artificial Intelligence
27 Web Based Language Information Services
- General Information
- Language/Area Profile: Geolinguistic, Sociolinguistic, Cultural, Literary
- Language/Area History: Genealogical, Archaeological, Cultural, Textual
- Language Vitality: Attitudinal, Utilitarian, Socio-political, Referential
- Grammatical Information: Phonetic, Graphemic, Phonological, Morphological, Lexical, Syntactic, Semantic, Stylistic
- Biblio search
Link to LIS site
28 Website for Modern Indian Literary Classics in Translation
- In collaboration with Sahitya Akademi and NBT
- To promote the celebrated Indian fiction writers of the last 150 years, both within the country and abroad, through a series of initiatives.
- A library of 100 major contemporary fiction writings in English and several other European languages.
29 Digital Library and Manuscriptorium
Special Library with linguistics and allied
disciplines as focus
- Over 65000 books
- Subscription to over 270 journals
- Subscription to 4200 online journals
- Back volumes of all the journals
- 7 RLC libraries with collections in Indian languages
- Has CDs (worth 50 lakhs) in Indian languages in digital form
- Library automation through VTLS package
30
- Bhasha Bharati will have display galleria as well as scanned copies of writings
- Audio and video tapes of interviews
- Lecture notes and recordings
- Their own as well as professional recitations
- Films, tele-films and serials
- Documentaries
31 Website for Modern Indian Literary Classics in Translation
Bhasha Bharati will also house and create hypertexts of Indian language classics. It will provide a service to common people, who may visit either in person or virtually and seek answers to their questions and queries. It will handle questions on different topics, ranging from knowledge and interpretation of a literary or religious text to information on a speech group or even on a word or an expression.
32 Website for Modern Indian Literary Classics in Translation
Web based information on Indian Scripts
Linguistic Integration Project of India
Aim: LIPIKA will promote greater understanding among Indian people, produce useful learning materials, and create web-based information. LIPIKA will show unity in India's apparently diverse writing systems. LIPIKA will also help generate software with necessary tools like spell-checkers and grammar checkers.
33 Website for Modern Indian Literary Classics in Translation
Task 1
Preparation of a brief history of various writing systems of India, such as Brahmi, Kharosthi, etc., and a learners' manual (aimed at both foreigners and Indians) on the structure of syllabic writing systems as prevalent in India, including a comparison of apparently divergent scripts used by Indian languages today.
34
Task 2
(a) Preparation of a CD/Video version of the Learners' manual, based on the expertise of C-DAC/NCST/CIIL
(b) Making the learning software available in the public domain, for propagation of Indian writing systems.
Task 3
(a) Creation of new fonts and images in respect of Devanagari and a few other major Indian writing systems through a series of workshops with (i) calligraphists, (ii) print-making experts, (iii) computer experts, (iv) creative persons
35 Website for Modern Indian Literary Classics in Translation
Some of the important collaborators of CIIL
- All IITs, IIIT Hyderabad, IISc.
- Government of Karnataka
- Andaman and Nicobar Administration
- Government of Singapore
- Lancaster University
- SASNET
- SIDA
- MGI-CIIL from Mauritius
- SchoolNet
- NCPUL and many more
36 Website for Modern Indian Literary Classics in Translation
- Sahitya Akademi
- Konkani Academy
- Dogri Sansthan
- Karnataka Nataka
- Rangayana
- CHD
- HP Labs
- NSOU
- University of Hyderabad
- NEHU
- Delhi Univ.
- NBT
37 Director's Speech