language technology ciil - PowerPoint PPT Presentation

About This Presentation
Title:

language technology ciil

Slides: 38
Provided by: pri5177

Transcript and Presenter's Notes

Title: language technology ciil


1
Language Technology @ CIIL
Prof. Udaya Narayan Singh, Director
2
Central Institute of Indian Languages
Set up on July 17, 1969. Located in Mysore, Karnataka.
3
Overall Structure
  • Functions under the Department of Secondary and Higher Education, Ministry of Human Resource Development
  • Guided by a Governing Committee chaired by the Hon'ble HRM
  • Headed by a Director
  • Assisted by seven Deputy Directors
  • Supported by seven Principals of RLCs
  • Administered with the help of an Assistant Director (Administration)

4
Main Objectives
  • Advises and assists both Central and State Governments in matters of language
  • Promotes all Indian languages by creating content and corpora
  • Protects and documents minor, minority and tribal languages

5
  • CCCK program for officials in Karnataka
  • Radio courses in Hindi for listeners
  • Offers 3-months Courses in Communication
  • Orientation Courses for Mother-tongue teachers
  • Refresher Courses under Academic Staff College
  • Organizes more than 100 international and national seminars/workshops

6
Regional Language Centres
Promote linguistic harmony by teaching 15 Indian languages to non-native learners
  • 10-month L2 teaching; 8000 teachers trained
  • National Integration Camps and refresher courses
  • Distance courses in Tamil/Telugu/Bengali/Urdu
  • Originally conceived as only four RLCs in the four corners of India, with the following aims:
  • NRLC at Patiala to handle Kashmiri, Urdu and Panjabi

7
Regional Language Centres
  • SRLC at Mysore to handle all four Dravidian languages
  • WRLC at Pune to handle Marathi, Sindhi and Gujarati
  • ERLC to handle Oriya, Bengali and Assamese
  • Later two more were added: in 1973, UTRC at Solan; in 1981, UTRC at Lucknow
  • The latest addition is the NERLC at Gauhati (1999)

8
Human Resource
  • Language Specialists: 88
  • Information Scientists: 12
  • Hardware Persons: 05
  • Software Persons: 21
  • Engineers/LLTs: 07
  • Supporting Staff: 125

9
Own printing press with all the facilities
Published 515 books
  • 22 Grammars
  • 30 Intensive Courses
  • 24 L2-Textbooks
  • 5 Common Vocab.
  • 18 Dictionaries
  • 49 Apni Boli (for KVS)
  • 15 Pict. Glossaries
  • 16 Literacy
  • 12 Folklore
  • 12 Rhymes/Lg-Games
  • 18 Proceedings
  • 9 Bibliographies, etc.

10
Some other achievements
  • Archived data of 118 languages
  • Studied 80 Tribal/Border languages
  • Cassette Courses in four languages
  • Kashmiri on the net (link)
  • Radio courses in Hindi through Kannada
11
Hardware
  • 150-node LAN set up at CIIL and separate 10-node LANs at NRLC and ERLC
  • Itanium Web server and database server at CIIL for launching sites
  • High-speed V-SAT connection through STPI
  • Analog audiotick computerized lab at SRLC and ERLC
  • Digital audiotick computerized labs at NRLC
  • 2400 electronic journals acquired for CIIL and RLCs
  • Browsing section in the library

12
Web based language resources Spoken language
corpus
The Speech Science lab has the following hardware and software.
Hardware: Computerized Speech Lab, Model 4100, developed by Kay Elemetrics Corp., Lincoln Park, N.J. 07035-1488.
Software (dependent on CSL hardware)
13
Web based language resources Spoken language
corpus
1. Computerized Speech Lab Main Programme, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Multi-Dimensional Voice Program, Model 5105, Version 2.5.2
5. Multi-Dimensional Voice Program Advanced, Model 5105, Version 2.5.2
6. Real-Time Pitch, Model 5121, Version 2.5.2
7. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
14
Web based language resources Spoken language
corpus
Software (without any hardware dependency)
1. Multi-Speech Signal Analysis Workstation, Model 3700, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Real-Time Pitch, Model 5121, Version 2.5.2
5. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
CD-ROM: Speech Production and Perception (developed by Sensimetrics)
15
Web based language resources Spoken language
corpus
Branches of study in Speech Science
  • Articulatory Phonetics
  • Experimental Phonetics
  • Biological and Clinical Linguistics
  • Speech Technology
  • Forensic Phonetics

16
Web based language resources Spoken language
corpus
Phonetic Readers
Angami, Ao-Naga, Balti, Bengali, Brokskat, Gojri, Gujarati, Kashmiri, Khasi, Kota, Kurux, Kuvi, Ladakhi, Lotha, Manipuri, Mishmi, Mundari, Sema, Shina, Tangkhul-Naga, Thaadou, Tripuri
17
Web based language resources Spoken language
corpus
Major Events
  • International Institute of Phonetics
  • Seminar cum Workshop on Voice Modulation and Culture
  • Workshop on Aspiration
  • Seminar on Voice Quality
  • Workshop on Nasalization
  • Workshop on Multilingual Speech Analysis and Synthesis
  • Instrumental Analysis of Phonetic Features Across Major Indian Languages
  • Analysis of Retroflex Sounds, etc.

18
Web based language resources Spoken language
corpus
Training / orientation programmes in phonetics
for the teachers from
  • Tamil Nadu
  • Uttar Pradesh
  • Arunachal Pradesh
  • Bihar
  • Haryana
  • Himachal Pradesh
  • Jammu and Kashmir
  • Madhya Pradesh
  • Rajasthan

www.ciil-spokencorpus.net
19
Web based language resources
Text corpora in major and minor Indian languages
http://www.ciilcorpora.net
Web based Indian Languages Grammars
http://www.ciilgrammars.org
Web based Indian Language Courses
http://www.bangla-online.info/
Web based books and journals
http://www.ciil-ebooks.net/
20
Web based Translation services
http://www.anukriti.net/
In collaboration with Sahitya Akademi and NBT
Electronic journal: Translation Today
Tools for translation
  • Electronic dictionaries
  • Annotated corpus tools
  • Parallel corpora
  • Translational dictionaries
  • Cultural Glossaries
  • Thesauri
  • Word finders
  • Technical terminologies

21
Linguistic Data Consortium for Indian Languages
(LDC-IL)
Takes advantage of the giant strides in Information Technology.
Model: Linguistic Data Consortium (LDC) hosted by the University of Pennsylvania, USA.
Budget: one crore per year, and ten crore for ten years.
Funded by the Ministry of Human Resource Development.
Preliminary discussion held at the International Workshop on Creation of Linguistic Data Consortium for Indian Languages on August 16-17, 2003.
Meeting of the lead institutions to create LDC-IL held on August 18, 2003 at IISc, Bangalore.
22
LDC-IL will focus on
  • Becoming a repository of linguistic resources in all Indian languages in the form of text, speech and lexical corpora.
  • Facilitating creation of such databases by different member organizations.
  • Setting standards for data collection and storage of corpora for different research and development activities.
  • Supporting development and sharing of tools for data collection and management.
23
LDC-IL
  • Facilitating training through workshops, seminars, etc. in technical as well as process-related issues.
  • Creating and maintaining the LDC-IL website that would be the primary gateway for accessing LDC-IL resources.
  • Designing or providing help in creation of appropriate language technology for mass use.
  • Providing the necessary linkages between academic institutions, individual researchers and the masses.
24
LDC-IL
Major areas of languages covered
  • Speech corpora
  • Handwritten corpora
  • Text corpora including parallel corpora
  • Natural Language Processing
  • Several by-products such as lexicons, thesauri, etc.

25
LDC-IL
Participating Institutions
  • Indian Institute of Science, Bangalore
  • Indian Institute of Technology, Bombay
  • Indian Institute of Technology, Madras
  • International Institute of Information Technology, Hyderabad
  • ISI Calcutta
  • TIFR Mumbai
  • HP Labs India
  • BM
  • C-DOT
  • C-DAC
  • Tata InfoTech
  • All other IITs
  • KHS
  • NCPUL
  • Rashtriya Sanskrit Sansthan
  • TDIL, MIT

26
LDC-IL
All academic institutes, research organizations and corporate R&D groups from India and abroad working on Indian languages will be encouraged to participate in LDC-IL, as will Indian universities with major departments of Linguistics and Computer Science/Artificial Intelligence.
27
Web Based Language Information Services
  • General Information
  • Language/Area Profile: Geolinguistic, Sociolinguistic, Cultural, Literary
  • Language/Area History: Genealogical, Archaeological, Cultural, Textual
  • Language Vitality: Attitudinal, Utilitarian, Socio-political, Referential
  • Grammatical Information: Phonetic, Graphemic, Phonological, Morphological, Lexical, Syntactic, Semantic, Stylistic
  • Biblio search

Link to LIS site
28
Website for Modern Indian Literary Classics in Translation
  • In collaboration with Sahitya Akademi and NBT
  • To promote the celebrated Indian fiction writers of the last 150 years, both within the country and abroad, through a series of initiatives.
  • A library of 100 major contemporary fiction writings in English and several other European languages.

29
Digital Library and Manuscriptorium
Special library with linguistics and allied disciplines as focus
  • Over 65000 books
  • Subscription to over 270 journals
  • Subscription to 4200 online journals
  • Back volumes of all the journals
  • 7 RLC libraries with collections in Indian languages
  • Has CDs (worth 50 lakhs) in Indian languages in digital form
  • Library automation through VTLS package

30
  • Bhasha Bharati will have display galleries as well as scanned copies of writings.
  • Audio and video tapes of interviews
  • Lecture notes and recordings
  • Their own as well as professional recitations
  • Films, tele-films and serials
  • Documentaries

31
Website for Modern Indian Literary Classics in
Translation
Bhasha Bharati will also house and create hypertexts of classics in Indian languages. It will provide a service to common people, who may visit either in person or virtually to seek answers to their questions and queries. It will handle questions on different topics, ranging from knowledge and interpretation of a literary or religious text to information on a speech group, or even on a word or an expression.
32
Website for Modern Indian Literary Classics in
Translation
Web based information on Indian Scripts
Linguistic Integration Project of India
Aim: LIPIKA will promote greater understanding among Indian people, produce useful learning materials, and create web-based information. LIPIKA will show the unity in India's apparently diverse writing systems. LIPIKA will also help generate software with necessary tools such as spell-checkers and grammar checkers.
33
Website for Modern Indian Literary Classics in
Translation
Task 1
Preparation of a brief history of various writing systems of India, such as Brahmi, Kharosthi, etc., and a learners' manual (aimed at both foreigners and Indians) on the structure of syllabic writing systems as prevalent in India, including a comparison of the apparently divergent scripts used by Indian languages today.
34
Task 2
(a) Preparation of a CD/video version of the learners' manual, based on the expertise of C-DAC/NCST/CIIL
(b) Making the learning software available in the public domain, for propagation of Indian writing systems.
Task 3
(a) Creation of new fonts and images in respect of Devanagari and a few other major Indian writing systems through a series of workshops with (i) calligraphists, (ii) print-making experts, (iii) computer experts, (iv) creative persons
35
Website for Modern Indian Literary Classics in
Translation
Some of the important collaborators of CIIL
  • All IITs, IIIT Hyderabad, IISc.
  • Government of Karnataka
  • Andaman and Nicobar Administration
  • Government of Singapore
  • Lancaster University
  • SASNET
  • SIDA
  • MGI-CIIL from Mauritius
  • SchoolNet
  • NCPUL and many more

36
Website for Modern Indian Literary Classics in
Translation
  • Sahitya Akademi
  • Konkani Academy
  • Dogri Sansthan
  • Karnataka Nataka Rangayana
  • CHD
  • HP Labs
  • NSOU
  • University of Hyderabad
  • NEHU
  • Delhi Univ-
  • NBT

Director's Speech
37
Director's Speech