Title: WHAT LANGUAGE TECHNOLOGY CAN DO FOR INDIAN LANGUAGES
1WHAT LANGUAGE TECHNOLOGY CAN DO FOR INDIAN
LANGUAGES
CIIL, Mysore FEB 5-7, 2005
SIMPLE 05 INAUGURAL LECTURE IIT-Kharagpur
2Professor uses Language Technology to preserve
tribal languages
http://uanews.org/cgi-bin/WebObjects/UANews.woa/wa
/MainStoryDetails?ArticleID=9400
- In June last year, an internet news item popped
up in Google which, I think, is important for all
of us. It said:
- Professor Susan Penfield, with CRIT (Colorado
River Indian Tribes) and UofA support, and with
funding from the Bill & Melinda Gates Foundation,
worked to preserve two Amerindian languages.
- She trained tribal members of Mohave and
Chemehuevi in the use of software and internet
tools that would support preservation and
instruction in these languages.
- Such timely steps are important, as Mohave has 33
fully fluent speakers, all in the 70-plus age
group, and Chemehuevi has 10 speakers, all in the
60-plus age group.
The moral? Language technology could be, and should
be, used for smaller languages.
3Objectives of this presentation: Two-fold
- I would like to draw your attention to two issues
in my presentation here.
- The first is a concern for linguists. It is a
challenge for all practitioners of language
management, namely, how do we develop smaller
languages in a diverse space? A space where
number seems to be an important category?
- Secondly, given the profile of such smaller and
lesser-known (and often, least cared for)
languages of South Asia, what can language
technology do to alter their status?
4Language Endangerment: Relevance of this talk
- As the bio-mathematician Pagel tells us, language
disappearance is happening at an alarming pace.
- If only 10% of the world's current 6,000 smaller
languages survive, we lose our heritage. Death of
a speaker is understandable. Wiping off of an
entire community is painful. And yet governments
may think it is a waste of resources.
- The sociolinguists tell us that in South Asia,
somehow, language retention is more natural than
language loss.
- But even in India, 2nd/3rd generation migrants
adopt other tongues.
- They realize, albeit late, that without their
languages they are like lost persons, homeless
and social rejects.
- On the other hand, it is also true that meanwhile
many new languages keep emerging through split
and merger.
5World-wide Decline: Canada
- Over 60 languages were originally spoken, but 8
(13%) were lost before 1990, and 13 more are
near-extinct now (Kinkade 1991).
- 23 languages in Canada are 'endangered' (38%)
now, because they have few speakers under 50
years.
- Only 4 languages may survive in the long run.
- Based on 1996 Canadian census data, Norris (1998)
later shows that the Indigenous languages that
are likely to survive are further down to 3.
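The percentages quoted for Canada can be checked with a few lines of Python. This is only an illustrative sketch: the base of exactly 60 languages is an assumption (the slide says "over 60"), and the function name `share` is hypothetical.

```python
# Assumption: the slide says "over 60" languages; 60 is used here as the base.
ORIGINAL_LANGUAGES = 60


def share(count, total=ORIGINAL_LANGUAGES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


lost_before_1990 = share(8)   # languages lost before 1990
endangered_now = share(23)    # languages now classed as endangered

print(f"lost before 1990: {lost_before_1990}%")
print(f"endangered now:   {endangered_now}%")
```

With a base of 60, the rounded shares agree with the 13 and 38 given in brackets on the slide, which suggests those figures are percentages of the original total.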
6The North American Scenario Indigenous people
now left in Eastern USA
- For North America, the number of Indigenous
languages originally spoken was around 300
(Bright 1994 and Mithun 1999).
- Chafe (1962) counted 211 languages as still
living in 1960.
- Out of these, only 89 (42%) had speakers of all
ages.
- In 1991, 51 (approx. 24%) are alive (Zepeda and
Hill 1991).
- Campbell (1997:16) says 80% of them 'will die in
this generation'.
- The prediction? Only 20-30 Indigenous languages
will survive by 2040.
7Australia: Disturbing Statistics
- Out of about 300 in 1800, there has been a 90%
decrease of speakers in all age groups.
- Decline rate varies from 100% in 1800 to 13% in
1996.
- At this rate, by 2050 there will no longer be any
Indigenous language in Australia.
- In absolute terms, there may actually be 55,000
speakers of Indigenous languages there.
ABS 1994 Survey of Aboriginal
8A Real Danger
- Although I do not want to sound like a prophet
of doom, the facts about language attrition
tell their own stories.
"... this vastly reduced reservoir of linguistic
diversity constitutes one of the great
treasures of humanity, an enormous store-house of
expressive power and profound understandings of
the universe. The loss of the hundreds of
languages that have already passed into history
is an intellectual catastrophe in every way
comparable in magnitude to the ecological
catastrophe we face today." (Zepeda and Hill,
1991)
Zepeda, O. and J.H. Hill, 1991. The
Condition of Native American Languages in the
United States. In R.H. Robins and E.M. Uhlenbeck
(editors), Endangered Languages. Oxford:
Berg Publishers.
9Losses: Smaller or bigger?
- Many might say, "Let's forget about smaller
losses, if they are inevitable, and rather
concentrate on the bigger groups and larger
issues." But for the smaller groups, their
struggle for retention of identity is no less
real. It is of paramount importance to them.
- Many subaltern specialists or leaders often open
their gates only to those who are into gender
issues or take up the cause of cultural and
religious minorities. The smaller linguistic
groups are generally pushed to the periphery in
all instances.
- Whenever questions on smaller linguistic groups
are raised, I've heard South Asianists raise this
question almost with a vengeance: "Are they
dialects or languages?", meaning thereby that if
they are dialects, why worry about them? If you
persist, the next predictable question would be
"Do they have a script?" Even if we get past this
question, the next one is "Are they taught in
schools?" They appear to me to be the ever
shifting stand of the Higgle Piggle Di.
- This reminds me of a story from Sukumar Ray's
Ha-ja-ba-ra-la, where the protagonist
confronts a menacing cat that changes its name
all the time. Here goes the story:
10We need to see the point that the problem here
cannot be wished away by changing/redesignating
labels. They remain where they are.
11Which India should we talk about?
- In a talk I had given at Saarbrucken in a
conference on "Peripheral centers and central
peripheries" in 2002, while talking about
"Another India", I had asked myself, "Which India
shall I talk about (since there are many)?" It is
still not possible to spell out the priorities
that a language planner must have in the South
Asian context.
- Whenever one looks at the space called South
Asia from outside, one needs to ask who is showing
what the space is like. If it is seen either
through a cinematic extravaganza like Sanjay Leela
Bhansali's Devdas, or through the texts woven
by our writers who write or rewrite in English,
the space appears like a conundrum.
- The Indian language scenario may often appear
like a universe plotted as a pastiche on a canvas,
remote and diverse at the same time.
- However, when we learn about the space through
vernacular-language texts, it is an
illuminating experience.
- The real India of numerous smaller speech groups
becomes even more strident if we are exposed to
the texts that are generated outside the
'twice-born' languages (to borrow Meenakshi
Mukherjee's tag for the 8th Schedule languages).
- Obviously, Indian English writing forces a large
part of India to remain perpetually outside the
focus. Our concern here is for those smaller
languages. I am sure other South Asian nations
are probably not an exception to this trend.
12SOUTH ASIA A COMPARATIVE CHART
13Let's try to understand the spread of the problem
in India
- The Scheduled Tribes, each with its own
linguistic heritage, account for 67.76 million
people, representing 8.08% of our population,
living mainly in the forest and hilly regions
(1991 Census).
- More than 70% of them are in Madhya Pradesh,
Maharashtra, Orissa, Bihar, Andhra Pradesh, West
Bengal and Gujarat.
- We not only need special provisions for their
protection from social injustices and all forms
of exploitation, we also need concrete plans to
cater to their educational interests.
- It should be needless to raise the issue of the
mother tongue as the medium of primary and early
secondary education after all the experiments
conducted by a number of sociolinguists and
psychologists of language, and especially after
the 1954 UNESCO document was accepted by most
countries. Yet many still ask whether these
children should not be better educated in English
or Hindi, or through the language of the majority
community in the region.
14Provisions in India?
- There is a general feeling among those who do not
understand the Indian polity and the
administrative set-up that we do not have a
mechanism in place to protect and promote these
minor languages.
- They often point to smaller countries like Nepal,
where both in Constitutional provisions and in
Universal Education documents these issues are
specifically mentioned.
- However, in the case of India, the sheer size of
the country and the complexity of the
administrative set-up are such that the provisions
that exist here (at least, on paper) cannot be
compared with those in other South Asian
nation-states.
15Some examples
- The 5th and 6th Schedules of the Constitution
(Article 244) made special provision, but these
are under the Home Ministry.
- Special representation for the STs in the Lok
Sabha and State legislative assemblies till 25th
January, 2010 (Arts. 330, 332 and 334) was also
provided.
- Under Articles 164 and 338, separate State-level
commissions and a National Commission at the
Centre were set up to promote their welfare and
safeguard their interests.
- But the Ministry of Tribal Affairs, set up in
October 1999, is the nodal agency.
- Commission for Linguistic Minorities (Allahabad)
under the Ministry of Social Justice &
Empowerment.
- And now the UPA Government is contemplating
setting up another kind of minority commission.
- A Grant-in-Aid scheme under Article 275(1) was
created.
- The Protection of Civil Rights Act, 1955 and the
Scheduled Castes and the Scheduled Tribes
(Prevention of Atrocities) Act, 1989 were enacted.
- The Planning Commission took a landmark step by
opening 43 Special Multi-purpose Tribal Blocks
(SMPTBs) during the 2nd Five Year Plan, later
called Tribal Development Blocks (TDBs).
- Later, under the 4th Plan, six projects with Rs. 2
crores were set up in Andhra Pradesh, Bihar, Madhya
Pradesh and Orissa, and a separate Tribal
Development Agency was established.
- Such steps have only multiplied in the later
years.
16A lot more needs to be done with concerted focus,
and we believe that in this context technology
can play a major role.
- The Fifth Five Year Plan marked a shift in the
approach when the Tribal Sub Plan (TSP) for the
direct benefit of the STs was launched.
- In 1987, the Tribal Cooperative Marketing
Development Federation (TRIFED) was set up to
provide marketing help to tribals for their minor
forest produce and surplus agricultural produce.
- The GIA scheme covered 376 NGOs working in this
area, each getting about 90% of their subsistence
and development grants.
BUT ARE ALL THESE ENOUGH?
17There are a large number of non-tribal smaller
linguistic groups in India, and all of them, like
these tribal languages, need at least the
following:
- Grammars
- Graded learning materials
- Literacy books
- Dictionaries (general purpose)
- Thesauri
- Specialized/Technical Glossary
- Cultural documentation
- Primers
- Style Manuals
- Encouragements to initiate literary activities
But they are not the only ones.
18A picture of India that is closer to reality
- 1,576 rationalized mother-tongues
- 1,796 other mother-tongues
- 114 languages with 10,000 plus speakers
- Variation in size: from Hindi with 337 million to
Maram of Manipur with only 10,144
- 22 Constitutional languages
- Large non-scheduled languages: Bhili with 5.57
million speakers
19
- 146 speech varieties used in radio network
- 69 used in schools
- 3,954 newspapers in 35 languages as in 1971. (The
figure has doubled in 2003, with Hindi (2,507),
Urdu (534), English (407), Marathi and Tamil (395
each) alone (4,238) surpassing the earlier figure.)
- 58 languages with dwindling number of speakers
- Highest literary prizes are awarded in 22
- 96% speak only 20-odd Indo-Aryan and Dravidian
languages
- 14 major writing systems in use.
20The Real-time Divisions
- India is geographically organized in terms of 35
States and Union Territories, and each of these
large units has under it divisions at several
levels, like:
- 593 districts, further sub-divided into 5,564
sub-districts.
- To complicate matters, we don't use common
terms.
- They are named in many ways: Tehsils or Talukas,
Mandals (AP), Circles, C.D. Blocks (Bihar, Tripura,
Meghalaya, W.B., Jharkhand), R.D. Blocks (Mizoram),
Commune Panchayats (Pondicherry), Sub-divisions
(Arunachal Pradesh, Lakshadweep), and even Police
Stations (Orissa).
- With 51 cities, 384 urban agglomerations and 5,161
towns scattered all over, which occupy only an
insignificant amount of our land space, the
picture is further complicated.
- Although states in India are supposed to be
organized according to languages, each state has
speakers of a number of minority languages, and
many states have the same or a similar language
profile.
21Bilingualism Figures
- Since bilingualism figures are available in the
census (the 1931 census added a question on
bilingualism, and the 1991 Census even had one on
trilingualism), we see that the urban areas,
especially the big cities and urban agglomerations,
are places where the local or regional
language(s) are as popular as other languages.
- It is especially true of industrial townships or
small towns with business hubs and commercial
interests.
- In 1951, besides the 14 languages recognized in
the Indian Constitution, we find 23 major tribal
languages and 24 other minority languages, each
with over 100,000 speakers, with 722 other
languages whose speakers numbered less than that.
- India's national average of bilingualism in 1991
(19.44%) was significantly higher than the average
of 1961 (9.7%).
- Compare them with the 1971 (13.04%) and 1981
(13.44%) figures.
- The 1991 figure for the average rate of
trilingualism was 7.26%.
22Extent of multilingualism
- When we look beyond 1951, the number of speakers
of minority languages varies greatly from state
to state, e.g. in Tripura over 31% speak
minority languages, but in Kerala the figure is
only 3.4%. Nagaland and Arunachal Pradesh (the
biggest groups being 14.4% and 19.9%,
respectively) do not have a majority language as
such.
- 7 states - Kerala, Punjab, Gujarat, Haryana,
A.P., U.P., H.P., Rajasthan, TN and WB - have
negligible minor speech groups, with 85% speaking
a major language.
- And whatever we get as a minor language group
usually figures elsewhere as a major speech
community. Further, in six of these states, one
could see a cluster of Urdu as a minority speech
group.
- Let's look at state profiles here:
23(No Transcript)
24Differences between sets
- Set B has Hindi, Oriya or Lushai as major
languages spoken by over 75% of the people, but
these are also the states where a large number of
tribal communities use their own languages.
- It is not surprising, therefore, that the first
two (M.P. and Bihar) gave rise to two new states,
Chhattisgarh and Jharkhand, respectively,
exclusively for tribal speakers of numerous
languages. Chhattisgarh is yet to find a lingua
franca, but we will have to wait and see how
Sadani/Sadari does in Jharkhand.
- Set C states had their fair share of both
pronounced and hidden language tensions. Konkani
speakers in Goa had to fight a long battle to
claim their own position, as Konkani was always
branded as a dialect of Marathi.
- In Set D states, the linguistic tensions have
been quite volatile, and that can be correlated
with their profile.
- Set E is the most variegated geo-space in India,
with many tongues.
25Typical problems
- Languages like Konkani present yet another
problem of South Asia, namely digraphia. Konkani
is written in four scripts: in Roman and
Devanagari in Goa, in Kannada in Karnataka and in
Malayalam in Kerala. We have Punjabi (with
Gurmukhi/Devanagari/Perso-Arabic), Sindhi and
Kashmiri (both with Devanagari/Perso-Arabic
systems) in the same category. Add Bodo and
Santali to that list too.
- On implementation of various decisions, problems
crop up. The National Curriculum Framework for
School Education: A Discussion Document (released
Jan, 2000), while reviewing the Three Language
Formula, states: "In a number of
states/organizations/boards, however, the spirit
of the formula has not been followed and the
mother tongue of the people has been denied the
status of the first language. Because of the
changed socio-economic scenario, the difference
between the second and the third languages has
dwindled. Thus, in reality, there may be
two second languages for all purposes and
functions."
26(No Transcript)
27The hidden tussle
- This tussle is also evident among smaller
languages.
- If we take the entire Indian sub-continent, the
smallest group happens to be the speakers of
Austro-Asiatic languages, who make up
approximately 0.7% of the population (still about
6.5 million people).
- These groups earlier had no states to call their
own, whereas many Tibeto-Burman groups had.
- This status has changed with the formation
of three new states, Jharkhand, Chhattisgarh and
Uttaranchal, in 2000. But it took quite a while
to make these alterations.
- The comparison among them becomes unavoidable as
there are many non-scheduled languages with a
large number of speakers, e.g. see this chart.
- Even though a large percentage of all Indians
speak one or the other major Indian language, and
even if some form of Hindi is understood by close
to 60%, there are still many other languages with
a long literary history, a grammatical and
lexicographical tradition and a rich literary
heritage.
- All these languages are still in use in all
modern means of communication.
- As a result, although the official language of
India is Hindi, there is always a hidden tussle,
as well as open confrontation, involving
supporters of Hindi as an official language, who
mostly oppose the use of English.
- On the other hand, supporters of the regional
languages look to English as an alternative link
between the Indian states.
28(No Transcript)
29Required: Planned Interventions
- No doubt natural evolution and development
are processes that are pre-destined and
pre-decided. But when there are exceptions and
deformities, one could see some unforeseen
problems with development emerging.
- The major languages of India today are themselves
a product of what we can call a secondary
standardization model. They had before them the
models of Sanskrit, of course, but those
responsible for canon formation were also
exposed to the western tradition of creation of
standards. It is not a surprise to find our
writers modeling new genres (the novel, for
instance) either on the Kissa tradition or on the
European model. Gone are the days when we could
expect a language to discover its own model of
standard formation, as Sanskrit, Greek, Arabic
or Chinese did.
- If we develop this line of action, we would see
that the smaller languages of South Asia, which
must be brought to the level where they can be
used for primary and literacy education, would
need a jump start, rather than depending on a
Vidyasagar to canonize punctuation and spellings
or a Bankim or Tagore to bring them literary
prestige. One can do it, provided one uses the
advances made in the field of language
technology.
30Can Language Technology help?
- In 1969, a question of this kind came up in a
conference at the East-West Center, Hawaii: "Can
language be planned?" Many of us remember the
initial skepticism with which CLBP was received
in 1971.
- Of course, technology can help, provided we, as
scholars devoted to South Asian languages, make a
planned effort in this direction.
- The exercise assumes greater importance because
language development is related here also to the
development of the Other, the marginalized
speech groups.
31Where and how to use LT? Some tips
- Creation of school texts, following the model of
the shell-book project as in Papua New Guinea.
- Generation of a computational orthography that
does justice to the phonetic/phonological nature
of the given language, and linking it up with
UNICODE standards.
- Building up of large and annotated corpora with
appropriate visual and audio documentation.
- Setting up of techniques of glossary formation
based on such data, with automatic updating of the
tool whenever more data are added to the corpus.
- Linking it up, by using techniques of parallel
corpus, with the regional language/Hindi/English
that the child is to be exposed to from secondary
level in school.
- Creation of attractive language games based on
the models available in other speech communities
around, if they do not have such games developed
naturally.
- Building a bridge material with cassette courses
or radio courses (this is only to use the cheaper
options). If the EDUSAT project is launched
successfully, use of the television/DTH medium
with deeper penetration should also be tried out.
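Two of the tips above, a Unicode-linked orthography and a glossary that updates as the corpus grows, can be illustrated with a small Python sketch. It is only a minimal illustration under stated assumptions: the class name `Glossary`, its methods, and the sample strings are hypothetical and do not come from any existing tool; words are split on whitespace, which would need replacing with a proper tokenizer for a real orthography.

```python
import unicodedata
from collections import Counter


def normalize(text):
    """Normalize to Unicode NFC so that equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


class Glossary:
    """Hypothetical word-frequency glossary that is updated as text is added."""

    def __init__(self):
        self.counts = Counter()

    def add_text(self, text):
        # Assumption: whitespace separates words in the sample text.
        words = normalize(text).lower().split()
        self.counts.update(words)

    def entries(self, min_count=1):
        """Return glossary words seen at least min_count times, sorted."""
        return sorted(w for w, c in self.counts.items() if c >= min_count)


glossary = Glossary()
# The same letter typed two ways: "a" + combining circumflex, and precomposed.
glossary.add_text("a\u0302 \u00e2")
# Adding more corpus text later updates the glossary automatically.
glossary.add_text("new word")
print(glossary.entries())
```

Without the normalization step, the two spellings of the accented letter would be counted as different words; with it, they fall under a single glossary entry. The same idea applies to any script whose characters can be encoded in more than one equivalent way.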