WHAT LANGUAGE TECHNOLOGY CAN DO FOR INDIAN LANGUAGES - PowerPoint PPT Presentation

1
WHAT LANGUAGE TECHNOLOGY CAN DO FOR INDIAN
LANGUAGES
  • UDAYA NARAYANA SINGH

CIIL, Mysore FEB 5-7, 2005
SIMPLE 05 INAUGURAL LECTURE IIT-Kharagpur
2
Professor uses Language Technology to preserve
tribal languages
http://uanews.org/cgi-bin/WebObjects/UANews.woa/wa/MainStoryDetails?ArticleID=9400
  • In June last year, an internet news item popped
    up on Google that I think is important for all
    of us. It said:
  • Professor Susan Penfield, with support from CRIT
    (Colorado River Indian Tribes) and UofA, and with
    funding from the Bill & Melinda Gates Foundation,
    worked to preserve two Amerindian languages.
  • She trained Mohave and Chemehuevi tribal members
    in the use of software and internet tools that
    would support preservation of and instruction in
    these languages.
  • Such timely steps are important, as Mohave has 33
    fully fluent speakers, all in the 70-plus age
    group, and Chemehuevi has 10 speakers, all aged
    60 plus.

The moral? Language technology could be, and should
be, used for smaller languages.
3
Objectives of this presentation: Two-fold
  • I would like to draw your attention to two issues
    in my presentation here.
  • The first is a concern for linguists. It is a
    challenge for all practitioners of language
    management, namely, how do we develop smaller
    languages in a diverse space, a space where
    numbers seem to be an important category?
  • Secondly, given the profile of such smaller and
    lesser-known (and often least cared for)
    languages of South Asia, what can language
    technology do to alter their status?

4
Language Endangerment: Relevance of this talk
  • As the bio-mathematician Pagel tells us, language
    disappearance is happening at an alarming pace.
  • If only 10% of the world's current 6,000 smaller
    languages survive, we lose our heritage. The
    death of a speaker is understandable. The wiping
    out of an entire community is painful. And yet
    governments may think preservation is a waste of
    resources.
  • Sociolinguists tell us that in South Asia,
    somehow, language retention is more natural than
    language loss.
  • But even in India, 2nd/3rd generation migrants
    adopt other tonguesith them.
  • They realize, albeit late, that without their
    languages they are like lost persons, homeless
    and social rejects.
  • On the other hand, it is also true that meanwhile
    many new languages keep emerging through split
    and merger.

5
World-wide Decline: Canada
  • Over 60 languages were originally spoken, but 8
    (13%) were lost before 1990, and 13 more are
    near-extinct now (Kinkade 1991).
  • 23 languages in Canada (38%) are 'endangered'
    now, because they have few speakers under 50
    years of age.
  • Only 4 languages may survive in the long run.
  • Based on 1996 Canadian census data, Norris (1998)
    later shows that the number of Indigenous
    languages likely to survive is down further,
    to 3.

6
The North American Scenario: Indigenous people
now left in Eastern USA
  • For North America, the number of Indigenous
    languages originally spoken was around 300
    (Bright 1994; Mithun 1999).
  • Chafe (1962) counted 211 languages as still
    living in 1960.
  • Out of these, only 89 (42%) had speakers of all
    ages.
  • In 1991, 51 (approx. 24%) were alive (Zepeda and
    Hill 1991).
  • Campbell (1997: 16) says 80% of them 'will die in
    this generation'.
  • The prediction? Only 20-30 Indigenous languages
    will survive by 2040.

7
Australia: Disturbing Statistics
  • Out of about 300 languages in 1800, there has
    been a 90% decrease in speakers in all age groups.
  • The decline runs from 100% in 1800 to 13% in
    1996.
  • At this rate, by 2050 there will no longer be any
    Indigenous language in Australia.
  • In absolute terms, there may actually be 55,000
    speakers of Indigenous languages there.

ABS 1994 Survey of Aboriginal
8
A Real Danger
  • Although I do not want to sound like a prophet
    of doom, the facts about language attrition
    tell their own story.

"... this vastly reduced reservoir of linguistic
diversity constitutes one of the great
treasures of humanity, an enormous store-house of
expressive power and profound understandings of
the universe. The loss of the hundreds of
languages that have already passed into history
is an intellectual catastrophe in every way
comparable in magnitude to the ecological
catastrophe we face today." (Zepeda and Hill,
1991)

Zepeda, O. and J.H. Hill, 1991. The
Condition of Native American Languages in the
United States. In R.H. Robins and E.M. Uhlenbeck
(editors), Endangered Languages. Oxford:
Berg Publishers.
9
Losses: Smaller or bigger?
  • Many might say, "Let's forget about smaller
    losses, if they are inevitable, and rather
    concentrate on the bigger groups and larger
    issues." But for the smaller groups, their
    struggle for retention of identity is no less
    real. They are of paramount importance.
  • Many subaltern specialists or leaders often open
    their gates only to those who are into gender
    issues or take up the cause of cultural and
    religious minorities. The smaller linguistic
    groups are generally pushed to the periphery in
    all instances.
  • Whenever questions about smaller linguistic
    groups are raised, I've heard South Asianists
    ask, almost with a vengeance, "Are they
    dialects or languages?", meaning thereby that if
    they are dialects, why worry about them? If you
    persist, the next predictable question is
    "Do they have a script?" Even if we get past this
    question, the next one is "Are they taught in
    schools?" These appear to me to be the
    ever-shifting stands of the Higgle Piggle Di.
  • This reminds me of a story from Sukumar Ray's
    Ha-ja-ba-ra-la, where the protagonist
    confronts a menacing cat that changes its name
    all the time. Here goes the story:

10
We need to see the point that the problem here
cannot be wished away by changing/redesignating
labels. They remain where they are.
11
Which India should we talk about?
  • In a talk I gave at Saarbrücken in 2002, at a
    conference on "Peripheral centers and central
    peripheries", while talking about "Another
    India", I asked myself, "Which India shall I
    talk about (since there are many)?" It is
    still not possible to spell out the priorities
    that a language planner must have in the South
    Asian context.
  • Whenever we look at the space called South
    Asia from outside, we need to ask who is showing
    us what the space is like. If it is seen either
    through a cinematic extravaganza like Sanjay
    Leela Bhansali's Devdas, or through the texts
    woven by our writers who write or rewrite in
    English, the space appears like a conundrum.
  • The Indian language scenario may often appear
    like a universe plotted as a pastiche on a
    canvas, remote and diverse at the same time.
  • However, when we learn about the space through
    vernacular-language texts, it is an
    illuminating experience.
  • The real India of numerous smaller speech groups
    becomes even more strident if we are exposed to
    the texts that are generated outside the
    "twice-born" languages (to borrow Meenakshi
    Mukherjee's tag for the 8th Schedule languages).
  • Obviously, Indian English writing forces a large
    part of India to remain perpetually outside the
    focus. Our concern here is for those smaller
    languages. Other South Asian nations are
    probably no exception to this trend.

12
SOUTH ASIA: A COMPARATIVE CHART
13
Let's try to understand the spread of the problem
in India
  • The Scheduled Tribes, each with its own
    linguistic heritage, account for 67.76 million
    people, representing 8.08% of our population and
    living mainly in the forest and hilly regions
    (1991 Census).
  • More than 70% of them are in Madhya Pradesh,
    Maharashtra, Orissa, Bihar, Andhra Pradesh, West
    Bengal and Gujarat.
  • We need not only special provisions for their
    protection from social injustice and all forms
    of exploitation, but also concrete plans to
    cater to their educational interests.
  • It should be unnecessary to raise the issue of
    the mother tongue as the medium of primary and
    early secondary education after all the
    experiments conducted by a number of
    sociolinguists and psychologists of language,
    and especially after the 1954 UNESCO document
    was accepted by most countries. Yet many still
    ask whether they would not be better educated
    in English, in Hindi or in the language of the
    majority community in the region.

14
Provisions in India?
  • There is a general feeling among those who do not
    understand the Indian polity and the
    administrative set up that we do not have a
    mechanism in place to protect and promote these
    minor languages.
  • They often point to smaller countries like Nepal
    where both in Constitutional provisions and in
    Universal Education documents these issues are
    specifically mentioned.
  • However, in the case of India, the sheer size of
    the country and the complexity of the
    administrative set-up are such that the
    provisions that exist here (at least on paper)
    cannot be compared with those in other South
    Asian nation-states.

15
Some examples
  • The 5th and 6th Schedules of the Constitution
    (Article 244) made special provisions, but these
    are under the Home Ministry.
  • Special representation for the STs in the Lok
    Sabha and State legislative assemblies till 25th
    January, 2010 (Arts. 330, 332 and 334) was also
    provided.
  • Under Articles 164 and 338, separate State-level
    commissions and a National Commission at the
    Centre were set up to promote their welfare and
    safeguard their interests.
  • But the Ministry of Tribal Welfare, set up in
    October 1999, is the nodal agency.
  • The Commission for Linguistic Minorities
    (Allahabad) functions under the Ministry of
    Social Justice and Empowerment.
  • And now the UPA Government is contemplating
    setting up another kind of minority commission.
  • A Grant-in-Aid scheme under Article 275(1) was
    created
  • Protection of Civil Rights Act, 1955 and the
    Scheduled Castes, Scheduled Tribes (Prevention of
    Atrocities) Act 1989 were enacted
  • Planning Commission took a landmark step by
    opening 43 Special Multi-purpose Tribal Blocks
    (SMPTBs) during 2nd Five Year Plan, later called
    Tribal Development Blocks (TDBs)
  • Later, under the 4th Plan, six projects with
    Rs. 2 crores were set up in Andhra Pradesh,
    Bihar, Madhya Pradesh and Orissa, and a separate
    Tribal Development Agency was established.
  • Such steps have only multiplied in the later
    years.

16
A lot more needs to be done with concerted focus,
and we believe that in this context technology
can play a major role.
  • The Fifth Five Year Plan marked a shift in the
    approach when the Tribal Sub Plan (TSP) for
    direct benefit of the STs was launched
  • In 1987, the Tribal Cooperative Marketing
    Development Federation (TRIFED) was set up to
    provide marketing help to tribals for their minor
    forest produce and surplus agricultural produce.
  • The GIA scheme covered 376 NGOs working in this
    area, each getting about 90% of their subsistence
    and development grants.

BUT ARE ALL THESE ENOUGH?
17
There are a large number of non-tribal smaller
linguistic groups in India, and all of them, like
these tribal languages, need at least the
following:
  • Grammars
  • Graded learning materials
  • Literacy books
  • Dictionaries (general purpose)
  • Thesauri
  • Specialized/Technical Glossary
  • Cultural documentation
  • Primers
  • Style Manuals
  • Encouragements to initiate literary activities
But they are not the only ones
18
A picture of India that is closer to reality
  • 1,576 rationalized mother-tongues
  • 1,796 other mother-tongues
  • 114 languages with 10,000 plus speakers
  • Variation in size: from Hindi with 337 million
    to Maram of Manipur with only 10,144
  • 22 Constitutional languages
  • Large non-scheduled languages, e.g. Bhili with
    5.57 million speakers

19
  • 146 speech varieties used in the radio network
  • 69 used in schools
  • 3,954 newspapers in 35 languages as of 1971 (the
    figure had doubled by 2003, with Hindi (2,507),
    Urdu (534), English (407), and Marathi and Tamil
    (395 each) alone (4,238) surpassing the earlier
    figure)
  • 58 languages with a dwindling number of speakers
  • Highest literary prizes are awarded in 22
    languages
  • 96% speak only 20-odd Indo-Aryan and Dravidian
    languages
  • 14 major writing systems in use
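
The newspaper figures above can be checked with a little arithmetic. The following is a minimal Python sketch; the variable names are illustrative assumptions, and the numbers are the ones quoted on the slide.

```python
# Newspaper counts for 2003 as quoted on the slide.
# Marathi and Tamil are listed with 395 titles each.
newspapers_2003 = {
    "Hindi": 2507,
    "Urdu": 534,
    "English": 407,
    "Marathi": 395,
    "Tamil": 395,
}

# All newspapers across 35 languages in 1971, as quoted on the slide.
total_1971 = 3954

# Sum of the five largest language presses in 2003.
top_five_2003 = sum(newspapers_2003.values())

print(top_five_2003)               # 4238
print(top_five_2003 > total_1971)  # True
```

The five languages alone account for 4,238 titles, which matches the bracketed total on the slide and exceeds the 1971 figure for all 35 languages.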

20
The Real-time Divisions
  • India is geographically organized into 35
    States and Union Territories, and each of these
    large units has under it divisions at several
    levels:
  • 593 districts, further sub-divided into 5,564
    sub-districts.
  • To complicate matters, we don't use common
    terms for these units.
  • They go by many names: Tehsils or Talukas,
    Mandals (AP), Circles, C.D. Blocks (Bihar,
    Tripura, Meghalaya, W.B., Jharkhand), R.D. Blocks
    (Mizoram), Commune Panchayats (Pondicherry),
    Sub-divisions (Arunachal Pradesh and
    Lakshadweep), and even Police Stations (Orissa).
  • With 51 cities, 384 urban agglomerations and
    5,161 towns scattered all over, which occupy only
    an insignificant amount of our land space, the
    picture is further complicated.
  • Although states in India are supposed to be
    organized according to languages, each state has
    speakers of a number of minority languages, and
    many states have the same or a similar language
    profile.

21
Bilingualism Figures
  • Since bilingualism figures are available in the
    census (the 1931 census added a question on
    bilingualism, and the 1991 Census even had one on
    trilingualism), we see that the urban areas,
    especially the big cities and urban
    agglomerations, are places where the local or
    regional language(s) were as popular as other
    languages.
  • This is especially true of industrial townships
    or small towns with business hubs and commercial
    interests.
  • In 1951, besides the 14 languages recognized in
    the Indian Constitution, we find 23 major tribal
    languages and 24 other minority languages, each
    with over 100,000 speakers, and 722 other
    languages whose speakers numbered fewer than
    that.
  • India's national average of bilingualism in 1991
    (19.44%) was significantly higher than the
    average of 1961 (9.7%).
  • Compare them with the 1971 (13.04%) and 1981
    (13.44%) figures.
  • The 1991 figure for the average rate of
    trilingualism was 7.26%.
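
To make "significantly higher" concrete, the census averages above can be compared directly. This is a minimal Python sketch using only the four percentages quoted on the slide; the variable names are illustrative assumptions.

```python
# National bilingualism averages (percent of population), as quoted above.
bilingual_pct = {1961: 9.7, 1971: 13.04, 1981: 13.44, 1991: 19.44}

years = sorted(bilingual_pct)

# Change in percentage points between consecutive censuses.
changes = {
    (earlier, later): round(bilingual_pct[later] - bilingual_pct[earlier], 2)
    for earlier, later in zip(years, years[1:])
}

# How many times higher the 1991 average is than the 1961 average.
ratio_1991_to_1961 = bilingual_pct[1991] / bilingual_pct[1961]

for span, delta in changes.items():
    print(span, delta)
print(round(ratio_1991_to_1961, 2))
```

On these figures the rate roughly doubled between 1961 and 1991, with most of the rise falling in the 1961-71 and 1981-91 intervals and very little in 1971-81.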

22
Extent of multilingualism
  • When we look beyond 1951, the number of speakers
    of minority languages varies greatly from state
    to state, e.g. in Tripura, over 31% speak
    minority languages, but in Kerala the figure is
    only 3.4%. Nagaland and Arunachal Pradesh (the
    biggest groups being 14.4% and 19.9%,
    respectively) do not have a majority language as
    such.
  • 7 states - Kerala, Punjab, Gujarat, Haryana,
    A.P., U.P., H.P., Rajasthan, TN and WB - have
    negligible minor speech groups, with 85% speaking
    a major language.
  • And whatever we get as a minor language group
    usually figures elsewhere as a major speech
    community. Further, in six of these states, one
    could see a cluster of Urdu speakers as a
    minority speech group.
  • Let's look at state profiles here:

23
(No Transcript)
24
Differences between sets
  • Set B has Hindi, Oriya or Lushai as major
    languages spoken by over 75% of the people, but
    these are also the states where a large number of
    tribal communities use their own languages.
  • It is not surprising, therefore, that the first
    two (M.P. and Bihar) gave rise to two new states,
    Chhattisgarh and Jharkhand, respectively,
    exclusively for tribal speakers of numerous
    languages. Chhattisgarh is yet to find a lingua
    franca, and we will have to wait and see how
    Sadani/Sadari does in Jharkhand.
  • Set C states had their fair share of both
    pronounced and hidden language tensions.
    Konkani speakers in Goa had to fight a long
    battle to claim their own position, as their
    language was always branded a dialect of Marathi.
  • In Set D states, the linguistic tensions have
    been quite volatile, and that can be correlated
    with their profile.
  • Set E is the most variegated geo-space in India,
    with many tongues.

25
Typical problems
  • Languages like Konkani present yet another
    problem of South Asia, namely digraphia. Konkani
    is written in four scripts: in Roman and
    Devanagari in Goa, in Kannada in Karnataka and in
    Malayalam in Kerala. We have Punjabi (with
    Gurmukhi/Devanagari/Perso-Arabic), Sindhi and
    Kashmiri (both with Devanagari/Perso-Arabic
    systems) in the same category. Add Bodo and
    Santali to that list too.
  • On implementation of various decisions, problems
    crop up. The National Curriculum Framework for
    School Education: A Discussion Document (released
    Jan. 2000), while reviewing the Three Language
    Formula, states: "In a number of
    states/organizations/boards, however, the spirit
    of the formula has not been followed and the
    mother tongue of the people has been denied the
    status of the first language. Because of the
    changed socio-economic scenario, the difference
    between the second and the third languages has
    dwindled. Thus, in reality, there may be
    two second languages for all purposes and
    functions."
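
For a language written in several scripts, a corpus tool first has to know which script a given text is in. The following is a minimal Python sketch of one way to do that, using the Unicode character names exposed by the standard `unicodedata` module. The function name `dominant_script`, the script list and the sample spellings of "Konkani" are illustrative assumptions, not part of any project described in this talk.

```python
import unicodedata
from collections import Counter

# Scripts mentioned on this slide, matched against the prefix of each
# character's Unicode name (e.g. "DEVANAGARI LETTER KA").
# "LATIN" covers Roman, "ARABIC" covers Perso-Arabic letters.
SCRIPTS = ("DEVANAGARI", "KANNADA", "MALAYALAM", "LATIN", "GURMUKHI", "ARABIC")


def dominant_script(text):
    """Return the most frequent known script in text, or None."""
    counts = Counter()
    for ch in text:
        # Keep letters and combining marks (vowel signs, virama);
        # skip digits, spaces and punctuation.
        if not ch.isalpha() and not unicodedata.category(ch).startswith("M"):
            continue
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break
    return counts.most_common(1)[0][0] if counts else None


# Illustrative spellings of the word "Konkani" in four scripts.
samples = ["Konkani", "कोंकणी", "ಕೊಂಕಣಿ", "കൊങ്കണി"]
for sample in samples:
    print(sample, dominant_script(sample))
```

Matching on character names is a simplification; a production tool would more likely rely on the Unicode Script property, which the Python standard library does not expose directly.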

26
(No Transcript)
27
The hidden tussle
  • This tussle is also evident among smaller
    languages.
  • If we take the entire Indian sub-continent, the
    smallest group happens to be the speakers of
    Austro-Asiatic languages, who make up
    approximately 0.7% of the population (still about
    6.5 million people).
  • These groups earlier had no states to call their
    own, whereas many Tibeto-Burman groups had.
  • This status has changed with the formation
    of three new states, Jharkhand, Chhattisgarh and
    Uttaranchal, in 2000. But it took quite a while
    to make these alterations.
  • The comparison among them becomes unavoidable, as
    there are many non-scheduled languages with a
    large number of speakers, e.g. see this chart:
  • Even though a large percentage of all Indians
    speak one or another major Indian language, and
    even if some form of Hindi is understood by close
    to 60%, there are still many other languages with
    a long literary history, a grammatical and
    lexicographical tradition and a rich literary
    heritage.
  • All these languages are still in use in all
    modern means of communication.
  • As a result, although the official language of
    India is Hindi, there is always a hidden tussle,
    as well as open confrontation, involving
    supporters of Hindi as an official language, who
    mostly oppose the use of English.
  • On the other hand, supporters of the regional
    languages look to English as an alternative link
    between the Indian states.

28
(No Transcript)
29
Required Planned Interventions
  • No doubt natural evolution and development
    are processes that are pre-destined and
    pre-decided. But when there are exceptions and
    deformities, one can see some unforeseen
    problems with development emerging.
  • The major languages of India today are themselves
    a product of what we can call a secondary
    standardization model. They had before them the
    models of Sanskrit, of course, but those
    responsible for canon formation were also
    exposed to the western tradition of creating
    standards. It is no surprise to find our
    writers modeling new genres (the novel, for
    instance) either on the Kissa tradition or on the
    European model. Gone are the days when we could
    expect a language to discover its own model of
    standard formation, as Sanskrit, Greek,
    Arabic or Chinese did.
  • If we develop this line of thought, we see
    that the smaller languages of South Asia, which
    must be brought to the level where they can be
    used for primary and literacy education, will
    need a jump start, rather than depending on a
    Vidyasagar to canonize punctuation and spelling
    or a Bankim or Tagore to bring them literary
    prestige. One can do it, provided one uses the
    advances made in the field of language
    technology.

30
Can Language Technology help?
  • In 1969, we saw this kind of question come up
    at a conference at the East-West Center,
    Hawaii: "Can language be planned?" Many of us
    remember the initial skepticism with which CLBP
    was received in 1971.
  • Of course technology can help, provided we, as
    scholars devoted to South Asian languages, make a
    planned effort in this direction.
  • The exercise assumes greater importance because
    language development here is also related to the
    development of the Other, the marginalized
    speech groups.

31
Where and how to use LT? Some tips
  • Creation of school texts, following the model of
    the shell-book project in Papua New Guinea.
  • Generation of a computational orthography that
    does justice to the phonetic/phonological nature
    of the given language, and linking it up with
    UNICODE standards.
  • Building up of large, annotated corpora with
    appropriate visual and audio documentation.
  • Setting up of techniques of glossary formation
    based on such data, with automatic updating of
    the tool whenever more data are added to the
    corpus.
  • Linking it up, using parallel-corpus techniques,
    with the regional language/Hindi/English
    that the child is to be exposed to from secondary
    level in school.
  • Creation of attractive language games based on
    the models available in other speech communities
    around, if they do not have such games developed
    naturally.
  • Building bridge material with cassette courses
    or radio courses (this is only to use the cheaper
    options). If the EDUSAT project is launched
    successfully, use of the television/DTH medium,
    with its deeper penetration, should also be tried
    out.
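
The tip about glossary formation with automatic updating can be sketched in a few lines. The following is a minimal Python illustration, not a description of any existing tool: the `Glossary` class, the `tokenize` helper and the sample tokens are hypothetical, and a real system would add morphological analysis, glosses and links to audio documentation.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Split on whitespace and strip Unicode punctuation from each token."""
    tokens = []
    for raw in text.split():
        # Drop punctuation (Unicode categories starting with "P"),
        # keeping letters and combining marks such as vowel signs.
        word = "".join(
            ch for ch in raw if not unicodedata.category(ch).startswith("P")
        )
        if word:
            # Normalize so that equivalent encodings count as one word.
            tokens.append(unicodedata.normalize("NFC", word).lower())
    return tokens


class Glossary:
    """Word list with frequencies, refreshed whenever corpus text is added."""

    def __init__(self):
        self.counts = Counter()

    def add_text(self, text):
        # Adding data to the corpus updates the glossary immediately.
        self.counts.update(tokenize(text))

    def entries(self, min_count=1):
        # Alphabetical list of words seen at least min_count times.
        return sorted(w for w, n in self.counts.items() if n >= min_count)


# Placeholder tokens standing in for transcribed field data.
glossary = Glossary()
glossary.add_text("pani ghar pani.")
glossary.add_text("Ghar, gaon")
print(glossary.entries())
print(glossary.entries(min_count=2))
```

Splitting on whitespace rather than on a regular-expression word pattern is a deliberate choice here: in Python, `\w` does not match combining vowel signs, so a pattern-based tokenizer would break words written in Indic scripts.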