Sanskrit and Natural Language Processing

1
Sanskrit and Natural Language Processing
  • Dr. Srinivasa Varakhedi
  • Center for Advanced Studies and Research in
    Shabdabodha and NLP
  • RASHTRIYA SANSKRIT VIDYAPEETHA
  • DEEMED UNIVERSITY
  • Tirupati (A.P.)

2
Dream of a bee..
  • रात्रिर्गमिष्यति भविष्यति सुप्रभातं
  • भास्वानुदेष्यति हसिष्यति पङ्कजश्रीः
  • इत्थं विचिन्तयति कोशगते द्विरेफे
  • हा हन्त हन्त नलिनीं गज उज्जहार

3
Present situation of Sanskrit
  • Sanskrit colleges are like a 'zoo'!
  • No Govt. support unless we are productive
  • Humanities and languages are being neglected
  • How long will this support continue?
  • Great tradition of learning is being lost
  • No scope for novel research

4
Innovation is the key
  • Sanskrit Shastras are competent enough to enter
    the world of science
  • Move out of the Humanities and merge with
    science
  • Analogy: Maths, Psychology, Logic
  • We must find a practical approach for these
    Sanskrit sciences

5
we have lost 80
  • Meemamsa - No practical approach!
  • Nyaya - No use in modern dialectics?
  • Vyakarana - No application??
  • What to do?

6
Relevance of Sanskrit Shastras in Modern
Technology
  • Fortunately, these shastras are found relevant to
    today's technology
  • Computing ideas in Panini
  • Text processing principles in Meemamsa
  • Formal languages in Nyaya
  • We lack the technology and the application area
  • Story of Babbage!

7
Message of Acharya Shankara Bhagavatpada
  • avidyayaa mrtyum tiirtvaa..
  • vidyayaa amrtamashnute.. - Ishavasya
    Upanishad
  • Sri Shankara Bhagavatpada comments on this:
  • avidyaa = karma; vidyaa = knowledge

8
Opportunity
  • Emerging information technology has provided a
    great opportunity to survive
  • MÉÞàþÒªÉÉiÉ ÊiÉxiÉÞhÉÒÉÉJÉÉÆ ÊÉOÉÖÉÉJÉÉOɽäþhÉ
    ÊEòÉÂ ?
  • Solve a major contemporary problem, such as MT,
    on the basis of the shastras
  • Get new openings for Sanskritists
  • Open a new avenue for research

9
Know How
  • Ultimate aim: finding an appropriate place for
    the Sanskrit Shastras
  • Method: solutions to contemporary problems,
    adopting modern technology
  • Resource needed: adequate manpower to act as a
    bridge between modern scientists and
    technologists on one side and Sanskrit scholars
    on the other

10
Change the scenario
  • Technology
  • Western Theories
  • INDIAN THEORIES

11
Opportunities missed
  • Industrial revolution
  • We missed this through some hasty decisions
  • IT revolution
  • Indians are serving at the coding level, not
    at the design level!
  • Knowledge revolution
  • We should take advantage of this one

12
Need of the hour
  • We need
  • to understand how technology works
  • to understand the contemporary problems
  • Then
  • we will be able to give solutions in the light of
    the shastras and show the relevance of Indian
    theories

13
History and Progress
  • A conference held at Bangalore in Dec 1987 on
    Knowledge Representation and Sanskritam
    generated tremendous interest
  • Not much has been achieved since, except some
    small-scale efforts and projects here and there,
    and those only in technical institutions
  • Time is running out! What progress has been made
    since then?

14
Complexity of the problem
  • Different goals: the two disciplines, Technology
    and Shastras, developed in different contexts
  • Paradigm difference: modern scholars are
    accustomed to visual teaching methods; traditional
    Pandits, on the other hand, prefer the oral
    tradition
  • Language barrier: neither side understands the
    other's language!
  • Tuning in to the dialogue will take time

15
Who will bell the cat?
  • It needs a long interaction between technologists
    and traditional Sanskrit scholars
  • Technical institutions are always ready for such
    activities
  • Not much interest is seen in Sanskrit
    institutions
  • It is we Sanskritists who should bell the cat

16
Long process like extraction of ghee from milk
  • No miracle happens in the initial stage
  • It's a big challenge; one or two persons are not
    enough
  • We need hundreds of dedicated persons to achieve
    even a small goal
  • A person can climb a small hill; a team can climb
    Everest

17
Identifying the problem
  • Analogy: Brahman in the Upanishads
  • What is Brahman?
  • We can NOT show it, as it is imperceptible.
  • We can NOT describe it, as it is beyond words.
  • Hence,
  • we can direct you towards it by negating
    what we know.
  • (अपोह) - शाखाचन्द्र-अरुन्धतीन्याय

18
Platform For Innovation
  • To achieve this, Rashtriya Sanskrit Vidyapeetha has
    set up a new innovative centre for advanced
    study and research in shabdabodha and language
    technology
  • The centre has faculty from shabdabodha (Nyaya,
    Vyakarana, Meemamsa), NLP and computer science
  • The centre has a full-fledged computer lab

19
Possible areas
  • Machine Translation
  • Speech Processing
  • Summary Extraction from huge texts
  • Indo Wordnet as a base for IL-wordnets
  • Developing Tools for IL Researchers
  • Knowledge Representation schemes

20
Machine Translation
  • English To Indian Languages
  • Word sense disambiguation
  • Karaka Syntax Relation
  • Word-grouping
  • Idiomatic Expression
  • Shabdasutra
  • MT among Indian Languages
  • Bi-language Electronic Dictionaries
  • Karaka Vibhakti Relation

21
Major MT systems
  • India
  • Angla-Bharati, IIT Kanpur
  • Shakti, IIIT Hyderabad
  • Mantra, CDAC Pune
  • SaHiT (Sanskrit Hindi Translator), CSS, JNU
  • Anusaaraka (RSV, HCU, IIIT)

22
Major MT systems
  • Outside India
  • UNITRAN
  • BabelFish AltaVista (Systran)
  • ATR (bimodal, Japan)
  • JANUS (bimodal, US-Germany)
  • SLT (SRI, Cambridge)
  • VERBMOBIL (Germany)
  • DIPLOMAT (Carnegie-Mellon)
  • Get a 125-page directory of available MT systems
    at
  • http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf

23
Summary Extraction
  • Meemamsa principles are applied to extract the
    summary of a text
  • The Upakramaadi Tatparya Lingas are used to extract
    the summary of a text at the Indian Institute of
    Science, Bangalore, with our consultancy.
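
A minimal sketch of how such indicators might drive extractive summarisation. It is an illustration only, not the system built at the Indian Institute of Science. It assumes that just two of the traditional tatparya lingas are modelled: upakrama-upasamhara (agreement of the opening and the conclusion) and abhyasa (repetition). The function names, the stopword list and the weights are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "is", "are", "and", "to", "in", "it"}

def tokenize(sentence):
    """Lowercase a sentence and return its content words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]

def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    # abhyasa (repetition): in how many sentences does each word occur?
    doc_freq = Counter(w for t in tokens for w in set(t))
    # upakrama-upasamhara: words shared by the opening and the conclusion.
    frame = set(tokens[0]) & set(tokens[-1])

    def score(words):
        unique = set(words)
        repetition = sum(doc_freq[w] for w in unique if doc_freq[w] > 1)
        framing = 2 * len(frame & unique)  # assumed weight
        return repetition + framing

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokens[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

text = [
    "Knowledge leads to liberation.",
    "The teacher tells a story about a king.",
    "The king asks about knowledge.",
    "Thus knowledge alone leads to liberation.",
]
summary = summarize(text, k=2)
print(summary)
```

Here the opening and closing sentences are selected because they share the theme words and those words recur in the body. The remaining lingas (apurvata, phala, arthavada, upapatti) would need semantic resources that this sketch does not attempt.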

24
Wordnet / Concept-net based on NN ontology
  • Wordnet is an electronic lexical reference
    system designed on the basis of the semantic
    relations between words
  • Synonymy: griha, nivaasa, ...
  • Hypernymy: amra, vriksha, vanaspati
  • Antonymy: shreemaan, akinchana
  • Meronymy: nAsika, mukha, shariira, ...
  • Gradation: shushka, -tara, -tama
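
The relations listed on this slide can be sketched as a small labelled graph. This is a toy illustration, not the design of any existing wordnet; the class name `MiniWordnet`, its methods and the relation labels are assumptions made for the example, and the entries are the words quoted on the slide.

```python
from collections import defaultdict

class MiniWordnet:
    """A toy lexical network: words linked by named semantic relations."""

    def __init__(self):
        # relation name -> source word -> set of target words
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, relation, source, target, symmetric=False):
        self.relations[relation][source].add(target)
        if symmetric:
            self.relations[relation][target].add(source)

    def related(self, relation, word):
        return sorted(self.relations[relation][word])

    def chain(self, relation, word):
        """Follow one relation upward until no further link exists."""
        path = []
        while True:
            targets = self.related(relation, word)
            if not targets or targets[0] in path:
                return path
            word = targets[0]
            path.append(word)

wn = MiniWordnet()
wn.add("synonym", "griha", "nivaasa", symmetric=True)
wn.add("antonym", "shreemaan", "akinchana", symmetric=True)
wn.add("hypernym", "amra", "vriksha")        # a mango tree is a tree
wn.add("hypernym", "vriksha", "vanaspati")
wn.add("part_of", "nAsika", "mukha")         # the nose is part of the face
wn.add("part_of", "mukha", "shariira")

print(wn.chain("hypernym", "amra"))
print(wn.chain("part_of", "nAsika"))
```

Storing each relation as its own adjacency map keeps symmetric relations (synonymy, antonymy) and directed ones (hypernymy, meronymy) in the same structure.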

25
Sanskrit Corpus
  • Annotating the relation in Sanskrit Texts
  • Tagging Samasas
  • Identifying the topics of the texts
  • Make Sanskrit texts available, along with simple
    translations, on the web and in CD R form
  • Statistical analysis of Sanskrit Texts

26
Knowledge Engineering
  • Representation
  • For data representation, several database
    management systems are available.
  • For representing and retrieving useful
    information, there are various well-worked-out
    methodologies
  • Finally, knowledge representation needs special
    treatment, and here Indian knowledge systems can
    be applied

27
Knowledge and its importance in AI
  • AI researchers are interested in building
    Intelligent systems
  • Web technologies are looking forward to a Semantic
    Web instead of a syntactic web
  • Knowledge is more valuable than data and
    information
  • Data: a simple DoB. Information: the age
    calculated from it.
  • Knowledge: the judgment about suitability for the
    job at hand, etc. This requires a lot of inputs
    from various K-sources.
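
The data / information / knowledge distinction on this slide can be made concrete with a short sketch. The function names, the age limits and the "required skill" input are hypothetical; they stand in for the "inputs from various K-sources" that a real judgment would need.

```python
from datetime import date

def age_on(dob, today):
    """Information: age in whole years, derived from the raw date of birth."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - before_birthday

def suitable_for_job(age, min_age, max_age, has_required_skill):
    """Knowledge: a judgment that combines the age with other inputs."""
    return min_age <= age <= max_age and has_required_skill

dob = date(1990, 6, 15)                    # data
age = age_on(dob, date(2024, 6, 14))       # information
verdict = suitable_for_job(age, 21, 35, has_required_skill=True)  # knowledge
print(dob, age, verdict)
```

The date of birth alone answers nothing; the age answers one question; the verdict needs the age plus criteria that come from outside the record.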

28
Computational Linguistics and Panini's Grammar
  • The structure of Paninian grammar is nothing but
    a computer program - Babbage!
  • It captures the basic universal principles
    of all languages
  • CL requires formal rules for the analysis and
    generation of language
  • Slowly, Chomsky and others are turning towards
    Panini

29
The System of Panini
  • Phonetic component
  • Phonemes
  • pratyahara
  • Rule base
  • Vidhi (operations)
  • Samjna
  • paribhasha (metarules)
  • adhikara (headings)
  • atidesha (extension)
  • niyama (restriction)
  • Lexicon
  • Dhatupaatha
  • Ganapaatha
  • Lists
  • Affixes
  • Rule specific items
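
The pratyahara device in the phonetic component lends itself to a short sketch. The fourteen Shiva Sutras are encoded as (sounds, end marker) pairs, and a pratyahara such as aC is expanded by collecting every sound from the start sound up to the named marker. The encoding (IAST letters, consonants written without the inherent vowel, uppercase markers) and the function name are choices made for this example. The marker Ṇ occurs twice in the sutras; the sketch assumes the first matching marker after the start sound is meant.

```python
# The fourteen Shiva Sutras as (sounds, end marker) pairs.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "Ṇ"),
    (["ṛ", "ḷ"], "K"),
    (["e", "o"], "Ṅ"),
    (["ai", "au"], "C"),
    (["h", "y", "v", "r"], "Ṭ"),
    (["l"], "Ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "M"),
    (["jh", "bh"], "Ñ"),
    (["gh", "ḍh", "dh"], "Ṣ"),
    (["j", "b", "g", "ḍ", "d"], "Ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "V"),
    (["k", "p"], "Y"),
    (["ś", "ṣ", "s"], "R"),
    (["h"], "L"),
]

def pratyahara(start, marker):
    """Expand a pratyahara: all sounds from `start` up to `marker`."""
    collected = []
    started = False
    for sounds, end in SHIVA_SUTRAS:
        for sound in sounds:
            if sound == start:
                started = True
            if started and sound not in collected:
                collected.append(sound)
        if started and end == marker:
            return collected
    raise ValueError(f"no pratyahara {start}{marker}")

vowels = pratyahara("a", "C")       # aC: all vowels
semivowels = pratyahara("y", "Ṇ")   # yaṆ
consonants = pratyahara("h", "L")   # haL: all consonants
print(vowels, semivowels, len(consonants))
```

Two symbols thus name a whole sound class, which is why the rule base can stay compact: a rule refers to iK or yaṆ instead of listing the sounds.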

30
Paninian Model for Sentence Analysis
  • Action: the central theme
  • Karakas: syntactico-semantic roles
  • Visheshana-Visheshyabhava
  • Concept of anabhihite in switching to a different
    voice
  • Vivakshaa: the intention of the speaker
  • Form and meaning
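
The anabhihite idea can be sketched as a lookup: a karaka that the verb ending already expresses (abhihita) takes the first case, while an unexpressed one takes its default case ending. The table below follows the standard default assignments of Paninian grammar; the dictionary and function names, and the restriction to two voices, are simplifications made for the example.

```python
# Default vibhakti for a karaka that the verb does NOT already express.
VIBHAKTI_WHEN_UNEXPRESSED = {
    "karta": "tritiya (instrumental)",
    "karma": "dvitiya (accusative)",
    "karana": "tritiya (instrumental)",
    "sampradana": "chaturthi (dative)",
    "apadana": "panchami (ablative)",
    "adhikarana": "saptami (locative)",
}

# Which karaka the verb ending expresses in each voice.
EXPRESSED_BY_VERB = {"kartari": "karta", "karmani": "karma"}

def vibhakti_for(karaka, voice):
    """Pick the case ending for a karaka in the given voice."""
    if EXPRESSED_BY_VERB[voice] == karaka:
        return "prathama (nominative)"
    return VIBHAKTI_WHEN_UNEXPRESSED[karaka]

for voice in ("kartari", "karmani"):
    print(voice, {k: vibhakti_for(k, voice) for k in ("karta", "karma")})
```

The karaka roles stay the same across the two voices; only their surface case marking switches, which is what makes karakas a useful intermediate layer between form and meaning.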

31
Navya Nyaya -> AI?
  • Classify Nyaya into five parts ..
  • 1. Ontology
  • 2. Epistemology
  • 3. Technical Language
  • 4. Semantics
  • 5. Art of debate and fallacies

32
Ontology
  • Includes
  • Categories: substance, quality, etc.
  • Relations: samavAya, svarUpa
  • Universals: types or classes
  • Ontology helps in various areas such as NLP,
    K-Repr and K-Engg, and especially in the
    cognitive sciences.

33
Epistemology
  • Deals with
  • Cognitive process
  • Cognitive structure
  • It helps to solve problems in the cognitive
    sciences and K-Repr.

34
Technical Language
  • NNL is a restricted language that has both
    features: the mechanical power of artificial
    languages and the expressive power of natural
    languages.
  • The basic ideas behind this language will be
    helpful in knowledge representation.

35
Semantics
  • The way of analysing semantics shown by the Navya
    Naiyayikas has been found crucially helpful in
    NLP and Machine Translation
  • E.g. classification of words: rUdha, yoga
  • Syntactical analysis
  • Power of definitions
  • KR and NN

36
Semantics in MT
  • Lexicography
  • Word/concept nets based on NN ontology
  • Classification of padas (words)
  • Rudha: a word that has a conventional meaning,
    i.e. names
  • Yaugika: a word that has an etymological meaning,
    e.g. cook, driver
  • Yoga-rudha: a word that has etymology as well as
    convention, e.g. CD-driver
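
A lexicon entry for MT could record this three-way classification directly. The sketch below is one possible representation, not a description of any existing dictionary format; the class `Pada`, its fields and the English glosses are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pada:
    form: str
    etymological: Optional[str] = None   # meaning built from the parts (yoga)
    conventional: Optional[str] = None   # meaning fixed by usage (rudhi)

    def kind(self):
        if self.etymological and self.conventional:
            return "yoga-rudha"
        if self.etymological:
            return "yaugika"
        return "rudha"

    def meaning(self):
        # Where both exist, convention restricts the etymological sense.
        return self.conventional or self.etymological

lexicon = [
    Pada("Rama", conventional="a proper name"),
    Pada("driver", etymological="one who drives"),
    Pada("CD-driver", etymological="that which drives a CD",
         conventional="a device that reads CDs"),
]
for pada in lexicon:
    print(pada.form, pada.kind(), pada.meaning())
```

For translation, the distinction tells the system when a word may be rendered part by part (yaugika) and when it must be looked up as a unit (rudha, yoga-rudha).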

37
WSD using different techniques
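  • Definitions of the karaka relations without any
    overlap
  • Kartrtvam: kriyAnukUla-kritimattvam (being the
    locus of the effort conducive to the action)
  • Karmatvam:
    para-samaveta-kriyA-janya-phala-Ashrayatvam
    (being the locus of the result produced by an
    action that inheres in another)
  • Going: Rama and the forest
  • Who is going where?
  • The result, contact, is possible in Rama too
  • To avoid such overlap, this definition is useful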
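
The point of the refined definition can be shown in a few lines. A naive test ("karma is whatever bears the result") picks out both Rama and the forest, since contact resides in both; adding the condition that the action must inhere in something else leaves only the forest. The function names and the way the sentence is encoded are assumptions made for this sketch.

```python
def naive_karma(result_loci):
    """Overlapping definition: every locus of the result counts as karma."""
    return list(result_loci)

def karma(action_locus, result_loci):
    """Refined definition: a locus of the result of an action that inheres
    in something other than that locus."""
    return [p for p in result_loci if p != action_locus]

# "Rama goes to the forest": the going inheres in Rama; its result,
# contact, resides in both Rama and the forest.
going = {"action_locus": "Rama", "result_loci": ["Rama", "forest"]}

print(naive_karma(going["result_loci"]))
print(karma(going["action_locus"], going["result_loci"]))
```

A role labeller built on the second definition never assigns karta and karma to the same participant, which is the non-overlap the slide asks for.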

38
Refinement of karaka Relations
  • Classification of Karma
  • Karma: reachable, understandable and so on
  • Analysis of root semantics
  • Leave: He left the place / left from the place
  • Analysis of expectancy (AkAnkshA)
  • Rats killed cats

39
To-infinitive relation
  • I stand up to speak
  • I want to speak
  • He goes to London to study law
  • He wants to study law in London
  • To walk in the morning is good for health

40
Computer as a Tool
  • Story of Greek research
  • Not only the sciences but also humanities subjects
    benefit from the aid of computers
  • We can use computers
  • to improve our teaching methods
  • to improve the quality of research

41
Power of computers
  • Memory: store any amount of data on discs
  • Speed: process and access it fast
  • Search
  • Replace / Edit / Add
  • Get statistical info
  • Create hyperlinks
  • Present it in a better way
  • Produce it several times at less cost
  • Distribute it in easy ways

42
Sansk - Net
  • A gigantic online electronic library of Sanskrit
    works
  • More than 500 works (3,00,00,000 pages of
    E-content)
  • www.sansknet.ac.in
  • Dhaturatnakara is available on the web. It can be
    accessed at http://sanskrit.nic.ac.in

43
CD R Production
  • Paniniya Udaharanakosha is now available in CD
    form
  • 'Koshas' will be made available in CD form:
    Vachaspatyam, Sabdakalpadruma
  • Dhaturatnakara: all the forms of all roots will
    be made available on CD R.
  • Morphological analyzer for Sanskrit

44
Valmiki Ramayana on NET
  • Valmiki Ramayana moolam in all Indian scripts
  • Audio recording
  • Translation in five foreign languages
  • Eight Sanskrit commentaries
  • English translation and commentaries
  • Summary, glossary
  • Beautiful picture gallery
  • http://www.rsvpramayana.ac.in

45
Machine translation
  • English to Sanskrit
  • Circular translation
  • English Sanskrit dictionaries
  • Sanskrit wordnet

46
Sanskrit readers (accessors)
  • Ramayana accessor
  • Bhagavathgeeta reader
  • Nyaya Classics Reader
  • Vyakarana Reader

47
Sanskrit language processing tools
  • Sandhi concatenator - Ready
  • Morphological analyser - Hosted on the web
  • Sandhi splitter - In progress
  • Samasa tag interpreter - Ready
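
To give a flavour of what a sandhi concatenator does, here is a minimal sketch that joins two words written in IAST and applies three vowel sandhi rules at the boundary: savarna-dirgha (a + a -> ā), guna (a + i -> e, a + u -> o) and yan (i + a -> ya, u + a -> va). It is not the tool listed above. The rule table is deliberately tiny, the function name is hypothetical, and everything else (diphthongs, vocalic ṛ, consonant and visarga sandhi) is left unchanged and simply concatenated.

```python
import unicodedata

# (final vowel of first word, initial vowel of second word) -> replacement
VOWEL_SANDHI = {}
for a in ("a", "ā"):
    for b in ("a", "ā"):
        VOWEL_SANDHI[(a, b)] = "ā"        # savarna-dirgha
    for b in ("i", "ī"):
        VOWEL_SANDHI[(a, b)] = "e"        # guna
    for b in ("u", "ū"):
        VOWEL_SANDHI[(a, b)] = "o"        # guna
for b in ("a", "ā"):
    for i in ("i", "ī"):
        VOWEL_SANDHI[(i, b)] = "y" + b    # yan
    for u in ("u", "ū"):
        VOWEL_SANDHI[(u, b)] = "v" + b    # yan

DIPHTHONGS = ("ai", "au")

def join(first, second):
    """Join two IAST words, applying a boundary vowel rule if one matches."""
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)
    if not first or not second:
        return first + second
    # Diphthongs are not handled by this sketch; leave such pairs unchanged.
    if first.endswith(DIPHTHONGS) or second.startswith(DIPHTHONGS):
        return first + second
    key = (first[-1], second[0])
    if key in VOWEL_SANDHI:
        return first[:-1] + VOWEL_SANDHI[key] + second[1:]
    return first + second

for pair in [("deva", "indra"), ("sūrya", "udaya"),
             ("vidyā", "ālaya"), ("iti", "api")]:
    print(pair, "->", join(*pair))
```

A sandhi splitter has to run such rules in reverse, where one surface form may have several possible splits; that asymmetry is one reason splitting is the harder of the two tools.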

48
Future Projects
  • Text-to-speech for Sanskrit texts
  • High quality search engine for Sanskrit E-library
  • Hypertext archive for Sanskrit Literature

49
Dream Projects
  • Paninian Grammar for English (MT)
  • Ground work is done
  • A national symposium was conducted
  • Validity checking of the Paninian system through
    computing
  • Basic teaching material is ready
  • Sanskrit Wordnet
  • Prototype project is undertaken by a student

50
Namaste!
Thank you
  • Special thanks to
  • The authorities of
  • Sri Chandrashekharendra Sarasvati
    Vishvamahavidyalaya
  • Kanchipuram