Internationalization & Localization

Slides: 55
Provided by: johnck

Transcript and Presenter's Notes

1
Internationalization, Localization & Unicode
  • Karunesh Arora
  • Vijay Gugnani
  • C-DAC Noida

2
Everyone has the right... to seek, receive and
impart information and ideas through any media
regardless of frontiers -- Universal
Declaration of Human Rights
3
Internationalization
  • Internationalization, often abbreviated as
    i18n, is the practice of designing and
    developing an application, product or document in
    a way that makes it easily localizable for target
    audiences that vary in culture, region, or
    language.

4
Why Internationalization?
  • To remove barriers to local and international
    access
  • Adaptation to local, regional, linguistic or
    cultural needs.
  • To provide global reach
  • ROI, Revenue generation

5
Internationalization Vs. Localization
  • Localization is the actual adaptation to meet
    the language, cultural, and other requirements
    of a specific target audience.
  • While internationalization gives us the
    technology and tools to target a given audience,
    it is the act of localization that makes the
    product accessible.

6
What goes with localization?
  • Localization is much more than translation.
  • Specifically, localization refers to adaptation
    to another language, which involves appropriate:
  • Language translation
  • Locale transformation and cultural aspects

7
Language Translation
Languages and Countries
  • Most languages are used in many countries, not
    just those where they are dominant or official
  • People migrate and take languages with them
  • Over enough time, most languages evolve
    differently in different locations

8
Scripts and Languages
Language Translation
  • A script may be defined as a collection of
    related characters
  • It is common for several languages to share most,
    but not all, characters from a given script
  • Scripts are often given the same name as one of
    the languages that use them
  • e.g. the Arabic script, used by the Arabic, Farsi,
    Urdu, ... languages
  • Other scripts have a name of their own that is
    shared by a group of languages
  • e.g. the Devanagari script for Hindi, Marathi,
    Nepali, Konkani etc.

9
Language Translation
Some Points to consider
  • Identify Translatable and Non-translatable
    strings
  • Gender and number agreement, ordering of segments
    in a sentence
  • e.g. Page number ->
  • e.g. Number of pages ->
  • Many languages can take at least 30% more
    space Tool
  • ????? (HI) ?????? - customer (EN)
  • Design should be compatible, or else the UI may
    have to be redesigned
  • Narrow columns often cannot accommodate long
    Target language equivalent words
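
The point about ordering of segments can be sketched in code. The example below is a minimal illustration, assuming a hypothetical in-memory message catalog (the `CATALOG` name, its keys and its strings are not from the presentation). Each translation is stored as a whole template with a named placeholder, so the translator, not the programmer, decides where the number goes.

```python
# Hypothetical message catalog: the keys, locale codes and strings are
# illustrative assumptions, not part of the original slides.
CATALOG = {
    "en": {"page_number": "Page {number}"},
    "ja": {"page_number": "{number}ページ"},  # the number precedes the word here
}


def translate(locale_code: str, key: str, **values) -> str:
    """Look up a whole-sentence template and fill its named placeholders."""
    template = CATALOG[locale_code][key]
    return template.format(**values)


if __name__ == "__main__":
    print(translate("en", "page_number", number=3))
    print(translate("ja", "page_number", number=3))
```

Building the same message by concatenating "Page " and the number in code would fix the English word order for every language, which is the problem the slide warns about.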

10
Language Translation
Some Points to consider Contd.
  • Avoid ambiguous phrases
  • e.g. "Display options" can be read two ways
  • Options of the display (as Noun + Noun)
  • Show the options (all of them) (as Verb + Noun)
  • Proverbs and metaphors may not have equivalents
    in target language
  • Keep Web pages and paragraphs short.
  • Avoid text in graphics.
  • Use simple grammatical structures.
  • Use everyday language.
  • Provide clues.

11
Language Translation
Some Points to consider Contd.
  • Follow source language conventions.
  • Avoid acronyms.
  • Abbreviations may have to be expanded when
    translated
  • Check spelling and grammar.
  • The more compact the source writing, the longer
    the translation
  • Brief translators about the purpose and target
    audience
  • All items in a menu or set of check boxes should
    have the same grammatical structure

12
Locale
  • A set of parameters that defines the user's
    language, country and cultural preferences

13
Different aspects of locale
  • Names & titles
  • Calendars
  • Numeric, date and time formats, addresses
  • Currencies, paper size, weights & measures
  • Input mechanism
  • Language selection
  • Oral pronunciation

14
Titles and Names
  • In India, it is required to specify titles (...
    etc.)
  • These titles do not necessarily translate
  • Family name is not always last (in the south-west
    part of the country)
  • Sorting can be based on last name or first name
  • Salutations in letters (e.g. Dear) are different
    in different locales
15
Titles and Names
Source: Delhi Press Prakashan
16
Calendars
  • The Gregorian calendar should not always be
    assumed
  • Proper localization of some software requires the
    use (at least as an option) of calendars distinct
    to a culture
  • E.g. Vikram Samvat/ Saka / Hijri calendar in
    India
  • Calendars of various religions where year 0 was
    not 2006 years ago
  • Fiscal-year based calendars vary widely
  • Some have 13 months (364/28) or 53 weeks

17
Date formats
  • Date separators depend on locale: /, -, .
  • am and pm are not used universally (many
    cultures use the 24-hour clock)
  • ISO standard dates are unambiguous: yyyy-mm-dd
    hh:mm:ss
  • A non-ISO date such as 01-03-02 means different
    things in different locales.
  • If not using ISO, then display dates in the
    locale of the user
  • Preferably use a long form with the month
    spelled out (in the correct language)
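
The ambiguity of a date such as 01-03-02 can be shown with a short sketch. The three field orderings below are assumptions chosen for illustration (the `ORDERINGS` table is hypothetical); the code parses the same text under each and prints the result in ISO form.

```python
from datetime import datetime

# Hypothetical table of field orderings a reader might assume for the
# same numeric date; the labels are illustrative, not locale names.
ORDERINGS = {
    "day-month-year": "%d-%m-%y",
    "month-day-year": "%m-%d-%y",
    "year-month-day": "%y-%m-%d",
}


def interpretations(text: str) -> dict:
    """Parse one numeric date under each ordering and return ISO dates."""
    return {
        name: datetime.strptime(text, fmt).date().isoformat()
        for name, fmt in ORDERINGS.items()
    }


if __name__ == "__main__":
    for name, iso_date in interpretations("01-03-02").items():
        print(f"{name:15} -> {iso_date}")
```

The same six digits yield 2002-03-01, 2002-01-03 and 2001-03-02, which is why the ISO form or a spelled-out month is the safer choice.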

18
Formatting Numbers
  • Number formats depend on the locale, not on the
    language of the application
  • Group separation
  • Number of digits in a group
  • In English and ISO it is 3, while for Indic
    languages it is different: 1,23,456, i.e. groups of
    2 digits after the last 3
  • Group separator
  • In English it is a comma, but ISO uses a space, and
    some locales use a period or none
  • Decimal separator: period or comma
  • Negative symbol: - or ()
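
The grouping rules above can be sketched with a small helper. This is an illustration only, assuming a hypothetical `group_digits` function written for this example; a production system would normally take the group sizes and separators from locale data rather than hard-code them.

```python
def group_digits(value: int, sizes=(3, 3), sep: str = ",") -> str:
    """Group the digits of an integer from the right.

    sizes[0] is the size of the rightmost group and sizes[-1] the size
    of every group after it, so (3, 2) gives the Indic 1,23,456 style.
    """
    digits = str(abs(value))
    groups = []
    size = sizes[0]
    while len(digits) > size:
        groups.append(digits[-size:])
        digits = digits[:-size]
        size = sizes[-1]
    groups.append(digits)
    text = sep.join(reversed(groups))
    return "-" + text if value < 0 else text


if __name__ == "__main__":
    print(group_digits(123456, sizes=(3, 3)))         # 123,456
    print(group_digits(123456, sizes=(3, 2)))         # 1,23,456
    print(group_digits(1234567, sizes=(3, 3), sep=" "))  # 1 234 567
```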

19
Currency
  • Use the currency symbol of the data
  • i.e. INR doesn't automatically turn into another
    currency when the locale changes
  • Format depends on the user's locale, not the
    currency
  • Differences in formats
  • Symbol
  • Position (before or after the amount)
  • Blanks separating the symbol from the amount
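
A minimal sketch of the rule that the currency belongs to the data while the format belongs to the user's locale. The `CurrencyStyle` class and the two example styles are hypothetical names introduced for illustration and do not describe any real locale database.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CurrencyStyle:
    """Formatting preferences that come from the user's locale (assumed)."""
    symbol_first: bool   # symbol before or after the amount
    spaced: bool         # blank between the symbol and the amount
    decimal_sep: str     # character used as the decimal separator


def format_money(amount: Decimal, symbol: str, style: CurrencyStyle) -> str:
    """Format an amount; the symbol comes from the data, not the locale."""
    whole, _, fraction = f"{amount:.2f}".partition(".")
    number = whole + style.decimal_sep + fraction
    gap = " " if style.spaced else ""
    if style.symbol_first:
        return symbol + gap + number
    return number + gap + symbol


if __name__ == "__main__":
    price = Decimal("1000")
    print(format_money(price, "INR", CurrencyStyle(True, True, ".")))
    print(format_money(price, "INR", CurrencyStyle(False, True, ",")))
```

Switching the style changes the layout of the amount, but the value stays in INR in both cases.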

20
Currency contd
  • Different ways of expressing Rs. 1000
  • Rs.1000 OR Rs. 1000/- or Rs.1,000/- or Rs.
    1000.00
  • INR 1000
  • 1000 Rupees 1000 ?????
  • Strong currencies like the Indian rupee need
    decimal precision (e.g. 2 digits after the decimal
    point for paise)

21
Language selection
  • Avoid using national flags to choose preferred
    language
  • Multiple countries use the same language
  • In what order should the languages be listed?
  • In which language should the language names be
    displayed?
  • In the language itself, or with a translation in
    the default language of the operating system

22
Pronunciation
  • Important for Speech based systems
  • Higher recognition accuracy can be obtained by
    tailoring voice input to regional dialects
  • Voice output in the wrong dialect can make an
    application sound foreign
  • Applications supported with regional dialects
    have better impact

23
Culture
  • Culture is a complex collection of experiences
    which condition daily life
  • It includes
  • history,
  • social structure,
  • geographical effects,
  • religion,
  • traditional customs and everyday usage.

24
Cultural issues
  • Icons, symbols and images
  • Colors, myths, beliefs and feelings
  • Humour
  • Geographical & environmental effects
  • Customs & traditions
  • Social Security Numbers

25
Icons & Symbols
  • Icons that are a play on words do not translate
  • e.g.
  • A dust bin for dumping files
  • A rocket for launching an application
  • Scissors for cutting in an edit operation
  • B, I, U (bold, italic, underline)
  • Some concepts have been found extremely hard to
    represent as an icon
  • E.g. Sorting (A->Z is not universal)
  • Images of people or body parts such as hands
  • Considered inappropriate in some cultures
  • What skin color do you use?
  • People Images need to be localized for each
    country

26
Colors & Humour
  • The color white may represent purity and green
    prosperity in the Indian context, but it may not
    be the same in another culture.
  • Humour generally does not get translated
  • People are sensitive to different things in
    different cultures
  • Jokes/cartoons can be offensive

27
Customs & Traditions
  • In Indian culture, people show respect to
    their elders and renowned personalities by
    addressing them in the plural.
  • e.g. Dr. Manmohan Singh is the prime minister
    of India.
  • डॉ. मनमोहन सिंह भारत के प्रधानमंत्री हैं।
  • Similarly, in social relationships, there are
    several words to address a relation
  • e.g. for uncle - ????, ???, ????

28
Unicode?
Unicode provides a unique number for every
character, no matter what the platform, no
matter what the program, no matter what the
language.
Source: http://unicode.org
29
Universal Character Encoding
  • Unique number for every character
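
As a small illustration of "a unique number for every character", the sketch below prints the Unicode code point of a Latin, a Devanagari and an Arabic letter. The `code_point` helper is a name invented for this example; `ord` and `chr` are the standard Python built-ins.

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of one character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    # LATIN CAPITAL LETTER A, DEVANAGARI LETTER A, ARABIC LETTER AIN
    for ch in ["A", "\u0905", "\u0639"]:
        print(ch, code_point(ch))
    # The mapping works in both directions.
    print(chr(0x0905) == "\u0905")
```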


30
Unifies all Languages
  • 96 thousand characters, so far
  • All characters accessible at the same time, in
    the same document
  • ?, ?, ?,

31
Wide Spread Support
  • Developed & supported by industry leaders
  • Apple, HP, IBM, JustSystem, Microsoft, Oracle,
    SAP, Sun, Sybase, Unisys,
  • Supported in standards
  • XML, HTML, Java, ECMAScript (JavaScript), LDAP,
    CORBA 3.0, WML, Perl, etc.
  • Implemented in
  • All modern operating systems, browsers, and other
    products

32
IDN
  • http://????.in

33
Information about Unicode
  • www.unicode.org
  • Online Standard
  • Technical Reports
  • FAQs
  • General Information
  • Discussion Forums, Conferences

34
Resources Availability
  • System APIs
  • Windows, Java, Unix, Oracle, DB2, Sybase, Mac,
    Linux,
  • Languages
  • Java, JavaScript, C, Perl 5.6.0, C++, C#, SQL, ...
  • Cross-platform libraries
  • ICU, Rosette,

35
Indic Support in Unicode
  • ISCII is the basis for the characters and their
    allocation
  • DIT is a member of the Unicode Consortium
  • Reports have been submitted on missing
    characters, clarifications or corrections of usage

36
ISCII Similarities
  • Within a script, layout and contents are nearly
    identical
  • Independent & dependent vowels
  • Halant model for representing conjuncts
  • conjuncts / half-forms not directly encoded
  • represented by sequences instead
  • Phonetic sequence order in syllables
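
The halant model can be seen by listing the code points of a conjunct. The sketch below uses the Devanagari conjunct KSSA as an example chosen for illustration (it is not taken from the slides): it has no code point of its own and is stored as the sequence consonant, virama (halant), consonant.

```python
import unicodedata

# Devanagari conjunct "ksha", written with escapes so the stored
# sequence is explicit: KA + VIRAMA (halant) + SSA.
CONJUNCT = "\u0915\u094D\u0937"


def describe(text: str) -> list:
    """Return (code point, character name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    print(CONJUNCT, "is stored as", len(CONJUNCT), "code points:")
    for code, name in describe(CONJUNCT):
        print(" ", code, name)
```

The rendering engine and font combine the three stored characters into one visible shape, which is why the conjunct does not need its own code.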

37
ISCII Differences
  • Unicode is stateless
  • No shifting to get different scripts
  • Each character has a unique number
  • Unicode is uniform
  • No extension bytes necessary
  • All characters coded in the same space

38
Advantages
  • Accessible Information across the globe
  • Seamless multilingual documents
  • Opens up software export market, beyond English
  • Connects India to the world

39
The Future
  • The world is moving rapidly to Unicode
  • Unicode makes India open to the world
  • The world comes to you, and
  • You go to the world

40
Multiple Forms
  • UTF-8: maximal compatibility with 8-bit systems
  • UTF-16: good storage, interoperability with
    Windows/Java
  • UTF-32: simplest processing
  • Fast, lossless conversion
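
A quick way to compare the three encoding forms is to count the bytes each needs for the same character and to check that conversion is lossless. The sketch below uses the little-endian codec variants so that no byte order mark is added to the counts; the `encoded_sizes` helper is a name introduced for this example.

```python
CODECS = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))


def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in each encoding form."""
    return {name: len(text.encode(codec)) for name, codec in CODECS}


def round_trips(text: str) -> bool:
    """Check that encoding and decoding returns the original text."""
    return all(text.encode(codec).decode(codec) == text for _, codec in CODECS)


if __name__ == "__main__":
    # Latin letter, Devanagari letter, and a character outside the BMP
    for sample in ["A", "\u0905", "\U0001F600"]:
        print(f"U+{ord(sample):04X}", encoded_sizes(sample), round_trips(sample))
```

The Latin letter takes 1, 2 and 4 bytes, the Devanagari letter 3, 2 and 4, and the character beyond U+FFFF takes 4 bytes in all three forms.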

41
W3C Internationalization Activity
42
Some Issues under discussion in IL (Indian Languages)
  • Presentation / Styling issues
  • Styling of first character
  • If a styling feature is to be applied to the
    starting character, should it be applied to a
    single character, a conjunct character, a
    syllable or a grapheme cluster?
  • e.g.

स्थिति (Position) प्रस्थान (Departure)
स्वर (Vowel) ??? (Dictionary)
हिंदी (Hindi)
हिन्दी (Hindi)
क्षेत्रीय (Regional)
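
The difference between "first character" and "first syllable" can be sketched in code. The helper below is a simplified approximation written for illustration, not the Unicode grapheme cluster algorithm: it extends the first code point with any following combining marks, and with the consonant that follows a virama, so that a conjunct plus its vowel sign stays together. The sample words are written with escapes and are assumptions of this example.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def first_syllable(word: str) -> str:
    """Return a rough first orthographic syllable of a Devanagari word.

    Simplified rule (an assumption of this sketch): keep absorbing
    combining marks, and absorb a letter when it directly follows a virama.
    """
    if not word:
        return ""
    end = 1
    while end < len(word):
        ch = word[end]
        if unicodedata.category(ch).startswith("M"):
            end += 1
        elif word[end - 1] == VIRAMA:
            end += 1
        else:
            break
    return word[:end]


if __name__ == "__main__":
    word = "\u0938\u094D\u0925\u093F\u0924\u093F"  # sthiti
    print("first code point:", word[0])
    print("first syllable  :", first_syllable(word))
```

Styling only `word[0]` would separate the first consonant from the rest of its conjunct, whereas the syllable-level unit covers four code points here.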
43
Some Issues under discussion in IL
  • Presentation / Styling issues
  • Styling of first character

44
Some Issues under discussion in IL
  • Presentation / Styling issues
  • In Cursive Text
  • like Arabic and Urdu
  • the styling is applied
  • to whole word

Saabiq -> Former
Urdu
Source: Rashtriya Sahara
45
Some Issues under discussion in IL
  • Presentation / Styling issues
  • Vertical arrangement of characters
  • If some string is written in vertical mode,
    then writing each character on a new line may not
    be suitable

http://www.w3.org/International/notes/firstletter.html
46
Some Issues under discussion in IL
  • Presentation / Styling issues
  • Horizontal spacing
  • e.g.

47
Some Issues under discussion in IL
  • Presentation / Styling issues
  • Bullets and numbers
  • Number schemes to be supported in Indian
    languages also.

48
Some Issues under discussion in IL
  • Presentation / Styling issues
  • Collation
  • A means to search and order data in a way that
    makes sense in the user's particular culture
  • Myths
  • One collation is good enough
  • Unicode-enabled sorting is already covered
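
Why one collation is not enough can be shown with a toy example. The two alphabets below are deliberate simplifications invented for this sketch (real collation uses multi-level tailored rules, for instance from locale data): one places the letter a-umlaut next to a, the other places it after z. The same word list then sorts differently.

```python
def make_sort_key(alphabet: str):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: index for index, ch in enumerate(alphabet)}

    def key(word: str) -> list:
        # Characters outside the alphabet sort after all known ones.
        return [rank.get(ch, len(alphabet)) for ch in word.lower()]

    return key


# Hypothetical, simplified orderings used only for illustration.
UMLAUT_NEAR_A = "a\u00E4bcdefghijklmnopqrstuvwxyz"
UMLAUT_AFTER_Z = "abcdefghijklmnopqrstuvwxyz\u00E5\u00E4\u00F6"

WORDS = ["zebra", "\u00E4pple", "apple"]

if __name__ == "__main__":
    print(sorted(WORDS, key=make_sort_key(UMLAUT_NEAR_A)))
    print(sorted(WORDS, key=make_sort_key(UMLAUT_AFTER_Z)))
```

Neither order is wrong; each is what readers in a different culture would expect, so the collation has to follow the user's locale.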

49
Some Issues in Indian Languages
  • Presentation / Styling issues

50
Some Issues under discussion in IL
  • Presentation issues
  • Underlining of the characters
  • ???? ?????? ??? ?? ??????

51
Some Issues
  • Searching issues
  • Searching is a problem in languages that share the
    same script, where some words are spelled the same
    but are semantically different

52
Presentation issues on other devices
  • Addressing input mechanisms and predictive input
    for vernacular languages
  • Handling display issues on hand-held devices with
    smaller screens when text is translated
  • Standardizing encodings in communication to
    account for the cost of bandwidth (ISCII /
    Unicode / Compressed Unicode), connectivity and
    on-the-fly conversion of encodings

53
References and acknowledgements
  • http://www.w3.org/international
  • Articles by Richard Ishida, Felix Sasaki, W3C
  • http://macchiato.com/slides/UnicodeAndIndia.ppt,
    presentation by Mark Davis
  • www.site.uottawa.ca/ftppub/courses/Winter/csi5122/
    coursenotes/5122Internationalization.ppt

54
  • Thank you