Introductions to Software Internationalization for Saba

1
Introductions to Software Internationalization
Internal Training
By: Ernie Huang, Date: Nov 10, 2000
2
Course Outlines
  • Internationalization (I18N) (3 - 24)
  • Locale-Specific Issues (25 - 28)
  • Double-byte Enabling (29 - 38)
  • Unicode Issues (39 - 44)
  • Q&A (45)

3
Internationalization
  • The method of developing a program whose
    features and code design are not based on a single
    language or locale, and whose source code base
    facilitates the creation of different language
    editions
  • Marketing Consideration
  • Design Consideration
  • Coding Consideration
  • Source Code Control Consideration

4
Marketing Consideration
  • Reach the global market with less development effort
  • Meet the needs of international enterprises
  • Leverage international partners for sales and
    support in international markets

5
Shipment of International Products
  • Begin working on international editions after the
    domestic edition has been released or when it is
    almost finished.
  • Plan for international products in advance, work
    on several language editions concurrently, and
    ship them all at roughly the same time.

6
Categories of Internationalization for Microsoft
Windows 95
  • European (Western/Central/Eastern Europe, etc.)
      • Single-byte, left to right
  • Middle Eastern (Arabic, Hebrew)
      • Single-byte, bidirectional
  • Far Eastern (Trad. Chinese, Simp. Chinese,
    Japanese, Korean)
      • Double-byte, horizontal and vertical text, input
        method
  • Thai (Thai)
      • Single-byte, left to right, text layout

7
Design Consideration (1 of 2)
  • Features important to international markets are
    included
  • Icons and bitmaps are generic, are culturally
    acceptable, and do not contain text
  • Menu and dialog-box designs leave room for text
    expansion
  • Text and messages are devoid of slang and
    specific cultural references
  • Consistent English user interface terminology is
    used in strings.

8
Design Consideration (2 of 2)
  • Strings are documented using comments to provide
    context for translators. Strings or characters
    that should not be localized are marked.
  • Shortcut-key combinations are accessible on
    international keyboards
  • International laws affecting feature designs are
    considered
  • Third-party agreements support international
    design issues

9
Coding Consideration (1 of 4)
  • Code doesn't concatenate strings to form
    sentences (example: slide 18)
  • Code doesn't use a given string variable in more
    than one context
  • Code doesn't contain hard-coded character
    constants, numeric constants, screen positions,
    filenames, or path names that presume a
    particular language (example: slide 13)
  • Buffers are large enough to handle translated
    words and phrases (example: slide 14)

10
Coding Consideration (2 of 4)
  • Program allows input of international data
  • All language editions share a common file format
    (example: slide 17)
  • Code contains support for locale-specific
    hardware, if necessary.
  • Features that don't apply to international
    markets can be removed easily.

11
Coding Consideration (3 of 4)
  • Code properly handles accented characters
    (example: slide 15)
  • Program handles non-homogeneous network
    environments in which machines are running
    different code pages
  • Code uses an API to retrieve the lead-byte range
    for Far East code pages
  • Code correctly parses double-byte characters,
    unless it is based on Unicode

12
Coding Consideration (4 of 4)
  • Code supports Unicode or conversion between
    Unicode and the local code page
  • Code doesn't assume that all characters are 8-bit
    or 16-bit (example: slide 16)
  • Code uses generic data types and generic function
    prototypes
  • Program displays and prints text using
    appropriate fonts.
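Win32 provides generic names such as `TCHAR`, `_T` and `_tcslen` in `<tchar.h>`, switched by the `_UNICODE` macro. The block below is a portable sketch of the idea, not the real header. It uses hypothetical stand-ins (`GCHAR`, `GTEXT`, `gstrlen`) selected by an assumed `GENERIC_UNICODE` macro, so one source file compiles as either a `char` build or a wide-character build.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical generic names modeled on <tchar.h>. Define GENERIC_UNICODE
   for a wide-character build; leave it undefined for a char build. */
#ifdef GENERIC_UNICODE
typedef wchar_t GCHAR;
#define GTEXT(s) L##s
#define gstrlen wcslen
#else
typedef char GCHAR;
#define GTEXT(s) s
#define gstrlen strlen
#endif

/* Written only against the generic names, so it compiles in either mode. */
size_t label_length(const GCHAR *label)
{
    return gstrlen(label);
}

/* Bytes needed for n generic characters plus the terminator. */
size_t label_bytes(size_t n)
{
    return (n + 1) * sizeof(GCHAR);
}
```

Code that sizes buffers with `sizeof(GCHAR)` instead of a literal 1 or 2 stays correct in both builds.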

13
Avoid Hard-Coding Localizable Elements
  • Hard-coded strings, characters, constants, screen
    positions, filenames, and file paths are
    difficult to track down and localize.
  • Example
      if (szInputString[0] == 'O')
          DoOpen();   // assumes the English word "Open"
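The sketch below shows one way to fix this. It assumes a hypothetical per-language string table (`StringTable`, `IDS_OPEN_KEY`) standing in for Win32 resources loaded with `LoadString`. The command letter is looked up instead of hard-coded, so a Spanish edition can use 'A' for "Abrir" with no code change.

```c
#include <stddef.h>

/* Hypothetical string IDs; a Win32 program would keep these strings in
   a resource file and load them with LoadString. */
enum { IDS_OPEN_KEY, IDS_OPEN_LABEL, IDS_COUNT };

typedef struct {
    const char *lang;              /* language tag, e.g. "en" */
    const char *text[IDS_COUNT];   /* localized strings indexed by ID */
} StringTable;

static const StringTable kEnglish = { "en", { "O", "Open" } };
static const StringTable kSpanish = { "es", { "A", "Abrir" } };

/* Returns 1 if the input selects the Open command in this edition.
   The key letter comes from the table, not from an 'O' literal. */
int is_open_command(const StringTable *table, const char *input)
{
    return input != NULL && input[0] != '\0' &&
           input[0] == table->text[IDS_OPEN_KEY][0];
}
```

Only the tables differ between language editions. The comparison code is shared.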

14
Make Buffers Large Enough to Hold Translated Text
  • Buffers that are declared to be the exact size of
    a word or a sentence will probably overflow when
    text is translated.
      char szOK[3];
      GetButtonName(szOK);
  • With the Win32 API, stack space is not as
    limited as in Win16, so feel free to make a large
    buffer: change 3 to 4095.
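A larger buffer is half of the fix. Passing the buffer's size as well lets the callee refuse to overflow it. This sketch assumes a hypothetical `get_button_name` that takes the capacity and copies the localized label with `snprintf`, reporting truncation instead of writing past the end. The example uses a longer translated label such as "Aceptar".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of GetButtonName that also receives the buffer
   capacity cch (including the terminator). Returns 0 on success, or -1
   if the label did not fit and was truncated. */
int get_button_name(const char *localized, char *buf, size_t cch)
{
    if (cch == 0)
        return -1;
    int needed = snprintf(buf, cch, "%s", localized);
    return (needed < 0 || (size_t)needed >= cch) ? -1 : 0;
}
```

Callers pass `sizeof szOK` rather than a literal. The same call then stays safe whatever size the buffer is given.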

15
Do Not Limit Character Parsing to Latin Script
  • // Search until you find a non-letter
  • Wrong
      while ((*pch >= 'A' && *pch <= 'Z') ||
             (*pch >= 'a' && *pch <= 'z'))
          pch++;
  • Correct
      while (IsCharAlpha(*pch))
          pch++;

16
Do not assume that characters are always 8-bit
  • // Skip two characters
  • Wrong
      szString = szString + 2;
  • Correct
      szString = CharNext(CharNext(szString));
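`CharNext` is not available outside Win32. This portable sketch therefore hard-codes the Shift-JIS (code page 932) lead-byte ranges, 0x81-0x9F and 0xE0-0xFC. `sjis_is_lead_byte` and `sjis_char_next` are hypothetical helpers standing in for `IsDBCSLeadByteEx` and `CharNextExA`. A real program should ask the system for the lead-byte ranges, as slide 11 recommends.

```c
/* Shift-JIS (code page 932) lead-byte ranges, hard-coded for this sketch. */
int sjis_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns a pointer to the character after the one at p, stepping over
   both bytes of a double-byte character. Stays put at the terminator and
   does not step past a lead byte whose trail byte is missing. */
const char *sjis_char_next(const char *p)
{
    if (*p == '\0')
        return p;
    if (sjis_is_lead_byte((unsigned char)*p) && p[1] != '\0')
        return p + 2;
    return p + 1;
}
```

Take the Shift-JIS string for the hiragana あい followed by "X" (bytes 0x82 0xA0 0x82 0xA2 'X'). Skipping two characters with `sjis_char_next` lands on 'X'. Adding 2 lands only on the second character.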

17
Do not localize strings saved as part of your
file format
  • For example, you should not localize the RTF
    (Rich Text Format) control word \b (bold).
    Otherwise, RTF files written by one language
    edition cannot be read by the others.
  • Another example: HTML tags should not be
    localized.

18
Do not concatenate strings to form sentences
  • English
      String1 = "Not enough memory to "
      String2 = " the file "
      %1 = variable for the verb (open, create, edit, etc.)
      %2 = variable for the file name
  • Bad coding example
      String1 + %1 + String2 + %2
  • In another language the word order could be
    different, so the translated fragments cannot
    simply be joined in this order.
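A common remedy is to translate the whole sentence as one string with numbered placeholders. Win32 supports this through `FormatMessage` insert sequences (`%1`, `%2`). The code below is a hypothetical, portable `format_message` that expands only `%1` and `%2`. It shows that a translator can reorder the placeholders without any code change.

```c
#include <string.h>

/* Hypothetical formatter: expands %1 and %2 inside a complete, translatable
   sentence template, in the spirit of FormatMessage insert sequences.
   Returns 0 on success, -1 if out (cch bytes) is too small; out is always
   null-terminated when cch > 0. */
int format_message(char *out, size_t cch, const char *tmpl,
                   const char *arg1, const char *arg2)
{
    size_t n = 0;
    if (cch == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; p++) {
        char single[2] = { *p, '\0' };
        const char *piece = single;          /* ordinary template character */
        if (p[0] == '%' && (p[1] == '1' || p[1] == '2')) {
            piece = (p[1] == '1') ? arg1 : arg2;
            p++;                             /* also consume the digit */
        }
        size_t len = strlen(piece);
        if (n + len >= cch) {                /* keep room for the '\0' */
            out[n] = '\0';
            return -1;
        }
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

With the English template "Not enough memory to %1 the file %2." and a reordered template such as "%2: not enough memory to %1.", the same call site produces a correct sentence for each.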

19
Source Code Control Consideration (1 of 2)
  • Use a no-compile strategy to build the various
    localized products (example: slide 24)
  • Localizable items are stored in resource files
  • All language editions using single-byte character
    sets are based on a single executable (see
    slides 23 - 24)

20
Source Code Control Consideration (2 of 2)
  • All language editions using double-byte character
    sets are based on a single executable
  • All language editions using Unicode are based on
    a single executable

21
Internationalized Product for Localization
  • Key Success Factors
      • Localizable resources can be easily outsourced
      • Bug fixes to the base code can be easily
        applied to all localized versions
      • The testing results of the base code can be
        leveraged with the no-compile strategy

22
Isolating Localizable Resources
  • Certain algorithms
  • Constants
  • Dialogs
  • Macro Languages
  • Menus
  • Messages
  • Prompts
  • Sounds
  • Status bars
  • Toolbars

Separating all localizable items into one or more
files makes localization much easier to
complete.
23
Sample Build Tree
Developers update files in the native directory
and use batch files to propagate the changes to
the other language directories.
All files that need to be customized based on
language are in the resources directory.
24
No-Compile Strategy
  • Core source code doesn't require recompiling
    every time international editions of a product
    are created.
  • Compile your main executable only once. To create
    localized editions, you compile only the
    localized resource files and link them to the
    executable or to a separate DLL.
  • If your program is not based on Unicode, you may
    need one EXE for SBCS and one EXE for DBCS.
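The run-time side of this strategy can be sketched with hypothetical names (`ResourceSet`, `select_resources`). The core code is compiled once and picks a language's resources only at startup, falling back to English. In a real Win32 build each set would be a separately compiled resource DLL rather than an in-memory table. The language match here is simplified to the first two letters of the tag.

```c
#include <string.h>

/* Hypothetical per-language resource set. In a no-compile build each set
   would live in its own compiled resource file or DLL. */
typedef struct {
    const char *lang;        /* two-letter language code, e.g. "de" */
    const char *file_menu;   /* localized "File" menu caption */
} ResourceSet;

static const ResourceSet kResources[] = {
    { "en", "File" },        /* first entry doubles as the fallback */
    { "de", "Datei" },
    { "fr", "Fichier" },
};

/* Returns the set whose language matches the first two letters of a tag
   such as "de-DE", or the English set if none matches. Adding a language
   means adding a set, not recompiling the caller's logic. */
const ResourceSet *select_resources(const char *locale_tag)
{
    size_t count = sizeof kResources / sizeof kResources[0];
    for (size_t i = 0; i < count; i++) {
        if (strncmp(locale_tag, kResources[i].lang, 2) == 0)
            return &kResources[i];
    }
    return &kResources[0];
}
```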

25
Localization
The process of adapting a program for a
specific international market, which includes
translating the user interface, resizing dialog
boxes, customizing features, and testing results
to ensure that the program retains the same
functionality and performance. It is not just
a translation process. The following slides focus
on non-translation issues that may affect design
or coding.
26
Locale-Specific Coding Consideration
  • Windows operating systems (Win32 NLS API) and
    development tools may already provide the API
    support you need, so it is not desirable to write
    proprietary sort, case, or character property
    tables in your code unless your system or
    development tools do not support them.
  • If you are concerned about the overhead of
    continually calling the API, call it at startup
    time to create static tables.
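The startup-table approach might look like the sketch below. The standard C `toupper` stands in for a Win32 NLS call such as `LCMapString`, and the table and function names are hypothetical. The table reflects the locale active when it is built, so it must be rebuilt if the locale changes.

```c
#include <ctype.h>

/* Upper-case mapping for every byte value, filled once at startup. */
static unsigned char g_upper[256];

/* Calls the per-character service once per byte value. toupper is a
   portable stand-in for an NLS API call; it follows the current C locale. */
void init_upper_table(void)
{
    for (int c = 0; c < 256; c++)
        g_upper[c] = (unsigned char)toupper(c);
}

/* After initialization, each lookup is a single array index. */
unsigned char fast_upper(unsigned char c)
{
    return g_upper[c];
}
```

A single-byte table like this must not be applied to the trail bytes of DBCS strings (see slide 35).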

27
Locale-Specific Issues (1 of 2)
  • Character set (Code Page) and Font
  • Date and Time Format
  • Calendar Format
  • Currency and Number Format
  • First Name and Last Name Format
  • Address Format
  • Phone Number
  • Cultural or Political Sensitivities
  • Word or Phrase Search

28
Locale-Specific Issues (2 of 2)
  • Word Wrap (Line Breaking)
  • Character Sorting
  • Many languages don't have upper/lower case
  • Unique ID for the citizens of the country
  • Laws, government regulation, taxes
  • Dependent hardware environment availability
  • Dependent software environment availability

29
DBCS-Enabling
The method of adapting a Western-language-based
program to display, input, store, retrieve and
process double-byte characters (in Japanese,
Traditional Chinese, Simplified Chinese and
Korean) correctly. It could be included in the
internationalization or even the localization
effort. In short, C/C++ programs require more
DBCS-enabling work.
30
What is DBCS
  • DBCS - Double Byte Character Set
  • Traditional Chinese, Simplified Chinese, Korean
    and Japanese each have far more than 256
    characters, so one byte cannot encode all of them.
    In Windows, two bytes are used for these
    characters.
  • Many Chinese characters were borrowed or adapted
    for Japanese (Kanji) and Korean (Hanja) long
    ago.

31
Lead-byte and Trail-byte Ranges for DBCS Code
Pages
32
Potential DBCS-Enabling Effort (1 of 4)
  • Input
      • Make sure the Input Method Editor (IME) can be
        activated for DBCS data input.
  • Display
      • Make sure the font name, size and character
        set are changed for DBCS data.
  • Store
      • Make sure database field types can store DBCS
        data.
  • Retrieve
      • The buffer length unit (byte or character)
        should be consistent, or conversion is
        required.
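The length unit matters because a DBCS string has more bytes than characters. This sketch counts characters in a Shift-JIS (code page 932) string, with that code page's lead-byte ranges (0x81-0x9F, 0xE0-0xFC) hard-coded. `sjis_char_count` is a hypothetical helper, and real code should obtain lead-byte ranges from the system.

```c
#include <stddef.h>

/* Shift-JIS (code page 932) lead-byte test; ranges hard-coded for the sketch. */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Counts characters, not bytes, in a null-terminated Shift-JIS string.
   A lead byte with no trail byte is counted as one character. */
size_t sjis_char_count(const char *s)
{
    size_t count = 0;
    while (*s != '\0') {
        if (sjis_is_lead((unsigned char)*s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        count++;
    }
    return count;
}
```

Take the string for あい followed by "X". It is 5 bytes long but 3 characters long, so a field or buffer sized in one unit and filled in the other will be wrong.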

33
Potential DBCS-Enabling Effort (2 of 4)
  • Search
      • A DBCS character must be distinguished from an
        SBCS character when searching for a delimiter
        (such as \) in a string (such as a path)
  • Compare
      • A comparison should work on a character basis,
        not on a byte basis.
  • Truncate/Concatenate
      • Should work on a character basis

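The classic search pitfall is that 0x5C, the byte value of the backslash (`\`), is also a valid Shift-JIS trail byte. For example, the character 表 is encoded as 0x95 0x5C, so a byte-wise `strrchr` on a path ending in 表 finds a false separator. The hypothetical `sjis_strrchr` below walks the string character by character, much as the Microsoft CRT's `_mbsrchr` does for multibyte strings. The lead-byte ranges are hard-coded for code page 932 only.

```c
#include <stddef.h>

/* Shift-JIS (code page 932) lead-byte test; ranges hard-coded for the sketch. */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* DBCS-aware strrchr: returns the last single-byte occurrence of ch in s,
   or NULL. Double-byte characters are skipped as a unit, so a trail byte
   that equals ch is never matched. */
const char *sjis_strrchr(const char *s, char ch)
{
    const char *last = NULL;
    while (*s != '\0') {
        if (sjis_is_lead((unsigned char)*s) && s[1] != '\0') {
            s += 2;               /* skip the whole double-byte character */
        } else {
            if (*s == ch)
                last = s;
            s += 1;
        }
    }
    return last;
}
```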
34
Potential DBCS-Enabling Effort (3 of 4)
  • Cursor
      • When the cursor is placed or moved, it should
        never stay in the middle of a double-byte
        character.
  • Locale-Specific Issues
      • Sorting, line breaking, fonts, etc., mentioned
        in the previous section.
  • Code Conversion
      • Conversion to and from Internet mail formats,
        locale-specific standards (e.g., EUC-JP) and
        Unicode standards.
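Cursor placement and truncation both need to map an arbitrary byte offset to a character boundary. The hypothetical `sjis_snap_to_char_start` below scans forward from the start of the string. Shift-JIS trail bytes overlap the lead-byte ranges, so a string cannot be reliably parsed backwards. Code page 932 lead bytes are hard-coded as an assumption.

```c
#include <stddef.h>

/* Shift-JIS (code page 932) lead-byte test; ranges hard-coded for the sketch. */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns the byte offset of the start of the character that contains
   offset, or the string length if offset is at or past the end. A caret
   or truncation point moved there never splits a double-byte character. */
size_t sjis_snap_to_char_start(const char *s, size_t offset)
{
    size_t i = 0;
    while (s[i] != '\0') {
        size_t width =
            (sjis_is_lead((unsigned char)s[i]) && s[i + 1] != '\0') ? 2 : 1;
        if (i + width > offset)
            return i;             /* offset falls inside this character */
        i += width;
    }
    return i;
}
```

Truncating a string at the snapped offset, rather than the raw one, also satisfies the "character basis" rule from the previous slide.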

35
Potential DBCS-Enabling Effort (4 of 4)
  • Case conversions
      • There are no upper/lower case conversions for
        DBCS data.
  • Third-party code
      • If third-party code causes double-byte issues,
        solutions need to be implemented.
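Case conversion is meaningless for DBCS data, and it can also corrupt it. Shift-JIS trail bytes include the values of 'a' to 'z', so applying `toupper` to every byte can change a double-byte character. For example, 0x82 0x61 is the fullwidth letter Ｂ, and its trail byte is 'a'. This sketch (hypothetical `sjis_upper_in_place`, code page 932 lead bytes hard-coded) converts only single-byte characters.

```c
#include <ctype.h>

/* Shift-JIS (code page 932) lead-byte test; ranges hard-coded for the sketch. */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Upper-cases single-byte characters in place and leaves every
   double-byte character (lead and trail byte) untouched. */
void sjis_upper_in_place(char *s)
{
    while (*s != '\0') {
        if (sjis_is_lead((unsigned char)*s) && s[1] != '\0') {
            s += 2;                          /* no case for DBCS data */
        } else {
            *s = (char)toupper((unsigned char)*s);
            s += 1;
        }
    }
}
```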

36
Dual Compilation
  • Example
      #ifdef DBCS
          ...   // DBCS-specific code
      #else
          ...   // base code
      #endif
  • DBCS-enabled code doesn't affect the base code
  • But it creates a dual code base that you have to
    compile, test and maintain separately

37
Input Method Editor in DBCS Windows
38
Some DBCS-Enabling Notes on Non-C/C++ Products
  • The Input Method Editor can be activated in VB,
    Delphi, browsers, etc.
  • DBCS character display can be enabled by
    specifying a proper font
  • The Font Association capability in non-Japanese
    DBCS languages usually confuses the DBCS-enabling
    effort
  • With Unicode built in, Java programs perform the
    conversion between Unicode and code pages
    implicitly

39
Unicode
Defines all available characters of the world's
languages as two-byte codes (under Windows), so
that every string of data has the same
interpretation across operating systems in
different languages. Example: the same Han
character has different code values under
the Windows code pages for Japanese, Korean,
Simplified Chinese and Traditional Chinese, but
its code value in Unicode is consistent.
40
Advantages of Unicode
  • Sort and process international characters
    efficiently (lists, database indexes, network
    user names, etc.)
  • Eliminate the code that handles multiple code
    pages and double-byte character sets
  • Code that works for more than one language gets
    thoroughly tested in the process of releasing the
    first language edition
  • Unicode defines character properties and semantic
    rules that can simplify text layout
  • Unicode does not significantly increase file size

41
Disadvantages of Unicode
  • Unicode doesn't help with complex text-based
    operations:
      • sorting
      • hyphenation
      • line breaking
  • Unicode is not currently supported in many
    applications and fonts
  • Unicode fonts with DBCS characters are not built
    in under Windows 95/98/NT

42
Unicode on Windows NT
  • Handles characters internally in Unicode
  • Supports all of the wide-character variants of
    Win32 API
  • GDI processes all text in Unicode
  • Resource compiler compiles strings into Unicode
  • System information files are stored as Unicode
  • NTFS filenames are always in Unicode
  • Exchanges data on the network in Unicode format
    with other Unicode machines

43
Potential Unicode Implementation Issues for Asian
Products (1 of 2)
  • Some performance impact due to heavy conversions
      • For non-Unicode programs
      • Converting back to the code page for display
        (no DBCS Unicode font)
  • The buffer length unit for Unicode is 2 bytes
    (one UTF-16 code unit, which holds one character
    except for the rare characters outside the Basic
    Multilingual Plane). This differs from
    non-Unicode applications, where the unit is one
    byte.
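The byte-versus-character distinction for UTF-16 buffers can be sketched with `uint16_t` as a portable stand-in for the Win32 `WCHAR`. `COUNT_OF` (similar to MSVC's `_countof`) and `utf16_copy` are hypothetical names. The capacity passed to a function that counts characters should be the element count, not `sizeof`.

```c
#include <stddef.h>
#include <stdint.h>

/* One UTF-16 code unit, standing in for the Win32 WCHAR. */
typedef uint16_t utf16_t;

/* Number of elements (code units) in a fixed-size array. Passing sizeof,
   a byte count, where a character count is expected doubles the apparent
   capacity of a 2-byte buffer and can overflow it. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Copies src into dst, where cch is the capacity in code units including
   the terminator. Returns the number of code units copied. */
size_t utf16_copy(utf16_t *dst, size_t cch, const utf16_t *src)
{
    size_t i = 0;
    if (cch == 0)
        return 0;
    while (i + 1 < cch && src[i] != 0) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}
```

A 4-element buffer is 8 bytes, but it holds only 3 code units plus the terminator. Calling `utf16_copy(buf, COUNT_OF(buf), src)` respects that limit, while `sizeof buf` would not.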

44
Potential Unicode Implementation Issues for Asian
Products (2 of 2)
  • Some code values in a code page don't have
    round-trip conversions.
  • For a database with data in multiple languages
    stored as Unicode, the access program needs to be
    able to get code page information and convert
    the Unicode data.
  • For a database with data in multiple languages
    stored in code page form, the database needs a
    language type field so that the access program
    can perform the conversion.

45
Q&A
Questions?