Introductions to Software Internationalization for Saba

1
Introductions to Software Internationalization
Internal Training
By: Ernie Huang
Date: Nov 10, 2000
2
Course Outline

Internationalization (I18N) (3 - 24)
Locale-Specific Issues (25 - 28)
Double-byte Enabling (29 - 38)
Unicode Issues (39 - 44)
Q & A (45)

3
Internationalization

The method of developing a program whose features and code design are not based on a single language or locale, and whose source code base facilitates the creation of different language editions.
Marketing Consideration
Design Consideration
Coding Consideration
Source Code Control Consideration

4
Marketing Consideration

Reach the global market with less development effort
Meet the needs of international enterprises
Leverage international partners for sales and support in international markets

5
Shipment of International Products

Begin working on international editions after the domestic edition has been released or when it is almost finished.
Plan for international products in advance, work on several language editions concurrently, and ship them all at roughly the same time.

6
Categories of Internationalization for Microsoft Windows 95

European (Western/Central/Eastern Europe, etc.): single-byte, left to right
Middle Eastern (Arabic, Hebrew): single-byte, bidirectional
Far Eastern (Traditional Chinese, Simplified Chinese, Japanese, Korean): double-byte, horizontal and vertical, input method
Thai: single-byte, left to right, text layout

7
Design Consideration (1 of 2)

Features important to international markets are included
Icons and bitmaps are generic, are culturally acceptable, and do not contain text
Menu and dialog-box designs leave room for text expansion
Text and messages are devoid of slang and specific cultural references
Consistent English user interface terminology is used in strings

8
Design Consideration (2 of 2)

Strings are documented using comments to provide context for translators. Strings or characters that should not be localized are marked.
Shortcut-key combinations are accessible on international keyboards
International laws affecting feature designs are considered
Third-party agreements support international design issues

9
Coding Consideration (1 of 4)

Code doesn't concatenate strings to form sentences (example: slide 18)
Code doesn't use a given string variable in more than one context
Code doesn't contain hard-coded character constants, numeric constants, screen positions, filenames, or path names that presume a particular language (example: slide 13)
Buffers are large enough to handle translated words and phrases (example: slide 14)

10
Coding Consideration (2 of 4)

Program allows input of international data
All language editions share a common file format (example: slide 17)
Code contains support for locale-specific hardware, if necessary
Features that don't apply to international markets can be removed easily

11
Coding Consideration (3 of 4)

Code properly handles accented characters (example: slide 15)
Program handles non-homogeneous network environments in which machines are running different code pages
Code uses the API to retrieve the lead-byte ranges for Far East code pages
Code correctly parses double-byte characters unless it is based on Unicode

12
Coding Consideration (4 of 4)

Code supports Unicode or conversion between Unicode and the local code page
Code doesn't assume that all characters are 8-bit or 16-bit (example: slide 16)
Code uses generic data types and generic function prototypes
Program displays and prints text using appropriate fonts

13
Avoid Hard-Coding Localizable Elements

Hard-coded strings, characters, constants, screen positions, filenames, and file paths are difficult to track down and localize.
Example:
    if (szInputString[0] == 'O')
        DoOpen();    // 'O' stands for "Open"
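One way to avoid the hard-coded letter is to look the command up in a per-language table. The following is only a sketch under stated assumptions: the names Command, KeyBinding, kEnglishKeys, kFrenchKeys and lookup_command are hypothetical, the letter assignments are illustrative, and in a real Windows product the tables would come from resource files rather than from the source code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command identifiers, not taken from the slides. */
typedef enum { CMD_NONE, CMD_OPEN, CMD_SAVE } Command;

typedef struct {
    char    key;   /* letter the user types */
    Command cmd;   /* command it triggers   */
} KeyBinding;

/* One table per language edition. The letters are illustrative:
   "Open"/"Save" in English, "Ouvrir"/"Enregistrer" in French. */
static const KeyBinding kEnglishKeys[] = { { 'O', CMD_OPEN }, { 'S', CMD_SAVE } };
static const KeyBinding kFrenchKeys[]  = { { 'O', CMD_OPEN }, { 'E', CMD_SAVE } };

/* Maps a typed letter to a command using the table of the active language. */
Command lookup_command(const KeyBinding *table, size_t count, char key)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].key == key)
            return table[i].cmd;
    }
    return CMD_NONE;
}
```

The calling code then compares against CMD_OPEN instead of the letter 'O', so translating the letter never touches the program logic.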

14
Make Buffers Large Enough to Hold Translated Text

Buffers that are declared to be the exact size of a word or a sentence will probably overflow when the text is translated.
    char szOK[3];
    GetButtonName(szOK);
With the Win32 API, stack space is not as limited as in Win16, so feel free to make the buffer large: change 3 to 4095.
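A larger buffer helps, but it is also worth detecting the case where a translation still does not fit. The sketch below uses the standard snprintf for this; copy_label is a hypothetical helper, not a function from the slides or from Win32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copies a (possibly translated) label into dst without overflowing it.
   Returns 1 if the whole label fit, 0 if it had to be truncated. */
int copy_label(char *dst, size_t dst_size, const char *label)
{
    int needed;

    assert(dst != NULL && dst_size > 0 && label != NULL);
    /* snprintf never writes more than dst_size bytes and reports how many
       characters the full text would have needed. */
    needed = snprintf(dst, dst_size, "%s", label);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With a 3-byte buffer, "OK" fits, but a longer translated label such as the German "Abbrechen" (Cancel) is reported as truncated instead of overflowing the stack.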

15
Do Not Limit Character Parsing to Latin Script

// Search until you find a non-alphabetic character
Wrong:
    while ((*pch >= 'A' && *pch <= 'Z') ||
           (*pch >= 'a' && *pch <= 'z'))
        pch++;
Correct:
    while (IsCharAlpha(*pch))
        pch++;

16
Do not assume that characters are always 8-bit

// Skip two characters
Wrong:
    szString = szString + 2;
Correct:
    szString = CharNext(CharNext(szString));
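To show what a call like CharNext has to do on a double-byte code page, here is a portable sketch for Shift-JIS (Windows code page 932). The function names are hypothetical, and the lead-byte ranges are hard-coded only to keep the example self-contained; on Windows the advice above is to let the API do this work.

```c
#include <assert.h>
#include <stddef.h>

/* Lead-byte ranges of Shift-JIS (code page 932). Hard-coded for this
   sketch only; production code should ask the operating system. */
int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Advances by one character, which may be one or two bytes long. */
const char *sjis_char_next(const char *p)
{
    assert(p != NULL);
    if (*p == '\0')
        return p;                       /* never step past the terminator */
    if (is_sjis_lead_byte((unsigned char)*p) && p[1] != '\0')
        return p + 2;                   /* double-byte character */
    return p + 1;                       /* single-byte character */
}

/* Counts characters; strlen() would count bytes instead. */
size_t sjis_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s = sjis_char_next(s))
        n++;
    return n;
}
```

For a string made of two double-byte characters followed by "AB", skipping two characters moves the pointer four bytes, whereas adding 2 would stop after the first character.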

17
Do not localize strings saved as part of your file format

For example, you should not localize the keyword \bold, which is used in RTF (Rich Text Format). Otherwise RTF files cannot be exchanged between different language editions.
Another example: HTML tags should not be localized.

18
Do not concatenate strings to form sentences

English:
    String1 = "Not enough memory to "
    String2 = " the file "
    %1 = variable for the operation (open, create, edit, etc.)
    %2 = variable for the file name
Bad coding example:
    String1 + %1 + String2 + %2
In other languages, the word order could be different.
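A common alternative to concatenation is to store the whole sentence as one localizable template with numbered placeholders, so that a translator can reorder them. The sketch below assumes the placeholders are written %1 and %2; format_message and the second template in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expands %1 and %2 in tmpl with arg1 and arg2, in whatever order the
   template uses them. Returns 1 on success, 0 if out was too small
   (out then holds the text that did fit). */
int format_message(char *out, size_t out_size, const char *tmpl,
                   const char *arg1, const char *arg2)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    assert(tmpl != NULL && arg1 != NULL && arg2 != NULL);
    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p;          /* default: copy one literal byte */
        size_t len = 1;

        if (p[0] == '%' && (p[1] == '1' || p[1] == '2')) {
            piece = (p[1] == '1') ? arg1 : arg2;
            len = strlen(piece);
            p++;                        /* skip the placeholder digit */
        }
        if (used + len >= out_size) {   /* keep room for the terminator */
            out[used] = '\0';
            return 0;
        }
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 1;
}
```

The English template would be "Not enough memory to %1 the file %2." while another edition could ship a made-up reordering such as "%2: not enough memory to %1." without any code change.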

19
Source Code Control Consideration (1 of 2)

Use the No-Compile Strategy for the various localized product builds (see slide 24)
Localizable items are stored in resource files
All language editions using single-byte character sets are based on a single executable (see slide 24)

20
Source Code Control Consideration (2 of 2)

All language editions using double-byte character sets are based on a single executable
All language editions using Unicode are based on a single executable

21
Internationalized Product for Localization

Key Success Factors
Localizable resources can be easily outsourced
Bug fixes for the base code can be easily applied to all localized versions
Leverage the test results of the base code with the No-Compile strategy

22
Isolating Localizable Resources

Certain algorithms
Constants
Dialogs
Macro Languages
Menus
Messages
Prompts
Sounds
Status bars
Toolbars

Separating all localizable items into one or more files makes localization much easier to complete.
23
Sample Build Tree
Developers update files in the native directory and use batch files to propagate the changes to the other language directories.
All files that need to be customized based on language are in the resources directory.
24
No-Compile Strategy

Core source code doesn't require recompiling every time you create an international edition of a product.
Compile your main executable only once. To create localized editions, you compile only the localized resource files and link them to the executable or to a separate DLL.
If your program is not based on Unicode, you may need one EXE for SBCS and one EXE for DBCS.
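The idea can be illustrated without any Windows tooling: the executable stays the same and only the string data changes per language. The sketch below looks up strings in a plain "ID=text" list held in memory. The format, the IDs and find_string are hypothetical stand-ins for compiled resource files or a resource DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Looks up key in resource text with one "key=value" entry per line.
   Returns 1 and copies the value into out on success, 0 otherwise. */
int find_string(const char *resources, const char *key,
                char *out, size_t out_size)
{
    size_t key_len;

    assert(resources != NULL && key != NULL && out != NULL && out_size > 0);
    key_len = strlen(key);
    for (const char *line = resources; *line != '\0'; ) {
        const char *end = strchr(line, '\n');
        size_t line_len = end ? (size_t)(end - line) : strlen(line);

        if (line_len > key_len && strncmp(line, key, key_len) == 0
                && line[key_len] == '=') {
            size_t value_len = line_len - key_len - 1;
            if (value_len >= out_size)
                return 0;               /* buffer too small for this text */
            memcpy(out, line + key_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        line += line_len;               /* move to the end of this line */
        if (*line == '\n')
            line++;                     /* and past the line break */
    }
    return 0;
}
```

Switching from an English list ("IDS_CANCEL=Cancel") to a German one ("IDS_CANCEL=Abbrechen") changes what the user sees while the compiled lookup code stays untouched.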

25
Localization
The process of adapting a program for a specific international market, which includes translating the user interface, resizing dialog boxes, customizing features, and testing the results to ensure that the program retains the same functionality and performance. It is not just a translation process. The following slides focus on non-translation issues that may impact design or coding.
26
Locale-Specific Coding Consideration

The Windows operating system (Win32 NLS API) and your development tools may already provide the API support you need, so avoid writing proprietary sort, case, or character-property tables in your code unless your system or development tools do not support them.
If you are concerned about the overhead of continually calling the API, call it at startup time to create static tables.
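The startup-table idea can be sketched as follows. The standard toupper stands in for whichever system call the product would really use, and init_case_table and fast_upper are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

static unsigned char g_upper[UCHAR_MAX + 1];
static int g_table_ready = 0;

/* Call once at startup, after the locale has been selected. The table is
   filled by the library, so no proprietary case rules are written here. */
void init_case_table(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        g_upper[c] = (unsigned char)toupper(c);
    g_table_ready = 1;
}

/* A table lookup replaces a library call for every character. */
unsigned char fast_upper(unsigned char c)
{
    assert(g_table_ready);
    return g_upper[c];
}
```

Because the table is built from the library's answers, it follows whatever locale is active at startup. It only covers single-byte characters.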

27
Locale-Specific Issues (1 of 2)

Character set (Code Page) and Font
Date and Time Format
Calendar Format
Currency and Number Format
First Name and Last Name Format
Address Format
Phone Number
Culturally or Politically Sensitive Issues
Word or Phrase search

28
Locale-Specific Issues (2 of 2)

Word-wrap (Line Breaking)
Character Sorting
Many languages don't have upper/lower case
Unique ID for the citizens of the country
Laws, government regulation, taxes
Dependent hardware environment availability
Dependent software environment availability

29
DBCS-Enabling
The method of adapting a western-language-based program so that it can display, input, store, retrieve and process double-byte characters (in Japanese, Traditional Chinese, Simplified Chinese and Korean) correctly. It can be included in the internationalization effort or even in the localization effort. In short, DBCS-enabling mostly concerns C programs.
30
What is DBCS?

DBCS - Double-Byte Character Set
Traditional Chinese, Simplified Chinese, Korean and Japanese each have more than 256 characters, so one byte cannot encode all of them. In Windows, two bytes are used.
Many Chinese characters were borrowed or adapted for Japanese (Kanji) and Korean (Hanja) a long time ago.

31
Lead-byte and Trail-byte Ranges for DBCS Code Pages
32
Potential DBCS-Enabling Effort (1 of 4)

Input: Make sure the Input Method Editor (IME) can be activated for DBCS data input.
Display: Font names, sizes and character sets are changed for DBCS data.
Store: Database field types can hold DBCS data.
Retrieve: The buffer length unit should be consistent (byte or character); otherwise conversion is required.

33
Potential DBCS-Enabling Effort (2 of 4)

Search: A DBCS character must be distinguished from an SBCS character when searching for a delimiter (such as \) in a string (such as a path).
Compare: A comparison should work on a character basis, not on a byte basis.
Truncate/Concatenate: Should work on a character basis.
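The delimiter problem is easy to reproduce in Shift-JIS (code page 932), where the byte 0x5C (the backslash) can also appear as the second byte of a double-byte character. The sketch below is a portable illustration with hypothetical function names and hard-coded lead-byte ranges; it is not a replacement for the system API.

```c
#include <assert.h>
#include <stddef.h>

/* Lead-byte ranges of Shift-JIS (code page 932), hard-coded for the sketch. */
int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Finds the last real '\\' separator, ignoring 0x5C bytes that are trail
   bytes of double-byte characters. Returns NULL if there is none. */
const char *sjis_find_last_separator(const char *path)
{
    const char *last = NULL;

    assert(path != NULL);
    for (const char *p = path; *p != '\0'; p++) {
        if (is_sjis_lead_byte((unsigned char)*p) && p[1] != '\0')
            p++;                        /* skip the trail byte as well */
        else if (*p == '\\')
            last = p;
    }
    return last;
}

/* Truncates s in place to at most max_bytes bytes without splitting a
   double-byte character. Returns the new length in bytes. */
size_t sjis_truncate(char *s, size_t max_bytes)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        size_t step =
            (is_sjis_lead_byte((unsigned char)s[i]) && s[i + 1] != '\0') ? 2 : 1;
        if (i + step > max_bytes)
            break;                      /* next character would not fit */
        i += step;
    }
    s[i] = '\0';
    return i;
}
```

A byte-wise search such as strrchr would report the trail byte 0x5C of the character encoded as 0x83 0x5C as a path separator; the character-wise scan does not.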

34
Potential DBCS-Enabling Effort (3 of 4)

Cursor: During cursor placement and cursor movement, the cursor should never land in the middle of a double-byte character.
Locale-specific issues: Sorting, line breaking, fonts, etc., as mentioned in the previous section.
Code conversion: Conversion to and from Internet mail formats, locale-specific standards (e.g. EUC-JP) and Unicode standards.

35
Potential DBCS-Enabling Effort (4 of 4)

Case conversions: No upper/lower case conversions for DBCS data.
Third-party code: If there is third-party code that causes double-byte issues, solutions need to be implemented.

36
Dual Compilation

#ifdef DBCS
...
#else
...
#endif
DBCS-enabled code doesn't affect the base code
Creates a dual code base that you have to compile, test and maintain separately

37
Input Method Editor in DBCS Windows
38
Some DBCS-Enabling Notes on non-C/C++ products

The Input Method Editor can be activated in VB, Delphi, browsers, etc.
DBCS character display can be enabled by specifying a proper font
The Font Association capability in non-Japanese DBCS languages usually complicates the DBCS-enabling effort
With Unicode built in, Java programs perform the conversion between Unicode and the code page implicitly

39
Unicode
Defines all available characters of the world's languages as two-byte codes (under Windows) so that every string of data has the same interpretation across operating systems of different languages. Example: the same character has different code values under the Windows code pages for Japanese, Korean, Simplified Chinese and Traditional Chinese, but its code value in Unicode is consistent.
40
Advantages of Unicode

Sort and process international characters efficiently (lists, database indexes, network user names, etc.)
Eliminate the code that handles multiple code pages and double-byte character sets
Code that works for more than one language gets thoroughly tested in the process of releasing the first language edition
Unicode provides details for characters with semantic rules that can simplify text layout
Unicode does not significantly increase file size

41
Disadvantages of Unicode

Unicode doesn't help with complex text-based operations:
- sorting
- hyphenation
- line breaking
Unicode is not currently supported in many applications and fonts
Unicode fonts with DBCS characters are not built into Windows 95/98/NT

42
Unicode on Windows NT

Handles characters internally in Unicode
Supports all of the wide-character variants of the Win32 API
GDI processes all text in Unicode
Resource compiler compiles strings into Unicode
System information files are stored as Unicode
NTFS filenames are always in Unicode
Exchanges data on the network in Unicode format with other Unicode machines

43
Potential Unicode Implementation Issues for Asian Products (1 of 2)

Some performance impact due to heavy conversions:
- for non-Unicode programs
- converting back to the code page for display (no DBCS Unicode font)
The buffer length unit for Unicode is always 2 bytes (one Unicode character). This is not the same as in non-Unicode applications.
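The difference between counting bytes and counting 16-bit units is a frequent source of buffer bugs when code is moved to Unicode. The sketch below uses uint16_t as a portable stand-in for the 2-byte Windows wide character; COUNT_OF, utf16_length and utf16_storage_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements in an array; sizeof alone would give bytes. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Length of a zero-terminated string of 16-bit units, counted in units. */
size_t utf16_length(const uint16_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Bytes needed to store the string, including its terminator. */
size_t utf16_storage_bytes(const uint16_t *s)
{
    return (utf16_length(s) + 1) * sizeof(uint16_t);
}
```

For the text "AB" a code-page buffer needs 3 bytes including the terminator, while the 16-bit form has the same length of 2 units but needs 6 bytes, so any size passed between the two worlds has to state its unit.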

44
Potential Unicode Implementation Issues for Asian Products (2 of 2)

Some code values in a code page don't have round-trip conversions.
For a database with data in multiple languages where Unicode data is stored, the access program needs to be able to get code page information and convert the Unicode data.
For a database with data in multiple languages where code page data is stored, the database needs a language-type field that the access program can use for conversion.

45
Q & A
Questions?
