DEV10: Supporting Multiple Languages In Your Application

1
DEV-10: Supporting Multiple Languages In Your Application
Salvador Viñals
Consultant Product Manager
2
Agenda
  • International support with OpenEdge 10
  • OpenEdge internationalization update
  • GB18030
  • Sorting and Collations
  • Unicode Normalization
  • Default word-break tables and double-byte
  • For more information, go to
  • Summary

This presentation includes annotations with
additional, complementary information
3
Code-Pages and Unicode
  • Code-pages
  • Many code-pages
  • Max 255 characters each
  • Each with regionally-limited repertoire of
    characters
  • Unicode
  • Uni code = One
  • Uni code = Universal
  • Virtually all the world's characters
  • Distinguishes characters by script, but not by
    language.
  • UTF-8, UTF-16, UTF-32
  • Unicode binary representations (8,16,32 bits)
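
To make the difference between the three encoding forms concrete, here is a small Python sketch (Python is used only as a neutral illustration; it is not part of OpenEdge). It counts the bytes each form needs for a few sample characters. The sample characters and the helper name `encoded_lengths` are chosen for illustration.

```python
# Compare how many bytes each Unicode encoding form needs per character.
# Sample characters: A, e-acute, a CJK ideograph, and an emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]


def encoded_lengths(ch):
    """Return byte counts for UTF-8, UTF-16 and UTF-32 (no byte-order mark)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )


for ch in samples:
    utf8, utf16, utf32 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8={utf8} UTF-16={utf16} UTF-32={utf32}")
```

UTF-8 and UTF-16 are variable-width, while UTF-32 always uses four bytes per code point.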

4
OpenEdge Products
International readiness
  • OpenEdge 10 products support UTF-8 (Unicode)
  • Database (Personal, Workgroup, Enterprise)
  • Application Servers: AppServer, WebSpeed (Basic,
    Enterprise)
  • GUI Clients (Client Networking, WebClient) and
    Batch Client
  • Exceptions
  • Character Client and DataServers: use code-pages
    instead
  • Code-pages and Unicode can interoperate

5
Configurations
6
Translation Products
  • Translation Manager (TranMan)
  • Visual Translator (VisTran)
  • Products life cycle
  • Progress V9: Functionally Stable
  • OpenEdge 10: Active

TranMan and VisTran run on Windows only; however,
they can be used to manage translations of both
ChUI and GUI applications.
8
Support for GB18030 Code Page
  • Chinese code page
  • Required for all new software sold in mainland
    China

9
Support for GB18030 Code Page
  • Why is this code page unique?
  • Does not fit into lead-byte / trail-byte model
  • It has 1, 2, and 4 byte characters
  • Cannot tell from lead-byte if there are 2 or 4
    bytes in the character
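
The point about the lead byte can be sketched in Python, whose standard library includes a `gb18030` codec. The helper below is a simplified illustration, not OpenEdge code: it looks at the second byte (a digit byte, 0x30 to 0x39, marks a 4-byte character) because the first byte alone cannot decide between the 2-byte and 4-byte forms. The function name is hypothetical and input validation is omitted.

```python
def gb18030_char_length(data: bytes, pos: int = 0) -> int:
    """Length in bytes of the GB18030 character starting at data[pos].

    Simplified sketch: assumes well-formed input and does no validation.
    """
    first = data[pos]
    if first <= 0x7F:
        return 1  # single byte (ASCII range)
    second = data[pos + 1]
    if 0x30 <= second <= 0x39:
        return 4  # second byte is a digit byte: 4-byte form
    return 2  # otherwise: 2-byte form


# A (ASCII), a common CJK ideograph, and an emoji outside the BMP.
for ch in ["A", "\u4e2d", "\U0001F600"]:
    raw = ch.encode("gb18030")
    print(f"U+{ord(ch):04X} -> {raw.hex()} ({gb18030_char_length(raw)} bytes)")
```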

10
Support for GB18030 Code Page
  • Supported by making conversions of the GB18030
    code page to and from UTF-8
  • Requires -cpinternal to be UTF-8
  • GB18030 cannot be used as -cpinternal
  • Reading and writing a file in GB18030
  • Converts to/from UTF-8

11
Linguistic Sorting
The goal
  • Unicode sorting for UTF-8
  • Language-sensitive collations
  • Tailor app to expectations of locale
  • Language
  • Location (country, region, etc.)
  • Easy to use
  • Functions just like any other collation for ABL,
    and OpenEdge Database or SQL users
  • Prior to 10.0B UTF-8 collation was binary sort

12
Some collation examples: Latin alphabet
13
Linguistic Sorting
Internals
  • OpenEdge Database meta-schema
  • Table _DB-collate
  • Already used for single-byte sort weights
  • New functionality used for summary information
  • Table _Collation
  • Added in 10.0A in preparation
  • Can hold any amount of collation data

14
Linguistic Sorting
  • ABL Usage
  • Reference collation by name
  • For example, ICU-fr for French
  • Specify using
  • -cpcoll <table name>
  • Identifies collation table to use with code page
    in memory at session startup
  • <table name> is the collation table in convmap.cp
    or the name of the ICU collation
  • ABL Statements
  • COMPARE
  • COLLATE

15
Linguistic Sorting
  • COMPARE and COLLATE: new strengths supported
  • 10.0A strengths: CASE-INSENSITIVE,
    CASE-SENSITIVE, CAPS and RAW
  • Added strengths
  • PRIMARY
  • SECONDARY = CASE-INSENSITIVE
  • TERTIARY = CASE-SENSITIVE
  • QUATERNARY

16
Linguistic Sorting
Sort order depends on selected collation
/* French collation */
DISPLAY "ICU-fr" COMPARE("côte", "lt", "coté", "case-insensitive", "ICU-fr").
/* Spanish collation */
DISPLAY "ICU-es" COMPARE("côte", "lt", "coté", "case-insensitive", "ICU-es").
  • Output of above statements

ICU-fr yes
ICU-es no
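
One plausible explanation for the differing results is the direction in which accent differences are compared: French rules traditionally scan accents from the end of the word, while Spanish scans from the start. The Python sketch below imitates only that one rule with the standard library; it is a rough stand-in, not ICU and not the OpenEdge implementation, and all function names are hypothetical.

```python
import unicodedata


def split_accents(text):
    """Return (base letters, per-letter accent tuples), ignoring case."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] += (ord(ch),)  # attach mark to the preceding letter
        else:
            bases.append(ch)
            accents.append(())
    return bases, accents


def sort_key(text, backward_accents=False):
    """Base letters first, then accents (scanned from the end if requested)."""
    bases, accents = split_accents(text)
    if backward_accents:
        accents = accents[::-1]
    return (bases, accents)


def less_than(a, b, backward_accents=False):
    """True if a sorts before b under this simplified two-level comparison."""
    return sort_key(a, backward_accents) < sort_key(b, backward_accents)


cote_circumflex = "c\u00f4te"  # cote with circumflex on the o
cote_acute = "cot\u00e9"       # cote with acute on the e

print("backward accents:", less_than(cote_circumflex, cote_acute, True))
print("forward accents: ", less_than(cote_circumflex, cote_acute, False))
```

With backward accent comparison the first word sorts before the second, and with forward comparison it does not, which matches the yes/no pair in the slide output.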
17
Linguistic Sorting
  • OpenEdge uses collations for
  • The -cpcoll startup parameter
  • The database collation
  • The collation of a database CLOB column
  • An argument to the COMPARE function or COLLATE
    option of the BY phrase

18
Linguistic Sorting
Rules
  • Once a collation is specified for the database in
    the _Collation table, it cannot be modified
  • Once the collation is written to the _Collation
    table, it is the only collation with that name
    that can be used by that database
  • It is strongly recommended that databases should
    be backed up before using an ICU collation

19
Linguistic Sorting
Example 1 of 4
  • The following examples assume
  • UTF-8 database with basic collation
  • Names
  • beet, carrot, çedilla, entry, école, trust, zoom

FOR EACH words WHERE name < "t":
    DISPLAY name.
END.
  • Output result

beet carrot entry
20
Linguistic Sorting
Example 2 of 4
FOR EACH words WHERE name > "t":
    DISPLAY name.
END.
  • Output result

trust zoom école çedilla
21
Linguistic Sorting
Example 3 of 4
FOR EACH words WHERE COMPARE(name, "lt", "t", "case-insensitive", "ICU-en"):
    DISPLAY name.
END.
  • Output result

beet carrot entry école çedilla
  • Before, without COMPARE

beet carrot entry
22
Linguistic Sorting
Example 4 of 4
FOR EACH words WHERE COMPARE(name, "lt", "t", "case-insensitive", "ICU-en")
    BY COLLATE(name, "case-insensitive", "ICU-en"):
    DISPLAY name.
END.
  • Output result

beet carrot çedilla école entry
  • Before, without BY COLLATE

beet carrot entry école çedilla
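
The four examples contrast a code-point style ordering with a language-aware one. The Python sketch below reproduces that contrast for the same word list. It approximates linguistic sorting by comparing accent-stripped letters first, which is only a stand-in for a real ICU collation, and the binary order it prints is plain UTF-8 byte order, which need not match the OpenEdge basic collation exactly. The helper names are hypothetical.

```python
import unicodedata

words = ["beet", "carrot", "\u00e7edilla", "entry", "\u00e9cole", "trust", "zoom"]


def strip_marks(text):
    """Remove combining marks so that accented letters compare as their base."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))


def linguistic_key(text):
    """Approximate key: base letters first, original spelling as tie-breaker."""
    return (strip_marks(text).lower(), text)


# Byte-wise ordering: accented words land after "zoom".
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Approximate linguistic ordering: accented words sort next to their base letter.
linguistic_order = sorted(words, key=linguistic_key)

# Equivalent of the "less than t" filter under the approximate collation.
below_t = [w for w in linguistic_order if linguistic_key(w) < linguistic_key("t")]

print(binary_order)
print(linguistic_order)
print(below_t)
```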
23
Linguistic Sorting
Supported Collations
  • OpenEdge supports ICU collations in the icui18n
    library for supported OpenEdge languages
  • One additional collation is supported - Japanese
    Hiragana Quaternary as case-sensitive
  • Uses the QUATERNARY strength as the
    CASE-SENSITIVE strength
  • ICU-ja__HQ Japanese Hiragana Quaternary

24
Linguistic Sorting
ICU Collations Available 1 of 3
  • ICU-UCA UCA (default Unicode Collation
    Algorithm)
  • ICU-ar Arabic
  • ICU-be Belarusian
  • ICU-bg Bulgarian
  • ICU-ca Catalan
  • ICU-cs Czech
  • ICU-da Danish
  • ICU-de__PHONEBOOK German phonebook
  • ICU-el Greek
  • ICU-en_BE English Belgium
  • ICU-eo Esperanto
  • ICU-es Spanish
  • ICU-es__TRADITIONAL Spanish traditional
  • ICU-et Estonian
  • ICU-fa Persian
  • ICU-fi Finnish
  • ICU-fr French
  • ICU-gu Gujarati

25
Linguistic Sorting
ICU Collations Available 2 of 3
  • ICU-he Hebrew
  • ICU-hi Hindi
  • ICU-hi__DIRECT Hindi direct
  • ICU-hr Croatian
  • ICU-hu Hungarian
  • ICU-is Icelandic
  • ICU-ja Japanese
  • ICU-ko Korean
  • ICU-kn Kannada
  • ICU-lt Lithuanian
  • ICU-lv Latvian
  • ICU-mk Macedonian
  • ICU-mr Marathi
  • ICU-mt Maltese
  • ICU-nb Norwegian Bokmål
  • ICU-nn Norwegian Nynorsk
  • ICU-pl Polish
  • ICU-ro Romanian

26
Linguistic Sorting
ICU Collations Available 3 of 3
  • ICU-ru Russian
  • ICU-sh Serbo-Croatian
  • ICU-sk Slovak
  • ICU-sl Slovenian
  • ICU-sq Albanian
  • ICU-sr Serbian
  • ICU-sv Swedish
  • ICU-ta Tamil
  • ICU-te Telugu
  • ICU-th Thai
  • ICU-tr Turkish
  • ICU-uk Ukrainian
  • ICU-vi Vietnamese
  • ICU-zh Chinese
  • ICU-zh__PINYIN Chinese Pinyin
  • ICU-zh_HK Chinese Hong Kong
  • ICU-zh_MO Chinese Macau
  • ICU-zh_TW Chinese Taiwan

27
Collations Gotchas
  • If Database, Clients and Servers use different
    collations (-cpcoll), indexed and non-indexed
    queries may return different results
  • If a client needs different collation than
    database, you can use COMPARE, COLLATE on the
    client
  • Performance impact with large result sets

28
Configuration Gotchas
Typical character client configuration, 1/2
  • Database code-page is 1252 on Windows server
  • OpenEdge install startup.pf setting is
  • -cpinternal 1252 -cpstream 1252
  • French Windows Client with
  • a default Windows code page of 1252, and
  • a DOS system code page of ibm850
  • DOS Character Client starts without specifying
    -cpinternal and -cpstream
  • so it uses 1252 from startup.pf

29
Configuration Gotchas
Typical character client configuration, 2/2
  • User enters è (hex 8A in ibm850)
  • Since the session is started with -cpinternal 1252,
    OpenEdge does not convert when writing to the
    database.
  • The entered value is written to the database as
    8A, when it should be E8 (1252)
  • Fix: start the Character Client with -cpinternal
    and -cpstream set to ibm850
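
The byte values in this scenario can be checked with Python's built-in `cp850` and `cp1252` codecs. The sketch below is only an illustration of the mismatch, not OpenEdge code: it shows what the byte typed on the ibm850 console means when it is misread as 1252, and what should have been stored.

```python
typed = "\u00e8"  # e with grave accent, as entered by the user

# The DOS console delivers the character in ibm850.
keyboard_bytes = typed.encode("cp850")

# A session that assumes 1252 stores the byte unchanged,
# so later readers interpret it as a 1252 character.
wrong = keyboard_bytes.decode("cp1252")

# With the session set to ibm850, the byte is converted correctly.
right = keyboard_bytes.decode("cp850")
stored_ok = right.encode("cp1252")

print(keyboard_bytes.hex(), repr(wrong), stored_ok.hex())
```

Misreading 8A as 1252 yields S with caron instead of the intended character, which is the corruption described above.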

30
Unicode Normalization
What is Normalization?
  • Unicode has different ways of expressing the same
    characters
  • Decomposed
  • Á = A (U+0041, Latin Capital Letter A)
  • + ´ (U+0301, Combining Acute Accent)
  • Composed
  • Á (U+00C1, Latin Capital Letter A with Acute)
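
The two representations look identical on screen but are different code point sequences. A short Python sketch with the standard `unicodedata` module (used here only as an illustration outside OpenEdge) shows that they compare unequal until both are brought to the same normalization form.

```python
import unicodedata

composed = "\u00c1"     # single code point: A with acute
decomposed = "A\u0301"  # two code points: A followed by combining acute

# Same visible character, different code point sequences.
raw_equal = composed == decomposed

# After normalization to the same form, the strings match.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)
```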

31
Unicode Normalization
Why Normalization?
  • XML (and other W3C entities) expects data in
    NFC form
  • Best way to convert from Unicode to other code
    pages
  • Useful when doing tasks such as making
    comparisons

NFC: Canonical Decomposition, followed by
Canonical Composition
32
Unicode Normalization
NORMALIZE Language Function
  • NORMALIZE
  • Returns either CHAR or LONGCHAR
  • Matches the source string
  • CHAR variable must be UTF-8
  • LONGCHAR variable can be any form of Unicode
  • UTF-8, UTF-16, UTF-32

result-string = NORMALIZE(source-string,
normalization-mode)
33
Normalization Modes Supported
Normalization modes from ICU library
  • NFD: Canonical Decomposition
  • NFC: Canonical Decomposition, followed by
    Canonical Composition (default)
  • NFKD: Compatibility Decomposition
  • NFKC: Compatibility Decomposition, followed by
    Canonical Composition
  • NONE: No change to source string. Turns off
    normalization when normalization-mode is a
    variable
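
To see how the modes differ, the following Python sketch applies each one to a string containing a compatibility character (the "fi" ligature, U+FB01) and a composed accented letter. The wrapper `normalize_text` is hypothetical; it only mimics the idea of a NONE mode by returning the input unchanged, and it is not the ABL NORMALIZE function.

```python
import unicodedata


def normalize_text(source, mode="NFC"):
    """Apply a Unicode normalization mode; 'NONE' returns the input unchanged."""
    if mode.upper() == "NONE":
        return source
    return unicodedata.normalize(mode.upper(), source)


sample = "\ufb01\u00c1"  # fi ligature followed by A with acute (composed)

for mode in ["NFD", "NFC", "NFKD", "NFKC", "NONE"]:
    result = normalize_text(sample, mode)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in result)
    print(f"{mode:5s} {code_points}")
```

The canonical forms (NFD, NFC) leave the ligature alone, while the compatibility forms (NFKD, NFKC) replace it with the plain letters f and i.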

34
Unicode Normalization
Additional information
  • Unicode Normalization Forms
  • Recommended for understanding normalization forms
    used with NORMALIZE function
  • http://www.unicode.org/unicode/reports/tr15/
  • International Components for Unicode (ICU)
    libraries: globalization, in-depth information
  • http://icu.sourceforge.net/userguide/intro.html

35
Default Word-Break Tables
  • Prior to 10.1A
  • User had to configure word-break tables for use
    with double-byte and UTF-8 databases

36
Default Word-Break Tables
10.1A simplifies implementing double-byte
databases
  • Default Word-Break Tables added for
  • Double-byte
  • UTF-8 Databases
  • These are available out of the box
  • Either in product or for download
  • Simplifies accessing non-single-byte databases

37
Default Word-Break Tables
10.1A simplifies implementing double-byte
databases
  • 10.1A provides 10 compiled files
  • See list on next slide
  • Ranging from proword.245 to proword.254
  • Located in subdirectory with corresponding empty
    databases
  • Subdirectory prolang/<language>

38
Default Word-Break Tables
Compiled, Available out of the box
10.1A simplifies implementing double-byte
databases
  • Available as part of the Supplemental PROMSGS
    package
  • Available for download
  • Japanese SHIFT-JIS                 proword.253
  • Japanese EUCJIS                    proword.250
  • Korean CP949                       proword.248
  • Korean KSC5601                     proword.252
  • Chinese (simplified) CP936         proword.247
  • Chinese (simplified) GB2312        proword.251
  • Chinese (traditional) CP950        proword.249
  • Chinese (traditional) BIG-5        proword.246
  • Chinese (traditional) CP950-HKSCS  proword.245
  • UTF-8                              proword.254

39
Default Word-Break Tables
  • What if you are already using a proword file in
    the range 245 to 254?
  • Copy the file to proword.<nnn>
  • Where <nnn> is less than 240
  • Apply word rule to the database
  • No index-build is required for this change
  • Remember, apply the change in all tiers (Client,
    Server, Database) to prevent corruption!

41
For More Information, go to
  • Expand to New Countries Business Empowerment
    Program
  • Contact your Account Manager
  • Product documentation
  • OpenEdge Development: Internationalizing
    Applications
  • OpenEdge Development: Visual Translator
  • OpenEdge Development: Translation Manager
  • Visit PSDN for white papers and presentations,
    for example
  • Understanding Internationalization web seminar
  • Training and Professional Services
    www.progress.com

43
In Summary
  • Use UTF-8
  • GB18030
  • Linguistic Sorting and Collations
  • Use ICU-
  • Unicode Normalization
  • Default word-break tables and double-byte
  • Expand to New Countries Business Empowerment
    Program

44
Questions?
45
Thank you for your time