ARCH-12 Broaden Your Potential Customer Base Using Unicode presentation

Transcript and Presenter's Notes

1
ARCH-12 Broaden Your Potential Customer Base
Using Unicode

David Lund
Sr. Training Program Manager, Progress

2
Broaden Your Potential Customer Base Using Unicode

Unicode is the best way to support multiple
languages
A number of recent OpenEdge enhancements
facilitate Unicode
OpenEdge tools simplify the task

3
Agenda - Implementing Unicode

Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider

4
Unicode Essentials

Unicode: the foundation for creating internationalized
and localized applications
Unicode provides a unique number for every
character
Lossless round tripping
Mapping from any Unicode coded character sequence
S to a sequence of bytes and back will produce S
again
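
The round-trip property above can be checked directly. The sketch below uses Python rather than the 4GL, purely to illustrate the Unicode behavior; the sample string and the choice of IBM850 (Python codec name cp850) as a contrasting legacy code page are assumptions, not part of the original slide.

```python
# Round-trip check: encode a Unicode string to bytes and decode it back.
text = "beet çedilla école 日本語"

for encoding in ("utf-8", "utf-16", "utf-32"):
    encoded = text.encode(encoding)      # string S -> sequence of bytes
    decoded = encoded.decode(encoding)   # bytes -> string again
    assert decoded == text               # lossless: we get S back

# A legacy single-byte code page cannot hold every character,
# so the same round trip is not guaranteed there.
lossy = text.encode("cp850", errors="replace").decode("cp850")
round_trip_ok = (lossy == text)          # the Japanese characters were replaced
print(round_trip_ok)
```

The accented Latin letters survive the IBM850 trip because that code page contains them; only the characters it lacks are lost.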

5
Unicode Essentials

UTF: Unicode Transformation Format
Algorithm for mapping (encoding) each Unicode scalar
value to a unique sequence of code units
3 formats (mappings)
UTF-8, UTF-16, UTF-32
Formats vary in how they handle mapping
Impacts access, storage, and performance
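
To make the storage impact concrete, the following Python sketch (an illustration only, not OpenEdge code) counts how many bytes one character occupies in each format. The helper name encoded_sizes and the four sample characters are assumptions chosen for the example.

```python
def encoded_sizes(ch):
    """Return the number of bytes one character needs in each UTF."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le variant: no byte order mark
        "utf-32": len(ch.encode("utf-32-le")),
    }

for ch in ("A", "é", "日", "😀"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 is variable width (1 to 4 bytes), UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4, which is why the choice affects both storage and how quickly the Nth character can be located.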

6
Unicode Essentials

Code page: a table that assigns a numeric value to each character
Letters, numbers, punctuation, control codes,
etc.
prolang\list-cp.p lists code pages in convmap
Sample Code Page IBM850 (partial)
Character "2" is hex 32
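
The same lookup can be reproduced outside Progress. This Python sketch assumes Python's cp850 codec as a stand-in for the IBM850 table in convmap; the helper name value_in_code_page is hypothetical.

```python
def value_in_code_page(code_page, ch):
    """Return the byte value a code page assigns to ch, as a hex string."""
    return ch.encode(code_page).hex()

print(value_in_code_page("cp850", "2"))     # 32, matching the slide
print(value_in_code_page("cp850", "é"))     # 82 in IBM850
print(value_in_code_page("latin-1", "é"))   # e9 in ISO 8859-1
```

The last two lines show why the code page must be known: the same character has a different numeric value in each table.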

7
Progress I18N Essentials
I18N (Internationalization)

Undefined code page
Tells Progress not to do any conversions when
reading or writing data
For example
Sports database uses undefined
Can be used with any character set

8
Progress I18N Essentials
I18N (Internationalization)

Startup parameters
-cpinternal
Code page used for internal data processing
-cpstream
Code page used for stream files
Parameter file prolang\UTF\UTF-8.pf

-cpinternal utf-8 -cpstream utf-8
9
Progress I18N Essentials

Performing code page conversions
Progress provides a character set management
facility
Automatically converts data between the code
pages of different data sources and targets
Both code pages must be defined in the CONVMAP file
Targets for code page conversion
Memory (-cpinternal)
Streams (-cpstream)
Databases
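
Conceptually, a code page conversion decodes the source bytes to Unicode and re-encodes them for the target. The Python sketch below illustrates that idea only; it is not how Progress implements its conversion tables, and convert_code_page is a hypothetical helper.

```python
def convert_code_page(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one code page to another via Unicode."""
    return data.decode(source).encode(target)

ibm850_bytes = "école".encode("cp850")
utf8_bytes = convert_code_page(ibm850_bytes, "cp850", "utf-8")

print(ibm850_bytes)   # b'\x82cole'
print(utf8_bytes)     # b'\xc3\xa9cole'
```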

10
Progress I18N Essentials
Referenced code pages must be in CONVMAP

Modifying CONVMAP
Edit convmap.dat
Compile CONVMAP
Make convmap.cp available to session
Progress installation directory
PROCONV environment variable
-convmap startup parameter

proutil <dbname> -C CODEPAGE-COMPILER
convmap.dat convmap.cp
11
Progress I18N Essentials

Converting characters or strings in memory
Specify code page in functions
ASC
CHR
CODEPAGE-CONVERT
Converting input and output data
Specify code page in statements
INPUT FROM (input source to memory target)
OUTPUT TO (memory source to output target)
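
The effect of naming a code page on an input or output stream can be sketched in Python, where open() takes an encoding argument. This is an analogy to the 4GL statements, not their syntax; the file name words.txt and the sample text are assumptions.

```python
import os
import tempfile

text = "çedilla école"

# Write the in-memory string out as an IBM850 stream.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w", encoding="cp850") as out:
    out.write(text)

# Read it back, telling the reader which code page the file uses.
with open(path, "r", encoding="cp850") as src:
    restored = src.read()

# Reading with the wrong code page garbles the accented characters.
with open(path, "r", encoding="latin-1") as src:
    garbled = src.read()

print(restored == text, garbled == text)
```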

12
Fonts for Unicode

Locating fonts on windows
C:\WINDOWS\Fonts
Control Panel, select Font icon
Unicode fonts may need to be purchased
Setting Unicode fonts for Progress
Progress.ini
Use ini2reg.exe to place in registry

13
System Resources
14
Agenda - Implementing Unicode

Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider

15
Migrating a Database to Unicode

Two ways to migrate a database to Unicode
Dump and Load
Converting the database without doing a dump and
load
Start an OpenEdge session
Use startup parameters
-cpinternal UTF-8
-cpstream UTF-8

16
Migrating a Database to Unicode
Cautions
Using dump and load 1 of 3

Back up your database
Dump definitions and data
Do not do a binary dump and load
Binary data is not converted to the code page of
the database when it is loaded
Always use Data Admin tool
Goes through automatic conversion

17
Migrating a Database to Unicode
Using dump and load 2 of 3

Create an empty UTF-8 database
Data Administration tool
Database > Create Database
Create Database dialog
Select radio set to create a copy of some other
database
Select an empty database from prolang/UTF-8
For example, empty4.db

18
Migrating a Database to Unicode
Using dump and load 3 of 3

Load the Definitions
Load will convert to UTF-8 automatically
Load the Data
Data will be automatically converted to UTF-8
from the dumped code page when it is loaded

19
Migrating a Database to Unicode
Converting without a dump and load

Back up your database
Use proutil to convert the database
Load the UTF-8 collation table
prolang/UTF/_tran.df
Assign word-break rules to the database
Rebuild the indexes

proutil <db-name> -C convchar convert UTF-8
proutil <db-name> -C idxbuild
20
Agenda - Implementing Unicode

Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider

21
Benefits of GUI Unicode Client
Added in OpenEdge 10.0A release

Multi-lingual
Able to use data from multiple languages in the
same session
Fully enables AppBuilder to build multilingual
UTF-8 applications
Easier deployment
Lower costs, higher ROI
No need to have different configurations using
specific settings per language
Increased competitive advantage
No (or very few) changes required to existing
apps to take advantage of GUI Unicode client

22
Unicode Editor

RichEdit editor in OpenEdge 10
Supports Unicode
Selecting an editor
Modify UseSourceEditor in progress.ini
Default: SlickEdit
UseSourceEditor=yes
For Unicode, use RichEdit
UseSourceEditor=no

23
Demonstration
GUI Unicode Client
Multiple Languages
24
Agenda - Implementing Unicode

Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider

25
Linguistic Sorting
The goal

Language sensitive collations
Tailor to expectations of locale
Language
Country
Easy to use
Functions just like any other collation for 4GL

26
Unicode Sorting

OpenEdge 10.0A supports binary sorting
Basic collation support
Sorts by value in code page
Possible to do user defined sorting
OpenEdge 10.0B also supports linguistic sorting
Supports ICU collations
International Components for Unicode
OpenEdge does not support multiple collations in
the database

27
Binary versus Linguistic Sorting - A Visual

Binary Sort
beet
carrot
entry
trust
zoom
école
çedilla

Linguistic Sort, English (ICU-en)
beet
carrot
çedilla
école
entry
trust
zoom
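
The difference between the two orderings can be approximated in a few lines of Python. This is a rough sketch, not ICU: strip_accents is a hypothetical stand-in that simply ignores accents, whereas a real ICU collation applies full language-specific rules. Python's plain sort compares code point values, so the relative order of the two accented words at the end of the binary list may differ from the OpenEdge result on the slide, which depends on the database collation table.

```python
import unicodedata

words = ["zoom", "école", "trust", "beet", "çedilla", "entry", "carrot"]

def strip_accents(word):
    """Crude stand-in for a linguistic sort key: drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

binary_order = sorted(words)                         # by code point value
linguistic_order = sorted(words, key=strip_accents)  # accents ignored

print(binary_order)
print(linguistic_order)
```

In the binary result the accented words fall after "zoom" because their first characters have higher code point values than any unaccented ASCII letter.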
28
Linguistic Sorting

Progress uses collations for
-cpcoll session startup parameter
Database collation
Collation of database CLOB column
Argument to
COMPARE function
COLLATE option of the BY phrase

29
Linguistic Sorting
Supported Collations

OpenEdge supports all ICU collations in the
icui18n library
Beyond icui18n one additional collation is
supported
Japanese Hiragana Quaternary as
case-sensitive

30
Linguistic Sorting

4GL Usage - Reference collation by name
For example ICU-fr for French
Specify using
-cpcoll <table name>
Identifies collation table to use with code page
in memory at session startup
<table name> is the collation table in convmap.cp
or the name of the ICU collation
4GL Statements
COMPARE
COLLATE

31
Linguistic Sorting
Sort order depends on selected collation
/* French collation */
DISPLAY "ICU-fr" COMPARE("côte", "lt", "coté", "case-insensitive", "ICU-fr").

/* Spanish collation */
DISPLAY "ICU-es" COMPARE("côte", "lt", "coté", "case-insensitive", "ICU-es").

Output of above statements

ICU-fr yes ICU-es no
32
Linguistic Sorting
Examples 1 of 4

Examples
UTF-8 database with basic collation
Names: beet, carrot, çedilla, entry, école, zoom,
trust

FOR EACH words WHERE name < "t":
  DISPLAY name.
END.

Output result

beet carrot entry
33
Linguistic Sorting
Examples 2 of 4

FOR EACH words WHERE name > "t":
  DISPLAY name.
END.

Output result

trust zoom école çedilla
34
Linguistic Sorting
Examples 3 of 4

FOR EACH words
    WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
  DISPLAY name.
END.

Output result

beet carrot entry école çedilla
35
Linguistic Sorting
Examples 4 of 4

FOR EACH words
    WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
    BY COLLATE(name, "case-insensitive", "ICU-en"):
  DISPLAY name.
END.

Output result

beet carrot çedilla école entry
36
Agenda - Implementing Unicode

Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider

37
Unicode Normalization

Why is this needed?
Puts text in NFC form, as expected by XML (and
other W3C entities)
Best way to convert from Unicode to other code
pages
Useful when doing tasks such as making
comparisons

38
Unicode Normalization
What is normalization?

Unicode has different ways of expressing the same
characters
Base letter plus combining marks (accents) as two
Unicode code points
Á composite (composed)
(U+0041, Latin Capital Letter A)
(U+0301, Combining Acute Accent)
Base letter and accents as one Unicode code point
Á precomposed
(U+00C1, Latin Capital Letter A with Acute)
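
The two representations look identical on screen but do not compare equal until they are normalized. The sketch below uses Python's standard unicodedata module to illustrate this; it is not the 4GL NORMALIZE function, and the variable names are assumptions.

```python
import unicodedata

decomposed = "\u0041\u0301"   # A + combining acute accent (two code points)
precomposed = "\u00C1"        # Á as a single code point

same_before = (decomposed == precomposed)
same_as_nfc = (unicodedata.normalize("NFC", decomposed) == precomposed)
same_as_nfd = (unicodedata.normalize("NFD", precomposed) == decomposed)

print(same_before, same_as_nfc, same_as_nfd)
```

This is why normalizing both sides to the same form is a useful first step before making comparisons.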

39
Unicode Normalization

NORMALIZE
4GL function new in OpenEdge 10.0B
Returns either CHAR or LONGCHAR
Return type matches the source string
CHAR variable must be UTF-8
LONGCHAR variable can be any form of Unicode
UTF-8, UTF-16, UTF-32

result-string = NORMALIZE(source-string,
normalization-mode)
40
Normalization Modes Supported

NFD
Canonical Decomposition
NFC
Canonical Decomposition, followed by Canonical
Composition
NFKD
Compatibility Decomposition
NFKC
Compatibility Decomposition, followed by
Canonical Composition
None
No change to source string
Turns off normalization when normalization-mode
is a variable
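
The four forms can be compared side by side on one input. This Python sketch relies on the standard unicodedata module rather than the 4GL NORMALIZE function; the sample string (the "ﬁ" ligature followed by a precomposed "é") and the helper code_points are assumptions chosen to show both canonical and compatibility behavior.

```python
import unicodedata

source = "\ufb01\u00e9"   # "fi" ligature followed by precomposed "é"

def code_points(s):
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for mode in ("NFD", "NFC", "NFKD", "NFKC"):
    print(mode, code_points(unicodedata.normalize(mode, source)))
```

The canonical forms (NFD, NFC) leave the ligature intact and only split or join the accent, while the compatibility forms (NFKD, NFKC) also replace the ligature with the plain letters "f" and "i".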

41
Agenda - Implementing Unicode

Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider

42
Bidi Support

Bi-directional (bidi)
Behavior of individual widgets and/or the
complete window to go from right to left or left
to right
Supported
Fill-in widget
Can type right to left or left to right
Not-Supported
Whole frame
Cannot switch labels from left side to right side

43
GB18030 Code Page Support
Added in OpenEdge 10.0B

New Chinese code page
Required for all new software sold in mainland
China as of Jan. 1, 2001

44
Broaden Your Potential Customer Base Using Unicode
In summary

Unicode is the best way to support multiple
languages
A number of recent OpenEdge enhancements
facilitate Unicode
OpenEdge tools simplify the task

45
Documentation

OpenEdge Development
Internationalizing Applications

46
Unicode Resources

Unicode Home page
http://www.unicode.org
Unicode Standard, Unicode Consortium
International Components for Unicode
http://www-124.ibm.com/icu/docs/
http://www-124.ibm.com/icu/docs/papers/forms_of_unicode/

47
System Resources

Viewing keyboard layouts
http://www.microsoft.com/globaldev/reference/keyboards.aspx
Select the language and the keyboard layout is
displayed
Use shift key to toggle to lower/upper case
characters
Use MS Internet Explorer to display

48
Questions?
49
Thank you for your time!