Title: ARCH-12 Broaden Your Potential Customer Base Using Unicode
1ARCH-12Broaden Your Potential Customer Base
Using Unicode
- David Lund
- Sr. Training Program Manager, Progress
2Broaden Your Potential Customer Base Using Unicode
- Unicode is the best way to support multiple
languages - A number of recent OpenEdge enhancements
facilitate Unicode - OpenEdge tools simplify the task
3Agenda - Implementing Unicode
- Essentials
- Migrating a Database
- Unicode Client
- Sorting
- Normalization
- Other Areas to Consider
4Unicode Essentials
- Unicode foundation for creating internationalized
and localized applications - Unicode provides a unique number for every
character - Lossless round tripping
- Mapping from any Unicode coded character sequence
S to a sequence of bytes and back will produce S
again
5Unicode Essentials
- UTF Unicode Transformation Format
- Algorithm for mapping (encoding) Unicode scalar
value to a unique sequence - 3 formats (mappings)
- UTF-8, UTF-16, UTF-32
- Formats vary in how they handle mapping
- Impacts access, storage, and performance
6Unicode Essentials
- Code Page table that assigns a numeric value
- Letters, numbers, punctuation, control codes,
etc. - prolang\list-cp.p lists code pages in convmap
- Sample Code Page IBM850 (partial)
- Character 2 is hex 32
7Progress I18N Essentials
I18N (Internationalization)
- Undefined code page
- Tells Progress not to do any conversions when
reading or writing data - For example
- Sports database uses undefined
- Can be used with any character set
8Progress I18N Essentials
I18N (Internationalization)
- Startup parameters
- -cpinternal
- Code page used for internal data processing
- -cpstream
- Code page used for stream files
- Parameter file prolang\UTF\UTF-8.pf
-cpinternal utf-8 -cpstream utf-8
9Progress I18N Essentials
- Performing code page conversions
- Progress provides a character set management
facility - Automatically converts data between the code
pages of different data sources and targets - Must be in CONVMAP file
- Targets for code page conversion
- Memory (-cpinternal)
- Streams (-cpstream)
- Databases
10Progress I18N Essentials
Referenced code pages must be in CONVMAP
- Modifying CONVMAP
- Edit convmap.dat
- Compile CONVMAP
- Make convmap.cp available to session
- Progress installation directory
- PROCONV environment variable
- -convmap startup parameter
proutil ltdbnamegt C CODEPAGE-COMPILER
convmap.dat convmap.cp
11Progress I18N Essentials
- Converting characters or strings in memory
- Specify code page in functions
- ASC
- CHR
- CODEPAGE-CONVERT
- Converting input and output data
- Specify code page in statements
- INPUT FROM (input source to memory target)
- OUTPUT TO (memory source to output target)
12Fonts for Unicode
- Locating fonts on windows
- C\WINDOWS\Fonts
- Control Panel, select Font icon
- Unicode fonts may need to be purchased
- Setting Unicode fonts for Progress
- Progress.ini
- Use ini2reg.exe to place in registry
13System Resources
14Agenda - Implementing Unicode
- Essentials
- Migrating a Database
- Unicode Client
- Sorting
- Normalization
- Other Areas to Consider
15Migrating a Database to Unicode
- Two ways to migrate database to Unicode
- Dump and Load
- Converting the database without doing a dump and
load - Start an OpenEdge session
- Use startup parameters
- -cpinternal UTF-8
- -cpstream UTF-8
16Migrating a Database to UnicodeCautions
Using dump and load 1 of 3
- Backup your database
- Dump definitions and data
- Do not do a binary dump and load
- Binary data is not converted to the code page of
the database when it is loaded - Always use Data Admin tool
- Goes through automatic conversion
17Migrating a Database to Unicode
Using dump and load 2 of 3
- Create an empty UTF-8 database
- Data Administration tool
- DatabasegtCreate Database
- Create Database dialog
- Select radio set to create a copy of some other
database - Select an empty database from prolang/UTF-8
- For example empty4.db
18Migrating a Database to Unicode
Using dump and load 3 of 3
- Load the Definitions
- Load will convert to UTF-8 automatically
- Load the Data
- Data will be automatically converted to UTF-8
from the dumped code page when it is loaded
19Migrating a Database to Unicode
Converting without a dump and load
- Backup your database
- Use proutil to convert the database
- Load the UTF-8 collation table
- prolang/UTF/_tran.df
- Assign a word break rules to the database
- Rebuild the indexes
proutil ltdb-namegt -C convchar convert UTF-8
proutil ltdb-namegt -C idxbuild
20Agenda - Implementing Unicode
- Essentials
- Migrating a Database
- Unicode Client
- Sorting
- Normalization
- Other Areas to Consider
21Benefits of GUI Unicode Client
Added in OpenEdge 10.0A release
- Multi-lingual
- Able to use data from multiple languages in the
same session - Fully enables AppBuilder to build multilingual
UTF-8 applications - Easier deployment
- Lower costs, higher ROI
- No need to have different configurations using
specific settings per language - Increased competitive advantage
- No (or very few changes) required to existing
apps to take advantage of GUI Unicode client
22Unicode Editor
- RichEdit editor in OpenEdge 10
- Supports Unicode
- Selecting an editor
- Modify UseSourceEditor in progress.ini
- Default SlickEdit
- UseSourceEditoryes
- For Unicode use RichEdit
- UseSourceEditorno
23Demonstration
GUI Unicode Client
Multiple Languages
24Agenda - Implementing Unicode
- Essentials
- Migrating a Database
- Unicode Client
- Sorting
- Normalization
- Other Areas to Consider
25Linguistic Sorting
The goal
- Language sensitive collations
- Tailor to expectations of locale
- Language
- Country
- Easy to use
- Functions just like any other collation for 4GL
26Unicode Sorting
- OpenEdge 10.0A supports binary sorting
- Basic collation support
- Sorts by value in code page
- Possible to do user defined sorting
- OpenEdge 10.0B also supports linguistic sorting
- Supports ICU collations
- International Components for Unicode
- OpenEdge does not support multiple collations in
the database
27Binary versus Linguistic Sorting -A Visual
Linguistic Sort
Binary Sort
- beet
- carrot
- entry
- trust
- zoom
- école
- çedilla
- beet
- carrot
- çedilla
- école
- entry
- trust
- zoom
English (ICU-en)
28Linguistic Sorting
- Progress uses collations for
- -cpcoll session startup parameter
- Database collation
- Collation of database CLOB column
- Argument to
- COMPARE function
- COLLATE option of the BY phrase
29Linguistic SortingSupported Collations
- OpenEdge supports all ICU collations in the
icui18n library - Beyond icui18n one additional collation is
supported - Japanese Hiragana Quaternary as
case-sensitive
30Linguistic Sorting
- 4GL Usage - Reference collation by name
- For example ICU-fr for French
- Specify using
- -cpcoll lttable namegt
- Identifies collation table to use with code page
in memory at session startup - lttable namegt is the collation table in convmap.cp
or the name of the ICU collation - 4GL Statements
- COMPARE
- COLLATE
31Linguistic Sorting
Sort order depends on selected collation
/ French collation / DISPLAY ICU-fr
COMPARE("côte", "lt", "coté", "case-insensitive"
, "ICU-fr") / Spanish collation / DISPLAY
ICU-es COMPARE("côte", "lt", "coté",
"case-insensitive", "ICU-es")
- Output of above statements
ICU-fr yes ICU-es no
32Linguistic SortingExamples 1 of 4
- Examples
- UTF-8 database with basic collation
- Names beet, carrot, çedilla, entry, école, zoom,
trust
FOR EACH words WHERE name lt t DISPLAY
name. END.
beet carrot entry
33Linguistic SortingExamples 2 of 4
FOR EACH words WHERE name gt t DISPLAY
name. END.
trust zoom école çedilla
34Linguistic SortingExamples 3 of 4
FOR EACH words WHERE COMPARE(name lt
t,case-insensitive, ICU-en) DISPLAY
name. END.
beet carrot entry école çedilla
35Linguistic SortingExamples 4 of 4
FOR EACH words WHERE COMPARE(name lt
t,case-insensitive, ICU-en) BY
COLLATE(name, case-insensitive,
ICU-en) DISPLAY name. END.
beet carrot çedilla école entry
36Agenda - Implementing Unicode
- Essentials
- Migrating a Database
- Unicode Client
- Sorting
- Normalization
- Other Areas to Consider
37Unicode Normalization
- Why is this needed?
- Puts in NCF format as expected by XML (and
other W3C entities) - Best way to convert from Unicode to other code
pages - Useful when doing tasks such as making
comparisons
38Unicode Normalization
What is normalization?
- Unicode has different ways of expressing the same
characters - Base letter plus combining marks (accents) as two
Unicode code points - Á composite (composed)
- (U0041, Latin Capital Letter A)
- (U0301, Combining Acute Accent )
- Base letter and accents as one Unicode code point
- Á precomposed
- (U00C1, Latin Capital Letter A with Acute)
39Unicode Normalization
- NORMALIZE
- 4GL function new in OpenEdge 10.0B
- Returns either CHAR or LONGCHAR
- Matches the source string
- CHAR variable must be UTF-8
- LONGCHAR variable any form of Unicode
- UTF-8, UTF-16, UTF-32
result-string NORMALIZE(source-string,
normalization-mode)
40Normalization Modes Supported
- NFD
- Canonical Decomposition
- NFC
- Canonical Decomposition, followed by Canonical
Composition - NFKD
- Compatibility Decomposition
- NFKC
- Compatibility Decomposition, followed by
Canonical Composition - None
- No change to source string
- Turns off normalization when normalization-mode
is a variable
41Agenda - Implementing Unicode
- Essentials
- Migrating a Database
- Unicode Client
- Sorting
- Normalization
- Other Areas to Consider
42Bidi Support
- Bi-directional (bidi)
- Behavior of individual widgets and/or the
complete window to go from right to left or left
to right - Supported
- Fill-in widget
- Can type right to left of left to right
- Not-Supported
- Whole frame
- Cannot switch labels from left side to right side
43GB18030 Code Page SupportAdded in OpenEdge 10.0B
- New Chinese code page
- Required for all new software sold in mainland
China as of Jan. 1, 2001
44Broaden Your Potential Customer Base Using Unicode
In summary
- Unicode is the best way to support multiple
languages - A number of recent OpenEdge enhancements
facilitate Unicode - OpenEdge tools simplify the task
45Documentation
- OpenEdge Development
- Internationalizing Applications
46Unicode Resources
- Unicode Home page
- http//www.unicode.org
- Unicode Standard, Unicode Consortium
- International Components for Unicode
- http//www-124.ibm.com/icu/docs/
- http//www-124.ibm.com/icu/docs/papers/forms_of_un
icode/
47System Resources
- Viewing keyboard layouts
- http//www.microsoft.com/globaldev/reference/keybo
ards.aspx - Select the language and the keyboard layout is
displayed - Use shift key to toggle to lower/upper case
characters - Use MS Internet Explorer to display
48Questions?
49Thank you for your time!
50(No Transcript)