ARCH-12 Broaden Your Potential Customer Base Using Unicode - PowerPoint PPT Presentation

About This Presentation
Title:

ARCH-12 Broaden Your Potential Customer Base Using Unicode

Description:

Progress Exchange 2005 5 - 8 June, 2005 Orlando, Florida USA. ARCH-12 ... Puts in NCF format as expected by XML (and other W3C entities) ... – PowerPoint PPT presentation

Number of Views:159
Avg rating:3.0/5.0
Slides: 51
Provided by: DavidL250
Category:

less

Transcript and Presenter's Notes

Title: ARCH-12 Broaden Your Potential Customer Base Using Unicode


1
ARCH-12Broaden Your Potential Customer Base
Using Unicode
  • David Lund
  • Sr. Training Program Manager, Progress

2
Broaden Your Potential Customer Base Using Unicode
  • Unicode is the best way to support multiple
    languages
  • A number of recent OpenEdge enhancements
    facilitate Unicode
  • OpenEdge tools simplify the task

3
Agenda - Implementing Unicode
  • Essentials
  • Migrating a Database
  • Unicode Client
  • Sorting
  • Normalization
  • Other Areas to Consider

4
Unicode Essentials
  • Unicode foundation for creating internationalized
    and localized applications
  • Unicode provides a unique number for every
    character
  • Lossless round tripping
  • Mapping from any Unicode coded character sequence
    S to a sequence of bytes and back will produce S
    again

5
Unicode Essentials
  • UTF Unicode Transformation Format
  • Algorithm for mapping (encoding) Unicode scalar
    value to a unique sequence
  • 3 formats (mappings)
  • UTF-8, UTF-16, UTF-32
  • Formats vary in how they handle mapping
  • Impacts access, storage, and performance

6
Unicode Essentials
  • Code Page table that assigns a numeric value
  • Letters, numbers, punctuation, control codes,
    etc.
  • prolang\list-cp.p lists code pages in convmap
  • Sample Code Page IBM850 (partial)
  • Character 2 is hex 32

7
Progress I18N Essentials
I18N (Internationalization)
  • Undefined code page
  • Tells Progress not to do any conversions when
    reading or writing data
  • For example
  • Sports database uses undefined
  • Can be used with any character set

8
Progress I18N Essentials
I18N (Internationalization)
  • Startup parameters
  • -cpinternal
  • Code page used for internal data processing
  • -cpstream
  • Code page used for stream files
  • Parameter file prolang\UTF\UTF-8.pf

-cpinternal utf-8 -cpstream utf-8
9
Progress I18N Essentials
  • Performing code page conversions
  • Progress provides a character set management
    facility
  • Automatically converts data between the code
    pages of different data sources and targets
  • Must be in CONVMAP file
  • Targets for code page conversion
  • Memory (-cpinternal)
  • Streams (-cpstream)
  • Databases

10
Progress I18N Essentials
Referenced code pages must be in CONVMAP
  • Modifying CONVMAP
  • Edit convmap.dat
  • Compile CONVMAP
  • Make convmap.cp available to session
  • Progress installation directory
  • PROCONV environment variable
  • -convmap startup parameter

proutil ltdbnamegt C CODEPAGE-COMPILER
convmap.dat convmap.cp
11
Progress I18N Essentials
  • Converting characters or strings in memory
  • Specify code page in functions
  • ASC
  • CHR
  • CODEPAGE-CONVERT
  • Converting input and output data
  • Specify code page in statements
  • INPUT FROM (input source to memory target)
  • OUTPUT TO (memory source to output target)

12
Fonts for Unicode
  • Locating fonts on windows
  • C\WINDOWS\Fonts
  • Control Panel, select Font icon
  • Unicode fonts may need to be purchased
  • Setting Unicode fonts for Progress
  • Progress.ini
  • Use ini2reg.exe to place in registry

13
System Resources
14
Agenda - Implementing Unicode
  • Essentials
  • Migrating a Database
  • Unicode Client
  • Sorting
  • Normalization
  • Other Areas to Consider

15
Migrating a Database to Unicode
  • Two ways to migrate database to Unicode
  • Dump and Load
  • Converting the database without doing a dump and
    load
  • Start an OpenEdge session
  • Use startup parameters
  • -cpinternal UTF-8
  • -cpstream UTF-8

16
Migrating a Database to UnicodeCautions
Using dump and load 1 of 3
  • Backup your database
  • Dump definitions and data
  • Do not do a binary dump and load
  • Binary data is not converted to the code page of
    the database when it is loaded
  • Always use Data Admin tool
  • Goes through automatic conversion

17
Migrating a Database to Unicode
Using dump and load 2 of 3
  • Create an empty UTF-8 database
  • Data Administration tool
  • DatabasegtCreate Database
  • Create Database dialog
  • Select radio set to create a copy of some other
    database
  • Select an empty database from prolang/UTF-8
  • For example empty4.db

18
Migrating a Database to Unicode
Using dump and load 3 of 3
  • Load the Definitions
  • Load will convert to UTF-8 automatically
  • Load the Data
  • Data will be automatically converted to UTF-8
    from the dumped code page when it is loaded

19
Migrating a Database to Unicode
Converting without a dump and load
  • Backup your database
  • Use proutil to convert the database
  • Load the UTF-8 collation table
  • prolang/UTF/_tran.df
  • Assign a word break rules to the database
  • Rebuild the indexes

proutil ltdb-namegt -C convchar convert UTF-8
proutil ltdb-namegt -C idxbuild
20
Agenda - Implementing Unicode
  • Essentials
  • Migrating a Database
  • Unicode Client
  • Sorting
  • Normalization
  • Other Areas to Consider

21
Benefits of GUI Unicode Client
Added in OpenEdge 10.0A release
  • Multi-lingual
  • Able to use data from multiple languages in the
    same session
  • Fully enables AppBuilder to build multilingual
    UTF-8 applications
  • Easier deployment
  • Lower costs, higher ROI
  • No need to have different configurations using
    specific settings per language
  • Increased competitive advantage
  • No (or very few changes) required to existing
    apps to take advantage of GUI Unicode client

22
Unicode Editor
  • RichEdit editor in OpenEdge 10
  • Supports Unicode
  • Selecting an editor
  • Modify UseSourceEditor in progress.ini
  • Default SlickEdit
  • UseSourceEditoryes
  • For Unicode use RichEdit
  • UseSourceEditorno

23
Demonstration
GUI Unicode Client
Multiple Languages
24
Agenda - Implementing Unicode
  • Essentials
  • Migrating a Database
  • Unicode Client
  • Sorting
  • Normalization
  • Other Areas to Consider

25
Linguistic Sorting
The goal
  • Language sensitive collations
  • Tailor to expectations of locale
  • Language
  • Country
  • Easy to use
  • Functions just like any other collation for 4GL

26
Unicode Sorting
  • OpenEdge 10.0A supports binary sorting
  • Basic collation support
  • Sorts by value in code page
  • Possible to do user defined sorting
  • OpenEdge 10.0B also supports linguistic sorting
  • Supports ICU collations
  • International Components for Unicode
  • OpenEdge does not support multiple collations in
    the database

27
Binary versus Linguistic Sorting -A Visual
Linguistic Sort
Binary Sort
  • beet
  • carrot
  • entry
  • trust
  • zoom
  • école
  • çedilla
  • beet
  • carrot
  • çedilla
  • école
  • entry
  • trust
  • zoom

English (ICU-en)
28
Linguistic Sorting
  • Progress uses collations for
  • -cpcoll session startup parameter
  • Database collation
  • Collation of database CLOB column
  • Argument to
  • COMPARE function
  • COLLATE option of the BY phrase

29
Linguistic SortingSupported Collations
  • OpenEdge supports all ICU collations in the
    icui18n library
  • Beyond icui18n one additional collation is
    supported
  • Japanese Hiragana Quaternary as
    case-sensitive

30
Linguistic Sorting
  • 4GL Usage - Reference collation by name
  • For example ICU-fr for French
  • Specify using
  • -cpcoll lttable namegt
  • Identifies collation table to use with code page
    in memory at session startup
  • lttable namegt is the collation table in convmap.cp
    or the name of the ICU collation
  • 4GL Statements
  • COMPARE
  • COLLATE

31
Linguistic Sorting
Sort order depends on selected collation
/ French collation / DISPLAY ICU-fr
COMPARE("côte", "lt", "coté", "case-insensitive"
, "ICU-fr") / Spanish collation / DISPLAY
ICU-es COMPARE("côte", "lt", "coté",
"case-insensitive", "ICU-es")
  • Output of above statements

ICU-fr yes ICU-es no
32
Linguistic SortingExamples 1 of 4
  • Examples
  • UTF-8 database with basic collation
  • Names beet, carrot, çedilla, entry, école, zoom,
    trust

FOR EACH words WHERE name lt t DISPLAY
name. END.
  • Output result

beet carrot entry
33
Linguistic SortingExamples 2 of 4
FOR EACH words WHERE name gt t DISPLAY
name. END.
  • Output result

trust zoom école çedilla
34
Linguistic SortingExamples 3 of 4
FOR EACH words WHERE COMPARE(name lt
t,case-insensitive, ICU-en) DISPLAY
name. END.
  • Output result

beet carrot entry école çedilla
35
Linguistic SortingExamples 4 of 4
FOR EACH words WHERE COMPARE(name lt
t,case-insensitive, ICU-en) BY
COLLATE(name, case-insensitive,
ICU-en) DISPLAY name. END.
  • Output result

beet carrot çedilla école entry
36
Agenda - Implementing Unicode
  • Essentials
  • Migrating a Database
  • Unicode Client
  • Sorting
  • Normalization
  • Other Areas to Consider

37
Unicode Normalization
  • Why is this needed?
  • Puts in NCF format as expected by XML (and
    other W3C entities)
  • Best way to convert from Unicode to other code
    pages
  • Useful when doing tasks such as making
    comparisons

38
Unicode Normalization
What is normalization?
  • Unicode has different ways of expressing the same
    characters
  • Base letter plus combining marks (accents) as two
    Unicode code points
  • Á composite (composed)
  • (U0041, Latin Capital Letter A)
  • (U0301, Combining Acute Accent )
  • Base letter and accents as one Unicode code point
  • Á precomposed
  • (U00C1, Latin Capital Letter A with Acute)

39
Unicode Normalization
  • NORMALIZE
  • 4GL function new in OpenEdge 10.0B
  • Returns either CHAR or LONGCHAR
  • Matches the source string
  • CHAR variable must be UTF-8
  • LONGCHAR variable any form of Unicode
  • UTF-8, UTF-16, UTF-32

result-string NORMALIZE(source-string,
normalization-mode)
40
Normalization Modes Supported
  • NFD
  • Canonical Decomposition
  • NFC
  • Canonical Decomposition, followed by Canonical
    Composition
  • NFKD
  • Compatibility Decomposition
  • NFKC
  • Compatibility Decomposition, followed by
    Canonical Composition
  • None
  • No change to source string
  • Turns off normalization when normalization-mode
    is a variable

41
Agenda - Implementing Unicode
  • Essentials
  • Migrating a Database
  • Unicode Client
  • Sorting
  • Normalization
  • Other Areas to Consider

42
Bidi Support
  • Bi-directional (bidi)
  • Behavior of individual widgets and/or the
    complete window to go from right to left or left
    to right
  • Supported
  • Fill-in widget
  • Can type right to left of left to right
  • Not-Supported
  • Whole frame
  • Cannot switch labels from left side to right side

43
GB18030 Code Page SupportAdded in OpenEdge 10.0B
  • New Chinese code page
  • Required for all new software sold in mainland
    China as of Jan. 1, 2001

44
Broaden Your Potential Customer Base Using Unicode
In summary
  • Unicode is the best way to support multiple
    languages
  • A number of recent OpenEdge enhancements
    facilitate Unicode
  • OpenEdge tools simplify the task

45
Documentation
  • OpenEdge Development
  • Internationalizing Applications

46
Unicode Resources
  • Unicode Home page
  • http//www.unicode.org
  • Unicode Standard, Unicode Consortium
  • International Components for Unicode
  • http//www-124.ibm.com/icu/docs/
  • http//www-124.ibm.com/icu/docs/papers/forms_of_un
    icode/

47
System Resources
  • Viewing keyboard layouts
  • http//www.microsoft.com/globaldev/reference/keybo
    ards.aspx
  • Select the language and the keyboard layout is
    displayed
  • Use shift key to toggle to lower/upper case
    characters
  • Use MS Internet Explorer to display

48
Questions?
49
Thank you for your time!
50
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com