Migration of a 4GL and Relational Database to Unicode - PowerPoint PPT Presentation

Description:

Creates character, GUI, or web-based clients with common source ... Pièce de Résistance 1998, Progress Software Corporation. 17. Futures ...

Slides: 19
Provided by: texte5
Learn more at: http://www.unicode.org
Transcript and Presenter's Notes

1
Migration of a 4GL and Relational Database to
Unicode
Tex Texin
International Product Manager
2
Presentation Goals
  • Outline Migration Steps
  • Describe Design Considerations
  • Leverage Existing Double-byte Implementation
  • Describe Impact on 4GL and Report Formats

3
PROGRESS Application Development Suite
  • Powerful tools for the rapid creation of
    distributed business applications
  • Creates character, GUI, or web-based clients with
    common source
  • Host-based, client-server, or n-tier distribution
    on variety of platforms
  • Scalable, robust, and open RDBMS
  • International, double-byte enabled

4
Possible Configuration Options
GUI Client Client-Server
Web-based Client
Database Server
Progress Database
Optional n-tier Application Server
Host-based Character Client
Other Database
5
Why do our customers need Unicode?
  • Many do not... However,
  • Multinationals deploy across regions with
    incompatible character sets, yet they must share
    data between them.
  • Programs are distributed worldwide with one
    container of text in many languages.
  • Certain applications require multilingual
    databases. E.g. Translation systems and web-based
    applications.

6
The Existing Architecture
  • 1.5M lines of C code
  • 0.3M lines of 4GL code
  • Double-byte enabled
  • CJK, 9 double-byte charsets supported
  • 2-byte only, no 3 or 4-byte
  • No shift-sequenced charsets
  • DBE changes earmarked, easy to find
  • 4 years, 3 developers, 2 QA

7
Estimated cost of implementing UCS-2 was very
big!
  • Changing to 16-bit text units affects almost
    every source module
  • Largest cost is separating char variables based
    on usage for text or binary data.
  • Use 16-bit null terminators, ignore 8-bit
  • A → 0041, 0000; Ā → 0100, 0000
  • Pointer arithmetic (advance 2 bytes)
  • Sizing (bytes or characters)
  • New API to use new WIDE TEXT datatype
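
The null-terminator point above can be illustrated with a small sketch. The helper names (utf16_units, c_strlen, wide_strlen) are hypothetical and not part of the product; the sketch assumes big-endian 16-bit units, matching the "0041, 0000" notation on the slide.

```python
def utf16_units(text: str) -> bytes:
    """Encode text as big-endian 16-bit units plus a 16-bit null terminator."""
    return text.encode("utf-16-be") + b"\x00\x00"


def c_strlen(buf: bytes) -> int:
    """Mimic an 8-bit strlen: count bytes up to the first zero byte."""
    return buf.index(0)


def wide_strlen(buf: bytes) -> int:
    """Count 16-bit units up to the first all-zero 16-bit unit."""
    n = 0
    while buf[2 * n : 2 * n + 2] != b"\x00\x00":
        n += 1
    return n


if __name__ == "__main__":
    for ch in ("A", "\u0100"):
        buf = utf16_units(ch)
        print(ch, buf.hex(" "), c_strlen(buf), wide_strlen(buf))
```

An 8-bit scan stops inside the first unit for both characters, because each contains a zero byte. This is why every piece of code that assumes byte-sized terminators, pointer steps, or sizes has to be revisited.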

8
Product requirements for a multilingual version
  • Minimize cost for application migration
  • Minimize cost for application upgrade
  • Minimize support cost
  • One executable!
  • Maintain user-definable character sets
  • Add UTF-8 as just another character set
  • UTF-8 algorithms are compatible with other
    charsets

9
Scaled-down multilingual proposal: UTF-8
implementation
  • Implement UTF-8 as 3-byte character set
  • Leverage and extend double-byte enabling
  • Places to change are already earmarked
  • Restrict to composed characters for now
  • Restrict to no surrogates
  • Supports all the markets we are in
  • UTF-8-enable 4GL and RDBMS first
  • Provides multilingual logic and storage
  • Java/other client technologies coming
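
Why "no surrogates" makes UTF-8 a 3-byte character set can be sketched as follows. The function names are hypothetical; the byte-length thresholds are those of the standard UTF-8 encoding. Characters that would need surrogate pairs in 16-bit form are exactly those that need 4 bytes in UTF-8, so excluding them caps every character at 3 bytes.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4  # supplementary planes: surrogate pairs in 16-bit encodings


def fits_three_byte_charset(text: str) -> bool:
    """True if every character of text encodes in at most 3 UTF-8 bytes."""
    return all(utf8_length(ord(ch)) <= 3 for ch in text)


if __name__ == "__main__":
    print(fits_three_byte_charset("A\u00e9\u4e2d"))   # ASCII, Latin, CJK
    print(fits_three_byte_charset("\U0001F600"))      # outside the BMP
```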

10
Architecture changes: UTF-8-enabling the string
library
  • N-byte enable character/string functions
  • GetNextChar, GetPreviousChar
  • GetCharacterSize (table-based)
  • Modified IsFirstByte
  • New GetColumnLength
  • New datatype: normalized BIG char
  • Minor algorithm changes for efficiency
  • Find Character
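
A minimal sketch of how table-based N-byte string primitives might look, assuming well-formed UTF-8 restricted to 3 bytes. The function names echo the slide, but the signatures and the table layout are assumptions for illustration, not the actual library.

```python
# Table indexed by byte value: size of the character that byte starts,
# or 0 if the byte cannot start a character (a tail byte or an excluded lead).
CHAR_SIZE = [0] * 256
for b in range(0x00, 0x80):
    CHAR_SIZE[b] = 1
for b in range(0xC2, 0xE0):
    CHAR_SIZE[b] = 2
for b in range(0xE0, 0xF0):
    CHAR_SIZE[b] = 3


def is_first_byte(byte: int) -> bool:
    return CHAR_SIZE[byte] != 0


def get_character_size(buf: bytes, pos: int) -> int:
    return CHAR_SIZE[buf[pos]]


def get_next_char(buf: bytes, pos: int) -> int:
    """Offset of the character after the one starting at pos."""
    return pos + get_character_size(buf, pos)


def get_previous_char(buf: bytes, pos: int) -> int:
    """Offset of the character before pos, found by skipping tail bytes."""
    pos -= 1
    while pos > 0 and not is_first_byte(buf[pos]):
        pos -= 1
    return pos


if __name__ == "__main__":
    buf = "A\u00e9\u4e2d".encode("utf-8")
    pos = 0
    while pos < len(buf):
        print(pos, get_character_size(buf, pos))
        pos = get_next_char(buf, pos)
```

In UTF-8 a tail byte can never be mistaken for a lead byte, so stepping backward needs only the table. Some double-byte charsets overlap lead and tail ranges, which may be why IsFirstByte needed modification.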

11
Architecture changes: UTF-8-enabling character
tables
  • String libraries use character tables
  • Alphanumeric, Lead-byte, Tail-byte
  • Upper, lower case (700 characters)
  • New property ColumnCount
  • New table formats
  • Old architecture presumed 256-byte table
  • Now organized by range lists and trie
  • Update table compiler to allow hex entry
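
A flat 256-entry table cannot describe tens of thousands of characters, which motivates range lists. The sketch below shows one possible range-list lookup for a ColumnCount-style property. The ranges are illustrative and deliberately incomplete, and the names are hypothetical; the trie mentioned on the slide is not shown.

```python
from bisect import bisect_right

# (first code point, last code point, column count), sorted by first.
# Illustrative subset only; anything not listed defaults to 1 column.
COLUMN_RANGES = [
    (0x1100, 0x115F, 2),  # Hangul Jamo initial consonants
    (0x3040, 0x30FF, 2),  # Hiragana and Katakana blocks
    (0x4E00, 0x9FFF, 2),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3, 2),  # Hangul syllables
    (0xFF01, 0xFF60, 2),  # Fullwidth forms
]
_STARTS = [first for first, _last, _cols in COLUMN_RANGES]


def column_count(code_point: int) -> int:
    """Binary-search the range list; default to 1 column when not listed."""
    i = bisect_right(_STARTS, code_point) - 1
    if i >= 0 and code_point <= COLUMN_RANGES[i][1]:
        return COLUMN_RANGES[i][2]
    return 1


if __name__ == "__main__":
    for ch in "A\u3042\u4e2d":
        print(hex(ord(ch)), column_count(ord(ch)))
```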

12
Architecture changes: UTF-8-enabling sorting
  • How to sort multilingual data?
  • Binary sort used for double-byte data
  • With UTF-8, Europe is 2-byte, CJK 3-byte
  • Solution:
  • Binary sort on server
  • Client uses native sort
  • Bump key length limit for UTF-8
  • Next phase will be enhanced sort
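
A binary sort is a workable first step because byte-wise comparison of UTF-8 strings yields the same order as comparison by code point. The sketch below checks that property on sample data; the function names are hypothetical.

```python
def binary_sort(strings):
    """Sort by raw UTF-8 bytes, as a server-side binary sort would."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def code_point_sort(strings):
    """Sort by the sequence of Unicode code points."""
    return sorted(strings, key=lambda s: [ord(ch) for ch in s])


if __name__ == "__main__":
    words = ["zebra", "\u4e2d\u6587", "\u00e9cole", "apple", "Z\u00fcrich"]
    print(binary_sort(words))
    print(binary_sort(words) == code_point_sort(words))
```

Neither order is linguistically correct (uppercase sorts before lowercase, accented letters after "z"), which is consistent with leaving native sorting to the client and planning an enhanced sort for the next phase.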

13
Architecture changes: Character conversion
algorithms
  • Existing, user-definable, conversions
  • Single-byte character set table maps
  • Double-byte Shift-JIS <-> EUCJIS algorithm
  • New table-driven automated conversions
  • Single-byte to UTF-8, and back
  • Double-byte to UCS-2 and back
  • UTF-8 <-> UCS-2
  • Trie for speed and memory optimization
  • Requires significant QA for data integrity
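
A table-driven single-byte to UTF-8 conversion, and back, might be sketched like this. The tables are generated here from Python's built-in ISO-8859-1 codec purely for illustration, and plain dictionaries stand in for the trie mentioned on the slide; all function names are assumptions.

```python
def build_tables(charset_name: str):
    """Build byte -> UTF-8 and UTF-8 -> byte maps for a single-byte charset."""
    to_utf8, from_utf8 = {}, {}
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(charset_name)
        except UnicodeDecodeError:
            continue  # byte is unassigned in this charset
        encoded = ch.encode("utf-8")
        to_utf8[byte] = encoded
        from_utf8[encoded] = byte
    return to_utf8, from_utf8


def single_byte_to_utf8(data: bytes, to_utf8) -> bytes:
    return b"".join(to_utf8[b] for b in data)


def utf8_to_single_byte(data: bytes, from_utf8) -> bytes:
    """Assumes well-formed UTF-8 of at most 3 bytes per character."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        lead = data[pos]
        size = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3
        out.append(from_utf8[data[pos:pos + size]])  # KeyError if unmappable
        pos += size
    return bytes(out)


if __name__ == "__main__":
    to_utf8, from_utf8 = build_tables("iso-8859-1")
    src = bytes([0x63, 0x61, 0x66, 0xE9])  # "café" in ISO-8859-1
    utf8 = single_byte_to_utf8(src, to_utf8)
    print(utf8, utf8_to_single_byte(utf8, from_utf8) == src)
```

A round trip over every byte value is the kind of data-integrity check the QA bullet points to.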

14
Architecture changes: Impact on the 4GL user
  • 4GL is character set independent
  • Almost all functions are character-based
  • 3 functions require optional byte-basing
  • Length, Substring, Overlay
  • Options: Byte, Character
  • Add new option: Column
  • Format (Picture) Phrase
  • XXXX has different meaning for UTF-8
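
The three units can give three different answers for the same string, which is why a format such as XXXX becomes ambiguous under UTF-8. The sketch below is written in Python rather than the 4GL, with a hypothetical length function. It assumes that characters whose East Asian Width is Wide or Fullwidth occupy 2 display columns and everything else occupies 1.

```python
import unicodedata


def length(text: str, unit: str = "character") -> int:
    """Length of text in bytes (UTF-8), characters, or display columns."""
    if unit == "byte":
        return len(text.encode("utf-8"))
    if unit == "character":
        return len(text)
    if unit == "column":
        return sum(
            2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
            for ch in text
        )
    raise ValueError(f"unknown unit: {unit}")


if __name__ == "__main__":
    s = "A\u00e9\u4e2d"
    for unit in ("byte", "character", "column"):
        print(unit, length(s, unit))
```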

15
Status
  • Functioning Well
  • Going to second beta
  • Implemented with very low cost
  • Performance is OK
  • Metrics not yet available
  • Testing is most significant cost
  • Reviewing all character set properties
  • Evaluating all conversions

16
Pièce de Résistance
17
Futures
  • For the Progress International Team
  • Multilingual Clients
  • Enhanced Character Folding
  • Enhanced Sorting
  • For Progress Customers
  • Deployment of multilingual databases
  • Worldwide access to these databases
  • Worldwide deployment of multi-language
    applications

18
Conclusions
  • Migration can be achieved in phases
  • Migration thru UTF-8 can be low cost
  • Double-byte applications can migrate easily to
    UTF-8
  • Asian users can integrate with other languages
    now
  • Non-English users can integrate with Asian
    languages now

19
Any questions?