Title: Going Global: Publishing in Asia with Dynatext and Dynaweb
1. Going Global: Publishing in Asia with Dynatext and Dynaweb
- Jason Olson
- Basis Technology Corporation
2. Motivation: Being Multilingual
Motivation
- Target Population
- Market Importance
3. Target Population
Motivation
- 90 million people speak Korean as their first language
- 1.5 billion people speak Chinese as their first language
4. Why are these Markets Important?
Motivation
- Significant Growth
- Your multinational customers are in these markets
- Your competition is in these markets
5. Challenges of Asian Languages
Challenges
- Unfamiliar characters, and a lot of them
- Encoding issues
- Multiple encoding schemes
- SGML issues
- Testing
- Dynatext and Dynaweb
6. What are these characters?
Challenges Characters
- Fear of the unknown (it's all Greek to me)
7. What is an Encoding?
Challenges Encoding
- A mapping between the characters of a script and a sequence of numbers.
- ASCII, EBCDIC: Basic Latin, numerals, and symbols
- ISO 8859-1 (Latin-1): Extended Latin (Western Europe)
- ISO 8859-2 (Latin-2): Extended Latin (Eastern Europe)
- ISO 8859-5: Latin, Cyrillic
- ISO 8859-6: Latin, Arabic
- ISO 8859-7: Latin, Greek
- JIS X 0208-1990: Latin, Greek, Cyrillic, Hiragana, Katakana, Kanji
- KSC 5601-1987: Latin, Greek, Cyrillic, Hiragana, Katakana, Hangul, Hanja
- GB 2312-80 (CP 936): Latin, Greek, Cyrillic, Hiragana, Katakana, Hanzi
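To make the "mapping" idea concrete, the sketch below encodes one character under several encodings and prints the resulting bytes. It uses Python's built-in codecs purely as a stand-in for the encoding tables; the function name `encode_all` is hypothetical and is not part of Dynatext, Dynaweb, or any Basis Technology product.

```python
# Illustrative only: Python's built-in codecs stand in for the encoding tables.
text = "中"  # U+4E2D, a character used in Chinese, Japanese, and Korean

encodings = ["gb2312", "big5", "euc_kr", "shift_jis", "utf-8"]


def encode_all(ch, names):
    """Return {encoding name: hex byte string} for a single character."""
    return {name: ch.encode(name).hex() for name in names}


table = encode_all(text, encodings)
for name, hex_bytes in table.items():
    # The same character maps to a different number under each encoding.
    print(f"{name:10s} {hex_bytes}")
```

The same character comes out as a different byte sequence each time, which is why a document's encoding has to be known before its bytes can be interpreted.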
8. Multiple Encoding Schemes
Challenges Encoding
- Korean
- KSC5601
- Chinese
- GB2312
- BIG5
- CP950
- Japanese
- Shift-JIS
- JIS
- ISO-2022-JP
9. SBCS vs. DBCS vs. MBCS
Challenges Encoding
- Single Byte per Character
- ASCII, ISO 8859-1 through ISO 8859-10
- Two (Double) Bytes per Character
- Shift-JIS, EUC-KR, GB2312, Big5
- Multiple Bytes per Character
- EUC-JP, ISO-2022, UTF-8
- Lead-byte / trail-byte processing
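Lead-byte / trail-byte processing can be sketched as a scan that decides, byte by byte, whether the current byte starts a one-byte or a two-byte character. The helper below is a minimal sketch, assuming a double-byte encoding in which every byte of 0x81 or above is a lead byte followed by exactly one trail byte (true of text in Big5, GB2312, and EUC-KR, but not of EUC-JP or UTF-8). The name `split_dbcs` is hypothetical.

```python
def split_dbcs(data: bytes) -> list:
    """Split a double-byte-encoded byte string into per-character chunks.

    Assumption: any byte >= 0x81 is a lead byte followed by exactly one
    trail byte; every other byte is a complete single-byte character.
    """
    chars = []
    i = 0
    while i < len(data):
        width = 2 if data[i] >= 0x81 and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


sample = "Dynatext 한국어".encode("euc_kr")
chunks = split_dbcs(sample)
print(len(sample), "bytes,", len(chunks), "characters")
```

Counting bytes and counting characters give different answers here, which is the practical difference between SBCS and DBCS text handling.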
10. Testing
Challenges Testing
11. SGML Issues
Challenges SGML
- Getting your SGML translated
- Trail-byte gotchas
- Style Sheet Restrictions
12. Trail-byte Gotchas
Challenges SGML
Overlap between ASCII and Trail Byte Range
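A small illustration of the overlap, assuming Big5 text: the second byte of some Big5 characters is 0x5C, the same value as an ASCII backslash. A byte-wise search therefore reports a backslash in the middle of a Chinese character, while a scan that respects lead bytes does not. Both function names are hypothetical, and the lead-byte range 0x81-0xFE is an assumption chosen to cover Big5 and its common extensions.

```python
BACKSLASH = 0x5C


def naive_find(data: bytes, target: int) -> list:
    """Byte-wise search that ignores character boundaries."""
    return [i for i, b in enumerate(data) if b == target]


def lead_byte_aware_find(data: bytes, target: int) -> list:
    """Search that never looks inside a two-byte character.

    Assumes Big5: a byte in 0x81-0xFE is a lead byte followed by one trail byte.
    """
    hits = []
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE and i + 1 < len(data):
            i += 2  # skip the lead byte and its trail byte together
            continue
        if data[i] == target:
            hits.append(i)
        i += 1
    return hits


# One Chinese character whose Big5 trail byte is 0x5C, then a real backslash.
data = "功\\".encode("big5")
print(data.hex())                             # a55c5c
print(naive_find(data, BACKSLASH))            # [1, 2]: offset 1 is a false hit
print(lead_byte_aware_find(data, BACKSLASH))  # [2]
```

Any tool that treats such a byte as markup or as an escape character will corrupt the text unless it tracks lead bytes first.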
13. Style Sheet Restrictions
Challenges SGML
- Unrecognized Characters / Symbols
- Encoding notation
14. The Solution
Solution
- Technical Side
- PLS Architecture
- Rosette
- KCCPLS
- Application Side
- Getting Translated Docs
- Using the PLS Correctly
- Testing
15. PLS Architecture
Solution Technical Side
- INSO and Basis Tech jointly designed the PLS architecture
- PLM: does language-specific processing
- word-breaking
- date formats
- printing
- ECM: does encoding conversions (Unicode / Legacy)
16. Rosette Library for Unicode Handling
Solution Technical Side
- Provides Encoding Conversions
- Unicode / Legacy Encodings
- Provides Character Classifications and Transforms
- Cross Platform
- WinNT, Win95, Macintosh, Unix, Mainframe
17. Marriage of PLS and Rosette
Solution Technical Side
- The creation of the KCCPLS
- Internals are completely Unicode
- ECM directly uses Rosette encoding conversions
- PLM is built upon Rosette character classification
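The idea of Unicode internals with conversions at the edges can be sketched as follows. This is not the Rosette or PLS API, which the slides do not show; it uses Python's built-in codecs, and the function name `convert` is hypothetical. Legacy bytes are decoded to Unicode on the way in and encoded to the target encoding on the way out.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert between encodings by way of Unicode.

    Decoding yields a Unicode string (the internal form); encoding writes
    that string back out in the target encoding.
    """
    return data.decode(source).encode(target)


korean_euc = "출판".encode("euc_kr")           # legacy-encoded input
korean_utf8 = convert(korean_euc, "euc_kr", "utf-8")
round_trip = convert(korean_utf8, "utf-8", "euc_kr")
print(korean_euc.hex(), "->", korean_utf8.hex())
```

With Unicode as the pivot, each legacy encoding needs only one conversion to and one from Unicode, rather than a separate converter for every pair of encodings. A plain re-encoding like this changes the bytes and not the characters, so it does not turn Simplified Chinese into Traditional Chinese.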
18. The Result: The KCC PLS
Solution Technical Side
- The KCCPLS is a simple addition to your existing Dynaweb or Dynatext installation
- You can correctly process and display Korean, Simplified Chinese, and Traditional Chinese
19. What does it mean?
Solution Technical Side
20. Future PLS
Solution Technical Side
- On-the-fly conversions between Traditional Chinese (TC) and Simplified Chinese (SC)
- Source data in Unicode
- Multilingual Documents
21. Getting Translated Docs
Solution Application Side
- Identify materials to be translated
- Select a reputable Localization vendor with a good track record
- Prepare an English Glossary
- Translate glossary into target language
- Translation and editing
- Charlotte
22. Using the PLS with Dynaweb
Solution Application Side
- Run the Installer
- InstallShield on Windows
- install script on Unix
- Make sure your pls.map has a locale, character set, and language definition that matches your books
23. Using the PLS with Dynaweb
Solution Application Side
24. Using the PLS with Dynaweb
Solution Application Side
25. Using the PLS with Dynaweb
Solution Application Side
- Setup of the Browser
- Proper fonts on system
- Set encoding correctly
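What "set encoding correctly" guards against can be shown with a short sketch, assuming a page served as EUC-KR and a browser left on a Western European setting. Python's codecs stand in for the browser here, and the sample text is made up for the illustration.

```python
page_text = "한국어 문서"
served_bytes = page_text.encode("euc_kr")  # what the server sends

right = served_bytes.decode("euc_kr")      # browser encoding set to Korean
wrong = served_bytes.decode("latin-1")     # browser left on Western European

print(right)
print(wrong)  # one accented Latin character or control code per byte
```

No error is raised in the wrong case: every byte is a valid Latin-1 character, so the reader simply sees garbage instead of Korean.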
26. Using the PLS with Dynatext
Solution Application Side
- Run the Installer
- Check pls.map
- Change font declarations in Stylesheets
- Have proper language support
- Native language version of Windows
- English Windows with Language Packs
- Unix with Locales
27. Testing
Solution Application Side
28. References
References
- CJKV Information Processing
- Ken Lunde, O'Reilly Press, ISBN 1-56592-224-7
- Developing International Software
- Nadine Kano, Microsoft Press, ISBN 1-55615-840-8
- The Unicode Standard, Version 2.0
- The Unicode Consortium, ISBN 0-201-48345-9
- unicode.basistech.com
29. Get more information
30. Other Products from Basis Technology
- Japanese Morphological Analyzer
- Chinese Morphological Analyzer
- Chinese Script Converter