1
18th International Unicode Conference
Documentum and UTF-8: Converting a Content
Management Software Product Line to Unicode
27 April 2001, Donald Ziff
2
Agenda
  • What is Documentum?
  • Documentum's I18N Problem
  • How Unicode UTF-8 Saved the Day
  • Other Success Factors
  • Demo

Documentum Proprietary and Confidential
3
About Documentum
  • Documentum (NASDAQ: DCTM)
  • The Leader in Web and Enterprise Content
    Management Solutions
  • > 128M in revenue in 1999; > 800 employees
  • Over 900 Global 2000 customers with strong
    vertical focus
  • Over 25 Offices in 10 countries

4
DCTM's I18N Problem
  • Everyone agrees we need I18N to fuel growth
    especially in Asia
  • Asian-certified product much more important than
    multi-lingual
  • Although demand for multi-lingual is growing
  • So why not I18N?

5
I18N Perception Problems
  • Too difficult: won't fit into a development
    cycle
  • Too much overhead: multiplies QA and Support
  • Not sexy: no new functionality
  • Let's look at these problems

6
I18N is too difficult
  • Product layers:
  • Server (built on RDBMS and Verity)
  • DMCL: Client Library (C)
  • DFC: Foundation Classes (Java)
  • DTC: Desktop Client, the Win32 end-user client
  • WDK: Web Development Kit
  • RightSite: Legacy Web-Server Integration
  • Web Publisher: Web Content Management App
  • Legacy clients: Workspace (Win32), Intranet

7
History Lesson
  • Server v3.1.6.INT, created by consultants for
    Japanese market, was expensive and time-consuming
  • 3.1.6.INT attempted to internationalize all the
    layers in the DCTM architecture at once
  • 4.0 was released without I18N changes
  • 4.1 followed; by then the deltas from 3.1.6 to
    3.1.6.INT had become hard to apply

8
I18N requires too much overhead
  • The DCTM server requires pharmaceutical-strength
    certification
  • Dimensions of certification:
  • 3 RDBMS platforms: Oracle, Sybase, SQL Server
  • 4 Server OSs: NT, Solaris, HP-UX, AIX
  • The 3.1.6.INT architecture introduced new
    dimensions, leading us to...

9
Certification Hell!
  • New certification dimensions:
  • 5 DCTM Server code-pages
  • 5 RDBMS code-pages
  • Market requires another dimension:
  • 5 Server OS localizations
  • 125 new × 12 old = 1500 certs!
  • Exaggeration, of course. But still...
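
The arithmetic behind the 1500 figure can be checked with a short sketch. The dimension sizes are the ones listed on the slides; the variable names are illustrative only, and the product is the same worst-case upper bound the slide itself calls an exaggeration.

```python
from math import prod

# Dimension sizes as listed on the slides (names are illustrative).
old_dimensions = {"RDBMS platforms": 3, "server OSs": 4}
new_dimensions = {
    "DCTM server code-pages": 5,
    "RDBMS code-pages": 5,
    "server OS localizations": 5,
}

old_certs = prod(old_dimensions.values())   # 3 * 4
new_factor = prod(new_dimensions.values())  # 5 * 5 * 5
total_certs = old_certs * new_factor        # every old cell times every new combination

print(old_certs, new_factor, total_certs)
```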

10
I18N not sexy
  • DCTM is a growth company, needs sizzle as well as
    steak
  • I18N grows markets, but doesn't add much to the
    marketing message
  • To be fair: new functionality is not just sexy,
    it is essential to DCTM's continued survival
  • Other priorities will move to the top

11
DCTMs I18N Requirements
  • Crucial need: support Asia from the main
    code-line. One binary for the world
  • Backward compatibility essential
  • Multi-lingual features would be a side-benefit.
    High on the wish list for a few key customers
  • I18N project must be scoped down to be achievable

12
How UTF-8 Saved the Day
  • UTF-8 moves safely through the server because
    anything that looks like ASCII actually is
  • Standardizing on UTF-8 as the only supported
    internal code-page cuts down certification matrix

13
Lessons from Double-Byte Experiments
  • EUC-KR: 4.1 server works (basically)
  • SJIS: problems! Double-byte characters whose
    second byte is an ASCII "\" (0x5C)
  • Lessons:
  • Non-ASCII moves through the server safely
  • String handling need not be double-byte aware, if
    ASCII always means ASCII
  • Solution: UTF-8!
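
The difference between the three encodings can be illustrated with a minimal sketch using Python's built-in codecs. The helper name and the sample characters are assumptions chosen for illustration; they are not taken from the Documentum experiments. The helper reports any byte below 0x80 that appears inside the encoding of a non-ASCII character.

```python
def ascii_bytes_inside_non_ascii(text, encoding):
    """Collect ASCII-range byte values (< 0x80) that occur inside the
    encoded form of non-ASCII characters."""
    hits = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # a genuine ASCII character, not of interest here
        hits.extend(b for b in ch.encode(encoding) if b < 0x80)
    return hits

japanese_sample = "表"   # U+8868; Shift_JIS encodes it as 0x95 0x5C
korean_sample = "한글"

sjis_hits = ascii_bytes_inside_non_ascii(japanese_sample, "shift_jis")
utf8_hits = ascii_bytes_inside_non_ascii(japanese_sample, "utf-8")
euckr_hits = ascii_bytes_inside_non_ascii(korean_sample, "euc_kr")

print([hex(b) for b in sjis_hits])  # the second byte looks like a backslash
print(utf8_hits, euckr_hits)        # no ASCII look-alikes in either
```

In UTF-8 every byte of a multi-byte sequence has the high bit set, so the property holds for all characters, not only for the samples tried here.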

14
UTF-8: ASCII is ASCII
  • No need for special string handling
  • Server 3.1.6.INT replaced all standard C string
    handling with calls to a 3rd-party library
  • With UTF-8, we stick with the standard; yacc and
    other legacy tools work fine
  • Greatly improved perception (and reality) of how
    difficult I18N would be
  • Now, it's relatively low-impact
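
Why encoding-unaware string handling keeps working can be sketched as follows. Python's byte-level split stands in for strchr/strtok-style C code that scans for an ASCII delimiter; the sample path and the function name are hypothetical.

```python
def split_on_ascii_byte(raw, delimiter):
    """Byte-oriented split with no knowledge of the text encoding,
    comparable to C code that scans for a single ASCII byte."""
    return raw.split(delimiter)

path = "表\\data.txt"  # hypothetical path: one kanji directory, one file

# UTF-8: the delimiter byte 0x5C only ever appears as a real backslash.
utf8_parts = split_on_ascii_byte(path.encode("utf-8"), b"\\")
decoded = [part.decode("utf-8") for part in utf8_parts]

# Shift_JIS: the kanji itself ends in 0x5C, so the same code splits mid-character.
sjis_parts = split_on_ascii_byte(path.encode("shift_jis"), b"\\")

print(decoded)          # two intact components
print(len(sjis_parts))  # an extra, broken component appears
```

The same reasoning applies to tools such as yacc-generated parsers, which match ASCII tokens byte by byte and pass other bytes through untouched.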

15
It's UTF-8, dummy!
  • Use UTF-8 everywhere, cut down on certification
    dimensions
  • Provides safe character-handling for Asia
  • Even though multi-lingual is not a requirement
  • Easier to support

16
Other Success Factors
  • Rely on RDBMS services to translate between RDBMS
    code-page and UTF-8
  • Market research: cut back on OS localization
    constraints
  • Transcoding infrastructure

17
RDBMS transcodes to/from UTF-8
  • Oracle and Sybase transcode automatically; SQL
    Server is a problem
  • No need for new transcoding calls between Server
    and RDBMS: lower impact
  • Upgrade customers have non-Unicode RDBMS: no
    need for them to convert
  • One less certification dimension!

18
Cut back on Localized OS certs
  • Limit RDBMS for Asia: for 4.2, just Oracle
  • Localized OS certification not necessary for
    Europe

19
Transcoding Infrastructure
  • Server must be aware of interface code-pages
  • Transcoding done at the interfaces
  • 3rd-party transcoding library used: Uniscape's
    GlobalC
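
Transcoding at the interfaces might look roughly like the sketch below. It uses Python's built-in codecs rather than the GlobalC library named on the slide, and the function names and the choice of Shift_JIS as the client code-page are assumptions made for illustration.

```python
INTERNAL_ENCODING = "utf-8"  # the single internal code-page

def to_internal(data, interface_codepage):
    """Transcode bytes arriving at an interface into the internal form."""
    return data.decode(interface_codepage).encode(INTERNAL_ENCODING)

def from_internal(data, interface_codepage):
    """Transcode internal bytes back to the code-page an interface expects."""
    return data.decode(INTERNAL_ENCODING).encode(interface_codepage)

# A hypothetical legacy client sending Shift_JIS text.
client_bytes = "表".encode("shift_jis")

internal_bytes = to_internal(client_bytes, "shift_jis")
round_trip = from_internal(internal_bytes, "shift_jis")

print(internal_bytes == "表".encode("utf-8"))
print(round_trip == client_bytes)
```

Because conversion happens only at the boundary, everything between the two calls can assume one encoding, which is the reason the server only needs to know each interface's code-page.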

20
New I18N Architecture
[Architecture diagram; component labels as extracted,
layout not recoverable]
  • Components: Intranet Client, Administrator,
    Web Publisher, Custom WebApp, Desktop Client,
    WDK (Unicode), RightSite (NCS), WorkSpace,
    DFC (Unicode), ARP (NCS), Web Cache,
    DMCL 4.2 (UTF-8), DMCL 4.1 (NCS),
    e-Content Server (UTF-8), File System, Verity,
    RDBMS (Unicode)
  • Legend: NCS = National Character Set; Unicode
21
Demo
  • Demo multilingual WDK
  • If there's time, a quick look at localized
    Desktop Client (Win32 Client)

22
Conclusion
  • UTF-8 was a crucial technology in DCTM's I18N
    strategy
  • Provided an easy path for legacy C
  • Supported specific Asian languages consistently,
    minimizing certifications
  • Prepared infrastructure for multi-lingual
    requirements