Status of Proposed Unicode Changes to the SQL Standard

Status of Proposed Unicode Changes to the SQL Standard, by Michael G. McKenna / Sybase, Inc.; Stefan Buchta, Hirotaka Yoshioka / Oracle Corporation

Learn more at: http://www.globalisation.org

1
Status of Proposed Unicode Changes to the SQL
Standard

by
Michael G. McKenna / Sybase, Inc.
Stefan Buchta, Hirotaka Yoshioka / Oracle
Corporation
v.1.3
March 1999

2
Introduction

SQL Character Set Internationalization is
out-of-date
New accepted standards
Unicode
Posix
WG20
Java
Some changes made to I18N for SQL3
Still under review by ISO and ANSI for future

WWW
XML
Windows/NT
Oracle/Sybase

3
Scope

Concerned Parties
Implementer vendor
Designer / Admin customer
User interface
This is a tool
Discussion catalyst
Proposal for changes to SQL
Suggestion for implementation

4
Implementers

Major Database Companies
Oracle, Sybase, Informix, Ask/Ingres, Adabas,
DB2, Borland, ..., Microsoft
Concerns
Feasibility to implement
Relevance to customer
Migration costs/issues
Future markets

Maintainability
Backward compatibility
Installed customer base
Competitive position

5
Issues with SQL92 and SQL3

Old I18N (pre standards)
non conformant
awkward
not implemented
Character Data/Character Columns
Multiple Character Sets
CREATE CHARACTER SET
Character set introducer

8
Issues, continued (2)

SQL Names and Literals
Example of SQL92/SQL3 literal
SELECT * FROM employee WHERE name =
_iso88591'Müller'
Now uses Unicode lexical types for identifiers
SQL_TEXT
Superset of all installed character sets
Ideally, should explicitly be Unicode

10
Issues, continued (3)

Collation Handling
SQL92 contains features that (almost) allow the
definition of collations
Example
CREATE COLLATION german_dictionary FOR
iso8859_1 FROM (USING (german_default), MODIFY
(A < Ä, a < ä, O < Ö, o < ö, U < Ü, u < ü,
ß = ss), WHEN NOT FOUND MAX)
No multi-pass ordering like ISO 14651
Drastically changed for SQL3
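
The difference between a single ordered rule list and multi-level ordering can be sketched in Python. This is an illustration only: the two-level key below (base letters first, accents as a tie-breaker) is an assumption made for the example, not the ISO 14651 table or the SQL3 mechanism, and `two_level_key` is a hypothetical helper.

```python
import unicodedata

def two_level_key(word):
    """Return a (primary, secondary) key: base letters, then accent marks."""
    decomposed = unicodedata.normalize("NFD", word)
    # Level 1: base letters only, case-folded, combining marks dropped.
    primary = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).casefold()
    # Level 2: the decomposed string, consulted only when level 1 ties.
    return (primary, decomposed)

words = ["Mumm", "M\u00fcller", "Muller"]  # "M\u00fcller" is Müller

# Plain code point order puts ü (U+00FC) after every ASCII letter.
print(sorted(words))
# Two-level order keeps Muller and Müller next to each other.
print(sorted(words, key=two_level_key))
```

With code point order, Mumm sorts between Muller and Müller; with the two-level key, the accent only decides between words whose base letters are identical.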

13
Issues, continued (4)

Text element versus Unicode character (10646
levels, applies to collations)
How long is a character?
Composite Characters/Canonical Equivalence

ñ (precomposed) versus n + combining tilde
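
The "how long is a character?" question can be made concrete with Python's standard `unicodedata` module. The sketch assumes the slide contrasts precomposed ñ (U+00F1) with n followed by COMBINING TILDE (U+0303); the variable names are illustrative.

```python
import unicodedata

precomposed = "\u00f1"   # ñ as a single code point
decomposed = "n\u0303"   # n followed by COMBINING TILDE

# One text element, but a different number of code points.
print(len(precomposed), len(decomposed))

# A binary comparison treats the two spellings as different strings.
print(precomposed == decomposed)

# After normalization to NFC the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# The storage length in bytes also differs (UTF-8 shown here).
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))
```

A column declared with a length of one "character" therefore behaves differently depending on whether the length counts text elements, code points, or bytes.
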
14
Issues, continued (5)

Upper-/Lowercase Translations (FOLD)
Example
German Ü <-> ü
But German ß has the uppercase equivalent
SS, and not every SS sequence corresponds to
ß when converted back to lowercase.
FOLD function to use Unicode case-mapping, as of
January 1999
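
The ß round-trip problem can be reproduced with Python's built-in string methods, which follow the Unicode case mappings. This is a sketch of the behaviour described above, not of the SQL FOLD function itself.

```python
# Ü and ü map to each other in both directions.
print("\u00dc".lower() == "\u00fc", "\u00fc".upper() == "\u00dc")

word = "Ma\u00dfe"           # Maße
upper = word.upper()         # ß expands to SS
round_trip = upper.lower()   # SS becomes ss, not ß

print(upper)                       # MASSE
print(round_trip == word.lower())  # False: the ß is not restored
```

The uppercase form is also longer than the original, so a FOLD result may not fit a column sized for its input.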

15
Issues, continued (6)

Client Character Encoding through CLI (Locale
negotiation)
MESSAGE TEXT
User Defined Characters (UDC)

16
Proposed Changes to SQL

Synchronize with present standards
Character Handling
Collations
Locales

17
Synchronize with present standards

JTC1/SC22/WG20
Programming Languages and I18N
JTC1/SC2/WG2 ISO 10646-1
Character Set handling
Unicode concepts
JTC1/SC2/WG3
Single byte character sets
ISO 14651/14652
Standardized collations
Unicode Technical Report

18
Synchronize with present de Facto standards

Java
Unicode String type
RFC 2277
UTF-8 as default internet encoding
XML
Potential Universal data stream
Default encoding is Unicode
ODBC 3.5
Mapping with SQL_WCHAR

19
Gratuitous Animated Graphics ...
20
Character Sets

SQL_TEXT ≡ Unicode
Eliminate introducer for identifiers
\Uxxxx
\\ escape
Keep schema default character set
Add UNICHAR datatype
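
A minimal sketch of how the proposed escapes might be decoded. The slide gives only the notation, so the details are assumptions: `\U` is taken to be followed by exactly four hexadecimal digits, `\\` stands for one literal backslash, and `decode_escapes` is a hypothetical helper.

```python
import re

# A backslash followed by either another backslash or U plus four hex digits.
_ESCAPE = re.compile(r"\\(\\|U[0-9A-Fa-f]{4})")

def decode_escapes(text):
    """Expand \\Uxxxx and \\\\ escapes; leave all other text unchanged."""
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"                   # escaped backslash
        return chr(int(token[1:], 16))    # four hex digits -> one character
    return _ESCAPE.sub(replace, text)

print(decode_escapes(r"M\U00FCller") == "M\u00fcller")  # True
print(decode_escapes(r"a\\U00FCb"))                     # a\U00FCb
```

Because the doubled backslash is consumed first, the second call returns the text `\U00FC` literally instead of decoding it.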

22
Character Sets (2)

Surrogate characters
User-defined character mechanism
CREATE UDC <char value>
FOR <charset name> AS <unicode binary value>
WHERE LEXICAL PROPERTY LIKE <unicode binary
value>
WITH UPPER LOWER <unicode binary
value>
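
For the surrogate bullet above, the arithmetic that maps a code point beyond U+FFFF to a pair of 16-bit units is fixed by UTF-16 and can be sketched as follows. `to_surrogate_pair` is a hypothetical helper name, and U+10400 is just an example value.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x10400)
print(hex(high), hex(low))  # 0xd801 0xdc00

# Cross-check against Python's own UTF-16 encoder.
expected = chr(0x10400).encode("utf-16-be")
print(expected == high.to_bytes(2, "big") + low.to_bytes(2, "big"))  # True
```

One such character occupies two 16-bit units, which matters for any length or substring operation defined in terms of 16-bit characters.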

24
Character Sets (3)

Canonical Equivalence for Identifiers
Entry Level 1: Â ≠ A Â ≠ Â
Intermediate Level 2: Â Â (Vietnamese, Indic, Arabic)
Full Level 3: Â A Â A
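
One possible reading of the levels above is that the lowest level compares identifiers code point by code point, while the full level treats precomposed and combining spellings as the same name. The sketch below shows only those two extremes. `same_identifier` and its `normalize` flag are hypothetical, and the use of NFC is an assumption, not something the slide specifies.

```python
import unicodedata

def same_identifier(a, b, normalize=True):
    """Compare two identifiers, optionally under canonical equivalence."""
    if normalize:
        a = unicodedata.normalize("NFC", a)
        b = unicodedata.normalize("NFC", b)
    return a == b

precomposed = "T\u00c2BLE"   # Â as the single code point U+00C2
combining = "TA\u0302BLE"    # A followed by COMBINING CIRCUMFLEX ACCENT

print(same_identifier(precomposed, combining, normalize=False))  # False
print(same_identifier(precomposed, combining, normalize=True))   # True
```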

25
Collations

Unicode Consortium Technical Paper 10 for
Universal sorting
Has mechanism for cultural differences,
overlays
Proven in actual implementation (Java, Sybase
internal testing)
Issue: No standard cultural variations yet
Being developed by National Bodies, de facto,
TC304/Europe, ISO 14652 Cultural Registries
Map all data to Unicode for collation results
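
The last point can be illustrated with a small Python sketch: values stored in different character sets are decoded to Unicode first, and only then compared or sorted. The sample rows and their character set labels are invented for the example.

```python
# (stored bytes, character set of the column); sample data only
rows = [
    (b"M\xfcller", "iso8859_1"),
    (b"M\xc3\xbcller", "utf_8"),
    (b"Mumm", "ascii"),
]

# The same name has different byte sequences in different character sets.
print(rows[0][0] == rows[1][0])  # False

# Decoding maps every value to Unicode, where the two spellings are equal.
decoded = [raw.decode(charset) for raw, charset in rows]
print(decoded[0] == decoded[1])  # True

# Any collation can now be applied to one uniform representation.
print(len(set(decoded)))  # 2 distinct names
```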

26
Collations (2)