Title: Unicode and Collation Support in Microsoft SQL Server
1Unicode and Collation Support in Microsoft SQL
Server
- Michael S. Kaplan
- Globalization Infrastructure and Font Technology
- Windows International
- Microsoft
2Unicode Support
- Uses the "N" or national data types from the
SQL-92 specification - NCHAR, NVARCHAR, NTEXT
- What the SQL-99 spec says about Unicode
- Interoperability with other clients
3Collation in SQL Server lt 6.5
- No Unicode support at all
- One code page per server
- One collation per server
- No good solution for multilingual support
4Collation in SQL Server 7.0
- Unicode datatypes supported
- Two collations
- Unicode
- Non-Unicode
- Number of collations distilled down to the
minimum necessary
57.0 flattening of collations
- Example the General Unicode sort order handles
Afrikaans, Albanian, Arabic, Basque, Belarusian,
Bulgarian, English, Faeroese, Farsi, Georgian
(Traditional), Greek, Hebrew, Hindi, Indonesian,
Malay, Russian, Serbian, Swahili, and Urdu
6OS independence
- Collation independent of operating system
- Based on the Jet Unicorn DLLs
7SQL Language Support(limited locale information)
- Messages
- Date/Time
- First Day of Week
- Currency and currency symbols
- Month/day names and abbreviated month names
8SQL Language Support(list of languages)
- Arabic
- British English
- Brazilian
- Bulgarian
- Simplified Chinese
- Traditional Chinese
- Croatian
- Czech
- Danish
- Dutch
- English
- Estonian
- Finnish
- French
- German
- Greek
- Hungarian
- Italian
- Japanese
- Korean
- Latvian
- Lithuanian
- Norwegian
- Polish
- Portuguese
- Romanian
- Russian
- Slovak
- Slovenian
- Spanish
- Swedish
- Thai
- Turkish
9Getting at the list of languages
- sp_helplanguage stored procedure
- syslanguages/sysmessages tables
- SET LANGUAGE
- SET LANGUAGE cetina
- SET LANGUAGE ???
- Each language has a langid (0 32)
10Collation in SQL Server 2000
- Combined code pages and collations into a single
entity
11"Windows" collations
- Added for unique code pages(Example Arabic)
- Added for unique ordering (Example French)
- Removed for identical ordering(Example
Finnish_Swedish)
1243 Windows Collations
- Albanian
- Arabic
- Chinese_PRC
- Chinese_PRC_Stroke
- Chinese_Taiwan_Bopomofo
- Chinese_Taiwan_Stroke
- Cyrillic_General
- Croatian
- Czech
- Danish_Norwegian
- Estonian
- Finnish_Swedish
- French
- Georgian_Modern_sort
- German_PhoneBook
- Greek
- Hebrew
- Hindi
- Hungarian
- Japanese_Unicode
- Korean_Wansung
- Korean_Wansung_Unicode
- Latin1_General
- Latvian
- Lithuanian
- Lithuanian_Classic
- FYRO Macedonian
- Spanish (Spain)
- Polish
- Romanian
- Slovak
- Slovenian
- Thai
- Traditional_Spanish
- Turkish
- Ukrainian
- Vietnamese
13Windows collations, continued
- Suffix meanings
- _BIN (Binary)
- _CI/_CS (Case sensitivity)
- _AI/_AS (Accent sensitivity)
- _KS - kanatype sensitivity (hiragana/katakana)
- _WS - width sensitivity (full/half width)
14SQL Collations
- Provided for backwards compatibility with prior
versions of SQL Server
15SQL Collations
- SQL_1xCompat_CP850
- SQL_Estonian_CP1257
- SQL_Latin1_General_Pref_CP437
- SQL_AltDiction_CP1253
- SQL_Hungarian_CP1250
- SQL_Latin1_General_Pref_CP850
- SQL_AltDiction_CP850
- SQL_Icelandic_Pref_CP1
- SQL_Latvian_CP1257
- SQL_AltDiction_Pref_CP850
- SQL_Latin1_General_CP1
- SQL_Lithuanian_CP1257
- SQL_Croatian_CP1250
- SQL_Latin1_General_CP1250
- SQL_MixDiction_CP1253
- SQL_Czech_CP1250
- SQL_Latin1_General_CP1251
- SQL_Polish_CP1250
- SQL_Danish_Pref_CP1
- SQL_Latin1_General_CP1253
- SQL_Romanian_CP1250
- SQL_EBCDIC037_CP1
- SQL_Latin1_General_CP1254
- SQL_Scandinavian_CP850
- SQL_EBCDIC273_CP1
- SQL_Latin1_General_CP1255
- SQL_Scandinavian_Pref_CP850
- SQL_EBCDIC277_CP1
- SQL_Latin1_General_CP1256
- SQL_Slovak_CP1250
- SQL_EBCDIC278_CP1
- SQL_Latin1_General_CP1257
- SQL_Slovenian_CP1250
- SQL_EBCDIC280_CP1
- SQL_Latin1_General_CP437
- SQL_SwedishPhone_Pref_CP1
- SQL_EBCDIC284_CP1
- SQL_Latin1_General_CP850
- SQL_SwedishStd_Pref_CP1
- SQL_EBCDIC285_CP1
- SQL_Latin1_General_Pref_CP1
- SQL_Ukrainian_CP1251
- SQL_AltDiction_CP1253
- SQL_Hungarian_CP1250
- SQL_Latin1_General_Pref_CP850
16Collation at four levels
- Server
- Database
- Column
- Expression
17At the server level
- Acts as a default for all databases
- Can be changed with RebuildM.exe in the
tools\BINN dir - Querying the server collation
- SELECT CONVERT(char, SERVERPROPERTY('collation'))
18At the database level
- Every database has a collation (default is the
server collation) - Collation can be changed under some circumstances
19At the column level
- Overrides database level collation
- Specifies code page for non-Unicode columns
- Again, can be changed under some circumstances
- No multilingual columns with separate collations
20At the expression level
- Can be used to override any other collation
- uses the COLLATE keyword
21Metadata in System Tables
- All stored as Unicode no matter what the database
collation is - Unicode 2.0 repertoire is used for identifiers
(use brackets or quotes around anything else)
22More on the COLLATE keyword
- COLLATE ltWindows_Collation_namegtltSQL_Collation_N
ame - Specific rules of precedence
- Explicit (two explicits runtime error)
- Implicit (two implicits no collation)
- Default
- ltno collationgt
23Limitations
- Features people will want for future versions
- LCID --gt Collation
- ISO string lt--gt Collation
- Creating custom collations?
24References
- http//microsoft.com/globaldev/
- International Features in Microsoft SQL Server
2000(by Michael Kaplan) at http//msdn.microsoft
.com/
25Questions?
26Dont Forget Your Evaluations!
- Unicode and Collation Support
- in Microsoft SQL Server