Title: IDN: Technology, Status, Overview, and Directions
1. IDN: Technology, Status, Overview, and Directions
- John C. Klensin, Ph.D.
- APT Workshop on ENUM and IDN
- August 2003
2. Internationalized Domain Names (IDN)
- Term used in many ways
- Strictly, domain name labels that represent names containing non-host-name characters.
- Only host name (or LDH) strings are actually entered into the DNS.
- Sometimes, "IDN" is used to refer to a fully-qualified domain name that contains at least one non-LDH label.
- Sometimes used to refer to other approaches to internationalization or localization
- Keywords
- Special searching or directory mechanisms, etc.
3. Internationalization and Users
- Users typically do not want internationalization (or multilingual capability), but want
- Systems that are localized, i.e., adapted to their particular
- Language
- Writing system and character codes
- Location
- Interests
- Internationalization is
- A means to localization
- Necessary given the global nature of the Internet
4. Internationalization and the Internet
- Consideration given to international characters in the 1970s
- Character set standards weren't ready
- Project that led to MIME
- Multimedia email capability
- Initiated largely to standardize and permit non-ASCII characters
- Web
- Recognized requirement early
- Details only for Western European languages until the mid-1990s
- All of these were done by tagging
- Tagging is consistent with localization approaches
5. DNS Internationalization
- Tension between
- Network-facing identifier
- User-facing name (of a company, product, organization, etc.)
- Constraints on solutions
- Short label strings: no reasonable way to tag
- Uniqueness of names
- Potential for confusion or fraud
- Requirement for non-ASCII names is clear, but
- Caution is in order: many possible traps and risks
- Hard to go back if too permissive
6. Why Look at This Now
- Many opportunities for confusion
- Some national regulation may be in order.
- Some real technical constraints
- Assuming the DNS is like something else and proceeding on that basis can be problematic
- Good tutorial in forthcoming US National Academy of Engineering/National Research Council report
- Much better to think through things now than to try to undo or redo later.
- Easier and safer to adopt narrow rules and then expand as understanding grows than to try to restrict what was previously permitted.
- Advice: permit only what you fully understand and need.
7. IETF Encoding Standards or Local Variations
- Utility and spread of the Internet have depended critically on
- End-to-end connectivity
- Internet hosts can reach each other or understand why not
- DNS integrity
- Any DNS reference means the same thing, worldwide
- There is huge flexibility within the existing standards for per-zone (local/national) policies and decisions.
- A country that adopts its own protocols or DNS string interpretations is likely to isolate its businesses and users from global connectivity and global markets.
8. The History of the LDH Name
- Concerns in the 1970s about user confusion and transcription from non-computer forms
- Eliminating, where possible, characters that could be confused when written
- Hence:
- Case-insensitivity
- Prohibition of "_" (could be confused with "-")
- Prohibition of national use character positions
- Resulted in host name rules: letters, digits, and hyphen
- Host name rules are about
- What can be registered in a zone
- Applications restrictions
- Ultimately not the DNS technology, which can store binary strings with few restrictions.
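The host name rules above can be sketched as a small validator. This is a minimal sketch assuming the RFC 952/RFC 1123 form of the rules (1 to 63 characters drawn from letters, digits, and hyphen, with no leading or trailing hyphen); `is_ldh_label` and `same_label` are hypothetical names. As the slide says, these are registration and application restrictions, not limits of the DNS itself.

```python
import string

# Letters, digits, and hyphen: the host name ("LDH") repertoire.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """Return True if `label` follows the LDH host name rules.

    Assumes the RFC 952 / RFC 1123 form: 1 to 63 characters from
    LDH_CHARS, not starting or ending with a hyphen.
    """
    if not 1 <= len(label) <= 63:
        return False
    if label.startswith("-") or label.endswith("-"):
        return False
    return all(ch in LDH_CHARS for ch in label)

def same_label(a: str, b: str) -> bool:
    """Host name labels compare case-insensitively (intended for LDH labels)."""
    return a.lower() == b.lower()

for candidate in ("example", "3com", "my_host", "-abc", "bücher"):
    print(candidate, is_ldh_label(candidate))
```

The underscore case reflects the history above: "_" was excluded because it could be confused with "-" when written down, and anything outside ASCII fails outright.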
9. Why Internationalize Domain Names
- Important concern about people using their own languages and characters
- Use of domain names in interfaces by end users, not just as system/network identifiers.
10. Representing Unicode/ISO 10646
- No tagging equals no national character sets
- Unlike applications (such as the web), the DNS has no room for character set tagging, so a comprehensive, universal character set (UCS) is a requirement
- More characters, mixing scripts
- Many opportunities for problems from look-alikes that were not present in ASCII alone
- Ambiguities about
- Scripts
- Case-matching
- Unification
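Because a universal character set carries no per-label tag, a label can silently mix scripts, and characters from different scripts can look identical. A rough way to see this is sketched below, assuming Python's `unicodedata` module and using the first word of each character's Unicode name as a stand-in for its script. That is a heuristic chosen for illustration, not the Unicode Script property; `script_of` and `scripts_in` are hypothetical names.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Rough script guess: the first word of the character's Unicode name.

    Heuristic only (e.g. "LATIN", "CYRILLIC", "GREEK", "CJK"); it is not
    the Unicode Script property, which the standard library does not expose.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(label: str) -> set:
    """Set of scripts used by the letters of `label` (digits, hyphens ignored)."""
    return {script_of(ch) for ch in label if ch.isalpha()}

latin = "cop"                    # LATIN SMALL LETTER C, O, P
cyrillic = "\u0441\u043e\u0440"  # CYRILLIC SMALL LETTER ES, O, ER: looks like "cop"
mixed = "c\u043ep"               # Latin c, Cyrillic o, Latin p

for label in (latin, cyrillic, mixed):
    print(ascii(label), sorted(scripts_in(label)))
```

All three strings can render identically, yet they are three different sequences of code points and therefore three different labels.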
11. Applications and International Characters
- Most Internet application protocols were defined for ASCII, or at least seven-bit characters
- Often not an accident or ignorance: consider the use of IA4 and IA5 in many ITU Recommendations
- Waiting for applications to be upgraded could
- Be a long wait
- Involve some unpredictability, with the sender not knowing the receiver's capabilities
- Plug-ins and patches do not yield a consistent user experience
12. The IETF IDNA Standard
- Internationalizing Domain Names in Applications
- Some mappings within Unicode
- Normalization of different ways to represent some characters
- Mapping of some similar or identical characters
- Some case-mappings
- Some forbidden characters/code points
- But many issues not addressed
- Encoding Unicode characters into LDH form for the DNS
- The "xn--" string
- Applications and character representations.
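Python's standard library happens to include an IDNA 2003 implementation (`encodings.idna`, covering RFCs 3490, 3491, and 3492), so it can illustrate both halves of the slide: the mappings within Unicode and the encoding into an "xn--" LDH string. This is a sketch, not a reference implementation; `to_dns_label` is a hypothetical wrapper name.

```python
import encodings.idna

def to_dns_label(label: str) -> bytes:
    """Convert one label to the LDH form that is actually stored in the DNS.

    Wraps Python's IDNA 2003 implementation: ToASCII() runs nameprep
    (RFC 3491: case mapping, NFKC normalization, prohibited code points)
    and, for non-ASCII results, Punycode (RFC 3492) plus the "xn--" prefix.
    """
    return encodings.idna.ToASCII(label)

# Three ways a user might enter the "same" label:
spellings = [
    "bücher",        # precomposed U+00FC
    "Bücher",        # upper-case first letter
    "bu\u0308cher",  # "u" followed by U+0308 COMBINING DIAERESIS
]
for s in spellings:
    print(ascii(s), "->", to_dns_label(s))
```

Each of the three spellings maps to the same DNS label, `b'xn--bcher-kva'`: case mapping and normalization happen before encoding, so only one form ever reaches the DNS. Note that ToASCII returns a label that is already pure ASCII unchanged, without lower-casing it.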
13. Current Status: IETF
- IDNA complete and awaiting more implementation and user experience
- General recognition that additional registration restrictions are needed, but
- IETF is not going to specify them
- Unlike LDH, seen as a per-zone problem
- EPP Registrar-Registry protocol
- Can accommodate internationalized names
- Some registry decisions about extensions: registrars may not be able to use the same techniques with different registries
- Preliminary efforts underway (no working groups yet) on
- Fully-internationalized URIs (IRIs)
- Email addresses
14. Technology Developments
- Several browser plug-ins
- No known implementations in widely-available general-purpose browsers or other applications yet.
- DNS diagnostic tools (nslookup, dig, etc.) not yet upgraded to permit entry/display of Unicode strings.
15. Current Status: ICANN
- Prohibition of/recommendation against labels starting with two characters and two hyphens if they are not IDNA strings.
- Recommendation established just before Montreal meeting
- Specifies language-based registry restrictions but no details
- Agreed to by CJK registries
- Some resistance from some gTLD registries
- Growing feeling that it will need some revision, but no plan about how and when to do one.
- Continuing uncertainty about gTLDs
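The two-characters-plus-two-hyphens rule can be sketched as a registry-side check. This is a minimal sketch: treating "is an IDNA string" as "starts with xn-- and survives Python's ToUnicode round trip" is an assumption made for illustration, and `violates_hyphen_rule` is a hypothetical name, not part of any ICANN or registry tool.

```python
import encodings.idna

ACE_PREFIX = "xn--"  # the IDNA prefix from RFC 3490

def violates_hyphen_rule(label: str) -> bool:
    """Sketch of the rule above: hyphens in positions 3 and 4 are only
    acceptable if the label is a valid IDNA ("xn--") string.

    Treating "valid" as "decodes and round-trips via Python's ToUnicode"
    is an assumption made for this illustration.
    """
    label = label.lower()            # host names are case-insensitive
    if label[2:4] != "--":
        return False                 # the rule does not apply
    if not label.startswith(ACE_PREFIX):
        return True                  # reserved form, but not an IDNA string
    try:
        # Python's ToUnicode decodes the Punycode, re-encodes it with
        # ToASCII, and raises if the result does not round-trip.
        encodings.idna.ToUnicode(label)
    except ValueError:               # UnicodeError is a subclass of ValueError
        return True
    return False

for candidate in ("ab--cd", "xn--bcher-kva", "example"):
    print(candidate, violates_hyphen_rule(candidate))
```

Reserving the whole "??--" pattern, rather than only "xn--", leaves room for other prefixes in the future; that is why the check rejects any non-IDNA label of that shape.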
16. The Meaning of "Language"
- JET, ICANN, etc., use the term "language" to describe tables and rules.
- Not the normal usage
- Really Zone-Language-Script
- No one really knows what the limits of a language are, although governments can make decisions within their territories.
- Scripts actually overlap in strange ways. Neither the Unicode Consortium nor ISO has been able to define scripts associated with particular languages.
- E.g., for some zones in Western Europe the appropriate language-script is "generic European", i.e., Latin-1. For others, more specific lists of characters may be needed.
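A Zone-Language-Script rule is, in practice, a per-zone table of permitted characters. The sketch below assumes two hypothetical zones: one using the "generic European" Latin-1 letters from the example above, and one with a narrower hand-picked list. The zone names and both tables are illustrative only; real tables are a registry decision.

```python
# Hypothetical per-zone tables: which characters each zone permits.
# LDH characters are always allowed; the extra repertoires are illustrative only.
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")

ZONE_TABLES = {
    # "Generic European": lower-case letters from the Latin-1 Supplement block
    "zone-latin1": LDH | {chr(cp) for cp in range(0xC0, 0x100)
                          if chr(cp).isalpha() and chr(cp).islower()},
    # A narrower, hypothetical table for a zone that only wants German letters
    "zone-de": LDH | set("äöüß"),
}

def allowed_in_zone(label: str, zone: str) -> bool:
    """True if every character of the (already case-mapped) label is in the zone's table."""
    table = ZONE_TABLES[zone]
    return all(ch in table for ch in label)

for label in ("bücher", "façade", "\u0441op"):
    print(ascii(label), allowed_in_zone(label, "zone-latin1"),
          allowed_in_zone(label, "zone-de"))
```

The same label can be acceptable in one zone and refused in another, which is exactly the per-zone variation the slides anticipate.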
17. Look-alike Character Confusion
- Much focus so far on CJK
- Characters based on Chinese Han writing
- Making differently-encoded or different-appearing characters match
- Alphabetic language problem may turn out to be harder
- Common origins?
- Avoiding having similar-looking, but distinct, characters confused.
- Not new: "1" and "l", "0" and "O"
- USA
- pectopan
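One possible way to catch alphabetic look-alikes is a "skeleton" comparison: map every character to a representative look-alike and treat two distinct labels as confusable if their skeletons match. This is a sketch of that idea, not a method taken from IDNA or the JET Guidelines, and the table is a hypothetical, deliberately tiny sample covering both the old ASCII cases and a few cross-script ones.

```python
# Hypothetical, deliberately tiny look-alike table. A real table would be
# far larger and would be a registry (per-zone) decision.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "1": "l",       # the old ASCII problem: digit one vs letter l
    "0": "o",       # digit zero vs letter o
}

def skeleton(label: str) -> str:
    """Collapse look-alike characters to one representative each."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())

def confusable(a: str, b: str) -> bool:
    """True if two different labels look the same after skeleton mapping."""
    return a.lower() != b.lower() and skeleton(a) == skeleton(b)

print(confusable("cope", "\u0441\u043e\u0440\u0435"))  # Latin vs all-Cyrillic
print(confusable("example", "examp1e"))                # digit one vs letter l
```

A registry could refuse, or bundle together, any new label whose skeleton matches an existing one; how aggressive the table should be is the policy tradeoff between flexibility and fraud risk.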
18. JET Guidelines and Their Extensions
- Per-zone, per-language restrictions on registration
- The idea of a variant character and IDN Package
- Mixed-script labels are
- Particularly good opportunities for deception
- Sometimes useful
- Well-defined (now) for CJK, but
- Alphabets may be harder
- Particularly difficult issues with Roman-Greek-Cyrillic (pecopan, EAH, etc.)
- Labels as words in a language
- Not a traditional approach: excludes fanciful labels
- Dictionary lookup is an approach, but may cause other problems.
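The variant-character and IDN Package idea can be sketched as follows: expand a registered label into every label reachable by swapping characters for their variants, then split that set into labels that are activated and labels that are merely reserved. This is a sketch of the concept only. The two-entry variant table (traditional 國 and simplified 国) is a hypothetical illustration, not taken from any registry's table, and the choice of preferred variants is modelled as an input because it is a registry policy decision.

```python
from itertools import product

# Hypothetical variant table in the spirit of the JET Guidelines: each
# character maps to itself plus its variants. Illustrative entries only.
VARIANTS = {
    "\u570b": {"\u570b", "\u56fd"},  # 國 (traditional) / 国 (simplified)
    "\u56fd": {"\u56fd", "\u570b"},
    "\u4e2d": {"\u4e2d"},            # 中 has no variant in this table
}

def variant_labels(label: str) -> set:
    """All labels formed by replacing each character with any of its variants."""
    choices = [VARIANTS.get(ch, {ch}) for ch in label]
    return {"".join(combo) for combo in product(*choices)}

def idn_package(label: str, preferred: set) -> dict:
    """Split a label's variants into activated and reserved labels.

    `preferred` (variants to activate) is an input here because it is a
    registry policy decision; that modelling choice is an assumption.
    """
    variants = variant_labels(label)
    active = {label} | (preferred & variants)
    return {"active": active, "reserved": variants - active}

package = idn_package("\u4e2d\u570b", preferred=set())
print({k: sorted(ascii(v) for v in vals) for k, vals in package.items()})
```

The number of variant labels grows multiplicatively with the number of characters that have variants, which is one reason the cost of reserved and activated labels ("variant charging") becomes a policy issue.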
19. Major Issues
- Multilingual strings
- Labels and names
- Variant charging in JET-like models
- Cost of a reserved label
- Cost of activation, given that the label has no value to anyone else
- DNS as an administrative hierarchy
- New types of conflict/dispute problems
20. Technical Interoperability
- IDNA is entirely a client algorithm and procedure, hence depends on correct client implementations.
- Plug-ins may help, but only with specific applications.
- Open source development effort being put together.
- JET Guidelines and similar approaches are registry-dependent
- Do not raise interoperability issues.
- May raise user experience ones
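Because IDNA runs entirely in the client, each application has to convert names itself: to the LDH form before handing a name to the resolver, and back to Unicode for display. A minimal sketch using Python's IDNA 2003 codec follows; `name_for_resolver` and `name_for_display` are hypothetical names, and showing the raw "xn--" form when decoding fails is one design choice among several.

```python
def name_for_resolver(name: str) -> str:
    """Client-side IDNA: convert a user-typed name to the form the DNS sees.

    Python's "idna" codec applies ToASCII to each dot-separated label;
    the DNS itself never sees the Unicode form.
    """
    return name.encode("idna").decode("ascii")

def name_for_display(name: str) -> str:
    """Convert a name from the DNS back to Unicode for display.

    Falls back to the name as received if it does not decode cleanly,
    so a broken label is shown as-is rather than guessed at.
    """
    try:
        return name.encode("ascii").decode("idna")
    except UnicodeError:
        return name

wire = name_for_resolver("www.Bücher.example")
print(wire)                    # the string a resolver query would use
print(name_for_display(wire))  # what a user interface would show
```

An application without this code simply shows and accepts the "xn--" strings, which is the inconsistent user experience the slide warns about.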
21. Administrative Hierarchy Issues
- Policy and trust relationships
- No cross-tree cross-references to branches of hierarchy
- Organizational branding
- http://www.product.tld/ or
- http://www.organization.tld/product
22. New Dispute and Resolution Issues
- ICANN-WIPO UDRP assumes
- Homogeneous scripts and language characters
- Conflicts about rights to identical names
- but not
- Labels constructed from line or box-drawing characters
- Look-alike characters and strings from different scripts
- Translations, transcriptions, transcodings
- Is the relevant name the IDNA encoding or its
display/presentation form?
23. Problems IDNs Don't Solve
- Registration policy issues
- "This language is more important"
- The gTLD problem
- Applications and local character sets
- Even JET Guidelines won't eliminate confusion
- DNS is a poor search mechanism and getting worse.
24. The Whois Policy Issues
- Registration in non-ASCII and data in ???
- Searching of a multilingual/ multiscript database
- Reading the records
- Information about variants and IDN Package
contents
25. Economics
- Domain Name Market has collapsed.
- Original success projections and the "ICANN Seven"
- Notions of a profitable monopoly over multilingual TLDs do not seem to be going anywhere.
26. Competition and Policy
- Policy tradeoff between
- More flexibility of registrations
- Less risk of conflicts, deception, or fraud
- Each domain or zone will need to develop its own policy, and there will probably be wide variations.
- So-called ML.ML introduces complex questions of allocations
- Essentially independent TLDs
- ICANN policy so far: apply separately; no rights to added domains
- Implications of a country deciding to go its own way with, e.g., local character codings.
27. The Path Forward with IDNs
- Implementation on a per-zone and per-application basis
- Development of new dispute resolution policies
- Discovery of new interface, confusion, and user-level interoperability problems
- What do you do with a domain name in a script you can't read or write?
- The two-hundred-sided business card
- Local character codings and Unicode mapping
- Running out of those names too
- DNS name guessing is not a good search/location procedure in a growing network.
28. Where Will We End Up
- Increasing use of search engines??
- Clear trend, but
- As the Internet gets larger, general-purpose free-text ones may have already peaked
- Increasing use of interest-specific portals??
- DNS labels, IDNA strings, or label translation
- The idea of unique keywords
- More separation of searching or locating and retrieval??
- Stable-reference URIs?
- Information retrieval experience and bookmarks
- Deliberately-populated directories??
29. Conclusions
- IDN deployment is starting and will succeed, but
- Registries, application developers, and users have a lot to learn
- Except in special cases, early user experiences may not be wonderful.
- An Internet that is
- Optimized to local language and culture
- Globally accessible and useful
- may not be easily attained
- IDNs are, at best, a useful tool in effective
localization and use in user languages and
scripts.
30. Balancing Localization, Internationalization, and User Experience
- Probably requires going beyond the DNS
- May require an Internet presentation layer
- Rethinking, not just patching
- Ways to find information
- Ways to remember what was found and to access it again
- Thinking about things librarians have known for centuries
- New ideas about user interfaces
- Translation of a good French-language-oriented
interface into Chinese or Arabic may not produce
a good Chinese or Arabic interface.
31. Selected Further Readings
- Role of the DNS: RFC 3467
- IDNA: RFCs 3490, 3491, 3492, 3454
- Unicode evolution and stability
- draft-faltstrom-unicode-synchronization-00.txt (forthcoming)
- JET Guidelines for CJK, applications to other scripts
- draft-jseng-idn-admin-04.pdf
- draft-xdlee-idn-cdnadmin-00.txt
- draft-klensin-reg-guidelines-00.txt
- draft-hoffman-idn-reg-00.txt
- Tradeoffs between labels and translations
- draft-klensin-idn-tld-00.txt
- Issues with domain names in unexpected forms
- draft-klensin-name-filters-02.txt
- Alternate searching and retrieval models
- draft-klensin-dns-search-05.txt
- draft-mealing-sls-02.txt
- ICANN Policy Statements and Recommendations
- IDN deployment statement: http://www.icann.org/announcements/announcement-20jun03.htm
- More generally: http://www.icann.org/document-name
32. Finding These Documents
- RFCs
- ftp://ftp.rfc-editor.org/in-notes/rfcNNNN.txt (NNNN is the RFC number)
- Internet-Drafts (draft-xxxx)
- http://www.ietf.org/internet-drafts/document-name
- Note that these documents are transient and that the two-digit number is a version number. If the version cited is not found, try a higher number or the search engine at http://www.ietf.org/ID