Title: ML_MUA_1
1Testing multilingual support in Mail User Agents: TERENA Pilot Project
- Yuri Demchenko, TERENA <demch_at_terena.nl>
- TNC98, Dresden, October 5-8, 1998
2TERENA Pilot Project on Testing Multilingual MUAs
- Officially ran from April 1998 to September 1998
- The project objectives were to:
  - Develop a benchmarking methodology for multilingual MUAs, and specify templates for collecting the results in a coherent way.
  - Design a set of composite multilingual test messages.
  - Configure each MUA for all supported national character sets and send the test messages to other MUAs and to itself.
  - Compile the results, analyzing how each MUA composes, sends, receives and displays the test messages.
  - Prepare recommendations for users on the correct setup and operation of popular multilingual MUAs.
3The list of mail clients to be tested
- Derived from TERENA MUAs usage statistics based
on analysis of more than 3000 messages from
TERENA Mail archives collected during the period
August 1997 - March 1998
- Microsoft Windows (NT, 3.11, 95)
  - Microsoft Outlook Express
  - Netscape Mail 3.x and 4.x
  - Netscape Messenger
  - Qualcomm Eudora 3.0 and 4.0 beta
  - Pegasus Mail
  - The Bat!
  - ESYS Simeon
  - Alis Tango Mailer
- UNIX Terminal
  - Elm
  - MH
  - Pine
- UNIX GUI (with X11R6)
  - Netscape Mail
  - EXMH
  - Z-Mail
4Activity and Projects in i18n and Multilingual
Support
- i18n activity (ISO, IETF, ECMA, TERENA, Unicode Consortium)
- CEN/TC304 works on European character sets and keyboards
- MAITS project
- Internet Mail Consortium - Report on using International Characters in Internet Mail
- TERENA Pilot Project on Testing Multilingual support in MUAs
5Internet Mail Consortium - i18n Report
- Summary of recommendations
- 1. Explicit charset parameter
- 2. Sending UTF-8
- 3. Displaying UTF-8
- 4. Choosing charsets on creation
- 5. Specifying languages
- 6. Multi-language text
- 7. Non-ASCII headers
- 8. Handling all common charsets
- 9. MTAs and 8-bit content
The report strongly recommends that all mail-creating and mail-displaying programs created or revised after January 1, 1999 be able to create and display mail using UTF-8, and be able to handle all common charsets in addition to UTF-8.
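Recommendations 1, 2 and 7 can be illustrated with a minimal sketch using Python's standard `email` package. This is not taken from the IMC report; the addresses, the sample text and the helper name `build_utf8_message` are assumptions made for illustration only.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_utf8_message(subject: str, body: str) -> bytes:
    """Serialize a message with a UTF-8 body and an explicit charset parameter."""
    msg = EmailMessage(policy=policy.SMTP)
    msg["From"] = "sender@example.org"      # hypothetical address
    msg["To"] = "recipient@example.org"     # hypothetical address
    # Non-ASCII header text is serialized as RFC 2047 encoded words.
    msg["Subject"] = subject
    # Label the body explicitly as UTF-8; quoted-printable keeps the
    # serialized message 7-bit clean for MTAs that do not pass 8-bit content.
    msg.set_content(body, charset="utf-8", cte="quoted-printable")
    return msg.as_bytes()


wire = build_utf8_message("Привет", "Тестовое сообщение")
parsed = message_from_bytes(wire, policy=policy.default)
print(parsed["Content-Type"])   # shows the explicit charset parameter
print(parsed["Subject"])        # header decoded back to readable text
```

Parsing the serialized bytes back, as in the last lines, is one simple way to check that the charset label and the header encoding survive a round trip.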
6Standards on i18n and Character Sets Technologies
- ISO standards
  - ISO 2022 Character Set Concept and Terminology
  - ISO 8859-x Character Sets
  - ISO Standards on APIs i18n and FDCC
- Unicode standards
- RFC 2277 - IETF Policy on Character Sets and Languages
- RFC 2130 - Recommendation of IAB Workshop on character sets technology
- RFC 2045-RFC 2049 - MIME format of messages (Using MIME in Internet Mail)
- RFC 822 - Syntax of electronic messages format according
7Standards in i18n and Multilingual Support in
Internet Mail
- RFC 2045 - RFC 2049, RFC 2231 - MIME
  - Coded Character Set
  - Character Encoding Scheme, specified by the charset parameter of the Content-Type header field
  - Transfer Encoding Syntax (such as Base64 or QP), specified by the Content-Transfer-Encoding header field
- RFC 2277 - IETF Policy on Character Sets and Languages
  - main definitions and requirements for language tagging
- RFC 2130 - Recommendation of IAB Workshop on character sets technology
  - framework for interoperability between the many character sets in use
  - an architecture model for on-the-wire transmission of text
  - recommendations for tagging transmitted (and stored) text
8RFC 2130 Architecture model
- User interface issues (OS, GUI, API)
  - Layout
  - Culture
  - Locale
  - Language
- On-the-wire
  - The Coded Character Set
  - The Character Encoding Scheme
  - The Transfer Encoding Syntax
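The three on-the-wire layers can be traced for a single word with a short Python sketch. The sample text and the choice of KOI8-U and UTF-8 as encoding schemes are assumptions for illustration; RFC 2130 itself does not prescribe them.

```python
import base64
import quopri

text = "Київ"  # sample text, chosen only for illustration

# 1. Coded Character Set: every character is assigned an integer code point.
code_points = [ord(ch) for ch in text]

# 2. Character Encoding Scheme: code points are mapped to octets.
#    In MIME this layer is what the "charset" parameter names.
octets_koi8u = text.encode("koi8_u")   # one octet per character
octets_utf8 = text.encode("utf-8")     # two octets per Cyrillic character

# 3. Transfer Encoding Syntax: octets are made safe for 7-bit transport.
#    In MIME this layer is named by Content-Transfer-Encoding.
as_base64 = base64.b64encode(octets_utf8)
as_qp = quopri.encodestring(octets_utf8)

print([hex(cp) for cp in code_points])
print(octets_koi8u, octets_utf8)
print(as_base64, as_qp)
```

A receiving MUA has to undo the layers in reverse order: first the transfer encoding, then the character encoding scheme, before it can map code points to glyphs.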
9The testing and the evaluation scheme
10Testing of Multilingual support in MUAs
- Includes the following phases
  - Evaluation of Multilingual features/settings of MUAs
  - Testing Message Reading procedure
  - Testing Message Composing procedure
  - Testing Message Sending and Receiving procedure
11Evaluation of Multilingual features/settings of
MUAs
- READ operation mode
  - choose Language/Encoding
  - choose Fonts (Optional for Address, Subject, Message Body, Quoted Text)
  - Optional - Font mapping
- COMPOSE operation mode
  - choose Language/Encoding Settings
  - Optional - Possibility to switch Language/Encoding during composition/typing
  - choose Fonts (Optional for Address, Subject, Message Body, Quoted Text)
  - Optional - choose Spelling/Language/Dictionary
- SEND operation mode
  - set MIME encoding (Quoted Printable, Base64)
  - Optional - select/disable Uuencode mode (non standard)
  - Allow/disallow 8-bit in Header Fields
  - select/disable HTML in body parts
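The "set MIME encoding" option of the SEND mode corresponds to the choice of Content-Transfer-Encoding. As a minimal sketch, assuming Python's standard `email` package stands in for an MUA's sending code (the helper name `compose` and the sample text are hypothetical), the same KOI8-R body can be sent either way:

```python
from email.message import EmailMessage


def compose(body: str, charset: str, cte: str) -> EmailMessage:
    """Build a single-part text message with a chosen charset and transfer encoding."""
    msg = EmailMessage()
    # charset selects the character encoding scheme of the body;
    # cte selects how the resulting octets are made transport-safe.
    msg.set_content(body, charset=charset, cte=cte)
    return msg


body = "Проверка кодировки"  # sample text for illustration
qp_msg = compose(body, "koi8-r", "quoted-printable")
b64_msg = compose(body, "koi8-r", "base64")

for m in (qp_msg, b64_msg):
    print(m["Content-Type"], "|", m["Content-Transfer-Encoding"])
```

Both messages carry the same text; only the transfer encoding differs, so a receiving MUA that handles one setting but not the other can be identified by sending both.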
12Message Reading procedure
- Multilingual MUAs should support the following features
  - Reading/Displaying non-ASCII characters in Message Body
  - Reading/Displaying non-ASCII characters in Message Header (Address, Subject Lines)
  - Reading Forwarded Message with non-ASCII characters in Address, Subject, Message Body, using the same or different MIME character set attributes
  - Reading Attached non-ASCII Text File (Document)
- Possible problems are detected by comparing the appearance of the original and the delivered test messages
  - This includes evaluating whether the MUA processes the MIME attributes of the test message correctly.
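What "correct processing of the MIME attributes" means on the reading side can be sketched in Python. The message below is built by hand so that the Subject header and the body deliberately use different charsets; the sample text and the helper name `read_message` are assumptions for illustration.

```python
import base64
from email import message_from_bytes, policy


def read_message(raw: bytes):
    """Return (subject, body, body_charset) decoded according to the MIME labels."""
    msg = message_from_bytes(raw, policy=policy.default)
    # With the default policy, RFC 2047 encoded words in headers are decoded
    # using the charset named inside each encoded word.
    subject = str(msg["Subject"])
    # The body is decoded using Content-Transfer-Encoding and the charset parameter.
    return subject, msg.get_content(), msg.get_content_charset()


subject_text, body_text = "Тест", "Привет"
encoded_word = "=?koi8-r?B?%s?=" % base64.b64encode(
    subject_text.encode("koi8-r")).decode("ascii")
raw = (
    "Subject: %s\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="windows-1251"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n" % encoded_word
).encode("ascii") + base64.b64encode(body_text.encode("cp1251")) + b"\r\n"

subject, body, charset = read_message(raw)
print(subject, body, charset)
```

An MUA that applies one charset to the whole message, instead of honouring the header and body labels separately, would display one of the two parts incorrectly.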
13Message Composing procedure
- Message composition operations to be tested
  - Typing message from keyboard
  - Copy and Paste operations
  - Text/File attachments
  - Quoted text/message
  - Edit different parts of message
  - Charset/Encoding processing by Message Composer/Editor
- Real Message composition also includes operations like
  - Typing non-ASCII text in Message Body and Message Header
  - Pasting non-ASCII text into Body and Header fields
  - Reply to message with non-ASCII text
  - Forward message with non-ASCII content
  - Attach text documents containing non-ASCII characters
14Test messages set
- Each test is performed in at least 2 character sets, one of which is US-ASCII (or ISO 8859-1), and the other with characters that are not part of US-ASCII or ISO 8859-1.
- Mandatory
  - tmsg1 - Message with non-ASCII characters/text in the Subject line
  - tmsg2 - Message with non-ASCII characters/text in Mail Address free-form name
  - tmsg3 - Message with non-ASCII characters/text in the Message Body text (single part)
  - tmsg4 - Message with non-ASCII characters/text in text/plain attachment
- Optional
  - tmsg6 - Message with UTF-7/UTF-8 Character set in Message Body and Header
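The four mandatory test messages could be generated along the following lines. This is only a sketch with Python's standard `email` package, not the project's test messages constructor; the sample text, addresses, file name and the function name `build_test_messages` are hypothetical.

```python
from email.headerregistry import Address
from email.message import EmailMessage

SAMPLE = "Тест"   # hypothetical non-ASCII sample text
PLAIN = "Test"    # ASCII filler for the parts that are not under test


def build_test_messages(sample=SAMPLE, charset="koi8-r"):
    """Return tmsg1..tmsg4, each with non-ASCII text in exactly one place."""

    def base(subject=PLAIN, name=PLAIN, body=PLAIN):
        msg = EmailMessage()
        msg["From"] = Address(display_name=name, username="tester",
                              domain="example.org")
        msg["To"] = Address(display_name=PLAIN, username="iut",
                            domain="example.org")
        msg["Subject"] = subject
        msg.set_content(body, charset=charset, cte="base64")
        return msg

    tmsg4 = base()
    # Text given as str is attached as text/plain in the chosen charset.
    tmsg4.add_attachment(sample, charset=charset, cte="base64",
                         filename="sample.txt")
    return {
        "tmsg1": base(subject=sample),   # non-ASCII in the Subject line
        "tmsg2": base(name=sample),      # non-ASCII in the free-form name
        "tmsg3": base(body=sample),      # non-ASCII in the single-part body
        "tmsg4": tmsg4,                  # non-ASCII in a text/plain attachment
    }


messages = build_test_messages()
for name, msg in messages.items():
    print(name, msg.get_content_type())
```

Calling `build_test_messages` once per charset under test (for example with `charset="windows-1251"`) would give the "at least 2 character sets" coverage described above.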
15Testing program map
16Testing Methodology - The tests to be performed
- test-1 - Receive all 4 test messages tmsg1-tmsg4 and display them correctly (change Language/Alphabet/Encoding options if needed)
- test-2 - Print all 4 messages tmsg1-tmsg4 to the standard printer
- test-3 - Reply to messages tmsg1 and tmsg2, and check that information is returned in the same character set as it arrived in
- test-4 - Reply to message tmsg3 using "reply including quote of body"
- test-5 - Reply to message tmsg3 using the environment's "cut and paste" function to insert the non-ASCII characters into the outgoing message
- test-6 - Forward all 4 messages to the originator address
- test-7 - Generate, as completely as possible, the same messages from the keyboard of the IUT (implementation under test)
- test-8 - Check for possible text distortion when exchanging tmsg1, tmsg2 and tmsg3 with a non-ASCII default Language/Alphabet/Encoding
- test-9 - Perform tests 1-5 for message tmsg6 with UTF-7/UTF-8
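The check in test-3, that a reply comes back in the character set it arrived in, can be partly automated. The sketch below compares the charset labels of an original message and a reply; it is an illustration, not the project's tooling, and the helper names (`make_raw`, `charsets_used`, `reply_preserves_charset`) and sample text are assumptions.

```python
import base64
from email import message_from_bytes
from email.header import decode_header


def make_raw(subject: str, body: str, charset: str) -> bytes:
    """Hand-build a single-part message with one charset for Subject and body."""
    word = "=?%s?B?%s?=" % (
        charset, base64.b64encode(subject.encode(charset)).decode("ascii"))
    head = ("Subject: %s\r\nMIME-Version: 1.0\r\n"
            "Content-Type: text/plain; charset=%s\r\n"
            "Content-Transfer-Encoding: base64\r\n\r\n") % (word, charset)
    return head.encode("ascii") + base64.b64encode(body.encode(charset)) + b"\r\n"


def charsets_used(raw: bytes) -> dict:
    """Collect the charset labels found in the Subject header and the body."""
    msg = message_from_bytes(raw)  # compat32 policy leaves encoded words intact
    subject_charsets = {cs.lower()
                        for _, cs in decode_header(msg.get("Subject", "")) if cs}
    return {"subject": subject_charsets,
            "body": msg.get_content_charset() or "us-ascii"}


def reply_preserves_charset(original: bytes, reply: bytes) -> bool:
    return charsets_used(original) == charsets_used(reply)


original = make_raw("Тест", "Привет", "koi8-r")
same = make_raw("Re: Тест", "> Привет", "koi8-r")
changed = make_raw("Re: Тест", "> Привет", "windows-1251")
print(reply_preserves_charset(original, same))
print(reply_preserves_charset(original, changed))
```

Comparing labels only shows whether the MUA kept the charset; whether the text itself is still readable has to be judged from the decoded content or from the display.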
17Testing Results Presentation
18ML MUAs Testing Results and Data Analysis
- Testing results are documented and presented at http://park.kiev.ua/multiling/ml-mua/prjdocs/mlmua-repv1.html
- Standards overview on Internationalisation and Multilinguality: http://park.kiev.ua/multiling/ml-mua/mldoc-review.html
- Test messages constructor pilot version: http://park.kiev.ua/multiling/ml-mua/testcon.html
19Evaluation of ML MUAs
- First group - includes MUAs that support multiple languages/alphabets by means of multiple charsets support and use internal language/charset transformation
  - Microsoft Outlook Express
  - Netscape Messenger 4.04 and previous product Netscape Mail 3
  - exmh for X Windows
- Second group - provides ML support by selecting proper font for creating and displaying messages
  - Eudora Pro 3.0
  - Pegasus
  - Forte Agent
  - The Bat!
  - Simeon
  - UNIX Terminal Products
    - pine
    - elm
20First group - Full Multilingual Support
слово
- Microsoft Outlook Express
  - has the best and richest multilingual support
  - uses an effective internal conversion scheme that users can control well through setup and the Alphabet/Charset selection menu
- Netscape Messenger 4.04 and Netscape Mail 3.04
  - provide rich multilingual support for many charsets/encodings
  - but are very inflexible for languages that have many charsets in use (for example, Cyrillic Windows CP-1251 and KOI8-R/U for Russian/Ukrainian, or ISO 8859-2 and Windows CP-1250 for Central European languages)
  - Netscape products for X Windows - the same features.
- exmh for X Windows
  - provides good support for main groups of European languages using Latin 1, Latin 2 and Cyrillic charsets
21Second group - Simplified Multilingual Support
твердо
- Popular in the Latin1 (ISO 8859-1) and English-speaking community
- Language and charset/encoding support is provided by selecting the proper font for creating and displaying messages.
  - Eudora Pro 3.0
  - Pegasus
  - Forte Agent
  - The Bat! provides simple conversion between Cyrillic encodings (ISO 8859-5, Windows CP-1251, KOI8-R)
  - Simeon
  - pine and elm for UNIX
22Common problems of multilingual support in MUAs
ук
- Conversion between different Encodings/Charsets for the same language
- Correct processing of MIME tags in message Header fields (Subject and Address lines) during display when the charset name in the header is different from that of the Message Body
- The same problems occur when the user tries to change Charset/Encoding while displaying or composing a message, or uses Copy/Paste operations between different Charsets
- Viewing message source code and/or message info (charset/encoding for the Header and Body, Multipart MIME structure, and so on)
- Using common and correct terminology for language/charset settings in MUAs
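The first problem, several encodings in use for the same language, can be shown in a few lines of Python. The sample word and the helper name `transcode` are assumptions for illustration; the sketch shows both a correct conversion and what a reader sees when the octets are interpreted with the wrong charset.

```python
def transcode(octets: bytes, src: str, dst: str) -> bytes:
    """Convert text octets from one charset to another via Unicode."""
    return octets.decode(src).encode(dst)


word = "слово"                       # sample Cyrillic word
as_cp1251 = word.encode("cp1251")    # Windows CP-1251 octets
as_koi8r = transcode(as_cp1251, "cp1251", "koi8_r")

# Correct: each octet string is decoded with the charset it was encoded in.
print(as_cp1251.decode("cp1251"), as_koi8r.decode("koi8_r"))

# Incorrect: CP-1251 octets shown as KOI8-R or as Latin-1 give unreadable text.
wrong_koi8r = as_cp1251.decode("koi8_r")
wrong_latin1 = as_cp1251.decode("latin-1")
print(wrong_koi8r, wrong_latin1)
```

The octets themselves are not damaged in the incorrect cases; only the label is wrong, which is why an MUA that lets the user re-select the charset on display can often recover the text.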
23Project's Main Results
ферть
- The international environment of the project made it possible to identify the main problems of multilingual support in MUAs
- Multilingual test messages set
- Evaluation scheme for the forthcoming ML MUAs
- Project activity was conducted in coordination with other multilingual related projects
  - IMC MAIL-I18N report on Internationalization and Character Set technologies
  - Mozilla i18n project (Netscape 5.0)
    - PT members have contributed to the new Ukrainian Language enabled Mozilla
    - proposed model of multilingual support in MUAs was discussed
  - ESYS Simeon IMAP Mail multilingual features testing
24Follow-on Projects and activity
хер
- Testing new products using proposed methodology
  - New releases of Outlook Express 98, Netscape Messenger 4.5 and 5.0
  - New products of 1999 that are expected to implement the recommendations of IETF/IMC
- Other areas of further activity
  - Establishing an ML/i18n supporting Charsets repository for online support of Multilingual mail (mapping reference tables download, translation, configuration, etc.)
  - Creating a Web based ML test messages Constructor, whose pilot version is demonstrated at the project's page
    - http://park.kiev.ua/multiling/ml-mua/testcon.html
25Test Messages Constructor http://park.kiev.ua/multiling/ml-mua/testcon.html
26Test Messages Constructor - Creating test message
27Project Team
- Yuri Demchenko, TERENA
- Konstantin Chuguev, Ural Technical University, Russia
- Janja Faganel, Jozef Stefan Institute, Slovenia
- Vadim Shevchenko, Kiev Polytechnic Institute
- Alexey Medvedev, Kiev Polytechnic Institute
28Acknowledgments
шта
- Borka Jerman-Blazic, Jozef Stefan Institute, Slovenia
- Claudio Allocchio, Sincrotrone Trieste INFN Trieste, Italy
- Peter Heijmens Visser from TERENA, for providing MUA usage statistics
- Harald T. Alvestrand, Maxware Norway
29IMPORTANT NOTE
ер
- Multilingual page will be moved to and supported at the TERENA webserver http://www.terena.nl/multiling/
30еры
31ерь
32ять
33ю
34иа
35юс малый
36юс большой
37кси
38пси
39Russian/Ukrainian Languages: Historical overview
фита
- VI-XI cent. - Ancient Rus written language
- X-XIV cent. - Cyrillic written language
  - Invented by Cyril and Methodius (Thessaloniki) in IX cent.
  - First introduced in Moravia with advent of Christianity
  - Introduced in Kievan Rus with advent of Christianity in X cent.
- XIV-XVII - Forming Russian literary language
  - With the forming of the Moscow State after the Mongol yoke
- XVII - Developing modern Russian literary language
  - Lomonosov, Pushkin
40Ukrainian Literature Language
ижица
- Common ancient roots with Russian and all Slavic languages
- Was influenced by centuries of conquerors' languages
  - features of analytical language (as English)
- 1818 - Published Grammar of Ukrainian (Malorussian) dialect
  - introduced ukr. i, (for kg sounds), spelling of дз, дж
- Forming modern Ukrainian literary language (Taras Shevchenko)
- 1921 - Published Main rules of Ukrainian orthography
- 1984 - introduction of new/lost ukr. letter