Health Information Standardization and Asian Languages - PowerPoint PPT Presentation

Slides: 17
Provided by: mae14163
Learn more at: https://dicom.nema.org
Transcript and Presenter's Notes


1
Health Information Standardization and Asian
Languages
  • Michio Kimura M.D. Ph.D.
  • Director and Professor of Medical Informatics
    Department
  • Hamamatsu University School of Medicine
  • HL7 Japan chair

2
Three types of representation: we have 2
patient names in HIS
  • Alphabetic
  • Ideographic
  • Phonetic
  • Ideographic names have many ways to pronounce
    and are difficult to sort

3
Multi-Byte Character Codes in Use in Asia
  • Korea: KS X 1001, and 1001 annex 3
  • Hanguls (phonetic) and Ideographics
  • China (PR): GB 18030-2000
  • Taiwan (ROC): CNS 11643, and Big-5
  • Japan: JIS X 0208-1997
  • Katakana, Hiragana (phonetic) and Ideographics
  • Junior school pupils must read/write 810 letters.
  • Varieties: 6,879 (JIS) to 48,711 (CNS)
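
As a rough illustration of the list above, the sketch below encodes one ideograph with several legacy codecs that ship with Python. The codec names (`euc_kr`, `gb18030`, `big5`, `euc_jp`) are Python's, and pairing each with a national standard is an assumption made for illustration: each is one common byte encoding of that standard, not the standard itself. The helper name `encode_everywhere` is hypothetical.

```python
# One ideograph, four regional multi-byte encodings.
# Assumption: these Python codecs stand in for the national standards
# named on the slide (EUC-KR carries KS X 1001, EUC-JP carries JIS X 0208).
CODECS = {
    "Korea (KS X 1001 via EUC-KR)": "euc_kr",
    "China (GB 18030)": "gb18030",
    "Taiwan (Big-5)": "big5",
    "Japan (JIS X 0208 via EUC-JP)": "euc_jp",
}


def encode_everywhere(ch):
    """Return {label: encoded bytes} for a single character."""
    return {label: ch.encode(codec) for label, codec in CODECS.items()}


if __name__ == "__main__":
    for label, raw in encode_everywhere("山").items():
        print(f"{label:32s} {raw.hex(' ')}")
```

The same character gets a different 2-byte code in each region, so a receiver must know which code set is in effect before it can interpret the bytes.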

4
ISO 2022-1983 Multi-Byte Extension Technique
  • Base set is usually 1-byte ASCII (ISO 646)
  • Defines ESCAPE sequences to designate a character
    set to G0 or G2
  • Not necessarily multi-byte; to set ISO 8859-1:
    ESC . A
  • If the set is 2-byte, the codes that follow are
    read 2 bytes each.
  • To set JIS X 0208: ESC $ B
  • To set KS C 5601: ESC $ ( C
  • To set GB 2312: ESC $ A
  • To come back to ASCII: ESC ( B

5
Byte-wise Representation of ISO2022
6
RFC 1468 Japanese Character Encoding for
Internet Messages
  • ISO-2022-JP
  • Within 7-bit, safe for most nodes
  • Every line starts/ends with ASCII
  • No carryover shifting
  • ISO-2022-KR is also used in Korea
  • The same method is used in DICOM (Supplement 9) and
    HL7 v.2.3.1
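
The two properties listed above (7-bit only, and every line ending back in ASCII) can be checked mechanically. The sketch below does so with Python's `iso2022_jp` codec; the function name `check_rfc1468_properties` is hypothetical, and the check only looks at the JIS X 0208 escape sequence, which is an assumption that holds for ordinary Japanese text.

```python
ASCII = b"\x1b(B"        # ESC ( B : back to ASCII
JIS_X_0208 = b"\x1b$B"   # ESC $ B : switch to 2-byte JIS X 0208


def check_rfc1468_properties(text):
    """Return (is_7bit, every_line_ends_in_ascii) for the encoded text."""
    raw = text.encode("iso2022_jp")
    is_7bit = all(b < 0x80 for b in raw)
    # A line ends in ASCII if it never left ASCII, or if its last
    # escape sequence is the one that returns to ASCII.
    ends_in_ascii = all(
        JIS_X_0208 not in line or line.rfind(ASCII) > line.rfind(JIS_X_0208)
        for line in raw.split(b"\n")
    )
    return is_7bit, ends_in_ascii


if __name__ == "__main__":
    print(check_rfc1468_properties("日本\n山田 ABC"))
```

Because no shift state carries over a line break, a node that truncates or reorders lines cannot leave the rest of the message stuck in 2-byte mode.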

7
UNICODE ISO10646
  • By allocating 2 bytes to every character, UNICODE
    can represent every character in the world
    without any state or shifting technique.
  • 16 bits = 65,536
  • -> CJK unified ideographics

8
CJK Unified Ideographics
9
Why do we not use UNICODE as a message format? (I know it
is used internally, but we do not like it going outside
as a message format.)
  • If the Chinese "bone" and our "bone" are to be
    recognized as the same because of symmetry, how about
    using these?
  • The UNICODE consortium says: "Introduction of language
    information."
  • We cannot write a Chinese-language textbook
    written in Japanese.
  • We cannot properly accommodate Koreans living in
    Japan, whose names are in Korean letters but whose
    addresses are, of course, in Japanese.
  • The original UNICODE dream is gone.
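
To make the "bone" point concrete: the ideograph 骨 has a single unified code point, so a plain UNICODE string does not say whether the Chinese or the Japanese glyph shape is meant. The sketch below only inspects the code point with Python's standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (code point as 'U+XXXX', official character name)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    # The same code point is used whether the text is Chinese or Japanese;
    # nothing in the character itself records the language.
    print(describe("骨"))
```

Any language distinction therefore has to travel outside the character data, for example as markup or as font selection.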

10
UTF-8: Transformation format of UNICODE
  • UNICODE was originally 2 bytes for every character.
  • 0000-007F: 0xxxxxxx
  • 0080-07FF: 110xxxxx 10xxxxxx
  • 0800-FFFF: 1110xxxx 10xxxxxx 10xxxxxx
  • 1 byte: ASCII
  • 2 bytes: Latin extensions, Greek, Russian,
    Arabic, etc.
  • 3 bytes: Thai, Hangul, Katakana, Hiragana,
    CJK ideographics
  • ASCII characters stay compatible with ASCII, so ASCII
    users can say "we are universal, because we use
    UNICODE", to the detriment of ideographic users.
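
The bit patterns above can be written out as a small encoder. This is a sketch limited to the 16-bit range shown on the slide (it ignores surrogates and code points above FFFF), and the function name `utf8_encode_bmp` is hypothetical. It is compared against Python's built-in encoder as a sanity check.

```python
def utf8_encode_bmp(cp):
    """Encode one code point in 0000-FFFF using the three UTF-8 patterns."""
    if cp <= 0x7F:                       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    raise ValueError("code point outside the 16-bit range of this sketch")


if __name__ == "__main__":
    for ch in "AéΩ骨":
        raw = utf8_encode_bmp(ord(ch))
        assert raw == ch.encode("utf-8")
        print(f"U+{ord(ch):04X} -> {len(raw)} byte(s): {raw.hex(' ')}")
```

An ASCII letter stays at 1 byte, while the ideograph grows from 2 bytes in a regional code to 3 bytes, which is the cost the last bullet refers to.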

11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
HL7 Japan's answer to HL7 v.3
  • In XML, UNICODE will be the default in 2003.
  • Even in UNICODE v3.1, the over-unification problem
    is not solved.
  • But with XML schema and XML namespace, font
    information can be set in each tag.
  • In this way, a Korean name with a Japanese address
    can be described.
  • The original UNICODE dream (all languages at the same
    time) is gone, but many 1-byte languages plus one
    2-byte language is not bad.
  • Pokémon
  • Answer: UNICODE can be the default, provided that we
    can continue to use each local practice now in
    use.
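
One way to picture per-tag language information is the standard `xml:lang` attribute from the XML namespace. This is a swapped-in mechanism chosen for illustration, not necessarily the font or namespace scheme HL7 Japan had in mind. The element names (`patient`, `name`, `address`) and the sample name and address are made up.

```python
import xml.etree.ElementTree as ET

# The reserved XML namespace; ElementTree serializes it with the xml: prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG = f"{{{XML_NS}}}lang"


def build_patient():
    """Build a hypothetical record: Korean name, Japanese address."""
    patient = ET.Element("patient")
    name = ET.SubElement(patient, "name", {LANG: "ko"})
    name.text = "김철수"            # placeholder name
    address = ET.SubElement(patient, "address", {LANG: "ja"})
    address.text = "静岡県浜松市"    # placeholder address
    return patient


if __name__ == "__main__":
    print(ET.tostring(build_patient(), encoding="unicode"))
```

A receiver can then pick the proper font for each element from its language tag, even though the underlying code points are unified.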

15
Language representation is not the only issue
  • Language used in
      • Conversation with patients
      • School education
          • Medical, Nurse, Technicians
      • Medical record
          • Signs and symptoms
          • Reports
  • Structure of data types
      • Address
          • 250 Wu-Hsing street
          • 1-20-1 Handa cho

16
Final Remarks
  • Some OSs (Windows NT 4.0 or later) use
    UNICODE internally.
  • I do not blame their ignorance; maybe they just
    didn't know.
  • I oppose any proposal claiming that "UNICODE is
    the only way."
  • When using UNICODE, pay attention to each
    language's proper fonts.
  • Let's collaborate and agree on an XML namespace for
    the language to be used, and submit it to standards.
  • Please take part in the APAMI census for healthcare
    languages.