863?? - PowerPoint PPT Presentation

About This Presentation
Title:

863??

Description:

Title: 863 Author: Ma Dian Fu Last modified by: Lin Created Date: 3/10/2004 10:42:25 AM – PowerPoint PPT presentation

Slides: 61
Provided by: MaDi70
Category:
Tags: smil


Transcript and Presenter's Notes



1
??
  • ??????????????????????
  • ??????????????(?????????????),??????????????????

2
XML
3
???????XML (eXtensible Markup Language)
  • XML?????????????SGML(Standard Generalized Markup
    Language)????????(HTML)?
  • 1996?,?????(W3C)???????????XML(eXtensible Markup
    language)????
  • 1998?2?10????XML1.0,??????????????????????????????
    ??
  • ????
  • http://www.w3.org/TR/REC-xml/
  • W3C Recommendation 04 February 2004
  • XML is a family of technologies: XSL, XSLT,
    XPath, XLink, XPointer, DOM, etc.

4
(No Transcript)
5
An Example XML Document
<?xml version="1.0" encoding="ISO-8859-1"?>
<tradeBatch>
  <trade account="2520034" action="buy" duration="good-till-canceled">
    <symbol>SUNW</symbol>
    <quantity>1000</quantity>
    <limit>20</limit>
    <date>2001-03-05</date>
  </trade>
  <trade account="9240196" action="sell" duration="day">
    <symbol>CSCO</symbol>
    <quantity>500</quantity>
    <date>2001-03-05</date>
  </trade>
</tradeBatch>
<!-- This is a comment -->
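As a rough illustration of how such a document is consumed, the sketch below parses the trade batch with Python's standard xml.etree.ElementTree module. The function name summarize_trades is hypothetical, and the embedded XML is the slide's example with its markup characters restored.

```python
import xml.etree.ElementTree as ET

# The slide's example document (markup restored; bytes so that the
# encoding declaration is honoured by the parser).
TRADE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<tradeBatch>
  <trade account="2520034" action="buy" duration="good-till-canceled">
    <symbol>SUNW</symbol>
    <quantity>1000</quantity>
    <limit>20</limit>
    <date>2001-03-05</date>
  </trade>
  <trade account="9240196" action="sell" duration="day">
    <symbol>CSCO</symbol>
    <quantity>500</quantity>
    <date>2001-03-05</date>
  </trade>
</tradeBatch>
<!-- This is a comment -->"""


def summarize_trades(xml_bytes):
    """Return (account, action, symbol, quantity) for every <trade> element."""
    root = ET.fromstring(xml_bytes)          # root is <tradeBatch>
    return [
        (
            trade.get("account"),            # attributes are read with get()
            trade.get("action"),
            trade.findtext("symbol"),        # child element text
            int(trade.findtext("quantity")),
        )
        for trade in root.findall("trade")
    ]


if __name__ == "__main__":
    for row in summarize_trades(TRADE_XML):
        print(row)
```

Attributes (account, action) and child elements (symbol, quantity) are read through different calls, which mirrors the distinction the following slides draw between attributes and element content.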
6
XML Document
  • The document is composed of declarations,
    elements, comments, character references, and
    processing instructions, all of which are
    indicated in the document by explicit markup
  • A data object is an XML document if it is
    well-formed, as defined in XML specification
  • A well-formed XML document may in addition be
    valid if it meets certain further constraints

7
XML Document Contents
  • XML declaration
  • Processing Instructions
  • Elements
  • Tags: Start-Tags, End-Tags
  • Attributes
  • Empty-Element Tags
  • PCDATA
  • CDATA
  • White Spaces
  • Comments

Prolog
Content
8
????????
  • ???????(lt)?????(gt)?????????????????????
  • ??????????????????????????
  • ????????????????-???
  • ??????,?????????
  • ??????
  • ????????????

9
XML ?????????????
  • XML ????? XML ??????,??????????????????
  • <?xml version="1.0" encoding="gb2312"?>
  • Comment: <!-- -->
  • Processing instruction: <?......?>
  • Entity: <!ENTITY dw "developerWorks">

10
???
  • XML ??????????????????????????,??????????????????
  • <?xml version="1.0"?>
  • <!-- A well-formed document -->
  • <XML??>
  • Hello, World!
  • </XML??>

11
XML Namespace - ????
  • <definitions
  • xmlns:xsd="http://www.w3.org/2000/XMLSchema"
  • xmlns:xsd1="http://example.com/stockquote.xsd"
  • xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
  • xmlns="http://schemas.xmlsoap.org/wsdl/"
  • targetNamespace="http://example.com/stockquote.wsdl">
  • <types></types>
  • </definitions>
  • ???????URI
  • ?????????????
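To make the effect of the namespace declarations concrete, here is a small sketch using Python's xml.etree.ElementTree, which reports element names in the form {namespace-URI}local-name. The document is a trimmed copy of the definitions example with its markup restored; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's <definitions> example (markup restored).
WSDL = b"""<definitions
    xmlns:xsd="http://www.w3.org/2000/XMLSchema"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://example.com/stockquote.wsdl">
  <types></types>
</definitions>"""

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

root = ET.fromstring(WSDL)

# The default namespace (xmlns="...") applies to unprefixed elements,
# so the root's qualified name carries the WSDL namespace URI.
root_tag = root.tag

# A lookup must use the qualified name; the bare local name finds nothing.
types_qualified = root.find("{%s}types" % WSDL_NS)
types_bare = root.find("types")

# Unprefixed attributes are not placed in the default namespace.
target = root.get("targetNamespace")

if __name__ == "__main__":
    print(root_tag)
    print(types_qualified is not None, types_bare is None)
    print(target)
```

Two elements with the same local name but different namespace URIs therefore get different qualified names, which is how name clashes between vocabularies are avoided.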

12
XML Namespace(?)
  • XML?????????????????,?W3C???????
  • ?XML?,????????tag??????????????,?????XML????????,?
    ????????Namespaces????????????
  • ?XML Namespace??????Namespace??URI?????,?XML?????
    ?????????????????Namespace,???????????????????????
    ??
  • ???Namespace?XML 1.0???,??????????????????????????
    ????local names(????)????????????????????,????????
    ?????????,???????????????XML???????,?????????????

13
XML??(2)
  • <?xml version="1.0" encoding="gb2312" ?>
  • <??>
  •   <??>???</??>
  •   <??>???????? ?????</??>
  •   <?? ????="??">??</??>
  • <??>
  •   <??>????</??>
  •   <??>web ????</??>
  •   </??>
  •   <??>13701068603</??>
  •   <E-mail>dfma_at_nlsde.buaa.edu.cn</E-mail>
  •   <??>??????????</??>
  • </??>

14
Well-Formed XML Documents
  • A "Well Formed" XML document has correct XML
    syntax
  • A textual object is a well-formed XML document if
    it has the correct XML syntax
  • It contains one or more elements
  • There is exactly one element, called the root, or
    document element
  • The name in an element's end-tag must match the
    element type in the start-tag. Names are
    case-sensitive
  • Each of the parsed entities which is referenced
    directly or indirectly within the document is
    well-formed
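The rules above can be checked mechanically: a conforming parser rejects any input that is not well-formed. The following is a minimal sketch with Python's standard library; the helper name is_well_formed is hypothetical, and it checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for any well-formedness violation (or empty input).
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<a><b/></a>",   # one root element, properly nested
        "<a></A>",       # end-tag name differs in case from the start-tag
        "<a/><b/>",      # two top-level elements, so no single root
        "",              # no element at all
    ]
    for sample in samples:
        print(repr(sample), is_well_formed(sample))
```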

15
DTD
  • ??????(Document Type Definition ,DTD)?XML
    1.0??????
  • DTD????????????????,????XML??????

16
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Students [
  <!ELEMENT Students (Student+)>
  <!ATTLIST Students Class CDATA #REQUIRED>
  <!ELEMENT Student (Name, Age?)>
  <!ATTLIST Student SId CDATA #REQUIRED>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Age (#PCDATA)>
]>
<Students Class="SY9061">
  <Student SId="12345">
    <Name>Lin</Name>
    <Age>20</Age>
    <Address>
      <Country>China</Country>
      <City>BeiJing</City>
    </Address>
  </Student>
  <Student SId="12345">
    <Name>Lin</Name>
  </Student>
</Students>
17
DTD???
  • ELEMENT ??
  • ??????
  • ATTLIST
  • ???????????????????????
  • ENTITY
  • ????????
  • NOTATION
  • ??????????(???????)?????,?????????????????

18
DTD??????
  • DOCTYPE ??
  • ??DTD??
  • <!DOCTYPE catalog [ ???? ]>
  • ??DTD??
  • <!DOCTYPE catalog SYSTEM "http://myserver/decs/pubcatalog.dtd">

19
?????
  • ??D T D ??????????,??????????????????? ?????X M L
    ?????D T D ,?????????????D T D ,???????D T D
    ??????

20
DTD???
  • DTD?????????,????XML????,????XML???,?????????????
    ??XML,?????????????????DTD,??????????XML??,??XML?
    ????
  • DTD???????????
  • ????,???????DTD???????,??,??XML???????DTD,?????XML
    ?????????

21
XML Information Set and Canonical XML
  • ?????XML??????????????
  • ???????????
  • ???????XML?????

UTF-8
<?xml version="1.0" encoding="gb2312" ?>
<??>   <??>???</??> <??>  
<??>????</??>   <??>web ????</??>  
</??> <E-mail></E-mail> </??>
White space, CR, CR-LF, and LF line termination
<E-mail/>
22
XML Information Set
  • ?W3C???????XML Information Set????????????????????
    ????XML??????
  • XML Information Set?????????,????(document)
    ???(element)???(attribute)???(character)???(commen
    t)????????????????????XML????????
  • ???????,??XML????????XML?????,???XML Information
    Set???????????XML??????
  • W3C?????XML??????????XML?????????XML Information
    Set?????????????,XML Information
    Set????????XML??????????????????

23
  • XML Information Set???????XML Information
    Set?????????,????????????XML?????,?????????
  • XML?????????????(Information Item)??,???????????(p
    roperties)?
  • XML Information Set????????,????????,??XML
    Information Set????????????
  • Information Set?Information Item????tree,no
    de???????

24
  • Information Set????11????Information Item?
  • The Document Information Item
  • Element Information Item
  • Attribute Information Item
  • Processing Instruction Information Item
  • Character Information Item
  • Comment Information Item
  • The Document Type Declaration Information Item
  • Unexpanded Entity Reference Information Item
  • Unparsed Entity Information Item
  • Notation Information Item
  • Namespace Information Item

25
  • What is not in Information set
  • The content models of elements, from ELEMENT
    declarations in the DTD.
  • The grouping and ordering of attribute
    declarations in ATTLIST declarations.
  • The document type name.
  • White space outside the document element.
  • White space immediately following the target name
    of a PI.
  • Whether characters are represented by character
    references.
  • The difference between the two forms of an empty
    element: <foo/> and <foo></foo>.
  • White space within start-tags (other than
    significant white space in attribute values) and
    end-tags.
  • The difference between CR, CR-LF, and LF line
    termination.

26
  • The order of attributes within a start-tag.
  • The order of declarations within the DTD.
  • The boundaries of conditional sections in the
    DTD.
  • The boundaries of parameter entities in the DTD.
  • Comments in the DTD.
  • The location of declarations (whether in internal
    or external subset or parameter entities).
  • Any ignored declarations, including those within
    an IGNORE conditional section, as well as entity
    and attribute declarations ignored because
    previous declarations override them.
  • The kind of quotation marks (single or double)
    used to quote attribute values.
  • The boundaries of general parsed entities.
  • The boundaries of CDATA marked sections.
  • The default value of attributes declared in the
    DTD.

27
Canonical XML
  • ?W3C???????
  • ?XML 1.0??????,??????????,????????????,???????????
    ???,???????????????????????,???XML??????,?????????
    ????????XML??,??Canonical XML???????,???????
  • ????????????????,???????,???????(??XML
    Signature),???????XML???????????????????

28
The End!
29
???????
30
ASCII
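  • Originally the Internet had only one character set: ANSI's ASCII
    (American Standard Code for Information Interchange).
  • It uses 7 bits per character and defines 128 characters in total,
    with code range 0x00-0x7F.
  • The ASCII set covers English letters, Arabic numerals, punctuation
    and similar characters; 0x00-0x1F and 0x7F are its 33 control
    characters.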

31
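  • ISO-8859-1 is a single-byte encoding that is backward compatible
    with ASCII. Its code range is 0x00-0xFF: 0x00-0x7F is identical to
    ASCII, 0x80-0x9F holds control characters, and 0xA0-0xFF holds
    text symbols.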
  • ISO-8859-1??????ASCII??????,??????????????????????
    ????????????????????,??????ISO-8859-1???

32
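  • Latin1 is an alias of ISO-8859-1, sometimes written Latin-1.
  • ASCII is a 7-bit code; ISO-8859-1 is an 8-bit code.
  • Because ISO-8859-1 uses every value a single byte can hold, a
    system that supports ISO-8859-1 can transmit and store a byte
    stream in any other encoding without discarding anything. In other
    words, any byte stream can safely be treated as ISO-8859-1. This
    is an important property, and MySQL's default Latin1 charset takes
    advantage of it.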
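A quick way to see why ISO-8859-1 can carry arbitrary bytes is to round-trip every possible byte value through Python's latin-1 codec. This is only a sketch of the property, not a statement about how MySQL stores data; the function name is hypothetical.

```python
ALL_BYTES = bytes(range(256))


def latin1_round_trip(data):
    """Decode bytes as ISO-8859-1 and encode them back again."""
    # Every byte 0x00-0xFF maps to the code point with the same number,
    # so decoding never fails and encoding restores the original bytes.
    return data.decode("latin-1").encode("latin-1")


if __name__ == "__main__":
    print(latin1_round_trip(ALL_BYTES) == ALL_BYTES)
    utf8_bytes = "\u6c49".encode("utf-8")   # E6 B1 89
    print(latin1_round_trip(utf8_bytes) == utf8_bytes)
```

The bytes survive, but the decoded text is not meaningful: three UTF-8 bytes read as ISO-8859-1 become three unrelated Latin characters.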

33
Unicode and UCS
  • ????????????????,????????????????,????????????????
    ???
  • ??,???????????????????????Unicode???ISO???????????
    ???,???????????????????????????????
  • Unicode's formal name is "Universal Multiple-Octet Coded
    Character Set", UCS for short.
  • UCS can also be read as an abbreviation of "Unicode Character Set".

34
  • Unicode unicode.org???????, ??????????????.
  • ?1.0??16???, ?U0000?UFFFF. ??2byte???????
  • ?2.0?????16???, ???16????????, ?????16????,
    ???20???, ????0?0x10FFFF.

35
  • ISO 10646???????31??????
  • ????????(00000-0xFFFD)?????????(Basic
    Multilingual Plane, BMP)
  • ?????????????????
  • BMP????????????????,??????BMP????????????????

36
  • ?1991???,?????????????????????
  • ?unicode2.0??, unicode????ISO 10646-1????????,
  • ISO???ISO10646??????0x10FFFF?UCS-4????, ????????.
  • ??Unicode???????????ISO 10646????
  • Unicode 3.0???????BMP????

37
UCS-2 UTF-16
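  • UCS only specifies how characters are encoded; it does not specify
    how those codes are transmitted or stored.
  • For example, the UCS code of "汉" is 6C49.
  • UTF-8, UTF-7 and UTF-16 are all widely accepted schemes.
  • UTF is short for "UCS Transformation Format".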

38
  • UCS-2?UTF-16?UCS???(????Unicode???)?????????????
  • UCS-2??????????,????????????????,?????BMP????????
  • UCS-2???GBK?Big5,?????????,???????????,???????????
    ???????????

39
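  • UTF-16 is a variable-length encoding: it uses 2 bytes for
    characters in the BMP and 4 bytes for characters in the
    supplementary planes beyond the BMP.
  • UTF-16 is a superset of UCS-2: its two-byte form is identical to
    UCS-2, so within the BMP, UCS-2 is exactly equivalent to UTF-16.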
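The difference between BMP and non-BMP characters in UTF-16 shows up directly in the number of 16-bit code units. A minimal sketch, assuming Python's built-in utf-16-be codec; the helper name utf16_units and the choice of U+10000 as the non-BMP sample are illustrative.

```python
def utf16_units(ch):
    """Return the UTF-16 code units of `ch` as a list of integers."""
    data = ch.encode("utf-16-be")           # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


if __name__ == "__main__":
    # A BMP character: one unit, equal to its UCS-2 code.
    print([hex(u) for u in utf16_units("\u6c49")])
    # U+10000, the first code point beyond the BMP: a surrogate pair.
    print([hex(u) for u in utf16_units("\U00010000")])
```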

40
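  • UTF-16 uses two bytes as its code unit, so before interpreting
    UTF-16 text you must know the byte order of each unit. For
    example, the Unicode code of "奎" is 594E and that of "乙" is 4E59.
    If we receive the UTF-16 byte stream "594E", is it "奎" or "乙"?
  • The method the Unicode standard recommends for marking byte order
    is the BOM. Here BOM does not mean "Bill Of Material" but Byte
    Order Mark.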

41
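  • The BOM is a slightly clever idea:
  • UCS has a character named "ZERO WIDTH NO-BREAK SPACE" whose code is
    FEFF, whereas FFFE is not a character in UCS and so should never
    appear in real data. The UCS specification suggests transmitting
    "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself.
  • If the receiver gets FEFF, the stream is Big-Endian; if it gets
    FFFE, the stream is Little-Endian. For this reason "ZERO WIDTH
    NO-BREAK SPACE" is also called the BOM.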

42
(????)
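  • UCS-2 and UTF-16 need at least two bytes per character, so byte
    order matters: big endian or little endian.
  • For example, "啊" (U+554A) is 0x554A in big endian and 0x4A55 in
    little endian.
  • UCS-2 and UTF-16 default to big endian.
  • To state the byte order explicitly, a BOM (Byte Order Mark) can be
    placed at the start of the text: 0xFEFF means big endian, 0xFFFE
    means little endian.
  • UCS-2BE and UCS-2LE are the encoding names for the big endian and
    little endian forms, and UTF-16BE and UTF-16LE work the same way.
    Since BE is the default, UCS-2 can be regarded as an alias of
    UCS-2BE.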
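The byte-order example for U+554A can be reproduced with Python's explicit-endian codecs and the BOM constants in the codecs module. This is a small sketch; the variable names are illustrative.

```python
import codecs

CH = "\u554a"

big = CH.encode("utf-16-be")      # high byte first
little = CH.encode("utf-16-le")   # low byte first

# Prefixing the matching BOM lets the generic "utf-16" decoder work out
# the byte order by itself; the BOM is consumed, not returned as text.
from_be = (codecs.BOM_UTF16_BE + big).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + little).decode("utf-16")

if __name__ == "__main__":
    print(big.hex(), little.hex())
    print(codecs.BOM_UTF16_BE.hex(), codecs.BOM_UTF16_LE.hex())
    print(from_be == CH, from_le == CH)
```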

43
UTF-8
  • UTF-8?UCS???????????,UTF-16??????????(16?),?UTF-8?
    ?????????(8?)?
  • UTF-16????????????????,UTF-8?????????????????
  • ????UTF-8??????????UCS-2?????,?UCS-2?UTF-8????????
    ?

44
UCS-2 (hex)          UTF-8 (binary)
U+0000 - U+007F      0xxxxxxx
U+0080 - U+07FF      110xxxxx 10xxxxxx
U+0800 - U+FFFF      1110xxxx 10xxxxxx 10xxxxxx
45
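  • For example, the Unicode code of "汉" is 6C49. 6C49 lies in
    0800-FFFF, so it needs the 3-byte template 1110xxxx 10xxxxxx
    10xxxxxx. Written in binary, 6C49 is 0110 110001 001001;
    substituting these bits for the x's in order gives 11100110
    10110001 10001001, i.e. E6 B1 89.
  • For example, the UCS-2 code of "啊" is 0x554A, which is 0101 0101
    0100 1010 in binary. Its UTF-8 form is therefore 1110 0101
    10 010101 10 001010, i.e. 0xE5958A.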
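The template substitution worked through above can be written as a few shifts and masks. The sketch below handles only the three BMP ranges from the table; the function name utf8_encode_bmp is hypothetical, and it does not reject surrogate code points (D800-DFFF), which a production encoder would.

```python
def utf8_encode_bmp(code_point):
    """Encode one BMP code point (0x0000-0xFFFF) to UTF-8 by hand."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only BMP code points are handled in this sketch")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x6C49, 0x554A):
        print(hex(cp), utf8_encode_bmp(cp).hex())
```

The results for 0x6C49 and 0x554A match the two hand calculations, and they agree with Python's built-in codec, chr(cp).encode("utf-8").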

46
  • ??UTF-8??????UCS??????,????UTF-8?????
  • UTF-8???ASCII??,????ASCII??????UTF-8??ASCII???????
    ???000-07F????????ASCII??,?????????????GBK?Big5?
    ??????UTF-8???????
  • ??U007F?UCS??,?UTF-8???????????
  • UTF-8??????????????000-0xFD??(???UCS-4?????,????0
    00-0xEF??)????????????????????
  • ???????????080-0xBF??0xFE?0xFF?UTF-8???????
  • GBK??????????UCS-2??????U0800 -
    UFFFF??,????GBK?????????UTF-8????3?????GBK???????
    ??UTF-8???????3????,?GBK???????
  • ?UTF-8?????????????????,?????????????????,????????
    ???????????,????????,???????????????????????UTF-8?
    ????????????

47
?????????
  • ????????????,??????????????????????????????????
  • ??????????????????????
  • ???????
  • ??????
  • ?????????
  • The leading bytes of a file can identify its Charset/encoding, as
    follows:
    EF BB BF       UTF-8
    FE FF          UTF-16/UCS-2, big endian
    FF FE          UTF-16/UCS-2, little endian
    FF FE 00 00    UTF-32/UCS-4, little endian
    00 00 FE FF    UTF-32/UCS-4, big endian
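A BOM-sniffing routine along these lines might look like the sketch below. The function name sniff_bom is hypothetical. One detail worth noting is ordering: the UTF-32 little-endian mark FF FE 00 00 begins with the UTF-16 little-endian mark FF FE, so the longer marks have to be tested first. Absence of a BOM proves nothing, so the function returns None in that case.

```python
import codecs

# Longest marks first: BOM_UTF32_LE (FF FE 00 00) starts with the same
# two bytes as BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if absent."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbfhello"))
    print(sniff_bom(b"\xfe\xff\x55\x4a"))
    print(sniff_bom(b"\xff\xfe\x4a\x55"))
    print(sniff_bom(b"\xff\xfe\x00\x00\x4a\x55\x00\x00"))
    print(sniff_bom(b"plain ascii"))
```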

48
  • ???????????????,?????????????????????????????,????
    ????????????????HTTP?????????,??????????????????HT
    TP?????????????
  • Content-Type: text/html; charset=utf-8

49
  • ?????????Html??,??????????????
  • <meta http-equiv="Content-Type"
    content="text/html; charset=UTF-8"/>
  • ?????????????????????,???????Charset????
  • UCS-2/UTF-16?BOM???????????????,??????????BOM?????
    ?

50
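  • The native character encoding of the Java programming language is
    UTF-16, so a charset in the Java platform defines a mapping
    between sequences of 16-bit UTF-16 code units and sequences of
    bytes.
  • Every implementation of the Java platform is required to support
    the following charsets:
  • US-ASCII: 7-bit ASCII, a.k.a. ISO646-US, the Basic Latin block of
    the Unicode character set
  • ISO-8859-1: ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
  • UTF-8: 8-bit UCS Transformation Format
  • UTF-16BE: 16-bit UCS Transformation Format, big-endian byte order
  • UTF-16LE: 16-bit UCS Transformation Format, little-endian byte
    order
  • UTF-16: 16-bit UCS Transformation Format, byte order identified by
    an optional byte-order mark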

51
??????
  • GB2312
  • GBK
  • Big5
  • GB18030

52
GB2312, 1980
  • GB2312????????????????????,?????????????????,?
    ????????,1981?5?1???,??????????????????
  • GB2312???????????????????7445?????,?????6763??
  • GB2312????????????????????,?????????????,???????
    ??????,??????????

53
  • GB2312?????????,?????????94??,?????94??,??????????
    ??????????????????? ?10??????,?1601???16?1?,??????
    ??
  • ????01-09????????,16-87?????,10-15?88-94??????????
    ?????????????????? ?3755?,??16-55?,???????/??????
    ????????????3008?,??56-87?,???/??????????
    ?????????,??????????????????????,?????????????????
    ??????????
  • ???????????????,??????????????

54
  • GB2312???????2121H-777EH,?ASCII???,??????GB???????
    ???1?????
  • ???????????????0xA0????GB2312???
  • EUC-CN?????GB2312???,?GB2312?????
  • ????GB2312???????? Unicode?UTF-8?

55
GBK, 1995
  • GB2312-80????6763?,?????????,??????????????????,??
    ???????,????????,????????,???GB2312-80,????????
    ?????(??)?(??)?(????)????,??????,????????????????
    ????,????????????????????,????????????
  • ????????,????UNICODE???,?????????????1995?12?1???
    ???????
  • GBK???GB2312 ????,????ISO 10646????,??????????????
    ????????

?????
56
  • 1995????????GBK1.0???21886???,??????????????????21
    003????
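  • GBK also uses double-byte encoding. Its overall range is 8140-FEFE:
    the first byte lies in 81-FE and the second byte in 40-FE,
    excluding the code points XX7F.
  • Put another way, GBK is a double-byte encoding over 0x8140-0xFEFE
    with 0x7F excluded as a low byte: the high byte is 0x81-0xFE and
    the low byte is 0x40-0x7E or 0x80-0xFE.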

57
  • ??????040-07E????(?)?????,???????????????????
    ????? GBK??????,??????????????GB2312?????????????
  • ????040-07E?GBK????????,?????????ASCII????,?????
    ????????
  • ??GBK??????080???????? ?ASCII????????????040?AS
    CII?????????,?????????,???????????????Big5????????
    ???

58
Big5
  • Big5??????,????????081-0xFE,????????040-07E?0xA
    1-0xFE??GBK??,??????080-0xA0????08140-0xA0FE????
    ?,????????
  • Big5????????????,???????,?????????????GBK?????????
    ?????Big5????????Big5?????????,??????Big5????????,
    ????????Windows?????????CP950????????Big5???,?Big5
    ???????7?????????Big5?????????GBK??????,????Big5??
    ????GBK????????,???????????
  • ??Big5????ASCII?????(???????040-07E),??Big5?????
    ??????GBK???????,???????040-07E???????????,?????
    ??05C(/)?07C()????????GBK???????

59
GB18030 , 2000
  • GB18030??????GBK?GB2312,????????????,?????????????
    GB18030?????Unicode3.1????,??????????,GBK?????????
    ?,???????????????????????
  • GB18030???????,?????????????????
  • GBK?GB2312?????????,?????ASCII?????????,??????????
    ????????????
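  • GB18030 byte structure:
  • Single-byte characters: 0x00-0x7F, identical to ASCII
  • Double-byte characters: same as GBK, with first byte 0x81-0xFE and
    second byte 0x40-0x7E or 0x80-0xFE
  • Four-byte characters: first and third bytes 0x81-0xFE, second and
    fourth bytes 0x30-0x39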
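The byte ranges above are enough to decide how long each GB18030 character is by looking at its first two bytes. Here is a minimal sketch of that rule; the function name gb18030_char_len is hypothetical, and it checks only the ranges listed here, not whether the resulting code is actually assigned.

```python
def gb18030_char_len(data, i=0):
    """Length in bytes (1, 2 or 4) of the GB18030 character at data[i]."""
    first = data[i]
    if first <= 0x7F:
        return 1                          # ASCII-compatible single byte
    if 0x81 <= first <= 0xFE and i + 1 < len(data):
        second = data[i + 1]
        if 0x30 <= second <= 0x39:
            return 4                      # digit-range second byte
        if 0x40 <= second <= 0x7E or 0x80 <= second <= 0xFE:
            return 2                      # GBK-compatible double byte
    raise ValueError("not a valid GB18030 lead sequence")


def split_gb18030(data):
    """Split a GB18030 byte string into per-character byte chunks."""
    chunks, i = [], 0
    while i < len(data):
        n = gb18030_char_len(data, i)
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    sample = "A\u554a\U00010000".encode("gb18030")
    print([chunk.hex() for chunk in split_gb18030(sample)])
```

Because the second byte of a four-byte character falls in 0x30-0x39, a range GBK never uses for trail bytes, the two-byte and four-byte forms cannot be confused.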

60
??????
  • ??"   ?Google????E9BB91E799BD    
  • ???????BADAB0D7  
  • ??????utf8???,???????MBCS?GB2312???????  
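The two hex strings in this example can be decoded to check that they represent the same text in different encodings. In the sketch below the hex values come from the slide; the code points U+9ED1 U+767D are inferred by decoding them, not read from the garbled text, and the helper name decode_hex is hypothetical.

```python
UTF8_HEX = "E9BB91E799BD"   # from the slide
GBK_HEX = "BADAB0D7"        # from the slide


def decode_hex(hex_string, encoding, errors="strict"):
    """Turn a hex dump into bytes and decode it with the given charset."""
    return bytes.fromhex(hex_string).decode(encoding, errors)


if __name__ == "__main__":
    as_utf8 = decode_hex(UTF8_HEX, "utf-8")
    as_gbk = decode_hex(GBK_HEX, "gbk")
    print([hex(ord(c)) for c in as_utf8])
    print(as_utf8 == as_gbk)
    # Decoding with the wrong charset does not fail loudly; it silently
    # yields different characters, which is how garbled pages arise.
    print(decode_hex(UTF8_HEX, "gbk", errors="replace") == as_utf8)
```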
