The GREEN Digital Library A Specialized Materials Science Collection Of the National Science Digital - PowerPoint PPT Presentation

Provided by: gregory106
1
Gregory M. Shreve
Software Localization and Internationalization: How and Why
2
Internet, E-Commerce and Foreign Markets
Internet World Stats estimates the current number
of WWW users at 785 million. Of these, 29% reside
in North America, 27.7% reside in Europe, and 31%
reside in Asia, with penetration rates of 69.8%,
29.9% and 6.7% respectively. With 58.7% of
current users residing in regions with an average
penetration rate of only 18.3%, it is clear that
these foreign markets offer substantial rewards
for those prepared to enter them.
The growth of the Internet and e-commerce over
the next decade will be driven by the expansion
of foreign markets.
3
Consumer as Foreigner
In 2003 e-commerce sales to foreign customers
exceeded domestic sales. This year the European
Internet economy is expected to break the 4
trillion dollar mark, growing at a compound
annual rate of 87%. Western Europe is expected to
lead all regions with 692 billion dollars in
global online exports in 2004. North America will
move 23% of its exports online, with the U.S.
pumping 210 billion dollars into cross-border
e-commerce. The Asia-Pacific region will reach
219 billion dollars in 2004, sparked by 57
billion dollars in Japanese online exports.
4
Global, Globalize, Globalization
Companies that intend to sell online will have to
globalize their web presence and their products
to reach the majority of the online marketplace.
They will have to make their web sites, software
interfaces, and product documentation available
in the languages and cultural styles of an
increasingly diverse and international market by
applying a process called localization: the
translation of content and adaptation of
interface and form to reflect the expectations of
one or many given locales. For global-strategy
American companies, over 40% of total revenue
comes from international sales. These companies
market high-technology products such as
software, medical instrumentation, CAD / CAM
devices, and so on.
5
Global, Globalize, Globalization
Most of these products have a high document
overhead, with instructions on the assembly, use,
maintenance, and repair of the products delivered
via off- and on-line electronic documentation.
Most are marketed and supported online. Further,
many products may have embedded software
components and user interfaces, or use on-line
databases. These products and documents must be
delivered to locales: target markets with
different cultural and linguistic contexts.
  • Marketing: packages, web
  • Support: customer, technical, web
  • UI: user interfaces
  • Documentation: manuals, help files
  • CBT: computer-based training
6
Language Industry
While global marketing existed before the 1990s,
the translation / software localization industry
(or language industry for short) today has
evolved primarily as a result of the rapid global
expansion of the computer software market and the
increasing use of the Internet as a global
marketing and customer service tool, all part of
globalization.
The corporate problem is, of course, that many
companies do not understand HOW to prepare their
many products, documents, web pages and database
interfaces for distribution in other linguistic
and cultural locales; hence the need for the
services of the language industry.
7
New Media, New Markets
Experts estimate the current worth of the U.S.
language industry at just under 2 billion dollars
annually, with the global market worth
approximately 6 billion dollars. Indications are that
growth will continue to be strong into the next
decade because of new electronic media and
markets.
Consider the case of massively multi-player
online games (MMOGs): the language industry
enables the publishers of these games to leverage
their initial development investment by
translating and adapting the games for
international locales. Industry projections are
that MMOGs will post a 52% cumulative annual
growth rate between 2002 and 2006.
8
Initial Definitions
  • This presentation examines the issues and
    processes involved in software
    internationalization and localization.
  • There are three related major processes to
    consider. We have already discussed
    globalization.
  • globalization, a strategic decision to reach an
    international audience or to include different
    linguistic and cultural materials in a product,
    software application, web site or digital
    collection
  • internationalization, a design process intended
    to enable efficient and cost-effective subsequent
    linguistic and cultural adaptation
  • localization, the preparation of locale-specific
    versions of an application's interface and
    content.

G11N L10N I18N
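The three abbreviations are numeronyms: the first letter, the number of letters left out, and the last letter. A minimal sketch of that rule follows; the function name `numeronym` is made up for illustration.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of omitted letters + last letter."""
    if len(word) < 4:
        # Nothing is gained by abbreviating very short words.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()


for term in ("globalization", "internationalization", "localization"):
    print(term, "->", numeronym(term))
```

The loop prints the short form next to each of the three process names used in this presentation.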
9
Internationalization and Localization
Localization is the preparation of
locale-specific versions of a software
application, electronic document, internet
resource, or digital collection. It consists of
the translation of textual material into the
language and textual conventions of the target
locale and the adaptation of non-textual
materials and delivery / display mechanisms to
take into account the cultural requirements of
that locale.
Internationalization is an upstream engineering
process that should precede localization. Its aim
is to make subsequent localization/translation
easier, more efficient, and less costly.
10
Scope of Processes
Each of these processes has a different scope and
occurs at a different point in the business and
document cycles of an organization.
Diagram: scope of each process, ordered from earlier to later
  • globalization: organizational policies and strategies
  • internationalization: business, IT, document processes
  • localization, translation: documents, interfaces, tools
11
Evolution of Software Localization
Software localization developed as part of the
globalization of the personal computer software
market. Software applications and supporting
electronic documents were the first localized
products. The growth of the Internet and the
World Wide Web created a demand for localized web
pages and sites. Digital multimedia and digital
repositories (including digital libraries) are
emerging foci of localization.
Timeline, 1980 to 2005: PC software, WWW, multimedia, repositories
12
Document Display and Content
Localization focuses on both display (appearance,
presentation) and content. Thus, localization
includes a cultural adaptation as well as a
linguistic translation component.
Diagram: localizable material ranges from display to
content and from non-linguistic to linguistic.
  • color, graphics, icons, symbols, display
    organization
  • date, time, calendar, currency, number, address
  • interface: menus, dialogs, messages, prompts,
    alerts, document organization, writing system
  • metadata, vocabularies
  • content: help files, auxiliary documents, HTML /
    XML document content
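Date and number conventions are among the display items that change per locale. The sketch below shows the idea with a small hand-written table. The table `CONVENTIONS`, the locale identifiers, and both function names are assumptions made for illustration, and the conventions are simplified (the French group separator is shown as a plain space).

```python
from datetime import date

# Hypothetical, simplified convention table. Real systems read these values
# from a locale database instead of hard-coding them.
CONVENTIONS = {
    "en_US": {"group": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de_DE": {"group": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
    "fr_FR": {"group": " ", "decimal": ",", "date": "{d:02d}/{m:02d}/{y}"},
}


def format_number(value: float, locale_id: str) -> str:
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_id]
    # Format with US-style separators first, then swap them.
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", conv["group"]) + conv["decimal"] + frac


def format_date(day: date, locale_id: str) -> str:
    """Format a date in the locale's short numeric order."""
    return CONVENTIONS[locale_id]["date"].format(d=day.day, m=day.month, y=day.year)


for locale_id in CONVENTIONS:
    print(locale_id, format_number(1234.5, locale_id), format_date(date(2004, 3, 7), locale_id))
```

The same value and the same date come out differently for each locale, while the calling code stays identical.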
13
Localizing Software Applications
Software applications were the first localized
electronic documents. Early localization included
finding all strings embedded in code.
Diagram: strings sit directly in the code (source.c).
14
Extract Localizable Resources
    PortfolioMenu MENU
    BEGIN
        POPUP "File"
        BEGIN
            MENUITEM "Add Student", 1
            MENUITEM SEPARATOR
            MENUITEM "Delete Student", 2
            MENUITEM SEPARATOR
            MENUITEM "Update Student", 3
            MENUITEM "Exit", 4
        END
        POPUP "Tools"
        BEGIN
            MENUITEM "Add Portrait", 5
        END
        POPUP "Help"
        BEGIN
            MENUITEM "About Portfolio", 6
            MENUITEM SEPARATOR
            MENUITEM "Contents", 7
        END
    END
Strings are not the only localizable material:
  • dialog boxes
  • controls
  • labels
  • menus
  • icons
  • graphics
  • tooltips

RESOURCES
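Pulling the translatable labels out of a menu resource like the one on this slide can be sketched with a regular expression. This is a rough illustration, not how any particular localization tool works; the names `RC_SOURCE`, `ITEM`, and `extract_menu_strings` are invented here, and only POPUP and MENUITEM lines are handled.

```python
import re

# A shortened copy of the menu resource shown on this slide.
RC_SOURCE = '''
PortfolioMenu MENU
BEGIN
    POPUP "File"
    BEGIN
        MENUITEM "Add Student", 1
        MENUITEM SEPARATOR
        MENUITEM "Exit", 4
    END
END
'''

# Matches POPUP "text" and MENUITEM "text", id. Separators have no quoted
# text, so they are not matched.
ITEM = re.compile(r'^\s*(POPUP|MENUITEM)\s+"([^"]*)"(?:\s*,\s*(\d+))?', re.MULTILINE)


def extract_menu_strings(rc_text):
    """Return (kind, text, command_id) for every quoted menu label."""
    found = []
    for kind, text, command_id in ITEM.findall(rc_text):
        found.append((kind, text, int(command_id) if command_id else None))
    return found


for entry in extract_menu_strings(RC_SOURCE):
    print(entry)
```

Keeping the command id next to each label lets a translated label be written back to the right menu item without touching the rest of the resource.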
15
Localizing Web Pages
Web sites are also now being localized. The link
below points to a commented HTML file that gives
a simple introduction to localizing an HTML web
page. At the localizer's level, some of the issues
(not an exhaustive list) are:
  • character sets
  • localizing tag content
  • recognizing which tags have localizable content
  • not breaking tags
  • looking for text generated by attributes (title,
    alt)
  • looking for text generated by scripts
    (server-side, client-side)
  • evaluating CSS and stylesheet changes
  • making changes to graphics
  • dealing with graphics with integral text

Localization of HTML
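Several of the issues listed above (tag content, text in title and alt attributes, not breaking tags) can be illustrated with Python's standard `html.parser` module. The sketch collects translatable text without modifying the markup. The class name, the attribute set, and the sample page are assumptions for illustration; script and style content is skipped here and would need separate review.

```python
from html.parser import HTMLParser

# Attributes whose values are shown to the user and so need translation.
LOCALIZABLE_ATTRS = {"title", "alt"}
# Elements whose content is code rather than prose.
SKIPPED_TAGS = {"script", "style"}

SAMPLE_HTML = (
    "<html><head><title>Portfolio</title><style>p {color: red}</style></head>"
    '<body><p title="Greeting">Hello <b>world</b></p>'
    '<img src="a.png" alt="A portrait"></body></html>'
)


class LocalizableText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.units = []        # collected (origin, text) pairs
        self._skip_depth = 0   # > 0 while inside script or style

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        for name, value in attrs:
            if name in LOCALIZABLE_ATTRS and value:
                self.units.append((f"{tag}@{name}", value))

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.units.append(("text", text))


def extract_units(html):
    """Return every translatable text run and attribute value, in document order."""
    parser = LocalizableText()
    parser.feed(html)
    parser.close()
    return parser.units


for unit in extract_units(SAMPLE_HTML):
    print(unit)
```

Because the parser only reads the page, the tags themselves cannot be broken at this stage; the risk moves to the step that writes translations back.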
16
A Solution: Re-Engineer the Software
As one could imagine, localizing directly in code
led to problems. First, translators / localizers
were quite capable of breaking code. There were
also problems associated with the necessity for
multiple re-builds of the basic software for
each language version. Language expansion
(differences in textual volume) created sizing
problems in dialogs and controls. Localization
was labor-intensive, difficult and expensive. A
solution was to re-engineer the software with the
intent of separating language resources from the
underlying delivery mechanism.
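Language expansion can be measured as the relative change in string length between source and translation. The short sketch below uses English and French strings from this presentation's own message catalogs; the function name `expansion_percent` is an assumption, and character count is only a rough stand-in for rendered width.

```python
def expansion_percent(source: str, target: str) -> float:
    """Percentage by which the target string is longer than the source."""
    return (len(target) - len(source)) / len(source) * 100


pairs = {
    "Enter decimal number": "Entrer le nombre décimal",
    "Do you want to continue?": "Voulez vous continuer?",
    "yes": "oui",
}
for english, french in pairs.items():
    print(f"{english!r}: {expansion_percent(english, french):+.0f}%")
```

A positive figure means the control holding the string has to grow or the translation has to be shortened; a negative figure means spare room.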
17
Internationalization: Separate Resources
Internationalization is a re-engineering and
re-design process intended to make localization
and translation easier, faster and more
cost-effective. A first step in the
internationalization of software applications is
the separation or extraction of linguistic and
cultural resources from the application, leaving
a neutral software kernel. Extraction requires
specialized localization tools.
Diagram labels: resources; application software; kernel
18
Extract Localizable Materials
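source.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Message lookup routines, defined outside this file. */
    extern unsigned char *intl_m_msg(), *intl_f_msg();

    int main(void)
    {
        int n;
        char y[5];

        printf(intl_m_msg("", "mypg", 1));            /* banner */
        while (1) {
            printf(intl_m_msg("", "mypg", 2));        /* prompt */
            scanf("%d", &n);
            printf(intl_m_msg("", "mypg", 3), n, n);  /* decimal and hex */
            printf(intl_m_msg("", "mypg", 4));        /* continue? */
            scanf("%4s", y);
            /* Any answer other than the localized "yes" ends the program. */
            if (strcmp(y, intl_m_msg("", "mypg", 6))) {
                printf(intl_m_msg("", "mypg", 5));    /* exiting */
                exit(0);
            }
        }
    }
mypg.en
    1  "This program converts decimal numbers to hexadecimal\n\n"
    2  "\n Enter decimal number "
    3  "\n Number entered is <%d> decimal and <%x> hexa "
    4  "\n Do you want to continue? "
    5  "\n exiting ..\n"
    6  "yes"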
19
Extract Localizable Materials
source.c
    (unchanged from the previous slide; only the message catalog differs)
mypg.fr
    1  "Ce programme convertit les nombres décimaux en hexadécimal\n\n"
    2  "\nEntrer le nombre décimal "
    3  "\nLe nombre entré est <%d> décimal et <%x> hexadécimal "
    4  "\nVoulez vous continuer? "
    5  "\nSortie ..\n"
    6  "oui"
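The point of the two slides above is that the program text stays fixed while the numbered messages are swapped per language. The same idea is sketched below in Python as a simplified analogue. `CATALOGS`, `get_message`, and `report` are hypothetical names, the message texts are trimmed copies of the catalog entries above, and the fallback to English is an added assumption rather than something the slides describe.

```python
CATALOGS = {
    "en": {
        3: "Number entered is <%d> decimal and <%x> hexa",
        4: "Do you want to continue?",
        6: "yes",
    },
    "fr": {
        3: "Le nombre entré est <%d> décimal et <%x> hexadécimal",
        4: "Voulez vous continuer?",
        # Message 6 is left out on purpose to show the fallback.
    },
}


def get_message(locale_id: str, number: int, fallback: str = "en") -> str:
    """Look up a message by number, using the fallback locale if it is untranslated."""
    catalog = CATALOGS.get(locale_id, {})
    if number in catalog:
        return catalog[number]
    return CATALOGS[fallback][number]


def report(locale_id: str, n: int) -> str:
    # The calling code never changes; only the catalog differs.
    return get_message(locale_id, 3) % (n, n)


print(report("en", 255))
print(report("fr", 255))
```

Adding a language then means adding a catalog, with no rebuild of the program logic.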
20
Content and Display in Web Pages
Web pages share the problem of separation of
content and coding with application software.
You can see from our web page example how true
this is. Internationalization solutions in web
pages also involve the extraction of linguistic
and cultural material from the software vehicle.
Cutting-edge solutions create dynamic HTML from
XML-based language content.
XML
    <gradinquiry>
      <name>
        <firstname>Joan</firstname>
        <lastname>Smith</lastname>
      </name>
      <address>
        <addressline1>266 South Prospect Street</addressline1>
        <addressline2/>
        <city>Kent</city>
        <state>Ohio</state>
        <zip>44240</zip>
      </address>
      <country>USA</country>
      <phone>330-673-9999</phone>
      <fax>330-672-4017</fax>
      <email>gshreve_at_neo.rr.com</email>
    </gradinquiry>
HTML
    <BODY>
    <TABLE>
    <TR><TD>Joan</TD><TD>Smith</TD></TR>
    <TR><TD>266 South Prospect Street</TD></TR>
    <TR><TD>Kent</TD></TR>
    <TR><TD>Ohio</TD></TR>
    <TR><TD>44240</TD></TR>
    . . .
    </TABLE>
    </BODY>
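Generating the table markup from the content-only XML can be sketched with Python's standard `xml.etree.ElementTree`. This stands in for a stylesheet transform and is not one; the function name `to_html_table` and the row layout (name on one row, one row per non-empty address field) are assumptions based on the sample on this slide.

```python
import xml.etree.ElementTree as ET
from html import escape

XML_SOURCE = """<gradinquiry>
  <name><firstname>Joan</firstname><lastname>Smith</lastname></name>
  <address>
    <addressline1>266 South Prospect Street</addressline1>
    <addressline2/>
    <city>Kent</city>
    <state>Ohio</state>
    <zip>44240</zip>
  </address>
</gradinquiry>"""


def to_html_table(xml_text: str) -> str:
    """Render the name and address fields as rows of an HTML table."""
    root = ET.fromstring(xml_text)
    # First row: the name parts side by side.
    rows = [[escape(part.text or "") for part in root.find("name")]]
    # One row per non-empty address element; empty ones are dropped.
    for field in root.find("address"):
        if field.text and field.text.strip():
            rows.append([escape(field.text.strip())])
    lines = ["<TABLE>"]
    for cells in rows:
        lines.append("<TR>" + "".join(f"<TD>{c}</TD>" for c in cells) + "</TR>")
    lines.append("</TABLE>")
    return "\n".join(lines)


print(to_html_table(XML_SOURCE))
```

Since the content file carries no layout, a different rendering function could produce a different address order or page structure from the same data.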
21
Two Multilingual Web Architectures
OLD: Multiple static versions of pages stored in a
folder hierarchy by language and navigated by a
selection mechanism (language selection; a static
web page is selected and displayed).
NEW: The principle of separating linguistic from
software elements, as used in software
localization (multilingual XML content; XSL
transforms).
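In the newer architecture, a single multilingual content file is filtered by the selected language before it is rendered. A minimal sketch of that selection step follows. The element names (`page`, `string`, `text`), the ids, and the fallback to a default language are all assumptions for illustration; only the `xml:lang` attribute is standard XML.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONTENT = """<page>
  <string id="welcome">
    <text xml:lang="en">Welcome</text>
    <text xml:lang="fr">Bienvenue</text>
  </string>
  <string id="exit">
    <text xml:lang="en">Exit</text>
  </string>
</page>"""


def strings_for(xml_text, language, default="en"):
    """Return {id: text} for one language, using the default where a translation is missing."""
    result = {}
    for entry in ET.fromstring(xml_text).iter("string"):
        by_lang = {t.get(XML_LANG): t.text for t in entry.findall("text")}
        result[entry.get("id")] = by_lang.get(language, by_lang.get(default))
    return result


print(strings_for(CONTENT, "fr"))
```

The selected strings would then be handed to whatever transform builds the page, so no per-language copy of the page itself is stored.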
22
I18N Content Management
This system assumes an internationalized dynamic
web page architecture.
Diagram components: Content Repository (archive,
database); XML Representation (content only, strip
format); Style Sheet Repository; Dynamic Pages;
Display Medium.
Diagram actions: acquire information; organize,
classify; translation; localization; format;
deploy.
23
Internationalization: Control
Truly effective internationalization also
involves early intervention in and re-design of
upstream business and document processes like
authoring to exert greater control and to reduce
variability.
24
Internationalization: Authoring
For instance, intervention in and re-design of
document creation processes (authoring) can yield
significant downstream benefits for
localization. Controlled language and terminology
control are two strategies.
Diagram labels: technical writers; controlled
languages, terminology control; I18N; dependency;
machine translation; help text; software
documents; localization vendor; L10N
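Terminology control at authoring time can be as simple as flagging deprecated variants before a document goes to translation. The sketch below shows that check. The term base contents and the function name `check_terms` are invented for illustration and do not come from any particular controlled-language standard.

```python
import re

# Hypothetical term base: deprecated variant -> approved term.
TERM_BASE = {
    "hard disk": "hard drive",
    "click on": "click",
    "log-in": "login",
}


def check_terms(text):
    """Return (position, found, approved) for every deprecated variant in the text."""
    issues = []
    for variant, approved in TERM_BASE.items():
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            issues.append((match.start(), match.group(0), approved))
    return sorted(issues)


for position, found, approved in check_terms("Click on Save to write the file to the hard disk."):
    print(f"offset {position}: use '{approved}' instead of '{found}'")
```

Each variant removed at this stage is one fewer distinct source string for translators and translation memories to handle downstream.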
25

Internationalization Localization
Internationalization engineers work with or for
clients to create internationalized products.
Diagram labels: technical writers; controlled
languages, terminology control; help text;
software documents; resources; software
internationalization tools; internationalization
engineers; localizable software distribution;
localization vendor; I18N; L10N

26

Localization Management Tools
A localization project requires its own processes
and tools.
Diagram labels: localizable software distribution;
localization project; L10N; project management
tools; QA / testing / validation tools;
localization tools; workflow management;
document / version control; translators /
localizers

27

Localization Management Tools
Translation memories (TMs) and terminology managers
are important tools for maintaining standardized
translations and glossaries. TMs provide the
focus of QA, ensure replicability /
repeatability, and allow re-use of linguistic and
cultural materials.
Diagram labels: project manager; localization
engineer; localizable software distribution;
localization project; localization tool
(enterprise); localization tool (translator);
translation memory; localization toolkit
(distribution); terminology manager;
translators / localizers
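A translation memory stores source segments with their translations and returns the closest stored match for a new segment. The sketch below shows exact and fuzzy lookup using `difflib.SequenceMatcher` from the Python standard library. The class, the 0.75 threshold, and the similarity measure are assumptions for illustration; commercial TM tools use their own matching methods.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    def __init__(self):
        self._units = {}  # source segment -> stored translation

    def add(self, source, target):
        self._units[source] = target

    def lookup(self, segment, threshold=0.75):
        """Return (score, source, target) for the best match at or above threshold, else None."""
        best = None
        for source, target in self._units.items():
            score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if best is None or score > best[0]:
                best = (score, source, target)
        if best and best[0] >= threshold:
            return best
        return None


tm = TranslationMemory()
tm.add("Do you want to continue?", "Voulez vous continuer?")
tm.add("Enter decimal number", "Entrer le nombre décimal")

print(tm.lookup("Do you want to continue?"))      # exact match
print(tm.lookup("Do you want to continue now?"))  # fuzzy match, needs review
print(tm.lookup("Print the report"))              # no usable match
```

An exact match can be reused as is, a fuzzy match gives the translator a starting point, and a miss is translated from scratch and then added to the memory.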

28

Localization Management Tools
Specialized localization tools for alignment and
term extraction are used to automate the
construction of TMs.
Diagram labels: text alignment tool; term
extraction tool; translation memory; terminology
manager; localization tool (translator);
localization toolkit (distribution);
translators / localizers
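Term extraction can be approximated by counting word sequences that recur across a corpus. The sketch below proposes two-word candidates this way. The stopword list, the sample segments, and the name `candidate_terms` are assumptions for illustration, and real extraction tools apply more linguistic filtering than this.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on"}


def candidate_terms(segments, min_count=2):
    """Return two-word sequences that recur across the corpus, most frequent first."""
    counts = Counter()
    for segment in segments:
        words = re.findall(r"[a-z]+", segment.lower())
        for first, second in zip(words, words[1:]):
            # Skip pairs that start or end with a function word.
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[(first, second)] += 1
    return [(" ".join(pair), n) for pair, n in counts.most_common() if n >= min_count]


corpus = [
    "Open the translation memory before editing.",
    "The translation memory stores each segment.",
    "Export the translation memory to a file.",
]
print(candidate_terms(corpus))
```

The candidates still need a terminologist's review before they enter a terminology manager; frequency alone only narrows the list.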

29

Reusability
Reusability is an especially important objective
of internationalization and reduces the cost of
localization.
  • Version 1: initial translation with TM tool
  • Version 2: new version uses 70% same text
    (30% change)
  • Version 3: latest version uses 80% same text
    as previous (20% change)
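The effect of reuse on translation volume can be worked through with the percentages on this slide. The document size of 100,000 words and the assumption that every version has the same size are hypothetical, chosen only to make the arithmetic concrete.

```python
def words_to_translate(total_words, reused_fraction):
    """Words left for a translator after reusing matches from the TM."""
    return round(total_words * (1 - reused_fraction))


TOTAL_WORDS = 100_000   # hypothetical size of each version
versions = [
    ("Version 1", 0.0),  # initial translation, nothing to reuse
    ("Version 2", 0.7),  # 70% same text
    ("Version 3", 0.8),  # 80% same text as previous
]

with_tm = sum(words_to_translate(TOTAL_WORDS, reuse) for _, reuse in versions)
without_tm = TOTAL_WORDS * len(versions)
saving = 1 - with_tm / without_tm

print(f"with TM: {with_tm} words, without TM: {without_tm} words, saving: {saving:.0%}")
```

Under these assumptions the three versions need 150,000 words of new translation instead of 300,000.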

30
Goals of Internationalization
The goals of internationalization are:
  • reusability (translations)
  • scalability (I18N solution)
  • authority / quality (equivalence)
  • accessibility (cross-language)
  • accuracy / acceptability (target culture(s))
  • control (target document)
These goals are met by separating content from
display, defining and extracting culturally
variable material from fixed or neutral material,
intervening in the document cycle to exert
control over document processes, and using
translation memories and terminology management
to ensure critical characteristics such as
authority and reusability.
31
Enhanced Corpora
Future directions in internationalization will
involve exploiting document corpora more
effectively and extracting useful linguistic and
textual objects for control and re-use. Control
of the document cycle begins with understanding
the documents we already own and enhancing
them.
32
New Localization Objects
Many linguistic objects useful in
computer-assisted authoring and translation, web
page localization, machine translation and
cross-language information retrieval (including
browsing) can be extracted from a well-understood
and deliberately structured document corpus.
33
Corpus Replication
Using statistical techniques it is possible to
replicate the contents of a monolingual corpus
and add multilingual equivalents for terms,
phrases, document segments and other objects to
it.
34
What The Industry is Doing Now
The language industry currently relies on using
translation memories and terminology managers.
There are significant drawbacks to this method
that prevent new gains in cost reduction and
profitability, the goal of internationalization.
35
A New Model
New approaches to internationalization and
automatic localization leverage the linguistic
value of existing corpora and allow the creation
of enhanced corpora whose contents are
understood and controlled. Statistical corpus
linguistics and XML combine to allow the next
step in localization technology.
36
Peer-to-Peer Localization Resources
A peer-to-peer networking platform with a
security and digital rights management layer can
be used to link clients in an XML resource
network. A vendor can assess per-transaction
charges for access to corpus object stores.
37
Socio-Cultural Style Sheets
The peer-to-peer networking platform can also be
used to provide new capabilities for next
generation localization. Client-Side
Socio-Cultural Style-sheets (CSSCS) can provide
for automated solutions to on-the-fly provision
of web content in the languages and formats
desired by and expected by web users all over the
world.