Java

Internationalization
Frequently Asked Questions

This page answers common questions about internationalization of the Java 2 platform, Standard Edition, version 1.3.1, and of Sun's Java 2 Runtime Environments, Standard Edition, version 1.3.1. For more information, see the Internationalization home page.


General Questions

What is internationalization?

Internationalization is the process of designing software so that it can be adapted to any language and cultural convention without changes to the program logic. During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture. For example, the programmer will isolate error messages because they must be translated during localization.

What is localization?

Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

How do I go about internationalizing an existing program?

See the steps outlined in the Checklist section of The Java Tutorial.


Locales

What is a locale?

A locale is a geographic or political region that shares the same language and customs. In the Java programming language, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to locale.
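As a rough illustration, a Locale object can be built from a language code and a country code and then queried. This is only a sketch: the class name LocaleBasics and the helper describe are hypothetical names chosen for the example.

```java
import java.util.Locale;

class LocaleBasics {
    // Summarizes a locale: its language and country codes, followed by
    // its display name rendered in English.
    static String describe(Locale locale) {
        return locale.getLanguage() + "_" + locale.getCountry()
                + " = " + locale.getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // A locale built from ISO language and country codes.
        Locale canadianFrench = new Locale("fr", "CA");
        System.out.println(describe(canadianFrench));

        // The Locale class also provides constants for some common locales.
        System.out.println(describe(Locale.JAPAN));

        // The default locale is taken from the host environment.
        System.out.println("Default locale: " + Locale.getDefault());
    }
}
```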

Where can I find some coding examples that use Locale objects?

See the Setting the Locale section of The Java Tutorial.

Which locales are supported?

The supported locales vary between different implementations of the Java 2 platform and between areas of functionality. Information about the supported locales in Sun's Java 2 Runtime Environments is provided by the Supported Locales document.

Can a Java application use multiple locales?

Yes. This capability allows you to create multilingual applications.
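One way to picture this: locale-sensitive classes accept an explicit Locale, so a single program can format the same value for several locales without changing the default. The sketch below assumes a hypothetical class MultiLocaleDemo with a helper formatNumber.

```java
import java.text.NumberFormat;
import java.util.Locale;

class MultiLocaleDemo {
    // Formats a number using the conventions of the given locale,
    // independently of the default locale of the runtime.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234.5;
        // The same value, formatted side by side for two locales.
        System.out.println("US:      " + formatNumber(value, Locale.US));      // 1,234.5
        System.out.println("Germany: " + formatNumber(value, Locale.GERMANY)); // 1.234,5
    }
}
```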


Resource Bundles

What is a resource bundle?

A ResourceBundle object allows you to isolate localizable elements from the rest of the application. With all resources separated into a bundle, the application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.
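The idea can be sketched in a few lines. To stay self-contained, the example below keeps two ListResourceBundle objects in memory and picks one by language. That selection logic is a simplification made up for this sketch: a real application would normally call ResourceBundle.getBundle with a base name and a locale and let the runtime locate the bundle classes or properties files. The class BundleSketch, its helpers, and the keys are hypothetical.

```java
import java.util.HashMap;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.Map;
import java.util.ResourceBundle;

class BundleSketch {
    // Bundles keyed by language code; the empty key holds the base bundle.
    private static final Map<String, ResourceBundle> BUNDLES = new HashMap<String, ResourceBundle>();

    static {
        BUNDLES.put("", bundleOf(new Object[][] {
            {"greeting", "Hello"}, {"farewell", "Goodbye"}}));
        BUNDLES.put("de", bundleOf(new Object[][] {
            {"greeting", "Hallo"}, {"farewell", "Auf Wiedersehen"}}));
    }

    // Wraps key/value pairs in a ListResourceBundle.
    private static ResourceBundle bundleOf(final Object[][] contents) {
        return new ListResourceBundle() {
            protected Object[][] getContents() {
                return contents;
            }
        };
    }

    // Simplified stand-in for ResourceBundle.getBundle: pick the bundle for
    // the locale's language, or fall back to the base bundle.
    static ResourceBundle bundleFor(Locale locale) {
        ResourceBundle bundle = BUNDLES.get(locale.getLanguage());
        return bundle != null ? bundle : BUNDLES.get("");
    }

    // The application code asks for a key and never hard-codes the text.
    static String label(String key, Locale locale) {
        return bundleFor(locale).getString(key);
    }

    public static void main(String[] args) {
        System.out.println(label("greeting", Locale.GERMANY)); // Hallo
        System.out.println(label("greeting", Locale.JAPAN));   // Hello (base bundle)
    }
}
```

Switching locales only changes which bundle is consulted; the calling code stays the same.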

Where can I find some coding examples that use ResourceBundle objects?

See the Isolating Locale-Specific Data section of The Java Tutorial.

How do I specify non-ASCII strings in a properties file?

You can specify any Unicode character with the \uXXXX notation. (The XXXX denotes the 4 hexadecimal digits that comprise the Unicode value of a character.) For example, a properties file might have the following entries:

s1=hello there
s2=\uff2d\uff33\u30b4
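To see how such entries are decoded, the following sketch feeds the same two lines to Properties.load from an in-memory stream instead of a file. The class name PropertiesEscapeDemo and the helper parse are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

class PropertiesEscapeDemo {
    // Parses properties text the way a properties file is read from a stream:
    // as ISO 8859-1 bytes, with the Unicode escapes decoded by Properties.load.
    static Properties parse(String asciiText) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(asciiText.getBytes("ISO-8859-1")));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in the Java string,
        // so Properties.load sees the same text as the entries shown above.
        Properties props = parse("s1=hello there\ns2=\\uff2d\\uff33\\u30b4\n");
        String s2 = props.getProperty("s2");
        System.out.println(props.getProperty("s1"));
        System.out.println("s2 has " + s2.length() + " characters, the first is U+"
                + Integer.toHexString(s2.charAt(0)).toUpperCase());
    }
}
```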

If you have edited and saved the file in a non-ASCII encoding, you can convert it to ASCII with the native2ascii tool. For example, you might want to do this when editing a properties file in Shift-JIS, a popular Japanese encoding.

How do I compile a non-ASCII ListResourceBundle?

If your source file is in a non-ASCII encoding, you can direct the compiler to convert it into Unicode. For example, you would compile a Japanese resource bundle written in the Shift-JIS encoding as follows:

javac -encoding SJIS LabelsResource_ja.java


Text Processing

How do I format a date?

You can use the SimpleDateFormat class to format and parse dates in a locale-sensitive manner. See the section on formatting Dates and Times in The Java Tutorial.
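A minimal sketch of locale-sensitive formatting and parsing with SimpleDateFormat follows. The pattern, the fixed UTC time zone, and the names DateFormatDemo, format, and parse are choices made for this example only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatDemo {
    // Creates a formatter whose month names follow the given locale.
    // The time zone is pinned to UTC so the result does not depend on the host.
    private static SimpleDateFormat formatter(Locale locale) {
        SimpleDateFormat f = new SimpleDateFormat("d MMMM yyyy", locale);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    static String format(Date date, Locale locale) {
        return formatter(locale).format(date);
    }

    static Date parse(String text, Locale locale) {
        try {
            return formatter(locale).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // midnight, 1 January 1970, UTC
        System.out.println(format(epoch, Locale.ENGLISH));
        System.out.println(format(epoch, Locale.FRENCH));
        // Parsing uses the same locale-specific month names.
        System.out.println(parse("1 janvier 1970", Locale.FRENCH).getTime());
    }
}
```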

How does setting the default locale affect the results of sorting?

The Collator class and its subclasses are used for building sorting routines. These classes are locale-sensitive. A Collator obtained with the no-argument Collator.getInstance method uses the collating sequence of the default locale, so changing the default locale changes the sort order that such a Collator produces.
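The effect can be sketched by sorting the same words with a Collator and with plain String comparison. The class CollatorSortDemo and the helper sorted are hypothetical; the example passes an explicit locale so that its result does not depend on the default locale of the machine running it.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSortDemo {
    // Returns a sorted copy of the words, ordered by the given collator.
    static String[] sorted(String[] words, Collator collator) {
        String[] copy = (String[]) words.clone();
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"zebra", "Zoo", "apple"};

        // Plain String ordering compares character values, so the
        // uppercase "Zoo" sorts before the lowercase words.
        String[] byCodeValue = (String[]) words.clone();
        Arrays.sort(byCodeValue);
        System.out.println(Arrays.asList(byCodeValue));

        // A collator orders words by the rules of its locale.
        System.out.println(Arrays.asList(sorted(words, Collator.getInstance(Locale.US))));

        // With no argument, getInstance uses the default locale instead.
        System.out.println(Arrays.asList(sorted(words, Collator.getInstance())));
    }
}
```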

The Collator object supports different levels of decomposition and strength. How do I choose the right decomposition and strength in a locale?

Since decomposing takes time, turning decomposition off makes comparisons go faster. However, for Latin languages the NO_DECOMPOSITION mode is not useful if the text contains accents. You should use the default decomposition unless you really know what you're doing.

The strength property you choose depends on what your application is trying to accomplish. For example, when performing a text search you may allow a "weak" match, in which accents and differences in case (upper vs. lower) are ignored. This type of search employs the PRIMARY strength. If you are sorting a list of words, you might want to use the TERTIARY strength. In this mode the properties that must match are the base character, accent, and case.
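The difference between strengths can be sketched as follows. The class StrengthDemo and the helper matches are hypothetical, and the accented word is written with Unicode escapes so that the source file stays ASCII.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {
    // Reports whether two strings are considered equal by a US English
    // collator at the given strength.
    static boolean matches(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "resume";
        String accented = "R\u00e9sum\u00e9"; // capital R, accented e twice

        // PRIMARY ignores accents and case: suitable for a "weak" search match.
        System.out.println("PRIMARY:   " + matches(plain, accented, Collator.PRIMARY));
        // SECONDARY takes accents into account but still ignores case.
        System.out.println("SECONDARY: " + matches(plain, accented, Collator.SECONDARY));
        // TERTIARY requires base character, accent, and case to match.
        System.out.println("TERTIARY:  " + matches("abc", "ABC", Collator.TERTIARY));
    }
}
```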


Character Encodings

What is a character encoding?

A character encoding is a mapping between characters and code values.

What is Unicode?

In the Java programming language, char values represent Unicode characters. Unicode is a 16-bit character encoding that supports the world's major languages. You can learn more about the Unicode standard at the Unicode Consortium web site.

How do I convert data between Unicode and other character encodings?

The Converting Non-Unicode Text section of The Java Tutorial explains how to perform the conversions within an application. To convert data files, use the native2ascii tool.
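Within an application, one common approach is to wrap byte streams in an InputStreamReader and an OutputStreamWriter, each created with an encoding name. The sketch below converts bytes from one encoding to another by way of Unicode characters; the class TranscodeDemo and the helper transcode are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class TranscodeDemo {
    // Decodes the input bytes to Unicode characters using fromEncoding,
    // then encodes those characters to bytes using toEncoding.
    static byte[] transcode(byte[] input, String fromEncoding, String toEncoding) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(input), fromEncoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(out, toEncoding);
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
            writer.close(); // flushes the encoder into the byte stream
            reader.close();
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 0xE9 is the letter e with an acute accent in ISO 8859-1.
        byte[] latin1 = {(byte) 0xE9};
        byte[] utf8 = transcode(latin1, "ISO8859_1", "UTF8");
        System.out.println("UTF-8 length: " + utf8.length); // 2
    }
}
```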

Which character encodings are supported when converting text to and from Unicode?

See the Supported Encodings web page.

How do I create my own character converters?

Version 1.3 of the Java 2 platform does not provide public interfaces that let application developers create their own character converters. There is a project underway that will define such public interfaces as part of the New I/O APIs. Licensees that create their own Java 2 runtime environments can use the internal interfaces in the sun.io package to create their own character converters.

What is the default encoding?

The default encoding is selected by the Java runtime based on the host operating system and its locale. For example, in the US locale on Windows, Cp1252 is used. In the Simplified Chinese locale on Solaris, either EUC_CN or GBK can be the default encoding, depending on the selection made when logging into Solaris.

The default encoding is significant because the Java programming language uses Unicode to represent characters, but the file system of the host operating system usually uses some other encoding. The default encoding has to match the encoding used by the host operating system to ensure correct interaction.
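To check which default encoding a given runtime has selected, a small program can ask a stream writer that was created without an explicit encoding name. This is only a sketch; the class DefaultEncodingDemo is hypothetical, and the value printed depends on the runtime, the host operating system, and its locale.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

class DefaultEncodingDemo {
    // An OutputStreamWriter created without an encoding name uses the
    // default encoding, and getEncoding reports its name.
    static String defaultEncoding() {
        return new OutputStreamWriter(new ByteArrayOutputStream()).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + defaultEncoding());
        // The file.encoding system property usually carries the same information.
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));
    }
}
```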

What is the UTF-8 encoding?

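UTF-8 stands for UCS Transformation Format, 8-bit form. It is a transmission format for Unicode that is suitable for use with many network protocols and UNIX file systems. UTF-8 encodes each character as a sequence of one or more bytes, and ASCII characters keep their single-byte ASCII values.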
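The variable-length nature of UTF-8 can be seen by counting the bytes produced for a few characters. The class Utf8LengthDemo and the helper utf8Length are hypothetical names used for this sketch.

```java
import java.io.UnsupportedEncodingException;

class Utf8LengthDemo {
    // Returns the number of bytes needed to encode the text in UTF-8.
    static int utf8Length(String text) {
        try {
            return text.getBytes("UTF-8").length;
        } catch (UnsupportedEncodingException e) {
            // Every Java platform implementation is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // 1: ASCII letter
        System.out.println(utf8Length("\u00e9")); // 2: e with acute accent
        System.out.println(utf8Length("\u20ac")); // 3: Euro sign
    }
}
```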

Are the Cp1252 and ISO8859_1 encodings identical?

No. Cp1252 contains some additional characters in the range from 0x80 to 0x9F. See the Microsoft documentation for more information.
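The difference shows up when the same byte from that range is decoded with each encoding. The sketch below decodes the byte 0x80; the class EncodingCompareDemo and the helper decode are hypothetical, and the example assumes that the runtime in use supports the Cp1252 encoding.

```java
import java.io.UnsupportedEncodingException;

class EncodingCompareDemo {
    // Decodes a single byte with the named encoding and returns the character.
    static char decode(int byteValue, String encoding) {
        try {
            return new String(new byte[] {(byte) byteValue}, encoding).charAt(0);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In Cp1252 the byte 0x80 is the Euro sign, U+20AC.
        System.out.println("Cp1252:    U+"
                + Integer.toHexString(decode(0x80, "Cp1252")).toUpperCase());
        // In ISO8859_1 the same byte is the control character U+0080.
        System.out.println("ISO8859_1: U+"
                + Integer.toHexString(decode(0x80, "ISO8859_1")).toUpperCase());
    }
}
```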


Text Input

What is the Input Method Framework?

The input method framework enables all text editing components to receive Japanese, Chinese, or Korean text input through input methods. An input method lets users enter thousands of different characters using keyboards with far fewer keys. Typically a sequence of several characters needs to be typed and then converted to create one or more characters. For specifications and examples see the web page, Input Method Framework.

What does it mean to switch input methods?

A user may have multiple input methods available. For example, the user may have input methods for different languages or input methods that accept various types of input. Such a user must be able to select the input method used for a particular language or the input method that provides the fastest input.

Can an input method be selected and activated programmatically?

An application can request an input method that supports a specific locale using the InputContext.selectInputMethod method, but it cannot select a specific input method - that selection is up to the user.

An application can activate an input method using the InputContext.setCompositionEnabled method.

Do the AWT and Swing (JFC) text components work with input methods?

See the Input Methods section of the JDK Software Internationalization Overview.


Text Rendering

What choices does an application have in selecting fonts?

An application can select fonts in three different ways:

What are the advantages and disadvantages of these three approaches?

Here's a brief summary:

Why doesn't my application display any Chinese, Japanese, or Korean characters even though I have fonts for these languages installed?

The answer depends on how your application selects fonts - see above.

What is a font.properties file?

The font.properties files are used in Sun's Java 2 Runtime Environments to map logical font names to physical fonts. There are several files to support different mappings depending on host operating system version and locale. The files are located in the lib directory within the J2RE installation.

Note that font.properties files are implementation dependent. Not all implementations of the Java 2 platform use them, and the format and content vary between different runtime environments as well as between releases.

How do I add a physical font to the mapping of a logical font?

Since the mapping from logical fonts to physical fonts is implementation dependent, the answer varies. For Sun's Java 2 Runtime Environments, you need to create or modify a font.properties file - see the web page Editing the font.properties Files. Note however that this is a modification of the J2RE, and Sun does not support modified J2REs. For other implementations, see their respective documentation.

Why can I see some characters in Swing components, but not in peered AWT components?

Swing user interface components use a different mechanism to render text than peered AWT components. The Swing components use the Graphics.drawString method, typically specifying a logical font name. The logical font name is then mapped to a set of physical fonts to cover a large range of characters. AWT components on the other hand are implemented using host operating system components. These host operating system components often do not support Unicode, so the text gets converted to some other character encoding, depending on the host operating system and locale. These encodings often cover a smaller range of characters than the physical fonts used to implement logical font names. For example, on a Japanese Windows system, many European accented characters are mapped to the Arial font for Swing components, but get lost when converting the text to the Shift-JIS encoding for peered AWT components.

Why can't my application display all Unicode characters even though I have a Unicode font installed?

As in the Chinese/Japanese/Korean case above, this may be because the Unicode font is not used to render the text at all, or is used only for some characters. If your application selects the Unicode font using its physical font name, and it still cannot render all characters, it could be that the Unicode font doesn't in fact cover the entire Unicode character set - sometimes a font is called a Unicode font if it just provides the tables that support the Unicode character encoding.

What font types do Sun's Java 2 Runtime Environments support?

Sun's Java 2 Runtime Environment for Windows supports TrueType and Type1 fonts. Sun's Java 2 Runtime Environment for Solaris supports outline fonts that can be handled by an X11 server, such as F3, Type1, and TrueType.

Is it possible to display more than one language in Sun's Java 2 Runtime Environments?

The short answer is yes. The long answer depends on which languages you want to display at the same time, and on how your application selects fonts.

Can Sun's Java 2 Runtime Environment render text in Thai, Lao, Burmese, or any of the Indic scripts?

No, the font rendering system in version 1.3.1 of Sun's Java 2 Runtime Environments cannot handle the complex layout rules of these scripts. There's work underway to add support for Thai and Hindi in a future release. Also, there may be other Java 2 runtime environments that do support these scripts.

Why do I see question marks and illegible text in Traditional Chinese?

Sun's Java 2 SDK and Runtime Environment version 1.3.0 for Windows had two serious bugs affecting Traditional Chinese:

These bugs have been fixed since version 1.3.0_01.


Component Orientation

Which user interface components implement component orientation in Sun's Java 2 Runtime Environments?

See the Supported Locales document.


Miscellaneous

Do Sun's Java 2 Runtime Environments support the Euro currency?

Yes, Sun's Java 2 Runtime Environments let you type the Euro character, render it, convert it from and to numerous character encodings, and use it when formatting numeric values as currency. For text input and rendering, you need the appropriate support in the host operating system - see the documentation for Windows and Solaris (general information and patches). For formatting, you just need to request a locale with the "EURO" variant.
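For the formatting case, requesting a locale with the "EURO" variant might look like the sketch below. The class EuroFormatDemo and the helper formatAsEuro are hypothetical, Germany is chosen only as an example country, and the exact output string depends on the runtime's locale data. Releases other than 1.3.1 may treat the variant differently, so this should be checked against the release in use.

```java
import java.text.NumberFormat;
import java.util.Locale;

class EuroFormatDemo {
    // Formats an amount as currency for Germany, requesting the Euro
    // through the "EURO" locale variant.
    static String formatAsEuro(double amount) {
        Locale germanyEuro = new Locale("de", "DE", "EURO");
        return NumberFormat.getCurrencyInstance(germanyEuro).format(amount);
    }

    public static void main(String[] args) {
        // The formatted text should contain the Euro sign, U+20AC.
        System.out.println(formatAsEuro(1234.56));
    }
}
```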


Copyright © 1996-2001 Sun Microsystems, Inc. All Rights Reserved.

Please send comments to: java-intl@java.sun.com
