Table E-1 lists the suggested charset(s) for a number of languages. Servlets that generate multilingual output use a charset to determine which character encoding the servlet's PrintWriter uses. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, which is appropriate for most Western European languages. To specify an alternate charset, pass the charset value to the setContentType() method before the servlet retrieves its PrintWriter. For example:
    res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
    PrintWriter out = res.getWriter();                    // Writes Shift_JIS Japanese
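When the language is only known at request time, the content type string can be built from a lookup keyed by language code. The following is a minimal sketch, not part of the Servlet API: the class `CharsetTable` and its methods are hypothetical names, only a handful of rows from Table E-1 are included, the first suggested charset is used where several are listed, and the fallback to UTF-8 for unlisted languages is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a Locale to a charset suggested in Table E-1.
final class CharsetTable {
    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        // A few rows copied from Table E-1 (language code -> first suggested charset)
        BY_LANGUAGE.put("en", "ISO-8859-1");
        BY_LANGUAGE.put("cs", "ISO-8859-2");
        BY_LANGUAGE.put("ru", "ISO-8859-5");
        BY_LANGUAGE.put("el", "ISO-8859-7");
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("ko", "EUC-KR");
        BY_LANGUAGE.put("zh", "GB2312");
    }

    static String charsetFor(Locale locale) {
        // Traditional Chinese is identified by the country (TW), not the language code
        if ("zh".equals(locale.getLanguage()) && "TW".equals(locale.getCountry())) {
            return "Big5";
        }
        String charset = BY_LANGUAGE.get(locale.getLanguage());
        // Assumption: use UTF-8 for any language this sketch does not list
        return charset != null ? charset : "UTF-8";
    }

    // Builds the value to pass to setContentType() before calling getWriter()
    static String contentType(Locale locale) {
        return "text/html; charset=" + charsetFor(locale);
    }

    public static void main(String[] args) {
        System.out.println(contentType(Locale.JAPANESE));
        System.out.println(contentType(Locale.TAIWAN));
    }
}
```

In a servlet, the result would be passed as `res.setContentType(CharsetTable.contentType(locale))` before the call to `res.getWriter()`.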
Note that not all web browsers support all charsets or have the fonts available to represent all characters, although at minimum all clients support ISO-8859-1. Also, the UTF-8 charset can represent all Unicode characters and may be assumed a viable alternative for all languages.
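One way to decide between a language-specific charset and UTF-8 is to test whether the text can be represented in the preferred charset at all. The sketch below assumes a JDK that includes the `java.nio.charset` package, which is newer than the JDK 1.1 releases mentioned in this appendix; `CharsetCheck`, `canEncode`, and `chooseCharset` are hypothetical names used for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper: picks UTF-8 when the preferred charset cannot hold the text.
final class CharsetCheck {
    // True if every character of text can be encoded in the named charset
    static boolean canEncode(String charsetName, String text) {
        Charset charset = Charset.forName(charsetName);
        return charset.newEncoder().canEncode(text);
    }

    // Returns preferred if it can represent text; otherwise falls back to UTF-8
    static String chooseCharset(String preferred, String text) {
        return canEncode(preferred, text) ? preferred : "UTF-8";
    }

    public static void main(String[] args) {
        String french = "caf\u00E9";                 // e with acute accent, part of Latin-1
        String japanese = "\u65E5\u672C\u8A9E";      // three kanji, outside Latin-1
        System.out.println(chooseCharset("ISO-8859-1", french));    // prints ISO-8859-1
        System.out.println(chooseCharset("ISO-8859-1", japanese));  // prints UTF-8
    }
}
```

This check only covers whether the characters can be encoded; it says nothing about whether the client has fonts to display them.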
Language | Language Code | Suggested Charsets |
---|---|---|
Albanian | sq | ISO-8859-2 |
Arabic | ar | ISO-8859-6 |
Bulgarian | bg | ISO-8859-5 |
Byelorussian | be | ISO-8859-5 |
Catalan (Spanish) | ca | ISO-8859-1 |
Chinese (Simplified/Mainland) | zh | GB2312 |
Chinese (Traditional/Taiwan) | zh (country TW) | Big5 |
Croatian | hr | ISO-8859-2 |
Czech | cs | ISO-8859-2 |
Danish | da | ISO-8859-1 |
Dutch | nl | ISO-8859-1 |
English | en | ISO-8859-1 |
Estonian | et | ISO-8859-1 |
Finnish | fi | ISO-8859-1 |
French | fr | ISO-8859-1 |
German | de | ISO-8859-1 |
Greek | el | ISO-8859-7 |
Hebrew | he (formerly iw) | ISO-8859-8 |
Hungarian | hu | ISO-8859-2 |
Icelandic | is | ISO-8859-1 |
Italian | it | ISO-8859-1 |
Japanese | ja | Shift_JIS, ISO-2022-JP, EUC-JP[1] |
Korean | ko | EUC-KR[2] |
Latvian, Lettish | lv | ISO-8859-2 |
Lithuanian | lt | ISO-8859-2 |
Macedonian | mk | ISO-8859-5 |
Norwegian | no | ISO-8859-1 |
Polish | pl | ISO-8859-2 |
Portuguese | pt | ISO-8859-1 |
Romanian | ro | ISO-8859-2 |
Russian | ru | ISO-8859-5, KOI8-R |
Serbian | sr | ISO-8859-5, KOI8-R |
Serbo-Croatian | sh | ISO-8859-5, ISO-8859-2, KOI8-R |
Slovak | sk | ISO-8859-2 |
Slovenian | sl | ISO-8859-2 |
Spanish | es | ISO-8859-1 |
Swedish | sv | ISO-8859-1 |
Turkish | tr | ISO-8859-9 |
Ukrainian | uk | |
[1] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-JP character set by the name EUCJIS, so for portability you can set the character set to EUC-JP and manually construct an EUCJIS PrintWriter.
[2] First supported in JDK 1.1.6. Earlier versions of the JDK know the EUC-KR character set by the name KSC_5601, so for portability you can set the character set to EUC-KR and manually construct a KSC_5601 PrintWriter.
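The manual construction mentioned in these footnotes can be sketched as a helper that tries several encoding names in order and wraps the output stream with the first one the running JDK recognizes. This is an illustration under stated assumptions rather than code from the Servlet API: `FallbackWriter` and `open` are hypothetical names, and the varargs syntax requires a JDK newer than the 1.1 releases discussed here.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Hypothetical helper: builds a PrintWriter using the first encoding name the JDK knows.
final class FallbackWriter {
    static PrintWriter open(OutputStream out, String... names) {
        for (String name : names) {
            try {
                // OutputStreamWriter performs the character-to-byte conversion
                return new PrintWriter(new OutputStreamWriter(out, name), true);
            } catch (UnsupportedEncodingException e) {
                // This JDK does not know the name; try the next candidate
            }
        }
        throw new IllegalArgumentException(
            "No supported encoding among " + Arrays.toString(names));
    }
}
```

In a servlet, this might be used by calling `res.setContentType("text/html; charset=EUC-JP")` and then `FallbackWriter.open(res.getOutputStream(), "EUC-JP", "EUCJIS")` in place of `res.getWriter()`, so that the client always sees the standard charset name while the JDK is given whichever name it understands.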
Copyright © 2001 O'Reilly & Associates. All rights reserved.