Israel Science and Technology Homepage

Language Character Sets

Official names and codes for character sets (including Unicode) that may be used on the Internet.

Note: This page describes a legacy method of character encoding that uses a single byte per character. This encoding was common in pre-HTML5 browsers. Current browsers support HTML5, for which UTF-8 is the recommended encoding; UTF-8 is a variable-length Unicode encoding that uses 1 to 4 bytes per character. For pages served to such browsers, the encodings listed on this page should not be used.
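To illustrate how UTF-8 differs from the single-byte encodings described on this page, the following Python sketch counts the bytes UTF-8 needs for a few sample characters. The sample characters and the helper name `utf8_length` are chosen here for illustration only.

```python
# Illustrative sketch: UTF-8 is a variable-length encoding.
# The sample characters below are arbitrary choices, not taken from this page.
SAMPLES = {
    "A": "Latin capital letter A",
    "\u00e9": "Latin small letter e with acute",
    "\u05d0": "Hebrew letter alef",
    "\u20ac": "euro sign",
    "\U0001F600": "emoji (outside the Basic Multilingual Plane)",
}


def utf8_length(character: str) -> int:
    """Return the number of bytes UTF-8 uses to encode one character."""
    return len(character.encode("utf-8"))


if __name__ == "__main__":
    for character, description in SAMPLES.items():
        code_point = f"U+{ord(character):04X}"
        print(f"{code_point} {description}: {utf8_length(character)} byte(s)")
```

ASCII characters take one byte, so plain English text is the same size in UTF-8 as in a single-byte codepage, while Hebrew, Greek or Cyrillic letters take two bytes each.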

The table below lists Microsoft Windows Codepages for Single Byte Character Sets (SBCS). Chinese, Japanese and Korean require a double byte character set that is not listed here.

In a single byte character set, there are 256 codes, from 0 to 255. The first 32 codes (0-31) are control characters, including tab, carriage return and line feed. Codes 65-90 and 97-122 represent the uppercase and lowercase Latin letters in every one of these character sets, exactly as in ASCII. Codes from 128 to 255 represent special characters and the characters of the second language of the character set, such as Greek, Hebrew or Turkish.
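The layout described above can be checked with a short Python sketch, assuming Python's built-in codecs (`cp1250` to `cp1258`, `cp874`) match the Windows codepages listed on this page. The helper names `decode_byte` and `letters_match_ascii` are hypothetical, introduced only for this example.

```python
# Illustrative sketch: the same byte value means different characters in
# different single-byte codepages, while the Latin letter ranges are shared.
CODEPAGES = [
    "cp1250", "cp1251", "cp1252", "cp1253", "cp1254",
    "cp1255", "cp1256", "cp1257", "cp1258", "cp874",
]


def decode_byte(value: int, codepage: str) -> str:
    """Decode one byte (0-255) using the given codepage.

    Unassigned codes are returned as U+FFFD (the replacement character).
    """
    return bytes([value]).decode(codepage, errors="replace")


def letters_match_ascii(codepage: str) -> bool:
    """Check that codes 65-90 and 97-122 decode to the ASCII Latin letters."""
    letter_codes = list(range(65, 91)) + list(range(97, 123))
    return all(decode_byte(code, codepage) == chr(code) for code in letter_codes)


if __name__ == "__main__":
    for codepage in CODEPAGES:
        character = decode_byte(0xE0, codepage)
        print(f"{codepage}: byte 224 -> U+{ord(character):04X}, "
              f"Latin letters match ASCII: {letters_match_ascii(codepage)}")
```

Byte 224 (0xE0) is a convenient probe because it falls in the upper half of the table, where each codepage places its own language-specific letters.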

When retrieving data from a Microsoft database such as Access using ASP, the following directive can be used to specify the codepage for the retrieved data.

Example for codepage 1252: <%@ CodePage=1252 Language="VBScript" %>
Example for Unicode (UTF-8): <%@ CodePage=65001 Language="VBScript" %>

NOTE: Place this directive on the first line of the page, before any other content.
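The choice of codepage matters because stored bytes only become the intended text when they are decoded with the codepage they were written in. The Python sketch below is not ASP and does not reproduce what the directive does internally; it only illustrates, under that assumption, converting bytes stored in codepage 1252 into UTF-8 (Windows codepage 65001). The function name `transcode` and the sample bytes are hypothetical.

```python
# Illustrative sketch: convert single-byte codepage data to UTF-8.
def transcode(raw: bytes, source_codepage: str, target: str = "utf-8") -> bytes:
    """Decode bytes from source_codepage and re-encode them in the target encoding.

    Raises UnicodeDecodeError if a byte is not assigned in source_codepage.
    """
    text = raw.decode(source_codepage)
    return text.encode(target)


if __name__ == "__main__":
    stored = b"caf\xe9"  # sample bytes as they might be stored in codepage 1252
    print(stored.decode("cp1252"))      # text as intended
    print(transcode(stored, "cp1252"))  # the same text as UTF-8 bytes
    # Decoding with the wrong codepage silently yields a different letter:
    print(stored.decode("cp1251"))
```

Decoding with the wrong codepage usually does not raise an error; it simply produces the wrong characters, which is why the codepage has to be stated explicitly.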

For keyboard layouts, see Windows Keyboard Layouts on the Microsoft web site.

| ISO code | Windows equivalent | Used for languages |
|---|---|---|
| (none) | Windows-1258 | Vietnamese |
| (none) | Windows-874 | Thai |
| iso-8859-1 | Windows-1252 | Latin 1 languages: Afrikaans, Basque, Catalan, Danish, Dutch, English, Faroese, Finnish, French, Galician, German, Icelandic, Indonesian, Italian, Malay, Norwegian, Portuguese, Spanish, Swahili, Swedish |
| iso-8859-2 | Windows-1250 | Central Europe languages: Albanian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (Latin), Slovak, Slovenian |
| iso-8859-4 | Windows-1257 | Baltic languages: Estonian, Latvian, Lithuanian |
| iso-8859-5 | Windows-1251 | Cyrillic languages: Azeri, Belarusian, Bulgarian, Macedonian, Kazakh, Kyrgyz, Mongolian, Russian, Serbian, Tatar, Ukrainian, Uzbek |
| iso-8859-6 | Windows-1256 | Arabic, Farsi, Urdu |
| iso-8859-7 | Windows-1253 | Greek |
| iso-8859-8-i | Windows-1255 | Hebrew |
| iso-8859-9 | Windows-1254 | Turkic languages: Azeri (Latin), Turkish, Uzbek (Latin) |
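
The ISO-to-Windows pairs in the table can be expressed as a small lookup, sketched here in Python. The dictionary is copied from the table; the function names are hypothetical, and the mapping from a name such as Windows-1255 to the Python codec name `cp1255` is an assumption about Python's codec naming, not something this page specifies.

```python
# Illustrative sketch: look up the Windows codepage listed for an ISO name
# and decode bytes with it.
from typing import Optional

ISO_TO_WINDOWS = {
    "iso-8859-1": "Windows-1252",
    "iso-8859-2": "Windows-1250",
    "iso-8859-4": "Windows-1257",
    "iso-8859-5": "Windows-1251",
    "iso-8859-6": "Windows-1256",
    "iso-8859-7": "Windows-1253",
    "iso-8859-8-i": "Windows-1255",
    "iso-8859-9": "Windows-1254",
}


def windows_equivalent(iso_name: str) -> Optional[str]:
    """Return the Windows codepage name the table lists for an ISO name, if any."""
    return ISO_TO_WINDOWS.get(iso_name.strip().lower())


def decode_with_windows_equivalent(raw: bytes, iso_name: str) -> str:
    """Decode bytes using the Windows codepage the table lists for iso_name."""
    windows_name = windows_equivalent(iso_name)
    if windows_name is None:
        raise LookupError(f"no Windows equivalent listed for {iso_name!r}")
    # "Windows-1255" -> "cp1255" (assumed Python codec naming)
    codec_name = "cp" + windows_name.split("-")[1]
    return raw.decode(codec_name)


if __name__ == "__main__":
    print(windows_equivalent("ISO-8859-7"))
    print(decode_with_windows_equivalent(b"\xf9\xec\xe5\xed", "iso-8859-8-i"))
```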