DOS codepages (and their history)
DOS has supported numerous character sets, also called codepages. This article documents official MS-DOS and PC DOS codepages, Windows "OEM" codepages and some rare Arabic codepages.
From their very beginning, PCs have run with an 8-bit character set of 256 characters. Compared to 7-bit ASCII, which offered only 95 visible characters, this was a great advantage. PC software could easily support many Latin-based languages. As PCs became widely popular, it turned out that 256 characters were not enough. More national characters were needed. Codepages were introduced to DOS in 1987 to meet this need.
This article is intended for computing experts who already know what character sets and codepages are. We look into MS-DOS and PC DOS codepages, and also the codepages found in Windows command line mode ("DOS box"). We attempt to differentiate between "real" DOS codepages and "DOS-like" codepages. We compare codepages to one another. We point out differences between documented and actual behavior. We also document old Arabic codepages, for which no other online documentation existed as of 2014.
Contents
- Standalone DOS codepages
- Additional OEM codepages
- Euro codepages (IBM)
- DOS codepage charts
- Euro version of codepage 850
- Arabic codepages
Standalone DOS codepages
The DOS operating system originally supported just one character set, or codepage: codepage 437, also known as PC-ASCII. As DOS came into widespread international use, several alternatives were released, beginning with DOS 3.3 in 1987.
The following table summarizes the codepages officially supported by standalone versions of PC DOS (IBM) and MS-DOS (Microsoft). The information is primarily based on MS-DOS versions from 3.3 to 6.22 and PC DOS versions 7 and 2000.
Page | Name | DOS version | Year | IBM | Notes |
---|---|---|---|---|---|
437 | United States ("PC-ASCII") | 1.0 | 1981 | 1984 | |
710 | Arabic (Transparent Arabic) | 3.3 | 1988 | | * Arabic MS-DOS 3.3 |
720 | Arabic (Transparent ASMO) | 6.22 | 1994 | 1997 | * |
737 | Greek II | 6.2 | 1993 | 1996 | First a hardware solution (MDA & Hercules graphics cards) |
850 | Multilingual (Latin I) | 3.3 | 1987 | 1986 | Original version |
850 (€) | Multilingual (Latin I), euro version | 2000 | 1998 | | PC DOS 2000 version with euro symbol |
852 | Slavic/Eastern European (Latin II) | 5 | 1991 | 1993 | |
855 | Cyrillic I | 6.22 | 1994 | 1988 | |
857 | Turkish | 6.2 | 1993 | 1989 | |
860 | Portuguese | 3.3 | 1987 | 1986 | |
861 | Icelandic | 6.2 | 1993 | 1986 | |
862 | Hebrew | 4 | 1988 | 1986 | * |
863 | Canadian-French | 3.3 | 1987 | 1986 | |
864 | Arabic | 4 | 1988 | 1986 | * |
865 | Nordic | 3.3 | 1987 | 1986 | |
866 | Russian (Cyrillic II) | 4.01 | 1990 | 1991 | Russian MS-DOS 4.01. General support in MS-DOS 6.22 (1994). |
869 | Greek | 6.2 | 1993 | 1987 | |
912 | ISO 8859-2 (Latin) | 7 | 1995 | 1987 | PC DOS 7 |
915 | ISO 8859-5 (Cyrillic) | 7 | 1995 | 1988 | PC DOS 7 |
874 | Thai | 6.22 | 1994 | 1992 | † probably in Windows 3.x Thai edition |
932 | Japanese | 4 | 1988 | | *† |
934 | Korean | 4 | 1988 | | * |
936 | Chinese (Simplified) | 4 | 1988 | | *† |
938 | Taiwan | 4 | 1988 | | *† |
949 | Korean | 5 | 1991 | | *† |
Additional OEM codepages
Codepages supported by the standalone versions of DOS are not the only DOS-like codepages. The following Microsoft-documented "OEM" codepages do not appear in any of the standalone PC DOS or MS-DOS versions reviewed (up to PC DOS 2000 from 1998). Most of them seem to be supported by the command prompt under the Windows operating system.
OEM | Name | First documented | Notes |
---|---|---|---|
851 | Greek 1 | 1986 IBM | listed as "Obsolete" by IBM, "MS-DOS" by Microsoft |
Windows OEM | | | |
708 | Arabic (ASMO 708) | 1989–1993 Microsoft | * "MS-DOS Arabic ASMO", supported in Arabic Windows |
709 | Arabic (ASMO 449+, BCON V4) | 1989–1993 Microsoft | * supported in Arabic Windows |
711 | Arabic (Nafitha Enhanced) | 1989–1993 Microsoft | * supported in Arabic Windows |
775 | Baltic Rim | 1995 Microsoft | * apparently in Windows 95 (Pan European) and later |
858 | Multilingual Latin I + euro | 1998 Microsoft | * |
Windows ANSI and OEM | | | |
950 | Traditional Chinese Big5 | | † Supported by Windows 3.1 and later |
1258 | Vietnam | 1996 Microsoft | † Supported by Windows 95 and later |
Euro codepages (IBM)
The European Union's introduction of the euro currency symbol (€) had consequences for codepages in 1998. Based on the existing DOS codepages, several updated and new codepages were defined. Either the new euro symbol was added to an unused slot in an existing page, or a new page was created where an old symbol was replaced by €.
The following table is based on documentation by IBM. Microsoft documentation doesn't mention any of these changes except for codepages 858 and 874.
Original codepage | Name | Euro codepage | Notes |
---|---|---|---|
437 | United States | | |
737 | Greek II | | |
850 | Multilingual (Latin I) | ⇢ 850 euro version | dotless ı ⇒ € |
 | | ⇒ 858 Multilingual Latin I + euro | |
852 | Slavic/Eastern European | ⇢ 852 | unused position AA ⇒ € |
855 | Cyrillic I | ⇒ 872 Cyrillic with euro | ¤ ⇒ € |
857 | Turkish | ⇢ 857 | unused position D5 ⇒ € |
860 | Portuguese | | |
861 | Icelandic | | |
862 | Hebrew | ⇒ 867 Israel | € + several other changes |
863 | Canadian-French | | |
864 | Arabic | ⇢ 864 | unused position A7 ⇒ € |
865 | Nordic | | |
866 | Russian (Cyrillic II) | ⇒ 808 Cyrillic, Russian with euro | ¤ ⇒ € |
869 | Greek | ⇢ 869 | unused position 87 ⇒ € |
874 | Thai (Microsoft) | ⇢ 874 Thai (Microsoft) | unused position 80 ⇒ € |
874 | Thai (IBM) | ⇒ 1161 Thai (IBM) | position DE ⇒ € |
912 | ISO 8859-2 (Latin) | | |
915 | ISO 8859-5 (Cyrillic) | | |
It remains unclear which systems actually supported any of these euro updated codepages.
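The "old symbol replaced by €" construction can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: it takes Python's bundled cp866 and cp855 codecs as stand-ins for the original DOS tables, uses the positions given in this article (hex FD and hex CF), and does not rely on any ready-made codec for 808 or 872. The helper names euro_variant and decode_with are our own.

```python
EURO = "\u20ac"  # the euro sign


def euro_variant(base: str, position: int) -> str:
    """Build a 256-entry decoding table from codec `base`, with the
    character at byte value `position` replaced by the euro sign."""
    table = bytes(range(256)).decode(base, errors="replace")
    return table[:position] + EURO + table[position + 1:]


def decode_with(table: str, data: bytes) -> str:
    """Decode single-byte text by looking each byte up in `table`."""
    return "".join(table[b] for b in data)


# Hypothetical stand-ins for IBM codepages 808 and 872, derived as the
# table above describes: the currency sign gives way to the euro sign.
CP808_SKETCH = euro_variant("cp866", 0xFD)
CP872_SKETCH = euro_variant("cp855", 0xCF)

if __name__ == "__main__":
    print(b"\xfd".decode("cp866"), "->", decode_with(CP808_SKETCH, b"\xfd"))
    print(b"\xcf".decode("cp855"), "->", decode_with(CP872_SKETCH, b"\xcf"))
```

All other positions are left untouched, so text without the currency sign decodes the same way in the original and the derived table.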
DOS codepage charts
The following codepage charts list all official Latin-based DOS codepages, and also Greek, Hebrew and Cyrillic. Arabic codepages appear in their own chapter. 874 Thai and 1258 Vietnamese are presented among other Windows codepages.
Asian double-byte codepages are missing for technical reasons. Unless otherwise mentioned, the charts are screenshots captured in MS-DOS.
Common area (00-7F)
Characters 00-7F (hex) in the following chart are common to all DOS codepages listed here.
The chart is similar to ASCII except for control characters. Codepoints 00-1F and 7F (hex), marked in pink, have a dual nature. They can be used as invisible ASCII control characters or displayed on the screen as symbols. Because of this, DOS codepages are backward compatible with ASCII.
Exception: Codepage 864 Arabic is different from all other DOS codepages. It supports different symbols in the control character area. We are not going any further into the differences in this article.
Codepage 437 United States
Alternative names: Personal Computer, MS-DOS United States, MS-DOS Latin US, OEM United States, DOS Extended ASCII (United States), PC-ASCII
Codepage 437 is the original IBM "PC-ASCII" codepage. It's the basis for all other codepages. Differences exist in the 80-FF (hex) range.
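The statement that differences are confined to the 80-FF range can be spot-checked with a short script. The sketch below is an illustration rather than the method used for the charts: it assumes that Python's bundled codecs (cp437, cp850, cp865, cp866) reproduce the DOS tables, and the helper names decode_table and differing_positions are our own. Note that these codecs decode 00-1F as control characters, not as the on-screen symbols.

```python
def decode_table(codec: str) -> str:
    """Decode all 256 byte values; undefined positions become U+FFFD."""
    return bytes(range(256)).decode(codec, errors="replace")


def differing_positions(codec: str, base: str = "cp437") -> list[int]:
    """Return the byte values that decode differently in `codec` and `base`."""
    ours, theirs = decode_table(codec), decode_table(base)
    return [i for i in range(256) if ours[i] != theirs[i]]


if __name__ == "__main__":
    for name in ("cp850", "cp865", "cp866"):
        diffs = differing_positions(name)
        print(f"{name}: {len(diffs)} positions differ from cp437, "
              f"lowest is hex {min(diffs):02X}")
```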
In the charts that follow, differences from 437 are highlighted in green.
Codepage 737 Greek II
Alternative names: 437 G, MS-DOS Greek, OEM Greek
This codepage was formerly known as 437G.
Codepage 775 Baltic Rim
Alternative names: MS-DOS Baltic Rim, OEM Baltic
Codepage 775 is not a DOS codepage in the strictest sense. It never appeared in standalone MS-DOS. The page covers Estonian, Lithuanian and Latvian (and even Polish). It conforms to Lithuanian Standard LST 1590-1.
Codepage 850 Multilingual (Latin I)
Alternative names: Personal Computer - Multilingual Page, MS-DOS Multilingual (Latin 1), OEM Multilingual Latin 1, Western European
This codepage covered most of Western Europe, Latin America and also Canada.
A euro version of 850 exists. It was formed by replacing dotless ı with € (hex D5). Confusingly, it is known by two different codepage numbers, namely 850 and 858.
Codepage 852 Slavic/Eastern European (Latin II)
Alternative names: Latin 2 - Personal Computer, MS-DOS Slavic (Latin 2), OEM Latin 2, Central European
According to MS-DOS 6.22, this codepage covered Albania, Bosnia/Herzegovina, Croatia, Czech Republic, Hungary, Poland, Romania, (Russia), Slovakia, Slovenia and Yugoslavia (Latin).
According to IBM, a euro version exists where € was added to unused position hex AA.
Codepage 855 Cyrillic I
Alternative names: Cyrillic - Personal Computer, IBM Cyrillic, MS-DOS Cyrillic, OEM Cyrillic (primarily Russian)
According to MS-DOS 6.22, this codepage covered Yugoslavia (Serbia/Montenegro, Macedonia), Bulgaria and Russia.
According to IBM, a euro version (codepage 872) exists where € appears in place of ¤ (hex CF).
Codepage 857 Turkish
Alternative names: Latin #5, Turkey - Personal Computer, IBM Turkish, MS-DOS Turkish, OEM Turkish
According to IBM, a euro version exists where € was added to unused position hex D5.
Codepage 860 Portuguese
Alternative names: Portugal - Personal Computer, MS-DOS Portuguese, OEM Portuguese
According to MS-DOS 6.22, this codepage was for Portugal (but not Brazil).
Codepage 861 Icelandic
Alternative names: Iceland - Personal Computer, MS-DOS Icelandic, OEM Icelandic
Codepage 862 Hebrew
Alternative names: Israel - Personal Computer, MS-DOS Hebrew, OEM Hebrew
Codepage 863 Canadian-French
Alternative names: Canadian French - Personal Computer, MS-DOS Canadian French, MS-DOS French Canada, OEM French Canadian
Codepage 865 Nordic
Alternative names: Nordic - Personal Computer, MS-DOS Nordic, OEM Nordic
This codepage was for Denmark and Norway.
Codepage 866 Russian (Cyrillic II)
Alternative names: PC Data, Cyrillic, Russian; MS-DOS Cyrillic CIS 1; MS-DOS Russian; OEM Russian
This codepage was developed for the Russian language version of MS-DOS 4.01. It doesn't cover every language written in Cyrillic; Ukrainian, for example, is not fully covered.
According to IBM, a euro version (codepage 808) exists where € appears in place of ¤ (hex FD).
Codepage 869 Greek
Alternative names: Greece - Personal Computer, IBM Modern Greek, MS-DOS Greek 2, OEM Modern Greek
According to IBM, a euro version exists where € was added to unused position hex 87.
Codepage 912 ISO 8859-2 (Latin)
Introduced by IBM in PC DOS 7 for the countries of Eastern Europe. Not a codepage in MS-DOS or Windows. Positions hex 80–9F are empty because the ISO 8859-2 standard reserves them for invisible control characters, although DOS does not appear to have actually supported these control characters.
Codepage 915 ISO 8859-5 (Cyrillic)
Introduced by IBM in PC DOS 7 for the Cyrillic alphabets of (the former) Yugoslavia. Not a codepage in MS-DOS or Windows. Positions hex 80–9F are empty because the ISO 8859-5 standard reserves them for invisible control characters, although DOS does not appear to have actually supported these control characters.
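The reserved 80–9F range of codepages 912 and 915 can be made visible with a small check. This sketch assumes that Python's iso8859_2 and iso8859_5 codecs may stand in for 912 and 915 (Python has no codecs under those numbers, as far as we know), and the function name c1_positions_are_controls is our own. It tests whether every position from hex 80 to 9F decodes to a Unicode control character.

```python
import unicodedata


def c1_positions_are_controls(codec: str) -> bool:
    """True if bytes 0x80-0x9F all decode to control characters (category Cc)."""
    decoded = bytes(range(0x80, 0xA0)).decode(codec)
    return all(unicodedata.category(ch) == "Cc" for ch in decoded)


if __name__ == "__main__":
    for name in ("iso8859_2", "iso8859_5", "cp852"):
        print(name, c1_positions_are_controls(name))
```

For comparison, codepage 852 fills the same range with printable letters, so the check fails there.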
Euro version of codepage 850
Codepage 850 also exists as a euro version. It has the dotless ı changed to € in position hex D5. The same codepage appears with two different codepage numbers. The usual number is 858, but PC DOS 2000 calls it 850 instead.
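The single-position difference can be confirmed by comparing two codecs byte by byte. The sketch below assumes that Python's bundled cp850 and cp858 codecs match the original 850 and the euro version respectively; the function name codepage_diff is our own.

```python
def codepage_diff(a: str, b: str) -> dict[int, tuple[str, str]]:
    """Map each differing byte value to its (a, b) pair of decoded characters."""
    table_a = bytes(range(256)).decode(a, errors="replace")
    table_b = bytes(range(256)).decode(b, errors="replace")
    return {
        i: (table_a[i], table_b[i])
        for i in range(256)
        if table_a[i] != table_b[i]
    }


if __name__ == "__main__":
    for pos, (old, new) in codepage_diff("cp850", "cp858").items():
        print(f"hex {pos:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

The same helper works for any pair of single-byte codecs, which makes it a convenient way to compare a codepage against its documented euro variant.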
Codepage 850 Multilingual (Latin I), euro version
This is how codepage 850 looks in PC DOS 2000, which was released in 1998. Confusingly, the codepage number is still 850 even though it's different from the original version of 850.
Codepage 858 Multilingual Latin I + euro
Alternative names: Personal Computer - Multilingual with euro, OEM Multilingual Latin 1 + Euro symbol, Multilingual Latin I + Euro
Codepage 858 is exactly the same as the euro version of 850. It is supported by Windows command line mode.
In order to use codepage 858 in Windows 10, one needs to select a TrueType font for the command line box and issue the command chcp 858. In Windows 2000, one needs to first install 858 support via the Control Panel, then do the other steps mentioned. Raster fonts don't appear to have the € symbol.
Arabic codepages
Arabic codepages are inadequately documented in online sources. The codepages are not well supported by English versions of either MS-DOS or Windows. Documentation is lacking or inaccurate. To our knowledge, codepages 709, 710 and 711 had not been documented online prior to this article in 2014.
The following Arabic codepages have been captured in Arabic Windows 98 Second Edition (Arabic command line). Arabic command line means there is a special built-in utility in Windows that adds Arabic script support, such as right-to-left writing and joining of Arabic letters.
Online documentation, published by Microsoft, exists for codepages 708, 720 and 864. A multitude of characters appear to differ in Windows 98, however. Differences between Windows 98 and Microsoft documentation have been highlighted in pink. These differences may be due to Windows 98.
Codepage 708 Arabic (ASMO 708)
Codepage 708 in Arabic Windows 98 SE differs from Microsoft documentation of 708 from 1995. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.
Codepage 708 is backward compatible with the standards ASMO 708 (1988) and ISO 8859-6 (Arabic). Codepage 708 adds characters to positions unused in the standards (for comparison see the ASMO 708 set in ISO-IR 127 and ECMA-114). A reference to codepage 708 appears in the RTF file format specification, where it was added during 1989–1993.
Codepage 709 Arabic (ASMO 449+, BCON V4)
Codepage 709 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993.
Codepage 709 appears to have been built on the ASMO 449 standard. ASMO 449 is a 7-bit ASCII-like encoding (see ISO-IR 089) that has Arabic letters in place of letters A-Z, and also some symbols. Codepage 709 has lifted ASMO 449 characters to the area 80-FF (hex) and added some extra characters to unused positions. The tilde (~) at FE (hex) is incompatible with ASMO 449, though. See also: ASMO 449+.
Codepage 709 is quite similar to 708 when it comes to Arabic letters and Arabic symbols, but not when it comes to Latin letters, ASCII symbols and digits.
Codepage 710 Transparent Arabic
Codepage 710 was introduced in Arabic MS-DOS 3.3. It is inadequately documented in online sources.
Codepage 711 Arabic (Nafitha Enhanced)
Codepage 711 is inadequately documented in online sources. A reference to it appears in the RTF file format specification, where it was added during 1989–1993. Nafitha was a program that added Arabic support to DOS.
Codepages 710 and 711 are somewhat similar but not compatible with each other.
Codepage 720 Arabic (Transparent ASMO)
Alternative name: MS-DOS Arabic (Transparent ASMO)
Codepage 720 in Arabic Windows 98 SE differs from Microsoft documentation of 720. The differences are in pink. Line drawing characters with double lines appear as single lines. Several other characters look different as well.
Codepage 720 was added to MS-DOS 6.22 (1994). A reference to it appears in the RTF file format specification, where it was added during 1989–1993.
Codepage 864 Arabic (MS-DOS)
Alternative names: Arabic - Personal Computer, OEM Arabic
Codepage 864 in Arabic Windows 98 SE differs from Microsoft documentation of 864 from 1996. The differences are in pink. The 1996 documented version, which is a conversion table, supports more characters than the ones actually implemented in Windows 98.
According to MS-DOS 6.22, this was the only Arabic codepage available.
Article updated in August 2021: Added codepages 912, 915 and euro version of 850.
URN:NBN:fi-fe201401011002