ASCII control characters and codes

The first 32 codes (decimal 0–31) are reserved by ASCII for control characters. These codes were originally intended not to represent printable information, but to control devices (such as printers) that use ASCII, or to provide meta-information about data streams, such as those stored on magnetic tape.

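ASCII Table (Control chars)

Each entry lists the code in decimal, binary, octal and hexadecimal, followed by the abbreviation and the name of the character. The MD5 and SHA1 lines are hashes of the decimal code written as a string without leading zeros (for example "10"), not of the control character itself.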

000 0000 0000 0 00 NUL - Null character. Unicode symbol:
MD5: cfcd208495d565ef66e7dff9f98764da
SHA1: b6589fc6ab0dc82cf12099d1c2d40ab994e8410c
001 0000 0001 1 01 SOH - Start of Heading. Unicode symbol:
MD5: c4ca4238a0b923820dcc509a6f75849b
SHA1: 356a192b7913b04c54574d18c28d46e6395428ab
002 0000 0010 2 02 STX - Start of Text. Unicode symbol:
MD5: c81e728d9d4c2f636f067f89cc14862c
SHA1: da4b9237bacccdf19c0760cab7aec4a8359010b0
003 0000 0011 3 03 ETX - End of Text. Unicode symbol:
MD5: eccbc87e4b5ce2fe28308fd9f2a7baf3
SHA1: 77de68daecd823babbb58edb1c8e14d7106e83bb
004 0000 0100 4 04 EOT - End of Transmission. Unicode symbol:
MD5: a87ff679a2f3e71d9181a67b7542122c
SHA1: 1b6453892473a467d07372d45eb05abc2031647a
005 0000 0101 5 05 ENQ - Enquiry. Unicode symbol:
MD5: e4da3b7fbbce2345d7772b0674a318d5
SHA1: ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
006 0000 0110 6 06 ACK - Acknowledge. Unicode symbol:
MD5: 1679091c5a880faf6fb5e6087eb1b2dc
SHA1: c1dfd96eea8cc2b62785275bca38ac261256e278
007 0000 0111 7 07 BEL - Bell, Alert. Unicode symbol:
MD5: 8f14e45fceea167a5a36dedd4bea2543
SHA1: 902ba3cda1883801594b6e1b452790cc53948fda
008 0000 1000 10 08 BS - Backspace. Unicode symbol:
MD5: c9f0f895fb98ab9159f51fd0297e236d
SHA1: fe5dbbcea5ce7e2988b8c69bcfdfde8904aabc1f
009 0000 1001 11 09 HT - Character Tabulation, Horizontal Tabulation. Unicode symbol:
MD5: 45c48cce2e2d7fbdea1afc51c7c6ad26
SHA1: 0ade7c2cf97f75d009975f4d720d1fa6c19f4897
010 0000 1010 12 0A LF - Line Feed. Unicode symbol:
MD5: d3d9446802a44259755d38e6d163e820
SHA1: b1d5781111d84f7b3fe45a0852e59758cd7a87e5
011 0000 1011 13 0B VT - Line Tabulation, Vertical Tabulation. Unicode symbol:
MD5: 6512bd43d9caa6e02c990b0a82652dca
SHA1: 17ba0791499db908433b80f37c5fbc89b870084b
012 0000 1100 14 0C FF - Form Feed. Unicode symbol:
MD5: c20ad4d76fe97759aa27a0c99bff6710
SHA1: 7b52009b64fd0a2a49e6d8a939753077792b0554
013 0000 1101 15 0D CR - Carriage Return. Unicode symbol:
MD5: c51ce410c124a10e0db5e4b97fc2af39
SHA1: bd307a3ec329e10a2cff8fb87480823da114f8f4
014 0000 1110 16 0E SO - Shift Out. Unicode symbol:
MD5: aab3238922bcc25a6f606eb525ffdc56
SHA1: fa35e192121eabf3dabf9f5ea6abdbcbc107ac3b
015 0000 1111 17 0F SI - Shift In. Unicode symbol:
MD5: 9bf31c7ff062936a96d3c8bd1f8f2ff3
SHA1: f1abd670358e036c31296e66b3b66c382ac00812
016 0001 0000 20 10 DLE - Data Link Escape. Unicode symbol:
MD5: c74d97b01eae257e44aa9d5bade97baf
SHA1: 1574bddb75c78a6fd2251d61e2993b5146201319
017 0001 0001 21 11 DC1 - Device Control One (XON). Unicode symbol:
MD5: 70efdf2ec9b086079795c442636b55fb
SHA1: 0716d9708d321ffb6a00818614779e779925365c
018 0001 0010 22 12 DC2 - Device Control Two. Unicode symbol:
MD5: 6f4922f45568161a8cdf4ad2299f6d23
SHA1: 9e6a55b6b4563e652a23be9d623ca5055c356940
019 0001 0011 23 13 DC3 - Device Control Three (XOFF). Unicode symbol:
MD5: 1f0e3dad99908345f7439f8ffabdffc4
SHA1: b3f0c7f6bb763af1be91d9e74eabfeb199dc1f1f
020 0001 0100 24 14 DC4 - Device Control Four. Unicode symbol:
MD5: 98f13708210194c475687be6106a3b84
SHA1: 91032ad7bbcb6cf72875e8e8207dcfba80173f7c
021 0001 0101 25 15 NAK - Negative Acknowledge. Unicode symbol:
MD5: 3c59dc048e8850243be8079a5c74d079
SHA1: 472b07b9fcf2c2451e8781e944bf5f77cd8457c8
022 0001 0110 26 16 SYN - Synchronous Idle. Unicode symbol:
MD5: b6d767d2f8ed5d21a44b0e5886680cb9
SHA1: 12c6fc06c99a462375eeb3f43dfd832b08ca9e17
023 0001 0111 27 17 ETB - End of Transmission Block. Unicode symbol:
MD5: 37693cfc748049e45d87b8c7d8b9aacd
SHA1: d435a6cdd786300dff204ee7c2ef942d3e9034e2
024 0001 1000 30 18 CAN - Cancel. Unicode symbol:
MD5: 1ff1de774005f8da13f42943881c655f
SHA1: 4d134bc072212ace2df385dae143139da74ec0ef
025 0001 1001 31 19 EM - End of Medium. Unicode symbol:
MD5: 8e296a067a37563370ded05f5a3bf3ec
SHA1: f6e1126cedebf23e1463aee73f9df08783640400
026 0001 1010 32 1A SUB - Substitute. Unicode symbol:
MD5: 4e732ced3463d06de0ca9a15b6153677
SHA1: 887309d048beef83ad3eabf2a79a64a389ab1c9f
027 0001 1011 33 1B ESC - Escape. Unicode symbol:
MD5: 02e74f10e0327ad868d138f2b4fdd6f0
SHA1: bc33ea4e26e5e1af1408321416956113a4658763
028 0001 1100 34 1C FS - File Separator. Unicode symbol:
MD5: 33e75ff09dd601bbe69f351039152189
SHA1: 0a57cb53ba59c46fc4b692527a38a87c78d84028
029 0001 1101 35 1D GS - Group Separator. Unicode symbol:
MD5: 6ea9ab1baa0efb9e19094440c317e21b
SHA1: 7719a1c782a1ba91c031a682a0a2f8658209adbf
030 0001 1110 36 1E RS - Record Separator. Unicode symbol:
MD5: 34173cb38f07f89ddbebc2ac9128303f
SHA1: 22d200f8670dbdb3e253a90eee5098477c95c23d
031 0001 1111 37 1F US - Unit Separator. Unicode symbol:
MD5: c16a5320fa475530d9583c34fd356ef5
SHA1: 632667547e7cd3e0466547863e1207a8c0c0c549
127 0111 1111 177 7F DEL - Delete character. Unicode symbol:
MD5: ec5decca5ed3d6b8079e2e7e7bacc9f2
SHA1: 008451a05e1e7aa32c75119df950d405265e0904

For example, character 10 represents the "line feed" function (which causes a printer to advance its paper), and character 8 represents "backspace". RFC 2822 refers to control characters that do not include carriage return, line feed or white space as non-whitespace control characters. Except for the control characters that prescribe elementary line-oriented formatting, ASCII does not define any mechanism for describing the structure or appearance of text within a document. Other schemes, such as markup languages, address page and document layout and formatting.
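The classification described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the exact ranges used for the non-whitespace control characters (1-8, 11, 12, 14-31 and 127) are taken from a reading of the RFC 2822 grammar and should be checked against the RFC before being relied on.

```python
LINE_FEED = chr(10)   # code 10, "line feed"
BACKSPACE = chr(8)    # code 8, "backspace"


def is_control(ch: str) -> bool:
    """True for the ASCII control characters: codes 0-31 and 127."""
    code = ord(ch)
    return code < 32 or code == 127


def is_no_ws_ctl(ch: str) -> bool:
    """True for control characters other than HT, LF and CR.

    Assumed ranges (from the RFC 2822 grammar): 1-8, 11, 12, 14-31, 127.
    NUL (code 0) is not part of this set.
    """
    code = ord(ch)
    return 1 <= code <= 8 or code in (11, 12) or 14 <= code <= 31 or code == 127


if __name__ == "__main__":
    for sample in (LINE_FEED, BACKSPACE, "\t", "A"):
        print(ord(sample), is_control(sample), is_no_ws_ctl(sample))
```

Note that a tab or a line feed is a control character but not a non-whitespace control character, while backspace is both.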

The original ASCII standard used only short descriptive phrases for each control character. The ambiguity this caused was sometimes intentional, for example where a character would be used slightly differently on a terminal link than on a data stream, and sometimes accidental, for example with the meaning of "delete".

Probably the most influential single device on the interpretation of these characters was the Teletype Model 33 ASR, a printing terminal with an available paper tape reader/punch option. Paper tape was a very popular medium for long-term program storage until the 1980s, being cheaper and in some ways less fragile than magnetic tape. In particular, the Teletype Model 33 machine assignments for codes 17 (DC1, Control-Q, also known as XON), 19 (DC3, Control-S, also known as XOFF), and 127 (Delete) became de facto standards. The Model 33 was also notable for taking the description of Control-G (code 7, BEL, meaning audibly alert the operator) literally: the unit contained an actual bell, which it rang when it received a BEL character. Because the keytop for the O key also showed a left-arrow symbol (from ASCII-1963, which had this character instead of underscore), a noncompliant use of code 15 (Control-O, Shift In), interpreted as "delete previous character", was also adopted by many early timesharing systems, but it eventually fell out of use.
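The pairing of control codes with letter keys (Control-Q with 17, Control-S with 19, Control-G with 7, Control-O with 15) follows from keeping only the low five bits of the letter's code. A minimal sketch, assuming that convention; the function names are hypothetical and chosen for this example only:

```python
def control_code(letter: str) -> int:
    """Code produced by Control plus `letter`: the low five bits of the letter."""
    return ord(letter.upper()) & 0x1F


def caret_notation(code: int) -> str:
    """Conventional printable form of a control code, e.g. 7 -> '^G', 127 -> '^?'."""
    return "^" + chr(code ^ 0x40)


if __name__ == "__main__":
    for key in "QSGO":
        code = control_code(key)
        print(f"Control-{key} -> {code} ({caret_notation(code)})")
```

With this mapping Control-Q gives 17 (XON), Control-S gives 19 (XOFF), Control-G gives 7 (BEL) and Control-O gives 15 (SI), matching the codes named in the text.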

When a Teletype 33 ASR equipped with the automatic paper tape reader received a Control-S (XOFF, an abbreviation for "transmit off"), the tape reader stopped; receiving a Control-Q (XON, "transmit on") caused the tape reader to resume. Several early computer operating systems adopted this technique as a "handshaking" signal that warned a sender to stop transmission because of impending overflow; it persists to this day in many systems as a manual output control technique. On some systems Control-S retains its meaning, but a second Control-S, rather than Control-Q, resumes output. The 33 ASR could also be configured to use Control-R (DC2) and Control-T (DC4) to start and stop the tape punch; on some units equipped with this function, the corresponding control character lettering on the keycap above the letter was TAPE and an overlined TAPE, respectively.
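The stop/resume handshake can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a model of any real driver: time advances in discrete ticks, the sender emits at most one byte per tick, and the `incoming` mapping (a hypothetical name) says which control byte, if any, arrives from the receiver at a given tick.

```python
from typing import Dict, List, Tuple

XON = 0x11   # DC1, Control-Q: resume transmission
XOFF = 0x13  # DC3, Control-S: stop transmission


def transmit(data: bytes, incoming: Dict[int, int],
             max_ticks: int = 1000) -> List[Tuple[int, int]]:
    """Return (tick, byte) pairs showing when each byte was sent."""
    sent: List[Tuple[int, int]] = []
    paused = False
    pos = 0
    tick = 0
    # max_ticks guards against an XOFF that is never followed by an XON.
    while pos < len(data) and tick < max_ticks:
        control = incoming.get(tick)
        if control == XOFF:
            paused = True
        elif control == XON:
            paused = False
        if not paused:
            sent.append((tick, data[pos]))
            pos += 1
        tick += 1
    return sent


if __name__ == "__main__":
    # The receiver asks for a pause at tick 1 and allows sending again at tick 3.
    for tick, byte in transmit(b"ABCD", {1: XOFF, 3: XON}):
        print(tick, chr(byte))
```

In the example run the first byte goes out at tick 0, nothing is sent while the sender is paused, and the remaining bytes follow from tick 3 onward.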

The Teletype could not move the head backwards, so it had no key on the keyboard to send a BS (backspace). Instead there was a key marked "rubout" that sent code 127 (DEL). The purpose of this key was to erase mistakes in a hand-typed paper tape: the operator pushed a button on the tape punch to back it up, then typed the rubout, which punched all holes and so replaced the mistake with a character that was intended to be ignored. Teletypes were commonly used with the less expensive computers from Digital Equipment Corporation, so these systems had to use the available key, and thus the DEL code, to erase the previous character. Because of this, DEC video terminals (by default) sent the DEL code for the key marked "Backspace", while the key marked "Delete" sent an escape sequence; many other terminals sent BS for the Backspace key. The Unix terminal driver could use only one code to back up. It could be set to BS or DEL, but not both, which resulted in a long period of annoyance in which the setting had to be corrected depending on the terminal in use (modern shells using readline understand both codes). The assumption that no key sent a BS caused Control+H to be used for other purposes, such as a "help" command in Emacs.
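The difference between a driver that accepts a single erase code and an editor that accepts both can be sketched as follows. The function `apply_erase` is a hypothetical helper written for this illustration; it is not how the Unix terminal driver or readline is actually implemented.

```python
from typing import Tuple

BS = "\x08"   # code 8, backspace
DEL = "\x7f"  # code 127, delete (the Teletype "rubout")


def apply_erase(keystrokes: str, erase_chars: Tuple[str, ...] = (BS, DEL)) -> str:
    """Build a line from raw keystrokes, treating `erase_chars` as 'erase previous'."""
    line = []
    for ch in keystrokes:
        if ch in erase_chars:
            if line:           # nothing to erase at the start of the line
                line.pop()
        else:
            line.append(ch)
    return "".join(line)


if __name__ == "__main__":
    typed = "helo" + DEL + "lo"
    print(repr(apply_erase(typed)))                       # both codes erase
    print(repr(apply_erase(typed, erase_chars=(BS,))))    # only BS erases
```

When only BS is configured as the erase character, a terminal that sends DEL leaves the raw code 127 in the line, which is the mismatch the paragraph above describes.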

Many of the control codes have been given meanings quite different from their original ones. The "escape" character (ESC, code 27), for example, was originally intended to allow other control characters to be sent as literals instead of invoking their meaning. This is the same meaning of "escape" encountered in URL encodings, C language strings, and other systems where certain characters have a reserved meaning. Over time this meaning has been co-opted and has eventually been changed. In modern use, an ESC sent to the terminal usually indicates the start of a command sequence, usually in the form of a so-called "ANSI escape code" (or, more properly, a "Control Sequence Introducer") from ECMA-48 (1976) and its successors, beginning with ESC followed by a "[" (left-bracket) character. An ESC sent from the terminal is most often used as an out-of-band character that terminates an operation, as in the TECO and vi text editors. In graphical user interface (GUI) and windowing systems, ESC generally causes an application to abort its current operation or to exit (terminate) altogether.
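As an illustration of the modern use, the sketch below builds and removes sequences that start with ESC followed by "[". The helper names are made up for this example. The "m" final byte (select graphic rendition) and the byte ranges in the pattern follow the usual ECMA-48 layout of parameter, intermediate and final bytes, and are stated here as assumptions to be checked against the standard.

```python
import re

ESC = "\x1b"     # code 27
CSI = ESC + "["  # Control Sequence Introducer


def sgr(*params: int) -> str:
    """Build a 'select graphic rendition' sequence, e.g. sgr(1, 31) for bold red."""
    return CSI + ";".join(str(p) for p in params) + "m"


# ESC [ , then parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F),
# then one final byte (0x40-0x7E).
CSI_PATTERN = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(text: str) -> str:
    """Remove CSI sequences so only the printable text remains."""
    return CSI_PATTERN.sub("", text)


if __name__ == "__main__":
    message = sgr(1, 31) + "error" + sgr(0)
    print(repr(message))
    print(strip_csi(message))
```

Stripping is useful when terminal output is written to a log file, where the sequences would otherwise appear as stray characters.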

The inherent ambiguity of many control characters, combined with their historical usage, created problems when transferring "plain text" files between systems. The best example of this is the newline problem on various operating systems. Teletype machines required that a line of text be terminated with both "Carriage Return" (which moves the printhead to the beginning of the line) and "Line Feed" (which advances the paper one line without moving the printhead). The name "Carriage Return" comes from the fact that on a manual typewriter the carriage holding the paper moved, while the position where the typebars struck the ribbon remained stationary. The entire carriage had to be pushed (returned) to the right in order to position the left margin of the paper for the next line.

DEC operating systems (OS/8, RT-11, RSX-11, RSTS, TOPS-10, etc.) used both characters to mark the end of a line so that the console device (originally Teletype machines) would work. By the time so-called "glass TTYs" (later called CRTs or terminals) came along, the convention was so well established that backward compatibility required it to continue. When Gary Kildall created CP/M, he was inspired by some command line interface conventions used in DEC's RT-11. Until the introduction of PC DOS in 1981, IBM had no hand in this, because its 1970s operating systems used EBCDIC instead of ASCII and were oriented toward punch-card input and line printer output, on which the concept of carriage return was meaningless. IBM's PC DOS (also marketed as MS-DOS by Microsoft) inherited the convention by virtue of being loosely based on CP/M, and Windows inherited it from MS-DOS.

Requiring two characters to mark the end of a line introduced unnecessary complexity, as well as questions about how to interpret each character when encountered alone. To simplify matters, plain text data streams, including files, on Multics used line feed (LF) alone as a line terminator. Unix and Unix-like systems, and Amiga systems, adopted this convention from Multics. The original Macintosh OS, Apple DOS, and ProDOS, on the other hand, used carriage return (CR) alone as a line terminator; since Apple replaced these operating systems with the Unix-based macOS operating system, they now use line feed (LF) as well. The Radio Shack TRS-80 also used a lone CR to terminate lines.
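The three conventions (CR-LF, LF alone, CR alone) can be detected and converted with a few byte replacements. A minimal sketch, with hypothetical function names; it assumes the data is ASCII-compatible text and converts everything to LF:

```python
def detect_newline(data: bytes) -> str:
    """Guess the line-ending convention used in `data`."""
    if b"\r\n" in data:
        return "CRLF"
    if b"\r" in data:
        return "CR"
    if b"\n" in data:
        return "LF"
    return "none"


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR-LF and lone CR line endings to LF.

    CR-LF is replaced first; otherwise each CR-LF pair would be
    turned into two line breaks.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    mixed = b"dos line\r\nold mac line\runix line\n"
    print(detect_newline(mixed))
    print(normalize_newlines(mixed))
```

The order of the two replacements is the part that answers the question of how to interpret each character when it appears alone: a CR is treated as a line break only when no LF follows it.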

Computers attached to the ARPANET included machines running operating systems such as TOPS-10 and TENEX, which used CR-LF line endings; machines running operating systems such as Multics, which used LF line endings; and machines running operating systems such as OS/360, which represented lines as a character count followed by the characters of the line and used EBCDIC rather than ASCII. The Telnet protocol defined an ASCII "Network Virtual Terminal" (NVT), so that connections between hosts with different line-ending conventions and character sets could be supported by transmitting a standard text format over the network. Telnet used ASCII along with CR-LF line endings, and software using other conventions would translate between the local conventions and the NVT. The File Transfer Protocol adopted the Telnet protocol, including the use of the Network Virtual Terminal, for transmitting commands and for transferring data in the default ASCII mode. This adds complexity to implementations of those protocols, and of other network protocols such as those used for e-mail and the World Wide Web, on systems that do not use the NVT's CR-LF line-ending convention.
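The translation between a local convention and the network format might look like the sketch below. It is deliberately simplified: `to_nvt` and `from_nvt` are hypothetical names, the text is assumed to be pure ASCII that uses only the local line ending, and the additional rules that real Telnet implementations apply (for example for a carriage return that is not part of a line ending) are left out.

```python
def to_nvt(text: str, local_newline: str = "\n") -> bytes:
    """Encode local text as ASCII with CR-LF line endings for transmission."""
    return text.replace(local_newline, "\r\n").encode("ascii")


def from_nvt(data: bytes, local_newline: str = "\n") -> str:
    """Decode received ASCII text and restore the local line-ending convention."""
    return data.decode("ascii").replace("\r\n", local_newline)


if __name__ == "__main__":
    wire = to_nvt("line one\nline two\n")   # an LF-only host sending
    print(wire)
    print(repr(from_nvt(wire, "\r")))       # a CR-only host receiving
```

Each host only needs to know its own convention and the network one, which is the point of defining a single virtual terminal format.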

The PDP-6 monitor, and its PDP-10 successor TOPS-10, used Control-Z (SUB) as an end-of-file indication for input from a terminal. Some operating systems, such as CP/M, tracked file length only in units of disk blocks and used Control-Z to mark the end of the actual text in the file. For these reasons, EOF, or end-of-file, was used colloquially and conventionally as a three-letter acronym for Control-Z instead of SUBstitute. The end-of-text code (ETX), also known as Control-C, was unsuitable for a variety of reasons, whereas using Z as the control code to end a file is analogous to its position at the end of the alphabet and serves as a very convenient mnemonic aid. A historically common and still prevalent convention uses the ETX code to interrupt and halt a program via an input data stream, usually from a keyboard.
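The effect of tracking length only in whole blocks can be shown with a short sketch. The 128-byte block size and the choice to fill the whole unused tail with Control-Z are assumptions made for this illustration, and the function names are hypothetical.

```python
SUB = 0x1A        # code 26, Control-Z
BLOCK_SIZE = 128  # assumed block size for this example


def pad_to_block(text: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Fill the unused tail of the last block with Control-Z."""
    padding = -len(text) % block_size
    return text + bytes([SUB]) * padding


def read_text(stored: bytes) -> bytes:
    """Return the text up to, but not including, the first Control-Z."""
    end = stored.find(bytes([SUB]))
    return stored if end == -1 else stored[:end]


if __name__ == "__main__":
    stored = pad_to_block(b"hello\r\n")
    print(len(stored))         # a whole number of blocks
    print(read_text(stored))   # only the actual text
```

A reader that ignores the marker would see the padding as part of the file, which is why the convention had to be shared by every program that handled text.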

In C library and Unix conventions, the null character is used to terminate text strings; such null-terminated strings are known by the abbreviation ASCIZ or ASCIIZ, where Z stands for "zero".
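Python strings carry their own length, so the sketch below works on raw bytes to show what the convention implies. The function names are made up for this example; the point is that the terminator is stored with the text, and that the text itself can therefore not contain code 0.

```python
NUL = b"\x00"  # code 0, the null character


def to_asciz(text: str) -> bytes:
    """Encode ASCII text followed by a terminating null character."""
    if "\x00" in text:
        raise ValueError("text may not contain the null character")
    return text.encode("ascii") + NUL


def from_asciz(buffer: bytes) -> str:
    """Read text from the start of `buffer` up to the first null character."""
    end = buffer.find(NUL)
    if end == -1:
        raise ValueError("no terminating null character found")
    return buffer[:end].decode("ascii")


if __name__ == "__main__":
    raw = to_asciz("hi")
    print(raw, len(raw))                      # one byte longer than the text
    print(from_asciz(b"hi\x00leftover data"))  # bytes after the terminator are ignored
```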

 2018 © Dmytro Koshovyi. Ukraine, Mykolayiv.