ASCII control characters and codes

ASCII reserves the first 32 codes (0–31 decimal) for control characters, and code 127 (DEL) is a control character as well. These codes were originally designed not to represent printable information but to control devices that use ASCII (such as printers), or to provide meta-information about data streams, for example those stored on magnetic tape.
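
As a quick check of these ranges, Python's standard `unicodedata` module reports the Unicode general category of each code point; within the 7-bit ASCII range, the "Cc" (control) category should contain exactly codes 0–31 and 127. The helper name `ascii_control_codes` is made up for this illustration.

```python
import unicodedata

def ascii_control_codes() -> list[int]:
    """Return the 7-bit codes that Unicode classifies as control characters (Cc)."""
    return [code for code in range(128)
            if unicodedata.category(chr(code)) == "Cc"]

codes = ascii_control_codes()
print(len(codes), codes[0], codes[31], codes[-1])  # 33 0 31 127
```

The range is restricted to 128 because Unicode also places codes 128–159 (the C1 controls) in the "Cc" category.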

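ASCII Table (Control chars)

Each entry gives the code in decimal, binary, octal and hexadecimal, followed by its abbreviation and name. The MD5 and SHA256 lines are hashes of the decimal code written as ASCII text (for example, the one-character string "0" for NUL), not of the control byte itself.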

000 0000 0000 0 00 NUL - Null character. Unicode symbol:
MD5: CFCD208495D565EF66E7DFF9F98764DA
SHA256: 5FECEB66FFC86F38D952786C6D696C79C2DBC239DD4E91B46729D73A27FB57E9
001 0000 0001 1 01 SOH - Start of Heading. Unicode symbol:
MD5: C4CA4238A0B923820DCC509A6F75849B
SHA256: 6B86B273FF34FCE19D6B804EFF5A3F5747ADA4EAA22F1D49C01E52DDB7875B4B
002 0000 0010 2 02 STX - Start of text. Unicode symbol:
MD5: C81E728D9D4C2F636F067F89CC14862C
SHA256: D4735E3A265E16EEE03F59718B9B5D03019C07D8B6C51F90DA3A666EEC13AB35
003 0000 0011 3 03 ETX - End of Text. Unicode symbol:
SHA256: 4E07408562BEDB8B60CE05C1DECFE3AD16B72230967DE01F640B7E4729B49FCE
004 0000 0100 4 04 EOT - End of Transmission. Unicode symbol:
MD5: A87FF679A2F3E71D9181A67B7542122C
SHA256: 4B227777D4DD1FC61C6F884F48641D02B4D121D3FD328CB08B5531FCACDABF8A
005 0000 0101 5 05 ENQ - Enquiry. Unicode symbol:
MD5: E4DA3B7FBBCE2345D7772B0674A318D5
SHA256: EF2D127DE37B942BAAD06145E54B0C619A1F22327B2EBBCFBEC78F5564AFE39D
006 0000 0110 6 06 ACK - Acknowledge. Unicode symbol:
MD5: 1679091C5A880FAF6FB5E6087EB1B2DC
SHA256: E7F6C011776E8DB7CD330B54174FD76F7D0216B612387A5FFCFB81E6F0919683
007 0000 0111 7 07 BEL - Bell, Alert. Unicode symbol:
MD5: 8F14E45FCEEA167A5A36DEDD4BEA2543
SHA256: 7902699BE42C8A8E46FBBB4501726517E86B22C56A189F7625A6DA49081B2451
008 0000 1000 10 08 BS - Backspace. Unicode symbol:
MD5: C9F0F895FB98AB9159F51FD0297E236D
SHA256: 2C624232CDD221771294DFBB310ACA000A0DF6AC8B66B696D90EF06FDEFB64A3
009 0000 1001 11 09 HT - Character Tabulation, Horizontal Tabulation. Unicode symbol:
SHA256: 19581E27DE7CED00FF1CE50B2047E7A567C76B1CBAEBABE5EF03F7C3017BB5B7
010 0000 1010 12 0A LF - Line feed. Unicode symbol:
MD5: D3D9446802A44259755D38E6D163E820
SHA256: 4A44DC15364204A80FE80E9039455CC1608281820FE2B24F1E5233ADE6AF1DD5
011 0000 1011 13 0B VT - Line Tabulation, Vertical Tabulation. Unicode symbol:
MD5: 6512BD43D9CAA6E02C990B0A82652DCA
SHA256: 4FC82B26AECB47D2868C4EFBE3581732A3E7CBCC6C2EFB32062C08170A05EEB8
012 0000 1100 14 0C FF - Form Feed. Unicode symbol:
MD5: C20AD4D76FE97759AA27A0C99BFF6710
SHA256: 6B51D431DF5D7F141CBECECCF79EDF3DD861C3B4069F0B11661A3EEFACBBA918
013 0000 1101 15 0D CR - Carriage Return. Unicode symbol:
MD5: C51CE410C124A10E0DB5E4B97FC2AF39
SHA256: 3FDBA35F04DC8C462986C992BCF875546257113072A909C162F7E470E581E278
014 0000 1110 16 0E SO - Shift Out. Unicode symbol:
MD5: AAB3238922BCC25A6F606EB525FFDC56
SHA256: 8527A891E224136950FF32CA212B45BC93F69FBB801C3B1EBEDAC52775F99E61
015 0000 1111 17 0F SI - Shift In. Unicode symbol:
MD5: 9BF31C7FF062936A96D3C8BD1F8F2FF3
SHA256: E629FA6598D732768F7C726B4B621285F9C3B85303900AA912017DB7617D8BDB
016 0001 0000 20 10 DLE - Data Link Escape. Unicode symbol:
SHA256: B17EF6D19C7A5B1EE83B907C595526DCB1EB06DB8227D650D5DDA0A9F4CE8CD9
017 0001 0001 21 11 DC1 - Device Control One (XON). Unicode symbol:
MD5: 70EFDF2EC9B086079795C442636B55FB
SHA256: 4523540F1504CD17100C4835E85B7EEFD49911580F8EFFF0599A8F283BE6B9E3
018 0001 0010 22 12 DC2 - Device Control Two. Unicode symbol:
MD5: 6F4922F45568161A8CDF4AD2299F6D23
SHA256: 4EC9599FC203D176A301536C2E091A19BC852759B255BD6818810A42C5FED14A
019 0001 0011 23 13 DC3 - Device Control Three (XOFF). Unicode symbol:
MD5: 1F0E3DAD99908345F7439F8FFABDFFC4
SHA256: 9400F1B21CB527D7FA3D3EABBA93557A18EBE7A2CA4E471CFE5E4C5B4CA7F767
020 0001 0100 24 14 DC4 - Device Control Four. Unicode symbol:
MD5: 98F13708210194C475687BE6106A3B84
SHA256: F5CA38F748A1D6EAF726B8A42FB575C3C71F1864A8143301782DE13DA2D9202B
021 0001 0101 25 15 NAK - Negative Acknowledge. Unicode symbol:
MD5: 3C59DC048E8850243BE8079A5C74D079
SHA256: 6F4B6612125FB3A0DAECD2799DFD6C9C299424FD920F9B308110A2C1FBD8F443
022 0001 0110 26 16 SYN - Synchronous Idle. Unicode symbol:
MD5: B6D767D2F8ED5D21A44B0E5886680CB9
SHA256: 785F3EC7EB32F30B90CD0FCF3657D388B5FF4297F2F9716FF66E9B69C05DDD09
023 0001 0111 27 17 ETB - End of Transmission Block. Unicode symbol:
MD5: 37693CFC748049E45D87B8C7D8B9AACD
SHA256: 535FA30D7E25DD8A49F1536779734EC8286108D115DA5045D77F3B4185D8F790
024 0001 1000 30 18 CAN - Cancel. Unicode symbol:
MD5: 1FF1DE774005F8DA13F42943881C655F
SHA256: C2356069E9D1E79CA924378153CFBBFB4D4416B1F99D41A2940BFDB66C5319DB
025 0001 1001 31 19 EM - End of medium. Unicode symbol:
MD5: 8E296A067A37563370DED05F5A3BF3EC
SHA256: B7A56873CD771F2C446D369B649430B65A756BA278FF97EC81BB6F55B2E73569
026 0001 1010 32 1A SUB - Substitute. Unicode symbol:
MD5: 4E732CED3463D06DE0CA9A15B6153677
SHA256: 5F9C4AB08CAC7457E9111A30E4664920607EA2C115A1433D7BE98E97E64244CA
027 0001 1011 33 1B ESC - Escape. Unicode symbol:
MD5: 02E74F10E0327AD868D138F2B4FDD6F0
SHA256: 670671CD97404156226E507973F2AB8330D3022CA96E0C93BDBDB320C41ADCAF
028 0001 1100 34 1C FS - File Separator. Unicode symbol:
MD5: 33E75FF09DD601BBE69F351039152189
SHA256: 59E19706D51D39F66711C2653CD7EB1291C94D9B55EB14BDA74CE4DC636D015A
029 0001 1101 35 1D GS - Group Separator. Unicode symbol:
MD5: 6EA9AB1BAA0EFB9E19094440C317E21B
SHA256: 35135AAA6CC23891B40CB3F378C53A17A1127210CE60E125CCF03EFCFDAEC458
030 0001 1110 36 1E RS - Record Separator. Unicode symbol:
MD5: 34173CB38F07F89DDBEBC2AC9128303F
SHA256: 624B60C58C9D8BFB6FF1886C2FD605D2ADEB6EA4DA576068201B6C6958CE93F4
031 0001 1111 37 1F US - Unit Separator. Unicode symbol:
MD5: C16A5320FA475530D9583C34FD356EF5
127 0111 1111 177 7F DEL - Delete character. Unicode symbol:
SHA256: 922C7954216CCFE7A61DEF609305CE1DC7C67E225F873F256D30D7A8EE4F404C
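
The numeric columns and the hash lines above can be reproduced with a short Python sketch that uses only the standard library (`hashlib`). The function names are hypothetical, and the layout (three-digit decimal, binary split into two nibbles, octal, two-digit uppercase hex) is inferred from the entries above.

```python
import hashlib

def table_row(code: int) -> str:
    """Format the numeric part of an entry: decimal, binary (two nibbles), octal, hex."""
    bits = format(code, "08b")
    return f"{code:03d} {bits[:4]} {bits[4:]} {code:o} {code:02X}"

def decimal_hashes(code: int) -> tuple[str, str]:
    """MD5 and SHA256 of the decimal code written as ASCII text, e.g. "10"."""
    text = str(code).encode("ascii")
    return (hashlib.md5(text).hexdigest().upper(),
            hashlib.sha256(text).hexdigest().upper())

CONTROL_CODES = list(range(32)) + [127]

if __name__ == "__main__":
    for code in CONTROL_CODES:
        md5, sha256 = decimal_hashes(code)
        print(table_row(code))
        print("MD5:", md5)
        print("SHA256:", sha256)
```

For LF, for example, `table_row(10)` gives `010 0000 1010 12 0A`, matching the entry in the table.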

Let's look at some examples. Character 10 represents the "line feed" function (which makes a printer advance its paper), and character 8 represents "backspace". RFC 2822 refers to control characters other than carriage return, line feed and white space as non-whitespace control characters. Except for the control characters that prescribe elementary line-oriented formatting, ASCII defines no mechanism for describing the structure or appearance of text within a document. Other schemes, such as markup languages, address page and document layout and formatting.
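
RFC 2822 (section 3.2.1) spells this set out in its grammar as `NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127`; note that NUL (0) is excluded along with HT, LF and CR. A minimal Python predicate for that rule might look like this (the function name is our own):

```python
def is_no_ws_ctl(code: int) -> bool:
    """True if `code` belongs to RFC 2822's NO-WS-CTL set."""
    return (1 <= code <= 8 or code in (11, 12)
            or 14 <= code <= 31 or code == 127)

# BS (8) and DEL (127) qualify; HT (9), LF (10) and CR (13) do not.
print([is_no_ws_ctl(c) for c in (8, 9, 10, 13, 127)])  # [True, False, False, False, True]
```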

The original ASCII standard used only short descriptive phrases for each control character. The resulting ambiguity was sometimes intentional, for example where a character would be used slightly differently on a terminal link than on a data stream, and sometimes clearly accidental, as with the meaning of "delete".

Probably the most influential single device in the interpretation of these characters was the Teletype Model 33 ASR, a printing terminal with an optional paper tape reader/punch. Paper tape was widely used for long-term program storage until the 1980s: it was cheaper than magnetic tape and in some ways less fragile. In particular, the Teletype Model 33 assignments for codes 17 (DC1, Control-Q, also known as XON), 19 (DC3, Control-S, also known as XOFF) and 127 (Delete) became de facto standards. The Model 33 was also notable for taking the description of Control-G (code 7, BEL, meaning audibly alert the operator) literally: the unit contained a real bell, which it rang when it received a BEL character. Because the keytop for the O key also showed a left-arrow symbol (from ASCII-1963, which had this character in place of the underscore), a noncompliant use of code 15 (Control-O, Shift In) as "delete previous character" was also adopted by many early timesharing systems, although it eventually fell out of use.
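
The "Control-letter" names used above follow a simple rule: Control combined with a letter keeps only the low five bits of the letter's code, so Control-Q gives 0x51 & 0x1F = 17. Caret notation (^Q, ^S, ^G) writes a control code as the character whose code differs from it by 0x40, and the same XOR also covers DEL as ^?. The function names below are illustrative only.

```python
def control_key(letter: str) -> int:
    """Code produced by Control+<letter>: keep only the low five bits."""
    return ord(letter.upper()) & 0x1F

def caret_notation(code: int) -> str:
    """Caret form of a control code, e.g. 17 -> '^Q', 127 -> '^?'."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    return "^" + chr(code ^ 0x40)

for key in "QSGO":
    code = control_key(key)
    print(key, code, caret_notation(code))
# Q 17 ^Q
# S 19 ^S
# G 7 ^G
# O 15 ^O
```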

When a Teletype 33 ASR equipped with the automatic paper tape reader received Control-S (XOFF, short for "transmit off"), it stopped the tape reader; receiving Control-Q (XON, "transmit on") resumed it. Several operating systems of the era adopted this technique as a "handshaking" signal that warned a sender to stop transmitting because of impending buffer overflow, and it survives in many systems today as a manual output-control technique. Some systems keep the original meaning of Control-S but resume output with a second Control-S instead of Control-Q. The 33 ASR could also be set to use Control-R (DC2) and Control-T (DC4) to start and stop the tape punch; on units equipped with this function, the control-function lettering on the keycaps above those letters read TAPE and an overbarred TAPE, respectively.
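
The XON/XOFF handshake is easy to simulate. The sketch below models a hypothetical receiver with a small buffer that sends XOFF when it reaches a high-water mark and XON once it has drained to a low-water mark. The class, the thresholds and the draining rate are assumptions made for illustration, not part of any standard.

```python
from typing import Optional

XON = 0x11   # DC1, Control-Q: resume transmission
XOFF = 0x13  # DC3, Control-S: stop transmission

class Receiver:
    """Hypothetical receiver with a bounded buffer and high/low-water marks."""

    def __init__(self, high: int = 8, low: int = 2):
        self.buffer: list[int] = []
        self.high = high
        self.low = low
        self.stopped = False  # True between sending XOFF and sending XON

    def accept(self, byte: int) -> Optional[int]:
        """Store one byte; return XOFF when the buffer reaches the high-water mark."""
        self.buffer.append(byte)
        if not self.stopped and len(self.buffer) >= self.high:
            self.stopped = True
            return XOFF
        return None

    def drain(self, n: int) -> Optional[int]:
        """Consume up to n bytes; return XON once the buffer falls to the low-water mark."""
        del self.buffer[:n]
        if self.stopped and len(self.buffer) <= self.low:
            self.stopped = False
            return XON
        return None


def transfer(data: bytes, receiver: Receiver, drain_per_tick: int = 3) -> list[str]:
    """Send data byte by byte, pausing after XOFF until XON arrives."""
    events = []
    paused = False
    i = 0
    while i < len(data):
        if paused:
            # While paused the sender transmits nothing; the receiver drains.
            if receiver.drain(drain_per_tick) == XON:
                events.append("XON")
                paused = False
            continue
        signal = receiver.accept(data[i])
        i += 1
        if signal == XOFF:
            events.append("XOFF")
            paused = True
    return events


rx = Receiver()
print(transfer(bytes(range(20)), rx))  # ['XOFF', 'XON', 'XOFF', 'XON', 'XOFF']
```

The buffer never grows past the high-water mark, which is the whole point of the handshake; the final XOFF simply leaves the last eight bytes waiting in the receiver.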

The Teletype could not move its print head backwards, so its keyboard had no key for sending BS (backspace). Instead, it had a key labelled "rubout" that sent code 127 (DEL). This key was meant for correcting mistakes on hand-punched paper tape: the operator pushed a button on the tape punch to back it up one position and then typed rubout, which punched out all the holes in that position and so replaced the mistake with a character intended to be ignored. Teletypes were commonly used with the less expensive computers from Digital Equipment Corporation (DEC). Those systems had to make do with the keys that were available, so the DEL code was assigned to erase the previous character. For this reason DEC video terminals (by default) sent the DEL code for the key marked "Backspace", while the separate key marked "Delete" sent an escape sequence; many other terminals sent BS for the Backspace key. The Unix terminal driver could use only one code to erase the previous character. It could be set to either BS or DEL, but not both, which led to recurring irritation as users had to choose the setting to match the terminal they were using (modern shells that use readline understand both codes). The assumption that no key sent BS allowed Control+H to be used for other purposes, such as the "help" command in Emacs.
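
Two details of this story fit in a few lines of Python. Because DEL is 1111111 in binary, punching extra holes over any 7-bit character (a bitwise OR) always yields DEL, which is why rubout could overwrite a mistake. And a line editor that, like readline, accepts both BS and DEL can treat them identically as "erase previous character". `apply_erase` is a hypothetical helper, not readline's actual code.

```python
BS = 0x08   # backspace
DEL = 0x7F  # delete / rubout

# DEL is all ones in 7 bits, so overpunching any code with it (bitwise OR) yields DEL.
assert all(code | DEL == DEL for code in range(128))

def apply_erase(raw: str) -> str:
    """Treat both BS and DEL as 'erase the previous character'."""
    out: list[str] = []
    for ch in raw:
        if ord(ch) in (BS, DEL):
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)

print(apply_erase("helo\x7flo"))     # hello
print(apply_erase("cat\x08\x08ow"))  # cow
```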

Many control codes have since been given meanings quite different from their original functions. One of the clearest examples is the "escape" character (ESC, code 27). It was originally intended to let other control characters be sent as literals instead of invoking their meaning. This is the same sense of "escape" found in URL encodings, C language strings and other systems where certain characters have a reserved meaning. Over time, however, this meaning was co-opted and eventually changed. Today an ESC sent to a terminal usually marks the start of a command sequence, typically a so-called "ANSI escape code" from ECMA-48 (1972) and its successors, which often begins with the "Control Sequence Introducer" (CSI): ESC followed by a "[" (left bracket) character. An ESC sent from the terminal, by contrast, is usually an out-of-band character used to terminate an operation, as in the TECO and vi text editors. In graphical user interface (GUI) and windowing systems, ESC generally makes an application abort its current operation or exit (terminate) altogether.
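
A minimal sketch of both senses of "escape". The `csi` helper (a name of our own) builds a control sequence as CSI, semicolon-separated numeric parameters and a final character; CUP (`H`, cursor position) and SGR (`m`, graphic rendition) are ECMA-48 functions, but whether a particular terminal honors them is an assumption. The last line shows the older sense: URL encoding escapes a control character as `%XX` using the standard `urllib.parse.quote`.

```python
from urllib.parse import quote

ESC = "\x1b"
CSI = ESC + "["  # Control Sequence Introducer: ESC followed by '['

def csi(final: str, *params: int) -> str:
    """Build a control sequence: CSI, semicolon-separated parameters, final character."""
    return CSI + ";".join(str(p) for p in params) + final

move_cursor = csi("H", 5, 10)  # CUP: move the cursor to row 5, column 10
bold_on = csi("m", 1)          # SGR 1: bold
reset = csi("m", 0)            # SGR 0: reset attributes

print(repr(move_cursor))       # '\x1b[5;10H'
print(bold_on + "bold text" + reset)

# The older sense of escaping: a URL encodes the TAB control character as %09.
print(quote("tab\there"))      # tab%09here
```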

Beyond their historical use, the inherent ambiguity of many control characters created problems when "plain text" files were transferred between systems. The best-known example is the newline problem across operating systems. Teletype machines required a line of text to be terminated with both "Carriage Return" (which moves the print head to the beginning of the line) and "Line Feed" (which advances the paper one line without moving the print head). The name "Carriage Return" comes from the manual typewriter, where the carriage holding the paper moved while the position at which the typebars struck the ribbon stayed fixed. The whole carriage had to be pushed (returned) to the right to bring the left margin of the paper into position for the next line.
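
To see why both characters were needed, here is a toy model of a printing terminal. It is a simplified sketch, not a description of the Model 33's mechanics: CR only moves the print position back to column 0 and LF only moves to the next line, so text that uses LF alone prints as a staircase.

```python
def teletype_render(text: str) -> list[str]:
    """Toy printing terminal: CR returns to column 0, LF advances one line."""
    page: list[list[str]] = [[]]
    row, col = 0, 0
    for ch in text:
        if ch == "\r":
            col = 0                   # carriage return: back to the left margin
        elif ch == "\n":
            row += 1                  # line feed: paper moves up, column unchanged
            if row == len(page):
                page.append([])
        else:
            line = page[row]
            while len(line) <= col:   # pad with blanks up to the print position
                line.append(" ")
            line[col] = ch            # simplification: an overstrike replaces
            col += 1
    return ["".join(line).rstrip() for line in page]

print(teletype_render("abc\ndef"))    # ['abc', '   def']
print(teletype_render("abc\r\ndef"))  # ['abc', 'def']
```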

DEC operating systems (OS/8, RT-11, RSX-11, RSTS, TOPS-10, etc.) used both characters to mark the end of a line so that the console device (usually a Teletype machine) would work. By the time so-called "glass TTYs" (later called CRTs or terminals) came along, the convention was so well established that backward compatibility required it to continue. Gary Kildall drew on some of the command-line interface conventions of DEC's RT-11 when he designed CP/M. Until the introduction of PC DOS in 1981, IBM had no influence on this, because its 1970s operating systems used EBCDIC rather than ASCII and were oriented toward punch-card input and line-printer output, for which the concept of a carriage return was meaningless. IBM's PC DOS (also marketed as MS-DOS by Microsoft) inherited the convention because it was loosely based on CP/M, and Windows in turn inherited it from MS-DOS.

Requiring two characters to mark the end of a line introduced unwanted complexity and raised questions such as how to interpret each character when it appears alone. To simplify matters, plain-text data streams, including files, on Multics used line feed (LF) alone as the line terminator. Unix, Unix-like systems and Amiga systems adopted this convention from Multics. The original Macintosh OS, Apple DOS and ProDOS, however, used carriage return (CR) alone as the line terminator; since Apple replaced these operating systems with the Unix-based macOS, its systems now use LF as well. The Radio Shack TRS-80 also used a lone CR to terminate lines.
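
Handling all three conventions mostly comes down to matching CR-LF before a lone CR or LF. A minimal sketch with Python's `re` module (the helper names are hypothetical):

```python
import re

NEWLINE = re.compile(r"\r\n|\r|\n")  # CR-LF must be tried before a lone CR

def line_ending_style(text: str) -> str:
    """Name the convention present: 'CRLF', 'LF', 'CR', 'mixed' or 'none'."""
    names = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}
    found = {names[m.group()] for m in NEWLINE.finditer(text)}
    if not found:
        return "none"
    return found.pop() if len(found) == 1 else "mixed"

def normalize(text: str, newline: str = "\n") -> str:
    """Convert CR-LF, lone CR and lone LF to one chosen terminator."""
    return NEWLINE.sub(newline, text)

print(line_ending_style("DOS\r\nWindows\r\n"))  # CRLF
print(line_ending_style("Mac OS 9\rline\r"))    # CR
print(repr(normalize("a\r\nb\rc\nd")))          # 'a\nb\nc\nd'
```

Python's text-mode `open()` performs the same translation on input by default (universal newlines mode).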

Computers attached to the ARPANET included machines running operating systems such as TOPS-10 and TENEX, which used CR-LF line endings; machines running operating systems such as Multics, which used LF line endings; and machines running operating systems such as OS/360, which represented lines as a character count followed by the characters of the line and used EBCDIC rather than ASCII. The Telnet protocol defined an ASCII "Network Virtual Terminal" (NVT), so that connections between hosts with different line-ending conventions and character sets could be supported by transmitting a standard text format over the network. Telnet used ASCII with CR-LF line endings, and software using other conventions translated between its local conventions and the NVT. The File Transfer Protocol adopted the Telnet protocol, including the Network Virtual Terminal, for transmitting commands and for transferring data in the default ASCII mode. This adds complexity to implementations of those protocols, and of other network protocols such as those used for e-mail and the World Wide Web, on systems that do not use the NVT's CR-LF line-ending convention.
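
The translation work described above can be sketched as follows. `lf_to_nvt` and `nvt_to_lf` convert between Unix-style text and the NVT form; sending a bare CR as CR NUL follows RFC 854. `counted_records_to_nvt` is a deliberately simplified, hypothetical stand-in for a count-prefixed EBCDIC host: it assumes a one-byte length before each record and uses Python's `cp037` (EBCDIC) codec, whereas real OS/360 record formats are more involved.

```python
def lf_to_nvt(text: str) -> bytes:
    """Translate Unix-style text (LF line endings) to NVT form for the wire."""
    out = bytearray()
    for ch in text:
        if ch == "\n":
            out += b"\r\n"        # NVT end of line
        elif ch == "\r":
            out += b"\r\x00"      # RFC 854: a bare CR is sent as CR NUL
        else:
            out += ch.encode("ascii")
    return bytes(out)

def nvt_to_lf(data: bytes) -> str:
    """Translate NVT text back to LF line endings."""
    out: list[str] = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair == b"\r\n":
            out.append("\n")
            i += 2
        elif pair == b"\r\x00":
            out.append("\r")
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)

def counted_records_to_nvt(data: bytes) -> bytes:
    """Hypothetical: 1-byte-length-prefixed EBCDIC records to NVT lines."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        record = data[i + 1:i + 1 + length]
        out += record.decode("cp037").encode("ascii") + b"\r\n"
        i += 1 + length
    return bytes(out)

print(lf_to_nvt("USER guest\nPASS x\n"))  # b'USER guest\r\nPASS x\r\n'
```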

The PDP-6 monitor, and its PDP-10 successor TOPS-10, used Control-Z (SUB) as an end-of-file indication for input from a terminal. Some operating systems, such as CP/M, tracked file length only in units of disk blocks and used Control-Z to mark the end of the actual text in the file. For these reasons, EOF (end-of-file) became a colloquial and conventional name for Control-Z instead of SUB (substitute). The end-of-text code (ETX), also known as Control-C, was unsuitable for a variety of reasons, while using Z to end a file mirrors its position at the end of the alphabet and serves as a convenient mnemonic. A historically common and still dominant convention uses ETX (Control-C) to interrupt and halt a program via an input data stream, usually from the keyboard.
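
The CP/M-style convention can be sketched as follows, assuming 128-byte records (CP/M's record size) with the unused tail of the last record filled with SUB bytes. The helper names are hypothetical.

```python
SUB = 0x1A    # Control-Z
RECORD = 128  # assumed record size in bytes, as in CP/M

def write_cpm_text(text: bytes) -> bytes:
    """Pad text to a whole number of records, filling the tail with Control-Z."""
    return text + bytes([SUB]) * (-len(text) % RECORD)

def read_cpm_text(data: bytes) -> bytes:
    """Return the text up to (not including) the first Control-Z, if any."""
    end = data.find(bytes([SUB]))
    return data if end == -1 else data[:end]

stored = write_cpm_text(b"HELLO\r\n")
print(len(stored), read_cpm_text(stored))  # 128 b'HELLO\r\n'
```

A text that exactly fills its last record gets no marker at all, and the reader then takes the whole file as text.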

In the C library and in Unix conventions, the null character terminates text strings; such null-terminated strings are sometimes abbreviated ASCIZ or ASCIIZ, where Z stands for "zero".
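
Reading ASCIZ strings out of a byte buffer means scanning for the terminating NUL. A minimal Python sketch with hypothetical helper names; unlike C code, which would read past the end of the buffer, it raises `ValueError` when no NUL follows.

```python
def read_asciz(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read a NUL-terminated ASCII string starting at offset.

    Returns the decoded string and the offset just past its terminating NUL.
    """
    end = buf.index(0, offset)  # raises ValueError if there is no NUL
    return buf[offset:end].decode("ascii"), end + 1

def to_asciz(s: str) -> bytes:
    """Encode a string as ASCII followed by a terminating NUL."""
    return s.encode("ascii") + b"\x00"

packed = to_asciz("abc") + to_asciz("de")
first, pos = read_asciz(packed)
second, pos = read_asciz(packed, pos)
print(first, second, pos)  # abc de 7
```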

 2018-2023 © Dmytro Koshovyi. Ukraine, Mykolayiv.