On Microsoft Windows, once hexadecimal numpad input has been enabled in the registry, Unicode characters can be entered by holding down Alt, typing + on the numeric keypad followed by the hexadecimal code (using the numeric keypad for the digits 0 to 9 and the letter keys for A to F), and then releasing Alt. (In versions earlier than Vista, users needed to reboot before this input method would start working.)
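The hexadecimal code typed in this method is simply the character's code point written in base 16. As a minimal sketch (Python is used here only for illustration, and the helper names `char_from_hex` and `hex_from_char` are hypothetical), the conversion in both directions could look like this:

```python
def char_from_hex(hex_code: str) -> str:
    """Return the character whose code point is given in hexadecimal."""
    return chr(int(hex_code, 16))


def hex_from_char(char: str) -> str:
    """Return a single character's code point as uppercase hexadecimal,
    padded to at least four digits."""
    return format(ord(char), "04X")


print(char_from_hex("20AC"))  # €
print(hex_from_char("€"))     # 20AC
```

Under these assumptions, the string returned by `hex_from_char` is the digit sequence that would be typed after Alt and +.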
Hexadecimal Unicode input can be enabled by adding a string type (REG_SZ) value called EnableHexNumpad to the registry key HKEY_CURRENT_USER\Control Panel\Input Method and assigning it the value data 1. Users will need to log off and back in after editing the registry for this input method to start working.

Clause 5.1 of ISO/IEC 14755 describes a Basic method whereby a beginning sequence is followed by the hexadecimal representation of the code point and an ending sequence. Most modern systems have some method to emulate this, sometimes limited to four digits (and thus only to the Basic Multilingual Plane).

The text editor Vim allows characters to be specified by two-character mnemonics (confusingly called "digraphs" by Vim developers). The installed set can be augmented by custom mnemonics defined for arbitrary code points, specified in decimal. For example, as decimal 9881 is equal to hexadecimal 2699, the command :dig Gr 9881 associates "Gr" with U+2699 ⚙ GEAR. See below for use of decimal code points in HTML.

With Windows Alt codes, Alt+0247 yields a ÷, corresponding to its code point, but the character produced by Alt+247 (without the leading zero) depends on the OEM code page, such as Code page 437, and may yield a ≈. In programs in which Alt codes over 255 do not work, the character retrieved usually corresponds to the remainder when the number is divided by 256.
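The Alt code behaviour described above can be modelled roughly in code. The following is a simplified sketch, not a description of how Windows implements it: it assumes Code page 437 as the OEM code page and CP1252 as the Windows code page, and the function name `legacy_alt_code` is made up for this example. Python's cp437 codec maps values below 32 to control characters rather than to the graphic symbols an OEM font would show, so the sketch is only meaningful from 32 upward.

```python
def legacy_alt_code(keys: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Model a legacy Alt code typed on the numeric keypad.

    keys -- the digits typed while Alt is held, e.g. "247" or "0247".
    A leading zero selects the Windows code page; without it the OEM
    code page is used. Numbers above 255 wrap around modulo 256.
    """
    if not (keys.isascii() and keys.isdigit()):
        raise ValueError("keys must consist of decimal digits only")
    codepage = ansi if keys.startswith("0") else oem
    value = int(keys) % 256  # only the remainder survives in such programs
    # errors="replace" covers the few byte values that CP1252 leaves undefined
    return bytes([value]).decode(codepage, errors="replace")


print(legacy_alt_code("0247"))  # ÷ (CP1252)
print(legacy_alt_code("247"))   # ≈ (Code page 437)
print(legacy_alt_code("8364"))  # ¼, because 8364 % 256 == 172
```

Under these assumptions, a program without Unicode Alt code support would be expected to give ¼ for Alt+8364.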
Some programs running in Microsoft Windows, including recent versions of Word and WordPad, can produce characters from their Unicode code points expressed in decimal and entered on the numeric keypad with the Alt key held down. For example, the Euro sign € has 20AC as its hexadecimal code point, which is 8364 in decimal, so Alt+8364 will produce the symbol. Similarly, Alt+120132 produces the double-struck character 𝕄 (U+1D544). Decimal code points in the range 160–255 must be entered with a leading zero (so that the Windows code page is chosen), and furthermore the Windows code page must match Unicode in that range (CP1252 must be used).

Many systems provide a way to select Unicode characters visually. ISO/IEC 14755 refers to this as a screen-selection entry method. Microsoft Windows has provided a Unicode version of the Character Map program, appearing in the consumer edition since XP. This is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block. More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap, which supports all Unicode characters). On most Linux desktop environments, equivalent tools, such as gucharmap (GNOME) or kcharselect (KDE), are available. Generally these tools let the user "copy" the selected characters into the clipboard and then paste them into the document, rather than pretending to type them directly. It is often practical to just find the desired character on the web or in another document, and copy and paste it from there.

If an application does not have access to a glyph, the character will usually be shown as the font's ".notdef" glyph, which often appears as an empty box (nicknamed "tofu" based on the shape), a box with an X in it, or a box with a question mark in it. The .notdef glyph is meant for unsupported characters, and the replacement character ⟨�⟩ only for encoding errors.
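The difference between the two cases can be illustrated in code. In this minimal sketch (Python, chosen only for illustration), malformed UTF-8 input is an encoding error, so the decoder substitutes U+FFFD. A valid character decodes without error even if no installed font has a glyph for it, and what is drawn in that case is left to the renderer.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# The byte 0xFF can never occur in well-formed UTF-8, so this is an encoding error.
broken = b"caf\xff".decode("utf-8", errors="replace")
print(REPLACEMENT in broken)  # True

# A valid supplementary-plane character such as U+1D310 decodes cleanly;
# whether it appears as a real glyph or as .notdef depends on the fonts available.
rare = "\U0001d310".encode("utf-8").decode("utf-8", errors="replace")
print(REPLACEMENT in rare)  # False
```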
An application can display a character only if it can access a font which contains a glyph for the character. Very few fonts have full Unicode coverage; most only contain the glyphs needed to support a few writing systems. However, most modern browsers and other text-processing applications are able to display multilingual content because they perform font substitution, automatically switching to a fallback font when necessary to display characters which are not supported in the current font. Which fonts are used for fallback and the thoroughness of Unicode coverage vary by software and operating system; some software will search for a suitable glyph in all of the installed fonts, while other software searches only within certain fonts.

Unicode characters are distinguished by code points, which are conventionally represented by "U+" followed by four, five or six hexadecimal digits, for example U+00AE or U+1D310. Characters in the Basic Multilingual Plane (BMP), containing modern scripts (including many Chinese and Japanese characters) and many symbols, have a 4-digit code. Historic scripts, but also many modern symbols and pictographs (such as emoticons, emojis, playing cards and many CJK characters), have 5-digit codes.
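The "U+" notation and the distinction between 4-digit and 5-digit codes can be sketched as follows. This is an illustrative Python snippet only; the function names `u_notation` and `in_bmp` are invented for the example.

```python
def u_notation(char: str) -> str:
    """Format a character's code point as "U+" plus at least four hex digits."""
    return f"U+{ord(char):04X}"


def in_bmp(char: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane (U+0000 to U+FFFF)."""
    return ord(char) <= 0xFFFF


for ch in ("\u00ae", "\U0001d310"):
    print(u_notation(ch), "BMP" if in_bmp(ch) else "supplementary plane")
# U+00AE BMP
# U+1D310 supplementary plane
```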