Hyper Dictionary


Meaning of UTF-8

 

Computing Dictionary
 
 Definition: 

(UCS Transformation Format 8) An ASCII-compatible multibyte encoding of Unicode and UCS, used by Java and Plan 9.

The Unicode character set was originally designed around a 16-bit code space. The most obvious encoding of that space (known as UCS-2) is a sequence of 16-bit words. Such strings can contain bytes such as '\0' or '/', which have a special meaning in filenames and in other C library function parameters. In addition, the majority of Unix tools expect ASCII files and cannot read 16-bit words as characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc.
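As a rough illustration of the byte problem described above, the following Python sketch encodes two characters as 16-bit words and checks for the troublesome bytes. Python has no codec named UCS-2, so the sketch assumes that the "utf-16-be" codec is an acceptable stand-in (for characters in the 16-bit range the two produce the same bytes). The helper name ucs2_bytes is hypothetical.

```python
def ucs2_bytes(text: str) -> bytes:
    # Assumption: "utf-16-be" stands in for UCS-2, which Python does not
    # offer as a codec; the bytes match for 16-bit characters.
    return text.encode("utf-16-be")

# Plain ASCII "A" (U+0041) becomes the two bytes 0x00 0x41.
plain = ucs2_bytes("A")

# U+012F (LATIN SMALL LETTER I WITH OGONEK) becomes 0x01 0x2F,
# and 0x2F is the ASCII code for '/'.
tricky = ucs2_bytes("\u012f")

has_nul = 0x00 in plain      # a NUL byte would end a C string early
has_slash = 0x2F in tricky   # a '/' byte would look like a path separator

print(plain, tricky, has_nul, has_slash)
```

Either byte would be misread by C library functions that treat their arguments as NUL-terminated, '/'-separated byte strings.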

The ISO 10646 Universal Character Set (UCS), a superset of Unicode, occupies a 31-bit code space, and the obvious UCS-4 encoding for it (a sequence of 32-bit words) has the same problems.

The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings: an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and every byte of a non-ASCII character is guaranteed to have its most significant bit (0x80) set. Bytes such as '\0' and '/' therefore only ever stand for themselves, which means that normal tools for text searching etc. work as expected.
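The two properties above can be checked with a short Python sketch. It is only an illustration: the sample characters, the path, and the helper name all_high_bit are arbitrary choices made here, not part of any specification.

```python
# Property 1: ASCII text is byte-for-byte unchanged by UTF-8.
ascii_text = "/usr/lib"
same = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

def all_high_bit(ch: str) -> bool:
    # True if every byte of the UTF-8 form has the 0x80 bit set.
    return all(b & 0x80 for b in ch.encode("utf-8"))

# Property 2: every byte of a non-ASCII character has the high bit set.
# Samples: U+00E9 (2 bytes), U+20AC (3 bytes), U+4E2D (3 bytes).
samples = ["\u00e9", "\u20ac", "\u4e2d"]
high = [all_high_bit(c) for c in samples]

# Consequence: a byte-oriented tool can split a UTF-8 path on b"/"
# without cutting through the middle of a multibyte character.
path = "/home/zo\u00eb/\u65e5\u672c\u8a9e.txt".encode("utf-8")
parts = path.split(b"/")

print(same, high, len(parts), 0x00 in path)
```

Splitting on the byte b"/" gives the same components a character-aware split would, because no multibyte sequence can contain an ASCII byte.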

UTF-8 is defined in RFC 2279 (since obsoleted by RFC 3629).

["File System Safe UCS Transformation Format (FSS_UTF)", X/Open Preliminary Specification, X/Open Company Ltd., Document Number: P316. This information also appears in ISO/IEC 10646, Annex P].

Plan 9 utf manual entry.

 

 

COPYRIGHT © 2000-2013 HYPERDICTIONARY.COM