Unicode is a computing standard for the consistent encoding of text in most of the world's writing systems. The latest version of Unicode contains more than 110,000 characters covering 100 scripts. In addition to this character set, the standard defines character encodings and rules for processing and rendering text. It is maintained by the Unicode Consortium.
Unicode can be implemented by different character encodings. The most commonly used are UTF-8 and UTF-16; the older UCS-2 is now obsolete. UTF-8 uses one byte for each ASCII character, which has the same code value in UTF-8 as in ASCII, and two to four bytes for every other character. UCS-2 uses a single 16-bit code unit (two 8-bit bytes) for each character, so it can represent only the 65,536 code values from U+0000 to U+FFFF and cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2 by using a pair of 16-bit code units (four 8-bit bytes in total), known as a surrogate pair, for each of the additional characters.
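The size differences described above can be checked with a short Python sketch. It relies only on the built-in str.encode method; the helper names encoded_sizes and fits_in_ucs2 are hypothetical and chosen for illustration, and the UCS-2 check is a simplification that only tests whether the code point fits in one 16-bit unit.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-be" is used so that no byte order mark is added to the output.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


def fits_in_ucs2(ch):
    """Simplified check: True if the code point fits in one 16-bit unit."""
    return ord(ch) <= 0xFFFF


# Sample characters: Latin capital A, e with acute accent, the euro sign,
# and an emoji whose code point lies above U+FFFF.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(
        f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, "
        f"UTF-16 {utf16_len} bytes, fits in UCS-2: {fits_in_ucs2(ch)}"
    )
```

Under these assumptions, the ASCII letter takes one byte in UTF-8, while the emoji takes four bytes in both UTF-8 and UTF-16 and has no UCS-2 representation.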