Code Unit: What It Is, How It Works & How to Use It

In UTF-16 (the encoding system used for JavaScript strings), code units are 16-bit values. This means that operations such as indexing into a string or getting the length of a string operate on these 16-bit units. These units do not always map one-to-one onto what we might consider characters.

For example, characters with diacritics such as accents can sometimes be represented using two Unicode code points:

                                    
    const myString = "\u006E\u0303";
    console.log(myString); // ñ
    console.log(myString.length); // 2
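To see why the length is 2, it can help to look at the individual code units. The following is a minimal sketch that uses the same string as the example above; the `units` array is only an illustrative helper, not part of any API.

```javascript
// Same string as above: "n" (U+006E) followed by a combining tilde (U+0303).
const myString = "\u006E\u0303";

// Indexing returns a single code unit, not the whole visible character.
console.log(myString[0]); // n

// charCodeAt() returns the numeric value of the code unit at a given index.
const units = [];
for (let i = 0; i < myString.length; i++) {
  units.push(myString.charCodeAt(i).toString(16));
}
console.log(units.join(" ")); // 6e 303
```

Each code point here fits in one 16-bit code unit, so the string has two code units even though it renders as one character.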

Also, since not all of the code points defined by Unicode fit into 16 bits, many Unicode code points are encoded as a pair of UTF-16 code units, which is called a surrogate pair.
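As a rough illustration of a surrogate pair, the sketch below inspects the two code units behind a single emoji and recombines them by hand using the standard UTF-16 decoding arithmetic. The variable names `high`, `low`, and `codePoint` are chosen for this example only.

```javascript
const face = "🥵"; // a single code point, U+1F975

// One visible character, but two UTF-16 code units.
console.log(face.length); // 2

// The two halves of the surrogate pair.
const high = face.charCodeAt(0);
const low = face.charCodeAt(1);
console.log(high.toString(16), low.toString(16)); // d83e dd75

// Recombine the pair manually: each surrogate carries 10 bits of the
// code point, offset by 0x10000.
const codePoint = (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
console.log(codePoint.toString(16)); // 1f975
```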

The codePointAt() method of the JavaScript String object enables you to retrieve the Unicode code point from its encoded form:

                                    
    const face = "🥵";
    console.log(face.codePointAt(0)); // 129397
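When a string needs to be processed by code point rather than by code unit, string iteration is one option, since the string iterator steps through code points. The sketch below assumes a hypothetical sample string `text` and contrasts the two views.

```javascript
const text = "ab🥵";

// length counts code units; spreading the string counts code points.
console.log(text.length); // 4
console.log([...text].length); // 3

// for...of visits one code point at a time, keeping surrogate pairs intact.
for (const ch of text) {
  console.log(ch, ch.codePointAt(0));
}
// a 97
// b 98
// 🥵 129397

// String.fromCodePoint() goes the other way, from number to string.
console.log(String.fromCodePoint(129397) === "🥵"); // true

// codePointAt() still takes a code unit index: index 3 lands on the
// second half of the pair, so only that lone surrogate is returned.
console.log(text.codePointAt(3)); // 56693
```

Note that a code point is still not always a full visible character: the combining sequence shown earlier remains two code points under this kind of iteration.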
