In UTF-16 (the encoding used for JavaScript strings), code units are 16-bit values. This means that operations such as indexing into a string or getting the length of a string operate on these 16-bit units. These units do not always map one-to-one onto what we might consider characters.
For example, characters with diacritics such as accents can sometimes be represented using two Unicode code points:
```js
const myString = "\u006E\u0303";

console.log(myString); // ñ
console.log(myString.length); // 2
```
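By contrast, the precomposed form of the same character, U+00F1, is a single code point, and `normalize()` can compose the two-code-point form into it. The following sketch (not part of the original example) illustrates the difference:

```js
// Sketch: the precomposed form U+00F1 is a single code point (and code unit),
// and normalize("NFC") composes the decomposed two-code-point form into it.
const precomposed = "\u00F1";
console.log(precomposed); // ñ
console.log(precomposed.length); // 1
console.log("\u006E\u0303".normalize("NFC") === precomposed); // true
```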
Also, because not all code points defined by Unicode fit into 16 bits, many Unicode code points are encoded as a pair of UTF-16 code units, called a surrogate pair:
```js
const face = "🥵";

console.log(face.length); // 2
```
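If you need to work with whole code points rather than code units, the string iterator (used by `for...of`, spread syntax, and `Array.from()`) walks the string one code point at a time. A brief sketch:

```js
// Sketch: the string iterator yields one element per code point,
// so a surrogate pair counts as a single item.
const face = "🥵";
console.log([...face].length); // 1
console.log(Array.from(face).length); // 1
```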
The `codePointAt()` method of the JavaScript `String` object enables you to retrieve the Unicode code point from its encoded form:
```js
const face = "🥵";

console.log(face.codePointAt(0)); // 129397
```
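Going the other way, `String.fromCodePoint()` builds a string from a code point value, and `for...of` lets you read the code point of each character in turn. A small sketch:

```js
// Sketch: round-tripping between a code point value and its string form.
console.log(String.fromCodePoint(129397)); // 🥵

for (const ch of "🥵") {
  console.log(ch.codePointAt(0)); // 129397
}
```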