Suppose you want to encode the Australian flag. You might expect it to be one simple emoji character, but you may be in for a surprise: emojis aren't always represented as a single character, and many are combinations of multiple Unicode code points.

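For instance, the Australian flag is represented as two code points, U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A) and U+1F1FA (REGIONAL INDICATOR SYMBOL LETTER U). Both lie outside the range U+0000 to U+FFFF, so each needs five hex digits rather than four.
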
On some platforms this pair displays as a single glyph, the Australian flag, but elsewhere it may display as two separate glyphs (the regional indicator letters A and U).
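You can check this yourself with a short Python sketch (Python is my choice here, not something the original relies on). Python strings are sequences of code points, so `len()` counts code points rather than glyphs. The `flag_from_country_code` helper is hypothetical and illustrative only; it assumes the regional indicators for A to Z are contiguous starting at U+1F1E6.

```python
# The Australian flag is a pair of regional indicator symbols: "A" + "U".
flag = "\U0001F1E6\U0001F1FA"

# len() counts code points, not rendered glyphs.
print(len(flag))                          # 2
print([f"U+{ord(c):04X}" for c in flag])  # ['U+1F1E6', 'U+1F1FA']

# Regional indicators A..Z occupy U+1F1E6..U+1F1FF, so a two-letter
# country code maps letter by letter (hypothetical helper, for illustration).
REGIONAL_INDICATOR_A = 0x1F1E6

def flag_from_country_code(code: str) -> str:
    """Build a flag sequence from a two-letter code such as "AU"."""
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code.upper())

print(flag_from_country_code("AU") == flag)  # True
```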

However, some ways of representing text can only escape code points that fit in four hex digits, those in the range U+0000 to U+FFFF (the Basic Multilingual Plane). JSON's `\uXXXX` escape is one of these. When we escape a code point above U+FFFF (which, note, does not need to be escaped under the JSON spec), we get two escapes, a UTF-16 surrogate pair, which looks like two different code points. Quoth RFC 7159:

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a 12-character sequence,
encoding the UTF-16 surrogate pair.  So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".

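The surrogate-pair rule the RFC describes can be sketched in a few lines of Python. This is a minimal illustration under my own naming (`to_surrogate_pair` and `json_escape` are hypothetical helpers, not part of any library): subtract 0x10000 to get a 20-bit value, then put the top 10 bits in a high surrogate starting at 0xD800 and the bottom 10 bits in a low surrogate starting at 0xDC00.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"U+{code_point:04X} is not outside the BMP")
    offset = code_point - 0x10000    # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low

def json_escape(code_point: int) -> str:
    """Escape one code point as \\uXXXX, using a surrogate pair above U+FFFF."""
    if code_point <= 0xFFFF:
        return f"\\u{code_point:04X}"
    high, low = to_surrogate_pair(code_point)
    return f"\\u{high:04X}\\u{low:04X}"

print(json_escape(0x1D11E))  # \uD834\uDD1E (the G clef from the RFC)
print("".join(json_escape(ord(c)) for c in "\U0001F1E6\U0001F1FA"))
# \uD83C\uDDE6\uD83C\uDDFA
```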
So, concretely, the Australian flag may be represented in escaped JSON as this string:

"\ud83c\udde6\ud83c\uddfa"
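If you'd rather not escape by hand, Python's standard `json` module (again my choice of tool, not the original's) shows both behaviours. With the default `ensure_ascii=True`, every non-ASCII code point is escaped with lowercase hex, and code points above U+FFFF become surrogate pairs. With `ensure_ascii=False`, the flag is written as-is, which JSON also permits.

```python
import json

flag = "\U0001F1E6\U0001F1FA"

# Default ensure_ascii=True: each code point above U+FFFF becomes a surrogate pair.
escaped = json.dumps(flag)
print(escaped)  # "\ud83c\udde6\ud83c\uddfa"

# ensure_ascii=False writes the raw characters, which is equally valid JSON.
print(json.dumps(flag, ensure_ascii=False) == f'"{flag}"')  # True

# Decoding joins each surrogate pair back into a single code point.
decoded = json.loads(escaped)
print(decoded == flag, len(decoded))  # True 2
```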

As you can see, the last two hex digits of the second escape in each pair (e6, fa) match the last two hex digits of the code points above (U+1F1E6, U+1F1FA).
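That match isn't a coincidence. Writing $c$ for a code point above U+FFFF, and $h$ and $l$ for its high and low surrogates, UTF-16 defines:

$$
\begin{aligned}
h &= \mathtt{0xD800} + \left\lfloor \frac{c - \mathtt{0x10000}}{\mathtt{0x400}} \right\rfloor \\
l &= \mathtt{0xDC00} + \left( (c - \mathtt{0x10000}) \bmod \mathtt{0x400} \right)
\end{aligned}
$$

Both $\mathtt{0x10000}$ and $\mathtt{0xDC00}$ are multiples of $\mathtt{0x400}$, so $l \equiv c \pmod{\mathtt{0x400}}$. The low 10 bits of $l$ therefore equal the low 10 bits of $c$, which include the last two hex digits. For U+1F1E6, $(\mathtt{0x1F1E6} - \mathtt{0x10000}) \bmod \mathtt{0x400} = \mathtt{0x1E6}$, giving $l = \mathtt{0xDDE6}$.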