NIF builds on current best practices for counting the characters of a string and for computing offsets. The relevant documents are:
Section 2.4, Code Points and Characters, of the Unicode Standard
Section 2 of RFC 5147 (in NIF, every code point of a newline must be counted, so a CR LF sequence counts as two)
ISO 24612:2012 - Language resource management -- Linguistic annotation framework (LAF):
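As a rough illustration of counting by code points, the following Python sketch measures a string and derives offsets from it. The helper name `code_point_offsets` is hypothetical, and zero-based, end-exclusive offsets are an assumption made for this example rather than something the documents above are quoted as saying. It relies on the fact that Python 3 strings are indexed by code point.

```python
from typing import Tuple


def code_point_offsets(text: str, substring: str) -> Tuple[int, int]:
    """Return (begin, end) offsets of the first occurrence of substring.

    Hypothetical helper: offsets are counted in Unicode code points,
    zero-based and end-exclusive (an assumption for this sketch).
    """
    begin = text.index(substring)  # str.index works on code points
    return begin, begin + len(substring)


# 'A', MUSICAL SYMBOL G CLEF (outside the BMP), CR, LF, 'B'
text = "A\U0001D11E\r\nB"

# Code point count: the G clef is one code point, CR and LF are two.
length = len(text)

# For comparison: UTF-16 code units, where the G clef needs a surrogate pair.
utf16_units = len(text.encode("utf-16-le")) // 2

begin, end = code_point_offsets(text, "B")
print(length, utf16_units, begin, end)
```

Here the code point length is 5 while the UTF-16 code unit count is 6, which is why offsets computed in environments that index strings by UTF-16 code units may need converting before they are compared with code point offsets.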