Kana Transliterations

This specification defines how to implement general-purpose Kana transliterators.

Conformance

A conforming implementation MUST use the steps to convert a string into Katakana to implement Hiragana to Katakana conversion, and MUST use the steps to convert a string into Hiragana to implement Katakana to Hiragana conversion.

Hiragana to Katakana

To convert a string into Katakana with string input, run these steps:

  1. Return the result of running the steps to transliterate a string input with the Hiragana to Katakana table.

Katakana to Hiragana

To convert a string into Hiragana with string input, run these steps:

  1. Return the result of running the steps to transliterate a string input with the Katakana to Hiragana table.
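
The two conversions above differ only in the table they pass to the transliteration steps. A minimal Python sketch of that relationship follows. The names to_katakana, to_hiragana, H2K_SAMPLE and K2H_SAMPLE are hypothetical, and the three-entry sample tables are illustrative excerpts only; a real implementation would use the full tables from the data file.

```python
# Illustrative excerpts only (assumption): the normative tables are much larger.
H2K_SAMPLE = {"あ": "ア", "か": "カ", "な": "ナ"}
K2H_SAMPLE = {"ア": "あ", "カ": "か", "ナ": "な"}


def transliterate(text: str, table: dict[str, str]) -> str:
    """Map each code point through the table; unmapped code points pass through."""
    return "".join(table.get(char, char) for char in text)


def to_katakana(text: str) -> str:
    """Convert a string into Katakana using the (sample) Hiragana to Katakana table."""
    return transliterate(text, H2K_SAMPLE)


def to_hiragana(text: str) -> str:
    """Convert a string into Hiragana using the (sample) Katakana to Hiragana table."""
    return transliterate(text, K2H_SAMPLE)
```

With these sample tables, to_katakana("かな") yields "カナ", and characters without an entry, such as ASCII punctuation, are left unchanged.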

Normalization

To normalize a Kana string input, run these steps:

  1. Let output be the empty string.
  2. Let length be the length of input.
  3. Let mapping table be the Normalization table.
  4. Let i be zero.
  5. While i is less than length:
    1. Let char be the ith code point within input.
    2. Let mark be the empty string.
    3. If i + 1 is less than length:
      1. Set mark to the (i + 1)th code point within input.
    4. If mark is not the empty string and mapping table [ char followed by mark ] exists:
      1. Append mapping table [ char followed by mark ] to output.
      2. Increment i by two.
    5. Otherwise, if mapping table [ char ] exists:
      1. Append mapping table [ char ] to output.
      2. Increment i by one.
    6. Otherwise:
      1. Append char to output.
      2. Increment i by one.
  6. Return output.
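
The normalization steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the function name normalize_kana is hypothetical, and the entries in NORMALIZATION_SAMPLE are invented for the example and may not match the actual Normalization table.

```python
# Hypothetical sample entries (assumption), written with escapes for clarity:
#   U+FF76 U+FF9E (halfwidth KA + halfwidth voiced sound mark) -> U+30AC (GA)
#   U+FF76        (halfwidth KA)                               -> U+30AB (KA)
NORMALIZATION_SAMPLE = {
    "\uFF76\uFF9E": "\u30AC",
    "\uFF76": "\u30AB",
}


def normalize_kana(text: str, table: dict[str, str]) -> str:
    """Prefer a two-code-point match, then a single one, else copy the code point."""
    output = []
    length = len(text)  # Python str length is counted in code points
    i = 0
    while i < length:
        char = text[i]
        # mark stays empty when char is the last code point of the input
        mark = text[i + 1] if i + 1 < length else ""
        if mark and (char + mark) in table:
            output.append(table[char + mark])
            i += 2
        elif char in table:
            output.append(table[char])
            i += 1
        else:
            output.append(char)
            i += 1
    return "".join(output)
```

The pair lookup is tried first so that a base character followed by a mark is replaced as a unit rather than character by character.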

Algorithm

To transliterate a string input with mapping table, run these steps:

  1. Let output be the empty string.
  2. For each code point char in input:
    1. If mapping table [ char ] exists:
      1. Append mapping table [ char ] to output.
    2. Otherwise:
      1. Append char to output.
  3. Return output.

A mapping table [ char ] value can consist of more than one code point.
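
As a rough Python sketch of the transliteration steps, including a value longer than one code point: the function name transliterate and the table SAMPLE_TABLE are hypothetical, and the second entry is invented purely to show a multi-code-point value; it is not claimed to be present in the real tables.

```python
# Hypothetical entries (assumption); the second maps one code point to two.
SAMPLE_TABLE = {
    "あ": "ア",
    "ゟ": "ヨリ",
}


def transliterate(text: str, table: dict[str, str]) -> str:
    """Replace each code point that has an entry; copy every other code point."""
    output = []
    for char in text:  # iterating a Python str visits one code point at a time
        if char in table:
            # The value is appended whole, however many code points it has.
            output.append(table[char])
        else:
            output.append(char)
    return "".join(output)
```

Because values are appended whole, the output can be longer than the input when a table entry expands one code point into several.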

Tables

Tables referenced from this specification are defined in the maps.json data file https://github.com/manakai/data-chars/blob/master/data/maps.json (documentation: https://github.com/manakai/data-chars/blob/master/doc/maps.txt), which is a normative part of this specification.

The JSON data file contains several mapping tables, each identified by a name and containing entries from code points to code points. For the purpose of this specification, they are considered as maps of string to string entries with the following names:

  Name in JSON data file    Name in this specification
  kana:h2k                  Hiragana to Katakana table
  kana:k2h                  Katakana to Hiragana table
  kana:normalization        Normalization table

Any other mapping tables in the JSON data file are not used by this specification.
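
To illustrate how code point entries might be turned into the string to string maps described above, here is a Python sketch. It assumes a simplified, hypothetical JSON layout in which each named table is an object whose keys and values are space-separated hexadecimal code points; the real layout of the data file is described in its documentation and may differ. The names SPEC_NAMES, hex_to_string, build_tables and SAMPLE_JSON are likewise hypothetical.

```python
import json

# Names from the JSON data file mapped to the names used in this specification.
SPEC_NAMES = {
    "kana:h2k": "Hiragana to Katakana table",
    "kana:k2h": "Katakana to Hiragana table",
    "kana:normalization": "Normalization table",
}


def hex_to_string(hex_seq: str) -> str:
    """Turn e.g. "FF76 FF9E" into the two-code-point string it denotes."""
    return "".join(chr(int(part, 16)) for part in hex_seq.split())


def build_tables(raw_maps: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Keep only the three tables this specification uses, keyed by spec name."""
    tables = {}
    for json_name, spec_name in SPEC_NAMES.items():
        entries = raw_maps.get(json_name, {})
        tables[spec_name] = {
            hex_to_string(key): hex_to_string(value)
            for key, value in entries.items()
        }
    return tables


# Hypothetical sample data in the assumed layout, not an excerpt of the real file.
SAMPLE_JSON = """
{
  "kana:h2k": {"3042": "30A2"},
  "kana:k2h": {"30A2": "3042"},
  "kana:normalization": {"FF76 FF9E": "30AC"},
  "other:table": {"0041": "0061"}
}
"""

tables = build_tables(json.loads(SAMPLE_JSON))
```

Tables not listed in SPEC_NAMES, such as the made-up "other:table" above, are ignored by this sketch.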

These tables will be updated when new Kana characters are added to the Unicode Standard. Implementations SHOULD be prepared to update their tables accordingly.

References

This specification depends on the Infra Standard https://infra.spec.whatwg.org/. The terms for each, while, code point, length, string, append, ordered map, entry, and exists are defined by the Infra Standard.

Test data

This section is non-normative.

There are test data:

License

Per CC0 https://creativecommons.org/publicdomain/zero/1.0/, to the extent possible under law, the author of this specification has waived all copyright and related or neighboring rights to this specification.

Notes

This section is non-normative.

See 平仮名と片仮名 ("Hiragana and Katakana") for rationale.