Japanese typing in QMK firmware

I implemented a Japanese input method in my keyboard firmware, which is based on QMK.

It’s used to write in katakana and hiragana (syllable-based writing systems for Japanese) using Roman / Latin letters on my keyboard.

How kana works

Hiragana and katakana (collectively kana) are syllabaries, writing systems where a consonant-vowel syllable is written as a single character.

A common way to type kana on a regular keyboard is romaji, using Roman letters to produce kana. For example, sushi = す (su) + し (shi).

Kana are usually organised as a table with vowels as columns and consonants as rows:

	A	I	U	E	O
K	か (ka)	き (ki)	く (ku)	け (ke)	こ (ko)
S	さ (sa)	し (si/shi)	す (su)	せ (se)	そ (so)
T	た (ta)	ち (ti/chi)	つ (tu/tsu)	て (te)	と (to)
⋮	⋮	⋮	⋮	⋮	⋮

This tabular structure inspired the core idea behind my implementation.

The map

The core idea is a 2D lookup table. Since flash memory (program space) is severely limited on keyboard microcontrollers, I wanted to represent all romaji rules in a single compact table.

Columns are the five vowels. Rows cover every Roman letter, including those that don’t make sense for Japanese, to keep the indexing logic simple and generalised.

static char MAP[ROW_COUNT][5*3] = {
  // A I U E O
  "あああああ", // A (vowel)
  "ばびぶべぼ", // B
  "？ち？？？", // C
  "だ？づでど", // D
  "えええええ", // E (vowel)
  "？？ふ？？", // F
  "がぎぐげご", // G
  "はひふへほ", // H
  "いいいいい", // I (vowel)
  "？じ？？？", // J
  "かきくけこ", // K
  "らりるれろ", // L
  "まみむめも", // M
  "なにぬねの", // N
  "おおおおお", // O (vowel)
  "ぱぴぷぺぽ", // P
  "？？く？？", // Q
  "らりるれろ", // R
  "さしすせそ", // S
  "た？つてと", // T
  "ううううう", // U (vowel)
  "？？ゔ？？", // V
  "わ？う？を", // W
  "ぁぃぅぇぉ", // X (small vowels)
  "やいゆえよ", // Y
  "ざじずぜぞ", // Z
};

Every letter in the alphabet has a row in the map, even if it isn't used for kana, to keep indexing simple. Since QMK assigns the letter keycodes alphabetically in a contiguous block (KC_A to KC_Z), MAP[pressed_keycode - KC_A] gives the row for any pressed letter key. The offset is just the letter's position in the alphabet.
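As a rough sketch of that offset calculation outside of QMK: the DEMO_KC_* constants and row_for_keycode() below are made-up names for illustration, with values that mirror the USB HID usage IDs behind KC_A and KC_Z. Real firmware would use QMK's own KC_A and KC_Z.

```c
#include <stdint.h>

// Demo constants only (assumption for this sketch): they mirror the USB HID
// usage IDs that KC_A and KC_Z correspond to.
enum { DEMO_KC_A = 0x04, DEMO_KC_Z = 0x1D };

// Row index for a letter keycode, or -1 for any other key.
static int row_for_keycode(uint16_t keycode) {
  if (keycode < DEMO_KC_A || keycode > DEMO_KC_Z) return -1;
  return (int)(keycode - DEMO_KC_A); // 0 for A ... 25 for Z
}
```

The range check is not strictly part of the idea, but it keeps a stray non-letter key from indexing outside the table.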

For letter combinations that don't map to anything in Japanese, like Q,A, let’s mark them with ？ for now. More on special values later.

// A I U E O
  "？？く？？", // Q

The X row is reused to input small vowels ぁぃぅぇぉ (there is no x consonant in Japanese).

// A I U E O
  "ぁぃぅぇぉ", // X (small vowels)

Those are the basics of the map lookup. Essentially, a typed consonant and a vowel give us the character for that syllable.

But you might be wondering how a 2D table of Unicode characters even works in practice. What about endianness? Doesn't Unicode use variable-length encoding? How would indexing into that work?

UTF-8

The QMK firmware framework only supports UTF-8 for Unicode output, so that's the encoding I needed to work with.

UTF-8 is a variable-length encoding spanning the full 21-bit Unicode range U+0000–U+10FFFF. The number of bytes each character takes up depends on the character range:

Code point range	Bytes in UTF-8
`U+0000`–`U+007F`	1
`U+0080`–`U+07FF`	2
`U+0800`–`U+FFFF`	3
`U+10000`–`U+10FFFF`	4
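The ranges in the table can be written as a small function. This is only an illustration of the encoding rule (utf8_len is a hypothetical helper, not something QMK provides), and it skips surrogate checks to stay short.

```c
#include <stdint.h>

// Number of bytes UTF-8 needs for a code point, or 0 if it is out of range.
// Surrogates (U+D800 to U+DFFF) are not rejected here, to keep the sketch short.
static int utf8_len(uint32_t cp) {
  if (cp <= 0x7F)     return 1;
  if (cp <= 0x7FF)    return 2;
  if (cp <= 0xFFFF)   return 3;
  if (cp <= 0x10FFFF) return 4;
  return 0;
}
```

Calling utf8_len(0x3042) for あ or utf8_len(0x30FF) for the last katakana code point both give 3.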

Kana characters fall in the U+3040 to U+30FF range, so every kana character is exactly 3 bytes in UTF-8.

This means every row in the map, which is 5 kana characters, is consistently 15 bytes long. The table is evenly laid out in memory! Any character can be addressed with simple index arithmetic:

// kana_ptr points to the kana character (3 bytes)
char* kana_ptr = &MAP[letter_index][vowel_index * 3];

Getting a pointer to く (ku), for example, is simply &MAP['k' - 'a'][2 * 3], where 2 is the vowel index of U.

[Figure: byte layout of MAP]

Finally, to send a kana character in QMK from a consonant and vowel keypress:

// Get indices
uint8_t letter_index = prev_letter_keycode - KC_A;
uint8_t vowel_index = vowel_index_for(curr_letter_keycode);
// Get pointer to the kana character
const char* kana_ptr = &MAP[letter_index][vowel_index * 3];
// Copy 3 bytes into a buffer; the 4th byte stays '\0' as the terminator
static char buf[4] = {0};
memcpy(buf, kana_ptr, 3);
// Send as a null-terminated UTF-8 string
send_unicode_string(buf);
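The lookup can be tried on a desktop without QMK. The sketch below works on plain letters instead of keycodes and on a single 15-byte row; vowel_index_for_letter() and kana_from_row() are names invented here, and the real vowel_index_for() may well look different.

```c
#include <string.h>

// Column of a vowel letter in the order A, I, U, E, O; -1 if not a vowel.
static int vowel_index_for_letter(char c) {
  const char *vowels = "aiueo";
  const char *p = (c != '\0') ? strchr(vowels, c) : NULL;
  return p ? (int)(p - vowels) : -1;
}

// Copies the 3-byte kana in the given vowel column of a 15-byte row into
// out and NUL-terminates it. Returns 0 on success, -1 for a non-vowel.
static int kana_from_row(const char *row, char vowel, char out[4]) {
  int col = vowel_index_for_letter(vowel);
  if (col < 0) return -1;
  memcpy(out, row + col * 3, 3);
  out[3] = '\0';
  return 0;
}
```

For example, kana_from_row("かきくけこ", 'u', out) leaves く in out.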

Independent vowels

It’s not always a consonant-vowel pair. If the previous letter wasn’t a consonant, the current syllable is just a vowel, so we can skip the syllable processing logic and output the vowel's kana directly from &MAP[letter_index][0]. The table also tells us whether a letter is a consonant or a vowel, so there’s no separate hardcoded list of consonants.

static char MAP[ROW_COUNT][5*3] = {
  // A I U E O
  "あああああ", // A ← vowels, just output any of these
  // ⋮
};
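The post doesn't show how the table is used to tell vowels from consonants. One possible way, sketched here as an assumption, is to treat a row as a vowel row when all five of its columns hold the same kana. row_is_vowel() is a hypothetical helper.

```c
#include <stdbool.h>
#include <string.h>

// True if all five 3-byte columns of a 15-byte row hold the same kana,
// which is how the vowel rows (A, I, U, E, O) are filled in.
static bool row_is_vowel(const char *row) {
  for (int col = 1; col < 5; col++) {
    if (memcmp(row, row + col * 3, 3) != 0) return false;
  }
  return true;
}
```

With this check the X row (small vowels) and every consonant row count as non-vowels, since their columns differ.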

Special rules

Romaji doesn't always map two letters to one kana. For example, shi in sushi is three letters but a single character (し, the second kana of すし). There are a few of these special rules.

ch and sh

Sometimes two letters represent a single ‘consonant’, like sh and ch. The fix is to resolve the key sequences S,H and C,H to extra ‘virtual consonant’ rows appended at the end of the table. This doesn’t affect the indexing of the real letter rows.

static char MAP[ROW_COUNT][5*3] = {
  // A I U E O
  // ⋮
  "ざじずぜぞ", // Z
  [ROW_EXTRA_CH] = "？ち？？？",
  [ROW_EXTRA_SH] = "？し？？？",
};

With this, typing shi outputs し and chi outputs ち as expected.
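A minimal sketch of the row resolution, working on letters rather than keycodes. resolve_row() is a made-up name, and the positions of the two extra rows are assumed for this sketch only.

```c
// Row positions assumed for this sketch: the virtual rows come right after Z.
enum {
  ROW_EXTRA_CH = 'z' - 'a' + 1,
  ROW_EXTRA_SH,
};

// Row to use for the consonant(s) typed before a vowel.
// prev_prev is the letter typed before prev, or 0 if there was none.
static int resolve_row(char prev_prev, char prev) {
  if (prev == 'h' && prev_prev == 'c') return ROW_EXTRA_CH;
  if (prev == 'h' && prev_prev == 's') return ROW_EXTRA_SH;
  return prev - 'a'; // an ordinary letter row
}
```

A plain h still resolves to the H row, so ha, hi, hu, he, and ho are unaffected.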

ja, ju, jo

Some sounds map two Roman letters to two kana characters. For example, ja is じゃ: じ (ji) followed by a small や (ya). Similarly, ju and jo become じゅ and じょ, respectively. This class of sounds is called yōon, where a syllable ending in i blends into a ya, yu, or yo sound.

These get a special marker in the table. Here, it’s a fullwidth Ｙ (also 3 bytes) to signal special handling:

static char MAP[ROW_COUNT][5*3] = {
  // A I U E O
  // ⋮
  "ＹちＹ？Ｙ", // C
  // ⋮
  "ＹじＹ？Ｙ", // J
  // ⋮
  [ROW_EXTRA_CH]    = "ＹちＹ？Ｙ",
  [ROW_EXTRA_SH]    = "ＹしＹ？Ｙ",
  [ROW_EXTRA_YOUON] = "ゃ　ゅ　ょ",
};

The small ya/yu/yo are stored in an extra row (in columns A, U, and O, respectively).

Note how the Ｙ rule integrates in the CH and SH rows seamlessly. Rule synergy!

When a lookup returns Ｙ, we know it’s a yōon form. Output the consonant's i form then the vowel’s corresponding small ya/yu/yo:

if (match(&MAP[letter_index][vowel_index * 3], "Ｙ")) {
  send_kana_char(&MAP[letter_index][vowel_offset(KC_I)]);
  send_kana_char(&MAP[ROW_EXTRA_YOUON][vowel_index * 3]);
}

That’s not all. We also need to catch the three-letter yōons, such as nya, nyu, and nyo:

if (
  is_consonant(prev_prev_keycode)
  && prev_consonant == KC_Y
  && !empty(&MAP[ROW_EXTRA_YOUON][vowel_offset(curr_vowel)])
) {
  send_kana_char(
    &MAP[prev_prev_keycode - KC_A][vowel_offset(KC_I)]);
  send_kana_char(
    &MAP[ROW_EXTRA_YOUON][vowel_offset(curr_vowel)]);
}

With this, we can type:

ja	じゃ
nya	にゃ
kyo	きょ
chu	ちゅ
sha	しゃ
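The two-letter case can be checked in isolation with a sketch like the one below. It works on one row at a time and writes into a buffer instead of sending keystrokes; expand_cell(), ROW_J, and ROW_YOUON are illustrative names, not the firmware's.

```c
#include <string.h>

// Excerpt for the sketch: the J row and the small ya/yu/yo row.
// The gaps in ROW_YOUON are U+3000 (ideographic space), also 3 bytes.
static const char ROW_J[]     = "ＹじＹ？Ｙ";
static const char ROW_YOUON[] = "ゃ　ゅ　ょ";

// Writes the kana for (row, col) into out and returns how many kana were
// written: 2 for a yōon cell marked Ｙ, 1 otherwise.
static int expand_cell(const char *row, int col, char out[7]) {
  if (memcmp(row + col * 3, "Ｙ", 3) == 0) {
    memcpy(out, row + 1 * 3, 3);             // the row's I-column kana
    memcpy(out + 3, ROW_YOUON + col * 3, 3); // matching small ya/yu/yo
    out[6] = '\0';
    return 2;
  }
  memcpy(out, row + col * 3, 3);
  out[3] = '\0';
  return 1;
}
```

Passing the SH row "ＹしＹ？Ｙ" with column 2 (U) produces しゅ in the same way.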

fa, fi, fe, fo

Japanese has no native fa, fi, fe, or fo sounds, only fu (ふ). These foreign sounds are approximated by combining ふ with a small vowel. For example, fa, fi, and fe become ふぁ, ふぃ, and ふぇ, respectively (ファ, フィ, and フェ in katakana).

We can mark these cases with the special value Ｓ in the map:

// A I U E O
  "ＳＳふＳＳ", // F

When a lookup returns Ｓ, we need to find the base syllable for that row, which is the column that isn't marked special. For the F row that's the sole ふ (fu). We then output that base syllable followed by the small vowel from the X row:

if (match(&MAP[letter_index][vowel_index * 3], "Ｓ")) {
  uint16_t base_vowel = get_base_vowel(prev_keycode, curr_keycode);
  send_kana_char(
    &MAP[prev_keycode - KC_A][vowel_offset(base_vowel)]);
  send_kana_char(
    &MAP[ROW_SMALL_VOWELS][vowel_offset(curr_keycode)]);
}

The same mechanism handles other borrowed sounds like va → ゔぁ and di → でぃ, which are also marked with Ｓ in their respective rows.
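Here is a rough sketch of the Ｓ expansion for rows that have exactly one unmarked column, such as F and V. It is not the actual get_base_vowel(): rows with several unmarked columns (D, for instance) need an extra rule to pick the base, which this sketch does not attempt. base_column(), expand_small_vowel(), and the two row constants are invented names.

```c
#include <string.h>

static const char ROW_F[]            = "ＳＳふＳＳ";
static const char SMALL_VOWELS_ROW[] = "ぁぃぅぇぉ"; // contents of the X row

// Column of the first cell not marked Ｓ, or -1 if every cell is marked.
static int base_column(const char *row) {
  for (int col = 0; col < 5; col++) {
    if (memcmp(row + col * 3, "Ｓ", 3) != 0) return col;
  }
  return -1;
}

// For a cell marked Ｓ, writes the base kana followed by the small vowel
// into out. Returns 0 on success, -1 if the cell is not marked Ｓ.
static int expand_small_vowel(const char *row, int col, char out[7]) {
  int base = base_column(row);
  if (base < 0 || memcmp(row + col * 3, "Ｓ", 3) != 0) return -1;
  memcpy(out, row + base * 3, 3);
  memcpy(out + 3, SMALL_VOWELS_ROW + col * 3, 3);
  out[6] = '\0';
  return 0;
}
```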

That completes the generalised lookup table for romaji.

Final lookup table

// A-Z mapped to kana in UTF-8.
//
// Since kana is a syllabary, it's a 2D map:
//   1. Rows map to one Roman letter in ASCII order.
//   2. Columns map to vowels A, I, U, E, O, in that order.
//   3. Thus, a syllable is a row-column pair.
//
// Every char takes up exactly 3 bytes.
// Thus it's possible to address any syllable by its letter-vowel pair.
// - MAP[letter_idx][vowel_idx * 3]
//
// Special cases:
// - Vowels are the same for all columns
// - XA, XI, XU, XE, XO are mapped to small vowel kana
// - Ｙ indicates youon (never in the I column)
// - Ｓ indicates additional small vowel must be used
static char MAP[ROW_COUNT][5*3] = {
  // A I U E O
  "あああああ", // A (vowel)
  "ばびぶべぼ", // B
  "ＹちＹＳＹ", // C
  "だＳづでど", // D
  "えええええ", // E (vowel)
  "ＳＳふＳＳ", // F
  "がぎぐげご", // G
  "はひふへほ", // H
  "いいいいい", // I (vowel)
  "ＹじＹＳＹ", // J
  "かきくけこ", // K
  "らりるれろ", // L
  "まみむめも", // M
  "なにぬねの", // N
  "おおおおお", // O (vowel)
  "ぱぴぷぺぽ", // P
  "ＳＳくＳＳ", // Q
  "らりるれろ", // R
  "さしすせそ", // S
  "たＳつてと", // T
  "ううううう", // U (vowel)
  "ＳＳゔＳＳ", // V
  "わＳうＳを", // W
  "ぁぃぅぇぉ", // X (small vowels)
  "やいゆえよ", // Y
  "ざじずぜぞ", // Z
  [ROW_EXTRA_YOUON] = "ゃ　ゅ　ょ",
  [ROW_EXTRA_CH]    = "ＹちＹＳＹ",
  [ROW_EXTRA_SH]    = "ＹしＹＳＹ",
};

// enum keeps track of the virtual letter rows and the total row count
// (in the source file it has to be declared before MAP, which uses these names)
enum {
  ROW_EXTRA_YOUON = 'z' - 'a' + 1,
  ROW_EXTRA_CH,
  ROW_EXTRA_SH,
  ROW_COUNT
};

Actual special cases

Some rules don't fit neatly into the 2D map and are handled separately in logic.

Three-letter sequences

A couple of kana can be typed with three specific Roman letters. The only cases are tsu and dzu, for つ and づ. I wasn’t able to find a general pattern that fit these in with the rest, so they are detected explicitly before checking the map. If the input matches a known three-letter sequence, we skip the lookup and output the character directly.
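As an illustration, the explicit check can be as small as the following. three_letter_kana() is a hypothetical helper that works on letters and returns the kana directly, whereas the firmware resolves these cases to a row and vowel instead.

```c
#include <stddef.h>

// Returns the kana for a hard-coded three-letter sequence, or NULL if the
// three letters are not one of the known sequences.
static const char *three_letter_kana(char a, char b, char c) {
  if (a == 't' && b == 's' && c == 'u') return "つ";
  if (a == 'd' && b == 'z' && c == 'u') return "づ";
  return NULL;
}
```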

Double consonants

In romaji, a doubled consonant indicates a kind of glottal stop, written as a small っ. For example, nikki → に (ni) っ き (ki). When the same consonant is pressed twice in a row, we output っ and continue processing the second keypress normally.
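The detection itself might look like this sketch. is_sokuon() is a made-up name, and leaving out n is my assumption here, on the grounds that a doubled n more naturally belongs to the ん handling.

```c
#include <stdbool.h>

static bool is_vowel_letter(char c) {
  return c == 'a' || c == 'i' || c == 'u' || c == 'e' || c == 'o';
}

// True if typing curr right after prev should emit a small っ.
// Assumption: n is excluded, since "nn" is left to the ん handling.
static bool is_sokuon(char prev, char curr) {
  return prev == curr && curr >= 'a' && curr <= 'z'
      && !is_vowel_letter(curr) && curr != 'n';
}
```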

Standalone n

N (ん) is the only kana consonant that can stand alone without a vowel. But this creates an ambiguity: typing na should produce な (na), not ん (n) + あ (a). The solution is to buffer the n and wait for the next keypress. If a vowel or y follows, we proceed with the normal map lookup. If another consonant follows, or the user types n' explicitly, we output ん and move on.
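The decision for a buffered n can be written as a small pure function. This is a sketch over characters rather than keycodes, with invented names (n_action_t, resolve_pending_n); it only classifies the next key and leaves the actual output to the caller.

```c
typedef enum {
  N_LOOKUP,             // vowel or y follows: continue with the map lookup
  N_EMIT_AND_CONSUME,   // explicit n': output ん and swallow the apostrophe
  N_EMIT_AND_REPROCESS  // consonant follows: output ん, then handle that key
} n_action_t;

// Decides what to do with a buffered n once the next character arrives.
static n_action_t resolve_pending_n(char next) {
  switch (next) {
    case 'a': case 'i': case 'u': case 'e': case 'o':
    case 'y':
      return N_LOOKUP;
    case '\'':
      return N_EMIT_AND_CONSUME;
    default:
      return N_EMIT_AND_REPROCESS;
  }
}
```

So na and nya go through the table, while the n in nk turns into ん before the k is processed.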

Katakana via Shift

Holding Shift while typing converts the output to katakana instead of hiragana. Since the hiragana and katakana Unicode blocks share the same internal order, we can just offset the hiragana value by a fixed amount to get katakana. Just like adding ('A' - 'a') to an ASCII lowercase letter gives you the corresponding uppercase letter, adding an offset of (L'ア' - L'あ') code points to any hiragana maps it to katakana.

The problem is that we’re working with UTF-8 bytes, not code points directly. The offset has to be applied in code point space, so the character is decoded, shifted, and encoded again:

// converts one UTF-8 hiragana character to katakana in place
void to_katakana(char* kana) {
  // decode the low 12 bits of the code point from UTF-8
  // (the first byte is the same for hiragana and katakana, so it is unchanged)
  uint32_t codepoint = ((kana[1] & 0b00111111) << 6)
                      | (kana[2] & 0b00111111);
  // hiragana to katakana offset
  codepoint += (uint32_t)L'ア' - (uint32_t)L'あ';
  // encode back to UTF-8
  kana[1] = 0x80 | ((codepoint >> 6) & 0b00111111);
  kana[2] = 0x80 | ( codepoint       & 0b00111111);
}
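To check the arithmetic on a desktop, here is a standalone variant with a range guard added, so that table markers and the ideographic space pass through untouched. The guard and the name hiragana_to_katakana() are my additions for this sketch; the offset 0x30A2 - 0x3042 is ア minus あ, which is 0x60.

```c
#include <stdint.h>

// Converts one 3-byte UTF-8 hiragana character to katakana in place.
// Anything outside U+3041 to U+3096 is left unchanged.
static void hiragana_to_katakana(char *kana) {
  uint8_t *b = (uint8_t *)kana;
  if (b[0] != 0xE3) return; // U+3000 to U+3FFF all start with byte 0xE3
  // Low 12 bits of the code point, taken from the two continuation bytes
  uint16_t low12 = (uint16_t)(((b[1] & 0x3F) << 6) | (b[2] & 0x3F));
  if (low12 < 0x041 || low12 > 0x096) return; // not a hiragana letter
  low12 += 0x30A2 - 0x3042; // ア minus あ
  b[1] = (uint8_t)(0x80 | ((low12 >> 6) & 0x3F));
  b[2] = (uint8_t)(0x80 | (low12 & 0x3F));
}
```

For あ the bytes E3 81 82 become E3 82 A2, which is ア.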

The typing feel

An inherent awkwardness in romaji input (and other syllable-based Roman input methods) is that you only know which kana to output once the next vowel is pressed. Buffering consonants until the next vowel makes the keyboard feel unresponsive. This is especially true when the input method lives in the keyboard hardware rather than in software on the computer.

The solution is to emit the Roman letter immediately on each consonant keypress, giving instant feedback. When the vowel arrives, we backspace over the previous consonant(s) and replace them with the final kana:

// in my keypress processing logic
} else if (curr_cons) {
  tap_code(curr_keycode); // emit Roman letter immediately
} else if (prev_cons && curr_vowel) {
  syllable_t syllable =
    process_syllable(prev_prev_cons, prev_cons, curr_vowel);
  // delete Roman letter(s)
  while (syllable.backspaces-- > 0) tap_code(KC_BACKSPACE); 
  send_kana_unicode(...);
}

For multi-letter combinations like sh or tsu, multiple backspaces are issued accordingly. If you pause mid-syllable for more than 2 seconds, the state resets, abandoning the buffered consonant, and the next keypress starts fresh.
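To see the emit-then-replace behaviour without a keyboard, the host's text field can be simulated with a small buffer. Everything here (host_t, host_type, host_backspace, commit_kana) is made up for the simulation; in firmware, tap_code() and the Unicode send function play these roles, and the keyboard never sees the text itself.

```c
#include <string.h>

// Stand-in for the text field on the host computer.
typedef struct { char text[64]; size_t len; } host_t;

// Appends a UTF-8 string, as if it had been typed.
static void host_type(host_t *h, const char *utf8) {
  size_t n = strlen(utf8);
  if (h->len + n >= sizeof h->text) return; // sketch: drop input on overflow
  memcpy(h->text + h->len, utf8, n + 1);
  h->len += n;
}

// Removes one ASCII character, which is all this scheme ever deletes.
static void host_backspace(host_t *h) {
  if (h->len > 0) h->text[--h->len] = '\0';
}

// Replaces the last `backspaces` Roman letters with the resolved kana.
static void commit_kana(host_t *h, int backspaces, const char *kana) {
  while (backspaces-- > 0) host_backspace(h);
  host_type(h, kana);
}
```

Typing s and h shows "sh" right away; once i arrives, commit_kana(&h, 2, "し") leaves just し in the buffer.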

The main looker upper

Finally, the process_syllable() function is what runs all of the above rules against the lookup table. Reading the code provides a nice overview of all the rules in this input method implementation:

// called when prev_keycode is a consonant and curr_keycode is a vowel
// returns the identified syllable
syllable_t process_syllable(void) {
  bool preprev_cons = is_consonant(preprev_keycode);
  syllable_t result;
  result.backspaces = 0;

  // three-letter combinations
  if (preprev_cons) {
    if (match("tsu")) { // つ
      result.backspaces += 2;
      result.consonant_kc = KC_T;
      result.vowel_kc = curr_keycode;
      result.extra_char_ptr = NULL;
      return result;
    } else if (match("dzu")) { // づ
      result.backspaces += 2;
      result.consonant_kc = KC_D;
      result.vowel_kc = curr_keycode;
      result.extra_char_ptr = NULL;
      return result;
    }

    // ひゃ,にゅ,きょ,...
    if (prev_keycode == KC_Y) { // preprev_cons is already true in this block
      result.backspaces += 2;
      result.consonant_kc = preprev_keycode;
      result.vowel_kc = KC_I;
      result.extra_char_ptr = &MAP[ROW_EXTRA_YOUON][vowel_offset(curr_keycode)];
      return result;
    }

    // Map C+H and S+H to the CH/SH rows in the map
    if (preprev_keycode == KC_C && prev_keycode == KC_H) {
      result.backspaces += 1;
      preprev_keycode = 0;
      // not a 'keycode' anymore, just an index 
      prev_keycode = KC_A + ROW_EXTRA_CH;
    } else if (preprev_keycode == KC_S && prev_keycode == KC_H) {
      result.backspaces += 1;
      preprev_keycode = 0;
      prev_keycode = KC_A + ROW_EXTRA_SH;
    }
  } // end three-letter combinations

  // じゃ,じゅ,じょ
  if (is_youon(prev_keycode, curr_keycode)) {
    result.backspaces += 1;
    result.consonant_kc = prev_keycode;
    result.vowel_kc = KC_I;
    result.extra_char_ptr = &MAP[ROW_EXTRA_YOUON][vowel_offset(curr_keycode)];
    return result;
  }

  // ファ,ヴァ,ジェ,...
  if (is_small_vowel(prev_keycode, curr_keycode)) {
    result.backspaces += 1;
    result.consonant_kc = prev_keycode;
    result.vowel_kc = get_base_vowel(prev_keycode, curr_keycode);
    result.extra_char_ptr = &MAP[ROW_SMALL_VOWELS][vowel_offset(curr_keycode)];
    return result;
  }

  // a regular syllable
  result.backspaces += 1;
  result.consonant_kc = prev_keycode; // must be consonant
  result.vowel_kc = curr_keycode; // must be vowel
  result.extra_char_ptr = NULL;
  return result;
}

Why not implement a software IME?

Answered here.

Source code

My QMK fork on GitHub (GPL)