Crunchytoast.com

What’s better than toast? Crunchytoast!

crunchyt sez:

This is my first website ... after 15 years of making them for everyone else! Hope you enjoy it too.

You remind me of someone! If you’re reading this, then like me you must forget regular expressions frequently. Despite the chunky snippets found within the Net’s regex libraries, I can never find the Japanese-related ones I want. Of course, if you search for 正規表現 you are more likely to find them! Surprise, surprise, Japanese-speaking coders write about them more than English speakers :(

So here is a small compilation of essential seiki-hyogen (regexes in Japanese) that you will need if you do any Japanese text or data processing.

These should all be Perl compatible, and assume UTF-8 encoding. If you are using a different encoding and get errors, try converting with iconv or a platform-specific function.
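If you are stuck with legacy data, here is a minimal Python sketch of that conversion step. It assumes the input is Shift_JIS (picked purely for illustration), and the function name to_utf8 is made up for this example.

```python
def to_utf8(raw: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    # decode() is strict by default, so bad input raises UnicodeDecodeError
    # instead of quietly producing garbage for the regexes to chew on.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Hypothetical sample: the word 日本語 as Shift_JIS bytes.
legacy_bytes = "日本語".encode("shift_jis")
utf8_bytes = to_utf8(legacy_bytes)
print(utf8_bytes.decode("utf-8"))  # 日本語
```

In Python you would normally run the regexes on the decoded str rather than on the UTF-8 bytes, so the re-encode step only matters if something downstream wants bytes.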

Without further ado, let the download begin:

Regex for matching ALL Japanese common & uncommon Kanji (U+4E00 - U+9FAF) ~ The Big Kahuna!
([一-龯])
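A quick sketch of how this one might be used from Python to pull the kanji out of a mixed string. The function name extract_kanji and the sample sentence are just for illustration.

```python
import re

# Same character class as the regex above: 一 (U+4E00) through 龯 (U+9FAF).
KANJI = re.compile(r"[一-龯]")


def extract_kanji(text):
    """Return every kanji in text, in order of appearance."""
    return KANJI.findall(text)


print(extract_kanji("東京タワーに行く"))  # ['東', '京', '行']
```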

Regex for matching Hiragana or Katakana
([ぁ-んァ-ン])

Regex for matching anything that is neither Hiragana nor Katakana
([^ぁ-んァ-ン])
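The negated class is handy for "is this string kana only?" checks. Here is a minimal Python sketch (is_all_kana is a made-up helper name). Note one catch: the long vowel mark ー (U+30FC) sits outside ァ-ン, so words like タワー fail the check unless you add it to the class yourself.

```python
import re

# Anything that is NOT in the hiragana ぁ-ん or katakana ァ-ン ranges.
NON_KANA = re.compile(r"[^ぁ-んァ-ン]")


def is_all_kana(text):
    """True if text is non-empty and contains only characters from the two ranges."""
    return bool(text) and NON_KANA.search(text) is None


print(is_all_kana("ひらがな"))  # True
print(is_all_kana("カタカナ"))  # True
print(is_all_kana("漢字かな"))  # False
print(is_all_kana("タワー"))    # False, because ー is outside ァ-ン
```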

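Regex for matching Hiragana or Katakana or word characters (\w covers letters, digits and underscore; to match basic punctuation such as 、。’ list those characters in the class instead)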
([ぁ-んァ-ン\w])

Regex for matching Hiragana or Katakana and random other characters
([ぁ-んァ-ン!:/])

Regex for matching Hiragana
([ぁ-ん])

Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])

Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])

Regex for matching full-width Numbers (zenkaku 全角)
([０-９])

Regex for matching full-width Letters (zenkaku 全角)
([Ａ-Ｚａ-ｚ])
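One thing to watch with full-width letters: a single range running from Ａ (U+FF21) to ｚ (U+FF5A) also takes in the six symbols that sit between Ｚ and ａ, such as ＿ (U+FF3F). The Python sketch below spells the classes out with \u escapes so that copy and paste cannot quietly turn them into ASCII. The constant names are made up for this example.

```python
import re

# Full-width digits ０-９ are U+FF10 to U+FF19.
FULLWIDTH_DIGIT = re.compile("[\uFF10-\uFF19]")
# Full-width letters: Ａ-Ｚ (U+FF21 to U+FF3A) and ａ-ｚ (U+FF41 to U+FF5A).
FULLWIDTH_LETTER = re.compile("[\uFF21-\uFF3A\uFF41-\uFF5A]")
# One wide range for comparison: it also covers U+FF3B to U+FF40 (symbols).
TOO_BROAD = re.compile("[\uFF21-\uFF5A]")

fullwidth_underscore = "\uFF3F"

print(FULLWIDTH_LETTER.search(fullwidth_underscore))        # None
print(TOO_BROAD.search(fullwidth_underscore) is not None)   # True
print(FULLWIDTH_DIGIT.findall("\uFF11\uFF12" + "3"))        # only the two full-width digits
```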

Regex for matching Hiragana codespace characters (includes non phonetic characters)
([ぁ-ゞ])

Regex for matching full-width (zenkaku) Katakana codespace characters (includes non phonetic characters)
([ァ-ヶ])

Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set so the order is inconsistent with the hiragana)
([ｦ-ﾟ])
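Half-width katakana are easy to lose in transit, because a lot of tools normalize them to full-width on the way through. So here is a hedged Python sketch that writes the half-width block with code points (U+FF66 ｦ through U+FF9F ﾟ) rather than literal characters. The helper name has_halfwidth_katakana is hypothetical.

```python
import re

# Half-width katakana block: U+FF66 (wo) through U+FF9F (handakuten mark).
HALFWIDTH_KATAKANA = re.compile("[\uFF66-\uFF9F]")


def has_halfwidth_katakana(text):
    """True if text contains at least one half-width katakana character."""
    return HALFWIDTH_KATAKANA.search(text) is not None


halfwidth_sample = "\uFF76\uFF80\uFF76\uFF85"  # half-width ka-ta-ka-na
fullwidth_sample = "カタカナ"                   # ordinary full-width katakana

print(has_halfwidth_katakana(halfwidth_sample))  # True
print(has_halfwidth_katakana(fullwidth_sample))  # False
```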

Regex for matching Japanese Post Codes
/^\d{3}\-\d{4}$/
/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
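Here is a rough Python take on the two post code patterns, written with a backslash \d. Two assumptions of mine: I use re.ASCII because Python's \d otherwise also accepts full-width digits, and I use fullmatch instead of the ^ and $ anchors. The names is_postcode and allow_short are invented for this sketch.

```python
import re

# Strict form: 3 digits, hyphen, 4 digits.
POSTCODE = re.compile(r"\d{3}-\d{4}", re.ASCII)
# Looser form: also accepts 3 digits + 2 digits, or 3 digits on their own.
POSTCODE_LOOSE = re.compile(r"\d{3}-\d{4}|\d{3}-\d{2}|\d{3}", re.ASCII)


def is_postcode(value, allow_short=False):
    """Check value against the strict pattern, or the looser one if asked."""
    pattern = POSTCODE_LOOSE if allow_short else POSTCODE
    return pattern.fullmatch(value) is not None


print(is_postcode("100-0001"))                  # True
print(is_postcode("1000001"))                   # False, hyphen is required
print(is_postcode("100-00"))                    # False under the strict pattern
print(is_postcode("100-00", allow_short=True))  # True
```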

Regex for matching Japanese mobile phone numbers (keitai bangou)
/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
/^0\d0-\d{4}-\d{4}$/

Regex for matching Japanese fixed line phone numbers
/^[0-9-]{6,9}$|^[0-9-]{12}$/
/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
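Putting the mobile and fixed line patterns together, a small Python sketch might look like this. Since a hyphenated mobile number also fits the fixed line pattern, the mobile check has to run first. The function name classify_phone and its return values are my own invention, and the sample numbers are made up.

```python
import re

# Mobile: 0x0-xxxx-xxxx (for example 090-..., 080-...).
MOBILE = re.compile(r"0\d0-\d{4}-\d{4}", re.ASCII)
# Fixed line: local number only, or area code + exchange + 4 digits.
FIXED = re.compile(r"\d{1,4}-\d{4}|\d{2,5}-\d{1,4}-\d{4}", re.ASCII)


def classify_phone(number):
    """Return 'mobile', 'fixed' or 'unknown' for a hyphenated number."""
    if MOBILE.fullmatch(number):
        return "mobile"
    if FIXED.fullmatch(number):
        return "fixed"
    return "unknown"


print(classify_phone("090-1234-5678"))  # mobile
print(classify_phone("03-1234-5678"))   # fixed
print(classify_phone("1234-5678"))      # fixed
print(classify_phone("not a number"))   # unknown
```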

Also with thanks and kudos to:
- Shirouto Tokidoki Kurouto
- Webtips
- Krauser-sama, Detroit Metal City :P


These have all been tested with Ruby using www.rubular.com.
