WordPress with multibyte languages

WordPress uses UTF-8 by default, so you would think there wouldn’t be any problems using Japanese. But the truth is that there are.

There are two main problems. When you use the excerpt tag, WordPress will automatically show a limited number of words, 120 words by default I think, with […] at the end. This is often used on the home index page, just as this Hemingway theme does. The problem is that when the content of the post is in Japanese, the whole content is shown, not an excerpt. The other problem is that the “search” does not work. I type in a Japanese word that I know exists in my post, but it does not return any results.

So I searched around on the web, and I found the problems and their solutions described on Jam’s WordPress page. I am not a programmer, so I am not sure of the details, but both problems seem to be associated with the difference in how Asian languages are written, a possibility that WordPress does not take into account.

The bit I can understand, because it has nothing to do with programming, is the context. Japanese does not use whitespace to separate words. I believe this is also true of other Asian multibyte languages such as Korean and Chinese. Apparently, WordPress’s excerpt function counts words by recognising the whitespace between them. If the content is in Japanese, it will treat the whitespace at the end of each paragraph as the end of a word. As a result, if the post contains five paragraphs, WordPress sees it as five words, and so it looks as if it is showing the full content.
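To make the word-counting problem concrete, here is a small sketch in Python. It is not WordPress’s actual PHP code, and the function names word_excerpt and char_excerpt are made up for this illustration. The first function cuts text by counting whitespace-separated words, which is how I understand the excerpt to behave; the second cuts by counting characters, which is one possible way around the problem and not necessarily what the patch does.

```python
def word_excerpt(text, max_words=120):
    """Cut text after max_words whitespace-separated words (hypothetical helper)."""
    words = text.split()  # splits on any run of whitespace, including newlines
    if len(words) <= max_words:
        return text  # too few "words" found, so the whole text is returned
    return " ".join(words[:max_words]) + " [...]"


def char_excerpt(text, max_chars=120):
    """Cut text after max_chars characters instead of words (hypothetical helper)."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + " [...]"


# An English post of 300 words and a Japanese post of five long paragraphs.
english_post = " ".join(["word"] * 300)
japanese_post = "\n".join(["これは日本語の文章です。" * 50] * 5)

# The English post is shortened; the Japanese post comes back in full,
# because splitting on whitespace finds only one "word" per paragraph.
print(len(word_excerpt(english_post)) < len(english_post))
print(word_excerpt(japanese_post) == japanese_post)

# Counting characters shortens the Japanese post as expected.
print(len(char_excerpt(japanese_post)) < len(japanese_post))
```

Counting characters is only a rough fix, since it can cut in the middle of a word in languages that do use spaces, but it shows why the whitespace assumption fails for Japanese.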

The “Patch for WordPress 2.0.1 to support Asian text in its excerpt constructing functions” is in diff format, not a plugin. That’s because the core files need to be hacked. So that’s a downside of applying this patch, especially for those who are not comfortable with PHP. I have no experience maintaining a UNIX system, so I struggled to apply the patch at first.

To solve the search problem, install the “Search excerpt plugin supporting Asian text”, which is a modified version of ylsy_search_excerpt.

Some links:

  • Jam’s WordPress page
  • GnuWin32
  • How to use the patch (sorry, it’s a Japanese link)
  • Patch and How to for WordPressME

The files that the patch modifies:

  • ./wp-trackback.php
  • ./wp-includes/comment-functions.php
  • ./wp-includes/feed-functions.php
  • ./wp-includes/functions-formatting.php
  • ./wp-includes/functions-post.php
  • ./wp-includes/functions.php


2 Comments

  1. There are many problems with Asian text. I think we have the same problems as you guys; it works the same way with Korean.

    Maybe we should make our own way of working.

  2. So it is the same case in Korean too, then. The Japanese version of WordPress is actually called “WordPressME”; ME stands for Multilingual Edition. From what I know, there are slight differences in the core files, mainly, as always, for the language issue. While that is one solution, the ideal solution for me is to sort out the problem in WordPress itself, rather than making an ME version.
    If there is anyone out there who is a PHP programmer and willing to give a hand to the WordPress dev team on using Asian languages, that would be great. I think we Japanese tend to isolate ourselves in this kind of project…