nlp - Programmatically access an IME
Is there a way to access a Japanese or Chinese IME from either the command line or Python? I have Linux, OS X, and Windows 8 boxes, so whichever system exposes the most easily accessible API is fine.
I'm experimenting with building a Japanese kana-kanji conversion algorithm and would like to establish a baseline using existing tools. I also have some collections of kana that I would like to process.
Preferably something along the lines of:
$ ime jp "きしゃのきしゃがきしゃできしゃした"
貴社の記者が汽車で帰社した
I've looked at Anthy, Mozc, and D-Bus on Linux, but I can't find any way to interact with them via the terminal or scripting (such as Python).
Anthy provides a CLI tool.
Personally, I prefer Google's IME / Mozc, which gives better results, but perhaps this helps.
The source of Anthy (on SourceForge, file anthy-9100h.tar.gz) includes a simple CLI program for testing. Download the source file, extract it, and run
./configure && make
Enter the directory test, which contains the binary anthy. By default, it reads test.txt and uses the EUC-JP (euc_jp) encoding.
A simple test:
Input file test.txt:
*にほんごにゅうりょく
*もももすももももものうち。
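If the kana collections already live in a script rather than in a hand-edited file, the input file can be generated from Python. This is only a minimal sketch, assuming the format shown above (one reading per line, each prefixed with an asterisk) and the EUC-JP encoding the test program expects. The function name write_anthy_input is made up for illustration and is not part of Anthy.

```python
from pathlib import Path


def write_anthy_input(readings, path="test.txt"):
    """Write one '*'-prefixed kana reading per line, encoded as EUC-JP.

    The '*' prefix and the encoding follow the sample test.txt above;
    this helper is hypothetical and not part of the Anthy sources.
    """
    text = "".join(f"*{reading}\n" for reading in readings)
    Path(path).write_bytes(text.encode("euc_jp"))
    return path


if __name__ == "__main__":
    write_anthy_input(["にほんごにゅうりょく", "もももすももももものうち。"])
```

Encoding the text explicitly avoids depending on the locale of the machine that runs the script.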
Run (using iconv to convert the output to UTF-8):
./anthy --all | iconv -f euc-jp -t utf-8
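The same step can be driven from Python instead of the shell. The sketch below assumes the anthy test binary has already been built in the test directory as described, and that it picks up test.txt from its working directory. It decodes the EUC-JP output in Python rather than piping it through iconv. The helper names and the default directory are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def decode_anthy_output(raw: bytes) -> str:
    """Decode the EUC-JP bytes printed by the test binary (stands in for iconv)."""
    return raw.decode("euc_jp", errors="replace")


def run_anthy(test_dir: str = "test", args=("--all",)) -> str:
    """Run the anthy test binary inside test_dir and return its output as text."""
    test_path = Path(test_dir).resolve()
    binary = test_path / "anthy"  # assumed location of the binary after make
    completed = subprocess.run(
        [str(binary), *args],
        cwd=test_path,  # assumption: test.txt is read from the working directory
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the binary exits non-zero
    )
    return decode_anthy_output(completed.stdout)
```

Calling print(run_anthy("path/to/anthy-9100h/test")) should then show the same text as the shell pipeline, provided the assumptions about the build location hold.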
Output:
1:(にほんごにゅうりょく)
|にほんご|にゅうりょく
にほんご(日本語:(1,1000,n,72089)2500,001 ,にほんご:(n,0,-)2 ,ニホンゴ:(n,0,-)1 ,):
にゅうりょく(入力:(1,1000,n,62394)2500,001 ,にゅうりょく:(n,0,-)2 ,ニュウリョク:(n,0,-)1 ,):
2:(もももすももももものうち。)
|ももも|すももも|もものうち|。
ももも(桃も:(,1000,ny,72089)225,279 ,ももも:(n,1000,ny,72089)220,773 ,モモも:(,1000,ny,72089)205,004 ,腿も:(,1000,ny,72089)204,722 ,股も:(,1000,ny,72089)146,431 ,モモモ:(n,0,-)1 ,):
すももも(すももも:(n,1000,ny,72089)202,751 ,スモモも:(,1000,ny,72089)168,959 ,李も:(,1000,ny,72089)168,677 ,スモモモ:(n,0,-)1 ,):
もものうち(桃のうち:(,1000,n,655)2,047 ,もものうち:(n,1000,n,655)2,006 ,モモのうち:(,1000,n,655)1,863 ,腿のうち:(,1000,n,655)1,861 ,股のうち:(,1000,n,655)1,331 ,モモノウチ:(n,0,-)1 ,):
。(。:(1n,100,n,70203)57,040 ,.:(1,100,n,70203)52,653 ,.:(1,100,n,70203)3,840 ,):
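For a baseline it may be enough to keep only the first candidate of each segment. The following sketch parses the output shown above with regular expressions. It assumes that each sentence starts with a header of the form number:(reading), that each segment is printed as reading(candidate:(...) ...), and that the first candidate listed is the top-ranked one. None of this comes from Anthy's documentation, only from the sample output, so treat the patterns as a starting point.

```python
import re

# Sentence header such as "1:(にほんごにゅうりょく)", at the start or after whitespace.
HEADER = re.compile(r"(?<!\S)(\d+):\(([^()]*)\)")
# Segment entry such as "にほんご(日本語:(...": captures the reading and first candidate.
SEGMENT = re.compile(r"(\S+?)\(([^:()]+):\(")


def parse_anthy_output(text):
    """Return one dict per sentence with its reading, segments, and best guess."""
    results = []
    headers = list(HEADER.finditer(text))
    for index, header in enumerate(headers):
        # The body of a sentence runs until the next header (or the end of text).
        end = headers[index + 1].start() if index + 1 < len(headers) else len(text)
        segments = SEGMENT.findall(text[header.end():end])
        results.append({
            "reading": header.group(2),
            "segments": segments,  # list of (reading, first candidate) pairs
            "best": "".join(candidate for _, candidate in segments),
        })
    return results
```

Applied to the sample output, the first sentence yields the segments にほんご / 日本語 and にゅうりょく / 入力, which join to 日本語入力. Because the patterns do not rely on line breaks, the parser works whether or not the output has been re-wrapped.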
You can uncomment the printf statements in the source files test/main.c and src-main/context.c to make the output more readable and easier to parse, e.g.:
1 にほんごにゅうりょく
にほんご 日本語
にゅうりょく 入力
2 もももすももももものうち。
ももも 桃も
すももも すももも
もものうち 桃のうち
。 。