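Isahara Hitoshi
  井佐原 均
   Affiliation   Otemon Gakuin University, Faculty of Psychology, Department of Psychology
   Otemon Gakuin University, Graduate School of Psychology
   Position   Professor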
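Language English
Date of publication/presentation 2017/10
Publication type Paper
Peer review Peer-reviewed
Title Effect of linguistic information in neural machine translation
Authorship Co-author / co-editor (excluding lead editor)
Publication name ICAICTA 2017
Publication category International
Publisher Institute of Electrical and Electronics Engineers Inc.
Authors/Co-authors Naomichi Nakamura, Hitoshi Isahara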
Abstract NMT requires a large corpus and a long computation time. To reduce computation cost, recent studies have replaced low-frequency words with symbols. However, the symbols make sentences ambiguous and degrade translation accuracy. To solve this problem, sub-word unit methods such as Byte Pair Encoding (BPE) and the Wordpiece Model (WPM), which build vocabularies of a prespecified size, have been proposed. Nevertheless, these tokenization methods break words apart and treat the pieces as symbols. Words treated as symbols are compatible with neural networks, and NMT performance has increased. This result shows that linguistic correctness is not necessarily important in NMT. If that is the case, we wonder to what extent linguistic correctness contributes to NMT accuracy. In this research, we experiment with incorporating linguistic information into sub-word units. Experimentally, we demonstrate that morphemes, as linguistic information, are a helpful factor for sub-word units.
DOI 10.1109/ICAICTA.2017.8090975