Faculty Information - Miyamoto Yukinobu (宮本　行庸) | Otemon Gakuin University

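Miyamoto Yukinobu (宮本　行庸, ミヤモト　ユキノブ)
Affiliation	Otemon Gakuin University, Faculty of Science and Engineering, Department of Information Engineering
Position	Professor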
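Language	Japanese
Date of publication/presentation	1998/11/27
Publication type	Domestic academic journal (other)
Title	Improving the Efficiency of Q-Learning Using Feature Construction (特徴構成法を用いたQ学習の効率改善)
Authorship type	Lead author/editor
Journal	IPSJ SIG Technical Report (情報処理学会研究報告)
Publication scope	Domestic
Publisher	Information Processing Society of Japan (情報処理学会)
Volume, issue, pages	98(105), pp. 57-62
Role	First author
Authors	Miyamoto Yukinobu (宮本行庸), Uehara Kuniaki (上原邦昭)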
Abstract	This paper describes FCQL (Feature Constructive Q-Learning), a reinforcement learning system that uses feature construction. Conventional reinforcement learning methods assume that suitable attributes for identifying each state of the target environment are prepared before learning begins. In a real-world domain, however, the learner has only limited sensors and may lack inputs sufficient to distinguish states, so it needs the ability to construct domain-specific features as required. This paper integrates feature construction, a technique used in constructive inductive learning, with Q-learning, a reinforcement learning method, to learn a suitable internal state representation together with an evaluation function and decision policy in a finite, discrete-time, deterministic environment. The results show that FCQL not only maximizes the long-term discounted reward per unit time but also substantially reduces the number of states required for convergence.
ISSN	0919-6072
NAID	110002936337