SDW'98

Extraction of Document Structures Based on Contents Analysis

Norihide SHINAGAWA* and Hiroyuki KITAGAWA**
*Doctoral Program in Engineering, University of Tsukuba
**Institute of Information Sciences and Electronics, University of Tsukuba


Abstract

Digital documents are becoming more important as computing environments spread. Their potential number and volume are huge, and it is no longer easy to find and use the information one needs. In this paper, we propose a method that extracts the hierarchy of topics in a document (the logical structure embedded in it) and flexibly retrieves the passages that match a query. This makes it possible to use passages at various levels of abstraction as units of both retrieval and presentation, chosen according to the query conditions, and to search using the logical structure inherent in the documents. We give an experimental evaluation of the method. We also describe a technique that uses inverted file structures to extract logical structures dynamically and locally in response to a query.
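The abstract does not say how the inverted file is used, so the following is only a minimal sketch of the general idea behind "local" extraction, not the authors' method. It assumes that paragraphs are the positional unit, that tokenization is a naive whitespace split, and that nearby paragraphs containing query terms form a candidate region. The names `build_inverted_file`, `local_regions`, and `max_gap` are hypothetical. The point it illustrates is that positional postings let one locate the query-relevant parts of a single document by reading only the query terms' entries, without scanning the whole document.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to {doc_id: sorted paragraph indices containing it}.

    docs: dict of doc_id -> list of paragraph strings.
    Assumption: whitespace tokenization and lowercasing, for illustration only.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, paragraphs in docs.items():
        for p, text in enumerate(paragraphs):
            # set() so a paragraph is recorded at most once per term;
            # enumerate() visits paragraphs in order, so postings stay sorted.
            for term in set(text.lower().split()):
                index[term][doc_id].append(p)
    return index


def local_regions(index, query, doc_id, max_gap=1):
    """Group the paragraphs of doc_id that contain any query term into runs.

    Two hits join the same region if their paragraph indices differ by at
    most max_gap (a hypothetical tuning parameter). Only the postings of the
    query terms are read, never the document text itself.
    """
    hits = sorted({p
                   for term in query.lower().split()
                   for p in index.get(term, {}).get(doc_id, [])})
    regions = []
    for p in hits:
        if regions and p - regions[-1][1] <= max_gap:
            regions[-1][1] = p          # extend the current region
        else:
            regions.append([p, p])      # start a new region
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    docs = {"d1": ["inverted file basics", "passage retrieval",
                   "unrelated text", "more unrelated", "passage ranking"]}
    idx = build_inverted_file(docs)
    print(local_regions(idx, "passage inverted", "d1"))
```

Here the query hits paragraphs 0, 1, and 4 of `d1`, giving the regions (0, 1) and (4, 4). A real system would replace the flat grouping with the extracted topic hierarchy, choosing the smallest enclosing topic node as the unit to return.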
