Document Type

Report

Abstract

Finding meaningful phrases in a document has been studied in various information retrieval systems in order to improve the performance. Many previous statistical phrase finding methods had different aim such as document classification. Some are hybridized with statistical and syntactic grammatical methods; others use correlation heuristics between words. We propose a new phrase-finding algorithm that adds correlated words one by one to the phrases found in the previous stage, maintaining high correlation within a phrase. Our results indicate that our algorithm finds more meaningful phrases than an existing algorithm. Furthermore, the previous algorithm could be improved by applying different correlation functions.

Publication Date

6-20-2004

Share

COinS