Paper

Polyseme-Aware Vector Representation for Text Classification

Release time: 2020-08-21

Indexed by: Journal Papers

First Author: Guo, Shun

Corresponding Author: Yao, NM, Dalian Univ Technol, Dept Comp Sci & Technol, Dalian 116024, Peoples R China.

Co-author: Yao, Nianmin

Date of Publication: 2020-01-01

Journal: IEEE ACCESS

Included Journals: SCIE

Document Type: J

Volume: 8

Page Number: 135686-135699

ISSN No.: 2169-3536

Key Words: Task analysis; Semantics; Text categorization; Training; Computational modeling; Context modeling; Microsoft Windows; Polysemous words; context clustering algorithm; PAVRM-Context; PAVRM-Center

Abstract: Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results in changing the original meaning of the text. To address this problem, in this paper, we present a more effective model architecture, the polyseme-aware vector representation model (PAVRM), to generate more precise vector representations for words and texts. The PAVRM can effectively identify polysemous words in a corpus with a context clustering algorithm. Additionally, we propose two methods to construct polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments conducted on three standard text classification tasks and a custom text classification task demonstrate that the proposed PAVRM can be effectively introduced into existing models to generate higher-quality word and text representations to achieve better classification performance.

Translation or Not: no
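
Illustrative sketch: the abstract says polysemous words are identified with a context clustering algorithm but gives no details of it. The Python below is only a rough illustration of that general idea under stated assumptions, not the paper's algorithm, and it does not reproduce PAVRM-Context or PAVRM-Center. The function names, the toy 2-D embeddings, the window size, the cosine threshold, and the greedy clustering rule are all hypothetical choices made for this example.

```python
import numpy as np

def context_vectors(sentences, target, emb, window=2):
    """Return one vector per occurrence of `target`: the mean embedding
    of the known words within `window` positions of that occurrence."""
    vecs = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            ctx = [emb[w] for j, w in enumerate(tokens)
                   if j != i and abs(j - i) <= window and w in emb]
            if ctx:
                vecs.append(np.mean(ctx, axis=0))
    return vecs

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def cluster_contexts(vecs, threshold=0.5):
    """Greedy clustering (an assumption, not the paper's method): a vector
    joins the most similar existing cluster if its cosine similarity to that
    cluster's centroid is at least `threshold`; otherwise it starts a new
    cluster. Returns the centroid of each cluster."""
    clusters = []
    for v in vecs:
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(v, np.mean(members, axis=0))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([v])
        else:
            best.append(v)
    return [np.mean(members, axis=0) for members in clusters]

# Toy 2-D embeddings, invented for illustration only.
emb = {w: np.array(v, dtype=float) for w, v in {
    "river": [1.0, 0.0], "water": [0.9, 0.1],
    "money": [0.0, 1.0], "loan": [0.1, 0.9],
    "bank": [0.5, 0.5],
}.items()}

sentences = [
    ["river", "bank", "water"],
    ["water", "bank", "river"],
    ["money", "bank", "loan"],
    ["loan", "bank", "money"],
]

bank_senses = cluster_contexts(context_vectors(sentences, "bank", emb))
river_senses = cluster_contexts(context_vectors(sentences, "river", emb))
print(len(bank_senses), len(river_senses))
```

In this toy corpus the contexts of "bank" fall into two clusters while those of "river" fall into one, so a rule such as "more than one context cluster" would flag "bank" as a polysemy candidate. Each centroid could then serve as a separate sense vector for the word, though how the paper actually builds its polysemous word representations is not stated in the abstract.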