Generating word and document matrix representations for document classification

Release time: 2020-07-19

Indexed by: Journal Papers

First Author: Guo, Shun

Corresponding Author: Guo, S., Department of Computer Science and Technology, Dalian University of Technology, Dalian, People's Republic of China.

Co-author: Yao, Nianmin

Date of Publication: 2020-07-01

Journal: NEURAL COMPUTING & APPLICATIONS

Included Journals: SCIE

Document Type: J

Volume: 32

Issue: 14

Page Number: 10087-10108

ISSN No.: 0941-0643

Key Words: Document-level classification; Word matrix; Document matrix; Subwindows

Abstract: We present doc2matrix, an effective word and document matrix representation architecture based on a linear operation, which learns representations for document-level classification. Unlike the traditional vector form of representation, it uses a matrix to represent each word or document. Doc2matrix defines appropriate subwindows as the scale of text, and generates a word matrix and a document matrix by stacking the information from these subwindows. The resulting document matrix not only contains more fine-grained semantic and syntactic information than the original representation but also introduces abundant two-dimensional features. Experiments on four document-level classification tasks demonstrate that the proposed architecture generates higher-quality word and document representations and outperforms previous models based on linear operations. Among the classifiers compared, a convolution-based classifier is more suitable for our document matrix. We further show, through both theoretical and experimental analysis, that the convolution operation better captures the two-dimensional features of the proposed document matrix.
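
The idea of stacking subwindow information into a document matrix can be illustrated with a minimal sketch. This is not the doc2matrix training procedure, which learns its representations and is not specified in this record. As a stand-in, the sketch assumes pre-existing word vectors, splits a document into contiguous subwindows, averages the word vectors inside each subwindow, and stacks the averages as rows. The names `split_into_subwindows`, `mean_vector`, `document_matrix`, and the toy vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def split_into_subwindows(tokens: Sequence[str], num_subwindows: int) -> List[List[str]]:
    """Split tokens into num_subwindows contiguous spans of near-equal length."""
    if num_subwindows <= 0:
        raise ValueError("num_subwindows must be positive")
    n = len(tokens)
    # Integer boundaries: span i covers tokens[bounds[i]:bounds[i + 1]].
    bounds = [i * n // num_subwindows for i in range(num_subwindows + 1)]
    return [list(tokens[bounds[i]:bounds[i + 1]]) for i in range(num_subwindows)]


def mean_vector(words: Sequence[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known words; return a zero vector if none are known."""
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[j] for v in known) / len(known) for j in range(dim)]


def document_matrix(
    tokens: Sequence[str],
    word_vectors: Dict[str, Vector],
    num_subwindows: int,
    dim: int,
) -> List[Vector]:
    """Stack one averaged row per subwindow (illustrative stand-in, not the paper's model)."""
    spans = split_into_subwindows(tokens, num_subwindows)
    return [mean_vector(span, word_vectors, dim) for span in spans]


# Toy, hand-made 2-dimensional word vectors (assumed values, not learned).
toy_vectors = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
    "plot": [0.0, 2.0],
}
tokens = ["good", "movie", "bad", "plot"]
matrix = document_matrix(tokens, toy_vectors, num_subwindows=2, dim=2)
```

Here `matrix` has one row per subwindow and one column per embedding dimension, so a two-dimensional operation such as a convolution can see both axes, whereas a single averaged vector would collapse the subwindow axis.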
