Polyseme-Aware Vector Representation for Text Classification
Publication date: 2020-08-21
Paper type:
Journal article
First author:
Guo, Shun
Corresponding author:
Yao, NM (corresponding author), Dalian Univ Technol, Dept Comp Sci & Technol, Dalian 116024, Peoples R China.
Co-author:
Yao, Nianmin
Publication date:
2020-01-01
Journal:
IEEE ACCESS
Indexed in:
SCIE
Document type:
J
Volume:
8
Page range:
135686-135699
ISSN:
2169-3536
Keywords:
Task analysis; Semantics; Text categorization; Training; Computational modeling; Context modeling; Microsoft Windows; Polysemous words; context clustering algorithm; PAVRM-Context; PAVRM-Center
Abstract:
Representation models for text classification have recently shown impressive performance. However, these models neglect the importance of polysemous words in text. When polysemous words appear in a text, imprecise polysemous word embeddings will produce low-quality text representation that results in changing the original meaning of the text. To address this problem, in this paper, we present a more effective model architecture, the polyseme-aware vector representation model (PAVRM), to generate more precise vector representations for words and texts. The PAVRM can effectively identify polysemous words in a corpus with a context clustering algorithm. Additionally, we propose two methods to construct polysemous word representations, PAVRM-Context and PAVRM-Center. Experiments conducted on three standard text classification tasks and a custom text classification task demonstrate that the proposed PAVRM can be effectively introduced into existing models to generate higher-quality word and text representations to achieve better classification performance.
Translated work:
No