Topic detection and tracking for conversational content by using conceptual dynamic latent Dirichlet allocation
Abstract
This study proposes a conceptual dynamic latent Dirichlet allocation (CDLDA) model for topic detection and tracking in conversational content. Topic detection and tracking is vital for conversational communication, especially for spoken interactions. Because topic transitions occur frequently in conversation (i.e., a single conversation usually covers many topics), language processors must be able to detect the different topics it contains. Given the structure of spoken dialogue, a dynamic model is employed to capture the sequence of two adjacent topics in spoken content. The proposed model uses the proportions of verbs and nouns to measure the similarity between utterances, and an agglomerative clustering algorithm based on the ontology defined in E-HowNet groups conversational utterances into clusters. Because the topic structure of conversational content is fragile, the model draws on the hypernym relationships of speech acts in E-HowNet to obtain robust solutions, even for sparse data. Whereas the traditional latent Dirichlet allocation (LDA) model detects topics only through a bag-of-words representation, the proposed model also considers temporal features by introducing dynamic concepts. Experimental results revealed that the proposed approach outperformed the traditional dynamic LDA (DLDA), LDA, and support vector machine (SVM) models and achieved excellent performance for topic detection and tracking in conversations.
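The abstract does not specify the similarity measure or the clustering criterion, so the following Python sketch only illustrates the general idea under stated assumptions: utterances arrive already tagged with `V` (verb) or `N` (noun); the `HYPERNYMS` dictionary is a hypothetical toy stand-in for E-HowNet, not its real content or interface; the similarity is an assumed mix of verb and noun concept overlap weighted by the verb and noun proportions; and clustering uses average linkage with an arbitrary threshold. It is not the authors' CDLDA implementation.

```python
from itertools import combinations

# Hypothetical stand-in for E-HowNet: maps a word to a hypernym concept.
# The real E-HowNet ontology is far richer; these entries are illustrative only.
HYPERNYMS = {
    "pizza": "food", "noodles": "food", "coffee": "beverage",
    "eat": "consume", "drink": "consume",
    "flight": "travel", "train": "travel",
    "book": "reserve", "reserve": "reserve",
}


def concepts(utterance, pos):
    """Map words with the given POS tag to hypernym concepts (the word itself if unknown)."""
    return {HYPERNYMS.get(w, w) for w, p in utterance if p == pos}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def similarity(u1, u2):
    """Assumed form: verb-concept and noun-concept overlap, weighted by the
    proportion of verbs and nouns in the two utterances combined."""
    tags = [p for _, p in u1 + u2 if p in ("V", "N")]
    if not tags:
        return 0.0
    p_verb = tags.count("V") / len(tags)
    p_noun = 1.0 - p_verb
    return (p_verb * jaccard(concepts(u1, "V"), concepts(u2, "V"))
            + p_noun * jaccard(concepts(u1, "N"), concepts(u2, "N")))


def agglomerative(utterances, threshold=0.3):
    """Average-linkage agglomerative clustering of utterance indices; stops
    when no pair of clusters is at least `threshold` similar."""
    clusters = [[i] for i in range(len(utterances))]

    def link(c1, c2):
        scores = [similarity(utterances[i], utterances[j]) for i in c1 for j in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        # Most similar pair of clusters (first one found on ties).
        (a, b), best = max(
            (((a, b), link(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because combinations() yields a < b
    return clusters


# Hypothetical, pre-tagged utterances from one conversation.
dialogue = [
    [("eat", "V"), ("pizza", "N")],
    [("drink", "V"), ("coffee", "N")],
    [("book", "V"), ("flight", "N")],
    [("reserve", "V"), ("train", "N")],
    [("eat", "V"), ("noodles", "N")],
]
print(agglomerative(dialogue))  # → [[0, 4, 1], [2, 3]]
```

Mapping words to hypernyms lets `eat` and `drink` match through the shared concept `consume`, which is the kind of back-off that helps with sparse utterances; compared as raw words, the first two utterances would share nothing.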
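The dynamic part of the model relates each topic to the one that precedes it in the conversation. As a much simpler stand-in (an assumption, not the DLDA or CDLDA inference the paper uses), the Python sketch below estimates first-order transition probabilities between adjacent topic labels with additive smoothing; the topic labels are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(topic_sequence, smoothing=1.0):
    """Estimate P(next topic | current topic) from adjacent topic pairs,
    using additive smoothing over the set of observed topics."""
    topics = sorted(set(topic_sequence))
    counts = defaultdict(Counter)
    # Count each pair of adjacent topics (previous, next).
    for prev, nxt in zip(topic_sequence, topic_sequence[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev in topics:
        total = sum(counts[prev].values()) + smoothing * len(topics)
        probs[prev] = {nxt: (counts[prev][nxt] + smoothing) / total for nxt in topics}
    return probs


# Hypothetical per-utterance topic labels for one conversation.
labels = ["food", "food", "travel", "travel", "travel", "food"]
probs = transition_probs(labels)
print(round(probs["travel"]["travel"], 3))  # → 0.6
```

Here `travel` is followed by `travel` twice and by `food` once, so with add-one smoothing over two topics the estimate is (2 + 1) / (3 + 2) = 0.6.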
