Cross-domain deception detection using support vector networks

Authors: Ángel Hernández-Castañeda; Hiram Calvo; Alexander Gelbukh…
Keywords: Deception detection; Continuous semantic space model; Word-space model; Linguistic inquiry and word count; Support vector networks
Journal: Soft Computing
Publication year: 2017
Publication date: February 2017
Volume: 21
Issue: 3
Pages: 585-595
Journal category: Engineering
Journal subjects: Computational Intelligence; Artificial Intelligence (incl. Robotics); Mathematical Logic and Foundations; Control, Robotics, Mechatronics
Publisher: Springer Berlin Heidelberg
ISSN: 1433-7479

Abstract

Our motivation is to assess the effectiveness of support vector networks (SVN) on the task of detecting deception in texts, and to investigate to what degree it is possible to build a domain-independent detector of deception in text using SVN. We experimented with different feature sets for training the SVN: a continuous semantic space model represented by latent Dirichlet allocation topics, a word-space model, and dictionary-based features. In this way, we compare the performance of semantic information with that of behavioral information. We tested several combinations of these features on different datasets designed to identify deception. The datasets include DeRev (a corpus of deceptive and truthful opinions about books obtained from Amazon), OpSpam (a corpus of fake and truthful opinions about hotels), and three corpora on controversial topics (abortion, death penalty, and a best friend), for which the subjects were asked to write an idea contrary to what they really believed. We experimented with a one-domain setting, training and testing our models separately on each dataset (with fivefold cross-validation); with a mixed-domain setting, merging all datasets into one large corpus (again with fivefold cross-validation); and with a cross-domain setting, using one dataset for testing and a concatenation of all other datasets for training. We obtained an average accuracy of 86% in the one-domain setting, 75% in the mixed-domain setting, and 52% to 64% in the cross-domain setting.
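
The cross-domain setting described in the abstract amounts to a leave-one-corpus-out split: each dataset is held out for testing in turn while all the others are concatenated for training. The Python below is a minimal sketch of that split only, not the authors' code. The function name `cross_domain_splits`, the `(text, label)` representation, the corpus keys, and the toy examples are all assumptions introduced for illustration, and no features are extracted and no classifier is trained.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical representation: (text, label), with 1 = deceptive, 0 = truthful.
Example = Tuple[str, int]


def cross_domain_splits(
    corpora: Dict[str, List[Example]],
) -> Iterator[Tuple[str, List[Example], List[Example]]]:
    """Yield (held_out_name, train, test), holding out one corpus per round."""
    for held_out in corpora:
        # Training set: concatenation of every corpus except the held-out one.
        train = [
            example
            for name, docs in corpora.items()
            if name != held_out
            for example in docs
        ]
        # Test set: the held-out corpus, untouched.
        test = list(corpora[held_out])
        yield held_out, train, test


# Toy placeholder data; these strings are NOT taken from the real corpora.
toy_corpora = {
    "DeRev": [("book review a", 1), ("book review b", 0)],
    "OpSpam": [("hotel review a", 1), ("hotel review b", 0)],
    "Abortion": [("essay a", 1)],
}

splits = list(cross_domain_splits(toy_corpora))
for name, train, test in splits:
    print(f"test on {name}: {len(train)} training / {len(test)} test examples")
```

In a full experiment, each round would fit a classifier on `train` and report accuracy on `test`; the one-domain and mixed-domain settings would instead use fivefold cross-validation within a single corpus or within the merged corpus.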
