Research on Multiple Digital Text Watermarking
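Abstract
In recent years, research on digital watermarking has concentrated on images, audio, and video, while text watermarking has received comparatively little attention. The nature of text itself makes embedding a watermark difficult. Existing approaches commonly suffer from several problems: unformatted text is hard to watermark, watermarks for formatted text depend heavily on the text's formatting features, watermarks rarely reach into the content of the text, embedding capacity is insufficient, and robustness is poor. To address these problems, this paper proposes loading different watermarks into a single text and applies natural language processing, zero-watermarking, asymmetric encryption, and digital time-stamping to text watermarking, in order to provide copyright protection, integrity verification, and authenticity checking for documents. Specifically, the main work of this paper is as follows:
     1. The current state of text watermarking is analyzed, categorized, and summarized in some detail, and key problems that still need to be solved are identified. After introducing the concept, characteristics, and research status of text digital watermarking, a technique for loading multiple watermarks into a text is proposed. The method can load not only a robust watermark that verifies copyright information, but also a fragile watermark that verifies the integrity of the text and whether it has been tampered with.
     2. A new watermarking technique based on Chinese word segmentation is proposed. Chinese word segmentation, a natural language processing technique, is applied to the Chinese document; unimportant and meaningless words are then removed, and a digital digest is extracted to uniquely identify the document. A digital digest is also extracted for each text segment, and the two digests are embedded, together with the author's copyright information, in the format control characters following each text segment. Borrowing from the way living things reproduce and persist, the watermark information is embedded repeatedly, so it can be detected as long as not every embedded copy has been removed or destroyed. Experimental results show that the method is well concealed and robust: the watermark information is not lost even when the text is subjected to formatting attacks and storage attacks.
     3. A zero-watermarking algorithm based on timestamp authentication is presented. After the document is preprocessed, a digital fingerprint that uniquely identifies the text is extracted and bound to the copyright information to generate the watermark information. A timestamp is then requested from a time-stamping authority (TSA); the TSA stamps the user's watermark information, producing a zero-watermark that carries timestamp information. Once the timestamped zero-watermark is bound to the file, it can serve as proof that the document's copyright information is valid within a certain period of time.
     When several watermarking schemes are embedded in the same document, their influence on one another must be considered. Because the multiple watermarks proposed in this paper are embedded in different carriers and do not alter the content of the text, they do not affect one another.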
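Item 2 above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the text has already been segmented into words, uses a tiny hypothetical stop-word list, takes truncated SHA-256 as the digest, and substitutes zero-width Unicode characters for the format control characters that the paper uses as the carrier. All function names are invented for illustration.

```python
import hashlib

# Assumption: a tiny illustrative stop-word list, not the paper's.
STOP_WORDS = {"的", "了", "和", "是"}
# Assumption: zero-width characters stand in for format control characters.
ZW0, ZW1 = "\u200b", "\u200c"


def digest(words):
    """Short SHA-256 digest of the content words (stop words dropped)."""
    kept = "".join(w for w in words if w not in STOP_WORDS)
    return hashlib.sha256(kept.encode("utf-8")).hexdigest()[:8]


def hide(payload):
    """Encode a string as an invisible run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in payload.encode("utf-8"))
    return "".join(ZW1 if bit == "1" else ZW0 for bit in bits)


def reveal(text):
    """Decode the zero-width characters in text; None if none survive intact."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    if not bits or len(bits) % 8:
        return None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8", errors="replace")


def embed(segments, owner):
    """segments: list of word lists. Appends one watermark copy per segment."""
    doc_tag = digest([w for seg in segments for w in seg])
    return ["".join(seg) + hide(f"{doc_tag}|{digest(seg)}|{owner}")
            for seg in segments]


def detect(marked_segments):
    """Return the (doc_tag, segment_tag, owner) triples that survived."""
    found = []
    for seg in marked_segments:
        payload = reveal(seg)
        if payload and payload.count("|") >= 2:
            found.append(tuple(payload.split("|", 2)))
    return found
```

Because every segment carries its own copy of the document digest and the owner string, `detect` still recovers the copyright information after some copies are stripped, which mirrors the repeated-embedding idea described above. Comparing a recovered segment digest with a freshly computed one is how a changed segment would be flagged in this sketch.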
At present, research on digital watermarking is concentrated mainly on images, audio, and video, with far less work on text. Because of the particular characteristics of text, it is very difficult to embed a watermark in it. Existing results show that classical algorithms have several problems: unformatted text is difficult to watermark, watermarks in formatted text are not tied to the text content, watermark capacity is small and not quantified, and robustness is poor. In view of these shortcomings, the paper presents a new method that loads different watermarks into one text document, drawing on natural language processing, zero-watermarking, asymmetric encryption, and digital time-stamping. It can protect the copyright of a text document and establish the integrity and credibility of the text.
     The research in this paper can be summarized as follows:
     1. The history, research background, and development status of text watermarking are described, analysed, and summarized, and the shortcomings of existing algorithms are presented. After introducing the concept, characteristics, and research status of text watermarking, a new multiple text watermarking technique is proposed. The method not only protects the copyright information of a text document with a robust watermark, but also establishes the integrity and credibility of the text and can locate tampering within it.
     2. A new watermarking algorithm based on Chinese word segmentation is proposed. It applies Chinese word segmentation, a natural language processing technique, to the Chinese text document, deletes unimportant and meaningless words, and extracts a digital fingerprint that uniquely tags the document. A digital fingerprint is also extracted from every segment, and the two fingerprints are embedded, combined with the copyright information, in the format control characters after each segment. Drawing on the way living organisms propagate and persist, the watermark information is embedded repeatedly, so it can be detected as long as not all embedded copies have been removed or destroyed. Experimental results show that the algorithm is imperceptible and robust: text-processing attacks such as reformatting and re-saving do not delete the watermark information.
     3. A zero-watermarking algorithm based on a time-stamping authority is presented. After the Chinese text document is preprocessed, the algorithm extracts a digital fingerprint that uniquely tags the document and encrypts it together with the copyright information. A timestamp request is then sent to the time-stamping authority, which signs a timestamp on the user's message, and the zero-watermark is registered in the database from that point on. The timestamped zero-watermark is bound to the text document, so it can prove that someone has held the copyright since a given time.
     When several different watermarks are embedded in one text document, we must make sure that none of them changes or influences the others. In this paper, two digital text watermarks are embedded in different carrier areas by different methods, and neither changes the text content, so the watermarks do not influence each other.
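The zero-watermark flow in item 3 can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: preprocessing is reduced to stripping whitespace, SHA-256 serves as the fingerprint, and the time-stamping authority is simulated locally with an HMAC under a made-up key. A real TSA would sign with its private key, for example following the RFC 3161 time-stamp protocol. Every name and the example date are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a local stand-in for the TSA's signing key.
TSA_KEY = b"demo-tsa-secret"


def fingerprint(text):
    """Fingerprint of the text after a stand-in preprocessing step."""
    normalized = "".join(text.split())  # drop all whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def make_watermark(text, owner):
    """Bind the fingerprint to the copyright information. The text is untouched."""
    bound = f"{fingerprint(text)}|{owner}"
    return hashlib.sha256(bound.encode("utf-8")).hexdigest()


def _sign(watermark, issued_at):
    body = json.dumps({"watermark": watermark, "issued_at": issued_at},
                      sort_keys=True).encode("utf-8")
    return hmac.new(TSA_KEY, body, hashlib.sha256).hexdigest()


def tsa_stamp(watermark, issued_at):
    """Simulated TSA: return a record holding the watermark, time, and signature."""
    return {"watermark": watermark, "issued_at": issued_at,
            "signature": _sign(watermark, issued_at)}


def verify(text, owner, record):
    """Check the stamp is genuine and matches this text and this owner."""
    expected = _sign(record["watermark"], record["issued_at"])
    genuine = hmac.compare_digest(expected, record["signature"])
    return genuine and record["watermark"] == make_watermark(text, owner)


doc = "多重数字文本水印 的 研究"
record = tsa_stamp(make_watermark(doc, "alice"), "2008-05-01T00:00:00Z")
print(verify(doc, "alice", record))
```

Nothing is written into the document itself, which is what makes this a zero-watermark: the record is stored separately and later checked against the text. In this sketch, changing the text, the claimed owner, or the timestamp in the record makes `verify` return `False`.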
