Since the beginning of the 21st century, people have gradually become familiar with networks, and use of the Internet has grown explosively. People can easily exchange a variety of information and access digital content (films, literature, technical information) and online services (online banking, herbmylife, and so on) through the Internet, but piracy has also become easier. The management and protection of digital works is therefore not only an urgent problem to solve, but also a prerequisite for asserting rights against the illegal use of digital works.
     As a technique for protecting digital text, text watermarking hides information in the text medium in such a way that it can later be retrieved from the data, which makes it possible to confirm the rights to the text and to trace infringement. Because text documents lack redundancy in the spatial and frequency domains, the hiding techniques used for multimedia cannot be applied directly to generate text watermarks. Early text steganography methods relied on the fact that the human visual system cannot perceive minute changes in a text's physical format, such as word spacing, line spacing and character font. Once the text is re-typeset or passed through optical character recognition, however, the watermark is damaged, which limits the applications of these methods. As a carrier, text has a particular nature: it belongs to the category of natural language. Text information hiding based on natural language processing has therefore become the main research direction, and many research institutions have invested considerable manpower and resources in it. A large number of results have emerged, but no unified standard accepted by all parties has yet been formed, so the research remains challenging.
     This paper is mainly concerned with Chinese text watermarking and proposes several methods for natural language steganography at the word level and the sentence level. The main contributions are summarized as follows:
     A method based on exchanging conjunctions is proposed. It relies on two properties: chaotic systems are extremely sensitive to their initial values, and the semantic features of a text change only slightly after synonym substitution. The algorithm encrypts the original watermark information with a chaotic system, creates a substitution table of synonymous conjunctions, and then replaces the conjunctions in accordance with the encrypted watermark to hide the information. Experiments show that the watermark has good invisibility and strong robustness against attacks.
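     The steps of the conjunction method can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the logistic map stands in for the unspecified chaotic system, the three-pair substitution table `PAIRS` is hypothetical, and the text is assumed to be already segmented into tokens. The initial value `key` of the map acts as the secret key.

```python
# Hypothetical substitution table: each pair is (form for bit 0, form for bit 1).
PAIRS = [("但是", "可是"), ("因为", "由于"), ("如果", "假如")]
PAIR_OF = {word: pair for pair in PAIRS for word in pair}
BIT_OF = {word: bit for pair in PAIRS for bit, word in enumerate(pair)}


def logistic_bits(x0, n, r=3.99, burn_in=100):
    """Keystream from the logistic map x -> r*x*(1-x); an assumed chaotic system."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = r * x * (1 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits


def encrypt(bits, key):
    """XOR the bits with the chaotic keystream (the same call decrypts)."""
    return [b ^ k for b, k in zip(bits, logistic_bits(key, len(bits)))]


def embed(tokens, bits, key):
    """Replace conjunctions so that their forms spell the encrypted watermark."""
    slots = sum(tok in BIT_OF for tok in tokens)
    if slots < len(bits):
        raise ValueError("not enough conjunctions to carry the watermark")
    cipher = iter(encrypt(bits, key))
    out = []
    for tok in tokens:
        if tok in PAIR_OF:
            bit = next(cipher, None)
            if bit is not None:       # leave surplus conjunctions untouched
                tok = PAIR_OF[tok][bit]
        out.append(tok)
    return out


def extract(tokens, n_bits, key):
    """Read the conjunction forms back as bits and decrypt them."""
    cipher = [BIT_OF[tok] for tok in tokens if tok in BIT_OF][:n_bits]
    return encrypt(cipher, key)
```

     For example, `embed(["因为", "下雨", "但是", "他", "来", "了"], [1, 0], 0.3141)` returns a token list of the same length in which only the two conjunctions may have changed, and `extract` with the same key recovers `[1, 0]`.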
     A method based on frequently used Chinese characters is proposed. It exploits the stable occurrence frequency of frequently used characters and the rules governing when the word 'de' may be present or omitted. According to the information to be hidden, the algorithm changes the parity of the number of characters between two frequently used words by adding or deleting 'de', and the watermark is thereby embedded. It has a wide range of applications and is not affected by the type of text. The experimental results show that the watermark has large capacity and better invisibility.
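     The parity idea can be shown with a small sketch. It makes strong simplifying assumptions that are not in the paper: the span between two frequently used words has already been located, and a linguistic check (not implemented here) has already found a position in that span where 'de' (的) is optional. Each span is given as a hypothetical `(before, after)` pair around that position.

```python
DE = "的"


def embed_bit(before, after, bit):
    """Return the span variant whose character count has parity `bit`.

    Adding 的 changes the length by one, so exactly one of the two
    variants (with or without 的) has the required parity.
    """
    plain = before + after
    return plain if len(plain) % 2 == bit else before + DE + after


def embed(spans, bits):
    """Embed one bit per span; spans are (before, after) pairs."""
    if len(spans) < len(bits):
        raise ValueError("not enough spans to carry the watermark")
    return [embed_bit(b, a, bit) for (b, a), bit in zip(spans, bits)]


def extract(marked_spans):
    """Recover the bits as the parity of each span's character count."""
    return [len(span) % 2 for span in marked_spans]


spans = [("我", "朋友"), ("美丽", "花园"), ("他", "父亲")]
marked = embed(spans, [0, 1, 0])
print(marked)           # ['我的朋友', '美丽的花园', '他的父亲']
print(extract(marked))  # [0, 1, 0]
```

     Because only the character count matters, extraction needs neither the original text nor a synonym table, only the same rule for locating the spans.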
     A sentence-level method is proposed. Texts contain many sentence structures such as 'SBV(?)ADV' and 'SBV (?) ADV (?) POB'. The adverbs in 'ADV' structures and the prepositions in 'POB' structures are closely related to the head words of the text; they carry strong meaning and play an important role. Based on this property, and combined with chaos encryption and sequence mapping, the information is embedded in the text by swapping adverbs with their synonyms, and likewise for prepositions. Experimental comparison with other similar algorithms shows that the algorithm has large watermark capacity, strong robustness and good invisibility.
     A zero-watermarking method based on text keywords and high-frequency Chinese characters is proposed. Zero watermarking does not change the original text at all, while keywords and frequently used Chinese characters reflect the important features of the text. The algorithm therefore realizes information hiding and copyright protection through encryption and a time stamp added by a third-party certification authority. To enhance robustness, the algorithm adds phonetic ordering, which reduces the effect of sequence changes caused by modifying the text. The method has better resistance to many kinds of attacks.
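     A zero-watermarking scheme of this kind can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: only high-frequency characters are used as features (keyword extraction is omitted), code-point order stands in for the phonetic ordering, SHA-256 with XOR stands in for the unspecified encryption, and the names `features`, `register` and `verify` are hypothetical. In practice the certification authority would issue and sign the time stamp; here it is simply stored in the record.

```python
import hashlib
from collections import Counter


def features(text, k=8):
    """The k most frequent CJK characters, in a fixed (sorted) order.

    Sorting makes the feature string independent of where the
    characters occur, so reordering the text does not change it.
    """
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    top = sorted(counts, key=lambda ch: (-counts[ch], ch))[:k]
    return "".join(sorted(top))


def _pad(feature, n):
    """Stretch a SHA-256 digest of the features to n bytes."""
    out, i = b"", 0
    while len(out) < n:
        out += hashlib.sha256(f"{i}:{feature}".encode("utf-8")).digest()
        i += 1
    return out[:n]


def register(text, owner, timestamp):
    """Build the record to deposit with the authority; the text is untouched."""
    msg = owner.encode("utf-8")
    key = bytes(a ^ b for a, b in zip(msg, _pad(features(text), len(msg))))
    return {"key": key.hex(), "timestamp": timestamp}


def verify(text, record):
    """Recover the owner string from a suspect text and a registered record."""
    key = bytes.fromhex(record["key"])
    msg = bytes(a ^ b for a, b in zip(key, _pad(features(text), len(key))))
    return msg.decode("utf-8", errors="replace")
```

     Calling `verify` on the registered text, or on a copy whose characters have merely been reordered, returns the owner string passed to `register`; a text with different high-frequency characters yields unreadable bytes instead.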
