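     (1) Short text computing based on statistical language models. Because short texts contain few characters, use non-standard language, and come in huge volumes, this dissertation proposes a short text clustering algorithm based on N-gram feature extraction and RPCL (Rival Penalized Competitive Learning). First, character-level N-gram features are extracted; that is, Chinese chunks are extracted from an unsegmented corpus. A Chinese chunk can be a single Chinese character, a word, or a character string, so chunks not only express the semantic information of a short text but also preserve word-order structure and the dependencies between characters. Next, a set of candidate Chinese chunks is obtained through statistical substring reduction and mutual information filtering. Finally, RPCL, a neural network clustering algorithm, is used to cluster the short texts. Experimental results show that this N-gram- and RPCL-based algorithm clusters short texts effectively and reduces feature dimensionality.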
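The chunk-extraction steps above can be sketched in Python. This is a minimal illustration, not the dissertation's implementation: it assumes character n-grams up to length 4, reads statistical substring reduction as "drop a string when a longer string containing it has the same frequency", and scores each chunk by pointwise mutual information over its weakest binary split, with probabilities approximated as count divided by the total number of characters. The function names, thresholds, and toy corpus are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(texts, max_n=4):
    """Count every character n-gram (1 <= n <= max_n) in unsegmented texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def substring_reduction(counts, min_len=2):
    """Naive statistical substring reduction (quadratic time).

    A string is dropped when some longer string that contains it has the same
    frequency, because it then never occurs outside that longer string.
    """
    grams = [g for g in counts if len(g) >= min_len]
    return {
        g: counts[g]
        for g in grams
        if not any(len(h) > len(g) and g in h and counts[h] == counts[g] for h in grams)
    }


def chunk_mi(chunk, counts, total):
    """Pointwise mutual information (bits) of the weakest binary split of a chunk.

    Probabilities are approximated as count / total number of characters.
    """
    return min(
        math.log2(counts[chunk] * total / (counts[chunk[:k]] * counts[chunk[k:]]))
        for k in range(1, len(chunk))
    )


def candidate_chunks(texts, max_n=4, min_count=2, min_mi=1.0):
    """N-gram extraction -> substring reduction -> frequency and MI filtering."""
    counts = char_ngrams(texts, max_n)
    total = sum(len(t) for t in texts)
    reduced = substring_reduction(counts)
    return {
        c: f
        for c, f in reduced.items()
        if f >= min_count and chunk_mi(c, counts, total) >= min_mi
    }


texts = ["手机充值", "手机充值卡", "话费充值"]
print(candidate_chunks(texts))  # {'充值': 3, '手机充值': 2}
```

On this toy corpus, singleton fragments such as `机充值卡` survive substring reduction (the 4-gram cap hides the full string) and are removed only by the frequency threshold; pointwise MI on its own tends to favor rare strings, which is why the sketch pairs it with a minimum count. The quadratic reduction is for clarity only; linear-time substring reduction algorithms exist.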
With the rapid development of the Internet and communication networks, web documents have become one of the major modern information media as well as an indispensable source of information in people's lives, and text mining has become a technology of great research and practical significance. With the arrival of Web 2.0, more and more users take part in generating information, and more and more personal, opinionated content fills the Internet. Such content is meaningful and valuable for many applications, such as e-commerce, online communities, network information security, and web search engines. However, processing these texts poses enormous challenges for traditional text mining.
     In this dissertation, three problems are investigated: short text computing, web text information extraction, and text sentiment analysis. The main contributions of this dissertation are summarized as follows:
     (1) Short text computing based on a statistical language model. We introduce an algorithm that clusters Chinese short texts based on N-gram feature extraction. Targeting the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction, statistical substring reduction, and mutual information filtering to capture Chinese chunks from the texts; these chunks reflect the semantic structure of the text and the dependencies between characters. The RPCL algorithm is then applied to cluster the texts with high precision, without requiring the exact number of clusters to be known in advance. Experimental results show that, compared with traditional methods, this approach remarkably reduces dimensionality and effectively improves the performance of Chinese short text clustering.
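The RPCL clustering step described above can be sketched in pure Python. Assumptions: feature vectors are plain dense lists (in the dissertation they would be chunk-based vectors), the starting units are passed in explicitly and may outnumber the true clusters, and the learning rates are illustrative. RPCL as usually stated also weights each distance by the unit's relative winning frequency; this sketch leaves that term out to keep the behavior easy to follow. The key mechanism is kept: the winner moves toward each input while the runner-up (the rival) is pushed away at a much smaller rate, so a surplus unit drifts off and never wins.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl(points, init, epochs=50, lr_winner=0.05, lr_rival=0.002, seed=0):
    """Rival Penalized Competitive Learning (simplified, no frequency weighting).

    For each input the closest unit (winner) moves toward it, and the second
    closest unit (rival) is pushed away with a much smaller rate. Returns the
    final unit positions and how often each unit won.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in init]
    wins = [0] * len(centers)
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for idx in order:
            x = points[idx]
            ranked = sorted(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            winner, rival = ranked[0], ranked[1]
            for d in range(len(x)):
                centers[winner][d] += lr_winner * (x[d] - centers[winner][d])
                centers[rival][d] -= lr_rival * (x[d] - centers[rival][d])
            wins[winner] += 1
    return centers, wins


def assign(points, centers):
    """Label each point with the index of its nearest unit."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


A = [(0.5 * i, 0.5 * j) for i in range(3) for j in range(3)]
B = [(10 + 0.5 * i, 10 + 0.5 * j) for i in range(3) for j in range(3)]
# three starting units for two true clusters; the third starts just outside cluster A
centers, wins = rpcl(A + B, init=[(0.5, 0.5), (10.5, 10.5), (-1.0, -1.0)])
print("active units:", sum(1 for w in wins if w > 0))  # active units: 2
```

Counting the units that ever win gives the estimated number of clusters, which is how RPCL avoids fixing the cluster count in advance.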
     (2) Web text information extraction for keyword recommendation systems and sentiment analysis. For keyword recommendation systems in advertising, we propose a semi-supervised, HMM-based approach to Chinese compound extraction that uses bootstrapping. First, we define a tag set BEMI {beginning, end, middle, independent}, which marks the position of a word within a compound. An HMM then assigns BEMI tags to extract compounds automatically. The compounds extracted from the corpus are ranked by word frequency and length in descending order, and the top N compounds are added to the seed compound list; through bootstrapping, the algorithm learns more Chinese compounds from the corpus. Experimental results show that this approach achieves much higher performance than the unsupervised one. Unlike compounds extracted by traditional methods, these Chinese compounds carry category information and can therefore be used as features in text classification and clustering. Because of its extensibility and versatility, the approach can also be applied in advertising keyword recommendation systems for different kinds of advertisers.
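The BEMI tagging step can be illustrated with a small supervised HMM: counts from seed sentences tagged with B/E/M/I give add-alpha smoothed start, transition, and emission probabilities, Viterbi decoding tags a new word sequence, and each B M* E span is joined into a compound. This is a sketch under assumptions: the smoothing constant, the toy seed data, and all helper names are hypothetical, and bootstrapping is only indicated by the ranking helper (frequency first, then length in characters), not by a full retraining loop.

```python
import math
from collections import Counter

TAGS = ("B", "E", "M", "I")  # Beginning / End / Middle of a compound, Independent word


def train_hmm(tagged_sentences, alpha=0.1):
    """Add-alpha smoothed log-probabilities from sentences of (word, tag) pairs."""
    start, trans, emit, tag_total, trans_total = (Counter() for _ in range(5))
    vocab = set()
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            vocab.add(word)
            emit[tag, word] += 1
            tag_total[tag] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev, tag] += 1
                trans_total[prev] += 1
            prev = tag
    v, k, n = len(vocab) + 1, len(TAGS), len(tagged_sentences)  # +1 slot for unseen words
    log_start = {t: math.log((start[t] + alpha) / (n + alpha * k)) for t in TAGS}
    log_trans = {(p, t): math.log((trans[p, t] + alpha) / (trans_total[p] + alpha * k))
                 for p in TAGS for t in TAGS}

    def log_emit(tag, word):
        return math.log((emit[tag, word] + alpha) / (tag_total[tag] + alpha * v))

    return log_start, log_trans, log_emit


def viterbi(words, model):
    """Most probable BEMI tag sequence for a word sequence."""
    log_start, log_trans, log_emit = model
    score = {t: log_start[t] + log_emit(t, words[0]) for t in TAGS}
    back = []
    for word in words[1:]:
        prev, score, ptr = score, {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: prev[p] + log_trans[p, t])
            score[t] = prev[best] + log_trans[best, t] + log_emit(t, word)
            ptr[t] = best
        back.append(ptr)
    tag = max(TAGS, key=score.get)
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


def extract_compounds(words, tags):
    """Join each B M* E span into one compound; spans not closed by E are dropped."""
    compounds, buf = [], []
    for word, tag in zip(words, tags):
        if tag == "B":
            buf = [word]
        elif tag == "M" and buf:
            buf.append(word)
        elif tag == "E" and buf:
            compounds.append("".join(buf + [word]))
            buf = []
        else:
            buf = []
    return compounds


def rank_compounds(found, top_n):
    """Rank by frequency, then length in characters (both descending); keep the top N as new seeds."""
    freq = Counter(found)
    return sorted(freq, key=lambda c: (freq[c], len(c)), reverse=True)[:top_n]


seed = [
    [("手机", "B"), ("充值", "E"), ("很", "I"), ("方便", "I")],
    [("手机", "B"), ("充值", "M"), ("卡", "E"), ("便宜", "I")],
]
model = train_hmm(seed)
words = ["手机", "充值", "卡"]
print(extract_compounds(words, viterbi(words, model)))  # ['手机充值卡']
```

In a bootstrapping round, the compounds returned by `rank_compounds` would be used to re-tag more sentences, and the HMM would be retrained on the enlarged seed set.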
     For word-level sentiment analysis, we propose an algorithm based on the Maximum Entropy (ME) model and an LMR template. The LMR template tags word positions, and words, word positions, and POS tags are used as features in the ME model. A text window slides over the sentence, and the sentiment of the word in the M position is labeled. Experimental results show that this algorithm performs well in sentiment word extraction and is robust across several feature combinations.
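A sketch of the sliding-window features, assuming a three-word window in which L, M, and R are the left neighbor, the word being labeled, and the right neighbor. The exact template, padding symbols, and label set used in the dissertation are not given, so `<s>`/`</s>`, the feature-string format, and the `O`/`POS` labels here are hypothetical. Each feature list would be paired with the sentiment label of the M word and passed to a maximum entropy classifier.

```python
def lmr_features(words, pos_tags, i):
    """Features for words[i] in a three-word window: L (left), M (current), R (right).

    Each feature combines the window slot with the word or its POS tag, so the
    same word contributes different features depending on its position.
    """
    padded_words = ["<s>"] + list(words) + ["</s>"]
    padded_pos = ["<s>"] + list(pos_tags) + ["</s>"]
    center = i + 1  # index of words[i] after padding
    feats = []
    for slot, offset in (("L", -1), ("M", 0), ("R", 1)):
        feats.append(f"{slot}_word={padded_words[center + offset]}")
        feats.append(f"{slot}_pos={padded_pos[center + offset]}")
    return feats


def window_instances(words, pos_tags, labels):
    """Slide the window over a sentence; each instance carries the M word's sentiment label."""
    return [(lmr_features(words, pos_tags, i), labels[i]) for i in range(len(words))]


words = ["屏幕", "很", "清晰"]
pos = ["n", "d", "a"]
labels = ["O", "O", "POS"]  # hypothetical label set: O = no sentiment, POS/NEG = polarity
for feats, label in window_instances(words, pos, labels):
    print(label, feats)
```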
     (3) Text sentiment classification based on supervised and semi-supervised learning. Most pop songs have lyrics, which play an essential role in understanding songs semantically. Lyrics analysis is therefore a necessary complement to acoustic methods for music retrieval. One basic task in music retrieval is classifying music emotion by learning from lyrics. This problem differs from traditional text classification in that more linguistic or semantic information is required for better emotion analysis. Taking the word as the unit, we examine the lyrics corpus against Zipf's law and find that it roughly obeys the law. On this basis, we study three kinds of preprocessing methods (different N-grams, stop-word deletion, and POS-based filtering) and a series of language grams under the well-known N-gram language model framework to extract more semantic features. In addition, we improve the Maximum Entropy model with Gaussian and exponential priors to model the features for music emotion classification. Experimental results show that these feature extraction methods improve music emotion classification accuracy, and ME with priors obtains the best results.
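To make "ME with a Gaussian prior" concrete, here is a minimal conditional maximum entropy (multinomial logistic regression) classifier over binary features. It is trained by batch gradient ascent on the log-likelihood minus the sum of w²/(2σ²), which is what a zero-mean Gaussian prior with variance σ² contributes. The training procedure (plain gradient ascent rather than an iterative scaling or quasi-Newton method), the learning rate, and the toy lyric features are assumptions for illustration. An exponential prior would instead penalize weights linearly; that variant is not shown.

```python
import math
from collections import defaultdict


def predict_proba(w, feats, labels):
    """p(label | features) under a log-linear (maximum entropy) model."""
    scores = {c: sum(w.get((f, c), 0.0) for f in feats) for c in labels}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}  # subtract max for stability
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train_maxent(data, sigma2=1.0, lr=0.5, iters=200):
    """Batch gradient ascent on (1/N) [sum log p(y|x) - sum_k w_k^2 / (2 sigma2)]."""
    labels = sorted({y for _, y in data})
    w = defaultdict(float)
    for _ in range(iters):
        grad = defaultdict(float)
        for feats, y in data:
            probs = predict_proba(w, feats, labels)
            for f in feats:
                grad[f, y] += 1.0           # observed feature count
                for c in labels:
                    grad[f, c] -= probs[c]  # expected feature count under the model
        for key in set(grad) | set(w):
            grad[key] -= w[key] / sigma2    # Gaussian prior term
            w[key] += lr * grad[key] / len(data)
    return dict(w), labels


def classify(model, feats):
    w, labels = model
    probs = predict_proba(w, feats, labels)
    return max(labels, key=probs.get)


lyrics = [
    (["love", "sweet"], "happy"),
    (["tears", "alone"], "sad"),
    (["sweet", "smile"], "happy"),
    (["alone", "cry"], "sad"),
]
model = train_maxent(lyrics, sigma2=1.0)
print(classify(model, ["sweet"]), classify(model, ["alone"]))  # happy sad
```

A smaller `sigma2` pulls the weights toward zero more strongly, which is the smoothing effect that helps with sparse lyric features.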
     Since labeled data for sentiment classification are scarce, we also study this setting and introduce a novel semi-supervised learning algorithm for it. We assume that there is a sentiment manifold structure from which documents are sampled. To exploit it, we build a graph over both labeled and unlabeled data, constructed linearly from each data point's neighborhood information. Labels are then propagated through the graph, whose weight matrix is treated as a probabilistic transition matrix during propagation. This algorithm is capable of learning sentiment manifold structures within texts, and experiments on lyrics and movie review data show promising results.
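A small sketch of label propagation over a linear-neighborhood graph. Each point is reconstructed from its k nearest neighbors with weights that sum to one (solved from the regularized local Gram matrix, as in locally linear embedding); the resulting row-normalized matrix W acts as the transition matrix, and labels are spread with F ← αWF + (1 − α)Y. Assumptions: k, α, the regularization constant, and the toy 2-D points are illustrative, and negative reconstruction weights are simply clipped to zero rather than solving the non-negatively constrained quadratic program that a linear-neighborhood propagation method would normally use.

```python
def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_neighborhood_weights(points, k=2, reg=1e-3):
    """Row i holds weights that best reconstruct points[i] from its k nearest neighbors."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(points):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(x, points[j])))[:k]
        diffs = [[a - b for a, b in zip(x, points[j])] for j in nbrs]
        gram = [[sum(p * q for p, q in zip(d1, d2)) for d2 in diffs] for d1 in diffs]
        trace = sum(gram[t][t] for t in range(k)) or 1.0
        for t in range(k):
            gram[t][t] += reg * trace  # regularize in case the local Gram matrix is singular
        coef = [max(c, 0.0) for c in solve(gram, [1.0] * k)]  # clip negatives (approximation)
        total = sum(coef)
        for j, c in zip(nbrs, coef):
            w[i][j] = c / total  # rows sum to one: a probabilistic transition matrix
    return w


def propagate(w, seeds, n_classes, alpha=0.99, iters=200):
    """Iterate F <- alpha * W F + (1 - alpha) * Y and return the argmax class per point."""
    n = len(w)
    y = [[1.0 if seeds.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(w[i][j] * f[j][c] for j in range(n)) + (1 - alpha) * y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=row.__getitem__) for row in f]


A = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
B = [(x + 10.0, y + 10.0) for x, y in A]
W = linear_neighborhood_weights(A + B, k=2)
print(propagate(W, seeds={0: 0, 4: 1}, n_classes=2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Only one point per cluster is labeled; the other six receive labels purely through the neighborhood graph.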
     (4) Opinion retrieval. In the context of the Chinese Opinion Analysis Evaluation (COAE2008), we discuss text opinion retrieval. Our sentiment analysis system, PRIS-SAS, employs a two-stage approach. After preprocessing, the corpus provided by COAE2008 is indexed with the Indri retrieval system, which performs ad hoc retrieval. The sentiment and polarity models, trained with ME with priors, then classify the texts returned by Indri, and the retrieval results are reranked according to the classification results. Experiments on the COAE2008 datasets show that the proposed system performs at a state-of-the-art level for opinion retrieval.
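The two-stage idea (retrieve, then rerank with an opinion classifier) can be sketched as follows. Indri is not used here; as a stand-in, documents are scored by Dirichlet-smoothed query likelihood, a standard language-modeling retrieval score. The opinion model is a hypothetical toy lexicon scorer standing in for the ME sentiment and polarity models, and the linear interpolation with weight `beta` is an assumed combination rule, since the exact reranking formula is not stated.

```python
import math
from collections import Counter


def query_likelihood(query, docs, mu=2000.0):
    """Dirichlet-smoothed log P(query | document) for each tokenized document."""
    coll = Counter(t for d in docs for t in d)
    coll_len, vocab = sum(coll.values()), len(coll)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query:
            p_coll = (coll[q] + 1) / (coll_len + vocab)  # add-one keeps unseen terms finite
            score += math.log((tf[q] + mu * p_coll) / (len(d) + mu))
        scores.append(score)
    return scores


def rerank(query, docs, opinion_prob, top_k=100, beta=0.5, mu=2000.0):
    """Stage 1: rank by relevance. Stage 2: rerank the top_k by relevance plus opinion score."""
    rel = query_likelihood(query, docs, mu)
    top = sorted(range(len(docs)), key=lambda i: rel[i], reverse=True)[:top_k]
    lo, hi = min(rel[i] for i in top), max(rel[i] for i in top)
    norm = {i: (rel[i] - lo) / (hi - lo) if hi > lo else 1.0 for i in top}
    final = {i: (1 - beta) * norm[i] + beta * opinion_prob(docs[i]) for i in top}
    return sorted(top, key=final.get, reverse=True)


OPINION_WORDS = {"喜欢", "推荐", "失望", "糟糕"}  # hypothetical toy lexicon


def toy_opinion_prob(doc):
    """Stand-in for the ME sentiment/polarity classifiers: more opinion words, higher score."""
    hits = sum(t in OPINION_WORDS for t in doc)
    return hits / (hits + 1)


docs = [["手机", "发布", "手机", "参数"], ["手机", "喜欢", "推荐"], ["天气", "晴朗"]]
print(rerank(["手机"], docs, toy_opinion_prob))  # [1, 0, 2]
```

With `beta=0.0` the order falls back to pure relevance (`[0, 1, 2]`); raising `beta` lets the opinionated document overtake the merely factual one.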
