Research on Web-Based Entity Information Search and Mining
Abstract
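With the rapid development of network technologies, today's World Wide Web has entered a new phase in which multiple generations coexist and evolve together. The traditional Web (Web 1.0) makes up the main body of today's Web. The social Web (Web 2.0) has grown rapidly in recent years and has become its emerging force. Meanwhile, to enable machines to understand and process web data as people do, researchers are actively advancing Semantic Web technologies, which are expected to become the mainstream carrier of the next-generation Web (Web 3.0). Applications keep emerging in all of these generations, and descriptions of all kinds of entities are scattered among them. This brings convenience to users but also a key problem: information overload. How to effectively find the entity information a user needs in this huge and complex information space has therefore become an active research topic in recent years. In response to this need, this work analyzes the characteristics of each generation of the Web, proposes the concept of entity information retrieval and mining in Web 1.0, 2.0 and 3.0, carries out systematic theoretical research for each generation, and proposes a series of mining algorithms.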
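     In the traditional Web (Web 1.0), most research aims to provide the user with the most relevant web pages, yet in practice more and more users care about the information contained within web pages rather than the pages themselves. To meet this need, the first part of this work proposes the following algorithms for mining entity information in web pages: 1) Expert search: a probabilistic fine-grained expert search model. 2) Expert-expertise latent association mining: a typed separable mixture model for efficiently mining the latent associations between experts and expertise. 3) Competitor mining: a novel algorithm (CoMiner) for automatically mining domain-independent competitor information from the Web. 4) Time-associated event mining: a new algorithm (TESer) for mining event information on the Web and integrating it chronologically.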
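     The rapid development of Web 2.0 has produced a large volume of public annotations on entities such as web pages, pictures, papers and experts, for example on the Del.icio.us bookmarking site and the Flickr photo-sharing site. The second part of this work analyzes the characteristics of Web 2.0, mines the various entity relations within it, and uses the mined information to improve existing applications: 1) Social search: two new algorithms that improve the dynamic ranking and the static ranking of web page search, respectively. 2) Social language model: a language annotation model that further improves the retrieval effectiveness of language models. 3) Social browsing: an improved web browsing algorithm that makes full use of the semantic associations among web page annotations and their implicit hierarchical information.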
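     The Semantic Web was proposed so that machines can also understand web information. It is currently at an early stage of development. As the next natural extension of the existing Web, it is referred to here as Web 3.0. The third part of this work presents an exploratory study of the construction and application of Web 3.0: 1) Emergent semantics: the Semantic Web is usually built from ontologies defined by experts; this work proposes an algorithm that automatically derives hierarchical semantics from social annotations. 2) Semantic applications: this work further applies semantic information to Web service composition and proposes a new algorithm for discovering and composing semantic services.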
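     The results show that entity mining in the Web 1.0, 2.0 and 3.0 environments can greatly reduce the time users need to obtain the target information and can better help users understand their search targets.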
With the rapid development of Web technologies, the World Wide Web is entering a new state in which multiple generations coexist, each of which continues to develop quickly. The traditional Web (Web 1.0) still forms the principal part of the current Web. Recently, the social World Wide Web (Web 2.0) has developed rapidly and become a notable rising part of today's Web. At the same time, many people are working on the Semantic Web, in which machines can understand and process web data as human beings do. It is expected to be the mainstream of the next generation of the Web (Web 3.0). Applications keep emerging in all of these generations. They bring web users great convenience as well as a key problem: information overload. How to effectively find the information a user wants in such a huge and complex information space has become a hot research topic in recent years. In this paper, we propose to mine entity information in Web 1.0, 2.0 and 3.0. For each generation, we analyze the properties of the Web and propose a series of mining algorithms as follows.
     In the traditional Web (Web 1.0), most work aims to provide the user with the most relevant web pages. In reality, more and more users are concerned with information about the entities scattered across a web page rather than with the web page itself. Motivated by this, the first part of this paper proposes the following algorithms for entity mining. 1) Expert search: We propose a new algorithm, the fine-grained model, to address the problem. 2) Expert-expertise mining: We propose a new typed separable mixture model to mine the latent associations between experts and expertise effectively. 3) Competitor mining: We propose a new algorithm, CoMiner, to mine competitors automatically in a domain-independent manner. 4) Temporal event mining: We propose a new algorithm, TESer, to mine events and arrange them chronologically.
     With the rise of Web 2.0, more and more web resources, such as web pages and pictures, are annotated by web users with different backgrounds, for example through services provided by Del.icio.us and Flickr. The second part of this paper analyzes the properties of Web 2.0 and mines the entity relations. 1) Social search: We propose two new algorithms to improve the similarity ranking and the static ranking of web pages, respectively. 2) Social language model: We propose a new algorithm that smooths the language model estimate with social annotations. 3) Social browsing: We propose an effective algorithm that uses semantic associations and hierarchical information to improve the social browsing experience.
     To make machines understand web information, researchers proposed the Semantic Web, which defines the semantics of web resources explicitly. The Semantic Web is at an early stage of rapid development. As a natural extension of the current Web, the Semantic Web (referred to as Web 3.0 here) is expected to be the next generation of the Web. The third part of the paper explores mining the semantic information of Web 3.0. 1) Emergent semantics: We propose an effective algorithm for deriving hierarchical semantics from social annotations. 2) Semantic web service composition: We propose a semantic rewriting approach, based on query rewriting, for semantic web service composition.
     The experimental results show that mining entities in Web 1.0, 2.0 and 3.0 substantially reduces the time web users need to find the target information and helps them understand the target entities.
引文
?http://trend.cnki.net/trendshow.php?searchword=WEB
    [1] Acm digital library. In http://portal.acm.org/dl.cfm?dl=acm.
    [2] Baidu: http://www.baidu.com.
    [3] Dogpile: http://www.dogpile.com.
    [4] Expertnet: http://www.cvcp.ac.uk/expertnet.htm.
    [5] Gate-general architecture for text engineering: http://gate.ac.uk/.
    [6] Google: http://www.google.com.
    [7] http://blog.del.icio.us/.
    [8] http://en.wikipedia.org.
    [9] http://hublog.hubmed.org/tags/visualisation.
    [10] http://research.microsoft.com/users/nickcr/w3c-summary.html.
    [11] http://www.marketingpilgrim.com/2006/01/winks-michael-tanne-discusses-future.html.
    [12] http://www.neuroticweb.com/recursos/del.icio.us-graphs/.
    [13] Kartoo: http://www.kartoo.com.
    [14] Minipar: http://www.cs.ualberta.ca/ lindek/minipar.htm.
    [15] Msn: http://www.msn.com.
    [16] Open directory project: http://www.dmoz.org.
    [17] Profnet: http://www.profnet.com.
    [18] Search engine watch: http://www.searchenginewatch.com.
    [19] Skillview: htttp://www.skillview.com.
    [20] Surfwax: http://www.surfwax.com.
    [21] Virginia tech expertise database. In http://www.vt.edu/vt98/directories/facultyexpertise.html.
    [22] Vivisimo: http://vivisimo.com.
    [23] Yahoo: http://www.yahoo.com.
    [24] Web 2.0 introduction. In http://en.wikipedia.org/wiki/Web 2.0, 2006.
    [25] Mockus A. and J.D. Herbsleb. Expertise browser: A quantitative approach to identifying expertise.In Proceedings of the 24th International Conference on Software Engineering, pages 503 – 512,2002.
    [26] K. Aberer, P. Cudre-Mauroux, A.M. Ouksel, T. Catarci, M.S. Hacid, A. Illarramendi, V. Kashyap,M. Mecella, E. Mena, E.J. Neuhold, et al. Emergent semantics principles and issues. In Proceedingsof DASFAA 2004, 2004.
    [27] E. Agichtein, E. Brill, and S. Dumais. Improving web search ranking by incorporating user behav-ior information. In Proc. of SIGIR 2006, 2006.
    [28] H. S. Al-Khalifa and H. C. Davis. Measuring the semantic value of folksonomies. In Innovationsin Information Technology, pages 1–5, 2006.
    [29] J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and trackingpilot study: final report. In Proc. of the DARPA broadcast news transcription and understandingworkshop, pages 194–218, 1998.
    [30] J. Allan, R. Gupta, and V. Khandelwal. Temporal summaries of new topics. In Proc. of SIGIR’01,pages 10–18, 2001.
    [31] J. Allan, R. Papka, and V. Lavrenko. On-line new event detection and tracking. In Proc. of SI-GIR’98, pages 37–45, 1998.
    [32] O. Alonso and M. Gertz. Clustering of search results using temporal attributes. In Proc. of SI-GIR’06, 2006.
    [33] M Anderberg. Cluster Analysis and Applications. Academic Press, New York, 1973.
    [34] Kemafor Anyanwu, Angela Maduko, and Amit Sheth. SemRank: Ranking complex relation-ship search results on the semantic web. In Proc. of the 14th Intl. World Wide Web Conference(WWW2005), 2005.
    [35] N Ashish and C Knoblock. Wrapper generation for semi-strcutured internet sources. SIGMODRecord, pages 8–15, 1996.
    [36] M. Aurnhammer, P. Hanappe, and L. Steels. Augmenting navigation for collaborative tagging withemergent semantics. In Proceedings of the ISWC 2006, 2006.
    [37] F. Baader, R. Ku¨sters, and R. Molitor. Rewriting concepts using terminologies. In A. G. Cohn,F. Giunchiglia, and B. Selman, editors, Proceedings of the Seventh International Conference onKnowledge Representation and Reasoning (KR2000), pages 297–308, San Francisco, CA, 2000.Morgan Kaufmann Publishers.
    [38] Franz Badder, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter Patel-Schneider,editors. The Description Logic Handbook. Cambridge University Press, Jan. 2003.
    [39] K. Balog, L. Azzopardi, and M. Rijke. Formal models for expert finding in enterprise corpora. InProceedings of the 29th Annual International ACM SIGIR Conference, 2006.
    [40] A. Banerjee, I. S. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha. A generalized maximum entropyapproach to breg-man coclustering and matrix approximation. In Proc. of SIGKDD’04, pages 509–514, 2004.
    [41] Shenghua Bao, Xiaoyuan Wu, Ben Fei, Guirong Xue, Zhong Su, and Yong Yu. Optimizing websearch using social annotations. In Proceedings of WWW 2007, pages 501–510, 2007.
    [42] Catriel Beeri, Alon Y. Levy, and Marie-Christine Rousset. Rewriting queries using views in de-scription logics. In Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium onprinciples of database systems table of contents, pages 99–108. ACM Press New York, NY, USA,1997.
    [43] G. Begelman, P. Keller, and F.Smadja. Automated tag clustering improved search and explorationin the tag space. In Proc. of Collaborative Web Tagging Workshop at WWW2006.
    [44] B. Benatallah, M. Hacid, C. Rey, and F. Toumani. Request rewriting-based web service discovery.In Proceedings of 2nd International Semantic Web Conference, pages 242 – 257, 2003.
    [45] A. Berger and J. Lafferty. Information retrieval as statistical translation. In Proc. of SIGIR’99,pages 222–229, 1999.
    [46] P. Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, 2002.
    [47] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. ComputerNetworks and ISDN Systems, 30(1-7):107–117, 1998.
    [48] C. H. Brooks and N. Montanez. Improved annotation of the blogosphere via autotagging andhierarchical clustering. In Proc. of WWW 2006, page May 23.26 2006, 625-632.
    [49] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learningto rank using gradient descent. In Proc. of the ICML 2005, 2005.
    [50] Ashish N. Knoblock C. Wrapper generation for semi-structured internet sources. In SIGMOD,1997.
    [51] C.S. Campbell, P. Maglio, A. Cozzi, and B. Dom. Expertise identification using email commu-nications. In Proceedings of ACM 12th Conference on Information and Knowledge Management,pages 528–531, 2003.
    [52] G. Cao, J.Y. Nie, and J. Bai. Integrating word relationships into language models. In In INPRO-CEEDINGS of SIGIR’05, pages 298–305, 2005.
    [53] Y. Cao, J. Liu, , S. Bao, and H. Li. Research on expert search at enterprise track of trec 2005. InProceedings of the Text REtrieval Conference, 2005.
    [54] Soumen Chakrabarti, Byron Dom, David Gibson, Jon M.Kleinberg, Prabhakar Raghavan, and Srid-har Rajagopalan. Automatic resource compilation by analyzing hyperlink structure and associatedtext. In Proceedings of 7th World Wide Web Conference (WWW97), pages 65–74, 1997.
    [55] E. Charniak and M. Berland. Finding parts in very large corpora. In proceeding of the 37th AnnualMeeting of the ACL, 1999.
    [56] L-F. Chien, T-I. Huang, and M-C. Chien. Pat-tree-based keyword extraction for chinese informationretrieval. In Proceedings of SIGIR 1997, pages 50–58, 1997.
    [57] H. L. Chieu and Y. K. Lee. Natural language processing: Query based event extraction along atimeline. In Proc. of SIGIR’04, pages 425–432, July 2004.
    [58] Philipp Cimiano, Sieqfried Handschuh, and Steffen Staab. Towards the self-annotating web. InProc. of the 13th Intl. World Wide Web Conference (WWW2004), 2004.
    [59] P. Cimino, S. Handschuh, and S. Staab. Towrds the self-annotating web. In Proceedings of WorldWide Web (WWW-04), 2004.
    [60] W. Cohen, M. Hurst, and L. Jensen. A ?exible learning system for wrapping tables and lists in htmldocuments. In Proceedings of WWW 2002, pages 232–241, 2002.
    [61] N. Craswell, D. Hawking, and S. Robertson. Effective site finding using link anchor information.In Proc. of SIGIR 2001, pages 250–257, New Orleans, 2001.
    [62] N. Craswell, D. Hawking, A. M. Vercoustre, and P. Wilkins. P@noptic expert: Searching forexperts not just for documents. In In Ausweb, 2001, 2001.
    [63] N. Craswell, A.P. Vries, and I Soboroff. Overview of the trec 2005 enterprise track. In Proc. ofTREC 2005.
    [64] Fabio Crestani, Mounia Lalmas, van Rijsbergen, and Iain Campbell. Is this document relevant?probably: A survey of probabilistic models in information retrieval. ACM Computing Surveys,30(4):528–552, 1998.
    [65] F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana. Unraveling the webservices web: an introduction to soap, wsdl, and uddi. Internet Computing, IEEE, 6:86–93, Mar2002.
    [66] D. Cutting, D. R. Karger, J. Pederson, and J. W. Tukey. Scatter/gather: a cluster-based approach tobrowsing large document collections. In Proc. of SIGIR’92, pages 318–329, 1992.
    [67] R. D’Amore. Expertise community detection. In SIGIR’04: Proceedings of the 27th annual in-ternational ACM SIGIR conference on Research and development in information retrieval, pages498–499, 2004.
    [68] T. H. Davenport and L Prusak. Working Knowledge: how organizations manage what they know.Howard Business, School Press,Boston, MA,, 1998.
    [69] G. DeJong. Prediction and substantiation: A new approach to natural language processing. Cogni-tive Science, 3(3):251–273, 1979.
    [70] O. Dekel, C. Manning, and Y. Singer. Log-linear models for label-ranking. Advances in NeuralInformation Processing Systems, 16, Cambridge, MA: MIT Press, 2003.
    [71] Delicious. http://del.icio.us.
    [72] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via theem algorithm. In Journal of the Royal Statistical Society: Series B, 39(1), pages 1–38, November1977.
    [73] I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. InProc. of SIGKDD’01, pages 269 – 274, 2001.
    [74] I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In Proc. ofSIGKDD’03, pages 89–98, 2003.
    [75] P. A. Dmitriev, N. Eiron, M. Fontoura, and E. Shekita. Using annotations in enterprise search. InProc. of WWW 2006, pages 811–817, May 23.26, 2006.
    [76] B. Dom, I. Eiron, Cozzi A., and Z. Yi. Graph-based ranking algorithms for e-mail expertise anal-ysis. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining andKnowledge Discovery, 2003.
    [77] M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, and A. Tomkins. Visualizing tags overtime. In Proc. of WWW2006, pages 193–202, May 23.26, 2006.
    [78] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A Popcscu, T. Shaked, S. Soderland, and S. Weld.Web-scale information extraction in knowitall(preliminary results). In Proceedings of WWW 2004,pages 100–110, 2004.
    [79] ExpertNet. http://www.cvcp.ac.uk/expertnet.htm.
    [80] H. Fang, L. Zhou, and C Zhai. Language models for expert finding-uiuc trec 2006 enterprise trackexperiments. In Proc. of TREC 2006, 2006.
    [81] D. Fensel, C. Bussler, Y. Ding, and B. Omelayenko. The web service modeling framework WSMF.In Proceedings of Electronic Commerce Research and Applications, 2002.
    [82] D. Freitag and A. McCallum. Information extraction with hmms and shrinkage. In Proceedings ofthe AAAI 1999 Workshop on Machine Learning for Information Extraction, 1999.
    [83] Thomas L. Friedman. The World Is Flat: A Brief History of the Twenty-first Century. Farrar, Strausand Giroux, April 2005.
    [84] B. Gao, T. Liu, X. Zheng, Q. Cheng, and W. Ma. Consistent bipartite graph co-partitioning forstar-structured high-order heterogeneous data co-clustering. In Proc. of SIGKDD’05, pages 41–50,2005.
    [85] J. Gao, J.Y. Nie, G. Wu, and G. Cao. Dependence language model for information retrieval. In InINPROCEEDINGS of SIGIR’04, pages 170–177, 2004.
    [86] Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, and Guihong Cao. Dependence language model forinformation retrieval. In Proceedings of the 27th annual international conference on Research anddevelopment in information retrieval, 2004.
    [87] Francois Goasdoue and Marie christine Rousset. Answering queries using views: a KRDB per-spective for the semantic web. ACM Transactions on Internet Technology (TOIT), 4:255 – 288,August 2004.
    [88] S. A. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal ofInformation Science, 32(2):198–208, 2006.
    [89] G. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
    [90] Paul Graham. Web 2.0. In http://www.paulgraham.com/web20.html, 2005.
    [91] G.Salton and M.E.Lesk. Computer evaluation of indexing and text processing. Jouranl of the ACM,15(1):8–36, 1968.
    [92] R. Guha, Rob McCool, and Eric Miller. Semantic search. In Proc. of WWW’03, pages 700–709.
    [93] Volker Haarslev and Ralf Moller. Racer system description. In Proceedings of the First Interna-tional Joint Conference on Automated Reasoning, pages 701–706. Springer-Verlag, 2001.
    [94] U. Hahn and K. Schnattinger. Towards text knowledge engineering. In Proceedings of the 15thNational Conference on Artificial Intelligence and the 10th Conference on Innovative Applicationof Artificial Intelligence, pages 524–531, 1998.
    [95] Harry Halpin, Valentin Robu, and Hana Shepherd. The complex dynamics of collaborative tagging.In Proceedings of the WWW 2007, pages 211–220, 2007.
    [96] T. Hammond, T. Hannay, B. Lund, and J. Scott. Social book marking tools (i)- a general review.D-Lib Magazine, 11(4), 2005.
    [97] M. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14thInternational Conference on Computational Linguistics, 1992.
    [98] M. Hearst. Clustering versus faceted categories for information exploration. Communication of theACM, 49(4):59–61, 2006.
    [99] Jeff He?in and James Hendler. Searching the web with SHOE. In Proc. of AAAI-2000 Workshopon AI for Web Search, 2000.
    [100] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. InProc. of the 9th International Conference on Artificial Neural Networks, pages 97–102, 1999.
    [101] D. Hiemstra. A linguistically motivated probabilistic model of information retrieval. In In INPRO-CEEDINGS of European Conference on Digital Libraries, pages 569–584, 1998.
    [102] Djoerd Hiemstra, Stephen Robertson, and Hugo Zaragoza. Parsimonious language models forinformation retrieval. In Proceedings of the 27th International ACM SIGIR conference, 2004.
    [103] Dion Hinchcliffe. The state of web 2.0. In http://web2.wsj2.com/the state of web 20.htm, 2006.
    [104] J. Hladik. reasoning about nominals with fact and racer. In Proceedings of the 2003 InternationalWorkshop on Description Logics (DL2003), Rome, Italy September 5-7, 2003, 2003.
    [105] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of SIGIR’99, pages 50–57, 1999.
    [106] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. 42(1-2):177 – 196,2001.
    [107] T. Hofmann. Latent semantic models for collaborative filtering. 22(1):89 – 115, 2004.
    [108] T. Hofmann and J. Puzicha. Statistical models for co-occurrence data. A.I.Memo 1635, MIT, 1008.
    [109] A. Hotho, R. Jaschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Searchand ranking. In Proceedings of ESWC 2006, 2006.
    [110] M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of SIGKDD 2004,pages 168–177, 2004.
    [111] Y. Hu, H. Li, Y. Cao, D. Meyerzon, L. Teng, and Q. Zheng. Automatic extraction of titles fromgeneral documents using machine learning. Information Processing Management, 2006.
    [112] Y. Hu, G. Xin, R. Song, G. Hu, S. Shi, Y. Cao, and H. Li. Title extraction from bodies of htmldocuments and its application to web page retrieval. In Proc. of SIGIR 2005, pages 250–257, 2005.
    [113] Zeng Hua-Jun, Qi-Cai He, Zheng Chen, Wei-Ying Ma, and Jinwen Ma. Learning to cluster websearch results. In Processedings of SIGIR 2004, pages 210–217, 2004.
    [114] N. Jardine and C.J. van Rijsbergen. The use of hierarchical clustering in information retrieval. InInformation Storage and Retrieval, pages 7:217–240, 1971.
    [115] K Jarvelin and J. Kekalainen. Ir evaluation methods for retrieving highly relevant documents. InProc. of SIGIR 2000, 2000.
    [116] G. Jeh and J. Widom. Simrank: A measure of structural-context similarity. In In Proc. of SIGKDD2002, pages 538–543, 2001.
    [117] F. Jelinek. Statistical methods for speech recognition. In Cambridge, MA:MIT Press., 1997.
    [118] Nitin Jindal and Bing Liu. Identifying comparative sentences in text documents. In Proceedings ofSIGIR 2006, pages 244–251, 2006.
    [119] Nitin Jindal and Bing Liu. Mining comparative sentences and relations. In Proceedings of AAAI2006, 2006.
    [120] T. Joachims. Optimizing search engines using clickthrough data. In Proc. of SIGKDD 2002, pages133–142, 2002.
    [121] Goncalves J.Zhu. Corder:comunity relation discovery by named entity recognition. In K-CAP,2005.
    [122] Goncalves J.Zhu. Mining web data for competency management. In In Proc.of 2005 InternationalConference on Web Intelligence, 2005.
    [123] A. K. Karlson, G. G. Robertson, Robbins, C. D. C. Mary, and S. Greg. Fathumb: a facet-basedinterface for mobile search. In Proc. of CHI’06, pages 711–720, 2006.
    [124] H. Kautz and A. Selman, B.and Milewski. Agent amplified communication. In Proceedings of the13th National Conference on Artificial Intelligence, pages 3–9, 1996.
    [125] P. Kim and S. H. Myaeng. Usefulness of temporal information automatically extracted from newsarticles for topic tracking. ACM Transactions on Asian Language Information Processing (TALIP),3(4), December 2004.
    [126] Mei Kobayashi and Koichi Takeda. Information retrieval on the web. ACM Computing Surveys,32(2):144–173, 2000.
    [127] Wessel Kraaij, Thijs Westerveld, and Hiemstra Hiemstra. The importance of prior probabilitiesfor entry page search. In Proceedings of the 25th Annual International ACM SIGIR Conferenceon Research and Development in Information Retrieval, Web Information Retrieval, pages 27–34,2002.
    [128] Arun Kumar, Biplav Srivastava, and Sumit Mittal. Information modeling for end to end compo-sition of semantic web services. In Proceeding of 2005 International Semantic Web Conference,pages 476–490, 2005.
    [129] John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilisticmodels for segmenting and labeling sequence data. In Proc. 18th International Conf. on MachineLearning, pages 282–289. Morgan Kaufmann, San Francisco, CA, 2001.
    [130] John Lafferty and Chengxiang Zhai. Document language models, query models, and risk mini-mization for information retrieval. In Proceedings of the 24th annual international ACM SIGIRconference on Research and development in information retrieval, 2001.
    [131] K. Lagus, T. Honkela, S. Kaski, and T. Kohonen. Websom for textual data mining. ArtificialIntelligence Review, 13(5-6):245–364, 1999.
    [132] R. Lambiotte and M. Ausloos. Collaborative tagging as a tripartite network. InarXiv:cs.DS/0512090 v2, 29 Dec 2005.
    [133] Erwan Bornier Lee Provoost. Service-oriented architecture and the semantic web: A killer combi-nation? In http://lee.webcoder.be/papers/sesa.pdf, 2006.
    [134] A.Y. Levy and M.C. Rousset. Carin: A representation language combining horn rules and de-scription logics. In Proceeding of European Conference on Artificial Intelligence, pages 323–327,1996.
    [135] Rui Li, Shenghua Bao, Ben Fei, Zhong Su, and Yong Yu. Towards effective browsing of large scalesocial annotations. In Proceedings of the WWW 2007, pages 943–952, 2007.
    [136] W. Li, K.-F. Wong, and C. Yuan. Toward automatic chinese temporal information extraction. Jour-nal of the American Society for Information Science and Technology, 52(9):748–762, 2001.
    [137] ChenXi Lin, Lei Zhang, Jian Zhou, Ying Yang, and Yong Yu. SPortS: Semantic+Portal+Service. InECAI 2004 Workshop on Application of Semantic Web Technologies to Web Communitites, volume107 of CEUR-WS, 2004.
    [138] B. Liu and C. Chin. Mining topic-specific concepts and definitions on the web. In Proceedings ofWWW 2003, pages 251–260, 2003.
    [139] B. Liu, Y. Ma, and P. Yu. Discovering unexpected information from your competitors’ web sites.In Proceedings of SIGKDD 2001, pages 144–153, 2001.
    [140] B. Liu, K. Zhao, and L. Yi. Visualizing web site comparisons. In Proceedings of WWW 2002,pages 693–703, 2002.
    [141] Bing Liu, Minqing Hu, and Junsheng Cheng. Opinion observer: Analyzing and comparing opin-ions on the web. In Proceedings of WWW 2005, pages 342–351, 2005.
    [142] Jianguo Lu, Yijun Yu, and John Mylopoulos. A lightweight approach to semantic web servicesynthesis. In ICDE Workshop, International Workshop on Challenges in Web Information Retrievaland Integration, 2005.
    [143] C. Maconald and I Ounis. Voting for candidates: adapting data fusion techniques for an expertsearch task. In Proc. of CIKM’06, pages 387–396, 2006.
    [144] J. B MacQueen. Some methods for classification and analysis of multivariate observations,. InProc. of 5-th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297,1967.
    [145] Inderjeet Mani and George Wilson. Robust temporal processing of news. In Proc. of ACL’00,pages 69–76, 2000.
    [146] David Martin, Mark Burstein, Grit Denker, Jerry Hobbs, Lalana Kagal, Ora Lassila, Drew McDer-mott, Sheila McIlraith, Massimo Paolucci, Bijan Parsia, Terry Payne, Marta Sabou, Evren Sirin,Monika Solanki, Naveen Srinivasan, and Katia Sycara. Technical report, DAML.org, Nov. 2003.http://www.daml.org/services/owl-s/1.0/.
    [147] M. Maslov, A. Golovko, I. Segalovich, and P. Braslavski. Extracting news-related queries fromweb query log. In Proc. of WWW’06, pages 931–932, 2006.
    [148] A. Mathes. Folksonomies - cooperative classification and communication through shared metadata.In http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html,December 2004.
    [149] D. Mattox, M. T. Maybury, and D. Morey. Enterprise expert and knowledge discovery. Technicalreport, The MITRE Cor-poration,, 1999.
    [150] A. McCallum. Efficiently inducing features of conditional random fields. Proceedings of UAI2003, 2003.
    [151] A. McCallum, K. Nigam, and L. Ungar. Efficient clustering of high-dimensional data sets withapplication to reference matching. In Proc. of SIGKDD’00, pages 169–178, 2000.
    [152] D. W. McDonald. Evaluating expertise recommendations. In Proc. of the ACM 2001 internationalconference on Supporting Group Work (GROUP’01), 2001.
    [153] D. W. McDonald and M. S. Ackerman. Just talk to me: A field study of expertise location. InProceedings of the 12th ACM conference on Computer Supported Cooperative Work, pages 315–324, 1998.
    [154] D. W. McDonald and M. S. Ackerman. Expertise recommender: A ?exible recommendation systemand architecture. In ACM Conference on Computer Supported Cooperative Work, pages 231–240,2000.
    [155] F. McSherry. A uniform approach to accelerated pagerank computation. In In: Proc. of WWW2005, pages 575–582, 2005.
    [156] P. Merholz. Metadata for the masses. In http://www.adaptivepath.com/publications/essays/archives/000361.php,October 19, 2004.
    [157] P. Mika. Ontologies are us: a unified model of social networks and semantics. In Proc. of ISWC2005, pages 522–536, Nov. 2005.
    [158] David R. Miller, Tim Leek, and Richard M. Schwartz. A hidden markov model information re-trieval system. In Proceedings of SIGIR-99, 22nd ACM International Conference on Research andDevelopment in Information Retrieval, pages 214–221, Berkeley, US, 1999.
    [159] S. Morinaga, K. Yamanishi, K. Tateishi, and T. Fukushinna. Mining product reputations on theweb. In Proceedings of SIGKDD 2002, pages 341–349, 2002.
    [160] A. Ntoulas, J. Cho, and C. Olston. What’s new on the web? the evolution of the web from a searchengine perspective. In In Proc. of the Thirteenth WWW Conference, 2004.
    [161] Tim O’Reilly. What is web 2.0. In http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html, 2005.
    [162] Tim O’Reilly. Web 2.0 compact definition: Trying again. Inhttp://radar.oreilly.com/archives/2006/12/web 20 compact.html, 2006.
    [163] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order tothe web. Technical report, Stanford Digital Library Technologies Project, 1998.
    [164] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? sentiment classification using machine learningtechniques. In Proceedings of EMNLP 2002, pages 79–96, 2002.
    [165] M. Paolucci, T. Kawmura, T. Payne, and K. Sycara. Semantic matching of web services capabilities.In Proceedings of the First International Semantic Web Conference, pages 333–347, 2002.
    [166] A. Patil, S. Oundhakar, A. Sheth, and K. Verma. Meteor-s web service annotation framework. InProceedings of 13th International World Wide Web Conference, pages 553–562, 2004.
    [167] F. Pereira, N. Tishby, and L. Lee. Distributional clustering of english words. In Proceedings of the31st conference on Association for Computational Linguistics, pages 183–190, 1993.
    [168] D. Petkova and W. B. Croft. Hierarchical language models for expert finding in enterprise corpora.In Proc. of ICTAI’06, pages 599–608, 2006.
    [169] J.M. Ponte and W.B. Croft. A language modeling approach to information retrieval. In In INPRO-CEEDINGS of SIGIR’98, pages 275–281, 1998.
    [170] A-M Popescu and Etzioni. O. Extracting product features and opinions from reviews. In Proceed-ings of EMNLP 2005, pages 339–346, 2005.
    [171] ProfNet. http://www.profnet.com/.
    [172] P.Zang. Ctms: A comparative text mining system. Master’s thesis, University of Illinois at Urbana-Champaign, 2004.
    [173] E. Quintarelli. Folksonomies: Power to the people. In Paper presented at the ISKO Italy-UniMIBmeeting, http://www.iskoi.org/doc/folksonomies.htm, June 2005.
    [174] L.R. Rabiner. A tutorial on hidden markov models and selected applications inspeech recognition.In Proceedings of the IEEE, pages 257–286, 1989.
    [175] M. Richardson, A. Prakash, and E. Brill. Beyond pagerank: Machine learning for static ranking.In Proc. of WWW2006, May 23-26, 2006.
    [176] C.J.van Rijsbergen. A new theoretical framework for information retrieval. In Proc. of the 9thACM SIGIR Conference, pages 194–200, 1986.
    [177] S. E. Robertson, S. Walker, M. Hancock-Beaulieu, A. Gull, and M. Lau. Okapi at trec. In TextREtrieval Conference 1992, pages 21–30, 1992.
    [178] D. Roman, U. Keller, H. Lausen, J. de Bruijn, R. Lara, M. Stollberg, A. Polleres, C. Feier, C.Bussler, and D. Fensel. Web service modeling ontology. Applied Ontology, 1:77–106, 2005.
    [179] K. Rose. Deterministic annealing for clustering, compression, classification, regression, and relatedoptimization problems. Proceedings of the IEEE, 86(11):2210–2239, 1998.
    [180] R. Rosenfeld. Two decades of statistical language modeling: Where do we go from here, 2000.
    [181] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, NewYork, 1983.
    [182] M. Sanderson. Retrieval with good sense. Information Retrieval, 2:47–67, 2000.
    [183] J. Schachter. del.icio.us about page. In http://del.icio.us/doc/about, 2004.
    [184] Joshua Schachter. Del.icio.us about page. http://del.icio.us/about, 2004.
    [185] F. Schilder. Extracting meaning from temporal nouns and temporal prepositions. In ACM Transactions on Asian Language Information Processing (TALIP), volume 3, March 2004.
    [186] C. Schmitz, M. Grahl, A. Hotho, G. Stumme, C. Cattuto, A. Baldassarri, V. Loreto, and V.D.P. Servedio. Network properties of folksonomies. In WWW’07 Tagging and Metadata for Social Information Organization Workshop, Banff, Alberta, Canada, 2007.
    [187] Mithun Sheshagiri, Marie desJardins, and Tim Finin. A planner for composing services described in DAML-S. In Proceedings of AAMAS Workshop on Web Services and Agent-Based Engineering, 2003.
    [188] C. Shirky. Folksonomy. Blog entry at http://www.corante.com/many/archives/2004/08/25/folksonomy.php, August 2004.
    [189] W. Sihn and F. Heeren. Xpertfinder - expert finding within specified subject areas through analysis of e-mail communication. In Proceedings of the 6th Annual Scientific Conference on Web Technology, 2001.
    [190] Evren Sirin, Bijan Parsia, Dan Wu, James Hendler, and Dana Nau. HTN planning for web service composition using SHOP2. Journal of Web Semantics, 1, 2004.
    [191] G. Smith and Atomiq. Folksonomy: social classification. In http://atomiq.org/archives/2004/08/folksonomy_social_classification.html, Aug 3, 2004.
    [192] I. Soboroff, A.P. Vries, and N. Craswell. Overview of the TREC 2006 enterprise track. In Proc. of TREC 2006, 2006.
    [193] S. Soderland. Learning information extraction rules for semi-structured and free text. Machine Learning, pages 233–272, 1999.
    [194] F. Song and W.B. Croft. A general language model for information retrieval. In Proceedings of CIKM’99, pages 316–321, 1999.
    [195] M. Srikanth and R.K. Srihari. Exploiting syntactic structure of queries in a language modeling approach to IR. In Proceedings of CIKM’03, pages 476–483, 2003.
    [196] Munirathnam Srikanth and Rohini Srihari. Biterm language models for document retrieval. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, 2002.
    [197] L. A. Streeter and K. E. Lochbaum. An expert/expert locating system based on automatic representation of semantic structure. In Proc. of the Fourth IEEE Conference on Artificial Intelligence Applications, pages 345–349, 1988.
    [198] M. Steyvers, P. Smyth, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of SIGKDD’04, pages 306–315, 2004.
    [199] Nenad Stojanovic, Rudi Studer, and Ljiljana Stojanovic. An approach for the ranking of query results in the semantic web. In Proceedings of the 2nd International Semantic Web Conference (ISWC2003), LNCS 2870. Springer-Verlag, 2003.
    [200] Jian-Tao Sun, Xuanhui Wang, Dou Shen, Hua-Jun Zeng, and Zheng Chen. CWS: A comparative web search system. In Proceedings of WWW 2006, pages 467–476, 2006.
    [201] R. Swan and J. Allan. Automatic generation of overview timelines. In Proceedings of SIGIR 2000, pages 49–56, 2000.
    [202] Snehal Thakkar, Jose Luis Ambite, and Craig A. Knoblock. A data integration approach to automatically composing and optimizing web services. In Proceedings of the 2004 ICAPS Workshop on Planning and Scheduling for Web and Grid Services, 2004.
    [203] V. W. Thomas. Folksonomy definition and Wikipedia. In http://www.vanderwal.net/random/category.php?cat=153, November, 2005.
    [204] T.R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition, 5(2):199–220, 1993.
    [205] W3C. Resource description framework. Technical report, http://www.w3.org/RDF/.
    [206] W3C. Web ontology language. Technical report, http://www.w3.org/2004/OWL/.
    [207] T. V. Wal. Explaining and showing broad and narrow folksonomies. In http://www.personalinfocloud.com/2005/02/explaining_and_.html, February 21, 2005.
    [208] X. Wang, J. Sun, Z. Chen, and C. Zhai. Latent semantic analysis for multiple-type interrelated data objects. In Proc. of SIGIR’06, pages 236–243, 2006.
    [209] X. Wang and A. Zhou. Linkage analysis for the World Wide Web and its application: A survey. Journal of Software, 14(10):1768–1780, 2003.
    [210] C. Wanhyun, J. Park, M. Lee, and S. Park. Unsupervised color image segmentation using mean shift and deterministic annealing EM. Internat. Conf. on Computational Science and Its Applications, ICCSA, 3:867–876, 2004.
    [211] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs and anchors. In Proc. of TREC10, 2002.
    [212] George W. Furnas, Scott Deerwester, Susan T. Dumais, Thomas K. Landauer, Richard A. Harshman, Lynn A. Streeter, and Karen E. Lochbaum. Information retrieval using a singular value decomposition model of latent semantic structure. In Proc. of the ACM SIGIR’88, pages 465–480, Grenoble, France, 1988.
    [213] X. Wu, L. Zhang, and Y. Yu. Exploring social annotations for the semantic web. In Proc. of WWW2006, pages 417–426, May 23-26, 2006.
    [214] J. Xu and W. Croft. Cluster-based retrieval using language models. In Proceedings of SIGIR’04, pages 186–193, 2004.
    [215] G.-R. Xue, H.-J. Zeng, Z. Chen, Y. Yu, W.-Y. Ma, W. Xi, and W. Fan. Optimizing web search using web clickthrough data. In Proc. of CIKM 2005, pages 118–126, 2005.
    [216] X.L. Yang, Q. Song, and W.B. Zhang. Kernel-based deterministic annealing algorithm for data clustering. IEEE Proceedings-Vision, Image, and Signal Processing, 153:557, 2006.
    [217] Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and on-line event detection. In Proc. of SIGIR’98, 1998.
    [218] Y. Yang, J. Zhang, J. Carbonell, and C. Jin. Topic-conditioned novelty detection. In Proc. of SIGKDD’02, pages 688–693, 2002.
    [219] D. Yimam-Seid and A. Kobsa. Expert finding systems for organizations: Problem and domain analysis and the DEMOIR approach. Journal of Organizational Computing and Electronic Commerce, pages 1–24, 2002.
    [220] Robert H’obbes’ Zakon. Hobbes’ internet timeline. In http://www.zakon.org/robert/internet/timeline/, 2006.
    [221] A. Zanasi. Information gathering and analysis competitive intelligence through data mining public sources. Competitive Intelligence Review, 9(1):44–54.
    [222] Alessandro Zanasi. Competitive intelligence through data mining public sources. Competitive Intelligence Review, 9(1):44–54.
    [223] Jeffrey Zeldman. Web 3.0. In http://www.alistapart.com/articles/web3point0, 2006.
    [224] H.-J. Zeng, Q.-C. He, Z. Chen, W.-Y. Ma, and J. Ma. Learning to cluster web search results. In Proc. of SIGIR’04, pages 210–217, 2004.
    [225] C. Zhai and J. Lafferty. Two-stage language models for information retrieval. In Proceedings of SIGIR’02, pages 49–56, Tampere, Finland, August 11-15, 2002.
    [226] Chengxiang Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In Proceedings of SIGKDD 2004, pages 743–748, 2004.
    [227] Chengxiang Zhai and John Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems (TOIS), 22:179–214, April 2004.
    [228] R. Zhang, I. Arpinar, and B. Aleman-Meza. Automatic composition of semantic web services. In Proceedings of the 2nd International Semantic Web Conference (ISWC2003), 2003.
    [229] Q. Zhao, T.-Y. Liu, S.S. Bhowmick, and W.-Y. Ma. Event detection from evolution of click-through data. In Proc. of SIGKDD 2006, pages 484–493, 2006.
    [230] Y. Zhao and G. Karypis. Criterion functions for document clustering: Experiments and analysis. In UMN CS 01-040, 2001.
    [231] Geoffrey Z. Liu. Semantic vector space model: Implementation and evaluation. Journal of the American Society for Information Science, 48(5):395–417, 1997.
