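     To address the dynamic and highly scattered nature of agricultural Internet resources, this paper proposes AADWED (Adaptive Agriculture Deep Web Entry Discovery), an adaptive algorithm for discovering agricultural Deep Web resources. The algorithm repeatedly learns suitable query expressions from samples and submits them to a general-purpose search engine in order to efficiently locate the entry pages of domain-specific Deep Web resources. Experiments show that the algorithm substantially improves the harvest rate of Deep Web resource discovery in the agricultural domain.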
By the end of 2009, there were more than 30,000 agricultural web sites on the Internet, covering almost every kind of agricultural information, such as agricultural technology, market information, agricultural news and policies. However, agricultural information on the web has no uniform representation; it is heterogeneous, distributed and redundant, and it forms isolated information islands. Because farmers' computer skills are limited, it is hard for them to use traditional search tools to acquire and filter personalized information on the web. Faced with a huge amount of information, farmers are often frustrated, and information overload is a serious problem. It is therefore important to develop personalized, intelligent and specialized web search models and tools.
     To address the openness, dispersion, hierarchy, evolution and sheer size of the Internet, this dissertation proposes an agricultural search model based on complex adaptive system theory. The model builds an alliance of four agents: an agricultural information discovery agent, an information acquisition agent, an information processing agent and a service agent. Through learning mechanisms that link the agents with web content, representation methods and user needs, the model adapts to the complex and dynamic Internet environment. The proposed method improves the precision and recall of agricultural search engines and addresses a core problem for next-generation search engines.
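The agent alliance can be pictured as a pipeline with one feedback loop. The sketch below is a hypothetical illustration only: the four class names mirror the roles named above, but the in-memory `CORPUS`, the term-weight update and the reward value are assumptions, not the dissertation's design. It shows how an unmet user need (service agent) can steer what the discovery agent looks for next.

```python
from collections import Counter

# Hypothetical in-memory "web" standing in for real sites (illustration only).
CORPUS = {
    "page1": "wheat price market",
    "page2": "rice planting technology",
    "page3": "wheat seed technology",
}


class DiscoveryAgent:
    """Chooses which terms to use when looking for new sources."""

    def __init__(self, seed_terms):
        self.weights = Counter({t: 1.0 for t in seed_terms})

    def propose(self, k=2):
        # Equal weights keep first-encountered order, so output is deterministic.
        return [t for t, _ in self.weights.most_common(k)]

    def learn(self, term, reward):
        self.weights[term] += reward


class AcquisitionAgent:
    def fetch(self, terms):
        # Return ids of pages containing at least one proposed term.
        return [pid for pid, text in CORPUS.items() if set(terms) & set(text.split())]


class ProcessingAgent:
    def process(self, page_ids):
        # Build a tiny inverted index: term -> set of page ids.
        index = {}
        for pid in page_ids:
            for term in CORPUS[pid].split():
                index.setdefault(term, set()).add(pid)
        return index


class ServiceAgent:
    def answer(self, index, query):
        return sorted(index.get(query, set()))


def run_round(discovery, acquisition, processing, service, query):
    """One cycle: discover -> acquire -> process -> serve -> learn."""
    terms = discovery.propose()
    index = processing.process(acquisition.fetch(terms))
    hits = service.answer(index, query)
    if not hits:
        # An unmet user need steers future discovery toward the query term.
        # The reward value 2.0 is an arbitrary assumption.
        discovery.learn(query, reward=2.0)
    return hits


agents = (DiscoveryAgent(["wheat", "price"]), AcquisitionAgent(), ProcessingAgent(), ServiceAgent())
print(run_round(*agents, query="rice"))  # [] : no fetched page covers "rice" yet
print(run_round(*agents, query="rice"))  # ['page2'] after discovery adapts
```

A real system would replace each stub with networked components; the point of the sketch is only the closed loop between user needs and discovery.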
     To address the dynamic and highly scattered nature of web resources, this dissertation proposes the AADWED (Adaptive Agriculture Deep Web Entry Discovery) algorithm for acquiring domain-specific Deep Web resources effectively and efficiently. The algorithm repeatedly constructs queries from sample pages and submits them to a search engine in order to find the entry pages of hidden web resources. Experiments show that the method significantly improves the efficiency of finding hidden web resources.
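The general idea of learning queries from samples and feeding classified results back can be sketched as a toy loop. This is not the dissertation's AADWED implementation: the stub `WEB` index, the `search_engine` ranking, the rule that a page with a search form is an entry page, and the term-scoring rule (document frequency in entry pages minus non-entry pages) are all assumptions. Harvest rate is taken here to mean entry pages found divided by pages visited.

```python
from collections import Counter

# Hypothetical stand-in for a general-purpose search engine's index.
WEB = {
    "a.example/search": "<form> wheat price database search </form>",
    "b.example/news": "wheat price news report",
    "c.example/seed": "<form> seed variety database search </form>",
    "d.example/blog": "seed variety planting blog",
    "e.example/fert": "<form> fertilizer price database search </form>",
}


def tokens(text):
    # Content words only; markup tokens such as "<form>" are skipped.
    return [t for t in text.split() if not t.startswith("<")]


def is_entry_page(text):
    # Assumption: a page with a search form counts as a Deep Web entry page.
    return "<form>" in text


def search_engine(query, top_k=3):
    # Stub ranking: number of query terms a page contains, ties broken by URL.
    scored = [(sum(t in tokens(text) for t in query), url) for url, text in WEB.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [url for _, url in ranked[:top_k]]


def learn_query(positives, negatives, n_terms=2):
    # Score terms by document frequency in entry pages minus non-entry pages.
    score = Counter()
    for text in positives:
        score.update(set(tokens(text)))
    for text in negatives:
        score.subtract(set(tokens(text)))
    best = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in best[:n_terms]]


def discover(positives, negatives, rounds=2):
    positives, negatives = list(positives), list(negatives)
    found, visited = [], set()
    for _ in range(rounds):
        query = learn_query(positives, negatives)  # query adapts every round
        for url in search_engine(query):
            if url in visited:
                continue
            visited.add(url)
            text = WEB[url]
            # Each classified result becomes a new training sample.
            if is_entry_page(text):
                found.append(url)
                positives.append(text)
            else:
                negatives.append(text)
    harvest_rate = len(found) / len(visited) if visited else 0.0
    return found, harvest_rate


print(discover(["<form> wheat price search </form>"], []))
```

In this toy run the first query is `price search`; once a non-entry news page containing "price" is added as a negative sample, the second round switches to `search database` and finds a new entry page, giving a harvest rate of 0.75.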
     To cope with two characteristics of web pages, their dynamics and diversity, this dissertation presents an adaptive algorithm for extracting structured data from web pages. The algorithm builds on the traditional MDR (Mining Data Records) algorithm and uses relative entropy to remove noise, thereby improving the precision of structured data extraction.
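Relative entropy (Kullback-Leibler divergence) can serve as a noise filter by measuring how far each candidate record's feature distribution lies from the distribution of all candidates. The sketch below assumes records are represented as HTML tag sequences, uses additive smoothing to avoid zero probabilities, and uses an arbitrary threshold of 0.5; the dissertation's actual features and threshold are not given here.

```python
import math
from collections import Counter


def distribution(tags, vocab, alpha=1e-3):
    # Smoothed probability of each tag in vocab (additive smoothing).
    counts = Counter(tags)
    total = len(tags) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    # D(p || q) = sum_t p(t) * log(p(t) / q(t)); q must be nonzero on vocab.
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def remove_noise(records, threshold=0.5):
    """Keep records whose tag distribution is close to the pooled distribution."""
    vocab = sorted({tag for r in records for tag in r})
    reference = distribution([tag for r in records for tag in r], vocab)
    kept = []
    for r in records:
        if kl_divergence(distribution(r, vocab), reference) <= threshold:
            kept.append(r)
    return kept


# Three similar data records plus one advertisement-like block (made-up data).
records = [["tr", "td", "td", "a"]] * 3 + [["div", "img", "span"]]
print(remove_noise(records))  # the div/img/span block is dropped
```

Here the table rows diverge by roughly 0.22 from the pooled distribution and the advertisement-like block by roughly 1.6, so only the block is filtered out.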
     To deal with the large volume of heterogeneous, incomplete and redundant agricultural information on the web, this dissertation studies automatic annotation of spatial attributes and semantics-based processing of redundant data for agricultural product prices and buy/sell information. The proposed method improves data quality and lays a foundation for precise retrieval and visualization.
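As a rough illustration of spatial annotation and semantics-aware deduplication, the sketch below tags each price record with a province by gazetteer lookup and treats two records as duplicates when the product (after synonym normalization), price, province and date all match. The `GAZETTEER` and `SYNONYMS` tables and the sample records are made-up assumptions; the dissertation's actual annotation and redundancy rules are not described here.

```python
# Hypothetical lookup tables; a real system would use full administrative-division data.
GAZETTEER = {"Zhengzhou": "Henan", "Shouguang": "Shandong", "Harbin": "Heilongjiang"}
SYNONYMS = {"maize": "corn", "corn": "corn", "tomato": "tomato", "tomatoes": "tomato"}


def annotate(record):
    # Attach a province if any known place name appears in the record text.
    province = next((prov for place, prov in GAZETTEER.items() if place in record["text"]), None)
    return {**record, "province": province}


def dedupe(records):
    seen, unique = set(), []
    for r in map(annotate, records):
        product = r["product"].lower()
        # Normalize synonyms so "Maize" and "corn" map to the same concept.
        key = (SYNONYMS.get(product, product), r["price"], r["province"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Made-up sample records for illustration only.
records = [
    {"product": "Maize", "price": 2.1, "date": "2009-11-02", "text": "Maize wholesale price in Zhengzhou"},
    {"product": "corn", "price": 2.1, "date": "2009-11-02", "text": "Zhengzhou corn market quote"},
    {"product": "tomatoes", "price": 3.0, "date": "2009-11-02", "text": "Shouguang vegetable market"},
]
for r in dedupe(records):
    print(r["product"], r["province"])
```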
     To meet the personalized information needs of different web users, a new approach is proposed that automatically mines web user profiles based on Formal Concept Analysis (FCA). User interest models are represented as formal concepts, and the relationships among these models are described by a concept lattice. A method for assessing the relevance of documents to topics is also proposed. Experiments show that the approach is effective.
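In FCA, a formal concept of a context (objects, attributes, incidence) is a pair (extent, intent) in which the intent is exactly the set of attributes shared by all objects in the extent, and the extent is exactly the set of objects having every attribute in the intent. The sketch below enumerates all concepts of a tiny context by closing every object subset, treating documents a user has read as objects and their keywords as attributes (an assumed modeling choice). The `relevance` overlap score is a hypothetical stand-in, not the dissertation's relevance method.

```python
from itertools import combinations


def common_attributes(objs, context, attrs):
    # Attributes shared by all objects in objs (all attributes if objs is empty).
    return frozenset(a for a in attrs if all(a in context[o] for o in objs))


def objects_having(atts, context):
    # Objects that have every attribute in atts.
    return frozenset(o for o in context if atts <= context[o])


def formal_concepts(context):
    """Naively enumerate all (extent, intent) pairs; exponential in #objects."""
    attrs = sorted(set().union(*context.values()))
    objects = sorted(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = common_attributes(objs, context, attrs)
            extent = objects_having(intent, context)  # closure of objs
            concepts.add((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


def is_subconcept(c1, c2):
    # Lattice order: (A1, B1) <= (A2, B2) iff A1 is a subset of A2.
    return c1[0] <= c2[0]


def relevance(doc_terms, concept):
    # Hypothetical score: fraction of the concept's intent found in the document.
    intent = concept[1]
    return len(intent & doc_terms) / len(intent) if intent else 0.0


# Documents a user has read, with their keywords (made-up example).
context = {"doc1": {"wheat", "price"}, "doc2": {"wheat", "seed"}, "doc3": {"rice", "price"}}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

The enumeration is only suitable for toy contexts; algorithms such as NextClosure or incremental lattice construction scale much better.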
     Finally, based on the complex adaptive agricultural search model proposed in this dissertation, the agricultural vertical search engine "Sounong" was designed and implemented. It has been in public service in many provinces.

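© 2004-2018 China Geological Library. All rights reserved. Beijing ICP Registration No. 05064691; Beijing Public Security Network Registration No. 11010802017129

Address: No. 29 Xueyuan Road, Haidian District, Beijing. Postal code: 100083

Telephone: Office (+86 10) 66554848; document lending, reference services and sci-tech novelty searches: 66554700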