As technology advances and ideas evolve, the Web has become one of today's main sources of information. The amount of information on the Web is increasing dramatically. While this gives people access to a great deal of information, it also makes it difficult to find exactly the information needed. There is a trend toward endowing the Web with semantic information and using it for knowledge-based resource sharing, so that people can more easily obtain relevant knowledge and information through the Web.
     The scientific data sharing project is an important part of the construction of the national scientific and technological innovation system, as well as of the basic platform for national scientific and technological development. Sharing of forestry scientific data is one part of this project. As a portal site, the Scientific Data Center of Forestry has been greatly developed and expanded over more than 10 years of construction and service. More and more fields have been added, and the amount of data has greatly increased. Faced with such a large amount of forestry scientific data, making information search faster and easier has become the goal of developing such a system. Focusing on the existing problems in traditional information retrieval, we tried to uncover the information and rules behind the data in terms of semantics, in order to provide users with high-quality service.
     Semantic information retrieval is a new technology that combines traditional information retrieval with ontology knowledge management, data mining and natural language processing. In this dissertation, ontology-based semantic information retrieval was studied and a semantic information retrieval model for forestry scientific data was proposed. The critical technologies, namely the ontology knowledge model, document semantic pre-processing, semantic query expansion and semantic retrieval, were systematically analyzed. The significant findings were as follows:
     (1) The ontology model of forestry scientific data was developed based on the theory and technology of ontology construction. The selection of the concept set and the relationships among the core concepts were described in detail, providing an important foundation for the semantic information retrieval of forestry data.
     (2) The semantic web framework was studied, and methods of maintaining, storing, reasoning over and querying the ontology knowledge model of forestry scientific data were analyzed and explored. Results showed that, in comparison with a relational database, the ontology-based TDB persistent storage was more efficient, by up to 60 times. In addition, when Jena and Pellet reasoning were combined for reasoning over triples in the forestry scientific data ontology, efficiency was 10% higher than when Jena and Pellet were used separately.
     (3) Document semantic pre-processing was studied. To increase the accuracy of the dictionary, over 70,000 professional terms were collected through analysis of current forestry scientific data. The feature weights of words and terms in each document were expressed using the vector space model. Feature sets of concepts were extracted from the forestry scientific data ontology and used as cluster centers. Document clustering was carried out using the k-means model, with cosine similarity as the distance measure. Finally, an inverted-index-based method was explored. Results showed that the clustering accuracy was 81.4%.
     (4) A semantic query expansion method was put forward. In this method, queries are first classified into three categories: single keywords, multiple keywords and question sentences. For single keywords, a modified semantic similarity is used for query expansion. For multiple keywords, semantic reasoning and semantic similarity are integrated. For question sentences, syntactic analysis and semantic reasoning are combined. These semantic query expansion methods are critical for semantic information retrieval.
     (5) Based on the above research, a semantic information retrieval system for forestry scientific data was developed using the semantic web framework, and semantic querying of information was realized. The system was compared with traditional retrieval models based on keyword matching. Results showed that the semantic retrieval method developed in this study performed much better than the traditional retrieval methods in terms of success and accuracy.
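     The clustering step in finding (3) can be illustrated with a minimal sketch. It is not the implementation used in this dissertation: the TF-IDF weighting formula, the toy documents and the seed vectors standing in for ontology concept feature sets are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Represent tokenized documents as sparse TF-IDF dicts (vector space model).

    The smoothed IDF used here is an assumed weighting, not the one from the study.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_cosine(vectors, seeds, max_iter=20):
    """k-means that assigns each document to the most cosine-similar center.

    `seeds` are the initial centers; here they play the role of concept
    feature sets taken from an ontology.
    """
    centers = [dict(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centers)), key=lambda k: cosine(v, centers[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = mean_vector(members)
    return labels


# Toy data (hypothetical): two forest-resource and two soil documents.
docs = [
    ["pine", "forest", "timber", "pine"],
    ["timber", "harvest", "forest"],
    ["soil", "nitrogen", "moisture"],
    ["soil", "erosion", "moisture", "soil"],
]
# Hypothetical concept feature sets used as initial cluster centers.
seeds = [
    {"forest": 1.0, "timber": 1.0},
    {"soil": 1.0, "moisture": 1.0},
]
labels = kmeans_cosine(tfidf_vectors(docs), seeds)
print(labels)  # [0, 0, 1, 1]
```

     Seeding the centers from concept features, rather than choosing them at random, fixes the number of clusters in advance and ties each cluster to a named concept.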
     Research on semantic information retrieval is of both theoretical and practical importance. This dissertation focused on studying and exploring the semantic retrieval of forestry scientific data, using the eight categories of forestry data that currently exist in the Forestry Science Data Center. Based on ontology theory, the forestry scientific data ontology was built, which provides the potential for knowledge model sharing and reuse in the forestry domain. A semantic retrieval method based on the forestry scientific data ontology was also explored. Moreover, by combining this method with network counting technology, a semantic retrieval system for forestry scientific data was designed, developed and evaluated. This provides a theoretical basis and technical support for sharing massive forestry scientific data at the semantic level. The realization of the semantic retrieval system offers a new way of sharing forestry scientific data, and it can also serve as a reference for other data sharing platforms.
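     The query classification and single-keyword expansion described in finding (4) can be sketched as follows. This is only an illustration under stated assumptions: the classification heuristic, the toy concept hierarchy, the threshold and the path-based similarity are hypothetical stand-ins, and the path-based measure is not the modified semantic similarity developed in the dissertation.

```python
QUESTION_WORDS = {"what", "which", "where", "when", "who", "why", "how"}

# Hypothetical is-a hierarchy standing in for the forestry ontology.
PARENT = {
    "tree": None,
    "conifer": "tree",
    "broadleaf": "tree",
    "pine": "conifer",
    "spruce": "conifer",
    "oak": "broadleaf",
}


def classify_query(query):
    """Classify a query as 'single', 'multi' or 'question' (assumed heuristic)."""
    text = query.strip()
    tokens = text.rstrip("?").lower().split()
    if not tokens:
        raise ValueError("empty query")
    if text.endswith("?") or tokens[0] in QUESTION_WORDS:
        return "question"
    return "single" if len(tokens) == 1 else "multi"


def path_to_root(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def path_similarity(a, b):
    """1 / (1 + number of is-a edges between a and b); an assumed measure."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:  # first shared node is the lowest common ancestor
            return 1.0 / (1 + steps_a + up_b.index(node))
    return 0.0


def expand_single(keyword, threshold=0.3):
    """Expand one keyword with concepts whose similarity reaches the threshold."""
    if keyword not in PARENT:
        return []
    scored = [
        (path_similarity(keyword, c), c) for c in PARENT if c != keyword
    ]
    kept = [(s, c) for s, c in scored if s >= threshold]
    # Most similar first; ties broken alphabetically for a stable order.
    return [c for s, c in sorted(kept, key=lambda sc: (-sc[0], sc[1]))]


print(classify_query("pine"))                           # single
print(classify_query("pine seedling growth"))           # multi
print(classify_query("Which trees grow in dry soil?"))  # question
print(expand_single("pine"))                            # ['conifer', 'spruce', 'tree']
```

     In this sketch only the single-keyword branch is expanded. The multi-keyword and question branches would additionally need semantic reasoning and syntactic analysis, which are outside the scope of the example.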
