With the rapid development of the Internet, the Web now holds abundant, heterogeneous, semi-structured, and dynamic information resources. More than 80 percent of this information exists in the form of Web text. How to find and extract valuable information and knowledge models from these vast resources has become an urgent problem in the field of information processing. Web knowledge acquisition can address this problem effectively. By classifying search results, it improves search efficiency for Web users, improves the ability to locate target knowledge, and extracts valuable knowledge.
     Based on an analysis of the current state of research and the open problems in Web knowledge acquisition, this dissertation studies the key technologies of concept semantic generation, common text classification methods, user profile construction, and concept-based approximate query techniques. The main contributions are as follows.
     (1) Building on simple, interpretable measures derived from the decomposition produced by the non-negative matrix factorization (NMF) algorithm, a concept semantic generation method is proposed. By analogy with image decomposition, NMF is applied to extract concept semantics from text vectors, providing a new approach to large-scale text processing. Experimental results and a comparison with related work indicate that the concept semantics obtained by NMF accurately reflect the local characteristics of the samples, which helps to address the problem of natural language expression.
     (2) The mechanism of text classification based on NMF is studied. The local concept semantic vectors obtained by NMF have stronger classification capacity than global concept semantic vectors, because the former correspond directly to the characteristics of the samples and thus reflect the distinctive characteristics of each text class. An experiment is conducted to compare how constructing a local versus a global concept semantic space affects the classification results. The results indicate that classification in the local concept semantic space obtained by NMF is the most precise.
     (3) Taking advantage of the efficiency with which NMF decomposes large-scale text matrices, an NMF-based method for constructing typical user conversation profiles is presented. The term-text matrix is decomposed by NMF to capture the relations between terms. Then, the concepts of semantic vectors and weight vectors are introduced. Furthermore, the class closeness degree is defined to extract the user profile. To keep the concept semantic vectors orthogonal and to reduce their redundancy, LNMF is used for dimensionality reduction. Because the concept semantic vectors obtained by LNMF are as orthogonal as possible, the experimental results show that the LNMF method not only improves filtering precision markedly but also has the merit of aggregation.
     (4) To deal with query reformulation, an approximate query method for ontology concepts based on the most concise multi-dimensional concept is proposed. First, the most approximate concept is defined. Using the implication relations between complex concepts, the multi-dimensional concept and the most concise multi-dimensional concept are defined, which makes it possible to obtain the most approximate concept from the multi-dimensional concept. The problem of finding the most approximate concept is thus transformed into that of finding the most concise multi-dimensional concept. Related properties and theorems show that the method effectively reduces redundancy in query reformulation and improves the quality and efficiency of approximate queries.
     (5) An algorithm for obtaining the most concise multi-dimensional least upper concept is proposed. The detailed procedure, together with methods for reducing the search space and improving efficiency, is discussed. Finally, the correctness and completeness of the algorithm are proved.
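     As an illustration of the decomposition that items (1) to (3) rely on, the following is a minimal sketch of NMF applied to a small term-text matrix. It is not the dissertation's implementation: it assumes the standard multiplicative update rules for the squared-error objective, and the toy matrix V, the term list, and the helper names (matmul, transpose, nmf, frobenius_error, top_terms) are all hypothetical. Each column of W can be read as one concept semantic vector over the terms, and each column of H as the weight vector of one text over those concepts.

```python
import random

# Hypothetical toy data: rows are terms, columns are texts.
TERMS = ["matrix", "factor", "vector", "query", "ontology", "concept"]
V = [
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 3],
    [0, 0, 2, 2],
]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate nonnegative V (m x n) by W (m x rank) times H (rank x n).

    Uses multiplicative updates, which keep every entry of W and H
    nonnegative because each step multiplies by a nonnegative ratio.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    approx = matmul(w, h)
    return sum((x - y) ** 2
               for row_v, row_a in zip(v, approx)
               for x, y in zip(row_v, row_a)) ** 0.5

def top_terms(w, terms, k=3):
    """For each concept (column of W), list the k highest-weighted terms."""
    result = []
    for col in transpose(w):
        ranked = sorted(range(len(terms)), key=lambda i: col[i], reverse=True)
        result.append([terms[i] for i in ranked[:k]])
    return result

if __name__ == "__main__":
    W, H = nmf(V, rank=2)
    print("reconstruction error:", frobenius_error(V, W, H))
    for index, words in enumerate(top_terms(W, TERMS)):
        print("concept", index, "->", words)
```

     Because the factors are constrained to be nonnegative, each text is expressed as an additive combination of concepts, which is why the resulting vectors tend to describe parts of the sample rather than the whole. A practical system would use a sparse matrix library instead of nested lists; plain Python is used here only to keep the sketch self-contained.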
