Research on a Classification Algorithm for Geological Prospecting Units Based on a Domain Knowledge Base
Abstract
Nonstandardized geological data make management and retrieval inconvenient. With practical application as the goal, this paper takes the names of geological prospecting units as an example and sorts out the equivalence relations among them. It focuses on constructing equivalence relations between geological prospecting units in order to improve retrieval recall. Starting from a small training corpus and learning from a subset of instances, the equivalence relations among geological prospecting unit names are summarized and classified, and they are continuously added to a domain knowledge base of geological prospecting units. Using this domain knowledge base together with semi-supervised classification learning, a semi-supervised classification algorithm based on the domain knowledge base was designed and implemented. With the algorithm's preliminary recognition already clearly effective, its preliminary results were further refined through manual intervention, reaching a final recognition rate of 92.20%. This improves retrieval of geological prospecting unit names in the geological data database, and gives the approach good extensibility and reusability.
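The abstract does not specify the algorithm's features, similarity measure, or acceptance rule. What follows is only a minimal sketch of the general idea under explicit assumptions. A small seed knowledge base maps each canonical unit name to its known equivalent names. Self-training, a common semi-supervised scheme used here as a stand-in rather than the paper's confirmed method, then adds each unlabeled name whose character-bigram Jaccard similarity to some known name reaches a hypothetical threshold of 0.5. Names that are never assigned are returned for manual review. Finally, query expansion over an equivalence class shows how such relations could raise retrieval recall. The functions `bigrams`, `jaccard`, `self_train` and `expand_query`, the threshold, and all example unit names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sketch: the features, similarity measure and threshold are
# assumptions for illustration, not the method described in the paper.
KnowledgeBase = Dict[str, Set[str]]  # canonical unit name -> known equivalent names


def bigrams(name: str) -> Set[str]:
    """Character bigrams of a lower-cased name with whitespace removed."""
    s = "".join(name.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def self_train(seed_kb: KnowledgeBase, unlabeled: List[str],
               threshold: float = 0.5, max_rounds: int = 10
               ) -> Tuple[KnowledgeBase, List[str]]:
    """Grow a knowledge base from a small labeled seed by self-training.

    Each round scores every pending name against all names already in the
    knowledge base. A name whose best score reaches `threshold` joins that
    class, so later rounds can match against the enlarged knowledge base.
    Returns the expanded knowledge base and the names left for manual review.
    """
    kb = {canon: set(aliases) for canon, aliases in seed_kb.items()}  # keep the seed intact
    pending = list(unlabeled)
    for _ in range(max_rounds):
        accepted = []  # (name, canonical class) pairs confident enough to add this round
        for name in pending:
            feats = bigrams(name)
            best_canon, best_score = None, 0.0
            for canon, aliases in kb.items():
                for known in aliases | {canon}:
                    score = jaccard(feats, bigrams(known))
                    if score > best_score:
                        best_canon, best_score = canon, score
            if best_canon is not None and best_score >= threshold:
                accepted.append((name, best_canon))
        if not accepted:
            break  # nothing new cleared the threshold, so stop iterating
        for name, canon in accepted:
            kb[canon].add(name)
        done = {name for name, _ in accepted}
        pending = [name for name in pending if name not in done]
    return kb, pending


def expand_query(kb: KnowledgeBase, query: str) -> Set[str]:
    """Return all names equivalent to `query` so a search can match any of them."""
    for canon, aliases in kb.items():
        if query == canon or query in aliases:
            return {canon} | aliases
    return {query}
```

Usage example: seed the knowledge base with `{"No.1 Geological Brigade": {"First Geological Brigade"}}` and call `self_train` on `["No.1 Geol. Brigade", "Coastal Drilling Company"]`. The abbreviated brigade name joins the seed class (Jaccard 13/22, about 0.59). The unrelated company is returned for manual review. A later search for "First Geological Brigade" can then be expanded to all three brigade names. In this sketch, manual intervention is simply the review list. In practice a reviewer could also correct accepted assignments, which the abstract suggests but does not detail.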