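Research on the Compatibility and Interconversion of Literature Classification Schemes Based on Integrated Vocabularies and Mapping Indexes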
Abstract
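Compatibility and interconversion among literature classification schemes has become one of the pressing problems in research on information retrieval languages. Making several classification schemes that are widely used in different domains in China compatible and interconvertible removes obstacles in browsing, indexing, and retrieval. It also meets urgent needs in information retrieval and information organization and allows literature information resources to be shared more widely.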
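     Taking the education classes of the Chinese Library Classification (《中国图书馆分类法》, 4th edition), the Chinese News Information Classification and Codes (《中文新闻信息分类与代码》), and the Social Science Retrieval Thesaurus (《社会科学检索词表》) as examples, the thesis analyses the feasibility and working principles of compatibility and interconversion among literature classification schemes. From an analysis of the differences among the categories of the three schemes, it concludes that several kinds of conversion between categories are needed. Building on research methods and experience from China and abroad, it implements compatibility and interconversion with two models: an integrated vocabulary model and a mapping index model. The integrated vocabulary is itself treated as a mapping index in the broad sense. The first model, the integrated vocabulary, maps the schedule structures and category systems of the classification schemes to each other; the second maps the indexes of two classification schemes to each other. Together these mappings provide compatibility and conversion at both the semantic level and the literal (word-form) level. To suit practical needs, the thesis combines manual and machine methods so that each offsets the other's weaknesses. Structural and literal conversion of categories relies mainly on lexical similarity computation, which yields the category matches between different schemes: the semantic similarity of class names, note terms, broader-class names, narrower-class names, and the subject terms assigned to each category is computed, these values give the similarity between categories, and the similarity in turn determines the mapping relationships between categories. Drawing on experience and methods from classification-language interoperability in China and abroad, the thesis designs and builds an integrated classification term mapping table that is easy to revise and extend. Using Access 2003, Visual Basic, Dreamweaver, and other tools, and combining machine-computed results with manual work, it builds a multi-function experimental system. The system can search within any single original classification scheme or match terms by similarity across two selected schemes, so that semantically equivalent categories in the news-domain scheme and the general-purpose schemes can call each other.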
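The similarity step described above can be pictured with a minimal sketch. It is not the thesis's implementation: the field weights, the `Category` record, the matching threshold, and the sample categories are all hypothetical, and a character-bigram Dice coefficient stands in for the semantic similarity measure, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical field weights; the abstract gives no values.
WEIGHTS = {"name": 0.4, "notes": 0.1, "broader": 0.2,
           "narrower": 0.2, "subjects": 0.1}


@dataclass
class Category:
    """One category of a classification scheme (illustrative record)."""
    code: str
    name: str
    notes: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)


def bigrams(term: str) -> Set[str]:
    """Character bigrams of a term; a one-character term stands for itself."""
    return {term[i:i + 2] for i in range(len(term) - 1)} or {term}


def term_similarity(a: str, b: str) -> float:
    """Dice coefficient on character bigrams (stand-in for semantic similarity)."""
    if not a or not b:
        return 0.0
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def field_similarity(xs: List[str], ys: List[str]) -> Optional[float]:
    """Average best-match similarity in both directions; None if a side is empty."""
    if not xs or not ys:
        return None
    best_x = sum(max(term_similarity(x, y) for y in ys) for x in xs)
    best_y = sum(max(term_similarity(x, y) for x in xs) for y in ys)
    return (best_x + best_y) / (len(xs) + len(ys))


def category_similarity(a: Category, b: Category, weights=WEIGHTS) -> float:
    """Weighted mean of field similarities over the fields both categories fill in."""
    pairs = {
        "name": ([a.name], [b.name]),
        "notes": (a.notes, b.notes),
        "broader": (a.broader, b.broader),
        "narrower": (a.narrower, b.narrower),
        "subjects": (a.subjects, b.subjects),
    }
    total = used = 0.0
    for key, (xs, ys) in pairs.items():
        score = field_similarity(xs, ys)
        if score is not None:
            total += weights[key] * score
            used += weights[key]
    return total / used if used else 0.0


def best_match(source: Category, targets: List[Category],
               threshold: float = 0.5) -> Tuple[Optional[Category], float]:
    """Most similar target category, or None if it falls below the threshold."""
    best = max(targets, key=lambda t: category_similarity(source, t))
    score = category_similarity(source, best)
    return (best if score >= threshold else None), score


if __name__ == "__main__":
    # Made-up categories and codes, for illustration only.
    source = Category("S1", "高等教育", broader=["教育"])
    targets = [Category("T1", "体育"),
               Category("T2", "高等教育", broader=["教育"])]
    match, score = best_match(source, targets)
    print(match.code if match else None, round(score, 3))
```

Matches that fall below the threshold return `None`, which is where the manual review described in the abstract would take over from the machine result.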
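     Following the Chinese Library Classification (4th edition), the thesis compiles a classification mapping index: the index data of the different vocabularies are merged and a rotated (permuted) index is compiled with computer assistance. This index gives indexers and users more ways to look up and use classification schedules and classified catalogues, and makes the classification schemes easier to use. The thesis closes with recommendations on the compatibility and interconversion of classification schemes.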
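As a rough illustration of merging index data and rotating entries, the sketch below files every index term under each of its component words. It assumes the terms are already segmented into words; the `IndexEntry` record, the scheme labels, and the codes are hypothetical, and sorting is by plain code point rather than by pinyin or stroke order as a published Chinese index would use.

```python
from typing import Iterable, List, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    """One index term from one classification scheme (illustrative record)."""
    words: Tuple[str, ...]  # the term, already segmented into words
    scheme: str             # which scheme the entry comes from
    code: str               # class number or code in that scheme


def rotated_index(entries: Iterable[IndexEntry]) -> List[Tuple[str, str, str, str]]:
    """Merge entries from several schemes and file each term under every word.

    Returns sorted rows of (lead word, full term, scheme, code).
    """
    rows = set()
    for entry in entries:
        term = "".join(entry.words)
        for lead in entry.words:
            rows.add((lead, term, entry.scheme, entry.code))
    return sorted(rows)


if __name__ == "__main__":
    # Made-up entries, for illustration only.
    merged = [
        IndexEntry(("高等", "教育"), "SchemeA", "a1"),
        IndexEntry(("教育", "心理学"), "SchemeB", "b2"),
    ]
    for row in rotated_index(merged):
        print(*row)
```

Looking up a lead word such as 教育 then returns the matching terms from both schemes side by side, which is the cross-scheme access the merged index is meant to provide.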
With the development of the Internet, interoperability among classification schemes, library classifications in particular, has become a growing demand in online searching. Users need a system that unifies the various kinds of literature information and so removes obstacles in browsing, indexing, and retrieval.
     Compatibility and conversion among literature classification schemes can meet urgent needs in information retrieval and information organization and allow information resources to be shared more widely. This paper analyses the main methods of cross-browsing and cross-searching based on a multiple thesaurus that links the Chinese Library Classification with other classification schemes, including the choice of a general classification scheme and the process of building the multiple thesaurus, and it introduces the main functions of the multiple thesaurus in this project. Building on research methods and experience from China and abroad, the paper achieves compatibility and conversion among literature classification schemes with an integrated thesaurus model and a mapping index model. The integrated thesaurus is itself treated as a mapping index in the broadest sense. The first model maps the classification structures and category systems of the schemes to each other; the second maps the indexes of two classification schemes to each other. Through these mappings, compatibility and conversion are achieved at both the semantic and the literal level. To suit practical needs, machine and manual methods are combined so that they complement each other. Structural and literal conversion of categories relies mainly on lexical similarity computation, which yields the category matches between different classification schemes. That is, the semantic similarity of terms is computed, including class names, note terms, broader-class names, narrower-class names, and the subject terms assigned to each category; these values give the similarity between categories, which in turn determines the mapping relationships between categories. The paper also designs and builds an integrated classification term table that is easy to revise and extend.
Using Access 2003, Visual Basic, and other tools, and combining machine computation with manual work, the paper builds a multi-function experimental system. The system can search within a single original classification scheme, or match terms by similarity across two selected schemes, so that semantically equivalent categories in the news-domain scheme and the general classification schemes can call each other.
     Following the Chinese Library Classification (fourth edition), the paper compiles a classification mapping index: the index data of the different thesauri are consolidated and a rotated index is compiled with computer assistance. This index gives indexers and users more ways to look up and use classification schedules and classified catalogues and makes the classification schemes easier to use. Finally, the paper offers some suggestions on compatibility and conversion among literature classification schemes.
