A Study of the Constraints on Cross-Punctuation-Sentence Syntactic Relations in Modern Written Chinese
Abstract
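At present, research on Chinese syntactic parsing mostly takes the single sentence as its object, but in real corpora the boundaries of Chinese single sentences are very hard to determine automatically. At the sentence level, the main formal marker is punctuation. Because computer processing of Chinese presupposes that the language be formalized, the punctuation sentence naturally becomes the basic unit for processing Chinese sentences by computer. The boundaries of punctuation sentences are clear, but many punctuation sentences are syntactically incomplete, and their missing constituents must be sought in the surrounding context. There is as yet no systematic method for syntactic analysis across punctuation sentences. As a result, the parsing and generation of long Chinese sentences perform poorly, and this has become a bottleneck for deep Chinese processing applications such as Chinese-to-foreign-language machine translation and Chinese language understanding. To address the problem, the syntactic relations across Chinese punctuation sentences must first be carefully surveyed and analyzed, and regularities and constraints summarized.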
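     The notion of a punctuation sentence can be illustrated with a minimal sketch. It assumes that commas, semicolons, periods, question marks and exclamation marks (full-width and ASCII) close a punctuation sentence; the abstract does not list the delimiter set, and the function name and example sentence are invented for illustration.

```python
import re
from typing import List

# Assumption: these marks close a punctuation sentence. The abstract does not
# list the delimiter set, so adjust it to the annotation scheme actually used.
DELIMITERS = ",。;!?,;!?"

# One punctuation sentence = a run of non-delimiter characters plus the
# delimiter that closes it (optional, so trailing text is not lost).
_CLASS = re.escape(DELIMITERS)
_SENTENCE = re.compile(f"[^{_CLASS}]+[{_CLASS}]?")


def split_punctuation_sentences(text: str) -> List[str]:
    """Split text into punctuation sentences, keeping each closing mark."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(text))
    return [p for p in pieces if p]


# Invented example: only the first piece has an overt subject; the later
# pieces must recover theirs from the preceding context.
for piece in split_punctuation_sentences("他走进屋子,看见桌上有一封信,拆开一看,是朋友寄来的。"):
    print(piece)
```

     The boundaries fall out of the punctuation alone, which is why the unit is easy to segment; deciding which earlier constituent each later piece shares is the hard part that the constraints in this thesis address.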
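     This work is carried out within the theoretical framework of cross-punctuation-sentence syntactic relations. Its main goal is to identify the constituents shared across punctuation sentences, and to find which formalizable constraint rules these syntactic relations must satisfy beyond the stack-shaped regularity (栈形规律), so that they can be processed by computer.
     The work has two parts:
     (1) Corpus annotation, investigation and statistics.
     The full text of Qian Zhongshu's Fortress Besieged (《围城》) was annotated: 226,641 characters and 24,115 punctuation sentences in total. The annotation covers the types of syntactic relations between punctuation sentences, the shared constituents, and the shallow syntactic structure inside each punctuation sentence. From it, statistics were obtained on the various cross-punctuation-sentence syntactic relations in the annotated corpus. The author also used text retrieval tools to carry out a number of targeted investigations and counts over tens of millions of characters of modern and contemporary Chinese fiction.
     (2) Mining of constraints.
     On the basis of the annotated corpus and the targeted investigations, the constraints under which cross-punctuation-sentence syntactic relations arise are summarized under more than a hundred major and minor headings. The focus is on cases in which the yuanpei sentence (原配句, the punctuation sentence that supplies the shared constituent) and the xupei sentence (续配句, the punctuation sentence that reuses it) are homologous (同源) and stand in forward order. The topics covered include:
     Whether a punctuation sentence that begins with a noun or pronoun lacks a subject.
     For a yuanpei sentence with subject-verb-object structure, whether the subject of the xupei sentence is the subject or the object of the yuanpei sentence. The cases discussed include yuanpei sentences with perception verbs, "有" sentences, verbs taking clausal objects, serial-verb constructions, "像" sentences, "V着" sentences and "V完" sentences, together with the influence of certain connectives, adverbs, adjectives and nouns on the shared constituent.
     Determining when a xupei sentence shares an adverbial of the yuanpei sentence, covering many forms of adverbial; a dedicated chapter discusses how to judge the scope of negation words across punctuation sentences.
     Determining when a xupei sentence shares an attributive modifier of the yuanpei sentence, covering classifiers, adjectives, pronouns, nouns and noun phrases.
     Which constituents are shared when the yuanpei sentence is a "把" sentence or a "被" sentence.
     Distinguishing whether a noun phrase joined by "跟" or "和" is shared by the xupei sentence as a whole or only in part.
     Which constituents are shared when the yuanpei sentence is a pivotal sentence (兼语句).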
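     The annotation described in part (1) records, for pairs of punctuation sentences, the relation type and the shared constituent, and the statistics are tallies over such records. A minimal sketch of that bookkeeping follows; the class, field names, relation labels and toy records are all hypothetical and are not the thesis's actual annotation scheme.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SharedRelation:
    """One annotated sharing relation (hypothetical schema)."""
    yuanpei_index: int   # punctuation sentence that supplies the constituent
    xupei_index: int     # punctuation sentence that reuses it
    relation_type: str   # e.g. "subject-predicate", "adverbial-head"
    shared_text: str     # the shared constituent itself


def count_relation_types(relations: Iterable[SharedRelation]) -> Counter:
    """Tally how often each relation type occurs in the annotations."""
    return Counter(r.relation_type for r in relations)


# Toy records invented for illustration only.
EXAMPLE = [
    SharedRelation(0, 1, "subject-predicate", "他"),
    SharedRelation(0, 2, "subject-predicate", "他"),
    SharedRelation(1, 3, "subject-predicate", "一封信"),
    SharedRelation(4, 5, "adverbial-head", "昨天"),
]

counts = count_relation_types(EXAMPLE)
print(counts.most_common())
```

     Keeping the yuanpei and xupei indices explicit makes it straightforward to restrict a count to forward-order pairs, the case this thesis concentrates on.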
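     The work is distinctive in the following respects:
     (1) Scope. Beyond the subject-predicate relations across punctuation sentences studied in earlier work, it also examines attributive-head, adverbial-head, verb-object, verb-complement and preposition-object relations across punctuation sentences, opening up the study of the cross-punctuation-sentence syntactic system as a whole.
     (2) Perspective. It concentrates on the formal features of the constraints, so the results are relatively easy to operationalize and lay some groundwork for automatic computer analysis of cross-punctuation-sentence syntactic relations.
     (3) Method. It does not stop at illustration by example. In addition to the traditional introspective method of seeking the cognitive motivation behind linguistic regularities, it gives weight to counts of linguistic phenomena in real corpora and uses the statistics as evidence for the reliability of the regularities.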
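     The novelty of the thesis lies mainly in its in-depth, multi-angle mining of linguistic features. The main points are:
     For a yuanpei sentence with subject-verb-object structure, on the question of whether a subjectless xupei sentence shares the subject or the object of the yuanpei sentence, the thesis identifies several important distinguishing features:
     It shows that one of the main markers distinguishing a subject topic from an object topic is whether the sentence is static or dynamic, gives formal definitions of these two kinds of punctuation sentence, and describes how they relate to subject topics and object topics.
     According to the effect of the verb on agent and patient, it divides verbs into those that affect only the agent and those that affect both agent and patient, and uses this to decide whether the subject switches.
     It introduces the concept of information quantity and shows that, when the yuanpei sentence is a "有" sentence and the xupei sentence has a middle-state adjective as predicate, the choice of subject for the xupei sentence depends on the information quantity of the object of the yuanpei sentence: the smaller the object's information quantity, the more likely the object is to be the subject of the xupei sentence.
     It divides punctuation sentences into independent and dependent punctuation sentences, to decide whether a sharing relation holds between punctuation sentences.
     It divides nouns as a whole into independent and dependent nouns, to judge whether a punctuation sentence is complete. For some serial-verb predicate sentences of the main-subordinate type, it uses sentence-pattern transformation to reduce them to single-predicate subject-verb-object sentences before deciding the subject of the xupei sentence.
     It divides verbal and adjectival predicates as a whole into directional and non-directional predicates, to decide whether a coordinated noun phrase is shared as a whole or in part.
     It divides adverbials such as adverbs and time words as a whole into sentence adverbials and word adverbials, to decide whether an adverbial is shared.
     It gives a fine-grained semantic classification of words of each part of speech, for use in determining the constituents shared across punctuation sentences. Most of these word classes have appeared, scattered, in the linguistics literature, though with different definitions and purposes; some are proposed here for the first time. The thesis uses these classes together, redefines some of them, and gives word lists for them within the range of high-frequency words. They include:
     Verb classes: existential verbs, quasi-existential verbs, sensory verbs, relational verbs, cognitive verbs, psychological verbs, action verbs, causative verbs, bodily-action verbs;
     Noun classes: organ nouns (part nouns), attribute nouns, kinship nouns, psychological nouns;
     Adjective classes: dynamic adjectives, static adjectives, middle-state adjectives;
     Adverb classes: momentary-action adverbs, psychological adverbs, modal adverbs, time adverbs, connective adverbs, evaluative adverbs, scope adverbs, degree adverbs, and others;
     The concept of psychological words is introduced, comprising psychological nouns, psychological verbs, psychological adjectives and psychological adverbs.
     The classes proposed here for the first time are: middle-state adjectives, momentary-action adverbs, psychological words, psychological adverbs and psychological adjectives.
     The classes that have appeared in the linguistics literature but are defined and delimited differently here are: quasi-existential verbs, dynamic adjectives, static adjectives and modal adverbs.
     It uses parallel structure to judge constituent sharing. And so on.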
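     Within the field of cross-punctuation-sentence syntactic relations, this work is quite preliminary. For lack of time, many problems have not yet been touched and many others have only been opened. The results are still rather fragmentary and insufficiently systematic, and no work has yet been done on algorithms or implementation. These tasks will be taken up step by step in the future.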
Currently, Chinese syntactic analysis mostly targets the single sentence. However, the boundaries of Chinese single sentences are very difficult to determine automatically in real corpora. At the sentence level, the main formal marker is punctuation. Formalization is a prerequisite for Chinese language processing, so the punctuation sentence becomes the basic unit with which a computer processes Chinese sentences. The boundaries of punctuation sentences are clear, but the syntactic constituents of many punctuation sentences are incomplete and must be found in the context. There is still no systematic method for syntactic analysis across punctuation sentences. This leads to poor results in the parsing and generation of long Chinese sentences, and it has become a bottleneck for Chinese-to-foreign-language machine translation and for deep Chinese language understanding. To solve this problem, we must first investigate the syntactic relations across Chinese punctuation sentences carefully and summarize rules and constraints.
     This work is based on the theoretical framework of cross-punctuation-sentence syntactic relations. Its main purpose is to identify the constituents shared across punctuation sentences and, so that computers can process punctuation sentences, to find the formal constraint rules that these syntactic relations satisfy in addition to the stack-type rule. The work consists of two parts:
     (1) Corpus annotation, investigation and statistics.
     We annotated the full text of Qian Zhongshu's "WeiCheng": 226,641 characters and 24,115 punctuation sentences. The tags cover the syntactic relations between punctuation sentences, the shared constituents, and the shallow syntactic structure within each punctuation sentence, and from them we obtained statistics on each kind of cross-punctuation-sentence relation in the annotated corpus. The author also used text retrieval tools to carry out specialized investigations and counts on tens of millions of characters of modern and contemporary Chinese novels.
     (2) Mining the constraints.
     On the basis of the annotated corpus and the specialized investigations, we summarized the various constraints on cross-punctuation-sentence relations under more than a hundred major and minor headings. We focus on cases where the yuanpei sentence and the xupei sentence are homologous and in forward order. The contents include:
     Whether punctuation sentences that begin with a noun or pronoun lack a subject.
     When the yuanpei sentence has subject-verb-object structure, whether the subject of the xupei sentence is the subject or the object of the yuanpei sentence. We discuss yuanpei sentences whose predicate is a perception verb, "有", a verb taking a clausal object, a serial-verb construction, "像", "V着" or "V完", as well as the effect of connectives, adverbs, adjectives and nouns on the shared constituents.
     How to identify an adverbial of the yuanpei sentence shared by the xupei sentence, involving various forms of adverbial. A dedicated chapter discusses the scope of negation words across punctuation sentences.
     How to identify an attributive modifier of the yuanpei sentence shared by the xupei sentence, involving classifiers, adjectives, pronouns, nouns and noun phrases.
     When the yuanpei sentence is a "把" sentence or a "被" sentence, how to identify the shared constituents in the sentence.
     How to tell whether a noun phrase joined by "跟" or "和" in the yuanpei sentence is shared by the xupei sentence as a whole or in part.
     When the yuanpei sentence is a jianyu (pivotal) sentence, how to identify the shared constituents in the sentence.
     This work is distinctive in the following respects:
     (1) Scope. In addition to the subject-predicate relations across punctuation sentences covered by previous studies, we also studied attributive-head, adverbial-head, verb-object, verb-complement and preposition-object relations across punctuation sentences, opening up research on the cross-punctuation-sentence syntactic system as a whole.
     (2) Perspective. We focus on the formal features of the constraints, so the results are relatively easy to operationalize and lay some foundation for automatic computer processing.
     (3) Method. We do not rely on examples alone. Besides the traditional introspective method of seeking the cognitive motivation for linguistic rules, we also count linguistic phenomena in real corpora and treat the statistics as corroboration of the rules' reliability.
     The main innovation of this paper is its in-depth mining of linguistic features from many perspectives. The main points are as follows:
     When the yuanpei sentence has subject-verb-object structure and the xupei sentence lacks a subject, how to decide whether the xupei sentence uses the subject or the object of the yuanpei sentence. The paper points out several important distinguishing features:
     One of the main indicators for distinguishing subject topic from object topic is whether the sentence is static or dynamic. The paper formally defines both kinds of punctuation sentence and describes how they relate to the subject topic and the object topic.
     According to the effect of the verb on agent and patient, verbs are divided into those that affect only the agent and those that affect both agent and patient, in order to decide whether the subject switches.
     The paper puts forward the concept of information quantity and points out that, when the yuanpei sentence is a "有" sentence and the predicate of the xupei sentence is a middle-state adjective, determining the subject of the xupei sentence depends on the information quantity of the object of the yuanpei sentence. The smaller the information quantity of the object, the more likely the object is to be the subject of the xupei sentence.
     We divide punctuation sentences into independent and dependent punctuation sentences, to judge whether two punctuation sentences stand in a sharing relation.
     We divide nouns as a whole into independent and dependent nouns, to judge whether a punctuation sentence is complete.
     For punctuation sentences whose predicate has two verbs in a main-subordinate relation, the paper uses sentence transformation to reduce them to single-predicate subject-verb-object sentences and then determines the subject of the xupei sentence.
     We divide verbal and adjectival predicates as a whole into directional and non-directional predicates, to settle whether a coordinated noun phrase is shared as a whole or in part.
     We divide adverbials into sentence adverbials and word adverbials, to judge whether an adverbial is shared.
     The above concepts and classifications were introduced for the first time in this paper.
     We classify words of each part of speech in detail on semantic grounds, in order to determine the constituents shared across punctuation sentences. Many of these word classes have appeared in the linguistics literature, but with different definitions and purposes; some are put forward here for the first time. This paper uses these word classes together, redefines some of them, and gives word lists for them within the range of high-frequency words. They include:
     Verb classes: existential verbs, quasi-existential verbs, sensory verbs, relational verbs, cognitive verbs, psychological verbs, action verbs, causative verbs, bodily-action verbs;
     Noun classes: organ nouns (part nouns), attribute nouns, kinship nouns, psychological nouns;
     Adjective classes: dynamic adjectives, static adjectives, middle-state adjectives;
     Adverb classes: momentary-action adverbs, psychological adverbs, modal adverbs, time adverbs, connective adverbs, evaluative adverbs, scope adverbs, degree adverbs, and so on;
     The concept of psychological words is put forward, comprising psychological nouns, psychological verbs, psychological adjectives and psychological adverbs.
     The word classes put forward for the first time in this paper are: middle-state adjectives, momentary-action adverbs, psychological words, psychological adverbs and psychological adjectives.
     The word classes that appear in the linguistics literature but are defined and delimited differently here are: quasi-existential verbs, dynamic adjectives, static adjectives and modal adverbs.
     We also use parallel structure to judge constituent sharing.
     And so on.
     This work is very preliminary within the field of cross-punctuation-sentence syntactic relations. Owing to time constraints, many issues have not been addressed, and for many others only a first step has been taken. The results are still rather fragmentary and insufficiently systematic, and they do not yet cover algorithms or implementation. This work will be carried out gradually in the future.
