语言形式化原理 (Principles of Language Formalization)
摘要
论文主要从语言学和计算机科学的视角,探讨语言形式化的一般原理和方法。除绪论外,论文的主体还包括语音形式化、语义形式化、语法形式化、语用修辞形式化、文字形式化等,共六章。各章的主要内容及观点归纳如下:
     第一章绪论重点探讨语言形式与意义的关系问题,指出形式联系意义既是语言学研究的根本原则,也是语言形式化研究的根本原则,它是贯穿全文的指导思想。本章还探讨了形式化研究的学科支持、其在语言学体系中的地位和作用,以及语言形式化的层次和基本架构等。
     第二章为语音形式化。首先探讨语音的三种属性及其内在联系,这是语音形式化的基础,也是设计各种语音编码方案及压缩方案的重要参考。语音形式化的基本过程是采样、量化、编码;利用语音属性的不同特点,可以采取不均匀量化、差分量化、矢量量化、频域波形编码、参数编码等手段,以提高语音形式化的效率和质量。本章还分别探讨了语音压缩、语音合成的自然度以及语音识别的概率模型等问题。
     第三章语义形式化是全文的重点。首先探讨符号主义范式的基本架构及工具,包括图灵机、有限状态自动机、正则表达式等;以及基于符号主义的几种代表性的语义形式化方法,包括义素分析、逻辑语义分析、语义格分析、词性分析等;这些形式化方法的效果都不理想,其根本原因在于忽视语义系统无限性这一本质属性,而任何对语义系统的有限化改写都将造成语义缺失,破坏其完整性,最终导致失败。
     与此相对,联结主义从人的自然生理结构出发,把人脑看成由众多节点联结而成的开放式关系网络,具有并行处理、容错、自学习、遗忘、规则浮现等特征,这与人脑中的概念网络结构十分相似,是词汇语义形式化的理想模型。计算机语言作为典型的符号主义描写工具,伴随其智能化处理能力的严重不足,业已表现出明显的联结主义转向。
     模糊性是语义形式化的另一基本问题。语言的模糊性非源于语言单位的有限性,也非源于客观世界的模糊性,它源于人脑对客观世界的认知方式,其中比较和概念化过程是模糊性产生的关键节点,而模糊性的产生反又促进了人脑认知效率的大幅提升。符号主义范式对语义进行有限化改写的过程中所摒弃主要内容正是模糊性,而联结主义范式可以实现对语义清晰与模糊的全覆盖。
     第四章讨论语法形式化。概念意义是明示的、开放的,语法意义是暗示的、封闭的,概念意义抽象为语法意义的过程,就是从明示的到暗示、从无限到有限的过程,它受到语言发展经济规律的制约。概念关系是多维的、普遍联系的,从深层概念结构到表层句法结构,是一个降维的线性化过程,语法就是作为多维信息损失的补偿机制而产生的。语法单位的有限性决定了其较词汇语义更易于形式化,符号主义范式可以胜任这一工作。
     本章还讨论了语法形式化的一些具体问题和难点,包括上下文无关语法及N元语法、词类划分、汉语的分词及词性标注等。最后作为示例探讨了“把”字结构,指出其句型意义为“不同类个体之间竞争关系的表达”,在此基础上给出其句法结构的语义构成,包括优势竞争者、劣势竞争者、竞争方式、竞争结果四项。
     第五章探讨语用修辞形式化,其基础是语境的形式化,包括参与者信息、客观环境、上下文、语言知识、常识性知识、社会文化背景知识等六类。基于实用性考虑,形式语境的构成不再区分语言性和知识性,而是影响意义表达和意义理解的一切因素的总和。本章用C++程序构建了一个基本的语境类,并讨论了该语境类在具体言语交际中的运作模式,虽然很不完善,却是一次全新的尝试。
     本章还讨论了一类特殊的修辞格——通感。通感既是五种感觉之间的相通,同时也是内省的情绪、情感之间的交融。通感与比喻、比拟等传统辞格具有相同的认知心理基础,都是处在心智连续统上的不同区域间的彼此联通,因此可以把它们共同纳入广义的通感范畴。心智连续统是辞格形式化的重要参考模型。
     最后一章是文字形式化。首先探讨文字的信息量——熵的概念,指出汉字的诸多特点包括字形复杂、数量庞大、区别度高、信息量大等,都与其高熵值密切相关。进一步观察,还可以发现隐藏在信息熵之下的语言共性,而词汇概念体系的复杂程度是衡量一种语言发达程度的根本标准。第二部分阐述文字形式化的具体内容,主要围绕文字的内码、外码和形码展开,包括各种主要的形式化方案和各自的优缺点。最后探讨文字识别的基本原理及实现。
This thesis examines the general principles and methods of language formalization, mainly from the perspectives of linguistics and computer science. It consists of six chapters: an introduction, followed by chapters on the formalization of speech, semantics, grammar, pragmatics and rhetoric, and writing. The main content and views of each chapter are summarized below:
     Chapter 1, the introduction, focuses on the relationship between linguistic form and meaning. It argues that linking form to meaning is the fundamental principle both of linguistic research and of research on language formalization, and this principle guides the whole thesis. The chapter also discusses the disciplines that support formalization research, its place and role within linguistics, and the levels and basic framework of language formalization.
     Chapter 2 deals with the formalization of speech. It first examines the three attributes of speech sounds and the relations among them, which form the basis of speech formalization and an important reference for designing speech coding and compression schemes. The basic process of speech formalization is sampling, quantization, and coding. By exploiting the different characteristics of these attributes, techniques such as nonuniform quantization, differential quantization, vector quantization, frequency-domain waveform coding, and parametric coding can improve the efficiency and quality of speech formalization. The chapter also discusses speech compression, the naturalness of synthesized speech, and probabilistic models for speech recognition.
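     The sampling, quantization, and coding steps named above can be illustrated with a minimal Python sketch. It is not taken from the thesis: mu-law companding is used here as one common instance of nonuniform quantization, and all function names and parameters (`sample`, `encode`, `decode`, `mean_abs_error`, 8-bit codes, mu = 255) are assumptions chosen for illustration.

```python
import math

def sample(signal, duration_s, rate_hz):
    """Sampling: evaluate a continuous-time function at rate_hz samples per second."""
    n = round(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

def mu_law_compress(x, mu=255.0):
    """Nonuniform step: expand small amplitudes before uniform quantization."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize(x, bits):
    """Quantization: map x in [-1, 1] to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))
    return min(levels - 1, int((x + 1.0) / 2.0 * levels))

def dequantize(code, bits):
    """Return the midpoint of the quantization interval for a code."""
    return (code + 0.5) / (2 ** bits) * 2.0 - 1.0

def encode(samples, bits=8, nonuniform=True):
    """Coding: turn samples into integer codes, optionally with mu-law companding."""
    return [quantize(mu_law_compress(x) if nonuniform else x, bits) for x in samples]

def decode(codes, bits=8, nonuniform=True):
    values = [dequantize(c, bits) for c in codes]
    return [mu_law_expand(v) for v in values] if nonuniform else values

def mean_abs_error(samples, bits=8, nonuniform=True):
    """Average reconstruction error after an encode/decode round trip."""
    restored = decode(encode(samples, bits, nonuniform), bits, nonuniform)
    return sum(abs(a - b) for a, b in zip(samples, restored)) / len(samples)
```

     For a quiet signal, such as a sine wave scaled to an amplitude of 0.01, the companded round trip loses less than the uniform one at the same bit depth, which is the efficiency gain that nonuniform quantization aims for.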
     Chapter 3, on the formalization of semantics, is the core of the thesis. It first examines the basic framework and tools of the symbolic paradigm, including the Turing machine, finite-state automata, and regular expressions, and then several representative symbolic methods of semantic formalization, including sememe analysis, logical semantic analysis, semantic case analysis, and part-of-speech analysis. None of these methods works well. The root cause is that they neglect the infinity of the semantic system, which is one of its essential properties: any finite rewriting of the semantic system loses meaning, destroys its integrity, and ultimately fails.
     By contrast, connectionism starts from the natural physiological structure of human beings and treats the brain as an open relational network of many interconnected nodes, characterized by parallel processing, fault tolerance, self-learning, forgetting, and rule emergence. This closely resembles the conceptual network in the human brain, which makes it an ideal model for formalizing lexical semantics. Computer languages, the typical descriptive tools of symbolism, fall far short in intelligent processing and have already shown a clear turn toward connectionism.
     Fuzziness is another basic problem in semantic formalization. The fuzziness of language does not stem from the finiteness of linguistic units, nor from any fuzziness in the objective world; it stems from the way the human brain cognizes the objective world. Comparison and conceptualization are the key points at which fuzziness arises, and fuzziness in turn greatly improves cognitive efficiency. What the symbolic paradigm mainly discards when it rewrites semantics in finite terms is precisely this fuzziness, whereas the connectionist paradigm can cover both the precise and the fuzzy aspects of meaning.
     Chapter 4 discusses the formalization of grammar. Conceptual meaning is explicit and open, while grammatical meaning is implicit and closed. Abstracting conceptual meaning into grammatical meaning is a process that moves from the explicit to the implicit and from the infinite to the finite, and it is constrained by the principle of economy in language development. Conceptual relations are multidimensional and universally interconnected; going from deep conceptual structure to surface syntactic structure is a process of dimensionality reduction and linearization, and grammar arises as a mechanism that compensates for the resulting loss of multidimensional information. Because grammatical units are finite, grammar is easier to formalize than lexical semantics, and the symbolic paradigm is adequate for the task.
     The chapter also discusses specific problems and difficulties in grammar formalization, including context-free grammars and N-gram models, word-class division, and Chinese word segmentation and part-of-speech tagging. Finally, it examines the "ba" (把) construction as an example, arguing that the meaning of this sentence pattern is "the expression of a competitive relation between individuals of different kinds". On this basis it gives the semantic composition of the construction's syntactic structure, which comprises four items: the superior competitor, the inferior competitor, the manner of competition, and the outcome of competition.
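     As an illustration of the N-gram models mentioned above, the following Python sketch estimates bigram probabilities by maximum likelihood from pre-segmented sentences. It is a toy example under stated assumptions, not the thesis's method: the function names, the `<s>` and `</s>` boundary markers, and the two-sentence corpus of "ba" sentences are all hypothetical, and no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and adjacent word pairs in pre-segmented sentences."""
    contexts, pairs = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        contexts.update(padded[:-1])           # every word that has a successor
        pairs.update(zip(padded, padded[1:]))  # adjacent (previous, current) pairs
    return contexts, pairs

def bigram_prob(contexts, pairs, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen context."""
    if contexts[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / contexts[prev]

def sentence_prob(contexts, pairs, words):
    """Probability of a whole sentence as a product of bigram probabilities."""
    padded = ["<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(contexts, pairs, prev, word)
    return prob

# Hypothetical toy corpus of two segmented "ba" sentences.
corpus = [
    ["我", "把", "书", "看", "完", "了"],
    ["我", "把", "饭", "吃", "完", "了"],
]
contexts, pairs = train_bigram(corpus)
```

     With this corpus, "把" is followed by "书" in one of its two occurrences, so the estimate for that pair is 0.5, while a word sequence containing an unseen pair such as "书 吃" receives probability zero. That zero is the sparse-data problem that smoothing methods are designed to address.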
     Chapter 5 examines the formalization of pragmatics and rhetoric, which rests on the formalization of context. Context comprises six kinds of information: participant information, the objective environment, the surrounding discourse, linguistic knowledge, common-sense knowledge, and sociocultural background knowledge. For practical reasons, formal context no longer distinguishes linguistic from knowledge-based components; it is the sum of all factors that affect the expression and understanding of meaning. The chapter builds a basic context class in C++ and discusses how that class operates in concrete verbal communication. Although far from complete, it is a new attempt.
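     The thesis's C++ context class is not reproduced in this abstract. Purely as an illustration of the idea, the sketch below groups the six kinds of context information into one Python data class. The field names, the `update` and `resolve` methods, and the small table of deictic words are hypothetical and should not be read as the author's design.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """One container for the six kinds of context information listed above."""
    participants: dict = field(default_factory=dict)          # e.g. speaker, hearer
    environment: dict = field(default_factory=dict)           # e.g. place, time
    co_text: list = field(default_factory=list)               # utterances so far
    linguistic_knowledge: dict = field(default_factory=dict)
    common_sense: dict = field(default_factory=dict)
    sociocultural: dict = field(default_factory=dict)

    # Hypothetical table: deictic word -> (field name, key inside that field).
    DEICTICS = {
        "I": ("participants", "speaker"),
        "you": ("participants", "hearer"),
        "here": ("environment", "place"),
        "now": ("environment", "time"),
    }

    def update(self, utterance):
        """Record an utterance so that later utterances can refer back to it."""
        self.co_text.append(utterance)

    def resolve(self, expression):
        """Replace a deictic word with its referent in this context, if known."""
        if expression in self.DEICTICS:
            source, key = self.DEICTICS[expression]
            return getattr(self, source).get(key, expression)
        return expression
```

     In use, a context built with a speaker named "Zhang" and a place "Beijing" resolves "I" to "Zhang" and "here" to "Beijing", while non-deictic words pass through unchanged. This shows in miniature how the same utterance can be interpreted differently as the context object changes.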
     The chapter also discusses a special figure of speech, synaesthesia. Synaesthesia is both a crossing among the five senses and a blending of introspective moods and emotions. It shares the same cognitive and psychological basis as traditional figures such as metaphor and analogy: all of them connect different regions of a mental continuum, so they can be grouped together under a broad category of synaesthesia. The mental continuum is an important reference model for formalizing figures of speech.
     The last chapter deals with the formalization of writing. It first discusses the information content of characters, that is, the concept of entropy, and points out that many features of Chinese characters, including complex shapes, a very large inventory, high distinctiveness, and high information content, are closely tied to their high entropy. On closer observation, linguistic universals hidden beneath information entropy can also be found, and the complexity of the lexical-conceptual system is the fundamental measure of how developed a language is. The second part sets out the specifics of writing formalization, centering on internal codes, external codes, and glyph codes, and covers the main formalization schemes with their respective advantages and disadvantages. Finally, it discusses the basic principles and implementation of character recognition.
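     The entropy referred to above is Shannon's measure, H = -Σ p_i log2 p_i, where p_i is the relative frequency of the i-th character. The Python sketch below computes it for a sample text. The function name `char_entropy` is an assumption, and this zeroth-order estimate ignores dependencies between adjacent characters, so it is only a rough illustration of the concept.

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy, in bits per character, of the character distribution in text."""
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum p * log2(1/p) over the distinct characters.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

     A text that uses four characters equally often has an entropy of 2 bits per character, and a text made of a single repeated character has an entropy of 0. A script with thousands of distinct characters in regular use can therefore reach a much higher per-character entropy than an alphabet of a few dozen letters.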
