Research on Discourse Relation Annotation Based on the CDTB Corpus


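Abstract: Macro discourse relations are a key and difficult topic in natural language processing and have become one of its most active research areas. Research on discourse relations is fairly mature for English, whereas work on Chinese started later and macro-level discourse relations have so far received little attention. Studying macro discourse relations is therefore of considerable significance.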

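This thesis studies the representation scheme for macro discourse relations, the approach and method used to annotate them, and the construction of an annotation platform. The work covers the following three aspects: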

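First, taking the characteristics of the Chinese language into account and drawing on the strengths of classic corpora from China and abroad such as RST and PDTB, this thesis proposes a representation scheme for macro discourse relations suited to Chinese. The scheme divides a discourse into three levels: paragraph topic, paragraph relation, and discourse topic. Paragraph relations and the macro-level representation are studied in depth.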

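Second, this thesis sets out the concrete approach and method for annotating macro discourse relations and standardizes the annotation format. Annotation is done manually. The structure tree is built by combining top-down and bottom-up construction, and annotators label the discourse topic, the primary or secondary status of paragraphs and the relations between them, and the paragraph topics.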

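Third, this thesis builds an annotation platform for macro discourse relations that offers annotators three annotation modes and speeds up generation of the paragraph-relation structure tree. Experiments show that the platform significantly increases annotation speed and improves annotation efficiency.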

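This thesis carries out research and exploration on macro discourse relations. Its representation scheme, annotation method, and annotation platform help advance further research on macro discourse relations.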

Keywords: macro discourse relation; annotation method; annotation platform; visualization

Abstract: As one of the key and difficult topics in natural language processing, the macro discourse relation has become one of the most active areas of research. At present, research on discourse relations in English is relatively mature, while research on Chinese discourse relations started late and macro discourse relations have been studied little. The study of macro discourse relations is therefore of considerable significance.

In this thesis, we mainly study the representation scheme and the annotation approach and method for macro discourse relations, and we construct an annotation platform. The work covers the following three aspects:

First, based on the characteristics of the Chinese language and integrating the advantages of classic Chinese and foreign corpora such as RST and PDTB, this thesis proposes a macro discourse relation representation scheme for Chinese, in which a text is divided into three levels: paragraph topic, paragraph relation, and discourse topic. The thesis examines paragraph relations and the macro-level representation in depth.

Second, this thesis puts forward a concrete approach and method for annotating macro discourse relations and standardizes the annotation format. The method relies on manual annotation: the structure tree is constructed by combining top-down and bottom-up steps, and the discourse topic, the primary or secondary status of paragraphs and the relations between them, and the paragraph topics are annotated.

Third, this thesis constructs an annotation platform for macro discourse relations, which provides three annotation modes for annotators and speeds up construction of the paragraph-relation structure tree. Experimental results show that the platform significantly increases annotation speed and improves annotation efficiency.

This thesis presents research and exploration on macro discourse relations. Its representation scheme, annotation method, and annotation platform help promote further study of macro discourse relations.

Keywords: Macro Discourse Relation; Annotation Method; Annotation Platform; Visualization
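
The abstract describes a structure tree over paragraphs that is built by combining top-down and bottom-up steps, with topics, relations, and primary or secondary status recorded on the tree. The Python sketch below is only an illustration of what such a tree might look like. The class name `Node`, the helpers `leaf`, `merge`, `split`, and `render`, and the relation labels "Elaboration", "Coordination", and "Sequence" are all assumptions made for the example; they are not taken from the thesis, its annotation format, or its platform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A span of consecutive paragraphs in a hypothetical structure tree."""
    start: int                      # index of the first paragraph covered
    end: int                        # index of the last paragraph covered (inclusive)
    topic: str = ""                 # paragraph topic (leaf) or discourse topic (root)
    relation: Optional[str] = None  # relation joining the children (internal nodes)
    nucleus: Optional[int] = None   # position of the primary child, None if equal status
    children: List["Node"] = field(default_factory=list)


def leaf(index: int, topic: str = "") -> Node:
    """Create a node covering a single paragraph."""
    return Node(start=index, end=index, topic=topic)


def merge(left: Node, right: Node, relation: str,
          nucleus: Optional[int] = None) -> Node:
    """Bottom-up step: join two adjacent spans under a labelled relation."""
    if left.end + 1 != right.start:
        raise ValueError("only adjacent spans can be merged")
    return Node(left.start, right.end, relation=relation,
                nucleus=nucleus, children=[left, right])


def split(node: Node, at: int, relation: str,
          nucleus: Optional[int] = None) -> List[Node]:
    """Top-down step: split an undivided span after paragraph `at`."""
    if node.children:
        raise ValueError("span is already split")
    if not (node.start <= at < node.end):
        raise ValueError("split point must fall inside the span")
    node.children = [Node(node.start, at), Node(at + 1, node.end)]
    node.relation = relation
    node.nucleus = nucleus
    return node.children


def render(node: Node) -> str:
    """Render the tree as a bracketed string of paragraph indices."""
    if not node.children:
        if node.start == node.end:
            return str(node.start)
        return f"{node.start}-{node.end}"
    inner = " ".join(render(child) for child in node.children)
    return f"({node.relation} {inner})"


# Bottom-up: join paragraphs 1 and 2, then attach paragraph 0 as the primary part.
pair = merge(leaf(1), leaf(2), "Coordination")
bottom_up = merge(leaf(0, "opening"), pair, "Elaboration", nucleus=0)

# Top-down: start from the whole text (paragraphs 0-3) and split it twice.
top_down = Node(0, 3, topic="discourse topic")
_, rest = split(top_down, 0, "Elaboration", nucleus=0)
split(rest, 2, "Sequence")

print(render(bottom_up))
print(render(top_down))
```

An annotator could mix the two steps freely, for example splitting the text into large parts first and then merging individual paragraphs inside each part. Storing `nucleus` as a child position rather than a flag on each child is just one possible design choice for recording which paragraph is primary.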

Contents

Chapter 1  Introduction

1.1 Research Background and Significance