4 The research design

The tagged words of the SWECCL

Tagging is a method of making human language processable by machine. The text that humans can easily read is raw text, which is not suitable for computer analysis. The most common tagging method is the “word_tag” mode. Here we adopt the CLAWS 4 Tagging Collection to tag the raw text. After tagging, the text can be processed by Colligator, a program developed by Professor Liang Maocheng of Beijing Foreign Studies University (Liang Maocheng: 2008), which analyzes tagged text prepared in advance.
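To make the “word_tag” mode concrete, the following minimal sketch splits a tagged line into (word, tag) pairs. The sample sentence and the function name `parse_tagged` are hypothetical, and the tags are assumed to come from the CLAWS C7 tagset; the actual SWECCL files may use a different tagset or layout.

```python
from typing import List, Tuple


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so a word that itself
        # contains an underscore keeps it in the word part.
        word, sep, tag = token.rpartition("_")
        if not sep:
            # Token carries no tag; skip it.
            continue
        pairs.append((word, tag))
    return pairs


# Hypothetical sentence; tag names are assumed to follow CLAWS C7.
sample = "I_PPIS1 want_VV0 to_TO go_VVI home_RL ._."
pairs = parse_tagged(sample)
```

Here the third pair is `("to", "TO")`: the infinitive particle is identified by its tag, not by its spelling, which is what allows it to be distinguished from the preposition "to".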

The regular expression

In order to retrieve as many qualifying results as possible from the corpus, we should look for items that cover the widest range of conditions relevant to our research aim.

To do that, we should first draw up a list of all the colligation items we are looking for. Here we mainly focus on 7 different colligations of the infinitive particle and what precedes it. They are listed as follows:

infinitive as subject, preceded by a full stop (IAS)

infinitive as direct object, preceded by verb (VI)

infinitive preceded by noun (NI)

infinitive preceded by adjective (AI)

infinitive preceded by present “ING” participle (INGI)

infinitive preceded by past “ED” participle (EDI)

infinitive preceded by adverb (ADI)

This research chooses these seven colligation items because the element preceding the TO infinitive comes from the major lexical categories in linguistics: noun, verb, adjective and adverb. These categories account for the majority of the English vocabulary, so their combinations with the infinitive can be expected to be the most varied. For convenience in referring to the items above, this article will use the abbreviations in parentheses.

In addition, we have to use regular expressions to retrieve the targeted information from the corpus. “The regular expression is a kind of special character string which is applied to describing and matching string with same or similar property.” (Jurafsky & Martin: 2009) Following the tagged format and the rules for forming such expressions (Liang Maocheng: 2009), we can turn some grammatical patterns of human language into regular expressions that can be interpreted by the machine. The seven abbreviations mentioned above can each be rewritten as a regular expression (for more details, please see the appendix).
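As a rough illustration of how a colligation can be turned into a regular expression over “word_tag” text, the sketch below defines one pattern per abbreviation and counts matches in a short sample. These are not the expressions used in this study or in Colligator: the tag names assume the CLAWS C7 tagset (TO for the infinitive particle, VV* for lexical verbs, NN* for nouns, JJ* for adjectives, RR* for general adverbs), and the sample text and the names `PATTERNS` and `count_colligations` are hypothetical.

```python
import re
from collections import Counter

# Infinitive particle: the word "to" tagged TO (assumed C7 tag),
# preceded by whitespace and followed by whitespace or end of text.
TO = r"\s+[Tt]o_TO(?=\s|$)"

# One illustrative pattern per colligation; the tag choices are assumptions.
PATTERNS = {
    "IAS": re.compile(r"\S+_\." + TO),         # full stop + to
    "VI": re.compile(r"\S+_VV[0DZI]" + TO),    # finite or base lexical verb + to
    "NI": re.compile(r"\S+_NN\w*" + TO),       # noun + to
    "AI": re.compile(r"\S+_JJ[RT]?" + TO),     # adjective + to
    "INGI": re.compile(r"\S+_VVG" + TO),       # -ing participle + to
    "EDI": re.compile(r"\S+_VVN" + TO),        # past participle + to
    "ADI": re.compile(r"\S+_RR[RT]?" + TO),    # general adverb + to
}


def count_colligations(tagged_text: str) -> Counter:
    """Count the matches of each illustrative pattern in a tagged text."""
    return Counter(
        {label: len(pattern.findall(tagged_text)) for label, pattern in PATTERNS.items()}
    )


# Hypothetical tagged sample, not taken from SWECCL.
sample = (
    "She_PPHS1 wants_VVZ to_TO travel_VVI ._. "
    "To_TO win_VVI is_VBZ easy_JJ ._. "
    "He_PPHS1 is_VBZ trying_VVG to_TO help_VVI ._."
)
counts = count_colligations(sample)
```

On this sample the sketch finds one match each for IAS, VI and INGI and none for the other four. Keeping VVG and VVN out of the VI pattern is a design choice that stops the participle categories from being counted twice.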
