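Summary RNA methylation refers to methylation modifications occurring at different positions on RNA molecules. Methylation at the nitrogen-6 position of adenosine (N6-methyladenosine, m6A) is one of the most abundant and most important post-transcriptional modifications in the mRNA of higher organisms. Studies have shown that m6A methylation plays an important regulatory role in circadian rhythm, cell division, and embryonic stem cell proliferation, and that it is closely associated with the causes of diseases such as obesity, infertility, and cancer. With the arrival of the post-genomic era and the development of high-throughput technologies, vast numbers of uncharacterized RNA sequences keep emerging, and detecting methylation sites by wet-lab experiments costs a great deal of time, labor, and money. An efficient computational method for identifying RNA methylation sites is therefore urgently needed to speed up bioinformatics research.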

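To address these problems, this thesis designs and implements a predictor for identifying m6A methylation sites in RNA sequences. For feature extraction, it proposes a novel representation that fuses statistical features of nucleotides with physicochemical-property features: the statistical features of each RNA sequence sample are extracted from nucleotide composition and position-specific nucleotide information, and the physicochemical features are extracted using auto-covariance, cross-covariance, and pseudo nucleotide composition. For the classifier, an SVM is used to build the prediction model, and its parameters are tuned by 10-fold cross-validation. For evaluation, the model is assessed on the benchmark dataset with a rigorous jackknife test. The experimental results show that, compared with the most recent published results, the proposed method markedly improves the overall metrics Acc and Mcc, which further confirms its effectiveness.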
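As a rough illustration of the kind of features described above, the following minimal sketch computes a nucleotide-composition vector and the auto-covariance of a single per-nucleotide property. It is not the thesis's implementation: the function names, the `max_lag` parameter, and the choice of property (a purine indicator, 1 for A and G, 0 for C and U) are assumptions made only for this example, and the auto-covariance formula is one common definition.

```python
from collections import Counter

# Stand-in property for illustration only: 1 for purines (A, G), 0 for
# pyrimidines (C, U). The properties used in the thesis are not given here.
PURINE_INDICATOR = {"A": 1.0, "C": 0.0, "G": 1.0, "U": 0.0}


def nucleotide_composition(seq):
    """Relative frequency of A, C, G and U in the sequence."""
    counts = Counter(seq)
    return [counts[base] / len(seq) for base in "ACGU"]


def auto_covariance(seq, prop, lag):
    """Average product of mean-centred property values `lag` positions apart."""
    values = [prop[base] for base in seq]
    mean = sum(values) / len(values)
    n_pairs = len(values) - lag
    total = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n_pairs)
    )
    return total / n_pairs


def extract_features(seq, prop=PURINE_INDICATOR, max_lag=2):
    """Concatenate composition features with auto-covariances for lags 1..max_lag."""
    ac = [auto_covariance(seq, prop, lag) for lag in range(1, max_lag + 1)]
    return nucleotide_composition(seq) + ac
```

A call such as `extract_features("GGACU")` returns a list of four composition values followed by `max_lag` auto-covariance values, which could then be passed to a classifier.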

Thesis keywords: feature extraction; RNA sequences; methylation; support vector machine

Abstract RNA methylation refers to methylation modifications at different positions on the RNA molecule. N6-methyladenosine (m6A) is one of the most abundant and most important post-transcriptional modifications in the mRNA of higher organisms. Studies have indicated that m6A methylation has important effects on circadian rhythm, cell division, and the proliferation of embryonic stem cells, and that it is closely related to the causes of obesity, infertility, cancer, and other diseases. With the advent of the post-genomic era and the development of high-throughput technology, vast numbers of unrecognized RNA sequences are continually emerging. Identifying m6A methylation by wet-lab experiments consumes a great deal of time, manpower, and money. An efficient method based on intelligent computing is therefore urgently needed to identify RNA methylation sites and accelerate bioinformatics research.

To address these problems, this thesis designs and implements a predictor for identifying m6A methylation sites in RNA sequences. For feature extraction, a novel feature representation is proposed that combines statistical features of nucleotides with physicochemical-property features. The statistical features of the RNA sequence samples are extracted from nucleotide composition and position specificity. The physicochemical features are extracted using auto-covariance, cross-covariance, and pseudo nucleotide composition. For the classifier, an SVM is used to construct the prediction model, and the model's parameters are optimized by 10-fold cross-validation. The prediction model is evaluated with the rigorous jackknife test. The experimental results show that, compared with the latest research results, the proposed method significantly improves the overall evaluation metrics Acc and Mcc. This further verifies the validity of the method.
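The evaluation described above can be sketched as follows, assuming that Acc denotes accuracy, that Mcc denotes the Matthews correlation coefficient, and that the jackknife test is the usual leave-one-out procedure. The helper names and the `fit_predict` callback are hypothetical. The callback stands in for training the SVM on all samples but one and predicting the held-out sample; no SVM library is called here.

```python
import math


def jackknife_predictions(samples, labels, fit_predict):
    """Leave-one-out: predict each sample from a model trained on all the others.

    `fit_predict(train_x, train_y, test_x)` is a hypothetical callback that
    trains a classifier and returns the predicted label (0 or 1) for test_x.
    """
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(fit_predict(train_x, train_y, samples[i]))
    return predictions


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The predictions from `jackknife_predictions` are passed with the true labels to `confusion_counts`, and the resulting counts are passed to `accuracy` and `matthews_corrcoef`. Unlike accuracy, the Matthews coefficient stays informative when positive and negative samples are unbalanced, which is one reason the two metrics are often reported together.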

Keywords: Feature Extraction; RNA Sequences; Methylation; Support Vector Machine

Contents

Chapter 1 Introduction

1.1 Research Background and Significance
