A Study of DNA Methylation Identification Methods Based on Pseudo Nucleotide Features

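Abstract: DNA methylation occurs mainly on cytosine. Cells modify their DNA through methylation in order to alter gene expression. Beyond its role in development, DNA methylation also plays a very important part in the formation of almost all types of cancer. Knowledge of DNA methylation is therefore significant not only for basic research but also for drug development. Given an uncharacterized DNA sequence containing a large number of cytosine residues, how to identify and predict its potential methylation sites is a current research focus.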

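In the post-genomic era, computational methods that can accurately identify methylation sites in DNA sequences are urgently needed. This thesis aims to develop a more efficient and accurate prediction method. Based on the DNA sequence itself and the physicochemical properties of dinucleotides, it extracts features from DNA sequences in two ways, using pseudo dinucleotide composition features and a combination of auto-covariance and cross-covariance, and then uses an SVM classifier to predict and identify DNA methylation sites. Finally, rigorous jackknife tests show that the proposed method improves on existing predictors in the two global evaluation metrics, Acc and MCC.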

Thesis keywords: bioinformatics; DNA methylation; feature extraction; pseudo nucleotide features
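The feature-extraction step named in the abstract can be illustrated with a minimal sketch. The abstract does not spell out its formulas, so this follows the commonly published pseudo dinucleotide composition (PseDNC) layout of 16 dinucleotide frequencies plus `lam` sequence-order terms, and a plain lagged covariance for the auto- and cross-covariance features. That choice is an assumption about the thesis's variant. The property tables `PROP_A` and `PROP_B` are made-up placeholders rather than real physicochemical data, and the names `pse_dnc`, `covariance`, `lam`, and `w` are hypothetical.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def _standardize(values):
    """Rescale a list of numbers to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


# PLACEHOLDER tables: made-up numbers, NOT real physicochemical measurements.
PROP_A = dict(zip(DINUCS, _standardize([float(i % 5) for i in range(16)])))
PROP_B = dict(zip(DINUCS, _standardize([float((3 * i) % 7) for i in range(16)])))


def dinucleotides(seq):
    """Overlapping dinucleotides of an upper-case A/C/G/T sequence."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def pse_dnc(seq, lam=2, w=0.5, props=(PROP_A, PROP_B)):
    """Return 16 normalized frequencies followed by `lam` correlation terms."""
    dns = dinucleotides(seq)
    if lam >= len(dns):
        raise ValueError("sequence too short for the requested lag")
    freqs = [dns.count(d) / len(dns) for d in DINUCS]
    thetas = []
    for j in range(1, lam + 1):
        pairs = list(zip(dns, dns[j:]))  # dinucleotides that are j steps apart
        total = sum(
            sum((p[a] - p[b]) ** 2 for p in props) / len(props)
            for a, b in pairs
        )
        thetas.append(total / len(pairs))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def covariance(seq, prop_x, prop_y, lag):
    """Lagged covariance: auto-covariance if prop_x is prop_y, else cross."""
    dns = dinucleotides(seq)
    if lag >= len(dns):
        raise ValueError("sequence too short for the requested lag")
    xs = [prop_x[d] for d in dns]
    ys = [prop_y[d] for d in dns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    n = len(dns) - lag
    return sum((xs[i] - mean_x) * (ys[i + lag] - mean_y) for i in range(n)) / n


features = pse_dnc("ACGTACGTAC")                           # 18 values
ac = covariance("ACGTACGTAC", PROP_A, PROP_A, lag=1)       # auto-covariance
cc = covariance("ACGTACGTAC", PROP_A, PROP_B, lag=1)       # cross-covariance
```

Vectors of this kind would then be passed to a classifier. Real dinucleotide property values would have to replace the placeholder tables before the numbers mean anything.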

ABSTRACT: Occurring mainly on cytosine, DNA methylation is a process by which cells modify their DNA to change the expression of genes. Beyond its role in development, it also plays a significant part in the formation of nearly all types of cancer. Knowledge of DNA methylation sites is therefore indispensable for drug development as well as for basic biological research. Given an uncharacterized DNA sequence that contains numerous cytosine residues, how to identify its potential methylation sites is an active research question.

In the post-genomic age, computational methods that identify methylation sites in DNA accurately and efficiently are highly desirable, and this study aims to develop such a method. Based on the DNA sequence and the physicochemical properties of dinucleotides, features are extracted from the sequence using pseudo dinucleotide composition and a combination of auto-covariance and cross-covariance; an SVM classifier then uses these features to identify DNA methylation sites. A rigorous jackknife test shows that the resulting predictor achieves higher Acc and MCC than existing predictors.

Key words: bioinformatics; DNA methylation; feature extraction; pseudo nucleotide composition
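The evaluation described in the abstract (a jackknife test scored by Acc and MCC) can be sketched as follows. Acc and MCC use their standard confusion-matrix definitions. The jackknife loop holds out each sample once and trains on the rest. The classifier here is a 1-nearest-neighbour stand-in chosen only to keep the example self-contained; it is not the SVM used in the thesis, and all function names are hypothetical.

```python
import math


def acc_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def nearest_neighbour(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): label of the closest training point."""
    best = min(
        range(len(train_X)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(train_X[k], x)),
    )
    return train_y[best]


def jackknife_predictions(X, y, fit_predict):
    """Hold out each sample once, train on the others, predict the held-out one."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(fit_predict(train_X, train_y, X[i]))
    return preds


# Toy data for illustration only; these are not results from the thesis.
X = [[0.0], [0.1], [1.0], [1.1]]
y = [0, 0, 1, 1]
preds = jackknife_predictions(X, y, nearest_neighbour)
acc, mcc = acc_and_mcc(*confusion_counts(y, preds))
```

Because `jackknife_predictions` accepts any `fit_predict` callable, an SVM wrapper could be passed in place of the stand-in without changing the loop.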

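Chapter 1 Introduction

1.1 Research Background

1.2 State of Research

1.3 Contents of This Thesis

Chapter 2 Overview of Biological Identification Methods

2.1 General Workflow of Biological Identification

2.2 Dataset Construction and Feature Extraction

2.3 Classifiers

2.4 Testing and Evaluation

Chapter 3 DNA Sequence Feature Encoding

3.1 Introduction

3.2 Benchmark Dataset

3.3 Feature Representation Based on Statistical Features

3.4 Feature Representation Based on Physicochemical Properties

3.4.1 Physicochemical Property Matrix

3.4.2 The PseDNC Method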
