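Abstract
Disulfide bonds are one of the key structural features of proteins. Accurately locating disulfide bonds helps to better understand protein structure and function. In the post-genome era, protein sequences of unknown structure and function are accumulating rapidly, so methods that predict disulfide connectivity patterns directly from protein sequence are of great importance. Building on the features traditionally used for protein disulfide bond prediction, this study proposes extracting spatial distance features from predicted protein three-dimensional structures to improve prediction performance. In addition, based on an image representation of proteins, feature selection techniques are used to reduce feature dimensionality and eliminate redundancy. Cross-validation and independent test results on standard datasets show that the proposed method outperforms existing sequence-based protein disulfide bond prediction tools.
Keywords Protein structure prediction; Disulfide connectivity prediction; Feature extraction; Random forest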
Title Disulfide Connectivity Prediction based on Image Representation
Abstract
Disulfide connectivity is one of the most important structural characteristics of a protein. Accurately predicting disulfide connectivity from the protein sequence alone improves our understanding of protein structure and function, especially in the post-genome era, in which large numbers of sequenced proteins without functional annotation are accumulating rapidly. In this study, a new feature extracted from predicted protein 3D structural information is proposed and integrated with traditional features to form a discriminative feature set. We also apply feature selection methods borrowed from the image processing field to remove redundant information. Based on the extracted features, a random forest regression model is used to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors through both cross-validation and independent validation tests on benchmark datasets. The experimental results show that the proposed method outperforms existing predictors. We attribute this improvement to both the discriminative capability of the newly developed feature and the modelling capability of the random forest.
Keywords Protein structure prediction; Disulfide connectivity prediction; Feature extraction; Random forest
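Contents
1 Introduction
1.1 Research background and significance
1.2 State of the art
1.3 Overview of this study and organization of the thesis
1.3.1 Overview of this study
1.3.2 Organization of the thesis
2 Feature extraction
2.1 Feature representation
2.2 Feature selection
3 Prediction model
3.1 Choice of regression model
3.2 Workflow
4 Experimental results and analysis
4.1 Evaluation metrics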