适合大数据分析MLCS算法的设计与实现
时间:2023-01-15 11:16 来源:毕业论文 作者:毕业论文 点击:次
摘 要寻找多个字符序列的最长公共子序列是一个NP难问题,其中,求解两个字符串中的最长公共子序列LCS(Longest Common Subsequence,LCS)是求解多个字符串中的最长公共子序列MLCS(Multiple Longest Common Subsequence,MLCS)问题的一种特殊情况,它被广泛的应用于生物信息和基因计算等领域,是相关领域的基础性工作。尽管前人为此付出很大的努力解决该问题或其特殊情况,但随着数据量的急剧增长,字符序列的数量和长度不断增长,传统的算法效率较低,需要研究一种更为高效,可靠的方法来处理该问题。本文首先研究传统的LCS/MLCS算法,主要是动态规划方法和基于支配点的方法,以及最新的LCS/MLCS算法,如FAST_LCS算法和Quick_DP算法,并对其做出一定的改进,使用Java语言实现最新的算法和改进的算法,在主流平台上测试,并对其进行分析,提出进一步优化的方向。87037 毕业论文关键词:MLCS;动态规划;支配点;并行计算 Abstract Finding the longest common subsequence of a sequence consisted of characters is a NP hard problem, the solution of two strings of the longest common sub sequence LCS (longest common subsequence LCS) is a special case of the solution of a string of the longest common sub sequence MLCS of several strings (multiple longest common subsequence, MLCS), which is been widely used in the fields of biological information and gene computing, and it is the basic work in related fields。 Although previous paid great effort to solve the problem or the special circumstances, with a sharp increase in the amount of data, the continuous growth of character sequence number and length and the low efficiency of the traditional algorithm, we need to study a more efficient and reliable method to deal with the problem。 First of all, this paper studies the traditional LCS/MLCS algorithm, mainly is dynamic programming method and method based on control point and the latest LCS/MLCS algorithm such as FAST_LCS algorithm and Quick_DP algorithm, and try my best to make some improvements。 Finally, I use Java language to realize the FAST_LCS, Quick_DP algorithm and improved algorithm tested in the mainstream platform test and further optimization method is proposed in this paper。 Keywords: LCS; Dynamic Programming; Control Point; Parallel Computing 目 录 第一章 引言 1 1。1 研究背景和意义 1 1。3 本文工作 2 1。4 本文组织结构 2 第二章 LCS/MLCS问题的概述 3 2。1 LCS/MLCS问题的描述 3 2。2 传统算法概述 3 2。2。1 基于动态规划的算法 3 2。2。2 基于支配点的算法 4 2。2。3 并行算法 6 第三章 LCS/MLCS问题的算法 7 3。1 FAST_LCS 7 3。1。1 匹配点和它的后继表 7 3。1。2 裁剪操作 9 3。1。3 FAST_LCS的框架和复杂度分析 10 3。1。4 结论 15 3。2 Quick_DP 15 3。2。1 串行算法 15 3。2。2 并行算法 18 3。3 MLCS算法的优化 (责任编辑:qin) |