Summary: Bioinformaticians are tackling increasingly computation-intensive tasks. Meanwhile, workstations are shifting towards multicore architectures, and massively multicore machines may soon be the norm. Bag-of-Tasks (BoT) applications, which consist of a large number of independent computation-intensive tasks, are commonly encountered in bioinformatics. This note introduces PAR, a scalable, dynamic, parallel and distributed execution engine for Bag-of-Tasks applications. PAR is aimed at multicore architectures and small clusters. Speedups obtained with PAR on two different applications are shown.

Availability: PAR is released under the GNU General Public License version 3 and can be freely downloaded.

1 Introduction

      Bioinformaticians are significant users of high-performance computing, in particular for simulations of biological phenomena. Meanwhile, the available hardware is getting faster but also much more parallel (Intel publicly reported working on 80-core prototype chips in 2007). In this context, most bioinformaticians could benefit from easy-to-use software that harnesses such computing power.

      The focus of this note is the execution of Bag-of-Tasks (BoT) applications. As the name suggests, a BoT application can be seen as a bag filled with tasks to do, each independent of all the others. A middleware for BoT applications is called a job crusher. It consists of at least a server component connected to a set of clients.

      This note introduces PAR, a parallel and distributed job crusher working in pull mode and inspired by desktop grid platforms. Workers join the computation and can be added dynamically at run-time; the server delivers tasks to whichever workers are available at a given moment. PAR transposes some concepts and features of earlier distributed middleware to small HPC clusters and multicore workstations.
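      The pull mode described above can be illustrated with a minimal sketch in Python. It is not PAR's implementation: the names `run_bag_of_tasks` and `gc_content` are hypothetical, the "server" is reduced to an in-process queue, and the workers are threads rather than remote clients. It only shows the scheduling idea, in which each idle worker asks for the next task instead of being assigned a fixed share in advance.

```python
import queue
import threading


def run_bag_of_tasks(tasks, func, n_workers=4):
    """Apply func to every task; idle workers pull tasks until the bag is empty."""
    bag = queue.Queue()
    for index, task in enumerate(tasks):
        bag.put((index, task))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # Pull mode: the worker requests work when it is free.
                index, task = bag.get_nowait()
            except queue.Empty:
                return  # nothing left in the bag, the worker leaves
            value = func(task)
            with lock:
                results[index] = value

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Return results in the order the tasks were submitted.
    return [results[i] for i in range(len(results))]


def gc_content(sequence):
    """Toy independent task: fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in sequence) / len(sequence)


if __name__ == "__main__":
    print(run_bag_of_tasks(["GGCC", "ATAT", "GATC"], gc_content, n_workers=2))
```

      Because tasks are handed out one at a time, a slow task delays only the worker running it, which is why pull mode balances load without knowing task durations in advance. Threads are used here only to keep the sketch short; in CPython they do not speed up CPU-bound work, so a real job crusher would use separate processes or machines.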

       This paper is organized as follows: Section 2 presents an overview of related projects and technologies used in bioinformatics. Section 3 presents two examples using PAR to illustrate scalability. The last section lists upcoming enhancements.

2 Related projects

        A wide variety of tools and technologies have been used in bioinformatics over the last two decades. While PAR is a user-level tool with its own niche, it has some limitations. At the cost of a little more complexity, some of the tools listed below offer fair sharing of resources, stronger reliability and even higher job or data throughput.

        At the programming level, the Message-Passing Interface (MPI, Forum (1994)), CORBA (Object Management Group (1998)) or even MapReduce (Dean and Ghemawat (2004)) are noteworthy technology candidates. 

        MPI has become the de facto standard for programming highly parallel applications. It has been used in computational genomics (Swain et al. (2005)) and in molecular dynamics (Johnston et al. (2005); de Lomana et al. (2008)).

        For applications following a client-server model, CORBA can be used. It has been applied successfully to the handling of genome maps (Hu et al. (1998), Jungfer and Rodriguez-Tomé (1998)).

        For data-intensive applications, MapReduce and its open source implementation Hadoop are more appropriate. They enable operations over huge amounts of data and were recently used in sequence alignment (Sadasivam and Baktavatchalam (2010)).

        However, at the application level, Desktop Grids (DG) are closer to the focus of this note. A server distributes tasks to workers located on machines that do not communicate with each other, potentially anywhere on the Internet. Condor (Litzkow et al. (1988)), XtremWeb (Fedak et al. (2001)) and BOINC (Anderson (2004)) are three platforms for highly parallel, multiuser applications. One of the best-known DG projects in bioinformatics is probably Folding@home (Beberg et al. (2009)).
