菜单
  
    UNDERSTANDING THE MAP-REDUCE PROGRAMMING MODEL AND SYSTEM When talking about Map-Reduce, one may mean the Map-Reduce programming model (the narrow sense), or the Map-Reduce system (the wider sense) that provides the support for the programming model, as well as a rich set of system services and advanced features. Understanding the programming model is sufficient to write a semantically correct program. However, from our experiences, in order to write a program that performs well on a datacenter-scale system, it is often necessary to understand the map-reduce system. In the following, we describe the programming model (Section 2.1), system services (Section 2.2), and advanced features (Section 2.3), based on the original Map-Reduce paper [5] and the open source Hadoop implementation [11].2.1 Programming Model Figure 1 illustrates the Map-Reduce programming model using a word counting example. A programmer implements two functions, a Map function and a Reduce function. Their semantics are shown at the top of Figure 1. Conceptually, the input to the Map-Reduce computation consists of a list of (in_key, in_value) pairs. Each Map function call takes a pair of (in_key, in_value) as input and produces zero or more pairs of intermediate key-value, (mid_key, mid_value). Then, the system automatically performs a group-by operation on the intermediate mid_key. After that, each Reduce function call processes a group, i.e. a mid_key and a list of mid_values, and produces zero or more output results. In the word counting example shown in Figure 1, the Map function generates a (word, 1) as an intermediate result for each encoun-tered word, where mid_value = 1. The Reduce function simply sums up the list of mid_values to obtain the word count.39185
    System Services The system provides three categories of services to facilitate the running of a Map-Reduce program across a large number of machines, as shown in Figure 2. (i) Distributed file system: the input and output are stored in a distributed file system to be globally accessible. (ii) Automatically distributing tasks: the system instantiates. Map and Reduce tasks across a large number of machines, partitions and assigns the input to inpidual Map tasks, then coordinates the copying of intermediate results (stored on local disks) from the machines running Map tasks to machines running Reduce tasks. A typical optimization is to co-locate a Map task to a machine (or rack) that also holds a copy of the assigned input partition in the distributed file system. (iii) Fault tolerance: the system monitors the task executions and re-executes tasks if they fail (due to machine crashes etc.). Since the input to a Reduce task may come from any of the Map tasks, it is necessary to wait for all Map tasks to finish before launching Reduce tasks. Near the end of the Map or the Reduce phase, the system schedules backup executions of the remaining in-progress tasks to protect against “stragglers”, which are machines taking anunusually long time to complete tasks due to permanent or transient hardware or software problems.
    Advanced Features Figure 2 shows the many advanced components surrounding and supporting the Mapper and the Reducer computations. Typically,the default implementations of these components can be replaced with customized implementations. InputFormat extracts the input record (in_key, in_value) from the input (default: extracting a line from a text file). This gives part of the functionality of a database access method. Partitioner computes a Reduce task ID from mid_key, determining which local file to append an intermediate result (default: hash code(mid_key) modulo R). One may perform range based partitioning instead. The local files containing intermediate results are copied to the machine running the corresponding Reduce task. Then Sorter sorts the intermediate results to complete the group-by operation (default: merge sort). After the reduce computation, OutputFormat formats output results for writing to output files (default: out_key tab out_value per line).The network traffic of copying can be reduc by implementing Combiner, which is a partial reduce function, to be called on the intermediate results local on inpidual Map task machines. This is an optimization for reduce operations that are associative and commutative. For example, in the above word counting example, we can use the Reduce function as the Combiner.
  1. 上一篇:java异常错误处理英文文献和中文翻译
  2. 下一篇:评价推荐系统英文文献和中文翻译
  1. 汽车内燃机连杆载荷和应...

  2. 审计的优化管理英文文献和中文翻译

  3. RANSAC算法全景图像拼接关键技术研究+源程序

  4. FPGA的全景拼接相机的优化...

  5. 气味源定位的有限时间粒...

  6. PLC仿真的虚拟工厂英文文献和中文翻译

  7. ZigBee-RFID混合网络的节电英文文献和中文翻译

  8. NFC协议物理层的软件实现+文献综述

  9. C++最短路径算法研究和程序设计

  10. 高警觉工作人群的元情绪...

  11. 浅析中国古代宗法制度

  12. 上市公司股权结构对经营绩效的影响研究

  13. g-C3N4光催化剂的制备和光催化性能研究

  14. 中国传统元素在游戏角色...

  15. 巴金《激流三部曲》高觉新的悲剧命运

  16. 江苏省某高中学生体质现状的调查研究

  17. 现代简约美式风格在室内家装中的运用

  

About

优尔论文网手机版...

主页:http://www.youerw.com

关闭返回