MapReduce Wiki - Software Framework


Description of MapReduce Wiki

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

MapReduce Software Framework

A MapReduce framework is usually composed of three steps:

  • Map: each worker node applies the map function to its local data and writes the output to temporary storage. A master node ensures that only one copy of redundant input data is processed.
  • Shuffle: worker nodes redistribute data based on the output keys (produced by the map function), so that all data belonging to one key ends up on the same worker node.
  • Reduce: worker nodes process each group of output data, per key, in parallel.
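
The three steps above can be sketched on a single machine with a word count, the usual introductory example. This is only an illustration under simplifying assumptions: there are no worker nodes, no master node, and no temporary storage, and the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical rather than part of any real framework.

```python
from collections import defaultdict


def map_words(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (split) would be mapped on a
    # different worker; here the splits are processed one after another.
    mapped = []
    for document in documents:
        mapped.extend(map_words(document))

    grouped = shuffle(mapped)

    # Each key is reduced independently, which is what lets a real
    # framework run the reduce step in parallel.
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat", "the dog the"]))
```

Because each key is reduced without looking at any other key, the reduce calls could be handed to separate workers without changing the result.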

Trivia :

  • MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, Singular Value Decomposition, document clustering and machine learning.
  • At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that updated the index and ran the various analyses.
  • The MapReduce model has been adapted to several computing environments, such as multi-core and many-core systems, desktop grids, multi-cluster setups, mobile environments, and high-performance computing environments.
