Wednesday, September 25, 2013

MapReduce - Distributed Programming Model

Google’s MapReduce programming model is based on the following simple concepts: (i) iteration over the input; (ii) computation of key/value pairs from each piece of input; (iii) grouping of all intermediate values by key; (iv) iteration over the resulting groups; (v) reduction of each group. For instance, consider a repository of documents from a web crawl as input, and a word-based index for web search as output, where the intermediate key/value pairs are of the form ⟨word, URL⟩. The programmer can abstract away from the issues of distributed and parallel programming, because the MapReduce implementation takes care of load balancing, network performance, fault tolerance, etc.
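The five steps above can be sketched in a few lines of Python. This is only a single-process, in-memory illustration of the programming model, not Google's implementation: the names `map_document`, `reduce_postings`, and `map_reduce`, the example URLs, and the whitespace-based tokenization are all assumptions made for the example, and nothing here is distributed or fault tolerant.

```python
from collections import defaultdict


def map_document(url, text):
    """Map step (hypothetical): emit one (word, url) pair per word occurrence."""
    for word in text.lower().split():
        yield word, url


def reduce_postings(word, urls):
    """Reduce step (hypothetical): collapse a group into a sorted list of unique URLs."""
    return sorted(set(urls))


def map_reduce(inputs, mapper, reducer):
    """Run mapper and reducer sequentially over an iterable of (key, value) inputs."""
    groups = defaultdict(list)
    for key, value in inputs:                 # (i) iterate over the input
        for k, v in mapper(key, value):       # (ii) compute intermediate key/value pairs
            groups[k].append(v)               # (iii) group intermediate values by key
    # (iv) iterate over the resulting groups and (v) reduce each one
    return {k: reducer(k, vs) for k, vs in groups.items()}


# Toy "web crawl": (URL, document text) pairs with made-up URLs.
crawl = [
    ("http://a.example", "map reduce model"),
    ("http://b.example", "reduce each group"),
]
index = map_reduce(crawl, map_document, reduce_postings)
```

Here `index["reduce"]` lists both example URLs, while `index["map"]` lists only the first. In a real deployment the mapper and reducer would look much the same; it is the `map_reduce` driver that the framework replaces with a distributed, fault-tolerant version.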
