Wednesday, September 25, 2013

MapReduce - Distributed Programming Model

Google’s MapReduce programming model is based on the following simple concepts: (i) iteration over the input; (ii) computation of key/value pairs from each piece of input; (iii) grouping of all intermediate values by key; (iv) iteration over the resulting groups; (v) reduction of each group. For instance, consider a repository of documents from a web crawl as input, and a word-based index for web search as output, where the intermediate key/value pairs are of the form (word, URL). The programmer can abstract from the issues of distributed and parallel programming, because the MapReduce implementation takes care of load balancing, network performance, fault tolerance, and so on.
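The five concepts can be illustrated with a small single-process sketch in Python. This is not Google's implementation and involves no distribution, load balancing, or fault tolerance; it only mirrors the data flow for the word-index example. The names `map_document`, `reduce_group`, and `map_reduce`, as well as the sample URLs and texts, are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def map_document(url, text):
    """Step (ii): emit one (word, url) pair for every word in the document."""
    for word in text.lower().split():
        yield word, url


def reduce_group(word, urls):
    """Step (v): reduce a group of URLs to a sorted list without duplicates."""
    return sorted(set(urls))


def map_reduce(documents):
    """Run the whole pipeline in one process (assumption: input fits in memory)."""
    groups = defaultdict(list)
    # Step (i): iterate over the input documents.
    for url, text in documents.items():
        # Step (ii): compute intermediate key/value pairs.
        for word, value in map_document(url, text):
            # Step (iii): group intermediate values by key.
            groups[word].append(value)
    # Steps (iv) and (v): iterate over the groups and reduce each one.
    return {word: reduce_group(word, urls) for word, urls in groups.items()}


# Hypothetical sample input standing in for a web crawl.
documents = {
    "http://a.example/": "map reduce model",
    "http://b.example/": "reduce each group",
}
index = map_reduce(documents)
print(index["reduce"])
```

In a real MapReduce system, the map and reduce functions would look much the same, while the grouping step and both iterations would be carried out by the framework across many machines.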