Google’s MapReduce programming model is based on the following simple concepts:
(i) iteration over the input;
(ii) computation of key/value pairs from each piece of input;
(iii) grouping of all intermediate values by key;
(iv) iteration over the resulting groups;
(v) reduction of each group.
For instance, consider a repository of documents from a web crawl as input,
and a word-based index for web search as output, where the intermediate key/value
pairs are of the form (word, URL).
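
The five concepts can be sketched on the web-index example as a small, purely sequential Python program. This is an illustrative sketch, not Google's implementation: the names `map_reduce`, `map_document`, and `reduce_postings`, the example URLs, and the choice of (word, URL) as the intermediate pair are all assumptions made for the illustration.

```python
from collections import defaultdict


def map_document(url, text):
    """Step (ii): emit one (word, url) pair per word occurrence (hypothetical mapper)."""
    for word in text.lower().split():
        yield word, url


def reduce_postings(word, urls):
    """Step (v): reduce a group to a sorted list of distinct URLs (hypothetical reducer)."""
    return sorted(set(urls))


def map_reduce(inputs, mapper, reducer):
    """Sequential toy driver covering steps (i) to (v)."""
    groups = defaultdict(list)
    for key, value in inputs:                            # (i) iterate over the input
        for out_key, out_value in mapper(key, value):    # (ii) compute key/value pairs
            groups[out_key].append(out_value)            # (iii) group values by key
    # (iv) iterate over the groups, (v) reduce each group
    return {key: reducer(key, values) for key, values in groups.items()}


# Made-up stand-in for a web crawl: (URL, document text) pairs.
crawl = [
    ("http://a.example", "map reduce model"),
    ("http://b.example", "reduce each group"),
]
index = map_reduce(crawl, map_document, reduce_postings)
```

Here `index["reduce"]` lists both URLs, while `index["map"]` lists only the first one. The driver is kept separate from the two user-supplied functions so that the same skeleton could serve other problems.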
The programmer can abstract from the issues of distributed and parallel programming, because
the MapReduce implementation takes care of load balancing, network performance,
fault tolerance, and similar concerns.
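
The separation of concerns can be illustrated with a toy sketch in which the same user-written functions are executed by two different drivers. Everything here is an assumption for illustration: `run_sequential` and `run_parallel` are hypothetical drivers, the word-count functions are made up, and a local thread pool merely stands in for a cluster. It performs no load balancing or fault handling of its own.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def count_words(url, text):
    """User-written map function: emit (word, 1) per occurrence."""
    for word in text.split():
        yield word, 1


def total(word, counts):
    """User-written reduce function: sum the counts of one group."""
    return sum(counts)


def run_sequential(inputs, mapper, reducer):
    """Driver 1: plain loop, no parallelism."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}


def run_parallel(inputs, mapper, reducer, workers=4):
    """Driver 2: map and reduce calls are handed to a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each input is mapped as an independent task; results keep input order.
        mapped = pool.map(lambda kv: list(mapper(*kv)), inputs)
        groups = defaultdict(list)
        for pairs in mapped:
            for k, v in pairs:
                groups[k].append(v)
        # Each group is reduced as an independent task.
        keys = list(groups)
        reduced = pool.map(lambda k: reducer(k, groups[k]), keys)
        return dict(zip(keys, reduced))


docs = [("u1", "a b a"), ("u2", "b c")]
sequential_result = run_sequential(docs, count_words, total)
parallel_result = run_parallel(docs, count_words, total)
```

Both drivers return the same word counts, and `count_words` and `total` contain no threading code at all. Swapping the execution strategy leaves the user's program untouched, which is the kind of abstraction the paragraph above describes.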