Sunday, 6 November 2011

MapReduce in Python

Over the last while I've been using NodeJS, MongoDB and Google's App Engine. Everything seems to have MapReduce functionality and frankly I was missing it when I went back to do some work in plain old Python. This is a nice short example of how it can be done.

The basic format that you need is something like this:

Now, a simple working example using the usual MapReduce example:

The output will look something like this:


  1. Hey there!

    While your multiprocessing usage is nice, it seems to me that you missed the point of MapReduce in your example because most of the work actually happens in partition().

    A more classic example would look something like this:

    There you have proper partitioning BEFORE the mapping happens, solving sub-problems in the map phase and finally reducing sub-results to a single aggregate result.

  2. Is there a way to reduce the part of the partition part?

  3. Easy jumpstart to python Mapreduce :