"MapReduce-MPI WWW Site"_mws - "MapReduce-MPI Documentation"_md :c

:link(mws,http://www.cs.sandia.gov/~sjplimp/mapreduce.html)
:link(md,Manual.html)

:line

MapReduce reduce() method :h3
MapReduce multivalue_blocks() method :h3
MapReduce multivalue_block() method :h3

int MapReduce::reduce(void (*myreduce)(char *, int, char *, int, int *, KeyValue *, void *), void *ptr) :pre

int MapReduce::multivalue_blocks() :pre

int MapReduce::multivalue_block(int iblock, char **ptr_multivalue, int **ptr_valuesizes) :pre

This calls the reduce() method of a MapReduce object, passing it a
function pointer to a reduce function you write.  It operates on a
KeyMultiValue object, calling your myreduce function once for each
unique key/multi-value pair owned by that processor.  A new KeyValue
object is created which stores all the key/value pairs generated by
your myreduce() function.  The method returns the total number of new
key/value pairs stored by all processors.

You can give this method a pointer (void *ptr) which will be passed
back to your myreduce() function.  See the "Technical
Details"_technical.html section for why this can be useful.  Just
specify a NULL if you don't need this.
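
As an illustration of the void *ptr argument, here is a minimal sketch
of a callback that accumulates per-processor statistics in a
caller-owned struct.  The struct ReduceStats and the function name
count_reduce are hypothetical names chosen for this example, and
KeyValue is only forward-declared here as a stand-in for the library
class, so the sketch compiles without the library headers.

```cpp
#include <cassert>

class KeyValue;  // stand-in forward declaration; the real class comes from the library

// Hypothetical struct (not part of the library) that the caller owns.
struct ReduceStats {
  int nkeys;      // number of key/multi-value pairs seen by the callback
  long nvalues;   // total number of values across those pairs
};

// Callback with the myreduce() interface.  It ignores the values themselves
// and only updates the struct that ptr points to.
void count_reduce(char *key, int keybytes, char *multivalue, int nvalues,
                  int *valuebytes, KeyValue *kv, void *ptr)
{
  (void) key; (void) keybytes; (void) multivalue; (void) valuebytes; (void) kv;
  assert(ptr != nullptr);   // this sketch assumes the caller passed a ReduceStats
  ReduceStats *stats = static_cast<ReduceStats *>(ptr);
  stats->nkeys += 1;
  stats->nvalues += nvalues;
}
```

With a MapReduce pointer mr, such a callback would be invoked as
mr->reduce(count_reduce,&stats), following the signature shown at the
top of this page.  Since reduce() is an on-processor operation, the
struct would hold only the counts for the calling processor.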

In this example the user function is called myreduce() and it must
have the following interface, which is the same as that used by the
"compress()"_compress.html method:

void myreduce(char *key, int keybytes, char *multivalue, int nvalues, int *valuebytes, KeyValue *kv, void *ptr) :pre

A single key/multi-value pair is passed to your function from the
KeyMultiValue object stored by the MapReduce object.  The key is
typically unique to this reduce task and the multi-value is a list of
the nvalues associated with that key in the KeyMultiValue object.

There are two possibilities for the multi-value.  The first is that
all its values fit in the memory allocated for the MapReduce object,
which is the usual case.  See the {memsize} "setting"_settings.html
for details on memory allocation.

In this case, the char *multivalue argument is a pointer to the
beginning of the multi-value which contains all nvalues, packed one
after the other.  The int *valuebytes argument is an array which
stores the length of each value in bytes.  If needed, it can be used
by your function to compute an offset into char *multivalue for where
each individual value begins.  Your function is also passed a kv pointer to
a new KeyValue object created and stored internally by the MapReduce
object.
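
The offset computation described above can be sketched as follows,
assuming for illustration that every value is a single int.  The names
sum_packed_ints and sum_reduce are hypothetical, KeyValue is only
forward-declared as a stand-in for the library class, and the result is
returned through ptr rather than added to kv.  Each value is copied out
with memcpy() because only the first value of a multi-value is
guaranteed to be aligned.

```cpp
#include <cassert>
#include <cstring>

class KeyValue;  // stand-in forward declaration; the real class comes from the library

// Hypothetical helper: walk the packed values, using valuebytes to advance
// a running byte offset, and sum every value that is exactly one int long.
long sum_packed_ints(const char *multivalue, int nvalues, const int *valuebytes)
{
  assert(multivalue != nullptr && valuebytes != nullptr);
  long total = 0;
  int offset = 0;                       // byte offset of value i in multivalue
  for (int i = 0; i < nvalues; i++) {
    if (valuebytes[i] == (int) sizeof(int)) {
      int v;
      std::memcpy(&v, multivalue + offset, sizeof(int));  // safe for unaligned data
      total += v;
    }
    offset += valuebytes[i];            // next value starts right after this one
  }
  return total;
}

// Callback with the myreduce() interface; stores the sum in the long that
// ptr points to (an assumption of this sketch).
void sum_reduce(char *key, int keybytes, char *multivalue, int nvalues,
                int *valuebytes, KeyValue *kv, void *ptr)
{
  (void) key; (void) keybytes; (void) kv;
  if (multivalue == nullptr) return;    // values do not fit in memory; not handled here
  long *result = static_cast<long *>(ptr);
  *result = sum_packed_ints(multivalue, nvalues, valuebytes);
}
```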

If the values do not fit in memory, then the meaning of the arguments
passed to your function is changed.  Your function must call two
additional library functions in order to retrieve a block of values
that does fit in memory, and process them one block at a time.

In this case, the char *multivalue argument will be NULL, which is how
your function can test for this possibility.  If you know huge
multi-values will not occur or if you don't need to examine the values
themselves, then the test is not needed.  The nvalues argument still
holds the total number of values in the multi-value.  The meaning of the kv and
ptr arguments is the same as discussed above.  However, the int
*valuebytes argument is changed to be a pointer to the MapReduce
object.  This is to allow you to make the following two kinds of calls
back to the library:

MapReduce *mr = (MapReduce *) valuebytes;
int nblocks = mr->multivalue_blocks();
for (int iblock = 0; iblock < nblocks; iblock++) \{
  int nv = mr->multivalue_block(iblock,&multivalue,&valuebytes);
  for (int i = 0; i < nv; i++) \{
    // process each value within the block of values
  \}
\} :pre

The call to multivalue_blocks() returns the number of blocks of values
in the multi-value.  Each call to multivalue_block() retrieves one
block of values.  The number of values in the block (nv in this case)
is returned.  The multivalue and valuebytes arguments are pointers to
a char * and int * (i.e. a char ** and int **), which will be set to
point to the block of values and their lengths respectively, so they
can then be used just as the multivalue and valuebytes arguments in
the myreduce() callback itself (when the values do not exceed
available memory).

Note that in this example we are re-using (and thus overwriting) the
original multivalue and valuebytes arguments as local variables.

Also note that your myreduce() function can call multivalue_block() as
many times as it wishes and process the blocks of values multiple
times or in any order, though looping through blocks in ascending
order will typically give the best disk I/O performance.

Your myreduce() function can produce key/value pairs (though this is
not required) which it registers with the MapReduce object by calling
the "add()"_kv_add.html method of the KeyValue object.  The syntax for
registration is described on the doc page of the KeyValue
"add()"_kv_add.html method.  Alternatively, your myreduce() function
can write information to an output file.

See the "Settings"_settings.html and "Technical
Details"_technical.html sections for details on the byte-alignment of
keys and values that are passed to your myreduce() function and on
those you register with the KeyValue "add()"_kv_add.html methods.
Note that only the first value of a multi-value (or of each block of
values) passed to your myreduce() function will be aligned to the
{valuealign} "setting"_settings.html.

This method is an on-processor operation, requiring no communication.
When run in parallel, each processor calls myreduce() on each of
the key/multi-value pairs it owns and stores any new key/value pairs it
generates.

:line

[Related methods]: "KeyValue add()"_kv_add.html, "map()"_map.html
