"MapReduce-MPI WWW Site"_mws - "MapReduce-MPI Documentation"_md :c

:link(mws,http://www.cs.sandia.gov/~sjplimp/mapreduce.html)
:link(md,Manual.html)

:line

MapReduce map() method :h3

int MapReduce::map(int nmap, void (*mymap)(int, KeyValue *, void *), void *ptr)
int MapReduce::map(int nmap, void (*mymap)(int, KeyValue *, void *), void *ptr, int addflag) :pre

int MapReduce::map(char *file, void (*mymap)(int, char *, KeyValue *, void *), void *ptr)
int MapReduce::map(char *file, void (*mymap)(int, char *, KeyValue *, void *), void *ptr, int addflag) :pre

int MapReduce::map(int nmap, int nfiles, char **files, char sepchar, int delta, void (*mymap)(int, char *, int, KeyValue *, void *), void *ptr)
int MapReduce::map(int nmap, int nfiles, char **files, char sepchar, int delta, void (*mymap)(int, char *, int, KeyValue *, void *), void *ptr, int addflag) :pre

int MapReduce::map(int nmap, int nfiles, char **files, char *sepstr, int delta, void (*mymap)(int, char *, int, KeyValue *, void *), void *ptr)
int MapReduce::map(int nmap, int nfiles, char **files, char *sepstr, int delta, void (*mymap)(int, char *, int, KeyValue *, void *), void *ptr, int addflag) :pre

int MapReduce::map(MapReduce *mr2, void (*mymap)(uint64_t, char *, int, char *, int, KeyValue *, void *), void *ptr)
int MapReduce::map(MapReduce *mr2, void (*mymap)(uint64_t, char *, int, char *, int, KeyValue *, void *), void *ptr, int addflag) :pre

This calls the map() method of a MapReduce object.  A function pointer
to a mapping function you write is specified as an argument.  This
method either creates a new KeyValue object to store all the key/value
pairs generated by your mymap function, or adds them to an existing
KeyValue object.  The method returns the total number of key/value
pairs in the KeyValue object.

For the first set of variants (with and without addflag) you specify a
total number of map tasks {nmap} to perform across all processors.
The index of a map task is passed back to your mymap() function.

For the second set of variants you specify a master file that contains
a list of filenames.  A filename is passed back to your mymap()
function.  The master file should list one file per line.  Blank lines
are not allowed.  Leading and trailing whitespace around the filename
is OK.

For the third set of variants you specify an array of one or more file
names and a separation character (sepchar).  For the fourth set of
variants, you specify an array of one or more file names and a
separation string (sepstr).  The file(s) are split into nmap chunks
with roughly equal numbers of bytes in each chunk.  One chunk from one
file is read and passed back to your mymap() function, so your code
does not read the file.  See details below about the splitting
methodology and the delta input parameter.

For the fifth set of variants, you specify an existing MapReduce
object mr2 with key/value pairs, which can either be this MapReduce
object or another one.  The key/value pairs from mr2 are passed back
to your mymap() function, one key/value at a time, allowing you to
generate new key/value pairs from an existing set.

You can give any of the map() methods a pointer (void *ptr) which will
be passed back to your mymap() function each time it is called.  See
the "Technical Details"_technical.html section for why this can be
useful.  Specify NULL if you don't need this.
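
The following is a minimal sketch of how the void *ptr argument can
carry application state into mymap().  It does not link against the
library: the KeyValue struct, the AppState struct, and the run_map()
driver are hypothetical stand-ins written for this illustration, and
only the argument order of mymap() and add() is taken from this page
and the KeyValue "add()"_kv_add.html doc page.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's KeyValue class so that this
// sketch compiles without MapReduce-MPI.  Only add() is modeled.
struct KeyValue {
  std::vector<std::pair<std::string, std::string> > pairs;
  void add(char *key, int keybytes, char *value, int valuebytes) {
    pairs.push_back(std::make_pair(std::string(key, keybytes),
                                   std::string(value, valuebytes)));
  }
};

// Hypothetical application state, handed to map() as void *ptr.
struct AppState {
  std::vector<std::string> words;  // one word per map task
  int ncalls;                      // number of mymap() invocations
};

// Same shape as the first mymap() interface: (itask, kv, ptr).
void mymap(int itask, KeyValue *kv, void *ptr) {
  AppState *state = static_cast<AppState *>(ptr);  // recover the real type
  assert(itask >= 0 && itask < static_cast<int>(state->words.size()));
  state->ncalls++;
  std::string &word = state->words[itask];
  int one = 1;
  // Key is the word plus its terminating '\0'; value is the integer 1.
  kv->add(&word[0], static_cast<int>(word.size()) + 1,
          reinterpret_cast<char *>(&one), static_cast<int>(sizeof(int)));
}

// Hypothetical serial driver standing in for map(nmap, mymap, ptr).
int run_map(int nmap, void (*fn)(int, KeyValue *, void *), void *ptr,
            KeyValue *kv) {
  for (int itask = 0; itask < nmap; itask++) fn(itask, kv, ptr);
  return static_cast<int>(kv->pairs.size());
}
```

With the real library you would pass &state as the ptr argument of
map() instead of calling a driver like run_map().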

If the last argument {addflag} is omitted or is specified as 0, then
map() will create a new KeyValue object, deleting any existing
KeyValue object.  If addflag is non-zero, then key/value pairs
generated by your mymap() function are added to an existing KeyValue
object, which is created if needed.
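
The addflag rule can be pictured with a small model.  This is only a
sketch: MiniMR and emit_task() are hypothetical names, a vector of
strings stands in for the internal KeyValue object, and nothing here
is the library's own code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a MapReduce object that owns a KeyValue store.
struct MiniMR {
  std::vector<std::string> kv;  // stands in for the internal KeyValue object

  // Mimics the documented rule: addflag == 0 discards existing pairs
  // before mapping; a non-zero addflag appends to them.
  int map(int nmap, void (*fn)(int, std::vector<std::string> *),
          int addflag = 0) {
    assert(nmap >= 0);
    if (addflag == 0) kv.clear();
    for (int itask = 0; itask < nmap; itask++) fn(itask, &kv);
    return static_cast<int>(kv.size());  // total pairs now stored
  }
};

// Emits one entry per map task.
void emit_task(int itask, std::vector<std::string> *kv) {
  kv->push_back("task" + std::to_string(itask));
}
```

In this model, mapping 3 tasks and then 2 more with addflag = 1 leaves
5 entries, while a later call with addflag = 0 starts over.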

If the fifth map() variant is called using the MapReduce object itself
as an argument, and if addflag is 0, then the existing KeyValue object
is effectively replaced by the newly generated key/value pairs.  If
addflag is non-zero, then the newly generated key/value pairs are
added to the existing KeyValue object.

In the examples here the user function is called mymap().  It has one
of four interfaces, depending on which variant of the map() method is
invoked (the sepchar and sepstr variants share the third interface):

void mymap(int itask, KeyValue *kv, void *ptr)
void mymap(int itask, char *file, KeyValue *kv, void *ptr)
void mymap(int itask, char *str, int size, KeyValue *kv, void *ptr)
void mymap(uint64_t itask, char *key, int keybytes, char *value, int valuebytes, KeyValue *kv, void *ptr) :pre

In all cases, the final 2 arguments passed to your function are a
pointer to a KeyValue object (kv) stored internally by the MapReduce
object, and the original pointer you specified as an argument to the
map() method, as void *ptr.

In the first case, itask is passed to your function with a value 0 <=
itask < nmap, where nmap was specified in the map() call.  For
example, you could use itask to select a file from a list stored by
your application.  Your mymap() function could open and read the file
or perform some other operation.

In the second case, itask will have a value 0 <= itask < nfiles, where
nfiles is the number of filenames in the master file you specified.
Your function is also passed a single filename, which it will
presumably open and read.

In the third case, itask will have a value 0 <= itask < nmap, where
nmap was specified in the map() call and is the number of file
segments generated.  Your function is also passed a string of bytes
(str) of length size from one of the files.  Size includes a trailing
'\0' that is appended to the string.
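
As a sketch of the third interface, the function below walks a chunk
line by line and emits each non-empty line as a key, with its length
as the value.  The KeyValue struct is a hypothetical stand-in so the
example compiles without the library, and the choice of key and value
is arbitrary.  The point to notice is that the text occupies size-1
bytes, because size counts the trailing '\0'.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's KeyValue class.
struct KeyValue {
  std::vector<std::pair<std::string, std::string> > pairs;
  void add(char *key, int keybytes, char *value, int valuebytes) {
    pairs.push_back(std::make_pair(std::string(key, keybytes),
                                   std::string(value, valuebytes)));
  }
};

// Same shape as the third mymap() interface: (itask, str, size, kv, ptr).
void mymap(int itask, char *str, int size, KeyValue *kv, void *ptr) {
  (void)itask;  // unused in this sketch
  (void)ptr;    // unused in this sketch
  assert(size >= 1);           // size always counts the trailing '\0'
  char *end = str + size - 1;  // points at the trailing '\0'
  char *line = str;
  while (line < end) {
    char *nl = static_cast<char *>(std::memchr(line, '\n', end - line));
    char *stop = nl ? nl : end;  // a last line may lack a newline
    int len = static_cast<int>(stop - line);
    if (len > 0)
      kv->add(line, len, reinterpret_cast<char *>(&len),
              static_cast<int>(sizeof(int)));
    line = stop + 1;
  }
}
```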

For map() methods that take files and a separation criterion as
arguments, you must specify nmap >= nfiles, so that there is one or
more map tasks per file.  For files that are split into multiple
chunks, the split is done at occurrences of the separation character
or string.  You specify a delta of how many extra bytes to read with
each chunk that will guarantee the splitting character or string is
found within that many bytes.  For example if the files are lines of
text, you could choose a newline character '\n' as the sepchar, and a
delta of 80 (if the longest line in your files is 80 characters).  If
the files are snapshots of simulation data where each snapshot is 1000
lines (no more than 80 characters per line), you could choose the
first line of each snapshot (e.g. "Snapshot") as the sepstr, and a
delta of 80000.  Note that if the separation character or string is
not found within delta bytes, an error will be generated.  Also note
that there is no harm in choosing a large delta so long as it is not
larger than the chunk size for a particular file.
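
The role of delta can be illustrated with a small in-memory splitter.
This is a sketch under assumptions, not the library's implementation:
split_at_sepchar() is a hypothetical function, it works on a string
rather than on files, and it assumes each nominal chunk boundary is
pushed forward to the next sepchar, which must appear within delta
bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split data into nchunk pieces of roughly equal size.  Every piece
// except possibly the last ends with sepchar.  Throws if no sepchar is
// found within delta bytes of a nominal boundary.
std::vector<std::string> split_at_sepchar(const std::string &data,
                                          int nchunk, char sepchar,
                                          int delta) {
  assert(nchunk >= 1 && delta >= 0);
  std::vector<std::string> chunks;
  const std::size_t n = data.size();
  std::size_t start = 0;
  for (int i = 0; i < nchunk; i++) {
    std::size_t stop = n;  // the last chunk runs to the end of the data
    if (i < nchunk - 1) {
      // Nominal boundary for equal-sized chunks, never before start.
      std::size_t nominal = n * static_cast<std::size_t>(i + 1) /
                            static_cast<std::size_t>(nchunk);
      if (nominal < start) nominal = start;
      // Look for the separator in the delta bytes past the boundary.
      std::size_t pos = data.find(sepchar, nominal);
      if (pos == std::string::npos ||
          pos >= nominal + static_cast<std::size_t>(delta))
        throw std::runtime_error("separator not found within delta bytes");
      stop = pos + 1;  // the chunk ends with the separator itself
    }
    chunks.push_back(data.substr(start, stop - start));
    start = stop;
  }
  return chunks;
}
```

For the 16-byte text "aa\nbbbb\ncc\ndddd\n" split into 2 chunks at
'\n', the nominal boundary falls inside the line "cc", so the first
chunk is extended to the end of that line.  A delta that is too small
to reach that newline produces an error instead.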

If the separation criterion is a character (sepchar), the chunk of
bytes passed to your mymap() function will start with the character
after a sepchar, and will end with a sepchar (followed by a '\0').  If
the separation criterion is a string (sepstr), the chunk of bytes
passed to your mymap() function will start with sepstr, and will end
with the character immediately preceding a sepstr (followed by a
'\0').  Note that this means your mymap() function will be passed
different byte strings if you specify sepchar = 'A' vs sepstr = "A".
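
The difference between the two boundary rules can be made visible with
the sketch below.  Both functions are hypothetical helpers written for
this illustration.  Unlike the library, which only cuts near chunk
boundaries, they cut at every separator so that the placement of the
separator is easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// sepchar rule: each piece ends with the separator character.
std::vector<std::string> cut_after_char(const std::string &data,
                                        char sepchar) {
  std::vector<std::string> out;
  std::size_t start = 0;
  while (start < data.size()) {
    std::size_t pos = data.find(sepchar, start);
    std::size_t stop = (pos == std::string::npos) ? data.size() : pos + 1;
    out.push_back(data.substr(start, stop - start));
    start = stop;
  }
  return out;
}

// sepstr rule: each piece begins with the separator string.
std::vector<std::string> cut_before_str(const std::string &data,
                                        const std::string &sepstr) {
  assert(!sepstr.empty());
  std::vector<std::string> out;
  std::size_t start = 0;
  while (start < data.size()) {
    // Search past the current position so a leading sepstr is kept.
    std::size_t pos = data.find(sepstr, start + 1);
    std::size_t stop = (pos == std::string::npos) ? data.size() : pos;
    out.push_back(data.substr(start, stop - start));
    start = stop;
  }
  return out;
}
```

Applied to the text "A1A2A3", the sepchar rule with 'A' yields the
pieces "A", "1A", "2A", "3", while the sepstr rule with "A" yields
"A1", "A2", "A3".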

In the fourth case, itask will have a value 0 <= itask < nkey, where
nkey is an unsigned 64-bit int and is the number of key/value pairs in
the specified MapReduce object.  Key and value are the byte
strings for a single key/value pair and are of length keybytes and
valuebytes respectively.
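
As a sketch of the fourth interface, the function below generates one
new pair per existing pair by swapping key and value.  The KeyValue
struct is again a hypothetical stand-in so the example compiles
without the library, and the swap is just an arbitrary transformation
chosen for illustration.

```cpp
#include <cassert>
#include <stdint.h>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's KeyValue class.
struct KeyValue {
  std::vector<std::pair<std::string, std::string> > pairs;
  void add(char *key, int keybytes, char *value, int valuebytes) {
    pairs.push_back(std::make_pair(std::string(key, keybytes),
                                   std::string(value, valuebytes)));
  }
};

// Same shape as the fourth mymap() interface.  Emits the incoming
// value as the new key and the incoming key as the new value.
void mymap(uint64_t itask, char *key, int keybytes, char *value,
           int valuebytes, KeyValue *kv, void *ptr) {
  (void)itask;  // unused in this sketch
  (void)ptr;    // unused in this sketch
  assert(kv != nullptr);
  kv->add(value, valuebytes, key, keybytes);
}
```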

The MapReduce library assigns map tasks to processors.  Options for
how it does this can be controlled by "MapReduce
settings"_settings.html.  Roughly nmap/P tasks are assigned to each
processor, where P is the number of processors in the MPI communicator
you instantiated the MapReduce object with.
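
To make the nmap/P arithmetic concrete, here is a sketch of two common
static ways to divide nmap task indices among P processors: contiguous
blocks and a strided (round-robin) pattern.  The function names are
hypothetical, and whether either scheme matches what the library does
under a given setting is an assumption, so consult the
"Settings"_settings.html page for the actual options.

```cpp
#include <cassert>
#include <vector>

// Contiguous block of task indices for processor me out of nprocs.
std::vector<int> tasks_block(int nmap, int nprocs, int me) {
  assert(nprocs > 0 && me >= 0 && me < nprocs);
  std::vector<int> tasks;
  long long lo = static_cast<long long>(me) * nmap / nprocs;
  long long hi = static_cast<long long>(me + 1) * nmap / nprocs;
  for (long long i = lo; i < hi; i++) tasks.push_back(static_cast<int>(i));
  return tasks;
}

// Strided assignment: processor me takes tasks me, me+P, me+2P, ...
std::vector<int> tasks_strided(int nmap, int nprocs, int me) {
  assert(nprocs > 0 && me >= 0 && me < nprocs);
  std::vector<int> tasks;
  for (int i = me; i < nmap; i += nprocs) tasks.push_back(i);
  return tasks;
}
```

With nmap = 10 and P = 4, processor 1 gets tasks 2, 3, 4 under the
block scheme and tasks 1, 5, 9 under the strided scheme.  In both
schemes each processor gets 2 or 3 tasks, and every task is assigned
exactly once.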

Typically, your mymap() function will produce key/value pairs which it
registers with the MapReduce object by calling the "add()"_kv_add.html
method of the KeyValue object.  The syntax for registration is
described on the doc page of the KeyValue "add()"_kv_add.html method.

See the "Settings"_settings.html and "Technical
Details"_technical.html sections for details on the byte-alignment of
keys and values you register with the KeyValue "add()"_kv_add.html
methods or that are passed to your mymap() function.

Aside from the assignment of tasks to processors, this method is an
on-processor operation that requires no communication.  When
run in parallel, each processor generates key/value pairs and stores
them, independently of other processors.

:line

[Related methods]: "KeyValue add()"_kv_add.html, "reduce()"_reduce.html
