Batch processing with for_each

Image processing often involves executing the same command on many different subjects or time points within a study. MRtrix3 includes a Python script called for_each to simplify this process. The main benefit of using for_each over a bash for loop is a simpler and less verbose syntax. Other benefits include multi-threaded job execution (to exploit modern multi-core CPUs when the command being run is not already multi-threaded) and automatic identification of path basenames and prefixes. To view the full help page, run for_each on the command line with no arguments.

Example 1 - using IN

Many people like to organise their imaging datasets with one directory per subject. For example:

.. code-block:: text

    study/001_patient/dwi.mif
    study/002_patient/dwi.mif
    study/003_patient/dwi.mif
    study/004_control/dwi.mif
    study/005_control/dwi.mif
    study/006_control/dwi.mif

The for_each script can be used to run the same command on each subject, for example:

.. code-block:: console

    $ for_each study/* : dwidenoise IN/dwi.mif IN/dwi_denoised.mif

The first part of the command above is the for_each script name, followed by the pattern matching string (study/*) to identify all the files (which in this case are directories) to be looped over. The colon is used to separate the invocation of for_each, along with its inputs and any command-line options, from the command to be executed. In this example the dwidenoise command will be run multiple times, by substituting the keyword IN with each of the directories that match the pattern (study/001_patient, study/002_patient, etc.).
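For comparison, the substitution described above can be sketched as a plain bash loop. This is only an illustration of the idea, not how for_each is implemented: the helper name ``expand_in`` is hypothetical, and it prints the commands rather than running them, so it works without MRtrix3 or any image data.

```shell
# Dry-run sketch: print the command that would be built for each directory.
# expand_in is a hypothetical helper, not part of MRtrix3.
expand_in() {
    for subject in "$@"; do
        # Each occurrence of IN is replaced by the matched path
        echo "dwidenoise ${subject}/dwi.mif ${subject}/dwi_denoised.mif"
    done
}

# Directory names are passed explicitly here; with real data, study/* would be used
expand_in study/001_patient study/002_patient
```

Removing the ``echo`` (and the surrounding quotes) would turn the dry run into the equivalent sequential loop, which is the more verbose form that for_each is intended to replace.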

Example 2 - using NAME

Other people may prefer to organise their imaging datasets with one folder per image type and have all subjects inside. For example:

.. code-block:: text

    study/dwi/001_patient.mif
    study/dwi/002_patient.mif
    study/dwi/003_patient.mif
    study/dwi/004_control.mif
    study/dwi/005_control.mif
    study/dwi/006_control.mif

The NAME keyword can be used in this situation to obtain the basename of the file path. For example:

.. code-block:: console

    $ mkdir study/dwi_denoised
    $ for_each study/dwi/* : dwidenoise IN study/dwi_denoised/NAME

Here, the IN keyword will be substituted with the full string from the matching pattern (study/dwi/001_patient.mif, study/dwi/002_patient.mif, etc.), whereas the NAME keyword will be replaced with the basename of the matching pattern (001_patient.mif, 002_patient.mif, etc.).

Alternatively, the same result can be achieved by running for_each from inside the study/dwi directory. In this case NAME would not be required. For example:

.. code-block:: console

    $ mkdir study/dwi_denoised
    $ cd study/dwi
    $ for_each * : dwidenoise IN ../dwi_denoised/IN

Example 3 - using PRE

For this example, let us assume we want to convert all of the .mif files from example 2 to NIfTI file format (``*.nii``). This can be performed using:

.. code-block:: console

    $ for_each study/dwi/* : mrconvert IN study/dwi/PRE.nii
    $ rm study/dwi/*.mif

Here the PRE keyword will be replaced by the file basename, without the file extension.
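The relationship between the IN, NAME and PRE keywords can be illustrated with standard shell tools. This is a minimal sketch that assumes a single file extension such as .mif; the variable names are illustrative only, and how for_each itself treats multi-part extensions (for example .nii.gz) is not covered here.

```shell
path="study/dwi/001_patient.mif"

in_value="$path"                 # IN: the full matched path
name_value=$(basename "$path")   # NAME: the path with leading directories removed
pre_value="${name_value%.*}"     # PRE: NAME with the final extension removed

echo "IN=$in_value NAME=$name_value PRE=$pre_value"
# prints: IN=study/dwi/001_patient.mif NAME=001_patient.mif PRE=001_patient
```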

Example 4 - Sequential Processing

As an example of a single for_each command running multiple sequential commands (e.g. with the bash ;, |, &&, || operators), let’s assume in the previous example we wanted to remove the *.mif files as they were converted. We could use the && operator, which means “run next command only if current command succeeds without error”.

.. code-block:: console

    $ for_each study/dwi/* : mrconvert IN study/dwi/PRE.nii "&&" rm IN

The && operator here must be escaped with quotes in order to prevent the shell from interpreting it. Bash operator characters can also be escaped with the backslash (``\``) character; for example, to pipe an image between two MRtrix commands (assuming the data set directory layout from example 1):

.. code-block:: console

    $ for_each study/* : dwiextract -bzero IN/dwi.mif - \| mrmath - mean -axis 3 IN/mean_b0.mif
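The effect of quoting or escaping an operator can be checked directly in the shell. In this sketch, ``show_args`` is a hypothetical helper that prints each argument it receives in brackets; it shows that a quoted ``&&`` and an escaped ``|`` reach the called program as ordinary text arguments instead of being acted on by the shell, which is what allows for_each to receive them as part of the command it will run.

```shell
# show_args is a hypothetical helper: print each received argument on its own line
show_args() {
    printf '[%s]\n' "$@"
}

# The quoted && is passed through as a literal argument
show_args mrconvert IN PRE.nii "&&" rm IN

# The escaped | is likewise passed through as a literal argument
show_args dwiextract IN - \| mrmath - mean
```

Without the quotes or backslash, the shell would split the line at the operator and run the second half itself, once, after for_each had finished.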

Example 5 - Parallel Processing

To run multiple jobs at once, use the standard MRtrix3 command-line option -nthreads N, where N is the number of concurrent jobs required. For example:

.. code-block:: console

    $ for_each study/* -nthreads 8 : dwidenoise IN/dwi.mif IN/dwi_denoised.mif

will run up to 8 of the required jobs in parallel. Note that unlike in other MRtrix3 commands where command-line options can be placed anywhere on the command-line, in this particular context the -nthreads option must be specified before the colon separator. This is necessary in order for the for_each script to recognise that this command-line option applies to its own operation, as opposed to the command that for_each is responsible for invoking. To demonstrate this, consider the following usage:

.. code-block:: console

    $ for_each study/* : dwidenoise IN/dwi.mif IN/dwi_denoised.mif -nthreads 8

Here, for_each would execute the dwidenoise command entirely sequentially, once for each input; but each time it is run, dwidenoise would be instructed to use 8 threads.

These two usages can, in theory, be combined. Imagine that a hypothetical MRtrix3 command, “dwidostuff”, is in practice unable to utilise more than four threads, regardless of how many threads are available on your hardware or explicitly requested. However, you have a system with eight hardware threads and wish to utilise them all as much as possible. In such a scenario, you could use:

.. code-block:: console

    $ for_each study/* -nthreads 2 : dwidostuff IN/dwi.mif IN/dwi_stuffdone.mif -nthreads 4

This would instruct for_each to always have two jobs running in parallel, each of which will be explicitly instructed to use four threads.
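The arithmetic behind this choice can be written out as a short sketch. The numbers are the ones assumed in the hypothetical scenario above (eight hardware threads, four usable threads per job), and the variable names are illustrative; the script only prints the resulting command line and does not run it.

```shell
total_threads=8      # hardware threads assumed in the scenario above
threads_per_job=4    # threads the hypothetical dwidostuff can make use of

# Number of jobs for_each should run concurrently to occupy all hardware threads
parallel_jobs=$(( total_threads / threads_per_job ))

echo "for_each study/* -nthreads $parallel_jobs : dwidostuff IN/dwi.mif IN/dwi_stuffdone.mif -nthreads $threads_per_job"
```

With these values the job count comes out as 2, matching the command shown above.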

Note that most MRtrix3 commands are multi-threaded, and will generally succeed in individually using all available CPU cores, in which case running multiple jobs in parallel using for_each is unlikely to provide a benefit in computation time (or it may in fact be detrimental). If however a particular command is known to be single-threaded (or have only limited multi-threading capability), and your system possesses enough RAM to support running multiple instances of that command at once, this usage may yield a considerable reduction in total processing time.