Parallelization
By default, each step function is executed only after the previous one has fully completed. To run some steps in parallel, the following requirements must be met:
In the protocol's __init__ definition, add the following instruction:
    # STEPS_PARALLEL can be imported from pyworkflow.protocol.constants
    def __init__(self, **args):
        super().__init__(**args)
        # ...
        self.stepsExecutionMode = STEPS_PARALLEL
        # ...
With this attribute set, all step functions run at the same time by default, so the dependencies between them must be defined to ensure that only the steps that can safely run in parallel do so.
In the protocol's _defineParams method, add the parallelization section, which defines the number of threads to use.
    def _defineParams(self, form):
        # Substitute N with the number of threads you want
        # your protocol's form to use by default.
        # It has to be an integer greater than 0.
        form.addParallelSection(threads=N)
        # ...
In the protocol's _insertAllSteps function, the steps to be executed by the protocol need to be inserted together with their dependencies. An example is provided below:
    def _insertAllSteps(self):
        """
        In this function the steps that are going to be executed should
        be defined. Two of the most used functions are: _insertFunctionStep and _insertRunJobStep
        """
        # List of step ids that createOutputStep has to wait for
        deps = []
        # myList stands for whatever collection of inputs the protocol processes
        for element in myList:
            # Insert processStep once per input element, so they run in parallel
            deps.append(self._insertFunctionStep(self.processStep, element, prerequisites=[]))
        # Insert output generation step
        self._insertFunctionStep(self.createOutputStep, prerequisites=deps)

    def processStep(self, element):
        # Do something with that element
        pass

    def createOutputStep(self):
        # Generate outputs
        pass
This example has two functions, processStep and createOutputStep. processStep has to be executed once for each element in the list and, in this case, those executions run in parallel.
Calling _insertFunctionStep returns an id that Scipion assigns to the step being inserted. When calling that function, a parameter named prerequisites has to be supplied. It must be a list containing the ids of all the steps that need to finish before the step being inserted can start. If the step has no dependencies, the list must be empty, but it still has to be supplied, or else errors will occur (this will get fixed soon, but in the meantime keep in mind that at least an empty list has to be passed).
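To see how lists of prerequisite ids determine which steps may overlap, here is a minimal sketch using Python's standard graphlib module. It is not Scipion code: the step ids and the prerequisites mapping are hypothetical, and the sketch only groups steps into batches whose dependencies have all finished.

```python
from graphlib import TopologicalSorter

# Hypothetical step ids, each mapped to the ids it must wait for.
prerequisites = {
    1: [],      # no dependencies, can start immediately
    2: [1],     # waits for step 1
    3: [1],     # also waits for step 1, so it may overlap with step 2
    4: [2, 3],  # waits for both 2 and 3
}

sorter = TopologicalSorter(prerequisites)
sorter.prepare()

batches = []
while sorter.is_active():
    # Steps whose prerequisites have all been marked as done.
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

print(batches)  # [[1], [2, 3], [4]]
```

Steps that land in the same batch have no dependency on each other, which is what makes them candidates for parallel execution.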
Going back to the _insertAllSteps example, processStep has no dependencies, so the prerequisites parameter is an empty list for each insertion. Additionally, this function takes one positional parameter (element), so that parameter needs to be passed before the keyword argument prerequisites.
Every step id returned by inserting an instance of processStep has to be stored in a list, in this case deps. This list is then passed as the prerequisites parameter of createOutputStep, which means that createOutputStep will only start once every instance of processStep has finished. If an empty list were passed instead of those ids, createOutputStep would start at the same time as the rest of the steps, resulting in errors if it needs data produced by them.
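The pattern of collecting ids and passing them as prerequisites can be tried outside Scipion with a toy runner. The sketch below is an assumption-laden stand-in, not Scipion's implementation: ToyStepRunner, insert_function_step, process_step and create_output_step are hypothetical names, and the scheduling uses Python's concurrent.futures thread pool.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import threading


class ToyStepRunner:
    """Toy stand-in for a protocol's step list; not Scipion code."""

    def __init__(self, threads=3):
        self.threads = threads
        self.steps = {}      # step id -> (function, args, prerequisite ids)
        self.finished = []   # step ids in completion order
        self._lock = threading.Lock()

    def insert_function_step(self, func, *args, prerequisites):
        """Register a step and return its id (prerequisites is mandatory)."""
        step_id = len(self.steps) + 1
        self.steps[step_id] = (func, args, list(prerequisites))
        return step_id

    def _run_step(self, step_id):
        func, args, _ = self.steps[step_id]
        func(*args)
        with self._lock:
            self.finished.append(step_id)
        return step_id

    def run(self):
        pending = dict(self.steps)
        done = set()
        running = {}
        with ThreadPoolExecutor(max_workers=self.threads) as pool:
            while pending or running:
                # Submit every step whose prerequisites have all finished.
                ready = [sid for sid, (_, _, deps) in pending.items()
                         if set(deps) <= done]
                for sid in ready:
                    del pending[sid]
                    running[pool.submit(self._run_step, sid)] = sid
                if not running:
                    raise RuntimeError("unsatisfiable prerequisites")
                completed, _ = wait(list(running), return_when=FIRST_COMPLETED)
                for future in completed:
                    done.add(future.result())
                    del running[future]


results = {}
elements = (1, 2, 3)


def process_step(element):
    results[element] = element * 2


def create_output_step():
    results["total"] = sum(results[e] for e in elements)


runner = ToyStepRunner(threads=3)
deps = [runner.insert_function_step(process_step, e, prerequisites=[])
        for e in elements]
output_id = runner.insert_function_step(create_output_step, prerequisites=deps)
runner.run()
print(results["total"])  # 12
```

Because the output step lists every process step id as a prerequisite, it is always the last one to finish, regardless of the order in which the process steps complete.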
Note: Every step function needs to be inserted within _insertAllSteps. This is because, while the protocol is running, its GUI shows a progress status in the format StepsCompleted/TotalSteps, and TotalSteps only takes into account the steps inserted within the _insertAllSteps function. If there is any call to _insertFunctionStep from another function, even if that function is called inside _insertAllSteps, the protocol GUI will end up showing more completed steps than total steps (e.g. 100/80). This does not affect the protocol's results at all, but it is not ideal for a user to look at either.