Parallelization

By default, each function step runs only after the previous one has fully completed. To run some of the steps in parallel, the following requirements must be met:

  • In the protocol __init__ definition add the following instruction:

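# STEPS_PARALLEL comes from pyworkflow (import path assumed here)
from pyworkflow.protocol.constants import STEPS_PARALLEL

def __init__(self, **args):
   super().__init__(**args)
   # ...
   self.stepsExecutionMode = STEPS_PARALLEL
   # ...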

With this attribute set, all step functions run at the same time by default, so the dependencies between them must be declared explicitly to ensure that only the steps that are independent of each other run in parallel.

  • In the protocol _defineParams method add the parallelization section defining the number of threads to use.

def _defineParams(self, form):
   # Substitute N for the number of threads you want
   # your protocol's form to use by default.
   # It has to be an integer greater than 0.
   form.addParallelSection(threads=N)
   # ...

  • In the protocol’s _insertAllSteps function, the steps to be executed by the protocol need to be inserted with their dependencies. An example is provided below:

def _insertAllSteps(self):
   """
   In this function the steps that are going to be executed should
   be defined. Two of the most used functions are: _insertFunctionStep or _insertRunJobStep
   """
   # Defining list of function ids to be waited for by the createOutputStep function
   deps = []
   # myList stands for the collection of input items to process
   for element in myList:
      # Calling processStep in parallel for each input element
      deps.append(self._insertFunctionStep(self.processStep, element, prerequisites=[]))

   # Insert output generation step
   self._insertFunctionStep(self.createOutputStep, prerequisites=deps)

def processStep(self, element):
   # Do something with that element
   pass

def createOutputStep(self):
   # Generate outputs
   pass

In this example, we have two functions:

  • processStep

  • createOutputStep

processStep is a function that has to be executed once for each element in the list and, in this case, those executions run in parallel.

Calling _insertFunctionStep returns an id that Scipion assigns to the step being inserted. The call also takes a param named prerequisites, which must be a list containing the ids of all the steps that have to finish before the step being inserted can start. If the step has no dependencies, the list must be empty, but it still has to be supplied, or errors will occur (this will be fixed soon but, in the meantime, keep in mind that at least an empty list has to be passed).
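To make the id and prerequisites contract more concrete, the sketch below models it with plain Python. It is not Scipion code: ToyStepRegistry, insert and waves are hypothetical names introduced only for illustration, and the standard-library graphlib module (Python 3.9+) stands in for whatever scheduling Scipion actually performs. It groups the inserted steps into "waves", where all steps in one wave have their prerequisites satisfied and could therefore run at the same time.

```python
from graphlib import TopologicalSorter


class ToyStepRegistry:
    """Hypothetical stand-in for a protocol's step list (not Scipion code)."""

    def __init__(self):
        self._prereqs = {}  # step id -> list of prerequisite step ids
        self._names = {}    # step id -> readable step name

    def insert(self, name, prerequisites):
        # Like _insertFunctionStep, return an id that later steps
        # can list in their own prerequisites.
        step_id = len(self._prereqs) + 1
        self._prereqs[step_id] = list(prerequisites)
        self._names[step_id] = name
        return step_id

    def waves(self):
        """Group steps into waves of steps whose prerequisites are all done."""
        sorter = TopologicalSorter(self._prereqs)
        sorter.prepare()
        result = []
        while sorter.is_active():
            ready = sorter.get_ready()
            result.append(sorted(self._names[i] for i in ready))
            sorter.done(*ready)
        return result


registry = ToyStepRegistry()
# Three independent steps: each one gets an empty prerequisites list.
deps = [registry.insert(f"process({e})", prerequisites=[]) for e in ("a", "b", "c")]
# One final step that waits for all of them.
registry.insert("createOutput", prerequisites=deps)
waves = registry.waves()
print(waves)
```

In this toy model the three process steps land in the first wave and createOutput lands alone in the second, which mirrors the ordering the prerequisites list is meant to express.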

Going back to the example above, processStep has no dependencies, so every call that inserts it passes an empty list as prerequisites. Additionally, this function takes one positional param (element), so that param needs to be passed before the keyword argument prerequisites.

Every id returned when inserting an instance of processStep has to be stored in a list, in this case deps. This list is then passed as the prerequisites param of createOutputStep, which means that createOutputStep will only start once every instance of processStep has finished. If an empty list were passed instead of those ids, createOutputStep would start at the same time as the rest of the functions, resulting in errors if it needs data produced by the other ones.
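The fan-out and fan-in behaviour described here can be reproduced outside Scipion with a small thread-pool runner. This is only a sketch under stated assumptions: ToyParallelRunner, insert, run, process_step and create_output_step are hypothetical names, and the scheduling loop is a simplified illustration rather than Scipion's own executor. It shows that a step is launched only once every id in its prerequisites list has finished.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class ToyParallelRunner:
    """Hypothetical runner that honours prerequisites (not Scipion code)."""

    def __init__(self, threads):
        self.threads = threads
        self.steps = {}  # step id -> (function, args, prerequisite ids)

    def insert(self, func, *args, prerequisites):
        step_id = len(self.steps) + 1
        self.steps[step_id] = (func, args, list(prerequisites))
        return step_id

    def run(self):
        finished = set()  # ids of completed steps
        running = {}      # future -> step id
        with ThreadPoolExecutor(max_workers=self.threads) as pool:
            while len(finished) < len(self.steps):
                # Launch every pending step whose prerequisites are all done.
                for step_id, (func, args, prereqs) in self.steps.items():
                    started = step_id in finished or step_id in running.values()
                    if not started and all(p in finished for p in prereqs):
                        running[pool.submit(func, *args)] = step_id
                if not running:
                    raise RuntimeError("prerequisites can never be satisfied")
                done, _ = wait(list(running), return_when=FIRST_COMPLETED)
                for future in done:
                    future.result()  # re-raise any exception from the step
                    finished.add(running.pop(future))


results = []
lock = threading.Lock()
seen_by_output = []


def process_step(element):
    # Runs concurrently with the other process_step calls.
    with lock:
        results.append(element * 2)


def create_output_step():
    # Runs only after every process_step has finished.
    seen_by_output.extend(sorted(results))


runner = ToyParallelRunner(threads=3)
deps = [runner.insert(process_step, e, prerequisites=[]) for e in (1, 2, 3)]
runner.insert(create_output_step, prerequisites=deps)
runner.run()
print(seen_by_output)  # [2, 4, 6]
```

If the last insert passed prerequisites=[] instead of deps, create_output_step could be launched alongside the process steps and might see an incomplete results list, which is the failure mode described above.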

Note: Every step function needs to be inserted within _insertAllSteps. This is because, while the protocol is running, its GUI shows a progress status in the format StepsCompleted/TotalSteps, and TotalSteps only takes into account the steps inserted within the _insertAllSteps function. If _insertFunctionStep is called from another function, even if that function is itself called inside _insertAllSteps, the protocol GUI will end up showing more completed steps than total steps (e.g. 100/80). This does not affect the protocol’s results at all, but it is confusing for the user to look at.