API workflows
To create Scipion workflows from Python scripts, you only need to follow a few logical steps: create a project, create some protocols, and add those protocols to the project.
In addition, we include a special mention of `how to set EM objects as protocol inputs<Setting EM objects as protocol inputs>`_.
Creating a project
To create an empty project, you only need to call the Manager.createProject() method:
from pyworkflow.project import Manager
manager = Manager()
project = manager.createProject(configDict[PROJECT_NAME],
                                location=configDict[SCIPION_PROJECT])
where the PROJECT_NAME and SCIPION_PROJECT constants are the configDict keys that hold the project name and the project path, respectively.
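The snippet above assumes that configDict and its key constants already exist. A minimal sketch of how they might be defined follows; the constant values, the project name and the path are hypothetical placeholders, not names defined by Scipion.

```python
# Hypothetical key constants; any hashable values would work as dictionary keys.
PROJECT_NAME = 'PROJECT_NAME'
SCIPION_PROJECT = 'SCIPION_PROJECT'

# Hypothetical user configuration, e.g. collected from a form or a config file.
configDict = {
    PROJECT_NAME: 'my_streaming_project',
    SCIPION_PROJECT: '/path/to/ScipionUserData/projects/my_streaming_project',
}

print(configDict[PROJECT_NAME])     # value passed as the project name
print(configDict[SCIPION_PROJECT])  # value passed as location=
```

Using named constants for the keys, rather than repeating string literals, keeps the rest of the script safe from silent typos in key names.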
Creating some protocols
Once a project is created, we must fill it with protocols.
To instantiate a protocol, its class must first be imported from the plugin that contains it (or from Scipion itself). Here is an example that imports the Import Movies protocol from Scipion and the Motioncor protocol from the scipion-em-motioncor plugin:
import pyworkflow.utils as pwutils
from pyworkflow.em.protocol import ProtImportMovies
ProtMotionCorr = pwutils.importFromPlugin('motioncorr.protocols', 'ProtMotionCorr')
Notice that we use the pwutils.importFromPlugin() method to import classes from plugins. This avoids uncontrolled errors if the plugin is not installed in the current Scipion.
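The general idea of a guarded import can be illustrated with the standard library alone. The sketch below is a generic stand-in, not the implementation of pwutils.importFromPlugin(); the helper name safeImport and the missing module name are hypothetical.

```python
import importlib


def safeImport(moduleName, className):
    """Return the requested class, or None if the module or class is missing.

    Generic illustration only; this is not how pwutils.importFromPlugin()
    is implemented.
    """
    try:
        module = importlib.import_module(moduleName)
    except ImportError:
        return None
    return getattr(module, className, None)


# A standard-library class is found...
OrderedDict = safeImport('collections', 'OrderedDict')
# ...while a missing module (hypothetical name) yields None instead of raising.
Missing = safeImport('plugin_that_is_not_installed', 'ProtSomething')
```

A script can then test the result before building the part of the workflow that depends on that plugin.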
To create a protocol, use the newProtocol() method of the Project class. For instance, for the Import Movies protocol:
protImport = project.newProtocol(
    ProtImportMovies,
    objLabel='import movies',
    importFrom=ProtImportMovies.IMPORT_FROM_FILES,
    filesPath=configDict.get(DATA_FOLDER),
    filesPattern=configDict.get(PATTERN),
    amplitudeContrast=configDict.get(AMP_CONTR),
    sphericalAberration=configDict.get(SPH_AB),
    voltage=configDict.get(VOL_KV),
    samplingRate=configDict.get(SAMPLING),
    doseInitial=configDict.get(DOSE0, 0),
    dosePerFrame=configDict.get(DOSEF, 0),
    gainFile=gainFn,
    dataStreaming=True,
    timeout=configDict.get(TIMEOUT, 43200)  # 12 h default
)
Notice that the first argument of the project.newProtocol() method is the EM-protocol subclass to be instantiated (imported above). The remaining arguments are the form parameters that the protocol defines with the addParam() method. If a parameter is not set, its default value is used.
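Because unset parameters fall back to their defaults, one option is to collect the keyword arguments first and drop those the user left empty. Note that configDict.get(KEY) returns None for a missing key, and passing None explicitly may not be what the protocol expects. The helper name dropUnset and the sample values below are hypothetical.

```python
def dropUnset(params):
    """Return a copy of params without the entries whose value is None."""
    return {key: value for key, value in params.items() if value is not None}


# Hypothetical values, as they might come from configDict.get(...)
rawParams = {
    'objLabel': 'import movies',
    'voltage': 300,
    'samplingRate': None,   # not provided by the user
    'dosePerFrame': 0,      # a real value: zero is kept, only None is dropped
}

kwargs = dropUnset(rawParams)
print(sorted(kwargs))
```

The resulting dictionary could then be expanded in the call, for example project.newProtocol(ProtImportMovies, **kwargs).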
Adding protocols to the project
Once a protocol is instantiated, we should register it, which means writing it to disk in the project databases. This can be done in two alternative ways:
Saving the protocol using the project.saveProtocol() method:
project.saveProtocol(protImport)
This option is conceptually similar to creating a JSON block when making Scipion’s templates. The protocol does not run until the whole workflow is launched, after all the remaining protocols have been included. Consequently, the output objects of saved protocols do not exist yet, and no information can be read from them. Even so, they can be used as inputs for the next protocols without any problem, as we will see below.
Launching the protocol using the project.launchProtocol() method:
project.launchProtocol(protImport, wait=False)
This option launches the protocol as soon as we register it (conceptually similar to manual processing with the GUI). The drawback is that all inputs must be ready before a protocol is launched. We must therefore monitor the outputs of earlier protocols to avoid launching later protocols with empty inputs.
Notice that we pass wait=False so that the script continues. With wait=True, the script stops here until the protocol finishes, which does not make sense for streaming processing.
These two options are noticeably different, and we must choose one of them up front, since the same approach should be used for all the protocols.
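One way to keep that choice consistent is to route every registration through a single helper. The sketch below is only an illustration: registerProtocol and the launchNow flag are hypothetical names, and _FakeProject is a stand-in that records calls so the example runs without Scipion. With a real project, only the saveProtocol() and launchProtocol() calls shown earlier would be involved.

```python
def registerProtocol(project, protocol, launchNow):
    """Register a protocol using the strategy chosen for the whole script."""
    if launchNow:
        # Run immediately; wait=False keeps the script going (streaming)
        project.launchProtocol(protocol, wait=False)
    else:
        # Only write the protocol to the project; it runs later
        project.saveProtocol(protocol)


class _FakeProject:
    """Stand-in used only to exercise the helper without Scipion."""

    def __init__(self):
        self.calls = []

    def saveProtocol(self, protocol):
        self.calls.append(('save', protocol))

    def launchProtocol(self, protocol, wait=False):
        self.calls.append(('launch', protocol, wait))


fake = _FakeProject()
registerProtocol(fake, 'protImport', launchNow=False)
registerProtocol(fake, 'protMotionCor', launchNow=True)
print(fake.calls)
```

In a real script the flag would be set once, near the top, and reused for every protocol.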
Setting EM objects as protocol inputs
We have seen how to set protocol parameters through the initialization arguments. Alternatively, any protocol parameter can be assigned a value with its set(value) method. For instance,
protMotionCor = project.newProtocol(ProtMotionCorr)
protMotionCor.doApplyDoseFilter.set(configDict.get(DOSEF, 0)>0)
where protMotionCor is initialized in the first line, whereas doApplyDoseFilter is set to True if the dose per frame introduced by the user is greater than 0, and to False otherwise.
So far, we have only set scalar objects or built-in Python types. Usually, however, we want to use EM set outputs (SetOfMicrographs, SetOfCtfs, SetOfCoordinates, SetOfParticles, SetOfClasses…) from previous protocols as inputs for the next protocols. At this point, we must take into account that the previous protocols may not have run yet (see Adding protocols to the project). In that case, their outputs have not been created yet and therefore cannot be passed as the value in the set(value) method. To handle this situation, we set the whole protocol as the input parameter and use the setExtended() method to indicate which object should be retrieved from that protocol at run time. For instance,
protMotionCor.inputMovies.set(protImport)
protMotionCor.inputMovies.setExtended('outputMovies')
where the whole protImport protocol is attached to protMotionCor.inputMovies in the first line, and outputMovies is set as an extension of this parameter, indicating that it will be retrieved at run time.
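Since the two calls always go together, they can be wrapped in a small helper. This is a sketch under assumptions: linkOutput is a hypothetical name, and _FakePointer is a stand-in that only mimics the set() and setExtended() calls used above so that the example runs without Scipion.

```python
def linkOutput(inputParam, sourceProtocol, outputName):
    """Point inputParam at an output that sourceProtocol will produce later."""
    inputParam.set(sourceProtocol)
    inputParam.setExtended(outputName)


class _FakePointer:
    """Stand-in for a protocol input parameter, for illustration only."""

    def __init__(self):
        self.value = None
        self.extended = None

    def set(self, value):
        self.value = value

    def setExtended(self, name):
        self.extended = name


inputMovies = _FakePointer()
linkOutput(inputMovies, 'protImport', 'outputMovies')
print(inputMovies.value, inputMovies.extended)
```

With real protocols the call would read linkOutput(protMotionCor.inputMovies, protImport, 'outputMovies'), which keeps the output name in a single, easy-to-check place.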
If you decided to launch protocols as soon as they are created (see Adding protocols to the project), then a direct assignment is possible as long as the object is ready:
protMotionCor.inputMovies.set(protImport.outputMovies)
Notice that, for this to work, protImport must have an attribute called outputMovies; if not, this line will break.
In this case, a waiting function that ensures protImport.outputMovies is ready to be used becomes crucial. For instance:
from time import sleep

# Assumption: getProtocolFromDb is importable from pyworkflow.protocol
from pyworkflow.protocol import getProtocolFromDb


def waitOutput(protocol, outputAttributeName, checkNotEmpty=True, timeout=5000):
    """ Wait until the output is being generated by the protocol.
    Returns False if the timeout (in seconds) is reached, or True otherwise.
    """
    timeStep = 5.  # check every 5 seconds

    def _loadProt():
        # Load the latest version of the protocol from its own database
        prot2 = getProtocolFromDb(protocol.getProject().path,
                                  protocol.getDbPath(),
                                  protocol.getObjId())
        # Close DB connections
        prot2.getProject().closeMapper()
        prot2.closeMappers()
        return prot2

    counter = 1
    prot2 = _loadProt()

    # First, wait until the output attribute exists
    while not prot2.hasAttribute(outputAttributeName):
        sleep(timeStep)
        prot2 = _loadProt()
        if counter > timeout/timeStep:
            return False
        counter += 1

    # Then, if requested, wait until the output is not empty
    outputObject = prot2.getAttributeValue(outputAttributeName)
    while checkNotEmpty and (outputObject is None
                             or not outputObject.getSize() > 0):
        sleep(timeStep)
        prot2 = _loadProt()
        outputObject = prot2.getAttributeValue(outputAttributeName)
        if counter > timeout/timeStep:
            return False
        counter += 1

    # Update the protocol instance to get the latest changes
    project._updateProtocol(protocol)
    return True
# Waiting for a non-empty output from MotionCor2 to continue
waitOutput(protMotionCor, 'outputMicrographs')
# Importing, creating and launching the gCTF protocol
ProtGctf = pwutils.importFromPlugin('gctf.protocols', 'ProtGctf')
protGCTF = project.newProtocol(ProtGctf,
                               objLabel='gCTF estimation',
                               gpuList=str(configDict.get(GCTF)))
protGCTF.inputMicrographs.set(protMotionCor.outputMicrographs)
project.launchProtocol(protGCTF, wait=False)
# Waiting for at least one CTF estimation to continue
waitOutput(protGCTF, 'outputCTF')