xmipp3.protocols.protocol_ransac module
- class xmipp3.protocols.protocol_ransac.XmippProtRansac(**args)
Bases: ProtInitialVolume
Computes an initial 3D model from a set of projections/classes using the RANSAC algorithm.
This method is based on an initial nonlinear dimensionality reduction approach that selects small, representative sets of class average images capturing most of the structural information of the particle under study. These reduced sets are then used to generate volumes from random orientation assignments. The best volume is determined from these guesses using a random sample consensus (RANSAC) approach.
AI Generated
## Overview
The RANSAC protocol generates initial 3D volumes from a set of 2D class averages or projection averages.
Initial model generation is a critical step in single-particle cryo-EM. Before a high-quality 3D refinement can be performed, the workflow usually needs a starting volume that has approximately the correct global shape. The RANSAC protocol addresses this problem by generating many candidate initial volumes from small random subsets of the input averages and then selecting the best candidates according to how well they explain the full set of input images.
The method follows a random sample consensus strategy. Each RANSAC iteration uses a small subset of 2D averages to propose a tentative 3D volume. The candidate volume is then projected, and the projections are compared with the input averages. Volumes supported by many well-correlated images are considered better candidates. The best volumes are then refined by projection matching.
The output is a set of proposed initial volumes, each annotated with scoring information.
## Inputs and General Workflow
The main input is a set of 2D class averages or projection averages.
The protocol first converts the input images into Xmipp metadata format. It then low-pass filters and resizes the input averages to a working size suitable for initial-model search. This makes the procedure faster and focuses the search on low- to medium-resolution structural information.
The protocol then runs many independent RANSAC iterations. In each iteration, it selects a small subset of input averages, assigns or estimates orientations, reconstructs a tentative volume, projects that volume, and compares the projections with the input averages.
After all iterations, the protocol evaluates the candidate volumes using an inlier criterion based on projection correlation. The best volumes are selected and refined through several projection-matching iterations. Finally, the refined volumes are resized back to the original box size and written as output volumes.
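The hypothesize-and-score loop described above can be illustrated with a toy analogue. The sketch below is not the protocol's code: fitting a line through two points stands in for reconstructing a volume from a small subset of averages, and a residual tolerance stands in for the correlation-based inlier test. The names `fit_line`, `ransac_line`, and `tolerance` are hypothetical.

```python
import random

def fit_line(p, q):
    """Return slope and intercept of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, n_iterations, tolerance, seed=0):
    """Keep the hypothesis supported by the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iterations):
        p, q = rng.sample(points, 2)           # small random subset
        if p[0] == q[0]:
            continue                           # vertical line, skip this draw
        slope, intercept = fit_line(p, q)      # tentative model
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):   # consensus scoring
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers

# 20 points on y = 2x + 1 plus 5 gross outliers (toy data).
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -12.0), (11.0, 90.0), (15.0, 0.0), (18.0, -30.0)]
model, inliers = ransac_line(points, n_iterations=200, tolerance=0.5)
```

In the protocol the roles are analogous: a small subset proposes a candidate, the full input set votes on it, and the candidates with the strongest support are kept for refinement.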
## Input Averages
The Input averages parameter should point to a set of 2D classes or averages.
If the input is a SetOfClasses2D, the classes should have representative images. These representative images are the class averages used by the protocol. If the input is a SetOfAverages or similar particle-average set, the images are used directly.
The quality of the input averages is very important. RANSAC initial-model generation works best when the averages show clear structural features and represent a broad range of particle views.
If the input averages are noisy, dominated by contaminants, affected by strong preferred orientation, or contain many inconsistent particle populations, the candidate volumes may be poor or ambiguous.
## Symmetry Group
The Symmetry group parameter defines the symmetry assumed during projection generation, reconstruction, and volume evaluation.
For asymmetric particles, use c1. If the particle has known symmetry, the appropriate Xmipp symmetry group should be provided.
Correct symmetry can help generate better initial volumes by enforcing equivalent views and reducing ambiguity. However, using an incorrect symmetry can produce misleading models by averaging non-equivalent features.
Users should only impose symmetry when it is biologically justified.
## Angular Sampling Rate
The Angular sampling rate parameter defines how finely projection directions are explored, in degrees.
A smaller angular sampling value means a denser angular search. This can improve orientation assignment and candidate-volume evaluation, but increases computation time.
A larger value is faster but may miss important orientation differences and reduce the quality of the initial model.
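As a rough guide to the cost trade-off, the number of projection directions grows approximately with the inverse square of the angular step. The sketch below is a back-of-the-envelope estimate only: it divides the area of the unit sphere by the area covered by one angular step and assumes that a symmetry of a given order reduces the count proportionally. The function name is hypothetical, and the number of projections Xmipp actually generates depends on its own sampling scheme.

```python
import math

def approx_num_directions(sampling_deg, symmetry_order=1):
    """Estimate how many projection directions cover the sphere."""
    step = math.radians(sampling_deg)
    full_sphere = 4.0 * math.pi / step ** 2   # sphere area / area per direction
    return math.ceil(full_sphere / symmetry_order)

for step_deg in (10.0, 5.0, 2.5):
    print(step_deg, approx_num_directions(step_deg))
```

Halving the angular step roughly quadruples the number of directions to compare against, which is why finer sampling increases computation time so quickly.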
The default value is a practical compromise for many datasets. Advanced users may adjust it depending on particle size, symmetry, and expected angular complexity.
## Number of RANSAC Iterations
The Number of RANSAC iterations parameter controls how many candidate volumes are generated and tested.
Each iteration proposes a volume from a different random subset or sampling of the input averages. Increasing the number of iterations increases the chance of finding a good initial model, especially when the input set contains outliers or multiple particle populations.
However, more iterations require more computation. The default is designed to provide broad exploration while remaining practical for typical workflows.
If the output volumes are unstable or poor, increasing the number of RANSAC iterations may help.
## Dimensionality Reduction
The Perform dimensionality reduction option changes how representative input averages are selected during RANSAC.
When enabled, the protocol uses Local Tangent Space Alignment, or LTSA, to organize the input averages in a reduced-dimensional space. It then samples representative images from this space using a grid.
This can help select a small set of averages that captures the diversity of the dataset, rather than choosing purely random images.
Dimensionality reduction is an advanced option. It requires that the number of input classes be large enough relative to the grid size. If there are too few classes, the protocol reports a validation error.
## Number of Grids per Dimension
When dimensionality reduction is enabled, the Number of grids per dimension parameter controls how the reduced space is sampled.
For example, a value of 3 creates a 3 by 3 grid, giving up to 9 regions from which representative classes can be sampled.
A larger grid gives more detailed sampling of the reduced space, but requires more input classes. A smaller grid is more conservative.
This parameter should be chosen according to the number of available classes and the diversity of the input averages.
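The relationship between grid size and the number of classes can be checked before launching the protocol. The sketch below assumes a two-dimensional reduced space, as in the 3 by 3 example, and assumes that at least one class per grid cell is needed. The exact rule used by the protocol's validation is not stated here, so treat `enough_classes` as a hypothetical pre-check.

```python
def grid_cells(grids_per_dimension, n_dimensions=2):
    """Number of cells in a regular grid over the reduced space."""
    return grids_per_dimension ** n_dimensions

def enough_classes(n_classes, grids_per_dimension):
    """Assumed criterion: at least one input class per grid cell."""
    return n_classes >= grid_cells(grids_per_dimension)

print(grid_cells(3))            # cells in a 3 by 3 grid
print(enough_classes(8, 3))     # too few classes for 9 cells
print(enough_classes(20, 3))
```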
## Number of Random Samples
When dimensionality reduction is disabled, the Number of random samples parameter defines how many input averages are selected in each RANSAC iteration.
These randomly selected averages are assigned random orientations, or orientations constrained by an optional initial volume, and are used to reconstruct a candidate volume.
If too few samples are used, the candidate volumes may be unstable or insufficiently constrained. If too many are used, each iteration becomes more expensive and less exploratory.
The default is intended to generate many quick candidate volumes from small subsets.
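A single random draw can be sketched as follows. This is an illustration under stated assumptions, not the protocol's implementation: it picks distinct averages by index and gives each one random Euler angles (rot, tilt, psi, in degrees), drawing the tilt so that projection directions are uniform on the sphere. The function name and the choice of distribution are hypothetical.

```python
import math
import random

def random_subset_with_orientations(n_averages, n_samples, seed=None):
    """Pick n_samples distinct averages and give each random Euler angles."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_averages), n_samples)
    assignments = []
    for index in chosen:
        rot = rng.uniform(0.0, 360.0)
        # acos of a uniform value in [-1, 1] gives directions uniform on the sphere
        tilt = math.degrees(math.acos(rng.uniform(-1.0, 1.0)))
        psi = rng.uniform(0.0, 360.0)
        assignments.append((index, rot, tilt, psi))
    return assignments

picks = random_subset_with_orientations(n_averages=50, n_samples=8, seed=1)
```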
## Initial Volume
The Initial volume parameter allows the user to provide a very rough volume to constrain the angular search.
This is optional. The protocol can run without an initial volume.
A rough initial volume may be useful when the specimen has a known overall shape, such as a cylinder for a filament-like object. In that case, the initial volume can help assign more plausible tilt angles, while still allowing the rotational angle to be uncertain.
The initial volume should be used carefully. If it is too specific or incorrect, it may bias the search toward a wrong structure. It is best used as a broad geometrical constraint rather than as a detailed reference.
## Maximum Frequency of the Initial Volume
The Max frequency of the initial volume parameter controls the resolution used during the initial-model search, expressed in angstroms.
The protocol low-pass filters and resizes the input averages according to this parameter. The goal is to focus the RANSAC search on robust low-resolution features rather than noisy high-resolution details.
A lower-resolution search is usually appropriate for initial model generation. The purpose is to obtain the correct global shape, not to recover fine features.
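The link between this parameter, the pixel size, and the working image size follows from the Nyquist limit. The sketch below shows that arithmetic only. The function names are hypothetical, and the protocol may apply its own margins or minimum box size, so the working size it actually uses can differ.

```python
import math

def digital_cutoff(sampling_rate, max_freq):
    """Cut-off in cycles per pixel for a resolution limit given in angstroms."""
    return sampling_rate / max_freq

def max_downsampling_factor(sampling_rate, max_freq):
    """Largest pixel-size increase that keeps max_freq within the Nyquist limit."""
    return max(1.0, max_freq / (2.0 * sampling_rate))

def working_box_size(box_size, sampling_rate, max_freq):
    """Smallest box (in pixels) that can still represent max_freq."""
    factor = max_downsampling_factor(sampling_rate, max_freq)
    return math.ceil(box_size / factor)

# Hypothetical dataset: 256-pixel box, 1.5 angstrom/pixel, 12 angstrom limit.
print(digital_cutoff(1.5, 12.0))
print(working_box_size(256, 1.5, 12.0))
```

With these example values, the averages could be shrunk by a factor of four without losing information below the chosen resolution limit, which is where most of the speed-up comes from.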
## Inliers Threshold
The Inliers threshold defines the correlation value used to decide whether an input average supports a candidate volume.
After a candidate volume is generated, the protocol projects it and compares those projections with the input averages. Averages with correlation above the threshold are considered inliers.
Candidate volumes with more and better inliers receive higher scores.
If the threshold is too high, few or no candidate volumes may be considered valid. If this happens, the user should lower the threshold.
If the threshold is too low, poor candidate volumes may appear to have many inliers, reducing the selectivity of the method.
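The inlier test can be sketched with plain Python lists standing in for images. The example assumes that each average is compared with every projection of the candidate volume using a normalized (Pearson) correlation and that the best match decides whether it is an inlier. The function names, the toy data, and the threshold of 0.7 are illustrative assumptions.

```python
import math

def correlation(a, b):
    """Pearson correlation between two flattened images."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0                      # constant image carries no signal
    return cov / math.sqrt(var_a * var_b)

def find_inliers(averages, projections, threshold):
    """Return (index, best correlation) for averages above the threshold."""
    inliers = []
    for index, average in enumerate(averages):
        best = max(correlation(average, p) for p in projections)
        if best > threshold:
            inliers.append((index, best))
    return inliers

projections = [[0, 1, 2, 3], [3, 1, 0, 2]]            # toy candidate projections
averages = [[1, 3, 5, 7], [6, 2, 0, 4], [1, -1, 1, -1]]
inliers = find_inliers(averages, projections, threshold=0.7)
```

In this toy case the first two averages are scaled copies of a projection and pass the test, while the third matches neither projection and is rejected.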
## Number of Best Volumes to Refine
The Number of best volumes to refine parameter controls how many of the best RANSAC candidates are selected for refinement.
After all RANSAC iterations are evaluated, the protocol ranks the candidates by their inlier support. The best candidates are copied into refinement branches.
Keeping several candidates is useful because initial-model generation can be ambiguous. Different candidates may correspond to alternative orientations, different conformations, or different local optima.
The output contains the refined versions of these selected candidates.
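Selecting the candidates to refine amounts to a sort followed by a cut. The sketch below assumes a ranking by number of inliers first and summed inlier correlation second, which is one reading of "more and better inliers"; the protocol's exact ordering may differ. The candidate names and scores are made up.

```python
def select_best(candidates, n_best):
    """Rank candidates by inlier count, then by summed inlier correlation."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: (len(item[1]), sum(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:n_best]]

# Hypothetical candidates mapped to the correlations of their inliers.
candidates = {
    "vol_a": [0.90, 0.80],
    "vol_b": [0.85, 0.80, 0.75],
    "vol_c": [0.95],
    "vol_d": [0.90, 0.85, 0.80],
}
best = select_best(candidates, n_best=2)
```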
## Number of Iterations to Refine the Volumes
The Number of iterations to refine the volumes parameter defines how many projection-matching refinement cycles are applied to each selected candidate.
In each cycle, the current volume is projected, input averages are assigned to projections, and a new volume is reconstructed from the updated assignments.
More iterations can improve the consistency of the candidate volume, but also increase computation time. Too many iterations are not always beneficial if the initial candidate is poor or if the input averages are inconsistent.
## Use All Images to Refine
The Use all images to refine option controls which averages are used during the refinement of selected RANSAC volumes.
If disabled, only inlier images are used. This keeps refinement focused on the images that strongly support the candidate volume.
If enabled, all input averages are used for refinement. This may be useful when the input set is clean and the user wants each candidate to be refined using all available information.
For heterogeneous or noisy input sets, using only inliers may be safer.
## GPU Execution
The protocol can use GPU acceleration for reconstruction steps when available.
GPU execution is enabled through the hidden GPU parameters. If GPU use is requested but the required Xmipp CUDA programs are not available, the protocol reports a validation error.
GPU acceleration can substantially reduce computation time, especially when many RANSAC iterations and refinement steps are requested.
## Output Volumes
The main output is outputVolumes, a set of refined candidate initial volumes.
Each output volume is written in MRC format, resized to the original input box size, and assigned the sampling rate of the input averages.
The output volumes are annotated with Xmipp scoring values, including:
- total volume score;
- thresholded volume score;
- mean correlation score;
- minimum correlation score.
These scores help the user compare candidate volumes, but they should not be used blindly. Visual inspection and downstream refinement behavior are also important.
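One plausible reading of the four scores is shown below. It assumes that each input average contributes its best correlation with the volume's projections, that the total score is the sum of those values, and that the thresholded score sums only the values above the inliers threshold. These definitions are assumptions made for illustration and are not taken from the Xmipp source.

```python
def score_volume(correlations, threshold):
    """Summarize per-average correlations into four comparison scores."""
    above = [c for c in correlations if c > threshold]
    return {
        "total": sum(correlations),
        "thresholded": sum(above),
        "mean": sum(correlations) / len(correlations),
        "minimum": min(correlations),
    }

# Hypothetical correlations of four averages with one candidate volume.
scores = score_volume([0.9, 0.8, 0.5, 0.6], threshold=0.7)
```

Under this reading, a large gap between the total and thresholded scores, or a very low minimum, points to averages that the candidate explains poorly.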
## Interpreting the Candidate Volumes
The RANSAC output volumes should be interpreted as possible initial models.
They are not final reconstructions. Their role is to provide plausible starting points for later 3D refinement.
A good candidate should have a reasonable global shape, be compatible with the 2D class averages, and produce stable behavior in subsequent refinement. A poor candidate may show distorted shape, missing regions, strong artifacts, or features inconsistent with the class averages.
When several output volumes are produced, users should inspect them and choose the most plausible candidates for downstream refinement.
## Practical Recommendations
Use clean and representative 2D class averages as input. Remove obvious junk classes before running RANSAC.
Use the correct symmetry group only when symmetry is known. Do not impose symmetry merely to make the output look cleaner.
Start with the default number of RANSAC iterations and best volumes. Increase the number of iterations if the results are unstable or if the dataset is difficult.
If no valid maps are found, lower the inliers threshold.
Use dimensionality reduction only when there are enough input classes and when you want representative sampling of class variability.
Consider providing a very rough initial volume only when there is a strong reason to constrain the angular search. Avoid using a detailed reference that could bias the initial model.
Inspect all output volumes visually before selecting one for refinement.
## Final Perspective
RANSAC is an initial-volume generation protocol based on repeated random hypotheses and consensus scoring.
For biological users, its value is that it can propose several possible 3D starting models from 2D averages without requiring tilted-pair data. By testing many candidate volumes and retaining those best supported by the input averages, the protocol reduces dependence on any single random initialization.
The resulting volumes should be treated as starting hypotheses. Their quality must be assessed by visual inspection, agreement with the 2D averages, scoring information, and performance in subsequent 3D refinement.