12. Structure validation and comparison

At the end of the refinement of the metHgb \alpha subunit (a similar process is required for the \beta subunit), we need to assess the geometry of our model against the starting volume to detect questionable elements of the model or model parameters that disagree with the map. Although each refinement program has its own tools to assess the progress of refinement (Coot Validate menu; PHENIX real space refine real space correlations; Refmac R factor and Rms BondLength), this tutorial section describes three assessment tools that provide comparable validation values after any protocol in the workflow: the protocols EMRinger (phenix-emringer, Appendix PHENIX EMRinger [Barad et al., 2015]), MolProbity (phenix-molprobity, Appendix PHENIX MolProbity [Davis et al., 2004]), and Validation CryoEM (phenix-validation cryoem, Appendix PHENIX Validation CryoEM [Afonine et al., 2018]). The Validation CryoEM protocol reports MolProbity validation values as well as real-space correlation coefficients. Old versions of PHENIX (v. 1.13) do not include this tool; in that case, real-space correlation values are computed by the MolProbity protocol if a map is provided. Additionally, we introduce the protocol phenix-superpose pdbs (Appendix PHENIX Superpose PDBs [Zwart et al., 2017]), which is useful to compare visually the geometry of two atomic structures.
Observe the first validation steps in the modeling Scipion workflow in Fig. 12.1 starting from output models generated by PHENIX real space refine and Refmac.

Fig. 12.1 Scipion framework detailing the workflow to validate the model of the human Hgb \alpha subunit.

NOTE: Structure validation is a model building step that you have to perform iteratively during the refinement process to assess whether your structure is improving. Once you finish the refinement process you’ll obtain the final assessment values. These values should fall within a certain range if you want to submit the atomic structure to databases. The final validation scores should be computed against the density map that you submit as the main map, even if you used sharpened maps for refinement/validation during the iterative process.

12.1. EMRinger

Specifically designed for cryo-EM data, the EMRinger tool assesses how well a model fits a map by validating high-resolution features such as side chain arrangements. The placement of a side chain relative to the molecule backbone depends on the \chi_{1} dihedral angle (a dihedral angle is the angle between two intersecting planes), here the angle between the planes defined by the atomic positions of (N, C\alpha, C\beta) and (C\alpha, C\beta, C\gamma) (see Fig. 12.2). The side chain dihedral angles tend to cluster near 180^\circ and \pm60^\circ. The smaller the deviations from these values, the better the model and the higher the EMRinger score.
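
The geometric definition of \chi_{1} given above can be illustrated with a minimal Python sketch. It is not part of EMRinger or of any Scipion protocol: the function names are hypothetical, the sign convention is assumed to be the usual IUPAC one (clockwise rotation positive when looking from C\alpha towards C\beta), and the coordinates passed to it are expected to be the (x, y, z) positions of N, C\alpha, C\beta and C\gamma of a single residue.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], defined by four points.

    For chi1 the points are N, CA, CB and CG: the angle between the plane
    (N, CA, CB) and the plane (CA, CB, CG), measured around the CA-CB bond.
    """
    b0 = _sub(p1, p0)           # N  -> CA
    b1 = _sub(p2, p1)           # CA -> CB (rotation axis)
    b2 = _sub(p3, p2)           # CB -> CG
    b1_len = math.sqrt(_dot(b1, b1))
    if b1_len == 0.0:
        raise ValueError("the two central atoms coincide")
    n1 = _cross(b0, b1)         # normal of plane (p0, p1, p2)
    n2 = _cross(b1, b2)         # normal of plane (p1, p2, p3)
    axis = tuple(c / b1_len for c in b1)
    x = _dot(n1, n2)
    y = _dot(axis, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))

def deviation_from_staggered(chi_deg):
    """Smallest angular distance (degrees) to the values -60, 60 or 180."""
    return min(abs((chi_deg - ref + 180.0) % 360.0 - 180.0)
               for ref in (-60.0, 60.0, 180.0))
```

For instance, `deviation_from_staggered(dihedral(n, ca, cb, cg))` returns how far, in degrees, the \chi_{1} angle of a residue lies from the nearest of the three preferred values. EMRinger itself works on the map density sampled around \chi_{1}, so this sketch only covers the geometry, not the score.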

Fig. 12.2 Naming convention in side chains explained in a lysine-tyrosine strand. Note that these two residues are within a protein and thus have no terminal region.

We can start by assessing with EMRinger the metHgb \alpha subunit models that we have generated along the modeling workflow. In each case, open the protocol phenix-emringer (Fig. 12.3 (1)), load the extracted map asymmetric unit (initial or saved with Coot) (2) and the atomic structure that you’d like to validate against the map (3), execute the program (4) and analyze the results (5). A menu to check the results in detail will open (bar EMRinger results). It shows the Phenix EMRinger plots for the density thresholds and for the rolling window of each chain, as well as the dihedral angles of each residue. The most relevant results, especially the EMRinger score, are also written in the protocol SUMMARY (6).

Fig. 12.3 Completing EMRinger protocol form.

Run the EMRinger protocol and determine the respective score after running ChimeraX rigid fit, Coot refinement, PHENIX real space refine (form parameters indicated in Fig. 11.9) after Coot, and Refmac refinement with MASK before and after PHENIX real space refine. Considering the EMRinger score, do our metHgb \alpha subunit models seem to be OK or, at least, did they improve? (Answers in appendix Solutions; Question8). Try the same validation with the \beta subunit models.

12.2. MolProbity

The atomic structure validation web service MolProbity, with improved reference data, has been implemented in the open-source CCTBX portion of PHENIX [Williams et al., 2018]. This widely used tool assesses model geometry and quality at both global and local levels. Originally designed to evaluate structures derived from X-ray diffraction and NMR, it does not take into account the quality of the fit to a 3D density map. The implementation of MolProbity in PHENIX, nevertheless, makes it possible to add a volume and assess the real-space correlation.
The assessment process that we have carried out with EMRinger can also be done with MolProbity in Scipion. We are going to validate the geometry of the metHgb \alpha subunit models that we have generated along the modeling workflow. In each case, open the phenix-molprobity protocol (Fig. 12.4 (1)). Load the extracted unit cell volume (initial or generated by Coot) (2) with its resolution (3) only if your PHENIX version is 1.13 and you want the real-space correlation between map and model. For PHENIX versions later than 1.13, simply load the model atomic structure (4) and execute the protocol (5). Analyze results (6) shows the results menu bars. The MolProbity results bar includes the validation statistics, and the protocol SUMMARY highlights the most relevant ones (7).

Fig. 12.4 Completing MolProbity protocol form.

Run MolProbity protocol to obtain its statistics after running ChimeraX rigid fit, Coot refinement, PHENIX real space refine (form parameters indicated in Fig. 11.9) after Coot, and Refmac refinement with MASK before and after PHENIX real space refine.

12.3. Validation CryoEM

PHENIX versions later than 1.13 combine multiple tools for validating cryo-EM maps and models into a single tool called Validation CryoEM [Afonine et al., 2018].
To carry out the global validation of maps and models obtained from cryo-EM data, open the protocol phenix-validation cryoem in Scipion (Fig. 12.5 (1)), load the map (initial or generated by Coot) (2) with its resolution (3), load the model atomic structure (4) and execute the protocol (5). Analyze results (6) shows the same menu bars available in the results section of the PHENIX real space refine protocol. The MolProbity results bar includes the validation statistics, and the protocol SUMMARY (7) highlights the most relevant ones.

Fig. 12.5 Filling in PHENIX Validation CryoEM protocol form.

In order to compare the validation results of the models obtained along the modeling workflow, fill in the next table (Table 2), including, in addition to the MolProbity statistics, the EMRinger scores and CC(mask) values obtained before. (Answers in appendix Solutions; Question9). The same table (Table 2) can be completed for the metHgb \beta subunit (Appendix Solutions; Question10).
Table 2: Validation statistics of human metHgb \alpha subunit model. RSRAC stands for Real Space Refine after Coot. Rama stands for Ramachandran.
The results compiled in this table indicate that the statistics are uncorrelated: they do not single out the same model as the best one. From the point of view of correlation in real space, the best model was obtained from PHENIX real space refine after Coot. Considering the EMRinger score, the best model derives from the whole workflow Coot -> PHENIX real space refine. With the MolProbity Overall score as validation rule, the last step in the workflow could be suppressed because the best value was obtained after Coot -> PHENIX real space refine (last modification of parameters). We’d like to select the best model and continue refining it in order to improve it as much as possible. Assuming that no model is perfect, how can we select the best one?

12.4. Model Comparison

The question posed in the previous section does not have an easy answer in the real world, in which we do not know the final atomic structure. In this tutorial, nevertheless, we know the atomic structure already published for this cryo-EM map and we may wonder how far we are from it. The question can be answered by comparing a) the validation statistics that we have obtained for our models with the statistics computed for the \alpha subunit available in PDB structure 5NI1, and b) the atomic structures themselves, by superposing them.

12.4.1. Comparison of validation statistics

As a first step, the validation statistics of the metHgb \alpha subunit of PDB structure 5NI1 should be obtained so that they can be compared with the validation statistics of our models. With this aim we are going to follow the workflow highlighted in Fig. 12.6:

Fig. 12.6 Scipion framework detailing the last part of the validation workflow.

  • Protocol import atomic structure:
    Download structure 5NI1 from PDB.
  • Protocol chimerax-operate (Appendix CHIMERAX operate):
    Similar to ChimeraX rigid fit, the ChimeraX operate protocol allows operations to be performed on atomic structures. We are going to use this protocol to save the metHgb \alpha subunit independently in Scipion. Open the protocol (Fig. 12.7 (1)), fill in the parameter PDBx/mmCIF with the previously imported atomic structure 5NI1 (2), and execute the protocol (3).

    Fig. 12.7 Filling in chimerax-operate protocol form.

    The ChimeraX graphics window will open with the structure 5NI1 as model number #2. To save the structure of the human metHgb \alpha subunit (chain A) independently, type in the ChimeraX command line:
    select #2/A
    save /tmp/5ni1_chainA.cif format mmcif models #2 selectedOnly true
    open /tmp/5ni1_chainA.cif
    scipionwrite #3 prefix 5ni1_chainA_
    
    Note that the model saved from the ChimeraX command line includes both the amino acid chain and the HEME group. If you are interested in extracting only the amino acid chain, you can use the protocol atomstructutils-operator, specifically designed to extract/add individual chains from/to an atomic structure (Appendix Atomic Structure Chain Operator). Compare the results of the protocols ChimeraX operate and Atomic Structure Chain Operator in Fig. 12.8. The red arrow points at the HEME group.

    Fig. 12.8 Comparison of results obtained with the protocols chimerax-operate (left) and atomstructutils-operator (right).

  • Protocol phenix-dock in map:

    Open the PHENIX dock in map protocol and follow the instructions given previously. This time the structure saved with ChimeraX operate replaces our previous model. The results are shown in Fig. 12.9.

    Fig. 12.9 Results view of phenix-dock in map protocol.

  • Protocol chimerax-rigid fit: Open the ChimeraX rigid fit protocol again and, following the instructions given previously, include this time the atomic structure placed_model.cif generated in the previous step. To fit the metHgb \alpha subunit from the 5NI1 structure into the extracted asymmetric unit and save the fitted structure, type in the ChimeraX command line:
    fitmap #3 inMap #2
    scipionwrite #3 prefix 5ni1_chainA_fitted_
    
  • Validation protocols phenix-emringer and phenix-validation_cryoem:

    Compute validation statistics with these two protocols for metHgb \alpha subunit from PDB structure 5NI1, write respective values in the previous table (Table 2), and compare them with the statistics of our models.

    Considering the results for the metHgb \alpha subunit (shown in appendix Solutions; Question9), we can conclude that published structures are not perfect and that we are not very far from this published one. In fact, our models improve on every one of its statistics except CC(mask). Nevertheless, the different models generated after Coot refinement can still be improved by iterative refinement processes. Validation statistics thus allow us to follow the quality improvement of atomic models.

12.4.2. Comparison of atomic structures

    The PHENIX protocol phenix-superpose pdbs compares two atomic structures by superposing them. The root mean square deviation (RMSD) between the fixed structure (the published one) and each of our models allows the models to be ranked according to their similarity to the published model. Open the PHENIX superpose pdbs protocol form (Fig. 12.10 (1)), include the published structure of the metHgb \alpha subunit as fixed structure (2) and, in turn, each one of the models generated along the workflow (3), execute the protocol (4) and check the results by pressing Analyze results (5). The arrows in Fig. 12.11 mark the parts that differ between the atomic structure of the metHgb \alpha subunit from PDB structure 5NI1 (green) and our model generated by automatic refinement with the PHENIX real space refine protocol (pink). By opening these structures in a viewer you can inspect the differences between them. Finally, complete Table 2 with the value of RMSD (final) (6) obtained for each model. (Answers in appendix Solutions; Question9).

    Fig. 12.10 Completing phenix-superpose pdbs protocol form.

    Fig. 12.11 Model generated for metHgb \alpha subunit superposed to the published \alpha chain of 5NI1 structure.
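
As a reminder of what the RMSD value measures, the following minimal Python sketch computes it for two lists of matched atom coordinates. It is an illustration only, not the PHENIX implementation: the function name is hypothetical, the two lists are assumed to contain the same atoms in the same order, and the structures are assumed to be already superposed (the superposition itself, which the protocol performs, is not reproduced here).

```python
import math

def rmsd(coords_fixed, coords_moving):
    """Root mean square deviation between two matched coordinate lists.

    Each list holds (x, y, z) tuples; atom i of one list must correspond
    to atom i of the other. No fitting is done: the coordinates are
    compared exactly as given.
    """
    if len(coords_fixed) != len(coords_moving) or len(coords_fixed) == 0:
        raise ValueError("both lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_fixed, coords_moving):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_fixed))
```

A value of 0 means that the matched atoms coincide; the larger the value, the further the model lies, on average, from the fixed structure.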

At the end of the validation process, one model of the metHgb \alpha subunit has to be selected. According to the statistics of Table 6 (Appendix Solutions; Question9), select the model obtained in the modeling workflow that shows the smallest RMSD value, a high EMRinger score, a fairly high CC(mask) value and acceptable MolProbity statistics. Follow a similar process to validate and select the model generated for the metHgb \beta subunit. Appendix Solutions; Question10 contains a statistics table for the metHgb \beta subunit, similar to that obtained for the metHgb \alpha subunit.
In the real world the selected models are usually the starting point for additional refinement aimed at improving specific validation parameters. Since improving certain parameters normally implies worsening others, a final compromise solution has to be reached.
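
One possible way to make such a compromise explicit is to rank the models on each statistic and add up the ranks. The Python sketch below is only an illustration of that idea and is not a procedure prescribed by this tutorial or by PHENIX: the function name, the choice of four statistics, the equal weighting, and above all the numbers in `placeholder_models` are assumptions made for the example (the values are invented placeholders, not results from Table 2).

```python
# True means that a higher value is better, False that a lower value is better.
METRICS = {
    "emringer": True,     # EMRinger score
    "cc_mask": True,      # CC(mask)
    "molprobity": False,  # MolProbity Overall score
    "rmsd": False,        # RMSD to the reference structure
}

def rank_models(models):
    """Return (model name, rank sum) pairs, best compromise first.

    `models` maps a model name to a dict with one value per metric.
    For every metric the models are ranked from best (1) to worst and
    the ranks are summed. Ties are not treated specially: tied models
    simply receive consecutive ranks in input order.
    """
    totals = {name: 0 for name in models}
    for metric, higher_is_better in METRICS.items():
        ordered = sorted(models, key=lambda name: models[name][metric],
                         reverse=higher_is_better)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return sorted(totals.items(), key=lambda item: item[1])

# Invented placeholder values, for illustration only.
placeholder_models = {
    "model_A": {"emringer": 3.0, "cc_mask": 0.80, "molprobity": 1.5, "rmsd": 0.6},
    "model_B": {"emringer": 4.0, "cc_mask": 0.78, "molprobity": 1.2, "rmsd": 0.5},
    "model_C": {"emringer": 2.0, "cc_mask": 0.75, "molprobity": 2.0, "rmsd": 0.9},
}

ranking = rank_models(placeholder_models)
```

With these placeholder numbers the first entry of `ranking` is `model_B`, which is best on three of the four statistics. Replacing the placeholder dictionary with the values you entered in your own table gives a quick, if crude, overview; giving equal weight to every statistic is itself a choice that you may want to revisit.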