.. _`seq:structurevalidation`:

Structure validation and comparison
===================================

| At the end of the refinement of the *metHgb* :math:`\alpha` subunit (a similar process would be required for the :math:`\beta` subunit), we need to assess the geometry of our *model* against the starting map. This detects controversial elements of the *model* and *model* parameters that disagree with the map. Each refinement program has its own tools to assess the progress of refinement (the *Coot Validate* menu, the real-space correlations of *PHENIX real space refine*, or the *R factor* and *Rms BondLength* of *Refmac*). In this section, however, three assessment tools will be described to obtain comparable validation values after any protocol of the workflow: the protocols *EMRinger* (**phenix-emringer**, Appendix :ref:`PHENIX EMRinger ` :cite:p:`barad2015`), *MolProbity* (**phenix-molprobity**, Appendix :ref:`PHENIX MolProbity ` :cite:p:`davis2004`), and *Validation CryoEM* (**phenix-validation cryoem**, Appendix :ref:`PHENIX Validation CryoEM ` :cite:p:`afonine2018b`). The *Validation CryoEM* protocol reports the *MolProbity* validation values together with real-space correlation coefficients. Older versions of *PHENIX* (v. 1.13) do not include this tool; in that case, real-space correlation values are computed by the *MolProbity* protocol, provided that a map is given. Additionally, we are going to introduce the protocol **phenix-superpose pdbs** (Appendix :ref:`PHENIX Superpose PDBs ` :cite:p:`zwartUrl`), which is useful for visually comparing the geometry of two atomic structures.

| Observe the first validation steps of the *Scipion* modeling workflow in :numref:`model_building_scipion_workflow_validation`, starting from the output *models* generated by *PHENIX real space refine* and *Refmac*.

.. figure:: Images/Fig69.svg
   :alt: *Scipion* framework detailing the workflow to validate the model of the human *Hgb* :math:`\alpha` subunit.
   :name: model_building_scipion_workflow_validation
   :align: center
   :width: 100.0%

   *Scipion* framework detailing the workflow to validate the model of the human *Hgb* :math:`\alpha` subunit.

.. note::

   Structure validation is a model building step that you have to perform repeatedly during the refinement process to assess whether your structure is improving or not. Once you finish the refinement process, you will obtain the final assessment values. These values should fall within certain ranges if you want to submit the atomic structure to databases. The final validation scores should be computed against the density map that you submit as main map, even if you used sharpened maps for refinement and validation during the iterative process.

.. _`requestion4`:

*EMRinger*
----------

Specifically designed for cryo-EM data, the *EMRinger* tool assesses how well a model fits a map by validating high-resolution features such as side chain arrangements. The placement of side chains relative to the backbone depends on the :math:`\chi_{1}` dihedral angle (a dihedral angle is the angle between two intersecting planes), which is determined by the atomic positions of (*N*, *C*\ :math:`\alpha`, *C*\ :math:`\beta`) and (*C*\ :math:`\alpha`, *C*\ :math:`\beta`, *C*\ :math:`\gamma`) (see :numref:`model_building_emringer_chi1`). Side chain dihedral angles tend to cluster near :math:`180^\circ` and :math:`\pm60^\circ`. *EMRinger* samples the map density around each side chain as :math:`\chi_{1}` is rotated and checks whether the density peaks fall at these rotameric values: the smaller the deviations, the better the *model* fits the map and the higher the *EMRinger* score.

.. figure:: Images/sidechains.svg
   :alt: Naming convention in side chains explained in a lysine-tyrosine fragment. Note that these two residues are within a protein and thus have no terminal region.
   :name: model_building_emringer_chi1
   :align: center
   :width: 65.0%

   Naming convention in side chains explained in a lysine-tyrosine fragment. Note that these two residues are within a protein and thus have no terminal region.
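To make the :math:`\chi_{1}` geometry concrete, the Python sketch below computes a dihedral angle from four atomic positions (*N*, *C*\ :math:`\alpha`, *C*\ :math:`\beta`, *C*\ :math:`\gamma`). It also computes the distance from that angle to the nearest of the rotameric values :math:`60^\circ`, :math:`180^\circ` and :math:`-60^\circ`. It assumes that NumPy is available, and the function names ``dihedral`` and ``rotamer_deviation`` are hypothetical. The sketch only illustrates the geometry of a *model*; it is not the *EMRinger* algorithm, which samples the map density around each side chain.

.. code-block:: python

   import numpy as np


   def dihedral(p0, p1, p2, p3):
       """Dihedral angle (degrees, in [-180, 180]) defined by four points.

       For chi1, pass the coordinates of N, CA, CB and CG, in that order.
       """
       p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
       b0 = p0 - p1  # CA -> N (first bond, pointing away from the axis)
       b1 = p2 - p1  # CA -> CB (central bond = rotation axis)
       b2 = p3 - p2  # CB -> CG
       b1 = b1 / np.linalg.norm(b1)
       # Project the two outer bonds onto the plane perpendicular to the axis.
       v = b0 - np.dot(b0, b1) * b1
       w = b2 - np.dot(b2, b1) * b1
       x = np.dot(v, w)
       y = np.dot(np.cross(b1, v), w)
       return float(np.degrees(np.arctan2(y, x)))


   def rotamer_deviation(chi, centers=(60.0, 180.0, -60.0)):
       """Smallest angular distance (degrees) from chi to a rotameric value."""
       # Wrap each difference into [-180, 180) before taking its magnitude.
       return min(abs((chi - c + 180.0) % 360.0 - 180.0) for c in centers)

For a real *model*, the four coordinates of each residue would be read from the atomic structure file with any structure parser. A ``rotamer_deviation`` close to zero means that the modeled :math:`\chi_{1}` sits on a rotameric value.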
We can start by using *EMRinger* to assess the *metHgb* :math:`\alpha` subunit *models* that we have generated along the modeling workflow. In each case, open the protocol **phenix-emringer** (:numref:`model_building_emringer_protocol` (1)), load the extracted asymmetric unit map (initial or saved with *Coot*) (2) and the atomic structure that you would like to validate against the map (3), execute the program (4) and analyze the results (5). A menu to check the results in detail will open (bar *EMRinger results*). It shows the *Phenix EMRinger* plots with density thresholds, the rolling-window plots for each chain, and the dihedral angles for each residue. The most relevant results, especially the *EMRinger* score, will also be written in the protocol *SUMMARY* (6).

.. figure:: Images/Fig34.svg
   :alt: Completing *EMRinger* protocol form.
   :name: model_building_emringer_protocol
   :align: center
   :width: 85.0%

   Completing *EMRinger* protocol form.

| Run the *EMRinger* protocol and determine the respective score after running *ChimeraX rigid fit*, *Coot* refinement, *PHENIX real space refine* after *Coot* (form parameters indicated in :numref:`model_building_phenix_real_space_refine_protocol`), and *Refmac* refinement with MASK before and after *PHENIX real space refine*. Considering the *EMRinger score*, do our *metHgb* :math:`\alpha` subunit *models* seem to be OK or, at least, did they improve? (Answers in appendix :ref:`Solutions `; :ref:`Question8 `). Try the same validation with the :math:`\beta` subunit *models*.

*MolProbity*
------------

| The atomic structure validation web service *MolProbity*, with improved reference data, has been implemented in the open-source CCTBX portion of PHENIX :cite:p:`williams2018`. This widely used tool assesses *model* geometry and quality at both global and local levels. It was originally designed to evaluate structures determined by X-ray diffraction and NMR, so it does not take into account the quality of the fit to a 3D density map. The *PHENIX* implementation of *MolProbity*, nevertheless, allows adding a map and assessing the correlation in real space.

| The assessment carried out with *EMRinger* can also be done with *MolProbity* in *Scipion*. We are going to validate the geometry of the *metHgb* :math:`\alpha` subunit *models* that we have generated along the modeling workflow. In each case, open the **phenix-molprobity** protocol (:numref:`model_building_molprobity_protocol` (1)). If your *PHENIX* version is 1.13 and you want the real-space correlation between map and *model*, load the extracted unit cell volume (initial or generated by *Coot*) (2) with its resolution (3); with *PHENIX* versions higher than 1.13 this is not needed. Then load the *model* atomic structure (4) and execute the protocol (5). *Analyze results* (6) shows several menu bars; the *MolProbity* results bar includes the validation statistics. The protocol *SUMMARY* emphasizes the most relevant ones (7).

.. figure:: Images/Fig35.svg
   :alt: Completing *MolProbity* protocol form.
   :name: model_building_molprobity_protocol
   :align: center
   :width: 85.0%

   Completing *MolProbity* protocol form.

Run the *MolProbity* protocol to obtain its statistics after running *ChimeraX rigid fit*, *Coot refinement*, *PHENIX real space refine* after *Coot* (form parameters indicated in :numref:`model_building_phenix_real_space_refine_protocol`), and *Refmac* refinement with MASK before and after *PHENIX real space refine*.

.. _`requestion5`:

*Validation CryoEM*
-------------------

| *PHENIX* versions higher than 1.13 combine multiple tools for validating cryo-EM maps and models into a single tool called *Validation CryoEM* :cite:p:`afonine2018b`.
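The real-space correlation reported by these tools compares the experimental map with a map computed from the *model*. As a rough illustration only, the Python sketch below computes a Pearson correlation coefficient restricted to a mask. It assumes that both maps are NumPy arrays sampled on the same grid and that the boolean mask (for instance, the voxels close to the model atoms) has already been built. *PHENIX* computes the model map at the map resolution and builds its own mask, so this sketch will not reproduce its *CC(mask)* values exactly. The function name ``masked_correlation`` is hypothetical.

.. code-block:: python

   import numpy as np


   def masked_correlation(exp_map, model_map, mask):
       """Pearson correlation between two maps over the voxels where mask is True."""
       exp_map = np.asarray(exp_map, dtype=float)
       model_map = np.asarray(model_map, dtype=float)
       mask = np.asarray(mask, dtype=bool)
       if not (exp_map.shape == model_map.shape == mask.shape):
           raise ValueError("maps and mask must share the same grid shape")
       a = exp_map[mask]
       b = model_map[mask]
       if a.size < 2:
           raise ValueError("mask selects fewer than two voxels")
       # Subtract the mean density inside the mask from each map.
       a = a - a.mean()
       b = b - b.mean()
       denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
       if denom == 0.0:
           raise ValueError("constant density inside the mask; correlation undefined")
       return float(np.sum(a * b) / denom)

Because voxels outside the mask are ignored, empty solvent regions do not inflate the coefficient. This is the usual motivation for reporting a masked correlation alongside whole-map values.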
| To carry out the global validation of maps and models obtained from cryo-EM data, open the protocol **phenix-validation cryoem** in *Scipion* (:numref:`model_building_validationCryoEM_protocol` (1)), load the *map* (initial or generated by *Coot*) (2) with its resolution (3), load the *model* atomic structure (4) and execute the protocol (5). *Analyze results* (6) shows the same menu bars available in the results section of the *PHENIX real space refine* protocol; the *MolProbity* results bar includes the validation statistics. The protocol *SUMMARY* (7) emphasizes the most relevant ones.

.. figure:: Images/Fig60.svg
   :alt: Filling in *PHENIX Validation CryoEM* protocol form.
   :name: model_building_validationCryoEM_protocol
   :align: center
   :width: 90.0%

   Filling in *PHENIX Validation CryoEM* protocol form.

| To compare the validation results of the *models* obtained along the modeling workflow, fill in the following table (*Table 2*). In addition to the *MolProbity* statistics, include the *EMRinger* scores and *CC(mask)* values obtained before. (Answers in appendix :ref:`Solutions `; :ref:`Question9 `). The same table (*Table 2*) can be completed for the *metHgb* :math:`\beta` subunit (Appendix :ref:`Solutions `; :ref:`Question10 `).

.. figure:: Images/Table2.svg
   :alt: Table 2: Validation statistics of human *metHgb* :math:`\alpha` subunit *model*. *RSRAC* stands for *Real Space Refine* after *Coot*. *Rama* stands for *Ramachandran*.
   :name: model_building_Table2
   :align: center
   :width: 100.0%

| The results compiled in this table indicate that the different statistics are not correlated with each other. From the point of view of real-space correlation, the best *model* was obtained with *PHENIX real space refine* after *Coot*. Considering the *EMRinger score*, the best *model* derives from the whole workflow *Coot -> PHENIX real space refine*.
| With the *MolProbity Overall score* as the validation criterion, the last step in the workflow could be omitted, because the best value was obtained after *Coot -> PHENIX real space refine* (last modification of parameters). We would like to select the best *model* and continue refining it to improve it as much as possible. Assuming that no *model* is perfect, how can we select the best one?

*Model* Comparison
------------------------

The question posed in the previous paragraph does not have an easy answer in the real world, where the final atomic structure is unknown. In this tutorial, nevertheless, we know the atomic structure already published for this cryo-EM map, and we may wonder how far we are from it. We can answer this question by comparing a) the validation statistics obtained for our *models* with the statistics computed for the :math:`\alpha` subunit available in *PDB* structure *5NI1*, and b) the atomic structures themselves, by overlapping them.

Comparison of validation statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a first step, the validation statistics of the *metHgb* :math:`\alpha` subunit of *PDB* structure *5NI1* have to be obtained, so that they can be compared with the validation statistics of our *models*. With this aim, we are going to follow the workflow highlighted in :numref:`model_building_scipion_workflow_validation_2`:

.. figure:: Images/Fig72.svg
   :alt: *Scipion* framework detailing the last part of the validation workflow.
   :name: model_building_scipion_workflow_validation_2
   :align: center
   :width: 100.0%

   *Scipion* framework detailing the last part of the validation workflow.

- | Protocol **import atomic structure**:
  | Download the structure *5NI1* from the *PDB*.

- | Protocol **chimerax-operate** (Appendix :ref:`CHIMERAX operate `):
  | Similar to *ChimeraX rigid fit*, the *ChimeraX operate* protocol allows performing operations with atomic structures. We are going to use this protocol to save the *metHgb* :math:`\alpha` subunit independently in *Scipion*. Open the protocol (:numref:`model_building_chimera_operate_protocol` (1)), fill in the *PDBx/mmCIF* parameter with the previously imported atomic structure *5NI1* (2), and execute the protocol (3).

  .. figure:: Images/Fig36.svg
     :alt: Filling in **chimerax-operate** protocol form.
     :name: model_building_chimera_operate_protocol
     :align: center
     :width: 90.0%

     Filling in **chimerax-operate** protocol form.

  | The *ChimeraX* graphics window will open with the structure *5NI1* as model number *#2*. To save the structure of the human *metHgb* :math:`\alpha` subunit (chain A) independently, write in the *ChimeraX* command line:

  ::

     select #2/A
     save /tmp/5ni1_chainA.cif format mmcif models #2 selectedOnly true
     open /tmp/5ni1_chainA.cif
     scipionwrite #3 prefix 5ni1_chainA_

  | Note that the model saved from the *ChimeraX* command line includes both the amino acid chain and the *HEME* group. If you are interested in extracting only the amino acid chain, you can use the protocol **atomstructutils-operator**, specifically designed to extract/add individual chains from/to an atomic structure (Appendix :ref:`Atomic Structure Chain Operator `). Compare the results of the protocols *ChimeraX operate* and *Atomic Structure Chain Operator* in :numref:`model_building_chimera_operate_protocol_2`. The red arrow points at the *HEME* group.

  .. figure:: Images/Fig70.svg
     :alt: Comparison of results obtained with the protocols **chimerax-operate** (left) and **atomstructutils-operator** (right).
     :name: model_building_chimera_operate_protocol_2
     :align: center
     :width: 75.0%

     Comparison of results obtained with the protocols **chimerax-operate** (left) and **atomstructutils-operator** (right).

- Protocol **phenix-dock in map**:

  | Open the *PHENIX dock in map* protocol and follow the instructions given previously. This time, the structure saved with *ChimeraX operate* replaces our previous *model*. Results can be observed in :numref:`model_building_chimera_operate_protocol_3`.

  .. figure:: Images/Fig71.svg
     :alt: Results view of **phenix-dock in map** protocol.
     :name: model_building_chimera_operate_protocol_3
     :align: center
     :width: 70.0%

     Results view of **phenix-dock in map** protocol.

- | Protocol **chimerax-rigid fit**:
  | Open the *ChimeraX rigid fit* protocol again and, following the instructions already given, include this time the atomic structure *placed_model.cif* generated in the previous step. To fit the *metHgb* :math:`\alpha` subunit from the *5NI1* structure into the extracted asymmetric unit and save the fitted structure, write in the *ChimeraX* command line:

  ::

     fitmap #3 inMap #2
     scipionwrite #3 prefix 5ni1_chainA_fitted_

- Validation protocols **phenix-emringer** and **phenix-validation cryoem**:
  compute the validation statistics of the *metHgb* :math:`\alpha` subunit from *PDB* structure *5NI1* with these two protocols, write the respective values in the previous table (Table 2), and compare them with the statistics of our *models*.

| Considering the results shown in the appendix (:ref:`Solutions `; :ref:`Question9 `) for the *metHgb* :math:`\alpha` subunit, we can conclude that published structures are not perfect and that our *models* are not very far from the published one. In fact, we have surpassed the published structure in every statistic except *CC(mask)*. Nevertheless, the different *models* generated after *Coot* refinement can still be improved by iterative refinement. Validation statistics thus allow us to follow the quality improvement of atomic models.

.. rubric:: Comparison of atomic structures
   :name: comparison-of-atomic-structures
   :class: unnumbered

The *PHENIX* protocol **phenix-superpose pdbs** allows comparing two atomic structures by overlapping them. The root mean square deviation (RMSD) between the fixed structure (the published one) and each of our *models* allows ranking the *models* according to their similarity to the published model.
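To show what the RMSD measures, the Python sketch below superposes one set of coordinates onto another with the Kabsch algorithm (least-squares rotation obtained from a singular value decomposition). It returns the RMSD before and after the fit; the value after the fit is the quantity comparable to the *RMSD (final)* reported by the protocol. It assumes that both arrays already list the same atoms in the same order (for example, matched *C*\ :math:`\alpha` atoms). **phenix-superpose pdbs** handles atom selection and matching itself, so this is a sketch of the underlying idea, not of its implementation. The function name ``superpose_rmsd`` is hypothetical.

.. code-block:: python

   import numpy as np


   def superpose_rmsd(fixed, moving):
       """Least-squares superposition of `moving` onto `fixed` (Kabsch algorithm).

       Both inputs are (N, 3) arrays of corresponding atoms in the same order.
       Returns (rmsd_before, rmsd_after, moved_coordinates).
       """
       Q = np.asarray(fixed, dtype=float)
       P = np.asarray(moving, dtype=float)
       if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
           raise ValueError("expected two (N, 3) arrays of matched atoms")

       def rmsd(a, b):
           return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

       rmsd_before = rmsd(P, Q)
       # Center both coordinate sets on their centroids.
       p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
       Pc, Qc = P - p_mean, Q - q_mean
       # Optimal rotation from the SVD of the 3x3 covariance matrix.
       U, _, Vt = np.linalg.svd(Pc.T @ Qc)
       # Flip the last axis if needed so that R is a proper rotation (no mirror).
       d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
       R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
       moved = Pc @ R.T + q_mean
       return rmsd_before, rmsd(moved, Q), moved

A final RMSD close to zero means that the two structures differ only by a rigid-body motion. Larger values reflect genuine conformational differences such as those highlighted in the superposition figure.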
Open the *PHENIX superpose pdbs* protocol form (:numref:`model_building_superpose_pdbs_protocol` (1)), include the published structure of the *metHgb* :math:`\alpha` subunit as the fixed structure (2) and each one of the *models* generated along the workflow (3), execute the protocol (4) and check the results by pressing *Analyze results* (5). The arrows in :numref:`model_building_superpose_pdbs_chimera` highlight the parts that differ between the atomic structure of the *metHgb* :math:`\alpha` subunit from *PDB* structure *5NI1* (green) and our *model* generated by automatic refinement with the *PHENIX real space refine* protocol (pink). By opening these structures in you can see the differences between them. Finally, complete Table 2 with the *RMSD (final)* value (6) obtained for each *model*. (Answers in appendix :ref:`Solutions `; :ref:`Question9 `).

.. figure:: Images/Fig37.svg
   :alt: Completing **phenix-superpose pdbs** protocol form.
   :name: model_building_superpose_pdbs_protocol
   :align: center
   :width: 90.0%

   Completing **phenix-superpose pdbs** protocol form.

.. figure:: Images/Fig38.svg
   :alt: *Model* generated for *metHgb* :math:`\alpha` subunit superposed to the published :math:`\alpha` chain of *5NI1* structure.
   :name: model_building_superpose_pdbs_chimera
   :align: center
   :width: 50.0%

   *Model* generated for *metHgb* :math:`\alpha` subunit superposed to the published :math:`\alpha` chain of *5NI1* structure.

| A *model* of the *metHgb* :math:`\alpha` subunit has to be selected at the end of the validation process. According to the statistics of Table 6 (Appendix :ref:`Solutions `; :ref:`Question9 `), select the *model* of the modeling workflow that shows the smallest RMSD value, a high *EMRinger score*, a fairly high *CC(mask)* value and acceptable *MolProbity* statistics. Follow a similar process to validate and select the *model* generated for the *metHgb* :math:`\beta` subunit. Appendix :ref:`Solutions `; :ref:`Question10 ` contains a statistics table for the *metHgb* :math:`\beta` subunit, similar to the one obtained for the *metHgb* :math:`\alpha` subunit.

| In the real world, the selected *models* are usually the starting point for improving specific validation parameters through additional refinement. Since improving certain parameters normally worsens others, a final compromise solution has to be reached.
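One way to make this compromise explicit, which is not part of the *Scipion* workflow, is to discard every *model* that another *model* beats in all criteria at once (a Pareto-front filter). The Python sketch below assumes that the Table 2 values have been copied into dictionaries with the hypothetical keys ``rmsd`` and ``molprobity`` (lower is better) and ``emringer`` and ``cc_mask`` (higher is better). Other *MolProbity* statistics could be added to ``CRITERIA`` in the same way, each with its own direction.

.. code-block:: python

   from typing import Dict, List

   # Hypothetical criteria: +1 means "higher is better", -1 means "lower is better".
   CRITERIA = {"rmsd": -1, "emringer": +1, "cc_mask": +1, "molprobity": -1}


   def dominates(a: Dict[str, float], b: Dict[str, float], criteria=CRITERIA) -> bool:
       """True if model a is at least as good as b everywhere and strictly better once."""
       at_least_as_good = all(s * a[k] >= s * b[k] for k, s in criteria.items())
       strictly_better = any(s * a[k] > s * b[k] for k, s in criteria.items())
       return at_least_as_good and strictly_better


   def pareto_front(models: Dict[str, Dict[str, float]], criteria=CRITERIA) -> List[str]:
       """Names of the models that no other model dominates, in input order."""
       return [
           name
           for name, stats in models.items()
           if not any(
               dominates(other, stats, criteria)
               for other_name, other in models.items()
               if other_name != name
           )
       ]

The filter only narrows the choice. Choosing among the remaining candidates still requires the judgment described above, for instance by giving more weight to the statistics that matter most for the intended use of the *model*.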