6. Get the initial model with ModelAngelo
In this section of the ModelAngelo tutorial we cover steps 1, 2, 3 and 4 of the small workflow shown before to get ModelAngelo predicted structures of the human TACAN protein (isoform 1).
ModelAngelo is able to predict the atomic structure that fits a 3D-EM map with or without protein sequences [Jamali et al., 2023]. We are going to trace the human TACAN protein in both cases: a) using the protein sequence, and b) in the absence of any protein sequence.
6.1. ModelAngelo initial model using 3D map and protein sequence as starting inputs
As described in the Import Input Data section, we are going to take advantage of the Scipion software framework to import the 3D map and the sequence of the human TACAN protein isoform 1 using the protocols import volumes and import sequence. Details about the parameters of these two protocols are shown in Appendices Import volumes and Import sequence, respectively.
First open the import volumes protocol (Fig. 6.17), fill in the form selecting EMDBid (1) to import the map from the database, and include the ID number for the TACAN homodimer (2). Finally, execute the protocol.
Then you may visualize the volume with ChimeraX [Goddard et al., 2018] by clicking Analyze Results. The 3D map of the TACAN homodimer (Fig. 4.11, A) appears inserted in the lipid nanodisc on the right side of the coordinate axes.
The sequence of the human TACAN protein isoform 1 will be downloaded from UniProtKB. First, open the form of the import sequence protocol (Fig. 6.18), then complete the form to download the TACAN (1) amino acid sequence (2) with the UniProtKB (3) accession code Q9BXJ8. Execute the process and visualize the sequence in a text editor. The sequence will appear in FASTA format.
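If you want to check the imported sequence outside the text editor, the FASTA layout (one header line starting with ">", followed by one or more sequence lines) is simple to read programmatically. The following is a minimal sketch, not part of the Scipion protocol; the function name parse_fasta and the toy record are assumptions made only for illustration, and the toy record is not the real Q9BXJ8 sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record (hypothetical, NOT the real Q9BXJ8 sequence), shown only
# to illustrate the layout of a FASTA file.
example = ">toy_id toy description\nMKTAYIAK\nQRQISFVK\n"
records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

With the file written by the import sequence protocol, you would pass the file contents to parse_fasta and check that the reported length matches the length listed in the UniProtKB entry.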
Once we have map and sequence we can run modelangelo - model builder protocol (Fig. 6.19). Details about the parameters of this protocol are shown in Appendix ModelAngelo. Complete the form selecting the GPU that you plan to use (1), the 3D map (2) and the UniprotKB sequence (3) of the TACAN protein. Finally, execute the process.
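For readers curious about what happens behind the form, ModelAngelo is a command-line program and the Scipion protocol acts as a wrapper around it. The sketch below only assembles a plausible command line; it assumes the build and build_no_seq subcommands and the -v, -pf and -o options described in the ModelAngelo README, and it uses hypothetical file names. The exact call issued by the Scipion plugin is not shown in this tutorial and may differ.

```python
import shlex


def modelangelo_command(map_path, output_dir, fasta_path=None):
    """Assemble a ModelAngelo command line as a list of arguments.

    Assumption: subcommands and flags follow the ModelAngelo README
    (build with a protein FASTA file, build_no_seq without one).
    """
    if fasta_path is not None:
        return ["model_angelo", "build",
                "-v", map_path, "-pf", fasta_path, "-o", output_dir]
    return ["model_angelo", "build_no_seq", "-v", map_path, "-o", output_dir]


# Hypothetical file names, used only for illustration.
with_sequence = modelangelo_command("tacan_map.mrc", "out_seq", "Q9BXJ8.fasta")
without_sequence = modelangelo_command("tacan_map.mrc", "out_noseq")

# Print the commands instead of running them.
print(shlex.join(with_sequence))
print(shlex.join(without_sequence))
```

The function only builds the argument list, so it can be inspected without ModelAngelo installed; running it for real would require passing the list to subprocess.run on a machine with a working installation and a GPU.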
As you can see (Fig. 6.20), parts of the predicted raw model (D) have been removed in the pruned one (F). These removed portions do not match the sequence well according to a hidden Markov model alignment [Jamali et al., 2023]. You can inspect the sequences included in both the raw and the pruned predictions by selecting, in the upper menu of the ChimeraX GUI, Tools -> Sequence -> Show Sequence Viewer; a window similar to that shown in Fig. 6.21 (A) will display all sequences.
As observed in Fig. 6.21 (B), sequences Ad and Ae, with a higher number of residues (104 and 155, respectively), contribute the most to the final pruned prediction. In this example, 74% of the whole Q9BXJ8 protein sequence has been covered by the pruned prediction. The rest of the map, containing regions with lower resolution, has to be modeled manually.
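A coverage figure such as the one quoted above can be estimated by counting how many positions of the reference sequence are spanned by at least one pruned fragment. The snippet below is a minimal sketch under stated assumptions: the function name sequence_coverage is hypothetical, fragments are given as 1-based inclusive (start, end) ranges on the reference sequence, and the toy numbers are not the real TACAN fragment boundaries.

```python
def sequence_coverage(sequence_length, fragment_ranges):
    """Fraction of reference positions covered by at least one fragment.

    fragment_ranges holds 1-based inclusive (start, end) pairs.
    Overlapping fragments are counted only once.
    """
    covered = set()
    for start, end in fragment_ranges:
        covered.update(range(start, end + 1))
    # Ignore any position that falls outside the reference sequence.
    covered = {pos for pos in covered if 1 <= pos <= sequence_length}
    return len(covered) / sequence_length


# Toy values (hypothetical): a 100-residue sequence and three fragments,
# two of which overlap.
toy_ranges = [(1, 30), (20, 50), (71, 80)]
coverage = sequence_coverage(100, toy_ranges)
print(f"{coverage:.0%}")  # → 60%
```

For a real case, the ranges would come from the residue numbering of each chain in the pruned model, for example as displayed in the ChimeraX sequence viewer.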
6.2. ModelAngelo initial model using only the 3D map as starting input
We start this example by running the modelangelo - model builder protocol again. Using the same form shown above (Fig. 6.19), this time complete it by selecting only the GPU that you plan to use (1) and the 3D map (2), and then execute the process (workflow step 4; Fig. 5.6).
Although the prediction (Fig. 6.22, A) is apparently similar to the previous one (Fig. 6.20, D), some differences can be observed between the displayed results. First of all, only one prediction has been generated (B, red arrow). This single prediction is similar to the raw prediction obtained when the sequence is also included as input. Since postprocessing (pruning) depends on sequence matching, the pruned structure cannot be generated in this case. In addition, the whole structure consists of a high number of small fragments that have to be assessed manually. The quality of this prediction of sequence and structure will depend on the map resolution: the better the resolution, the more residues can be unequivocally assigned to the map density. Since the map EMD-31441 has already been traced (PDB 7F3U), we can compare the published structure with our ModelAngelo prediction against the background of the map density (Fig. 6.23).
Red arrows in Fig. 6.23 point to three phenylalanine residues (F219, F223 and F232) located in the Q9BXJ8 sequence segment 217-QKFRNQFLSFSMYQSFVQF-235 that have been correctly assigned. In general, residues with bulky side chains are assigned more reliably than others, and they can be a good starting point to assign the rest of the sequence residues.
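To locate candidate anchor residues in a sequence segment, it can help to list the absolute positions of a given residue type. The sketch below uses the segment 217-235 quoted above; the function name residue_positions is an assumption introduced only for this illustration.

```python
def residue_positions(segment, first_residue_number, residue="F"):
    """Return the sequence numbers at which a residue type occurs.

    segment is a one-letter amino acid string and first_residue_number
    is the sequence number of its first residue.
    """
    return [first_residue_number + index
            for index, amino_acid in enumerate(segment)
            if amino_acid == residue]


# Segment 217-235 of Q9BXJ8 as quoted in the text.
segment = "QKFRNQFLSFSMYQSFVQF"
positions = residue_positions(segment, 217)
print(positions)  # → [219, 223, 226, 232, 235]
```

The same call with residue="Y" or residue="W" lists other bulky aromatic residues; in this segment, residue_positions(segment, 217, "Y") returns [229].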