8. Structure Prediction by Sequence Homology. Searching for Homologues

As mentioned previously, in this tutorial we are going to use tools that predict the atomic structure from sequence homology. If you want to predict the initial model with AlphaFold or ModelAngelo instead, follow the instructions in Predicting initial models with AlphaFold and Predicting initial models with ModelAngelo, respectively.

Structure prediction by sequence homology requires only the sequence of the specimen that we would like to model, from now on the target sequence, and access to databases in which to search for structures, or templates, of homologous molecules. The sequences of homologous molecules show statistically significant similarity because they share a common ancestry. Since the sequence encodes the structural information, highly similar sequences generally imply highly similar structures. Structures from the nearest homologues will thus be preferred over those of remote relatives. Note that molecules containing several domains usually require an independent search for homologous templates of each domain. A short review of sequence similarity searching can be found in [Pearson, 2013], and [Kryshtafovych et al., 2018] assesses current template-based modeling methods, many of them implemented as fully automated servers. Modeling tools suited to searching for remote homologous templates, fold recognition and template-free (ab initio) methods, as well as de novo modeling tools, which use the volume itself in addition to the sequence, have yet to be included in the Scipion framework.
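
Sequence similarity is often summarized as a percentage of identical residues between two aligned sequences. The following is a minimal sketch of that calculation, not the scoring used by any particular program: the function name ``percent_identity``, the toy sequences and the choice of denominator (columns without gaps) are assumptions made for illustration. Search tools may use a different denominator, for example the full alignment length including gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where both sequences have a residue.

    Both inputs must already be aligned (same length, gaps marked with `gap`).
    The denominator convention used here is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (made up for illustration, not real Hgb sequences).
toy_target = "MVLSPADKTN"
toy_template = "MSLSDKDK-A"
toy_identity = percent_identity(toy_target, toy_template)
```

In the toy example, nine columns are compared and five of them are identical.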

8.1. How to identify templates of the target sequence

Similarity searching programs like BLAST (Fig. 8.1) [Altschul et al., 1997], available on the BLAST website, use the target sequence (1) to screen the structure-containing database PDB (2). Selecting or excluding a particular organism is an option (3). We usually start our search by selecting the organism in which we are interested or its closest evolutionary relatives. If no similar sequences are found in these organisms, unrelated organisms may be selected, or none at all. Different search algorithms are available (4) and one of them has to be selected. After executing BLAST (5), a list of templates ordered by score is retrieved.
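
The form fields (1) to (4) can also be expressed as a programmatic request. The sketch below only assembles the parameters of a submission to the NCBI BLAST URL API and does not send anything. The helper name ``build_blast_put_params``, the database value ``pdb`` and the toy sequence are assumptions for illustration; check the current NCBI documentation for the accepted values and usage limits before submitting a real search.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (the request is NOT sent in this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_params(
    sequence: str,
    organism: Optional[str] = None,
    program: str = "blastp",   # (4) search algorithm
    database: str = "pdb",     # (2) structure-containing database (assumed name)
) -> Dict[str, str]:
    """Assemble the parameters of a BLAST submission (hypothetical helper)."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": sequence,     # (1) target sequence
    }
    if organism:
        # (3) optional restriction to one organism, in Entrez query syntax
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    return params


# Toy fragment used as target sequence (made up for illustration).
params = build_blast_put_params("MVLSPADKTN", organism="Homo sapiens")
request_body = urlencode(params)  # form-encoded body for a POST to BLAST_URL
```

Leaving ``organism`` unset corresponds to searching without any organism filter, as described above for the case in which no similar sequences are found in the organisms of interest.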

Fig. 8.1 Form of the similarity searching program BLAST.

Of course, the closest structurally characterized relatives of the human Hgb subunits are their own structures, contained in PDB-5NI1. However, in this tutorial we are going to assume that the closest relatives of the human α and β subunits are the respective Hgb subunits (identity 49.3% and 45.21%) of the Antarctic fish Pagothenia bernacchii [Camardella et al., 1992]. The atomic structure associated with this template has PDB accession code 1PBX. Information about the structure can be checked in the PDB entry 1PBX. In general, it is a good idea to read the information related to the template; do so and answer the following questions (answers in appendix Solutions; Question1):

 -  How was this structure obtained (X-ray diffraction, EM, NMR)?

 -  What resolution does it have?

 -  How many chains does it include?
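
The three questions above can also be checked directly in a PDB-format file. The following sketch reads the experimental method from the ``EXPDTA`` record, the resolution from ``REMARK 2`` and the chain identifiers from the ``ATOM`` records. The function name ``summarize_pdb_text`` is hypothetical, and the sample record is synthetic: its values are made up and do not describe 1PBX.

```python
from typing import Dict, List, Optional, Union

Summary = Dict[str, Union[Optional[str], Optional[float], List[str]]]


def summarize_pdb_text(pdb_text: str) -> Summary:
    """Extract method, resolution and chain IDs from PDB-format text."""
    method: Optional[str] = None
    resolution: Optional[float] = None
    chains: List[str] = []

    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "EXPDTA" and method is None:
            # Technique starts at column 11 of the EXPDTA record.
            method = line[10:].strip()
        elif record == "REMARK" and line[6:10].strip() == "2" and "RESOLUTION." in line:
            fields = line.split()
            try:
                resolution = float(fields[3])
            except (IndexError, ValueError):
                pass  # e.g. "NOT APPLICABLE" for methods without a resolution
        elif record == "ATOM" and len(line) > 21:
            chain_id = line[21]  # chain identifier is column 22
            if chain_id not in chains:
                chains.append(chain_id)

    return {"method": method, "resolution": resolution, "chains": chains}


# Synthetic record with made-up values (NOT the content of 1PBX).
SAMPLE_PDB = "\n".join([
    "EXPDTA    ELECTRON MICROSCOPY",
    "REMARK   2 RESOLUTION.    3.20 ANGSTROMS.",
    "ATOM      1  N   VAL A   1",
    "ATOM      2  N   VAL B   1",
    "ATOM      3  N   VAL C   1",
])
sample_summary = summarize_pdb_text(SAMPLE_PDB)
```

To apply the sketch to the template, download the PDB-format file of entry 1PBX from the PDB website and pass its text to the function. Note that this counts the chain identifiers present in the file, which may differ from the number of chains in the biological assembly.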


.. note:: *ChimeraX* can also run the *BLAST* algorithm, although
   with fewer options than those shown in :numref:`model_building_blastp`. Nevertheless, if
   you already know of highly similar homologous sequences with an
   associated structure, you can skip this search step “outside” *Scipion* and
   go to the next step to obtain your *template* and your *target model* directly.