Coronavirus Pandemic Phylogeography and genomic analysis of SARS-CoV-2 delta variant in Morocco

Introduction: Since the COVID-19 pandemic began in December 2019, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continuously evolved with many variants of concern emerging across the world. Methodology: In order to monitor the evolution of these variants in Morocco, we analyzed a total of 2130 genomes of the delta variant circulating around the world. We also included 164 Moroccan delta variant sequences in our analysis. Results: Our findings suggest at least four introductions from multiple international sources and a rise of a dominant delta sub-lineage AY.33 in Morocco. Moreover, we report three mutations in the N-terminal domain of the S protein specific to the Moroccan AY.33 isolates, T29A, T250I and T299I. The effect of these mutations on the secondary structure and the dynamic behavior of the S protein N-terminal domain was further determined. Conclusions: We conclude that these mutations might have functional consequences on the S protein of SARS-CoV-2.


Introduction
COVID-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, first appeared in the Wuhan province in China in 2019. Since then, the ongoing pandemic has prompted the international community to take measures to stop the spreading of the virus. The symptoms of COVID-19 are similar to other viral upper respiratory illnesses and include fever, cough, fatigue and dyspnea [1]. SARS-CoV-2 genomic sequence analysis can be used to determine the origin and transmission patterns of the virus after it enters a new population, and is proving to be a key tool in developing pandemic management decisions.
Morocco reported its first COVID-19 case on the 2nd of March 2020. New positive cases and deaths from COVID-19 were reported a few days later. The number of infected cases had increased to approximately 951,482 and 14,796 deaths (http://www.covidmaroc.ma) by mid-December 2021. A Moroccan genomic monitoring network composed of several public and private laboratories was constructed in response to the COVID-19 pandemic to track the changes in the SARS-CoV-2 mutation profile in order to identify keystone events in the evolution of the virus and assess the outcomes of COVID-19 prevention measures taken by the Moroccan government.
Since the identification of the first complete genome of the virus, whole genomes of multiple strains have been deposited in public databases such as GISAID [2] and GENBANK [3]. Many tools have been developed to monitor the evolution of SARS-CoV-2. The most commonly used tool is the Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) software [4] that assigns lineage to a given SARS-CoV-2 sequence consistently with the Pango dynamic lineage nomenclature scheme. Nextclade is an interesting project developed under the Nextstrain SARS-CoV-2 resources [5]. It is an opensource project for viral genome alignment, mutation calling, clade assignment, quality checks, and phylogenetic placement.
Since the beginning of the COVID-19 pandemic, genomic analysis of Moroccan samples has revealed the existence of variants of concern (VOCs). The emergence of these variants has been accompanied by waves of infection; the first wave was associated with the dominance of the alpha variant (B.1.1.7) in early February 2021 [6], and the second wave was associated with the appearance of the delta variant (B.1.617.2) in June 2021.
These variants have gained evolutionary advantages which include high virulence and increased infectivity due to multiple spike glycoprotein mutations, in particular, the D614G mutation that has shown the most efficient interaction with the Angiotensin-converting enzyme 2 (ACE2) receptor. This mutation changes the conformation of the receptor-binding domain, cleavage patterns of Sglycoprotein and replication fitness of SARS-CoV-2 variants [7]. Following the emergence of D614G, another mutation was identified in the receptor-binding motif (RBM), N439K, that enhances the ACE2-binding affinity and reduces the neutralizing activity of some monoclonal and polyclonal antibodies present in sera of individuals who have recovered from infection [8]. Another RBM mutation, Y453F, was associated with increased ACE2-binding affinity received considerable attention following its identification in sequences associated with infections in humans and mink in Denmark [9]. This pattern suggests that the virus accumulates mutations in order to reduce or avoid recognition by antibodies while maintaining or increasing binding to ACE2 [10].
The mutation profile of the delta variant provided in the outbreak.info project describes the mutations in at least 75% of the delta sequences. Overall, delta variant contains multiple mutations including 7 in the spike region, 10 in ORF1ab, 3 in the N, and 2 in ORF8 (www.outbreak.info) [11]. The spike gene mutations in B.1.617.2 variant are T19R, L452R, T478K, D614G, P681R, and D960N, with deletions at positions 157 and 158 [12]. Those mutations are suspected to increase the transmissibility which is in accordance with the high reproductive value (R0 = 5.08) of the delta variant [13].
In this study, we analyzed all Moroccan delta genomes available in GISAID and compared them to delta genomes from all over the world in order to identify the origin of the delta variant in Morocco and track its evolution.

Genomic analysis
Full genome nucleotide sequences of SARS-CoV-2 delta variant deposited between July 2021 and November 2021, were retrieved from GISAID repository and grouped into two sets. The first set contained all Moroccan sequences (164 sequences in total out of which 48 were sequenced in the Laboratory of Biotechnology). The other set contained 1966 complete delta sequences deposited by multiple countries: Brazil, USA, Chile, Denmark, Australia, Italy, Canada, France, South Africa, Ghana, Germany, India, Indonesia, Japan, South Korea, Thailand, Wales, England and Scotland. Our final dataset contained 2130 delta genomes.

Variant calling
Sequences of SARS-CoV-2 delta variant were retrieved from GISAID repository and mapped to the reference sequence MN908947.3 using Minimap [14]. The BAM files were sorted by SAMtools [15], and then used to call the genetic variants in variant call format (VCF) by SAMtools mpileup and bcftools. The files were then annotated and their impact was predicted using SnpEff [16].

Global alignment and phylogenetic analysis
In order to build a global phylogeny of delta genomes, a total of 2130 sequences were included in the final alignment. MAFFT (v7.487) [17] was used to align the sequences with MN908947.3 as the reference sequence. The maximum likelihood tree was constructed using IQTREE [18]; based on the performance of IQTREE model finder, GTR+F+I was selected as the best substitution model for our dataset. The phylogenetic results were graphically visualized using Figtree (v1.4.4) (http://tree.bio.ed.ac.uk/software/figtree/).

Ancestral reconstruction
Ancestral reconstruction was performed using PastML [19]. The program takes as input a rooted tree and/or dated tree (in Newick format) and an annotation table containing the state of each tree tip. At first, we annotated our dataset with the date of collection and the country from where samples were collected. Then, we used the Least Square Dating (LSD) for ancestral events dating and for rooting the resulting phylogenetic tree based on dates [20]. Maximum likelihood marginal Posterior Probabilities Approximation (MPPA) was used as the prediction method with Felsenstein 1981 (F81) as a model. Default values were used for the reaming parameters. The PastML generated full tree was visualized and edited using Itol [21].

Mutation effect prediction
We used Chou & Fasman Secondary Structure Prediction Server (CFSSP) to predict the secondary structure of SARS-CoV-2 spike N-terminal domain (NTD) [22]. To investigate the effect of the mutations on this domain's structural conformation, its molecular stability and flexibility, we used DynaMut software (University of Melbourne, Australia) [23]. We first downloaded the crystallographic structure of neutralizing antibody 2-51 in complex with SARS-CoV-2 spike N-terminal domain (NTD) from RCSB (PDB ID: 7L2C). Next, we removed the antibody from the structure and prepared a mutant version of the NTD structure using PyMOL. Finally, the SARS-CoV-2 spike N-terminal domain structure was uploaded onto DynaMut software and the effect of mutations in various protein structure stability parameters was determined.

Mutation profile of delta sequences in Morocco
Analysis of the 164 Moroccan sequences of SARS-CoV-2 delta variant revealed a total of 6403 mutations of which 872 (13%) were synonymous mutations. We were only interested in the 5531 (87%) nonsynonymous mutations; we extracted all distinct nonsynonymous mutations found in at least one sequence and obtained 418 mutations. Overall, 72 (17%) substitution mutations were found in the spike gene (61 missense, 8 frameshifts, and 3 disruptive in-frame deletions), 32 (7%) missense mutations were observed in the nucleocapsid (N) gene, 6 in the envelope (E) gene and 20 in the NSP12.

Dominance of AY.33 delta sublineage in Morocco
The variants from the Moroccan sequences were clustered using the Pangolin web service in 14 SARS-CoV-2 delta sublineages. AY.33 was the dominant sublineage (74 sequences) followed by AY.39 ( Figure 1).
To understand the abundance of the delta variant sublineages in Morocco, we plotted their cumulative count against sample collection date (Supplementary Figure 2). We observed that the variant B.1.617.2 first appeared in Morocco in June 2021, delta sublineages quickly appeared in the following months. After July 2021, the majority of samples were delta sublineages, specifically AY.33, which was the dominant sublineage by the end of July 2021.
We observed a total of 418 unique mutation events from the 164 SARS-CoV-2 isolates in Morocco and we plotted the occurrence of nucleotide mutations (frequency > 6%) per lineage ( Figure 1B). The AY.33 sublineage had the highest number of mutations in the S gene (13 mutations) whereas the average number of mutations in the other sublineages was 10. We also observed 6 distinct mutations in this sublineage out of which three were missense mutations; T29A, T250I and T299I were found in the spike gene and were present in nearly 45% of the sequences. Two synonymous mutations were found in ORF1ab, P748P belonging to AY.33 and AY.73 and L1356L belonging to AY.33 and AY.39.

Moroccan specific AY.33 mutations, T29A, T250I and T299I, cause alteration in secondary structure of SARS-CoV-2 spike N-terminal domain
The analysis of the AY.33 Moroccan sequences revealed three unique mutations in SARS-CoV-2 spike N-terminal domain, T29A, T250I and T299I. We studied their effect on the secondary structure of the protein to investigate the impact of these mutations. Our data revealed that the mutation T29A caused a change in the secondary structure as shown in Figure 2. The threonine amino acid at position 29 was substituted by alanine. Our analysis showed that there was a loss of sheet at positions 28, 29 and 30, and a replacement of sheet by coils at position 28 and 29 due to the mutation T29A (Figure 2A, compare i and ii). Similarly, in T250I threonine was substituted by isoleucine. Due to this substitution, there was a loss of turn at position 251 and its replacement by a coil ( Figure 2B, compare i and ii). Further, the substitution of threonine by isoleucine at position 299 resulted in considerable changes in the secondary structure ( Figure 2C, compare i and ii). The detailed analysis revealed that there was a replacement of coils by helix at positions 294, 295, 296, 297 and 299, while there was a replacement of turn by helix at position 298. Altogether, the substitution of threonine to alanine at position 29 and to isoleucine at positions 250 and 299 in the mutant spike N-terminal domain resulted in change in the secondary structure of the protein which may have had functional consequences.

T29A alters the stability dynamics of tertiary structure of SARS-CoV-2 spike N-terminal domain
To understand the impact of mutations on the tertiary structure of SARS-CoV-2 spike N-terminal domain, we used DynaMut to get information about the alteration in protein stability and flexibility due to each mutation in the native protein structure. Our data revealed that there was a change in vibrational entropy energy (ΔΔSVibENCoM) and free energy differences (ΔΔG) between the wild type and the mutant isolate (Table 1). Vibrational entropy represents an average of the configurational entropies of the protein within a single minimum of the energy landscape [24]. The negative ΔΔSVibENCoM of mutant spike N-terminal domain represents the rigidification of the protein structure and positive ΔΔSVibENCoM represents gain in flexibility. Here, our data showed that the mutations T29A, T250I and T299I lead to an increase of molecule flexibility. Further, we also calculated the free energy differences, ΔΔG, between wild-type and mutant. The ΔΔG caused by mutation had been correlated with the structural changes, such as changes in packing density, cavity volume and accessible surface area and therefore, it measures the effect of the mutation on protein stability [25]. In general, a ΔΔG below zero means that the mutation causes destabilization and above zero represents protein stabilization. Here, our analysis showed positive ΔΔG for T250I and T299I suggesting that these mutations were stabilizing the mutant structure as compared to the wild-type; however, we observed negative ΔΔG for T29A mutation indicating its destabilizing behavior (Supplementary Figure 3). The ΔΔG for T299I mutant was 0.150 kcal/mol which was significantly higher than others. Accordingly, we closely analyzed the changes in the intramolecular interactions due to these three mutations in SARS-CoV-2 spike N-terminal domain. The substitution of threonine with mutant residues alters the side chain leading to change of intramolecular bonds in the pocket of SARS-CoV-2 spike N-terminal domain ( Figure 3). Therefore, it can be consecutively stated that T29A is affecting the stability and intramolecular interactions in the protein which may have functional consequences.

Phylogenetic analysis
In order to identify the possible sources of the delta variant's introduction to Morocco, we searched for Moroccan sequences that were nested within sequences from various countries in the global phylogenetic tree containing 2131 leaves, including the reference genome (Supplementary Figure 4).
We discovered several clades where a Moroccan sequence was near other delta genomes circulating in the world (Figure 4). We propose three distinct possible groups for the phylogenetic position of Moroccan samples in the tree: Group 1 (Moroccan clades): groups made up mostly of Moroccan sequences.
Group 2 (Heterogeneous clades): clades containing a set of Moroccan sequences and a set of sequences from other countries.
Group 3 (Unique sequence): single Moroccan sequence discovered inside a clade of sequences from other nations, but is not related to any other Moroccan sequences.
After the clades were constructed, 128 out of 164 Moroccan genomes were grouped into clades.    Thereafter, we referred to the genetic profile of each clade as summarized in Supplementary Table 3, in order to verify the existence of unique mutations.

Ancestral reconstruction of the delta variant sequences
Summarized ancestral scenarios for location reconstruction on the delta tree is shown in Figure 5A.
In order to get more information about all the events, we used the complete visualization with 2976 nodes and 71 unresolved nodes. All the probabilities of the clades are provided in Supplementary Table 4. The results indicated that approximately all nodes showed a parent node originated from India (probability = 0.86). From there, the delta variant was introduced to Morocco in multiple ways as it is shown by the 19 Moroccan nodes (0.68 < probability < 1) ( Figure 5B to 5J). Nine separate nodes ( Figure 5B: node I to IX) diverge directly from the Indian parent, 5 nodes ( Figure 5: X, XI, XIII, XIV and XVI) diverge from an ancestor that itself derives from India. The remaining 5 nodes are called "unresolved nodes"; these nodes have multiple colors indicating several corresponding states having similar marginal probabilities.

Possible delta introduction sources
In order to extract the possible sources of delta variant's introduction, we considered the groups 2 and 3, where the majority of the sequences composing the clades belonged to a country other than Morocco.
India: As represented in the ancestral, India was the main ancestor for all the nodes where B.1.617.2 variant was first detected. As for Moroccan nodes ( Figure 5B: I to IX), nine nodes diverged directly from India with approximately 119 sequences. The mutation profile analysis also confirmed this as we observed the same major mutations in those nodes.
France: The node in Figure 5E indicated an introduction from France to Morocco. This introduction was also confirmed in the clade H ( Figure 4) with a shared mutation (Y6160Y/ORF1ab). Japan: Two introductions were observed in the ancestral reconstruction from India to Japan and from Japan to Morocco ( Figure 5D). It concerned the following mutations: V4887V/ORF1ab, I850L/S and K16T/ORF3a that were shared between 2 sequences from Morocco and others from Japan as described in clade D (Figure 4).
Germany: An introduction is shown in Figure 5H, and confirmed in clade K ( Figure 4) via a single mutation (T6891T/ORF1ab).

Possible delta transmission from Morocco to other countries
We suggest that there is a possibility of transmission of the delta variant from Morocco, when a foreign sequence is located in a clade composed of mostly Moroccan sequences.
Chile: Transmission confirmed in clade G ( Figure  4); a single mutation in the spike gene leading to T29A was shared between a sequence from Chile and 72 sequences from Morocco. The transmission was also confirmed in the ancestral reconstruction ( Figure 5B: node IX).
France: Multiple transmissions from Morocco to France were found. This concerns 5 mutations found in 3 different nodes from the clades in Figure 4: Q677H/S and E239E/ORF3a shared exclusively between a sequence from France and 5 Moroccan sequences from clade B which was confirmed in the ancestral node ( Figure 5I), P207L/N and P46S/N are found in one sequence from France and 6 from Morocco belonging to clade L which is confirmed by the ancestral node VI ( Figure 5B) and the last mutation generating the T29A is found in 72 Moroccan sequences and in 1 sequence from France as represented in the clade G (Node IX in Figure 5B).
Germany: Transmission of mutations from Morocco to Germany was observed in 3 nodes. The first transmission concerns 2 mutations (Q677H/S and E239E/ORF3a) found in 2 German and 5 Moroccan sequences as illustrated in clade B ( Figure 4) and in the ancestral reconstruction ( Figure 5I). The second one was located in clade M ( Figure 4) where 2 mutations (V3986V/ORF1ab and E102V/ORF3a) were found only in 5 sequences from Morocco and one from Germany (node V in Figure 5B). The last mutation concerns the T26A/S that was dominant in Morocco and spread to Germany as confirmed in clade G ( Figure  4) and in the ancestral (node IX in Figure 5B).
England: The mutation analysis of clade B sequences revealed specific mutations shared between 5 Moroccan sequences and one sequence from England, Q677H/S and E239E/ORF3a. Those two mutations were confirmed by the ancestral tree as transferred from Morocco to England ( Figure 5I).
Italy: The ancestral construction revealed a transmission from Morocco to a single sequence from Italy (node V in Figure 5B), where the sequence has accumulated 2 mutations, V3986V/ORF1ab and E102V/ORF3a as illustrated in clade M (Figure 4).
Denmark: We noticed one single transmission from Morocco (node IX in Figure 5B), which was the case of the mutation T29A in the spike gene found in clade G ( Figure 4) and in the majority of Moroccan clades.

Discussion
The epidemiologic curve of COVID-19 in Morocco showed two distinct waves; the first wave caused by the alpha variant extended from August 2020 to January 2021, and the delta variant wave extended from July 2021 to the end of October 2021. In the current study, we analyzed delta Moroccan genome sequences and compared them to delta genomes from all over the world in order to monitor the evolution of this variant in Morocco.
Considering only the mutations present in 75% of the Moroccan sequences (25 mutations), the major delta variant mutations found in Morocco were 84% similar to the reference profile. Six mutations were not detected in our analysis (4 in ORF1, E156G in the S and S84L in ORF8). This indicated a slight diversity of the delta variant in Morocco. This result has been proven in another SARS-CoV-2 genetic diversity study from Morocco [26]. A recent study reported that Morocco is among the North African countries with a higher rate of positivity and significant genetic diversity (46 Pango lineages identified) [27].
Each country is characterized by a specific delta mutation profile [12]. In the case of Morocco, we found that in addition to the major mutations found in other countries, some specific mutations, such as the I1128T mutation in the ORF1ab gene, was found exclusively in a set of Moroccan sequences. Another mutation, M920I, at position 3025 of the ORF1ab gene was found only in five Moroccan sequences.
This study revealed the dominance of the AY.33 sublineage in Morocco; this sublineage was first sequenced in Morocco in June 2021 (EPI_ISL_3253426) [2]. Global phylogenetic analysis of delta variant grouped Moroccan sequences belonging to this sublineage in one clade (Clade G in Figure 4). Thereafter, the results of mutation analysis deduced the existence of specific spike N-terminal domain (NTD) mutations in this clade; T29A, T250I, and T299I, and their effect on the protein structure and dynamics was subsequently analyzed.
The NTD has been reported to interact with the tyrosine-protein kinase UFO (AXL) host receptor facilitating the virus's entrance into human cells [28]. Since the antibody-mediated protection depends on the target antigen structure, any mutation causing the target antigen's productive conformational shift may reduce its binding effect and degrade its protective function [29,30]. Our data demonstrate that T29A, T250I and T299I mutations lead to significant changes in the protein secondary structure.
Several mutations of the SARS-CoV-2 spike protein have been found and their effects are being explored in immune system evasion and increased transmission [31]. The mutations in delta variants distribute in a similar way to those of other VOCs, clustering in the RBD and the NTD of the spike protein targeted by most neutralizing antibodies [32-34]. However, some of these monoclonal antibodies have shown an impairment in binding to the B.1.617.2 spike due to diverse mutations in the two aforementioned domains, suggesting that the delta variant partially but significantly escape neutralizing antibodies [35].
In this study, we proposed the different possibilities of introduction and transmission of the delta variant and its sublineages between Morocco and other countries based on the phylogenetic groups, genetic profiles and ancestral reconstruction results. Since the beginning of the SARS-CoV-2 pandemic, the Moroccan government has taken quick decisions to prevent the spread of the virus. These measures included suspension of flights with several countries, especially when a new variant is detected somewhere. This decision was not only aimed at reducing the spread of the virus, but also at preventing the introduction of new variants in the country.

Conclusions
This study revealed the dominance of AY.33 sublineage, and the existence of three specific Moroccan mutations in the spike N-terminal domain protein in this lineage. We have also demonstrated by phylogenetic and network analysis the possibility that the delta variant was introduced into Morocco from different countries, including India, France, Japan, and Germany. Moreover, we suggest that there is a possible transmission of the delta variant from Morocco to Chile, France, Germany, England and Italy.