The Bayesian reconstruction and the evolutionary history of Salivirus type 1 and type 2: the worldwide spreading

Introduction: Salivirus (SalV) represents an emerging problem in public health especially during the recent years. In this study, the Bayesian evolutionary history and the spread of the virus through the different countries have been reported. Methodology: a database of 81 sequences of SalV structural VP1 fragment were downloaded from GenBank, aligned and manually edited by Bioedit Software. ModelTest v. 3.7 software was used to estimate the simplest evolutionary model fitting the sequence dataset. A MaximumLikelihood tree has been generated using MEGA-X to test the “clockliness” signal using TempEst 1.5.1. The Bayesian phylogenetic tree was built by BEAST. Homology modelling was performed by SWISS-Model and protein variability evaluated by ConSurf server. Results: the phylogenetic tree showed a clade of SalV A2 and three main clades of SalV A1, revealing several infections in humans in South Korea, India, Tunisia, China, Nigeria, Ethiopia and USA. The Bayesian maximum clade credibility tree and the time of the most common recent ancestor dated back the root of the tree to the year 1788 with the probable origin in USA. Selective pressure analysis revealed two positive selection sites, His at 100th and Leu at 116th positions that at the homology modelling resulted important to guarantee protein stability and variability. This could contribute to the development of new mutations modifying the clinical features of this evolving virus. Conclusions: Bayesian phylogenetic and phylodynamic represented a useful tool to follow the transmission dynamic of SalV and to prevent new epidemics worldwide.


Introduction
Acute Gastroenteritis is considered one of the most significant and still prevalent diseases worldwide [1,2]. Even though sanitation and prevention strategies has determined a relevant decreasing of the mortality rate for diarrhea from 15% in 2008, to about 9% in 2015, infectious diarrheas are still a concerning problem in public health both in developing and industrialized countries [2,3]. Effective surveillance systems are critical for outbreak detection and timely implementation of public health interventions. Salivirus (SalV), previously named Klassevirus, a non-enveloped single-strand RNA virus with a genome length of 7.4 kb belonging to the Picornaviridae family, was recently described as an emerging problem in public health. Several studies, reported SalV as cause of acute gastroenteritis in children [4][5][6]. SalV as causal agent of acute gastroenteritis was described in 0.2-8.7 % of cases in different countries worldwide [7,8] with highest prevalence in Asian countries [9,10].
SalV is closely related to Aichivirus, a member of the Kobuvirus genus that similarly was detected in fecal samples as well in sewage from different countries [4][5][6]. Based on capsid region variation, SalV is classified in SalV A1 and SalV A2 [11]. SalV infection has been described in three cases of unexplained diarrhea from the United States, Australia and China and in sewage from Spain and Thailand [12][13][14][15][16]. Gastrointestinal apparatus seems not the only source of transmission, because in China this virus has been found in respiratory samples from a child with adenovirus infection showing different symptoms and representing a new way of spreading [17]. In a recent study, toilet wastes from different international airplanes resulted positive for several pathogens including enteric and respiratory virus. Especially in case of plane from South Asia, a high prevalence of SalV and Aichivirus have been described compared to samples collected from North Asia and North America [9,18,19]. At today, few reports are available about molecular epidemiology of this virus and all of them focusing on viral genotyping evaluated the genetic diversity of the virus as causal agent of acute gastroenteritis worldwide [7,8,11,18,20]. The aim of this study, using the genotype sequences available in GeneBank, was to apply the Bayesian evolutionary methods to date the origin of SalV infection worldwide and to track the spread of the virus through the different countries where epidemics have been reported. By phylogenetic analysis, the way of transmission of SalV can be clearly defined through the ancestor strain circulation dating and geographic movement tracing. This approach is extremely useful for prompt preventive action and surveillance plan aimed to limit the further spread of the virus worldwide.

Phylogenetic analysis
A dataset of all available VP1 sequences including 75 Salivirus A1 and 6 Salivirus A2 was built by downloading from GenBank (http://www.ncbi.nlm.nih.gov/genbank/). The selection criteria for sequences were known sampling date and geographical location ( Table 1). The countries included in the study were: China, Ethiopia, Germany, Guatemala, Hungary, India, Nigeria, Spain, South Korea, Thailand, Tunisia, USA and Venezuela.
The 81 sequences were aligned using the Bioedit program v7.0.5 and manually edited, as already described [21], obtaining an alignment of 852bp length. The recombination signal was analysed by the pairwise homoplasy index (Phi) test using SplitsTree4 software [22].
Invariant sites estimation has been performed in order to perform Xia test to estimate substitution saturation and the graphical exploration of the dataset of sequences used in the analysis by mean of DAMBE-7 software [23]. The likelihood mapping analysis of 10,000 random quartets by using TreePuzzle software has been used in order to investigate the phylogenetic signal as already described [24]. The best-fitting nucleotide substitution model was chosen by JModeltest software [25]. Adaptive Evolution Server (http://www.datamonkey.org/) was used to find eventual sites of positive or negative selection. At this purpose the following tests has been used: fast unconstrained Bayesian approximation (FUBAR) [26], adaptive Branch Site REL [27], Bayesian un-restricted test (BUSTED) [28] and the mixed effects model of evolution (MEME). These tests allowed to infer the site-specific pervasive selection, the episodic diversifying selection across the region of interest and to identify episodic selection at individual sites [29]. Statistically significant positive or negative selection was based on p value < 0.05 [29].
By MEGA-X software [30] an ML tree was reconstructed using Hasegawa Kishino Yano (HKY) plus Gamma distribution (HKY+G) as evolutionary model selected by JModelTest.
The correlation between sampling dates and genetic distances from the tips to the root was explored using TempEst to evaluate the robustness in terms of molecular clock of the dataset [31].

Bayesian phylogenetic analysis
The evolutionary rate, the dated and the phylogeographic trees were estimated using a Bayesian Monte Carlo Markov Chain (MCMC) approach. Strict and relaxed clock models have been tested using HKY+G model. Two parametric (constant size, exponential) and three non-parametric (Bayesian skyline plot (BSP), Bayesian GMRF and Bayesian Skygrid) population models were compared.
By means of a Bayes factor (BF, using marginal likelihoods) the best fitting models were selected and implemented in Beast v. 1.10.4. In reference of previous studies such as Villano et al., the strength of the evidence against H0 (null hypothesis) was evaluated as follows: 2lnBF < 2 = no evidence; 2-6 = weak evidence; 6-10 = strong evidence; and > 10 = very strong evidence. A negative 2lnBF indicates evidence in favor of H0. Only values ≥ 6 were considered significant [32]. The MCMC chains has been run for 75 million generations and sampled every 7500 steps. Convergence was assessed by estimating the effective sampling size (ESS) after removing a 10% burn-in, by Tracer software, version 1.7 [33], ESS values ≥ 250 were accepted. The posterior probability of each monophyletic clade ≥ 0.89 was considered statistically significant. The obtained tree was summarized using Tree Annotator software.
A map of the SalV infections has been generated using SPREAD software v1.0.7 [34].

Homology modelling
Homology models have been built relying on the website SwissModel [35]. Structural templates have been searched and validated using the software available within the SwissModel environment and HHPred [36]. Homology models have been validated using the QMEAN tool [37]. Three-dimensional structures have been analyzed and displayed using PyMOL. Mapping of evolutionary conserved regions onto homology models have been carried out with the ConSurf website [38]. To map the structural variability of the variant VP1 of SalV fragments and their sites under selection pressure, homology modelling has been applied on the templates JQ898343 and JN379039. Prediction of the impact of single mutations on the protein stability have been predicted with the SDM server [39]. Sequence alignment and display utilized Jalview [40].

Phylogenetic analysis
The Pairwise Homoplasy index (Phi) test did not detect any significant signal of recombination (p-value = 0.104).  The proportion of invariants sites by iteration found a P (invariant) of 0.6154. Consequently, the Xia's test for substitution saturation has been performed suggesting little substitution saturation for the data set, since the Iss value was smaller than Issc [41].
The tree Puzzle showed that the percentage of dots falling in the central area of the triangle was 3.9%, since it was lower 30% of noise the dataset contained sufficient phylogenetic signal.
The percentage of the Parsimony sites was 30% while the percentage of the constant sites was 62% confirming a sufficient phylogenetic signal for further analysis.
By FUBAR analysis of 284 sites, significant (pvalue < 0.05) pervasive episodic selection was found in 2 sites (100 th and 116 th nucleotide position using the reference sequence SalV KU362769.1). In sequences from Nigeria, Ethiopia, Usa, China, Guatemala, Spain and Thailand at the 100th position the presence of Histidine has been found, while in the other sequences the most prevalent amino acid was Arginine. This difference in the 100th position is present in the sanitation-resistant sequence isolated from sewage water in Bangkok (JQ898343.1), in the sequence isolated in the nasal swab of a child in china (KT182636.1) and in the sequences isolated in stool samples from Nigerian children suffering from nonpolio acute flaccid paralysis (AFP) (GQ179640.3, GQ507022.1). At the 116th position, the presence of Leucine instead of Threonine has been found in sequences from SalV A2 sequences.
Significant (p < 0.05) pervasive negative selection in 179 sites (63%) has been evidenced and confirmed by FUBAR.

Bayesian phylogenetic analysis
The root-to-tip divergence plots showed R value of 0.56, indicating good correlation and high temporal and "clocklikeness" signal between the genetic distance to the root. The selected model (lnBF > 6) was the costant demographic model using strict clock (Supplementary Table 1). The evolutionary rate of the SalV VP1 sequences was 1.0851×10 −3 substitutions site per year (95% HPD 6.2859×10 4 -1.5531×10 3 ). In Figure 1, Bayesian maximum clade credibility (MCC) tree was showed.
The time of the most common recent ancestor (tMRCA) dated the root of the tree back to the year 1788 (1622 -1870). In the MCC tree, three main clades of SalV A1 sequences and an outgroup clade of SalV A2 sequences, statistically supported (posterior probability (pp) > 0.89) were represented. In Figure 1, the MCC tree with Bayesian phylogeographic reconstruction of SalV sequences is showed. The probable circulation origin of SalV A1 and A2 dating back to 1788 was USA with a state posterior probability (spp) of 0.19. SalV A2 probably migrated in China (spp of 0.80) in the year 1981 where it persisted to circulate (spp 0.99) and simultaneously probably  (Figure 1). Figure 2 showed SalV worldwide migration route confirming the phylogeographic analysis that involve American, African, Asiatic and European continents. In detail phylogeographic analysis showed a greater posterior probability for USA in respect to China (spp = 0.19 vs 0.08 respectively), confirming the likelihood of origin for SalV 1 and 2 in USA and thereafter the posterior probability of 0.80 in the clade dated 1981 suggest the migration in China as probable origin for SalV 2. All the connections are statistically significant (red lines). The selective pressure analysis showed two sites under positive of selection, while 63% of sites seemed to be under negative selection, suggesting that SalV virus could be highly conserved.

Homology modelling
A multiple sequence alignment of fragments of the VP1 protein of SalV protein has been obtained. According to the alignment, the SalV VP1 fragments here considered cover the region 59-278 of the fulllength Aichi Virus VP3 protein (PDB code 5AOO, chain C). This portion is part of the VP3 hetero 15-mer and contains 3 α-helices and 10 β-sheets.
Among the available structural templates, the chain C of the PDB structure with code 5AOO corresponding to the crystal structure of the VP3 domain from Aichi Virus, has been selected. The decision has been taken in consideration of the suitable quality of the structure in terms of resolution (2.10 Å), the global and local quality estimate for the high sequence identity to the targets (about 53%) (Figure 3).
The sites subject to selective pressure in SalV VP1 fragments have been located onto the predicted threedimensional structure. The two sites are at the positions 100 th and 116 th of the alignment corresponding to positions 42 and 58 of 5AOO chain C structure. The mutation at position 116 is exposed on the outer surface of the protein and located on the capsid surface. The mutation at position 100 is located in the inner surface of the protein, pointing toward the inner volume of capsid ( Figure 4).
The homology model of the SalV VP1 protein compared with the VP3 fragment of Aichi Virus shows a difference in length and orientation of β-sheets encompassed by sequence positions 172 -178 and 234 -243. These secondary structure elements are predicted to interact, as in the Aichi Virus capsid, with an equivalent symmetry-related β-hairpin of an adjacent VP1 subunit forming an extended four-strand sheet.
ConSurf analysis conducted on the SalV VP1 domain suggests that both the position 100 and 116 are within variable sites of the VP3 protein of Aichi Virus. The variability may reflect the potentiality of future mutation that can induce more stability of the protein responsible for the clinical features of SalV that presented these mutations ( Figure 5).
Prediction of thermodynamic impact of the two mutations on the SalV VP1 has been carried out for the homology models built on the templates JQ898343 and JN379039. In both cases, the mutations His48 to Arg and Leu58 to Thr are predicted to be moderately destabilizing while the inverse mutations obviously stabilizing (data not shown).

Discussion
SalV, a worldwide spread virus recognized as causative of gastroenteritis in children, is becoming an emerging serious concern. Previous study has proven that Salivirus RNA can be detected in both treated and untreated sewage water. This could be due to the fact that the treatment processes of sewage water are not enough efficient in order to remove SalV particles, probably because of possible virus crystalline formation or aggregated forms, having an effect to increase viral survival in environment water samples [42].
The causative role of this virus in acute gastroenteritis is discordant at today as so as the seasonality not well established.
The treatment-resistance, the wide diffusion, the misdiagnosis and the evolving way of diffusion of SalV makes it a possible public health threat in the future years. Few studies have investigated SalV in terms of evolutionary analysis and only basic phylogenetic trees to characterize this virus have been described. In this study, we applied the Bayesian phylogenetic and evolutionary analysis determining a probable date of insurgence of the first SalV in USA, MRCA to the year 1788 ( Figure 1), after that this virus divided in two different genotypes, SalV A2, the most recent, dating back to 1982 and the most ancient Salv A1 dated in 1929. The virus was the cause of different monophyletic epidemic events forming almost three different clades and two different sub-clusters dated between the year 1975 and 2003.
The phylogeographic analysis (Figure 2, Figure 3) showed the probable origin of this virus is USA. From USA, the virus had two different way of spread, SalV A2 strains travelled in China causing various epidemics and after spread in Thailand and Venezuela by different way and date of transmission. The SalV A1, the most representative strain, travelled from USA to India in 1939 and successively in three different clades, spread in Nigeria (clade I) in 1977, persisted in India (clade II), and spread in South Korea (clade III) in 2000, identified by three different monophyletic epidemics. We did not identify any sort of intermingled strains in each clade or cluster underling the patent of territorial epidemics. This virus probably travelled by infected individuals that, once arrived in the living country, infected and contaminated environmental water becoming the potential source of human infection and creating a sort of fecal-oral circuit implying also the waste water as further source of virus isolation.
In this study, only two sites under positive of selection were found, suggesting that SalV virus could be highly conserved. This analysis do not have to encourage a light surveillance because previous studies, on other viruses, demonstrated that even only one site under positive selection can be important for  transmission vector change, as happened in CHKV epidemics [43,44]. Moreover, we did not exclude the amount of infectivity or therapy resistant or failed therapy so as. At the 116 th position, the presence of Leucine instead of Threonine has been found from SalV A1 sequences, indicating that SalV with Histidine in the 100 th position could induce over time severe and resistant infection.
We need to get a better understanding of SalV impact on public health in order to prevent and resolve future epidemics and infection [45], so as to prevent the diffusion of this evolving virus, how has been recently demonstrated [11,46,47].
The presence of the positive selection sites where the probability to fix a mutation over the time is high (> than 90%), can probably favor the resistance of the virus to the sanitation processes and the stagnation in sewage with other similar viruses. An example of the evolving nature of SalV is the recent isolation of the virus in respiratory specimen of child in China.
The presence of mutation in sequences of SalV that are sanitation resistant and more virulent and the homology between VP1 of SalV and VP3 protein of Aichivirus, suggests that probably this mutation facilitates the viral entry into the host cell. The presence of His at the 100 th position suggests that this loop region can be characterized by a narrower angle compared with the sequences with Arg in the same position. This effect could probably lead to a more compact structure as predicted by SDM server.
The sequence conservation and mapping analysis has shown the presence of multiple regions with high variability, this could lead to the development of new mutations that could modify the clinical features of this evolving virus. Since SalV is already wide spreading, more attention should be paid to the diagnosis, prevention and treatment of infections.
This emphasized the need to improve the molecular epidemiological surveillance of SalV.

Conclusions
In light of these considerations, molecular and evolutionary studies are important to reveal the potential abilities of SalV to trigger a significant epidemic in countries also far each other's. For newly discovered and continuously evolving viruses like SalV so as for the most recent increasing number of epidemics and the presence of a positive selection sites, the dynamic nature of SalV could represent a serious threat for public health in the future.
Bayesian phylogenetic and phylodynamic represent useful tools to follow the dynamic transmission of this virus and to prevent the possibility of new epidemics through the world.