Coronavirus Pandemic Whole Genome Sequencing and Phylogenetic Analysis of SARS‐CoV‐2 strains in Turkey

Introduction: Coronaviruses which are single-stranded RNAs, are members of a large family of viruses that may be important pathogens for humans. SARS-CoV-2 was found to cause the severe respiratory syndrome, and on January 22, 2020 first human-to-human transmission was reported. We aimed to reveal the complete genomes of 19 SARS-CoV-2 isolates from Denizli province and identify Turkish patients' genetic similarities. Methodology: 15 samples with the highest viral loads resulting from RT-PCR were selected for NGS analysis. Fifteen SARS-CoV-2 complete genome sequences were then subjected to phylogenetic analysis and uploaded to the GISAID database. Phylogenetic trees were constructed by the Neighbor-Joining method using MEGAX software. Results: Whole-genome sequencing of the viral RNA samples revealed 32 missense, 21 synonymous, and 4 non-coding alleles. In all samples c.1-25C>T (5’UTR), c.14144C>T (ORF1ab), c.2772C>T (ORF1ab) and c.1841A>G(S) mutations were detected. Phylogenetic analysis revealed that most of the present study's genomes are in 20B clade while the two are in 20A. The phylogenetic tree constructed with all complete SARS-CoV-2 genomes of Turkey showed that the viruses were spread nearly homogenous on eastern (around Kars) and western (around Istanbul) sides. Conclusions: Here, we reported the viral genomes in Denizli comprehensively for the first time. We identified 11 rare missense mutations in the virus compared to the reference genome. Phylogenetic analysis revealed that while most of our isolates were similar to European sequences, some had different sublineages depending on their genomic variants.


Introduction
Coronaviruses (CoVs) are members of a large family of viruses that may be important pathogens for humans and vertebrates. Coronavirus is a singlestranded RNA virus that belongs to the family Coronaviridae included in the order Nidovirales [1]. The family members of Coronaviridae can infect a variety of systems, including respiratory, gastrointestinal, hepatic, and central nervous systems of humans and animals such as birds, bat, and mice. Human Coronaviruses known to date are as follows: HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1, SARS-CoV MERS-CoV, and SARS-CoV-2. SARS-CoV-2, a new Coronavirus described in mid-December 2019, was reported in China and linked to a pathology name Coronavirus disease 2019 (COVID-19) [2]. The virus that caused severe respiratory syndrome and human-to-human transmission was first reported on January 22, 2020 [3]. With the World Health Organization (WHO) declaration, it has been announced that SARS-CoV-2 infection became pandemic on March 11, 2020. The severity of the SARS-CoV-2 infection is variable. Some people are lethally affected, while others are strikingly asymptomatic for the disease. Patients with COVID-19 usually develop some signs and symptoms such as mild respiratory illness and persistent fever, nausea, diarrhea, skin rash, and loss of taste and smell an average of 5-6 days after infection [4,5]. At the beginning of the pandemic of COVID-19, difficulty in breathing was a typical syndrome for a patient who needs artificial ventilation in intensive care [6]. With the spread of this pandemic all over the world, 59.204.902 confirmed cases of COVID-19 and 1,397,139 deaths were reported by WHO website (https://covid19.who.int) in more than 200 countries and territories and as of November 22, 2020. The first case of COVID-19 in Turkey was confirmed on March 11, 2020. As of March 15, 2021, there have been 2,894,893 cases and 29,552 deaths declared by The Ministry of Health, Turkey. Understanding the transmission patterns and evolution of the SARS-CoV-2 virus is crucial for generating efficient drugs and vaccines for disease prevention [7,8]. Therefore, the analyses of genomic sequences of coronaviruses are essential tools for drug/vaccine design. There are 220,620 SARS-CoV-2 complete genome sequences available on Global Initiative's database on Sharing All Influenza Data (GISAID) (https://www.gisaid.org/epiflu-applications/hcov-19reference-sequence).
In the GISAID platform, there are 194 complete SARS-CoV-2 sequences from Turkey, and only two sequences among these were found from Denizli province. In this study, we performed next-generation sequencing (NGS) analysis to reveal the complete genomes of 15 SARS-CoV-2 isolates on Turkish patients from Denizli province. To identify their genetic similarity, phylogenetic analyses were performed by considering the complete SARS-CoV-2 genomes of Turkey and worldwide isolates selected from GISAID. Besides, we focused on the variation analysis to show the mutations in SARS-CoV-2 genomes.

Sample selection and viral RNA isolation
Nasopharyngeal swab (NPS) and oropharyngeal swab (OPS) specimens of COVID-19 suspected patients who were sent for routine diagnosis to the Virology Unit, Pamukkale University Hospital, Denizli, Turkey, were selected for viral cultivation. The study was conducted in Denizli, a region in the western part of Turkey. Detailed patient information about the patients is shared in supplementary S1. 15 SARS-CoV-2 positive samples with high viral load (assessed by Real-time PCR) were selected for NGS analysis. Viral RNAs of confirmed COVID-19 cases showed Cq <20 by real-time PCR were extracted by using QIAamp Viral RNA Mini kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions on 2020-07.

NGS Library Preparation
Sequencing libraries were prepared using the CleanPlex® SARS-CoV-2 Research and Surveillance Panel (Paragon Genomics, San Francisco, CA, USA), following the manufacturer's instructions. Briefly, the quantity of viral RNA samples was assessed with the Qubit RNA HS Assay kit. 50ng of extracted viral RNA was used as an input for each sample. As a first step, RNAs were used as a template for reverse transcription. Then genome of each sample was amplified using two primer pool protocol (343 primer pairs separated in two pools) with ten cycle multiplex PCR to cover the entire genome of SARS-CoV-2 with a median size of 149 bp. The reaction was terminated by the addition of 2µl of stop buffer. PCR products were then purified by adding magnetic beads. Indexing primers were introduced by second PCR using twenty-five cycles, followed by final bead purification was performed. PCR products were quantified and qualified by both dsDNA HS Assay Kits with Qubit 4.0 Fluorometer (Thermo Fisher Scientific Inc.) and an Agilent Bioanalyzer 2100 with High Sensitivity DNA Chips (Agilent Technologies Inc.) according to manufacturers' instructions. Finally, 15 libraries were pooled in equimolar concentrations and sequenced on the NextSeq500 instrument with midoutput 2 × 150 bp flowcells (Illumina).

Data Quality and Variant Analysis
Raw sequencing data were obtained in FASTQ format and processed for variant calling and consensus sequences generation. First, the FASTQ sequences' quality was evaluated by the FastQC program, and then adapter sequences were trimmed from reads using Cutadapt software. Processed reads were aligned to the reference genome (NC_045512.2) using BWA mem v.0.7.12. Variant calling and generation of consensus sequences were performed by using Samtools v.1.2.0. Moreover, samples were also analyzed in the SOPHIA platform for SARS-CoV-2 for variant calling and reporting the viral presence.

Phylogenetic Analysis
Fifteen SARS-CoV-2 complete genome sequences were subjected to phylogenetic analysis to reveal the molecular relationship and evolutions. For this purpose, all 160 complete and high coverage sequences isolated in Turkey (last access November 15, 2020) were selected and retrieved from GISAID. To obtain the phylogenetic tree on a global scale, 180 sequences based on major GISAID clades (19A, 19B, 20A, 20B, and 20C) were also selected from the same database. Sequences were aligned using MAFFT [9] with the following settings: Clustal format; FFT-NS-2 strategy. The evolutionary history was inferred using the Neighbor-Joining method [10]. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) [11]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Kimura 2-parameter method [12] and are in the units of the number of base substitutions per site. All ambiguous positions were removed for each sequence pair (pairwise deletion option). The evolutionary analyses were conducted in MEGA X [13].

Results
We sequenced 15 samples from Denizli province by using the CleanPlex® SARS-CoV-2 Panel. These samples were collected among 15 patients who were detected positive as a result of RT-PCR. Age intervals of the patients were between 5 and 80. Thus, patients' mean age was approximately 53 years, which is consistent with older adults being more susceptible to COVID-19. The number of raw data reads of 15 samples on the library was obtained as 13,069,752 read pairs, and SARS-CoV-2 genomes (> 98% coverage) were recovered from all samples. We aligned our genomic sequence with the SARS-CoV-2 NCBI Reference Sequence Wuhan (NCBI GenBank, NC_045512) to investigate variants throughout the genome. Depending on genomic alignment, we obtained 57 total and 11 unique variants (Table 1). All detected variants in samples are given in Supplementary Table 2.
Totally 32 missense, 21 synonymous, and 4 noncoding alleles were obtained. The most common  Table 1). Additionally, 35 variants were detected only once. Most of the variants were detected in ORF1ab (30/57), and all of the isolates contained a signature c.1841A>G (D614G) mutation in the spike glycoprotein (S). D614G substitution in the spike glycoprotein of SARS-CoV-2 strains is now the most prevalent, and patients carrying the D614G variant are found to have higher upper respiratory tract viral loads. According to the GISAID database, a new missense mutation was also identified in the spike protein for the first time in Turkey. c.610G>C (G204R) mutation, one of the common mutations that were also detected in our study, was obtained 79988 times (34.84% of all samples with N sequence). The other one was c.608G>A (R203K) which has already obtained 80702 times (35.15% of all samples with N sequence) in 108 countries. c.171G>T ORF3a mutations was 13,3% in our study. However, in the previous studies from Turkey, it was 40%. Additionally, we determined some missense mutations found for the first time in the samples from Turkey (Table 2).
L140F is one of the rare mutations that was obtained in 5 out of 15 samples. Three patients with L140F mutations were found to come from the same family and they have same variants (Patients 1, 6, and 9). Moreover, we found that these three patients in the same lineage, B.1.1.162. We think that this viral genotype similarity is due to in-family spread. In this study, c.146T>C (V49A) mutation found in one patient was only reported once in the USA (0.00% of all samples with NS8 sequence).

Phylogenetic Analysis
In the present study, domestic and international evolutionary linkage of the isolated sequences were demonstrated in Figures 1 and 2 [14,15].
The clades were prominently dominated in the European region, having C3037T -C144408T -A23403 and G28881A -C14408T -A23403G mutations for 20A and 20B, respectively [16]. Like the result, all presented virus genomes were in lineage B.1 of GISAID pangolin clades, the major European lineage (Figure 1). Notably, EPI_ISL_707939 is in B1.9, the Turkish lineage, while EPI_ISL_708193 is in B.1.227, the Irish lineage. Unlike other isolated viruses, EPI_ISL_707939 and EPI_ISL_708193 had both genomes located at different roots in the phylogenetic tree. Considering the phylogenetic tree constructed with all complete SARSCoV2 genomes of Turkey, the viruses were seen to spread nearly homogenous in both eastern (around Kars) and western (around Istanbul) sides of the country (Figure 2). However, the difference of the two EPI_ISL_707939 and EPI_ISL_708193 genomes was also shown in the tree as a diverse root from other strains, which are mainly positioned in the eastern region of the country.

Discussion
SARS-CoV-2 is a novel coronavirus that infected more than 76 million people leading to approximately 1.69 million deaths globally as of December 2020 (https://www.gisaid.org/epiflu-applications/hcov-19reference-sequence). As of December 2020, more than 17,000 patients died in Turkey due to COVID-19. Herein, we reported 15 virus genomes isolated in Denizli, and to our knowledge, this case series is the first comprehensive study of a COVID-19 sample population from Denizli, Turkey. The emergency departments are less frequented by younger patients and biased toward patients 18 years and older. Patients with higher viral loads detected by RT-PCR (≤20 ct) were also correlated with higher coverage of SARS-CoV-2 genome by sequencing. Besides, we performed variant calling and phylogenetic analysis with these 15 samples. We compared our sequences to the reference genome to identify genomic variants of SARS-CoV-2 sequences (NC_045512.2). Depending on the variant analysis, all samples contained a D614G mutation in the spike glycoprotein, a widespread mutation in the samples from Turkey [17,18]. In the literature, Zhang et al. showed that the D614G mutation was associated with increased viral loads throughout the World [19]. We observed that c.14144C>T (P323L) and c.1841A>G (D614G) mutations co-occur in all samples, correlatively. Besides, we observed that c.608G>A and c.610G>C missense mutations are the second common mutations (86.67%) in our sequences. A study performed by Karamese et al. reported that 549 total and 53 unique variants were obtained in 47 isolates. Similar to our study's findings, the previous study showed that D614G was the most frequent mutation which is occurred in spike glycoprotein [17]. To obtain the global frequency of our genetic variations, we uploaded our sequences to the GISAID database. According to the database results, we obtained 11 sporadic missense mutations. The frequency of these rare mutations was between 0.00 to 0.06 percent. The mutations were not observed before in samples isolated in Turkey. Thus, the rare mutations in these samples may be significant for developing efficient antiviral vaccines or therapeutics. This virus's phylogenetic characterization is vital for contributing to the knowledge of viral variation to specify the most suitable regions to be used as vaccine targets or antivirals. Therefore, to determine the clades and lineages of our results, we uploaded our sequences to the GISAID database. For this purpose, all 160 complete and high coverage sequences isolated in   [17,18]. GISAID classifies phylogenetic clusters into currently seven clades (S, L, V G, GH, GR,

GV) using specific combinations of genetic variations.
Depending on the unique variant profile, 13 out of 15 samples were belonged to the GR clade, whereas the other two samples (EPI_ISL_707939 and EPI_ISL_708193) were in the GH clade. However, most viral isolates in Turkey showed L-type characteristics and formed a monophyletic clade [18]. In conclusion, we analyzed fifteen samples from SARS-CoV-2 positive patients from Denizli, Turkey. Here, we reported the viral genomes circulating in Denizli, Turkey comprehensively for the first time. As a result of our variants' analysis, we identified 11 rare missense mutations in the virus genome compared to the reference genome. According to the phylogenetic analysis, most of our isolates were found to have similar to European sequences. However, some of the sequences had different sublineages depending on their genomic variants. More samples should be included from different Turkey areas to investigate the phylogenetic characteristics of SARS-CoV-2 infections in detail. The SARS-CoV-2 pandemic affected worldwide allows analyzing the progress and improvements of the virus and investigating the mechanisms of selecting appropriate mutations. After more genome analyses have been carried out, further studies should be done to compare strains better; however, we think that these data could help understand the virus's dynamics and help further treatment/vaccine development studies.

Annex -Supplementary Items
Supplementary