Coronavirus Pandemic A straightforward molecular strategy to retrospectively investigate the spread of SARS-CoV-2 VOC202012/01 B.1.1.7 variant

The spread of new SARS-CoV-2 variants represents a serious threat worldwide, thus rapid and cost-effective methods are required for their identification. Since November 2020, the TaqPath COVID-19 assay (Thermo Fisher Scientific) has been used to identify viral strains of the new lineage B.1.1.7, since it fails to detect the S-gene with the ∆69/70 deletion. Here, we proposed S-gene mutations screening with the Allplex SARS-CoV-2 assay (Seegene), another widely used RT-PCR test that targets Sarbecovirus E, SARS-CoV-2 N, and RdRp/S genes. Accordingly, we evaluated the S gene amplification curve pattern compared to those of the other genes. Exploiting an Allplex assay-generated dataset, we screened 663 RT-PCR digital records, including all SARS-CoV-2 respiratory samples tested in our laboratory with the Allplex assay between January 1st and February 25th, 2021. This approach enabled us to detect 64 samples with peculiar non-sigmoidal amplification curves. Sequencing a selected group of 4 RNA viral genomes demonstrated that those curves were associated with B.1.1.7 variant strains. Our results strongly suggest that B.1.1.7 variant spread has begun in this area at least since January and imply the potential of these analytical methods to track and characterize the spread of B.1.1.7 strains in those areas where Allplex SARS-CoV-2 datasets have been previously recorded.


Introduction
Multiple variants of the SARS-CoV-2 virus have been reported globally during the current COVID-19 pandemic. The spread of new viral strains depends on the emergence of specific genetic variants conferring a selective advantage, including enhanced transmissibility. Hence, these variants represent a major health concern, since their capability to spread more quickly and easily increases, in turn, the toll of hospitalizations and deaths.
In late September 2020, a new lineage called B.1.1.7 emerged in the UK. This strain, also designated as SARS-CoV-2 Variant of Concern (VOC) 202012/01 by Public Health England (PHE), carries a large number of mutations along the whole genome [1]. Being more infectious than other circulating strains of SARS-CoV-2 [2], VOC 202021/01 quickly became the dominant circulating variant in the UK, leading to lockdowns and travel restrictions in this country. Two other lineages, B.1.351 and P.1, are under strict observation worldwide. B.1.351 has been identified initially in South Africa (December 2020) and to date a low number of cases have been reported also in other countries in Africa, US, and in Europe [3]. P.1 emerged in Manaus (Brazil), in January 2021, and is currently being reported in many countries of South and North America, Europe and Japan [4]. The most significant variants identified in these strains are those located in the S gene, encoding for the Spike protein.
Interestingly, one specific mutation, named N501Y, arose independently in all three variants. Since Spike protein is key to allow virion entry into host's cells, this mutation might improve the Spike protein's ability to bind to the cell receptor angiotensin-converting enzyme-2 (ACE-2), increasing the virus ability to infect hosts. Other mutations occurring within the S gene might affect the antigenic features of Spike, which is a key target of the host immune response. To this extent, since neutralizing antibodies are elicited by S protein epitopes, mostly S1 subunit RBD and NTD domains [5], S gene variants represent a serious threat also for the vaccinated or naturally immunized individuals.
B.1.351 and P.1, indeed, appeared to escape immunity in vaccinated individuals [6]. On the contrary, several reports suggested that B.1.1.7 increased its dissemination capability while keeping its susceptibility to anti-S antibodies triggered by vaccination [7][8]. Polyclonal nature of humoral response warrants neutralization capability for both vaccinated and naturally immunized individuals. However, Spike epitopes binding by monoclonal therapeutic antibodies might be weakened in circulating strains carrying both N501Y and HV60/70 variants. Noteworthy, under the selective pressure of the steadily growing number of vaccinated individuals, B.1.1.7 might rapidly acquire further genetic variants that resist vaccine-elicited anti-S polyclonal antibodies. Hence, it is of paramount importance to tackle the dissemination of this B.1.1.7 variant with effective diagnostic strategies. Exploiting its S gene target failure (SGTF), a reverse transcription PCR (RT-PCR) assays (e.g., TaqPath COVID-19 RT-PCR assay, Thermo Fisher Scientific), has been successfully applied as a proxy to identify the B.1.1.7 lineage and to monitor its geotemporal dissemination. This approach, also known as Spike gene "drop out" enabled many laboratories worldwide to rapidly screen for the B.1.1.7 variant and to monitor its circulation. Interestingly, Borges et al. [9] also detected TaqPath S-gene positive B.1.1.7 strains having Cycle threshold (Ct) values for S gene > 5 units higher than the maximum Ct value obtained for the other two targets (N and ORF1ab). This profile, named "Spike gene target late amplification", was also consistently associated with the VOC 202012/01. However, the TaqPath methodology requires a two-step RT-PCR assay and might be applied in retrospective studies only when original samples or extracted RNA are still available. These observations prompted our research group to carefully evaluate the RT-PCR profile obtained with another assay targeting Spike gene, together with E, RdRP, and N genes (e.g., Allplex COVID-19 assay, Seegene). This assay was not reported to "fail" in detecting the VOC 202012/01. However, in addition to being unveiled by "target failure" or "target late amplification", mutations in the target sequence might moderately affect amplification efficiency, leading to minor changes of the amplification curve slope. In the past several months, our laboratory has analyzed a large amount of samples with the Allplex RT-PCR assay. Hence, we aimed to identify putative variants of the S gene by carefully evaluating the amplification plot of the S gene from the thermal cyclers digital records, followed by nextgeneration sequencing (NGS) of the corresponding archived genomic samples.

RT-PCR digital records analyses
SARS-CoV-2 positive tests were selected from the whole dataset of Allplex SARS-CoV-2 assays (Seegene, Seoul, Korea) archived in our laboratory, as part of the diagnostic laboratory routine. RT-PCR were performed with the CFX96™ Real-time PCR Detection System-IVD (Bio-Rad Laboratories Inc, CA, USA). As for most thermal cyclers, the instrument software applies fluorophore-specific channel filters to remove background noise, and smoothing algorithms to obtain clearer amplification sigmoidal curves. Hence, the relative fluorescence from each sample was calculated with Bio-Rad CFX Manager V3.1 (Bio-Rad Laboratories) with the baseline-subtracted curve-fit setting and selecting the fluorescence from FAM (gene E) and Cal Red 610 (RdRP/S gene). The shape of the amplification curves and, specifically, of the phase-2 (or logarithmic phase) was carefully scrutinized to assess the efficiency of the S gene amplification. Specifically, the following parameters were chosen to screen samples of interest: a weak phase-2 (low slope) and/or a sequence of at least 2 inflection points indicating a curve increasing in slope each cycle and then decreasing in slope each cycle (Supplementary Figure 1).

Next generation sequencing libraries preparation
We implemented the CleanPlex SARS-CoV-2 Research and Surveillance Panels protocol (Paragon Genomics, Inc, Hayward, CA, USA) for target enrichment and library preparation. All the protocol steps, including RT with primers annealing, multiplex PCR (mPCR), digestion and second PCR, were performed according to the manufacturer's instructions. A group of four RNA viral samples was selected for whole genome sequencing. Specifically, mPCR was conducted using the 2-pool workflow, in order to obtain a higher coverage, preparing 2 mPCR reactions per sample as recommended. Unique i7 and i5 indexes from CleanPlex Dual-Indexed PCR Primers for Illumina, Set A (Paragon Genomics) were introduced via second PCR, adjusting to 24 cycles based on the number of amplicons (343) generated by mPCR. All the purification steps were performed using CleanMag Magnetic Beads (Paragon Genomics). Quality control was performed with the 2100 Bioanalyzer instrument using Agilent High Sensitivity DNA Kit (Agilent Technologies, Santa Clara, CA, USA). Libraries showed an average peak at 272 bp. Libraries were then quantified using a Qubit Fluorometer with the dsDNA HS Assay Kit (Life Technologies, Carlsbad, CA, USA, now Thermo Fisher Scientific). After confirmation of the library quality, libraries were normalized to 10 nM and samples with unique index combinations were pooled in equimolar ratios to reach the recommended final concentration of 4 nM for sequencing. After a further quantification with Qubit dsDNA HS Assay Kit, pooled libraries were prepared following the Standard Normalization protocol on MiSeq System Denature and Dilute Libraries Guide (Illumina, San Diego, CA, USA), and finally denatured and diluted to 11 pM. Sequencing was carried-out on a MiSeq instrument (Illumina) with Reagent Kit v3, using 20 pM PhiX Table 1. Substitutions and indels identified in four SARS-CoV-2 positive samples showing a "low-slope" amplification curve for the S gene. Bold text indicates variants defining the B.1.1.7 lineage. The genomic position affected by the mutation is indicated in round brackets.
control spike-in of 5% for low-diversity libraries, setting 2 x 150 cycles, and generating paired-end reads.

Genomic analyses and variants assessments
NGS raw data (FASTQ files) were generated from MiSeq Local Run Manager (Illumina) and uploaded on the SOPHiA DDM platform (SOPHiA Genetics, Lausanne, Switzerland) for external quality check, variant call review, and determination of the consensus genome. Finally, SARS-CoV-2 lineages were determined using Pangolin (https://github.com/covlineages/pangolin).

Results and Discussion
The Microbiology and Virology Division, University Hospital of Sassari, Italy, serves as a regional reference center and has tested approximately 178,151 samples since March 2020. Between January 1 st 2021 and February 25 th 2021, 34,305 samples were analyzed by the Allplex COVID-19 assay, Seegene; 5,555 of these determinations were resolved as positive.
Since most swab samples collected in the area of North Sardinia were analysed in our diagnostic laboratory, we consider our digitally recorded dataset of Allplex COVID-19 assay as large and representative for the whole sampling area. Given the spread of VOC 202012/01 (B.1.1.7 lineage), our diagnostic laboratory was prompted to actively test for this variant on all SARS-CoV-2 positive swab samples. To this end, while waiting for development RT-PCR assays enabling the direct detection of specific variants, according to new and specific probe/primers design, we carefully analyzed the shape of S gene amplification curve as a potential proxy for S gene variants, as described in the methods section. Quite fortuitously, we observed a cluster of 4 samples with a weak phase-2 (low slope) and inflection in a point midway up the phase-2 of the curve (Supplementary Figure 1). To assess whether the amplification pattern was due to S gene mutations affecting the efficiency of the RT-PCR test, the 4 samples were subjected to RNA extraction and whole genome sequencing, as described in the methods section. According to the Pangolin application, the analyses of NGS data enabled us to assign these viral genomes to SARS-CoV-2 lineages, confirming the presence of the B.1.1.7 variant. In further analysis of the whole viral genomes, the lineage has also been confirmed by the identification of 15 of 17 variant defining mutations (Table 1). In addition to the modifications known to be B.1.1.7 lineage-specific, we reported the substitution and indels signatures (Table 1) characterizing each sample. We then analyzed, retrospectively, all digital records obtained from the thermal cyclers where the Allplex COVID-19 assay was performed. Among the positive tests recorded since January 1 st , 64 were detected as showing curve profiles similar to those described above and in Supplementary  Figure 1. We finally investigated the geotemporal distribution of the samples identified as B.1.1.7, according to the site of sampling and/or recorded permanent place of residence. The data obtained, while limited and without further epidemiological characterization, enable a preliminary picture of B.1.1.7 spread in the area of North Sardinia, Italy. Noteworthy, on February 25 th 2021, while we were writing this paper, Seegene Inc. published a technical note assessing that "investigations with full-genome sequenced around 400 known samples (300 with normal curve and 100 with abnormal curve), tested by Allplex SARS-CoV-2 Assay, showed that in the presence of B.1.1.7 lineage, the shape of the raw curve of the CalRed610 channel (corresponding to RdRp/S genes) differs from the expected shape of the curve, allowing to a prescreening of this lineage...".
In conclusion, given the wide availability of the Allpex SARS-CoV-2 assay in many countries, the retrospective analyses of digitally recorded dataset and the detection of the S gene amplification curve profiles, as reported in this paper, might contribute in a costeffective manner to enable large retrospective surveys of B.1.1.7 geotemporal dissemination in most countries worldwide.