Coronavirus Pandemic Modelling the evolution of the coronavirus disease (COVID-19) in Saudi Arabia

Introduction: The ongoing COVID-19 pandemic has claimed hundreds of thousands of lives around the world. Health planners are seeking ways to forecast the evolution of the pandemic. In this study, a mathematical model was proposed for Saudi Arabia, the country with the highest reported number of COVID-19 cases in the Arab world. Methodology: The proposed model was adapted from the model used for the Middle East respiratory syndrome outbreak in South Korea. Using time-dependent parameters, the model incorporated the effects of both population-wide self-protective measures and government actions. Data before and after the government imposed control policies on 3 March 2020 were used to validate the model. Predictions for the disease’s progression were provided together with the evaluation of the effectiveness of the mitigation measures implemented by the government and self-protective measures taken by the population. Results: The model predicted that, if the government had continued to implement its strong control measures, then the scale of the pandemic would have decreased by 99% by the end of June 2020. Under the current relaxed policies, the model predicted that the scale of the pandemic will have decreased by 99% by 10 August 2020. The error between the model’s predictions and actual data was less than 6.5%. Conclusions: Although the proposed model did not capture all of the effects of human behaviors and government actions, it was validated as a result of its time-dependent parameters. The model’s accuracy indicates that it can be used by public health policymakers.


Introduction
The COVID-19 pandemic continues to spread around the world, threatening public health [1]. Saudi Arabia has reported the highest number of COVID-19 cases in the Arab world [2]. The first case in the country was reported on 2 March 2020, after which it gradually spread, leading the government to impose a series of measures to control its spread on 23 March 2020. These measures included launching a multi-language awareness campaign calling for social distancing and personal hygiene. The government continued to impose more restrictions, culminating in bans on travel between cities, lockdowns, and school and workplace closures [3]. These measures targeted different parts of the country according to the level of the severity of infection. These actions, coupled with aggressive testing and the provision of adequate health services, may have reduced the number of infections and, more importantly, may have kept the fatality rate around 1.5% from the first death due to COVID-19 on 24 March 2020 until 15 April 2020. This rate was lower than the fatality rate of 5.2% in Europe and 3.7% in North America during the same period [4].
By the end of May 2020, the government had eased some restrictions, namely allowing approximately half of workers to return to work, allowing some travel between cities, and making mask-wearing compulsory. With the easing of restrictions, the number of new cases spiked and the number of intensive care unit hospitalizations increased. Uncertainty around the evolution of the COVID-19 pandemic was compounded by the fact that Saudi Arabia is home to a large number of expatriate workers. It is unclear whether governmental awareness programs effectively reached all of these workers and whether the population at large is abiding by the self-control measures imposed by the government given the economic hardship caused by the pandemic.
The spread of infectious diseases is a very complex phenomenon that depends on a large number of factors. Some of them are social, environmental, or economic and are linked with human activities while other factors are related to the nature of the pathogen causing the disease. Understanding these factors is necessary to develop an accurate predictive model. Research is currently being conducted on the pathogenesis, transmission, and life cycle of COVID-19. There are a few relevant points currently known about the virus. First, it causes respiratory, gastrointestinal, and neurological symptoms [5]. Second, it spreads via respiratory droplets and aerosols and surface contact [6]. Third, it has an incubation period of five to six days during which time infected individuals are typically asymptomatic.
COVID-19's asymptomatic transmission is an important topic, but whether transmissibility is the same when infected individuals are asymptomatic as when they are symptomatic is being debated [7]. Fourth, many factors are correlated with COVID-19 mortality, including age; gender; comorbidities, such as diabetes mellitus, hypertension, kidney disorders, and heart diseases; limited testing capacity; and healthcare quality [8].
Given the complexity of so many factors affecting COVID-19's transmission and morbidity, simple mathematical models are often used to understand how it spreads. More complex models require a larger number of parameters that can be difficult to estimate, making their predictions unreliable. Thus, simpler compartmental models are often used to model the spread of infectious diseases. In these models, the population is divided into compartments with the assumption that every individual in the same compartment has the same characteristics [9]. These models usually take the form of generally deterministic differential equations. The susceptible-exposedinfectious-removed (SEIR) model [9] is an extensively used compartmental epidemic model that divides the population in question into those who are susceptible to infection, have been exposed to the disease but are not yet infected, infectious, or have been removed from the population either because they recovered or died. This model has been used to model the spread of many viruses that infect humans, such as H1N1 [10], influenza [11], and Middle East respiratory syndrome (MERS)-CoV [12]. Despite its age, the SEIR model is flexible and is still being improved. It has also been applied extensively to analyze the COVID-19 pandemic [13][14][15][16][17].
A number of models have been used to predict the evolution of the COVID-19 pandemic in Saudi Arabia. Alharbi et al. [18] used COVID-19 data from 31 March 2020 to 21 July 2020 to predict its spread using four epidemiological models. They showed that their proposed susceptible-infected-removed (SIR) model best fit the data and predicted that the cumulative number of cases would reach a mean of 359,794 infected individuals and that the pandemic would end around 7 September 2020. Alrasheed et al. [19] proposed a network-based epidemic model to evaluate the effectiveness of the control measures implemented by the Saudi Arabian government and to predict the future dynamics of the disease in different scenarios. The proposed model predicted that, in the absence of vaccinations and social distancing beginning on 10 June 2020, the epidemic would end at the beginning of November 2020 with over 13 million infected individuals. Alharthy et al. [20] utilized an SEIR model and predicted a peak infection incidence around 26 July 2020, a peak mortality of 99,749 people, and around 14 million infected individuals. Finally, Alshammari [21] utilized a highly structured model in which the Saudi Arabian population was divided into six categories: susceptible, exposed, symptomatic, asymptomatic, hospitalized, and recovered. The authors used data from 2 March 2020 to 14 April 2020 to validate the model and make predictions. The model predicted that approximately 18% of the Saudi Arabian population would be asymptomatic by the last week of May 2020.
In this study, an SEIR model is proposed for the spread of COVID-19 in Saudi Arabia that was based on a successful model for the spread of the MERS outbreak in South Korea in 2015 [22]. Both MERS-CoV and SARS-CoV-2 viruses present with similar symptoms, though SARS-CoV-2 is more contagious than MERS-CoV, while MERS-CoV is deadlier than SARS-CoV-2.
A note should be made about model validation. While COVID-19 pandemic modeling has been the subject of recent research, the challenge remains in fitting the proposed models to the available epidemiological data. Models that fit the data well can be used to accurately estimate the basic reproduction number which is not explicit in the raw epidemiological data. A number of techniques have been proposed to determine the best fit, including the least-squares method that minimizes the sum of the squared residuals [18,21,23] and the variational approach that is applied to SIR models, some of whose parameters are timedependent [24].

Methodology
Two dynamic models were developed and validated to simulate the evolution of the pandemic. This model's parameters were extracted to obtain the reproduction number. The second model reflected government policies implemented beginning on 23 March 2000 that were intended to limit the spread of the disease. Different techniques were used to fit the data and found that time-varying model parameters provided the best fit. Based on fitted model parameters, the second model was used to forecast the spread of the disease. Two scenarios that reflected differing levels of severity of government policies and population compliance were modeled.
Daily infection, hospitalization, and recovery data were retrieved from the Saudi Ministry of Health. The models were integrated using MATLAB solver ode45 that uses Runge-Kutta methods to solve initial value problems. Parameters were estimated using the built-in MATLAB function fmincon that uses sequential quadratic programming. The objective minimization function was the sum of the squared differences between the actual data and the predicted values. R 2 value were computed to quantify the model's goodness of fit.

Uncontrolled Model
The uncontrolled model in this study was based on the work of Xia et al. [22] on modelling the outbreak of MERS. It assumed that the whole population in question was susceptible and that the disease only spread to individuals with no zoonotic infections. The model classified individuals as either susceptible; exposed; asymptomatic-infected; mildly symptomaticinfected; hospitalized; or removed, which was defined as recovered or deceased. The uncontrolled model used in this study used the same assumptions of the model on which it was based, but did not include the asymptomatic category because doing so would require the inclusion of more parameters and, most importantly, would introduce a high degree of uncertainty to the model's predictions because the number of asymptomatic individuals and their transmission rate during the COVID-19 pandemic were unknown. Figure 1 shows the structure of the current model: β1 represents the transmission coefficient of the symptomatic infected cases to the susceptible, β 2 is the transmission coefficient of the hospitalized cases to the susceptible, 1/σ is the mean time of incubation period, 1/λ is the mean time from symptoms onset to hospitalization, 1/κ is the mean duration for hospitalized cases for survivors and 1/μ is the mean duration from hospitalization to death.
Based on the preceding assumptions and structure, the uncontrolled model was given by the following differential equations: = + (5) The basic reproduction number associated with this model was derived as shown in Appendix A and is given by: where N is the total population of the country and is also the sum of all of the susceptible, exposed, infected, and hospitalized individuals and S 0 is the initial number of susceptible individuals. Table 1 shows the list of model parameters that were either estimated based on the literature or that were extracted through fitting. The   [25]. κ and μ, β 1 , β 2 were obtained by fitting. Table 1 also provides the values of the model parameters from two other studies on the spread of COVID-19, one of which was conducted in Portugal [26] and the other of which was conducted in Wuhan, China [25]. Figure 2 shows the fitting results for the total number of cases and recovered individuals for the period from 2 March 2020, which is when the first case was reported, to 23 March 2020, which was the day that the government began imposing control measures. Fit quality was quantified in terms of R 2 . The R 2 for the cumulative number of infected individuals was 0.934 ( Figure 2a). The R 2 for the number of recovered individuals was 0.940 ( Figure 2b). Substituting the obtained parameters' values into the expression of the basic reproduction number given by Eq. 6 yielded R 0 = 4.75 > 1, which indicates that the disease would have spread more quickly if insufficient control measures had been implemented.

Controlled Model
The uncontrolled model was augmented by incorporating the effect of governmental actions and self-imposed precautions taken by the population through the following parameters [22]: = + + 1 + 2 + 3 (11) where l 1 and l 2 are the self-protection measures taken by symptomatic individuals and hospitalized individuals, respectively, that served as multiplication factors for the transmission parameters β 1 and β 2 and d 1 , d 2 , and d 3 are governmental actions, namely requiring isolation and monitoring of, exposed, infected, and hospitalized individuals, respectively. l 1 , l 2 , d 1 , d 2 , and d 3 were obtained by fitting the data from after 23 March 2020, which is when the Saudi Arabian government implemented policies to limit the spread of COVID-19. We tried to use constant values for these parameters but the fit was unacceptable (Figure 3). Thus, R 2 was 0.800 and 0.886 for Figures 3a and 3b, respectively. The poor fit was likely due to the fact that the government's actions changed over time and geography. Therefore, d 1 , d 2 , and d 3 were defined as time-varying parameters following [24], who proposed a SIR model with timedependent infectivity and recovery rates and showed that their model better predicted the long-term evolution of the COVID-19 pandemic in several countries than static models. Each variable was   assumed to follow the simple linear relationship d i = d i0 + d i1 t where i = 1, 2, 3. Figure 4 shows the fit results for the cumulative number of infected and recovered individuals using the time-varying parameters from Table 2. The fit of the time-varying model was better than the constantparameters model (Figure 3) as indicated by the timevarying model's R 2 value of 0.920.
Next, predictions were made on how the pandemic would evolve according to two scenarios (Table 3). In scenario 1, the government continued to impose strong mitigation policies until the end of April 2020 and the population mildly adhered to self-protection measures. This was a realistic scenario because the government continued to threaten to impose more severe policies after repealing them. Scenario 2 reflects reality in which the government relaxed its mitigation policies and the population mildly adhered to self-protection measures. Figure 5 shows the predictions for scenario 1, in which the total number of infected individuals would have reached a peak of 66,750 by 4 May 2020 and the pandemic would have decreased in intensity by 99% by the end of June 2020. Figure 6 shows the predictions for scenario 2, in which the total number of infected individuals was predicted to reach a peak of 270,877 by 20 May 2020 and the pandemic would have decreased in intensity by 99% by 10 August 2020.

Discussion
The results of the study were strengthened by using a structured SEIR model that was successfully applied to understand the spread of MERS [22]. The model used in this study incorporated time-dependent parameters following [24] to produce more accurate long-term predictions about the evolution of the COVID-19 pandemic. The proposed model was validated using open data through a rigorous optimization process. The model predicted that the 95% CI for the cumulative number of infected individuals would be 271,256 ± 702 by 10 August 2020. Given that the actual figure was 289,947, the prediction error was 6.5%. The next-best prediction was produced by [18] who predicted a cumulative number of 359,794 infected individuals by 7 September 2020. Given that the actual figure was 321,456, the prediction error was 10.7%.
Our proposed model did not predict the number of recovered individuals well. The likely cause of this result is that recovery from disease is a complex process that depends on many factors that were not accounted for by the model, such as healthcare quality that could be measured by the hospital bed-population ratio, age of infected individuals, and comorbidities.
A final note should be made that parameter estimation optimization is limited by the inverse problem in which no unique solutions and a large number of local optimum solutions can be found [27]. For this reason, it is important to reproduce data by relying on epidemiological information about parameter magnitudes. Moreover, in the inverse problem, model parameters are not continuously dependent on data. Therefore, small and unavoidable errors in the data may lead to large changes in the parameters. This was observed in our study to some degree for some of the fitted parameters.

Conclusions
Epidemiological models, such as the one proposed in this study, can be used to run simulations to understand the effects of government actions and general behaviors, such as social distancing and wearing masks, on the spread of infectious diseases. However, modeling human behavior during a pandemic is a challenging task. It is even more challenging for countries such as Saudi Arabia that are hosts to millions of foreign workers from different cultural backgrounds. The success of social distancing and self-protection measures require awareness campaigns that overcome cultural and linguistic barriers. Moreover, even structured epidemiological models, such as the one proposed in this study, cannot capture all of the factors that affect human behavior and the effects of government measures. However, the proposed model used time-dependent coefficients to make predictions about the cumulative number of infected individuals. The quality of the model's predictions depended on how well it was validated using raw COVID-19 data. Epidemiological data is not generally designed for modeling [28]. Fitting becomes more challenging when models include more realistic divisions, such as those in the proposed model. A number of studies on COVID-19 in Saudi Arabia showed that goodness of fit and model structure must be balanced. Simpler SIR models [18] make better predictions than complex, networkbased epidemiological models [19].

Annex -Basic reproduction number, positivity and boundedness of the problem
A. 1 The Basic reproduction number R0 of system (Eqs. [1][2][3][4][5] The basic reproduction number is derived following the techniques presented in [29]. Only the infected compartments of the system (Eqs. 1-5) are considered: