Coronavirus Pandemic Prediction of COVID-19 Confirmed Cases in Indo-Pak Sub-Continent

Introduction: The global COVID-19 pandemic originated in the Chinese city of Wuhan and gradually reached every part of the world. It has adversely affected the economies of developed as well as underdeveloped countries, and the subcontinent has been hit hard by the consequences of the deadly coronavirus. People are being infected in large numbers, and cases are growing on a daily basis. Methodology: The present study employs automatic ARIMA through the R package “forecast” to predict the number of cases for the upcoming 14 days, from 1 July 2020 to 14 July 2020, using 107 daily observations of confirmed COVID-19 cases. Predicting the cases is an important concern because it helps the governments of the region plan accordingly. Results: The outcomes of the study indicate that ARIMA applied to the sample reasonably predicts the confirmed cases of coronavirus for the next 14 days in the subcontinent. An increasing trend is observed for Pakistan and India, with constant cases for Bangladesh in the coming 14 days. Conclusions: Pakistan has the highest predicted growth in cases, followed by India. Therefore, the governments need to build adequate policies in order to contain the spread of the virus.


Introduction
The coronavirus disease (COVID-19), which originated around a year ago in the city of Wuhan, China, overwhelmed healthcare systems globally and triggered social and economic crises as it spread uncontrollably and infected people at large. As of 6 January 2021, the World Health Organization (WHO) reported 86,405,927 confirmed cases, including 1,868,768 deaths, globally. Deaths are still increasing in many parts of the world, and newer variants of the virus are being reported in the media. COVID-19, declared a pandemic by the WHO, has also been labelled a black swan event [1] and likened to the economic scene of World War II [2]. Governments worked hard to flatten the curve and took measures including quarantine, full or partial lockdowns, border closures, and restrictions on travel [2]. COVID-19 is not only a global pandemic and public health crisis; it has also severely affected the global economy [3,4], financial markets [5][6][7], air quality [8,9], and psychological wellbeing [10], among others.
It has been noted that the chronic nature of the coronavirus outbreak and its spread, together with the lack of treatment and cure, is creating a critical environment for mental well-being [11]. The broadcast media is spreading fear and negativity, which contributes to panic and a rise in xenophobia [11]. The serious health crisis has turned into an economic recession, seriously affecting economic players such as the United Kingdom, the United States of America, the United Arab Emirates, Italy, Spain, Australia and Germany [12]. Oil prices dropped to their lowest level, confirming that the spread has a negative impact on crude oil prices [13]. In a similar context, multifractality was detected in the European stock markets, and it was advised that stock markets be constantly monitored [14]. To confront the crisis, the respective governments maintain that there is no solution other than lockdown, quarantine, personal hygiene and adherence to standard operating procedures. Protective measures such as practicing social distancing, staying at home, avoiding gatherings, self-quarantine and frequent hand washing can help contain the virus.
Pakistan, India and Bangladesh (together referred to as the Indo-Pak sub-continent) were adversely affected by the virus and, in the middle of June 2020, faced the highest number of confirmed cases in the South Asian region. As of 6 January 2021, the spread of COVID-19 in the Indo-Pak subcontinent accounted for 11,385,992 confirmed cases (India = 10,375,478; Pakistan = 492,594; Bangladesh = 517,920) and 168,282 deaths (India = 150,151; Pakistan = 10,461; Bangladesh = 7,670). There is a shortage of ventilators, personal protective equipment and other important medical items in the region. India confirmed its first coronavirus case on 30 January 2020 in the southwestern coastal state of Kerala [15], whereas the coronavirus reached Pakistan on 26 February 2020 [16] and Bangladesh on 8 March 2020 [17]. An increasing trend was observed in all three countries after the emergence of the first case. Pakistan had 53 confirmed cases on 15 March 2020 and approximately 209,337 confirmed cases on 30 June 2020. Similarly, India had 82 confirmed cases on 15 March 2020, which rose to 585,481 cases on 30 June 2020, and Bangladesh experienced a growing trend as well, with 3 cases on 15 March 2020 and 149,000 confirmed cases on 30 June 2020 (shown in Figure 1). The sub-continent holds great importance in terms of trade and business due to its geostrategic position [18,19]. It is densely populated and comprises a mix of cultures, languages, religions and people.
Time series analysis is particularly useful in situations such as COVID-19, as it is one of the scientific ways of forecasting. Prediction based on statistics helps policymakers prepare strategies for forthcoming crises, which can ultimately help contain the deadly consequences of the pandemic. The Auto-Regressive Integrated Moving Average (ARIMA) model is used to forecast future values from time series data. The model has been applied in many fields, e.g. electricity prices [20] and primary energy demand [21]. It has also been applied in the medical sciences generally [22][23][24][25] and in epidemics specifically [26,27]. In particular, studies on COVID-19 have employed the ARIMA model to predict the epidemiological trend of the occurrence and degree of the pandemic [28]. Other studies reported comparable results for the prediction of COVID-19, and a similar study was recently published to forecast the cases in the European region [23]. Predictions of cases in the sub-continent are scarce in the literature, and hence the current study was formulated to predict the increase in cases for the next 14 days.
The study is intended to help in formulating policies and strategies for the region. It contributes to the COVID-19 body of knowledge by relying on the data of confirmed cases from 15 March to 30 June 2020 to predict the following two weeks. The next section covers the design and methods, which detail the forecasting mechanism, followed by results reported with 80% and 95% confidence intervals. The final section presents the conclusion, which draws implications and recommendations for government departments, policymakers and health ministries of the sub-continent.

Design and Methods
The study employs the Auto-Regressive Integrated Moving Average (ARIMA) model to estimate the upcoming cases of COVID-19 in the sub-continent. Time series data of daily confirmed cases of coronavirus in the region was considered. ARIMA, which holds great importance in forecasting analysis, originates from the autoregressive model (AR), the moving average model (MA) and the combination of AR and MA. The combined ARMA model assumes a stationary time series; ARIMA extends it to non-stationary data by differencing the series. The series used here has no missing values. In ARIMA analysis, an underlying process is identified from the observations in order to produce a precise process-generating mechanism, resulting in a good model (Box & Jenkins, 1976). The analysis comprises identification, estimation, and diagnostic checking [29]. Generally, the ARIMA model is regarded as a filter that tries to separate the signal from the noise; the signal is then extrapolated into the future to obtain forecasts.

Data
The daily data of COVID-19 confirmed cases was collected from Johns Hopkins University (JHU), Center for Systems Science and Engineering, supported by the Esri Living Atlas Team and the Applied Physics Lab of JHU [30]. The data on reported COVID-19 cases of the subcontinent was used for this study because: (i) the subcontinent is at high risk because of its population density and business connections all over the world; (ii) the subcontinent exhibited a high peak of cases in recent days; and (iii) daily confirmed cases for the region were available from March 15, 2020 to June 30, 2020, which corresponds to 107 observations. These three countries were selected on the basis of the highest daily growth, calculated by taking the first difference ($\Delta X_n = X_n - X_{n-1}$), as it shows non-constant growth of the daily confirmed cases.
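The first-difference step described above can be sketched in a few lines of Python. This is only an illustration: the function name `first_difference` and the cumulative counts are hypothetical and are not the study's data. The date arithmetic shows that 15 March to 30 June 2020 spans 108 calendar days, so differencing yields 107 values; this would match the 107 observations reported, assuming the count refers to the differenced series.

```python
from datetime import date


def first_difference(cumulative):
    """Return daily growth X[n] - X[n-1] for n = 1 .. len(cumulative) - 1."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative confirmed counts for five consecutive days (not real data).
cumulative_cases = [53, 94, 136, 236, 302]
daily_growth = first_difference(cumulative_cases)
print(daily_growth)  # [41, 42, 100, 66]

# Calendar days from 15 March 2020 to 30 June 2020, both ends included.
n_days = (date(2020, 6, 30) - date(2020, 3, 15)).days + 1
print(n_days, n_days - 1)  # 108 107
```

Differencing always shortens a series by one value, which is worth keeping in mind when comparing observation counts across studies.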

Methodology
Auto ARIMA (autoregressive integrated moving average) is a frequently used technique for forecasting with time series data. The model is specified by three order parameters (p, d, q), where "p" is the order of the autoregressive (AR) part, "d" is the order of differencing and "q" is the order of the moving average (MA) part. The procedure of fitting an ARIMA model is also referred to as the Box-Jenkins method [31]. AR is a class of linear model in which the variable of interest is regressed on its own lagged values. If $y_t$ is modeled via an AR process, it can be written as:

$$y_t = \delta + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t \tag{1}$$

where $\delta$ is the intercept; $y_{t-1}, \dots, y_{t-p}$ are the regressors with coefficients $\phi_1, \dots, \phi_p$; and $\epsilon_t$ is an error term.
MA is another class of linear model. In MA, the variable of interest is modeled via its own imperfectly predicted values at the current and previous times. In terms of error terms, with constant $\mu$ and coefficients $\theta_1, \dots, \theta_q$, it can be written as:

$$y_t = \mu + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t \tag{2}$$

The mathematical form of ARMA (p,q) is as follows:

$$y_t = \delta + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t \tag{3}$$

In short, we can rewrite the above equation as:

$$y_t = \delta + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t \tag{4}$$
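As a rough illustration of the ARMA (p,q) form, the sketch below computes a one-step-ahead value as an intercept plus p weighted lagged observations plus q weighted lagged errors, with the unknown current error set to its expected value of zero. The function name `arma_one_step`, its argument layout and all numbers are hypothetical; this is not the estimation code used in the study.

```python
def arma_one_step(delta, phi, theta, y_hist, eps_hist):
    """One-step-ahead ARMA(p, q) prediction.

    delta    : intercept
    phi      : AR coefficients [phi_1, ..., phi_p]
    theta    : MA coefficients [theta_1, ..., theta_q]
    y_hist   : past observations, most recent last (at least p values)
    eps_hist : past errors, most recent last (at least q values)
    The current error term is unknown and replaced by its mean, zero.
    """
    ar_part = sum(coef * y_hist[-lag] for lag, coef in enumerate(phi, start=1))
    ma_part = sum(coef * eps_hist[-lag] for lag, coef in enumerate(theta, start=1))
    return delta + ar_part + ma_part


# ARMA(1, 1) with made-up values: 1.0 + 0.5 * 10.0 + 0.4 * 2.0
arma_11 = arma_one_step(1.0, [0.5], [0.4], y_hist=[10.0], eps_hist=[2.0])
print(round(arma_11, 2))  # 6.8

# Pure AR(2): the empty theta list removes the MA part.
ar_2 = arma_one_step(0.0, [0.5, 0.2], [], y_hist=[3.0, 4.0], eps_hist=[])
print(round(ar_2, 2))  # 2.6
```

Setting `phi` to an empty list gives a pure MA model in the same way, so one function covers all three model classes.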

Parameter Estimation and Model Selection
For parameter estimation, the "auto.arima" function of the R package "forecast" was used [32]. The function fits ARIMA models to a univariate time series and returns the best model according to the Akaike Information Criterion (AIC), its small-sample equivalent (AICc), or the Bayesian Information Criterion (BIC). It conducts a search over the possible models within the order constraints provided. Table 1 documents the candidate models with their corresponding AIC values; the best models for Pakistan, India, and Bangladesh, chosen on the basis of AIC, are highlighted.
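The principle of ranking candidate models by AIC can be illustrated with a small Python sketch. It is deliberately not a reimplementation of "auto.arima", whose search procedure and likelihood-based fitting are more elaborate. Here three simple candidates (zero mean, constant mean, AR(1) with intercept) are fitted by least squares to a simulated series, and the Gaussian AIC is computed up to an additive constant as n ln(RSS/n) + 2k. All names, the simulated data and the candidate set are assumptions made for illustration.

```python
import math
import random


def gaussian_aic(residuals, n_coefs):
    """AIC up to an additive constant; k counts the coefficients plus the error variance."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss / n) + 2 * (n_coefs + 1)


def score_candidates(series):
    """Score three simple candidate models on the same targets series[1:]."""
    y, x = series[1:], series[:-1]  # targets and their lag-1 values
    n = len(y)
    mean_y, mean_x = sum(y) / n, sum(x) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares AR(1) coefficient
    intercept = mean_y - slope * mean_x
    return {
        "zero mean": gaussian_aic(y, 0),
        "constant mean": gaussian_aic([b - mean_y for b in y], 1),
        "AR(1) with intercept": gaussian_aic(
            [b - intercept - slope * a for a, b in zip(x, y)], 2
        ),
    }


# Simulated AR(1) series (hypothetical, seeded for reproducibility).
rng = random.Random(42)
series = [25.0]
for _ in range(199):
    series.append(5.0 + 0.8 * series[-1] + rng.gauss(0.0, 1.0))

scores = score_candidates(series)
best = min(scores, key=scores.get)  # the lowest AIC wins
print(best)
```

All candidates are scored on the same set of target observations, since AIC values are only comparable when computed on identical data.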

Results
The best ARIMA model was selected based on the Akaike information criterion (AIC) [33], and the final selections are reported in Table 2. The best-fit models are employed to predict the daily growth of coronavirus cases in all three countries under consideration. Predictions are made for the upcoming 14 days, from 1 July 2020 to 14 July 2020. Table 2 presents the predictions for the region with 80% and 95% confidence intervals (CI), along with the minimum and maximum values for both intervals. An increasing trend is seen in Pakistan, with an average increase of approximately 10,992 cases; considering the 95% CI, the predicted number ranges from a minimum of 7,685 to a maximum of 14,323. Similarly, in India the number of cases will grow by 10,621 on average, ranging from 8,426 (minimum) to 13,014 (maximum) in the first two weeks of July 2020. The trend observed in Bangladesh is constant, with no further increase in cases from 1 July 2020 to 14 July 2020, as the cases are forecast to remain at 5,974 per day with zero change.
The results predict that, for the coming two weeks, the highest number of cases will be observed in Pakistan, followed by India. The detailed forecast of confirmed cases for 14 days (starting 1 July 2020) is shown in Figure 2, where the blue line indicates the forecast value, the dark grey area shows the 95% confidence interval and the light grey area shows the 80% lower and upper bounds.
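For readers unfamiliar with how 80% and 95% bounds relate to a point forecast, the following Python sketch shows the usual normal-approximation calculation: point forecast plus or minus a quantile multiple of the forecast standard error. The function name `forecast_interval`, the point forecast and the standard error are hypothetical values chosen for illustration; they are not taken from Table 2, and the study's own intervals come from the fitted ARIMA models.

```python
from statistics import NormalDist


def forecast_interval(point, std_error, level):
    """Symmetric interval around a point forecast under a normal approximation.

    level is the coverage as a fraction, e.g. 0.80 or 0.95.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 for 80%, 1.96 for 95%
    return point - z * std_error, point + z * std_error


# Hypothetical point forecast and standard error (not the study's values).
point_forecast, std_error = 100.0, 10.0
lower_80, upper_80 = forecast_interval(point_forecast, std_error, 0.80)
lower_95, upper_95 = forecast_interval(point_forecast, std_error, 0.95)
print(round(lower_80, 2), round(upper_80, 2))
print(round(lower_95, 2), round(upper_95, 2))
```

The 95% interval always contains the 80% interval, which is why the two shaded bands in a forecast plot are nested.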
The ACF and PACF plots in Figure 3 and Figure 4 indicate that the residuals behave like white noise, as no significant autocorrelations are observed. A portmanteau test is applied to the residuals of all fitted ARIMA models in order to test the "overall" randomness based on several lags. The p-values of the Box-Pierce test also indicate that the residuals are white noise.
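A minimal Python sketch of the Box-Pierce portmanteau statistic, Q = n times the sum of squared sample autocorrelations over the first m lags, is given below. Under the null hypothesis of white noise, Q is approximately chi-square distributed, so a large p-value is consistent with white-noise residuals. The helper names are hypothetical, and the chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom, a restriction adopted here to stay within the standard library. When the test is applied to ARIMA residuals, the degrees of freedom are commonly reduced by the number of fitted AR and MA coefficients; that adjustment is omitted here. The example series is an artificial, strongly autocorrelated one, so the test rejects white noise.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def box_pierce_q(residuals, n_lags):
    """Box-Pierce statistic Q = n * sum of squared autocorrelations for lags 1..n_lags."""
    n = len(residuals)
    return n * sum(autocorrelation(residuals, k) ** 2 for k in range(1, n_lags + 1))


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Artificial alternating series: far from white noise.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]
q_stat = box_pierce_q(alternating, n_lags=2)
p_value = chi2_sf_even(q_stat, df=2)
print(round(q_stat, 2))  # 34.25
print(p_value < 0.05)    # True
```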

Discussion
The current study aimed to forecast the growth in COVID-19 cases in the sub-continent for the 14 days from 1 July 2020 to 14 July 2020. The subcontinent was selected because, among Asian countries, it is experiencing the highest growth rate in confirmed cases of COVID-19. The study is timely, as the sub-continent is regarded as the next epicenter of COVID-19, and it clearly identifies the situation of the region. The study provides governments with estimates of the increase in cases, helping them develop strategies to cope with the upcoming crises. Considering the predicted changes in the number of COVID-19 patients, the governments can improve their medical facilities and provide necessities to the people, especially those with less medical coverage. The governments need to redesign strategies such as smart lockdowns and relief funds to support the poor. The 14-day forecast for the three countries of the region shows that the number of cases will grow steadily in Pakistan and India but will remain constant in Bangladesh. The result indicates a very serious disruption and crisis situation for the subcontinent, and governments must deal with the health emergency as well as economic issues. More specifically, an increasing trend is predicted for the next 14 days, with Pakistan having an average of 10,992 additional cases and India an average of 10,621, while Bangladesh remains at a constant level of 5,974 confirmed cases. Multiple studies have reported predictions of the number of confirmed COVID-19 cases in different regions, and the findings of this study are comparable with them. For instance, the literature reports a large number of unreported cases [34], predicted case counts close to those officially reported [35], a predicted peak in February 2020 in China [36], and a flat pattern during June 2020 in Iran [37].
The isolation wards in hospitals need to be expanded and provided with all kinds of medical facilities, such as ventilators and personal protective equipment (PPE). Medical service providers, including nurses, paramedic staff, and doctors, must be provided with PPE to ensure their safety while dealing with patients. Additionally, policy formulation and its implementation in true spirit should be a focus, as these countries have to deal with both the COVID-19 health crisis and hunger due to poverty, so as to contain the adverse outcomes of the deadly coronavirus.

Conclusions
The current study presents predictions of COVID-19 confirmed cases for 14 days starting 1 July 2020. It uses the automatic ARIMA model to forecast the daily growth of COVID-19 cases, based on data for the sub-continent from 15 March 2020 to 30 June 2020. The results reveal that cases will grow in Pakistan and India and will remain constant in Bangladesh. The study can provide valuable information for policymakers to cope with the upcoming crises arising from the deadly consequences of the coronavirus. Hence, governments should devise counter-strategies to deal with economic issues and health emergencies, and provide the necessary support to frontline workers and the underprivileged community.