Coronavirus Pandemic Non-linear link between temperature difference and COVID-19: Excluding the effect of population density

Introduction: We examine the spatiotemporal patterns of Corona Virus Disease 2019 (COVID-19) in the United States and show that temperature difference (TD), with a cumulative hysteresis (lag) effect, significantly changes the number of daily new confirmed cases once the interference of population density is eliminated. Methodology: The nonlinear feature of daily new cases is captured with a Generalized Additive Mixed Model (GAMM) with threshold points. The exposure-response curve suggests that daily confirmed cases change differently at different stages of TD, as delimited by the threshold points of a piecewise function, which traces out the pattern of new cases under different meteorological conditions. Results: Confirmed cases decreased by 0.390% (95% CI: -0.478 ~ -0.302) for each 1°C increase in TD when TD was less than 11.5°C; they increased by 0.302% (95% CI: 0.215 ~ 0.388) for every 1°C increase in TD (lag0-4) in the interval [11.5, 16]; and newly confirmed COVID-19 cases increased by 0.321% (95% CI: 0.142 ~ 0.499) for every 1°C increase in TD (lag0-4) when TD (lag0-4) was over 16°C. The largest fluctuation occurred on Sunday. The sensitivity analysis confirmed that our model is robust. Conclusions: In the US, this interval effect of TD is a reminder that it is urgent to control the spread of COVID-19 as TD becomes greater in autumn and the ongoing winter.


Introduction
The outbreak of Corona Virus Disease 2019 (COVID-19), which poses a public health threat [1,2], had led to more than 93 million infections and 2.01 million deaths worldwide as of 17 January 2021 [3], including 23.34 million infections and 0.39 million deaths in the United States (US). Early studies have shown that COVID-19 is transmitted between individuals by direct contact or by droplets produced when coughing, sneezing, talking, or even singing [4]. Air pollution and population density have also been shown to be important factors in the transmission and survival of this coronavirus [5][6][7][8][9]. Meteorological changes in indoor and outdoor environments can affect human behavior, social interactions, and hygiene practices, which in turn may promote propagation among infected or susceptible people [10]. However, less is certain about the environmental conditions that drive the spatiotemporal patterns of COVID-19 in the US, where confirmed cases have increased dramatically day by day: since 26 June 2020 the number of daily new cases has exceeded 40 thousand, reaching up to 200 thousand on 20 December 2020. Some scientists conjecture that low-temperature regions favor the viability of COVID-19 [11][12][13]. Other researchers, however, find that daily average temperature is positively correlated with the number of daily new COVID-19 cases, or that the association between them is weak [14,15]. Here we model the effect of temperature difference on COVID-19 transmission using a moving average lag period of 7 days. We also identify the inflection point across all lag orders after excluding the impact of population density on newly confirmed cases. The nonlinear characteristics of daily confirmed cases are captured by a GAMM, which is well suited to spatiotemporal meteorological data. The influence of each meteorological parameter is measured independently by the model coefficients, and it is significant for the Daily New Cases (DNC).
A sensitivity analysis is carried out for this effect, and the consistency of its results indicates that our models are robust.

Data feature
Our case study covered nine main partitions, including all 50 states and Washington, D.C., in the US. The time span is from March 9, 2020 to January 10, 2021. The case data are extracted from the Center for Systems Science and Engineering at Johns Hopkins University, while the meteorological data are from the National Aeronautics and Space Administration. Figure 1 and Figure 2 show the cumulative confirmed cases and temperature differences in different states as of January 10, 2021. Darker shading on the heat map indicates larger values. Both cumulative confirmed cases of COVID-19 (Figure 1) and temperature changes (Figure 2) in the western and southern United States were significantly higher than in other regions.
The dependent variable is COVID-19 DNC. Population density is known to be an influencing factor for the increase in COVID-19 cases [16][17][18], with a correlation coefficient of 0.265. The standardized density coefficient for population density is derived by z-score. We eliminate the effect of population density by dividing the COVID-19 confirmed cases by this coefficient. The meteorological factors that might be associated with COVID-19 in the US are shown in Table 1, with their abbreviations given in the first column.

Variance Inflation Factors (VIF)
When multiple variables are analyzed, information often overlaps between variables. A collinearity test is a suitable way to check for this repeated information, using the VIF statistic given in (1):

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \qquad (1)$$

Here $R_j^2$ is the multiple coefficient of determination of the regression on the other independent variables with the $j$-th independent variable $x_j$ as the dependent variable. Usually, multicollinearity will seriously affect the least-squares estimation if the maximum VIF exceeds 10 [19,20].
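The VIF check described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' code: the function name `vif` and the synthetic variables `a`, `b`, `c` are hypothetical, and the standard definition VIF = 1 / (1 - R^2) is used, with each variable regressed on the others by ordinary least squares.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]                                  # variable j as the response
        others = np.delete(X, j, axis=1)             # remaining variables
        A = np.column_stack([np.ones(n), others])    # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic illustration only (not the study data).
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)   # nearly a copy of a
vifs_independent = vif(np.column_stack([a, b]))
vifs_collinear = vif(np.column_stack([a, b, c]))
```

With the near-duplicate column `c` included, the VIFs of `a` and `c` rise far above the threshold of 10 cited in the text, while `b` stays close to 1.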

Generalized additive mixed model (GAMM)
Linear regression is often used when looking for correlations between multiple variables, but the relationship is not always linear. Based on the VIF test, we found that the association between meteorological factors and COVID-19 cases is predominantly nonlinear. GAMM is used to explore the nonlinear relationship between weather factors and health conditions [21,22]. Since the effect of temperature may last for several days and the incubation period of COVID-19 varies from 1 to 14 days, we used GAMM to test the moving average hysteresis effect of temperature differences (lag0-4). The model is defined as follows:

$$\log(Y_{it} + 1) = \alpha + s(TD_{it}, df) + WEEK_t + STATE_i$$

where $Y_{it}$ denotes the number of new confirmed cases in area $i$ on day $t$ (plus 1 to avoid the logarithm of 0) [23]; $TD_{it}$ is the moving average of temperature difference (lag0-4); $\alpha$ is the intercept of the model; $WEEK_t$ and $STATE_i$ are indicator functions denoting the effects of week and state, which control for short-term fluctuations in time and for random effects across regions. $s(\cdot)$ is a natural spline function for smoothing, and its degree of freedom ($df$) is selected according to the Akaike information criterion (AIC) to avoid overfitting [24].
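The two data transformations mentioned above, the lag0-4 moving average of TD and the log(cases + 1) response, can be sketched as follows. This is a minimal sketch under assumptions: "lag0-4" is read here as the mean of the current day and the previous four days, and the helper names `lag_moving_average` and `log_response` are hypothetical. It does not fit the GAMM itself.

```python
import math

def lag_moving_average(values, max_lag=4):
    """Mean of values[t-max_lag .. t] for each day t; None until enough history exists."""
    out = []
    for t in range(len(values)):
        if t < max_lag:
            out.append(None)                      # not enough preceding days yet
        else:
            window = values[t - max_lag : t + 1]  # days t-max_lag .. t inclusive
            out.append(sum(window) / len(window))
    return out

def log_response(daily_new_cases):
    """log(cases + 1), so that days with zero new cases remain defined."""
    return [math.log(c + 1) for c in daily_new_cases]

# Made-up daily temperature differences (degrees C) for one area.
td = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
td_lag04 = lag_moving_average(td, max_lag=4)

# Made-up daily new case counts, including a zero-case day.
y = log_response([0, 9, 99])
```

In a panel of several states, the moving average would be computed separately within each state so that windows never span two regions.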

Descriptive analysis
Figure 3 shows the correlation coefficients between the meteorological variables. The heat map shows a strong correlation between RH and the four factors TD, ADR, PS and WS, which means that these four variables carry most of the information in RH. We therefore excluded RH from the modeling process in the following section to avoid overfitting.
TD is a significant factor affecting the transmission of COVID-19, but it is not clear in advance at which lag this association is most significant. Different lead times are considered because of the incubation period of COVID-19, and the average effect of TD over lags from 0 to 21 days is examined. The curve of this association is shown in Figure 4. It appears smooth, but an inflection point emerges at the 4th lag order.

The association with daily updates of COVID-19 and TD
We tested the variables in Figure 3 for multicollinearity. The VIF values show no collinear effect between them, which means there is little evidence of linear correlation among the factors. Using the GAMM, we found that when the moving average of TD is lagged by 4 days, the nonlinear relationship is robust and significant (p < 0.05).
Spatial differences also drew our attention. Nine representative states were selected according to the Census Bureau's Geographical Partition to detect these differences. The differences in the TD effect are shown in Figure 5 and Figure 6. They also confirm that population density has a significant influence on newly confirmed cases.
The nonlinear association is evident from the exposure-response curve in Figure 7, which shows that the trend of newly confirmed COVID-19 cases influenced by TD (lag 0-4) is divided into three stages (p < 0.05). Specifically, the relationship is approximately inversely linear when the temperature difference is less than 11.5°C or greater than 16°C, and positively linear when it is greater than 11.5°C and less than 16°C, which suggests that the double thresholds of temperature difference for newly confirmed COVID-19 cases can be set tentatively at 11.5°C and 16°C.
The association of COVID-19 cases with the combined effect of temperature difference and day of the week is shown in Figure 8. We looked for the most significant day of the week, and the surface shows that newly confirmed cases fluctuate across different ranges of TD and days of the week. The fluctuation is most pronounced on Sunday, with a maximum change of -0.264% (95% CI: -0.381 ~ -0.381). The two thresholds of 11.5°C and 16°C divide the effect of temperature difference into three parts. We chose a linear model to quantify the impact of TD in each part, which yields a piecewise linear regression model.
Our results show that the number of newly confirmed COVID-19 cases decreases by 0.390% (95% CI: -0.478 ~ -0.302) for every 1°C increase when TD (lag0-4) is lower than 11.5°C; it increases by 0.302% (95% CI: 0.215 ~ 0.388) for every 1°C increase of TD (lag0-4) in the interval [11.5, 16]; and it increases by 0.321% (95% CI: 0.142 ~ 0.499) for every 1°C increase when TD (lag0-4) is over 16°C (Figure 9). The X-axis is temperature difference, the Y-axis is the contribution of the smoother to the fitted value, and the red and blue dotted lines are the double thresholds of TD, 11.5°C and 16°C, respectively. In the piecewise linear regression, PS, ADR and WS, which are highly correlated with TD, are included as mixed effects on daily new COVID-19 cases (Figure 10). Their weak effect is shown by the coefficients in Figure 10: in all three stages of TD variation, the impact of these three variables is smaller than that of TD, as nearly all of their coefficients are below 0.1 and statistically significant.
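The piecewise idea above can be illustrated with a small sketch that fits one straight line per TD segment, using the thresholds 11.5°C and 16°C from the text. Several things here are assumptions rather than the authors' procedure: the sketch omits the PS, ADR and WS covariates and the week and state effects, the data are synthetic and noise-free, the helper names are hypothetical, and the conversion (exp(beta) - 1) x 100 is the usual reading of a slope on a log(cases + 1) response, which the source does not spell out.

```python
import math

THRESHOLDS = (11.5, 16.0)  # double thresholds of TD (lag0-4) taken from the text

def segment_of(td):
    """Return 0, 1 or 2 for TD below 11.5, within [11.5, 16], or above 16."""
    if td < THRESHOLDS[0]:
        return 0
    if td <= THRESHOLDS[1]:
        return 1
    return 2

def ols_slope(xs, ys):
    """Slope of a simple least-squares line of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def piecewise_slopes(td, log_cases):
    """Fit one line per TD segment and return the three slopes."""
    groups = {0: ([], []), 1: ([], []), 2: ([], [])}
    for x, y in zip(td, log_cases):
        xs, ys = groups[segment_of(x)]
        xs.append(x)
        ys.append(y)
    return [ols_slope(*groups[k]) for k in (0, 1, 2)]

def percent_change(beta):
    """Percent change in (cases + 1) per 1 degree C for a log(cases + 1) response."""
    return (math.exp(beta) - 1.0) * 100.0

# Synthetic, noise-free illustration (not the study data).
true_slopes = (-0.0039, 0.0030, 0.0032)
td = [5.0 + 0.1 * i for i in range(200)]

def synthetic_response(x):
    s = segment_of(x)
    anchor = (0.0, 11.5, 16.0)[s]   # arbitrary reference point per segment
    return 3.0 + true_slopes[s] * (x - anchor)

log_cases = [synthetic_response(x) for x in td]
slopes = piecewise_slopes(td, log_cases)
changes = [percent_change(b) for b in slopes]
```

Fitting each segment separately keeps the example short but allows jumps at the thresholds; a continuous alternative would use hinge terms such as max(TD - 11.5, 0) in a single regression.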

Sensitivity analysis
California was the hardest-hit state early in the US epidemic, with a much larger cumulative number of confirmed cases than other states. We therefore exclude it from the data set for the sensitivity test. Our results show that the nonlinear relationship remains both robust and significant when the moving average of TD is lagged by 4 days (p < 0.05) (Figure 11).
The number of newly confirmed COVID-19 cases decreases by 0.338% (95% CI: -0.428 ~ -0.249) for every 1°C increase if TD (lag0-4) is lower than the threshold of 11.5°C, increases by 0.322% (95% CI: 0.235 ~ 0.410) if TD (lag0-4) is greater than 11.5°C and less than 16°C, and increases by 0.004% (95% CI: -0.192 ~ 0.201) when TD is higher than 16°C (Figure 13). The association of COVID-19 cases with the combined effect of temperature difference and day of the week is shown in Figure 12. In addition, Figure 14 supports the conclusion that PS, ADR and WS have a weak impact on daily new COVID-19 cases.

Discussion
We focus on the association between environmental temperature differences and newly confirmed cases of COVID-19. The nonlinear relationship is measured by GAMM and remains significant in the sensitivity test. When the temperature difference is greater than 11.5°C and less than 16°C, the greater the TD, the more new cases of COVID-19. When the temperature difference is greater than 16°C, new COVID-19 cases increase further, indicating that a higher TD may not inhibit the spread of this novel coronavirus. If the TD is lower than 11.5°C, however, the relationship is inversely linear: the greater the temperature difference, the fewer new cases of COVID-19. External temperature changes affect the transmission mechanism of COVID-19 mainly in two ways: first, temperature affects the survival of the COVID-19 virus [25][26][27], as was confirmed in early studies on the SARS virus [28][29][30][31]; second, external temperature changes cause migration and mobility of the population [32][33][34]. Our study sheds some light on the nonlinear relationship between ambient temperature and newly confirmed cases of COVID-19, and suggests that the number of newly confirmed cases may increase daily without any public health interventions if the weather changes dramatically. Therefore, neither the public nor the government should ignore the link between temperature changes and the virus, and both should take preventive measures.
However, other factors may also affect the number of newly confirmed cases, and these will be the subject of our further research. First, public health intervention is an important factor in COVID-19 transmission. Second, our study only covered the United States; further verification is needed for other cities and regions.