Econometric model building
The latest literature on energy use, CO2 emissions and economic growth emphasizes that the dynamics of economic activity (technological innovation, human capital development, transportation, industrialization, trade openness, natural resources, green technologies) can determine fossil fuel energy use, CO2 emissions and economic growth14,32,34,35,36,37,41,44,45,51. Following these studies, this study develops four dynamic regression models to examine the impact of transportation infrastructure, power generation, and technological innovation on fossil fuel energy, fossil fuel energy intensity, CO2 emissions, and economic growth. Transportation infrastructure and power generation are included in the models because China is the world’s most populous country with the highest primary energy demand, and about 60% of fossil fuel energy (coal and oil) is used for power generation and transportation. Additionally, the United States is the largest producer and consumer of oil and natural gas, with most fossil fuel energy used for transportation and power generation. Both countries are using more fossil fuel energy for transportation and power generation, boosting economic growth at the expense of environmental health. Transport infrastructure is broken down into road, aviation and rail infrastructure to identify which mode uses more fossil fuel energy and pollutes the environment more. Thus, the following models of fossil fuel energy demand and fossil fuel energy intensity can be specified.
$$FFE_{i,t}=f\left(TI_{i,t}^{\alpha_1},\,ROT_{i,t}^{\alpha_2},\,RAT_{i,t}^{\alpha_3},\,AIT_{i,t}^{\alpha_4},\,EG_{i,t}^{\alpha_5}\right)$$
$$\ln FFE_{it}=\alpha_0+\alpha_{1it}\ln TI_{it}+\alpha_{2it}\ln ROT_{it}+\alpha_{3it}\ln RAT_{it}+\alpha_{4it}\ln AIT_{it}+\alpha_{5it}\ln EG_{it}+\mu_{it}$$
(1)
$$FFEI_{i,t}=f\left(TI_{i,t}^{\alpha_1},\,ROT_{i,t}^{\alpha_2},\,RAT_{i,t}^{\alpha_3},\,AIT_{i,t}^{\alpha_4},\,EG_{i,t}^{\alpha_5}\right)$$
$$\ln FFEI_{it}=\alpha_0+\alpha_{1it}\ln TI_{it}+\alpha_{2it}\ln ROT_{it}+\alpha_{3it}\ln RAT_{it}+\alpha_{4it}\ln AIT_{it}+\alpha_{5it}\ln EG_{it}+\mu_{it}$$
(2)
The sustainability of a country’s environmental and economic growth depends largely on its demand for fossil fuel energy and its energy efficiency. Higher economic growth can be achieved through increased energy efficiency, while environmental quality can be improved through the use of renewable rather than fossil fuel energy58. Fossil fuel energy use and higher energy intensity lead to higher environmental emissions and economic growth, requiring a shift from traditional fossil fuels to renewable energy sources29,59. Based on these studies, which show the relationship among fossil fuel energy, energy intensity, renewable energy use, carbon dioxide emissions and economic growth, the following two econometric models of carbon emissions and economic growth can be derived.
$$CO_{2,i,t}=f\left(TI_{i,t}^{\alpha_1},\,FFE_{i,t}^{\alpha_2},\,FFEI_{i,t}^{\alpha_3},\,RE_{i,t}^{\alpha_4}\right)$$
$$\ln CO_{2,it}=\alpha_0+\alpha_{1it}\ln TI_{it}+\alpha_{2it}\ln FFE_{it}+\alpha_{3it}\ln FFEI_{it}+\alpha_{4it}\ln RE_{it}+\mu_{it}$$
(3)
$$GDP_{i,t}=f\left(TI_{i,t}^{\alpha_1},\,FFE_{i,t}^{\alpha_2},\,FFEI_{i,t}^{\alpha_3},\,RE_{i,t}^{\alpha_4}\right)$$
$$\ln GDP_{it}=\alpha_0+\alpha_{1it}\ln TI_{it}+\alpha_{2it}\ln FFE_{it}+\alpha_{3it}\ln FFEI_{it}+\alpha_{4it}\ln RE_{it}+\mu_{it}$$
(4)
where FFE, RE and TI denote fossil fuel energy, renewable energy and technological innovation, respectively; ROT, RAT and AIT indicate road, rail and air transportation, respectively; EG is electricity generation, GDP is gross domestic product, CO2 is carbon dioxide emissions and FFEI is fossil fuel energy intensity. In the above equations, i, t, and \(\mu_{it}\) represent countries, time periods and random error terms, respectively.
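The step from the multiplicative form to the log-linear form in Eqs. (1) to (4) can be illustrated with a minimal sketch. It assumes synthetic data and hypothetical elasticity values (none of these numbers come from the study), and it uses plain OLS only to show that the exponents of the multiplicative model become the slope coefficients after taking logs; it is not the estimator used in this study.

```python
import numpy as np

def fit_loglinear(y, X):
    """OLS on logs: ln y = a0 + sum_k a_k ln X_k + mu.

    y: (n,) positive outcomes; X: (n, K) positive regressors.
    Returns [a0, a1, ..., aK]; the slopes are elasticities.
    """
    Z = np.column_stack([np.ones(len(y)), np.log(X)])
    coef, *_ = np.linalg.lstsq(Z, np.log(y), rcond=None)
    return coef

# Synthetic illustration only; the "true" values below are hypothetical.
rng = np.random.default_rng(0)
n = 500
alpha_true = np.array([0.5, 0.3, 0.2, -0.1, 0.15, 0.4])  # alpha_0 ... alpha_5
X = rng.uniform(1.0, 10.0, size=(n, 5))  # stand-ins for TI, ROT, RAT, AIT, EG
ln_y = alpha_true[0] + np.log(X) @ alpha_true[1:] + rng.normal(0.0, 0.05, n)
y = np.exp(ln_y)                         # stand-in for FFE

alpha_hat = fit_loglinear(y, X)
print(np.round(alpha_hat, 3))
```

Because every variable enters in logs, each estimated slope can be read as the percentage change in the dependent variable associated with a one percent change in the regressor.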
Depiction, measurement and retrieval sources of panel variable data
The main purpose of this study is to reveal the role of technological innovation, transportation infrastructure, and power generation in fossil fuel energy use and fossil fuel energy intensity, which are responsible for higher carbon dioxide emissions, and in economic growth in China and the United States. The data used for the analysis were taken from the World Development Indicators (WDI), World Bank and OECD for the period 1995–2020. Technological innovation (TI) is measured as the total number of patents, and fossil fuel energy consumption (FFE) is measured as a percentage of total energy use, as shown in Table 1. Transportation infrastructure (road, rail and air) is measured as investment as a percentage of GDP, GDP in 2015 US$, fossil fuel energy intensity in megajoules per $2017 PPP GDP, electricity generation (EG) in gigawatt hours, and renewable energy as a percentage of total final energy use. Fossil fuel energy intensity can be used as a proxy for fossil fuel energy efficiency because higher energy intensity is associated with higher productivity levels and vice versa.
Estimating techniques
Cross sectional dependency (CD) test
Due to the continuous development of globalization, economic integration, cross-border trade and trade openness, cross-sectional dependence (CD) is expected to exist in the panel regression over the period considered in this study. Thus, to enhance the accuracy and robustness of the estimates, it is crucial to detect cross-sectional dependence in the panel data and address its associated problems. Cross-sectional dependence can be assessed with the Breusch and Pagan60 LM test, the Pesaran et al.61 scaled LM test and the Pesaran62 CD test. These statistics, together with the bias-corrected scaled LM statistic, can be expressed as:
$$LM=\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}T_{ij}\widehat{\rho}_{ij}^{2}\to\chi^{2}_{\frac{N(N-1)}{2}}$$
(5)
$$LM_{s}=\sqrt{\frac{1}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left(T_{ij}\widehat{\rho}_{ij}^{2}-1\right)\to N(0,1)$$
(6)
$$CD_{P}=\sqrt{\frac{2}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\sqrt{T_{ij}}\,\widehat{\rho}_{ij}\to N(0,1)$$
(7)
$$LM_{BC}=\sqrt{\frac{1}{N(N-1)}}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\left(T_{ij}\widehat{\rho}_{ij}^{2}-1\right)-\frac{N}{2(T-1)}\to N(0,1)$$
(8)
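The four statistics in Eqs. (5) to (8) can be sketched as follows for a balanced panel, where every pair of units shares the same number of observations T. The function name `cd_statistics` and the input layout are assumptions made for illustration. The pairwise correlations are computed from mean-zero residuals of unit-by-unit regressions, and the CD statistic weights each correlation by the square root of T, which is the standard form of Pesaran's CD test.

```python
import numpy as np

def cd_statistics(resid):
    """Cross-sectional dependence statistics for a balanced panel.

    resid: (T, N) array of residuals, one column per cross-section unit.
    Returns a dict with LM, scaled LM, Pesaran CD and bias-corrected LM.
    """
    T, N = resid.shape
    corr = np.corrcoef(resid, rowvar=False)        # N x N correlation matrix
    rho = corr[np.triu_indices(N, k=1)]            # pairs with i < j only
    scale = np.sqrt(1.0 / (N * (N - 1)))

    lm = T * np.sum(rho ** 2)                      # Breusch-Pagan LM
    lm_s = scale * np.sum(T * rho ** 2 - 1.0)      # scaled LM
    cd_p = np.sqrt(2.0 / (N * (N - 1))) * np.sum(np.sqrt(T) * rho)  # Pesaran CD
    lm_bc = lm_s - N / (2.0 * (T - 1))             # bias-corrected scaled LM
    return {"LM": lm, "LM_s": lm_s, "CD_P": cd_p, "LM_BC": lm_bc}

# Synthetic residuals sharing a common shock, so dependence should show up.
rng = np.random.default_rng(0)
common = rng.normal(size=(26, 1))
resid = common + 0.5 * rng.normal(size=(26, 2))
print(cd_statistics(resid))
```

LM is compared with a chi-squared distribution with N(N-1)/2 degrees of freedom, while the other three statistics are compared with the standard normal distribution.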
Slope heterogeneity test
Likewise, assuming homogeneous slopes when the slopes are in fact heterogeneous may produce misleading results. Thus, the Pesaran and Yamagata63 slope heterogeneity (SH) test can be used to detect slope homogeneity or heterogeneity among cross-sections. The slope heterogeneity test was first proposed by Swamy64 and later improved by Pesaran and Yamagata63. The Pesaran and Yamagata test statistics are given by the following equations.
$$\check{\Delta}_{\mathrm{SH}}=(N)^{\frac{1}{2}}(2K)^{-\frac{1}{2}}\left(\frac{1}{N}\check{S}-K\right)$$
(9)
$$\check{\Delta}_{\mathrm{ASH}}=(N)^{\frac{1}{2}}\left(\frac{2K(T-K-1)}{T+1}\right)^{-\frac{1}{2}}\left(\frac{1}{N}\check{S}-K\right)$$
(10)
The Š in the above formulas represents the Swamy test statistic, and K is the number of explanatory variables. \(\check{\Delta}_{\mathrm{SH}}\) is the slope homogeneity statistic and \(\check{\Delta}_{\mathrm{ASH}}\) is its bias-adjusted version. For both statistics, the null hypothesis is that the slope coefficients are homogeneous, and the alternative hypothesis is that they are heterogeneous.
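Equations (9) and (10) can be sketched as follows. The helper names `swamy_dispersion` and `delta_statistics` are hypothetical. The dispersion statistic follows one common reading of the Pesaran and Yamagata construction: unit-specific slopes are compared with a weighted fixed-effects pooled slope, with error variances taken from the ordinary fixed-effects residuals. Treat it as an illustration under those assumptions rather than a reference implementation.

```python
import numpy as np

def swamy_dispersion(y, X):
    """Modified Swamy dispersion statistic S.

    y: (N, T) dependent variable; X: (N, T, K) regressors.
    """
    N, T, K = X.shape
    yd = y - y.mean(axis=1, keepdims=True)       # within-unit demeaning
    Xd = X - X.mean(axis=1, keepdims=True)
    # Pooled fixed-effects slope, used only for the variance estimates.
    b_fe = np.linalg.solve(np.einsum("itk,itl->kl", Xd, Xd),
                           np.einsum("itk,it->k", Xd, yd))
    A, c, parts = np.zeros((K, K)), np.zeros(K), []
    for i in range(N):
        Q = Xd[i].T @ Xd[i]
        r = Xd[i].T @ yd[i]
        e = yd[i] - Xd[i] @ b_fe
        s2 = e @ e / (T - 1)                     # unit-specific error variance
        parts.append((Q / s2, np.linalg.solve(Q, r)))  # weight, unit slope
        A += Q / s2
        c += r / s2
    b_wfe = np.linalg.solve(A, c)                # weighted pooled slope
    return sum((b - b_wfe) @ W @ (b - b_wfe) for W, b in parts)

def delta_statistics(S, N, K, T):
    """Delta (Eq. 9) and bias-adjusted delta (Eq. 10) from S."""
    core = np.sqrt(N) * (S / N - K)
    delta = core / np.sqrt(2.0 * K)
    delta_adj = core / np.sqrt(2.0 * K * (T - K - 1) / (T + 1))
    return delta, delta_adj

# Synthetic panel with clearly heterogeneous slopes.
rng = np.random.default_rng(0)
N, T, K = 6, 26, 2
X = rng.normal(size=(N, T, K))
slopes = rng.normal(1.0, 1.0, size=(N, K))
y = np.einsum("itk,ik->it", X, slopes) + 0.5 * rng.normal(size=(N, T))
S = swamy_dispersion(y, X)
print(delta_statistics(S, N, K, T))
```

Large positive values of either statistic, relative to the standard normal distribution, point towards rejecting slope homogeneity.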
Exploring unit root properties of panel variables
After establishing cross-sectional dependence and slope heterogeneity, it is essential to determine the unit root properties of each panel variable. The LLC and MW tests proposed by Levin et al.65 and Maddala and Wu66, known as first-generation unit root tests, can be used in the current study to examine the unit root properties of the panel variables. However, first-generation unit root tests may lead to spurious regression results when the panel data are cross-sectionally dependent67,68,69. This defect can be addressed by the second-generation CIPS unit root test and the cross-sectionally augmented Dickey-Fuller (CADF) test proposed by Pesaran70. These tests augment the standard ADF regression with the cross-sectional averages of the lagged levels and first differences of the individual series. The CADF regression takes the following form:
$$\Delta Z_{it}=\beta_i+b_iZ_{i,t-1}+c_i\bar{Z}_{t-1}+\sum_{j=0}^{p}d_{ij}\Delta\bar{Z}_{t-j}+\sum_{j=1}^{p}e_{ij}\Delta Z_{i,t-j}+\epsilon_{it}$$
(11)
where \(\bar{Z}_t\) is the cross-sectional average at time t.
The CIPS statistic is the cross-sectional average of the individual CADF statistics:
$$CIPS=N^{-1}\sum_{i=1}^{N}CADF_i$$
(12)
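A minimal sketch of the CADF regression and the CIPS average is given below. It assumes a balanced panel, an intercept only, and no lag augmentation (p = 0), so each unit's first difference is regressed on its own lagged level, the lagged cross-sectional average and the differenced cross-sectional average. The function names are hypothetical, and the resulting statistic must be compared with the critical values tabulated for the CIPS test, not with standard normal or Dickey-Fuller tables.

```python
import numpy as np

def cadf_tstat(z_i, zbar):
    """t-ratio on the lagged level in a CADF regression with p = 0."""
    dz = np.diff(z_i)
    X = np.column_stack([
        np.ones(len(dz)),   # intercept
        z_i[:-1],           # own lagged level (coefficient b_i)
        zbar[:-1],          # lagged cross-sectional average
        np.diff(zbar),      # differenced cross-sectional average
    ])
    coef, *_ = np.linalg.lstsq(X, dz, rcond=None)
    resid = dz - X @ coef
    sigma2 = resid @ resid / (len(dz) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

def cips(Z):
    """Z: (T, N) panel. Returns (CIPS statistic, individual CADF t-ratios)."""
    zbar = Z.mean(axis=1)
    t_stats = np.array([cadf_tstat(Z[:, i], zbar) for i in range(Z.shape[1])])
    return t_stats.mean(), t_stats

# Synthetic comparison: stationary series versus random walks.
rng = np.random.default_rng(0)
T, N = 100, 5
stationary = rng.normal(size=(T, 1)) + rng.normal(size=(T, N))
random_walk = np.cumsum(rng.normal(size=(T, N)), axis=0)
print(cips(stationary)[0], cips(random_walk)[0])
```

A strongly negative CIPS value is evidence against the unit root null, which is why the stationary panel in the example produces a much more negative statistic than the random-walk panel.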
Panel cointegration tests
After clarifying the unit root properties of each panel variable in the model, it becomes crucial to test for a cointegration relationship among the panel variables. Thus, the first-generation residual-based Pedroni71,72 cointegration tests can be applied to the fossil fuel energy use, fossil fuel energy intensity, economic growth, and CO2 emission functions for the United States and China. Under the null hypothesis, the variables in the heterogeneous panel are assumed not to be cointegrated in the long run. The Pedroni panel cointegration tests include the panel v-statistic (\(X_V\)), panel ADF-statistic (\(X_{ADF}\)), panel rho-statistic (\(X_r\)) and panel PP-statistic (\(X_{PP}\)), all of which are within-dimension statistics. The Pedroni group cointegration tests include the group rho-statistic, the group PP-statistic, and the group ADF-statistic, which are between-dimension statistics. When the residuals are integrated of order one, I(1), the null hypothesis (\(H_0\)) of no cointegration among the panel variables cannot be rejected. Conversely, when the residuals are stationary at level, I(0), the null hypothesis is rejected in favor of the alternative, indicating a long-run cointegration relationship among the panel variables. The Pedroni residual-based cointegration test is built on the following regression:
$$Y_{it}=\eta_i+\delta_i t+\sum_{j=1}^{K}\beta_{ji}Z_{jit}+\sigma_{it}$$
(13)
where \(\eta_i\) denotes the country-specific effect; \(\delta_i\) is the coefficient on the deterministic trend; Z denotes the regressors; j = 1, 2, …, K indexes the explanatory variables; i = 1, 2, …, N; and t = 1, 2, …, T. Estimation using first-generation methods may produce spurious results because the cross-sectional dependence and slope heterogeneity of the panel data are not taken into account. The second-generation cointegration test of Westerlund73 is a more appropriate approach, as it addresses cross-sectional dependence and slope heterogeneity in panel data74,75. The Westerlund error correction model (ECM) can be expressed as:
$$\Delta Z_{it}=\alpha_i^{\prime}d_t+\Psi_i\left(Z_{i,t-1}-\lambda_i^{\prime}X_{i,t-1}\right)+\sum_{j=1}^{q}\beta_{ij}\Delta Z_{i,t-j}+\sum_{j=0}^{q}\gamma_{ij}\Delta X_{i,t-j}+\varepsilon_{i,t}$$
(14)
where Z is the dependent variable, X is the vector of regressors, \(d_t=(1,t)^{\prime}\) contains the deterministic components (a constant and a linear trend), \(\alpha_i=(\alpha_{1i},\alpha_{2i})^{\prime}\) is the associated parameter vector, \(\Psi_i\) is the error-correction coefficient, and i and t index the cross-sections and time periods, respectively. The group-mean Westerlund test statistics are as follows:
$$G_t=\frac{1}{N}\sum_{i=1}^{N}\frac{\widehat{\Psi}_i}{S.E.\left(\widehat{\Psi}_i\right)}$$
(15)
$$G_a=\frac{1}{N}\sum_{i=1}^{N}\frac{T\widehat{\Psi}_i}{\widehat{\Psi}_i\left(1\right)}$$
(16)
The panel Westerlund test statistics, which are based on the pooled estimate \(\widehat{\Psi}\) of the error-correction coefficient, are as follows:
$$P_t=\frac{\widehat{\Psi}}{S.E.\left(\widehat{\Psi}\right)}$$
(17)
$$P_a=T\,\widehat{\Psi}$$
(18)
where \(\widehat{\Psi}_i\) is the estimated error-correction coefficient, that is, the speed of adjustment back to the long-run equilibrium after a short-run shock.
Long-term coefficient estimation method for panel variables
Among contemporary panel data estimation methods, the augmented mean group (AMG) and common correlated effects mean group (CCEMG) estimators are robust to cross-sectional dependence and slope heterogeneity, and can be used in this study after confirming the long-run cointegration of the panel variables. The CCEMG estimator was proposed by Pesaran70 and subsequently extended by Kapetanios et al.76 to make it more robust to structural breaks and non-stationary unobserved common factors. The CCEMG model can be expressed as follows:
$$X_{it}=\rho_{1i}+\alpha_iY_{it}+\beta_i\kappa_{it}+\mu_{it}$$
(19)
where \(\rho_{1i}\), \(\alpha_i\), \(\kappa_{it}\), \(\beta_i\) and \(\mu_{it}\) denote the time-invariant fixed effects that capture panel heterogeneity, the slope coefficients, the unobserved common factors, the heterogeneous factor loadings, and the random disturbance terms, respectively. Equation (19) can be augmented with the cross-sectional averages of the dependent variable, \(\bar{X}_t\), and of the explanatory variable, \(\bar{Y}_t\), as follows:
$$X_{it}=\rho_{1i}+\alpha_iY_{it}+\Psi_i\bar{X}_t+\lambda_i\bar{Y}_t+\beta_i\kappa_{it}+\mu_{it}$$
(20)
Under slope heterogeneity, the unit-specific estimates are averaged across the entire panel, and the CCEMG estimator is given by:
$$CCEMG=N^{-1}\sum_{i=1}^{N}\widehat{\theta}_i$$
(21)
where \(\widehat{\theta}_i\) denotes the unit-specific coefficient estimates.
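The CCEMG procedure in Eqs. (19) to (21) can be sketched as follows, assuming a balanced panel. Each unit's regression is augmented with the cross-sectional averages of the dependent and explanatory variables, and the unit-specific slopes are then averaged. The function name `ccemg` and the array layout are assumptions for illustration; `dep` plays the role of the left-hand-side variable in Eq. (19) and `reg` the role of its regressor. The standard error is the usual mean group one, based on the dispersion of the unit-specific slopes.

```python
import numpy as np

def ccemg(dep, reg):
    """Common correlated effects mean group estimator (sketch).

    dep: (N, T) dependent variable; reg: (N, T, K) regressors.
    Returns (mean group slopes, their standard errors, unit-specific slopes).
    """
    N, T, K = reg.shape
    dep_bar = dep.mean(axis=0)            # (T,) cross-sectional average
    reg_bar = reg.mean(axis=0)            # (T, K) cross-sectional averages
    slopes = np.empty((N, K))
    for i in range(N):
        # Intercept, own regressors, then the cross-sectional averages that
        # proxy for the unobserved common factors.
        W = np.column_stack([np.ones(T), reg[i], dep_bar, reg_bar])
        coef, *_ = np.linalg.lstsq(W, dep[i], rcond=None)
        slopes[i] = coef[1:1 + K]
    mg = slopes.mean(axis=0)
    se = slopes.std(axis=0, ddof=1) / np.sqrt(N)
    return mg, se, slopes

# Synthetic panel with one common factor f_t that drives both variables.
rng = np.random.default_rng(0)
N, T = 20, 50
f = rng.normal(size=T)
b = 1.0 + 0.1 * rng.normal(size=N)                  # heterogeneous slopes
g, l = rng.uniform(0.5, 1.5, N), rng.uniform(0.5, 1.5, N)
x = g[:, None] * f + rng.normal(size=(N, T))
y = b[:, None] * x + l[:, None] * f + 0.3 * rng.normal(size=(N, T))
mg, se, slopes = ccemg(y, x[:, :, None])
print(mg, se, b.mean())
```

With only a few cross-sections, the cross-sectional averages are noisy proxies for the common factors, so the sketch is more informative on simulated panels with larger N than on a two-country panel.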
Another robust estimator, the AMG approach, was proposed by Eberhardt and Teal77 and Bond and Eberhardt78; like CCEMG, it allows for cross-sectional dependence and slope heterogeneity. The two approaches differ only in how they handle the unobserved common dynamics. The CCEMG approach accounts for cross-sectional dependence through the cross-sectional averages of the dependent and independent variables. The AMG technique follows a two-stage procedure that first estimates the unobserved common dynamic process and then adds it to the regression to account for cross-sectional correlation. In the first stage, the equation is estimated by first-difference least squares augmented with time dummies:
$$\Delta Y_{it}=k_i+\beta_i\Delta X_{it}+\lambda_if_t+\sum_{t=2}^{T}\rho_iDUMMY_t+\mu_{it}$$
(22)
In the above first-stage equation, Δ is the first-difference operator and \(\rho_i\) is the time-dummy coefficient, which represents the common dynamic process. In the second stage, this estimated process is added to each group-specific regression as an explicit variable. Like the CCEMG approach, the AMG approach averages each group-specific parameter across the panel and includes an intercept that reflects time-invariant fixed effects. The AMG estimator can be written as:
$$\widehat{\beta}_{AMG}=N^{-1}\sum_{i=1}^{N}\widehat{\beta}_i$$
(23)
where \(\widehat{\beta}_i\) is the group-specific slope estimate, which can be obtained from the following regression:
$$\Delta Y_{it}=k_i+\beta_i\Delta X_{it}+\lambda_if_t+\sum_{t=2}^{T}\rho_iDUMMY_t+\mu_{it}$$
(24)
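The two-stage AMG idea can be sketched as follows, under stated assumptions: a balanced panel, a pooled first-difference regression with one dummy per differenced period in stage one, and unit-specific level regressions without a linear trend in stage two. The stage-one period effects are cumulated to express the common dynamic process in levels (normalized to zero in the first period), which is one common way of implementing the procedure. The function name `amg` and the array layout are hypothetical.

```python
import numpy as np

def amg(dep, reg):
    """Augmented mean group estimator (two-stage sketch).

    dep: (N, T) dependent variable; reg: (N, T, K) regressors.
    Returns (AMG slopes, unit-specific slopes, common dynamic process).
    """
    N, T, K = reg.shape
    d_dep = np.diff(dep, axis=1).reshape(-1)               # unit-major stacking
    d_reg = np.diff(reg, axis=1).reshape(N * (T - 1), K)
    # Stage 1: pooled first-difference OLS with a full set of period dummies.
    dummies = np.tile(np.eye(T - 1), (N, 1))
    coef, *_ = np.linalg.lstsq(np.hstack([d_reg, dummies]), d_dep, rcond=None)
    period_effects = coef[K:]
    # Cumulate the period effects to obtain the process in levels.
    mu = np.concatenate([[0.0], np.cumsum(period_effects)])
    # Stage 2: unit-by-unit level regressions including the common process.
    slopes = np.empty((N, K))
    for i in range(N):
        W = np.column_stack([np.ones(T), reg[i], mu])
        c, *_ = np.linalg.lstsq(W, dep[i], rcond=None)
        slopes[i] = c[1:1 + K]
    return slopes.mean(axis=0), slopes, mu

# Synthetic panel with a non-stationary common factor.
rng = np.random.default_rng(0)
N, T = 20, 40
f = np.cumsum(rng.normal(size=T))
b = 1.0 + 0.1 * rng.normal(size=N)
g, l = rng.uniform(0.5, 1.5, N), rng.uniform(0.5, 1.5, N)
x = g[:, None] * f + rng.normal(size=(N, T))
y = b[:, None] * x + l[:, None] * f + 0.3 * rng.normal(size=(N, T))
beta_amg, slopes, mu = amg(y, x[:, :, None])
print(beta_amg, b.mean())
```

Unlike CCEMG, which treats the common factors as a nuisance, this approach yields an explicit estimate of the common dynamic process (`mu`) that can be inspected on its own.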
Finally, the fully modified ordinary least squares (FMOLS) method, which corrects for serial correlation, heteroskedasticity, and endogeneity, can be used to check the robustness of the analysis. The functional form of the FMOLS estimator is:
$$\widehat{\beta}_{FMOLS}=\frac{1}{N}\sum_{i=1}^{N}\left[\left(\sum_{t=1}^{T}\left(Z_{it}-\bar{Z}_i\right)^{2}\right)^{-1}\left(\sum_{t=1}^{T}\left(Z_{it}-\bar{Z}_i\right)\widehat{EFP}_{it}-T\widehat{\delta}_i\right)\right]$$
(25)
Dumitrescu and Hurlin (DH) non-causality test
The AMG, FMOLS, and CCEMG methods provide long-run coefficient estimates but cannot identify the direction of causality between the variables of interest. The detection of bilateral causal relationships can provide effective strategies for addressing social, economic and environmental issues79,80. The Dumitrescu and Hurlin (DH) non-causality test allows for cross-sectional dependence and slope heterogeneity in panel data and is therefore referred to as a second-generation test. Thus, this study uses the DH non-causality test to reveal the direction of causality between the panel variables of interest. Even with a small sample size, this approach can produce reliable findings. The DH panel non-causality test is expressed as:
$$Y_{it}=\alpha_i+\sum_{j=1}^{J}\lambda_i^{j}X_{i,t-j}+\sum_{j=1}^{J}\beta_i^{j}Z_{i,t-j}+\mu_{it}$$
(26)
where X and Z are the explanatory variables, whose coefficients \(\lambda_i^{j}\) and \(\beta_i^{j}\), respectively, are estimated by OLS and are allowed to vary across cross-sections i. The DH causality test of the null hypothesis (H0) is based on the average Wald statistic shown below.
$$W_{N,T}^{HNC}=N^{-1}\sum_{i=1}^{N}W_{i,T}$$
(27)
\(W_{i,T}\) in the above equation is the Wald statistic of each cross-section. Eqs. (28) and (29) express the null and alternative hypotheses of the DH causality test.
$$H_0:\delta_i=0\quad\text{for }\forall\, i$$
(28)
$$H_1:\left\{\begin{array}{ll}\delta_i=0 & \text{for all } i=1,2,\dots,N_1\\ \delta_i\neq 0 & \text{for all } i=N_1+1,N_1+2,\dots,N\end{array}\right.$$
(29)
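The average Wald statistic in Eq. (27) can be sketched as follows. The sketch assumes the standard DH specification, in which each unit's regression contains lags of the dependent variable and lags of the candidate causal variable, and it tests whether the coefficients on the latter are jointly zero. It also reports the asymptotic standardization of the average Wald statistic, sqrt(N / 2J) (W - J), which is not stated in the text above. The function name `dh_test` and the array layout are assumptions.

```python
import numpy as np

def dh_test(y, x, lags=1):
    """Dumitrescu-Hurlin test of 'x does not Granger-cause y' (sketch).

    y, x: (N, T) balanced panels. Returns (average Wald statistic,
    standardized Z-bar statistic, unit-specific Wald statistics).
    """
    N, T = y.shape
    walds = np.empty(N)
    for i in range(N):
        target = y[i, lags:]
        y_lags = np.column_stack([y[i, lags - j:T - j] for j in range(1, lags + 1)])
        x_lags = np.column_stack([x[i, lags - j:T - j] for j in range(1, lags + 1)])
        Z = np.column_stack([np.ones(T - lags), y_lags, x_lags])
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        sigma2 = resid @ resid / (Z.shape[0] - Z.shape[1])
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        b = coef[1 + lags:]                    # coefficients on lagged x
        cov_b = cov[1 + lags:, 1 + lags:]
        walds[i] = b @ np.linalg.solve(cov_b, b)
    w_bar = walds.mean()
    z_bar = np.sqrt(N / (2.0 * lags)) * (w_bar - lags)
    return w_bar, z_bar, walds

# Synthetic panel in which x drives y with a one-period lag, but not vice versa.
rng = np.random.default_rng(0)
N, T = 10, 60
x = rng.normal(size=(N, T))
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = 0.5 * y[:, t - 1] + 0.8 * x[:, t - 1] + rng.normal(size=N)
w_bar, z_bar, walds = dh_test(y, x, lags=1)
print(w_bar, z_bar)
```

Running the function in both directions, `dh_test(y, x)` and `dh_test(x, y)`, indicates whether causality is unidirectional or bidirectional; a large positive standardized statistic leads to rejecting the null of homogeneous non-causality.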