key: cord-0074560-io24r9mc
authors: Fukushige, Mototsugu; Shi, Yingxin
title: Quantile regression approach for measuring production inefficiency with empirical application to the primary production sector for the Xinjiang Production and Construction Corps in China
date: 2022-02-06
journal: Asia-Pac J Reg Sci
DOI: 10.1007/s41685-022-00228-9
sha: dbc6b8fd69ea139da26224a07a1dce42186661fe
doc_id: 74560
cord_uid: io24r9mc

We propose a new method to measure production inefficiency by estimating the target and production technology of individual units using quantile regression. This method measures not only inefficiency in total factor productivity but also inefficiencies in input utilization. We also propose two methods for decomposing the estimated inefficiency. We apply the proposed method to measure the inefficiency of primary sector production for the Xinjiang Production and Construction Corps in China to clarify its usefulness and advantages. We specify the capital stock using the area sown and other inputs to estimate the production function with the restriction of constant returns-to-scale. Results indicate that lower labor inputs make production inefficient, and the inefficiency of labor utilization makes a large contribution to the mean and variance of total inefficiency. We also compare the proposed inefficiency measure to those employing corrected ordinary least squares and data envelopment analysis. The estimated efficiencies obtained are similar to those for existing methods. However, the proposed method provides additional advantages, including information on the inefficiencies in input utilization.

In this paper, we propose a new method for evaluating production inefficiencies. For measuring production inefficiency, we can draw on several useful references including, for example, Coelli et al. (2005), Fried et al. (2008), and Kumbhakar et al. (2015).
According to these researchers, most studies are based on two phases of production, production itself or the cost of production, and employ one of two methods, either parametric or nonparametric. After considering the various combinations of these phases and methods, they take one of four approaches. For instance, the estimation of production and cost functions and data envelopment analysis (DEA) are parametric and nonparametric methods, respectively. Of course, each approach has its own strengths and weaknesses. For example, DEA is practical and provides useful solutions to the improvement of efficiency in decision-making units (DMU) and measuring scale elasticity locally (Thanassoulis et al. 2008) . In this paper, however, we choose not to focus on nonparametric approaches like DEA, because it is difficult to use them to clarify the technological relationships between inputs and outputs. In other words, we cannot estimate the marginal productivities of inputs or the marginal costs of outputs, an important consideration in the economic analysis of production efficiency. In addition, in this paper, we do not consider the cost function approach as a parametric method. Most microeconomics texts explain that we derive the cost function through the profit maximization of the economic agent. Consequently, if economic agents do not maximize profit, it is difficult to assume the existence of a stable cost function. For the same reason, when we analyze the inefficiency of some kinds of public sector or nonprofit organizations, it is not possible to assume that agents maximize profits. Of course, we do not consider our chosen empirical study, being the Xinjiang Production and Construction Corps (XPCC) in China, as a profit-maximizing agency. In this context, the production function approach is one of the more general and robust methods as it directly represents the technological relationship between inputs and output. 
We can then define any inefficiencies using this relationship even if the production units do not maximize profit. However, we also question some of the assumptions entailed in the production function approach. For instance, when we employ the stochastic frontier function approach to estimating the production function and thereby obtain some stable (and sometimes efficient) production function, we need to include inefficiency terms following a probabilistic distribution, for example, Kumbhakar and Lovell (2003) . This means that we require information about the distribution of the error terms in estimating the parameters for the production function. Naturally enough, we question whether it is plausible to assume that an inefficient production unit has the same technology (production function) as the efficient production unit. If we cannot presume the parameters in the production function over production units, we should extend the production function approach to mutually different production functions over production units to loosen the constant parameters hypothesis. For instance, Berger and Humphrey (1991) consider the concept of a thick frontier, which they implement by separating the observations and estimating different cost functions for each quantile. By limiting the evaluation target to production inefficiencies and including some additional assumptions, the proposed method provides not only the inefficiencies in each DMU's total factor productivity, but also its inefficiencies in input utilization. In this paper, we first propose a method to measure production inefficiency. We then decompose it into the contributions of each of the inputs by estimating the target (efficient) production function and the production technology (function) of the ith unit. Next, we propose a method to estimate the target (efficient) production function and the production function of the ith unit by applying quantile regression. 
We subsequently apply this method to measure the inefficiency of primary sector production for the XPCC, constructing the capital stock using the area sown and specifying other inputs to demonstrate its practical usefulness and validity. By conducting this case study, we can shed light on the advantages and challenges of applying the proposed method to real production data from both technical and empirical viewpoints. Additionally, we compare the proposed inefficiency to measures obtained by corrected ordinary least squares (COLS) and DEA to investigate the characteristics of the proposed method by calculating the correlation coefficients and the estimated rankings. The remainder of the paper is structured as follows. Section 2 proposes the method used to estimate the inefficiency of the ith production unit and the two types of decomposition methods. Section 3 conducts the empirical analysis for the primary production sector function of the XPCC in China and compares the proposed inefficiencies to those obtained by COLS and DEA. Section 4 provides some concluding remarks.

2 Measuring the inefficiency of the ith production unit

We assume that the production function of the ith production unit is of Cobb-Douglas form and that each production unit has its own production technology. Of course, we can extend the production function to more complex functions like the constant elasticity of substitution (CES), translog, and others. However, in this paper, there are two justifications for adopting the Cobb-Douglas form. The first is that the Cobb-Douglas production function has been and remains the most popular production function employed by empirical researchers. For a historical assessment, see Biddle (2020). The second justification is that quantile regression estimation sometimes provides an unnatural result when we use it to estimate nonlinear functions.
For example, the regression lines for some quantiles cross each other in some local regions, as also noted by Hao and Naiman (2007). To overcome this problem, Kuosmanen and Zhou (2021) include shape constraints, while Cai and Xiao (2012), Goldman and Kaplan (2018), and Yu and Jones (1998) propose the use of a semiparametric or local linear model instead of simple linear regression. However, in this paper, we do not employ any of these complex approaches. Instead, we assume that each production unit faces a production function with different parameters. We then set the ith production unit's production function in log-linear form as follows:

$$\log Y_i = \alpha^{(i)} + \beta^{(i)} \log K_i + \gamma^{(i)} \log L_i + \varepsilon_i^{(i)}. \quad (1)$$

Here, we assume that the ith production unit has its own coefficients, which we represent with superscript (i). The estimation method for this is described later in this section. We set the coefficients of the production function for the target or efficient technology by adding the suffix + and estimate the ideal production level when the ith production unit adopts the target technology:

$$\log Y_i^{+} = \alpha^{+} + \beta^{+} \log K_i + \gamma^{+} \log L_i. \quad (2)$$

Of course, the ideal and actual levels of production differ. If we can estimate the target and ith unit's production technologies (i.e., parameters), we can easily define their technical differences. When the target technology is efficient ($\log Y_i^{+} > \log Y_i$), we decompose the ith unit's relative inefficiency into the inefficiencies of total factor productivity (TFP), capital utilization, and labor utilization as follows:

$$\log Y_i^{+} - \log Y_i = (\alpha^{+} - \alpha^{(i)}) + (\beta^{+} - \beta^{(i)}) \log K_i + (\gamma^{+} - \gamma^{(i)}) \log L_i - \varepsilon_i^{(i)}. \quad (3)$$

We then introduce an additional assumption that the ith unit's production technology explains nearly all of the ith unit's actual production and so the residuals in Eq. (1) are almost zero ($\varepsilon_i^{(i)} \approx 0$). If not, we include the residuals in the TFP inefficiency:

$$\begin{aligned} (\alpha^{+} - \alpha^{(i)}) - \varepsilon_i^{(i)} &: \text{Inefficiency of TFP}, \\ (\beta^{+} - \beta^{(i)}) \log K_i &: \text{Inefficiency of capital utilization}, \\ (\gamma^{+} - \gamma^{(i)}) \log L_i &: \text{Inefficiency of labor utilization}. \end{aligned} \quad (4)$$

These decomposed parts are in log-linear form, so we also define them in multiplicative form.
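As a minimal sketch of the log-linear decomposition described above, the following Python snippet splits the gap between the ideal and the actual log output into TFP, capital, and labor parts and checks that the parts add up. The function name, the dictionary keys (`alpha`, `beta`, `gamma`), and every numerical value are illustrative assumptions, not estimates from this paper.

```python
import math

def decompose_inefficiency(target, unit, K, L, residual=0.0):
    """Split log Y+ - log Y into TFP, capital, and labor parts.

    target, unit: dicts with hypothetical keys 'alpha' (intercept),
    'beta' (capital elasticity), and 'gamma' (labor elasticity).
    residual: the unit's own regression residual, folded into TFP.
    """
    tfp = target["alpha"] - unit["alpha"] - residual
    capital = (target["beta"] - unit["beta"]) * math.log(K)
    labor = (target["gamma"] - unit["gamma"]) * math.log(L)
    return {"tfp": tfp, "capital": capital, "labor": labor,
            "total": tfp + capital + labor}

# Made-up coefficients for the target technology and for one unit.
target = {"alpha": 1.2, "beta": 0.4, "gamma": 0.6}
unit = {"alpha": 1.0, "beta": 0.5, "gamma": 0.3}
K, L = 50.0, 20.0

parts = decompose_inefficiency(target, unit, K, L)

# Cross-check: the parts must sum to the direct gap in fitted log output.
log_y_target = (target["alpha"] + target["beta"] * math.log(K)
                + target["gamma"] * math.log(L))
log_y_unit = (unit["alpha"] + unit["beta"] * math.log(K)
              + unit["gamma"] * math.log(L))
gap = log_y_target - log_y_unit
```

In this toy case the capital part is negative because the unit's capital elasticity exceeds the target's, while the labor part is positive; a positive part counts toward inefficiency and a negative part offsets it.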
When we define the multiplicative form, we transform the parts into reciprocal form to indicate relative inefficiency to the ideal size of inputs. Writing the ith unit's coefficients as $(\alpha^{(i)}, \beta^{(i)}, \gamma^{(i)})$ and the target coefficients as $(\alpha^{+}, \beta^{+}, \gamma^{+})$, and taking the residual as zero, we have

$$\frac{Y_i}{Y_i^{+}} = \exp(\alpha^{(i)} - \alpha^{+}) \times K_i^{\beta^{(i)} - \beta^{+}} \times L_i^{\gamma^{(i)} - \gamma^{+}},$$

where the three factors are, in order, the inefficiency of TFP, the inefficiency of capital utilization, and the inefficiency of labor utilization.

Some much earlier economic development studies, including Hayami (1969) and Hayami and Ruttan (1970), assume constant returns-to-scale (CRS) technology and estimate the labor productivity equation. When we assume CRS, i.e., $\beta^{(i)} + \gamma^{(i)} = 1$, we incorporate this relation into Eq. (1) and rewrite it as follows:

$$\log \frac{Y_i}{L_i} = \alpha^{(i)} + \beta^{(i)} \log \frac{K_i}{L_i} + \varepsilon_i^{(i)}.$$

We define the decomposition of the total inefficiency of labor productivity as follows:

$$\log \left(\frac{Y_i}{L_i}\right)^{+} - \log \frac{Y_i}{L_i} = (\alpha^{+} - \alpha^{(i)}) + (\beta^{+} - \beta^{(i)}) \log \frac{K_i}{L_i} - \varepsilon_i^{(i)},$$

where we estimate target labor productivity using

$$\log \left(\frac{Y_i}{L_i}\right)^{+} = \alpha^{+} + \beta^{+} \log \frac{K_i}{L_i}.$$

These two decomposition methods are quite new, because the existing methods for estimating production (in)efficiencies, e.g., the frontier function approach, do not provide the inefficiencies of input utilization. At first sight, this resembles the improvement plan proposed in DEA. However, unlike DEA, the proposed method provides separate inefficiencies for each input utilization. When we have repeated observations of the actual production levels of the ith unit for different levels of input, we can easily apply OLS methods to estimate the ith unit's production technology. However, if we have only cross-sectional data, or panel data including possible technological change over time for each of the production units, it would seem difficult to estimate each unit's technology using a simple estimation method. In this paper, we propose a method to rectify this by utilizing a quantile regression approach. Prior to Eq. (1), we state that "…we assume each production unit has a production function with different coefficients."
However, we also introduce an additional assumption about each production unit: the production function parameters change gradually with the level of total inefficiency, so that we can estimate them using quantile regression. Koenker (2005) provides a discussion of quantile regression, a method usually applied when the error term includes heteroscedasticity with respect to the level of the dependent variable. We assume a linear regression model with an error term as follows:

$$y_i = X_i' \beta + \varepsilon_i.$$

When we estimate the OLS estimator, we minimize the following objective function:

$$\sum_{i=1}^{N} (y_i - X_i' \beta)^2.$$

For the quantile regression, we define the qth quantile and quantile point ($\xi_q$) for the dependent variable by

$$F_y(\xi_q) = q,$$

where $F_y$ is the distribution function of $y_i$. Of course, the inverse function of $F_y$ implies that

$$\xi_q = F_y^{-1}(q).$$

Assuming $y_i = X_i' \beta + \varepsilon_i$, this can be rewritten in terms of the conditional probability as follows:

$$\Pr(y_i \le X_i' \beta_q \mid X_i) = q.$$

To estimate $\beta_q$ in a qth quantile, we fix q and minimize the following objective function with respect to $\beta_q$:

$$\sum_{i:\, y_i \ge X_i' \beta_q} q \, |y_i - X_i' \beta_q| + \sum_{i:\, y_i < X_i' \beta_q} (1 - q) \, |y_i - X_i' \beta_q|.$$

This yields a variation of the least absolute deviation estimation. In this paper, we focus on an assumption introduced earlier that the ith unit's production technology explains almost all the ith unit's actual production and the residuals in Eq. (1) are almost zero ($\varepsilon_i^{(i)} \approx 0$). The actual estimation method is as follows. We first estimate Eq. (1) using quantile regression by changing q from 0.005 to 0.995 in increments of 0.005. For each q, the fitted qth quantile regression for the ith DMU is

$$\log Y_i = \hat{\alpha}_{(q)} + \hat{\beta}_{(q)} \log K_i + \hat{\gamma}_{(q)} \log L_i + \varepsilon_{i(q)}.$$

Among the estimated residuals ($\varepsilon_{i(q)}$), we then search for the optimal q ($q^{(i)}$) for each ith production unit to minimize the absolute value of the residual:

$$q^{(i)} = \arg\min_{q} \, |\varepsilon_{i(q)}|. \quad (7)$$

Having identified the optimal q ($q^{(i)}$) for each production unit, we consider that the estimated $q^{(i)}$th quantile regression represents the ith production unit's production technology:

$$(\alpha^{(i)}, \beta^{(i)}, \gamma^{(i)}) = (\hat{\alpha}_{(q^{(i)})}, \hat{\beta}_{(q^{(i)})}, \hat{\gamma}_{(q^{(i)})}).$$

By the definition of $q^{(i)}$, we consider the residual for the ith production unit as being almost zero, such that $\varepsilon_{i(q^{(i)})} \approx 0$.
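The two building blocks of the procedure above, the asymmetric absolute loss minimized by quantile regression and the search over a grid of quantiles for the one whose fitted line passes closest to a given unit, can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the residuals are made-up numbers, and no actual quantile regression is fitted here (in practice the fits for each q would come from a dedicated routine).

```python
def check_loss(residual, q):
    """Asymmetric absolute loss of quantile regression at quantile q."""
    return q * residual if residual >= 0 else (q - 1.0) * residual

def total_check_loss(y, fitted, q):
    """Objective value: sum of check losses over all observations."""
    return sum(check_loss(yi - fi, q) for yi, fi in zip(y, fitted))

def quantile_grid(start=0.005, stop=0.995, step=0.005):
    """Grid of quantiles from start to stop (inclusive) in equal steps."""
    n_steps = round((stop - start) / step)
    return [round(start + k * step, 3) for k in range(n_steps + 1)]

def select_unit_quantile(residuals_by_q):
    """Return the q whose fitted line leaves the smallest |residual|."""
    return min(residuals_by_q, key=lambda q: abs(residuals_by_q[q]))

grid = quantile_grid()  # 0.005, 0.010, ..., 0.995

# Intercept-only illustration: at q = 0.5 the constant that minimizes
# the total check loss over the candidate values is the sample median.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
best_constant = min(y, key=lambda c: total_check_loss(y, [c] * len(y), 0.5))

# Hypothetical residuals of one unit under three fitted quantile lines.
unit_residuals = {0.25: 0.30, 0.50: 0.04, 0.75: -0.21}
q_i = select_unit_quantile(unit_residuals)
```

With the grid from 0.005 to 0.995 the search runs over 199 fitted regressions per unit, which is cheap for a cross-section of a few hundred observations.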
In other words, we can consider this optimal $q^{(i)}$ as a measure of production inefficiency for the ith production unit, with a lower $q^{(i)}$ indicating a unit further below the target technology. As for the target production technology, there are several methods available to estimate the target (or efficient) technology. These include, for example, the COLS, frontier function, and stochastic frontier function when estimating the Cobb-Douglas production function. In this paper, we propose a method that applies a quantile regression approach to this same problem. We consider that a certain higher quantile regression represents an efficient technology, for example, $q = q^{+} = 0.9$ or 0.95. Of course, this setting involves some arbitrariness in selecting the target quantile ($q^{+}$). In practical terms, we should consider the size of the sample and fix q at some higher point that still yields stable and reasonable empirical implications. We discuss this problem later. When we set a quantile for the target technology as $q = q^{+}$, then the ideal level of the ith production unit is defined as

$$\log Y_i^{+} = \hat{\alpha}_{(q^{+})} + \hat{\beta}_{(q^{+})} \log K_i + \hat{\gamma}_{(q^{+})} \log L_i.$$

Using the notation introduced above, we rewrite Eq. (3) as

$$\log Y_i^{+} - \log Y_i = (\hat{\alpha}_{(q^{+})} - \hat{\alpha}_{(q^{(i)})}) + (\hat{\beta}_{(q^{+})} - \hat{\beta}_{(q^{(i)})}) \log K_i + (\hat{\gamma}_{(q^{+})} - \hat{\gamma}_{(q^{(i)})}) \log L_i - \varepsilon_{i(q^{(i)})}. \quad (9)$$

In this decomposition, we include $\varepsilon_{i(q^{(i)})}$, but the actual estimated $\varepsilon_{i(q^{(i)})}$ is near zero. As it is difficult to illustrate this decomposition using a two-dimensional graph in a two-input and one-output case, we illustrate it with a one-input ($x_i$) and one-output ($y_i$) case in Fig. 1. After estimating the target production function and the production function of the ith production unit, we translate the production function of the ith production unit so that it meets the y-intercept of the estimated target production function. We then define the inefficiency of TFP as the difference between the estimated y-intercept and the translated y-intercept of the production function of the ith production unit.
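We also define the inefficiency of input utilization as the gap between the fitted production levels of the target production function and the translated production function of the ith production unit, with the input level fixed at the ith production unit's level of input. This decomposition provides richer information about the ith unit's production inefficiency.

[Fig. 1 labels recovered from the figure: ith unit's production function; inefficiency of TFP; $x_i$: ith unit's input level]

When Eq. (3) holds for every observation, we obtain two standard decompositions. Write $D_i = \log Y_i^{+} - \log Y_i$ for the total inefficiency and $D_i^{TFP}$, $D_i^{K}$, and $D_i^{L}$ for the TFP, capital utilization, and labor utilization parts, so that $D_i = D_i^{TFP} + D_i^{K} + D_i^{L}$. The first is the mean decomposition:

$$\mathrm{Mean}(D_i) = \mathrm{Mean}(D_i^{TFP}) + \mathrm{Mean}(D_i^{K}) + \mathrm{Mean}(D_i^{L}),$$

where $\mathrm{Mean}(x_i) = \frac{1}{N} \sum_{i=1}^{N} x_i$ is an operator to calculate the arithmetic mean of $x_i$ and N is the total number of observations. We also decompose the relative contribution to the average inefficiencies:

$$1 = \frac{\mathrm{Mean}(D_i^{TFP})}{\mathrm{Mean}(D_i)} + \frac{\mathrm{Mean}(D_i^{K})}{\mathrm{Mean}(D_i)} + \frac{\mathrm{Mean}(D_i^{L})}{\mathrm{Mean}(D_i)}.$$

In the case of CRS, the total is the labor productivity inefficiency and this decomposition has only a TFP part and a capital part evaluated at $\log (K_i / L_i)$. This provides the contribution of the inputs through their marginal productivities. If the relative input level is less than the optimal level, the marginal productivities become higher and the contributions to the inefficiencies are negative. The second is the variance decomposition:

$$\mathrm{Var}(D_i) = \mathrm{Cov}(D_i^{TFP}, D_i) + \mathrm{Cov}(D_i^{K}, D_i) + \mathrm{Cov}(D_i^{L}, D_i),$$

where $\mathrm{Var}(x_i)$ and $\mathrm{Cov}(x_i, y_i)$ are operators for calculating the variance of $x_i$ and the covariance of $x_i$ and $y_i$, respectively. We also decompose the relative contribution to the variance of the inefficiencies:

$$1 = \frac{\mathrm{Cov}(D_i^{TFP}, D_i)}{\mathrm{Var}(D_i)} + \frac{\mathrm{Cov}(D_i^{K}, D_i)}{\mathrm{Var}(D_i)} + \frac{\mathrm{Cov}(D_i^{L}, D_i)}{\mathrm{Var}(D_i)}.$$

This decomposition is like the one Asdrubali et al. (1996) proposed for decomposing the degree of risk sharing. As for the mean decomposition, this shows the variations in the marginal productivities. If we can assume profit maximization for each unit facing the same factor prices, the marginal productivities of inputs are equal across units. Any variation over production units implies that these production units either do not maximize their profits or face different factor prices. These decompositions are also an advantage of the method proposed in this paper. We apply these decompositions to an actual example in the following section. Before proceeding to the empirical case, we discuss the proposed inefficiency measure and how it differs from other measures using quantile regression.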
In empirical research, quantile regression is used when heteroskedastic error terms clarify the relationship between the regression line and heteroscedasticity. Most studies then follow a typical quantile approach and discuss the changes in the estimated parameters and their statistical significance. See, for example, Dimelis and Louri (2002) , Ito (2004) , and Yasar et al. (2006) . However, some studies already utilize quantile regression for efficiency measurement as in this paper. For example, Bernini et al. (2004) , Wang et al. (2008) , and Behr (2010) evaluate unit efficiencies using the distance from the estimated quantile regression line. However, most of these earlier studies assume that the inefficiencies are defined as errors or discrepancies from the optimal or most efficient case as estimated by quantile regression. In our proposed inefficiency measure, given the definition of q (i) in Eq. (7) and the inefficiencies in Eq. (9) or their corresponding CRS cases and assuming that each observation is on its corresponding regression line, we regard that each production unit employs a different production technology. In other words, we consider that the sources of each unit's inefficiencies arise in the technology adopted by each unit. We then use quantile regression technique to estimate the technology that each production unit employs. Several studies explore the productivity and growth of either local or agricultural production in China. For example, Arayama and Miyoshi (2004) analyze regional labor productivity and TFP by estimating a production function, while Watanabe and Tanaka (2007) assess regional industry-level efficiencies using DEA. Elsewhere, Fan (1991) , Fan and Pardey (1997) , and Fan and Zhang (2004) investigate the productivity growth of the regional agricultural sector using several alternative approaches, and Lin (1992) 2012) consider TFP using firm-level data to investigate the sources of productivity growth. 
However, to date, there is no empirical investigation concerning the economic activities of the XPCC and its primary sector. The XPCC was created to promote economic development, ensure social stability, facilitate ethnic harmonization, and consolidate border defense in the Xinjiang Uygur Autonomous Region of China. Becquelin (2000, 2004) and McMillen (1981) provide useful discussions of the history of the Xinjiang Uygur Autonomous Region and the role of the XPCC. Given the organization's paramilitary character and that it is administered by the Chinese central government and the Xinjiang Uygur Autonomous Region, we consider that the farms/ranches under each division of the XPCC are not profit-maximizing production units. These types of organizations sometimes adopt inefficient technologies in producing outputs, but the technology available for each farm/ranch may differ, because the Xinjiang Uygur Autonomous Region encompasses a very large area (about 1.6 million km²) and is remote from the central government. Accordingly, when we attempt to investigate the production inefficiencies of the primary sector in these local production units, we consider that the method proposed earlier is one of the most suitable estimation methods available. Primarily, as our proposed method applies to the production phase, we do not require a profit-maximization assumption. For this reason, the approach will be applicable to the inefficiency measurement of production units operating in a similar context to the primary sectors in the XPCC. We obtain all data from the 2016 Statistical Yearbook for the XPCC. Table 1 lists the XPCC's divisions and farms (and ranches), comprising some 143 farms or ranches across 13 divisions. Unfortunately, COVID-19 has prevented us from accessing more recent data and limited data availability to older yearbooks.
While we estimate a Cobb-Douglas-type production function, we do not have data for one of its factors, the capital stock ($K_i$). In a similar situation, Fan and Pardey (1997) considered several inputs, including land, fertilizer, and power, in the agricultural sector instead of capital. We obtain the data for total sown area ($A\_sown_i$), planted area of fruits ($A\_fruit_i$), and the number of livestock ($N\_livestock_i$) from the yearbook. In this paper, we propose a composite variable using these variables as a proxy for the missing capital input as follows:

$$K_i = A\_sown_i + w_1 \cdot A\_fruit_i + w_2 \cdot N\_livestock_i,$$

where $w_1$ and $w_2$ are the weighting coefficients relative to the total sown area. We also estimate these two weighting coefficients when estimating the quantile regression equations for each unit. Some previous studies of primary sector production specify the sown area, for example, Lin (1992), Wan and Cheng (2001), and Deininger et al. (2014). However, they do not consider the composite input of total sown area, the planted areas of fruit, and the number of livestock as the capital input. In addition to capital, we specify the amount of electricity used by the primary sector as an additional production input. Modifying Eq. (1) in Sect. 2.1, we estimate the following equation:

$$\log Y\_prm_i = \beta_0 + \beta_1 \log Electricity_i + \beta_2 \log L\_prm_i + \beta_3 \log (A\_sown_i + w_1 \cdot A\_fruit_i + w_2 \cdot N\_livestock_i) + \varepsilon_i, \quad (12)$$

where $Y\_prm_i$ and $L\_prm_i$ are variables corresponding to $Y_i$ and $L_i$ in Eq. (1) and $Electricity_i$ is an additional input. Table 2 provides the variable definitions and Table 3 their summary statistics.

To analyze the changes in the estimated parameters according to q, Table 4 details the estimation results for the quantile regression for some representative quantiles, q = 0.1, 0.25, 0.5, 0.75, and 0.9. With q = 0.1 and 0.25, the estimated coefficients for the labor input are not statistically significant. This could imply that the labor input is used to produce less efficient crops or that there is a shortage of capital inputs. In other words, the primary sector includes additional labor as hidden unemployment. Regarding hidden unemployment, Lewis (1954) first suggested its presence in developing economies with unlimited supplies of labor, while Dernburg and Strand (1966) later proposed several ways to measure it. Minami and Ma (2010) demonstrate that even at the national level, China had not reached Lewis' turning point, the level of income at which hidden unemployment is eliminated. When we fix q = 0.50, 0.75, and 0.90, the estimated coefficients for electricity are also not statistically significant. This suggests that there is an overuse of electricity in efficient units. All the other estimated coefficients are statistically significant. The estimated weights of composite capital ($w_1$ and $w_2$) change according to the change in quantile (q). This suggests differences in utilizing the sown area, the planted area of fruits, and livestock as the capital stock between efficient and inefficient crops. With only five quantile levels, namely q = 0.1, 0.25, 0.5, 0.75, and 0.9, we cannot discern how the estimated coefficients change. Figure 2 therefore depicts all the estimated coefficients and the band of two standard errors (±2 * sigma) between q = 0.005 and 0.995, with the standard errors estimated using the bootstrap method with 400 replications. This type of figure is also prepared by Dimelis and Louri (2002) and Ito (2004). As shown, most of the coefficients are quite stable and gradually change between q = 0.1 and 0.9, whereas from q = 0.005 to 0.1 and from q = 0.9 to 0.995, there is substantial instability in the regression estimates. For example, the estimated $\beta_1$ between q = 0.95 and 0.995 and $\beta_2$ between q = 0.005 and 0.1 are volatile and their bands of two standard errors are relatively wide. Later, in Fig. 5, we observe similar instabilities in the estimated parameters between q = 0.005 and 0.1 and between q = 0.95 and 0.995. This is another practical problem in quantile regression estimation like those discussed in Sect. 2.1. Except for $\beta_1$ and $\beta_2$ for some quantiles, most of the coefficients are positive and statistically significant. Of course, where the quantiles are either very low or very high, the estimated coefficients can be unstable, because there are relatively few observations with negative or positive residuals, respectively. The broad band of two standard errors and the extreme changes in the level of the estimated coefficients in Fig. 2 imply that we cannot estimate stable regression lines for q = 0.995 or q = 0.99 because very few observations have positive residuals. We consider the estimated production function using quantile regression with q = 0.90 as the targeted production technology. This empirical example shows that we cannot estimate stable results for higher quantiles with only 144 observations in this case.

The estimated production function is not of a simple Cobb-Douglas form, because the composite capital weights also differ between the target and the ith unit, so we modify Eq. (4) as follows:

$$\begin{aligned} (\beta_0^{+} - \beta_0^{(i)}) - \varepsilon_i^{(i)} &: \text{Inefficiency of TFP}, \\ (\beta_1^{+} - \beta_1^{(i)}) \log Electricity_i &: \text{Inefficiency of electricity utilization}, \\ (\beta_2^{+} - \beta_2^{(i)}) \log L\_prm_i &: \text{Inefficiency of labor utilization}, \\ \beta_3^{+} \log (A\_sown_i + w_1^{+} \cdot A\_fruit_i + w_2^{+} \cdot N\_livestock_i) - \beta_3^{(i)} \log (A\_sown_i + w_1^{(i)} \cdot A\_fruit_i + w_2^{(i)} \cdot N\_livestock_i) &: \text{Inefficiency of capital utilization}. \end{aligned}$$

Table 5 details the decomposition of the means and variances. By definition, a positive (negative) value is inefficient (efficient). Table 5 shows that the inefficiency of labor utilization dominates the level of total inefficiency, and its variation dominates the variation in total inefficiencies. The inefficiencies of TFP and electricity utilization reduce relative inefficiency in levels, and the latter also offsets the variation in total inefficiency. This suggests that the labor input plays an important role in relative total inefficiencies in both level and variation.
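The mean and variance shares reported in a table such as Table 5 can be computed along the following lines. This is a minimal sketch: the function names are hypothetical, the per-unit numbers are made up for illustration, and the variance shares are assumed to be the covariance of each part with the total divided by the variance of the total, in the style of Asdrubali et al. (1996).

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def decomposition_shares(parts):
    """parts maps a component name to its per-unit inefficiency values.

    Returns (mean_shares, var_shares); each set of shares sums to one
    because the total is the sum of the components.
    """
    names = list(parts)
    n = len(parts[names[0]])
    total = [sum(parts[name][i] for name in names) for i in range(n)]
    mean_total = mean(total)
    var_total = cov(total, total)
    mean_shares = {name: mean(parts[name]) / mean_total for name in names}
    var_shares = {name: cov(parts[name], total) / var_total for name in names}
    return mean_shares, var_shares

# Made-up inefficiency components for four units (not data from the paper).
parts = {
    "tfp": [-0.10, 0.05, -0.20, 0.00],
    "electricity": [-0.05, -0.10, 0.02, -0.08],
    "labor": [0.40, 0.30, 0.55, 0.25],
    "capital": [0.10, 0.05, 0.15, 0.08],
}
mean_shares, var_shares = decomposition_shares(parts)
```

A share above one for one component, as for labor in this toy data, is possible when other components contribute negatively, which is the pattern the text describes for TFP and electricity utilization.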
Note that the decomposition approach differs from Kalirajan et al.'s (1996) decomposition of the changes in TFP for Chinese agricultural growth. Figure 3 illustrates the distribution of the estimated inefficiencies, where the vertical axis represents the number of observations. Because we fixed the targeted production function at q = 0.9, most of the estimated total inefficiencies are positive. However, most of the TFP and electricity utilization inefficiencies are negative. These results also show that the inefficiencies in TFP and electricity utilization reduce relative inefficiency in levels and that the latter reduces the variation in total inefficiency. Moreover, the labor and capital utilization inefficiencies are mostly positive, so these two inputs are the primary sources of the inefficiencies in each farm's production.

Wan and Cheng (2001) consider the effects of land fragmentation on the returns-to-scale for several agricultural crops using a translog production function. In this analysis, we first investigate the returns-to-scale for primary sector production. Figure 4 depicts the estimation results for $\beta_1 + \beta_2 + \beta_3$ with a band of two standard errors (±2 * sigma) to investigate whether the CRS hypothesis holds. As shown, for most quantiles, we cannot reject the CRS hypothesis, that is, $\beta_1 + \beta_2 + \beta_3 = 1$. We also report the test statistics for q = 0.1, 0.25, 0.50, 0.75, and 0.9, for none of which we can reject the null hypothesis. Consequently, after incorporating the assumption of CRS, we estimate the following per capita production function:

$$\log \frac{Y\_prm_i}{L\_prm_i} = \beta_0 + \beta_1 \log \frac{Electricity_i}{L\_prm_i} + \beta_3 \log \frac{A\_sown_i + w_1 \cdot A\_fruit_i + w_2 \cdot N\_livestock_i}{L\_prm_i} + \varepsilon_i. \quad (14)$$

Table 6 provides the estimation results of the quantile regression for the selected quantiles (q), q = 0.1, 0.25, 0.5, 0.75, and 0.9, and the result of the nonlinear least squares (NLS) method. We use this estimation result in Sect. 3.6. All the estimated coefficients are statistically significant at least at the 5% level. The estimated weights of composite capital ($w_1$ and $w_2$) change according to the quantile changes. We also observe that all the coefficients in Fig. 5 lie within the band of two standard errors between q = 0.005 and q = 0.995. Once again, these figures show that all the coefficients are quite stable and only gradually change between q = 0.1 and q = 0.9, and most are positive and statistically significant. Of course, in some cases, when the quantiles are very low or very high, the estimated coefficients are also unstable, because there are few observations with negative or positive residuals. Examples are the estimated $\beta_1$ between q = 0.005 and 0.1 and $\beta_3$ between q = 0.95 and 0.995.

We also consider the estimated production function obtained by quantile regression with q = 0.90 as the target production technology. As before, the estimated production function is not of a simple Cobb-Douglas form, so we modify Eq. (6) to decompose the total labor productivity inefficiencies as follows:

$$\begin{aligned} (\beta_0^{+} - \beta_0^{(i)}) - \varepsilon_i^{(i)} &: \text{Inefficiency of TFP}, \\ (\beta_1^{+} - \beta_1^{(i)}) \log \frac{Electricity_i}{L\_prm_i} &: \text{Inefficiency of electricity utilization}, \\ \beta_3^{+} \log \frac{A\_sown_i + w_1^{+} \cdot A\_fruit_i + w_2^{+} \cdot N\_livestock_i}{L\_prm_i} - \beta_3^{(i)} \log \frac{A\_sown_i + w_1^{(i)} \cdot A\_fruit_i + w_2^{(i)} \cdot N\_livestock_i}{L\_prm_i} &: \text{Inefficiency of capital equipment utilization}. \end{aligned}$$

Table 7 provides the mean and variance decompositions. As shown, the inefficiency of TFP dominates the level of total labor productivity inefficiency, and inefficiency in capital equipment utilization negatively affects the mean decomposition. There is no dominant part in the variance decompositions. These results imply that the capital equipment levels of most farms are below their optimal level. Figure 6 plots the distributions of the estimated inefficiencies, where the vertical axis details the number of observations. Because we fixed the target production function at q = 0.9, most of the estimated total labor productivity inefficiencies are also positive. However, many of the utilization inefficiencies for capital equipment are negative.
These results also show that the level of capital equipment for most farms is lower than its optimal level. When we consider the inefficiencies in labor productivity, these two types of decomposition suggest that the inefficiencies of TFP play a major role. The proposed method provides several advantages. One is that it provides input utilization inefficiencies, and we can use these to calculate the contribution of the inefficiencies using mean and variance decompositions. However, the characteristics of the calculated inefficiencies should be investigated. It is especially important to show whether the proposed method provides measures like previously proposed methods. To address this, we compare two different types of efficiency measurement: COLS and DEA. For the COLS, because Eq. (13) is nonlinear, it is not easy to use simple COLS. Instead, we estimate Eq. (13) As for DEA, using the estimated coefficients from nonlinear least squares in Table 6 , we construct Because the logarithmically transformed variables sometimes take negative values, we employ level data for DEA. We set two input variables and one-output variable and estimate the efficiencies using the Charnes-Cooper-Rhodes (CCR) and Banker-Charnes-Cooper (BCC) models, with further details in the Appendix. To compare our proposed method to these other methods in Sect. 4.1, we adopt the estimated optimal q ( q (i) ) estimated using CRS for each observation as the inefficiency measure. Table 8 provides the correlation coefficients between q (i) and the estimated inefficiencies from COLS, BCC, and CCR. Figure 7 illustrates the correlation using scatter diagrams. Together, Table 8 and Fig. 7 suggest a positive correlation between q (i) and the estimated COLS, BCC, and CCR efficiencies, with the correlation with COLS efficiency being the highest (0.90), followed by CCR (0.82), and then BCC (0.68). 
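For readers unfamiliar with COLS, one of the comparison methods above, the idea can be sketched as follows: fit the average relationship by least squares, shift the intercept up by the largest residual so that the line bounds all observations, and read each unit's efficiency from its distance to that shifted line. The snippet below is an illustration only. It uses a single-regressor linear model in logs and made-up data, whereas the specification in this paper is nonlinear and relies on nonlinear least squares; the function names are hypothetical.

```python
import math

def ols_simple(x, y):
    """Least-squares intercept and slope for one regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def cols_efficiency(x, y):
    """Corrected OLS: shift the fitted line up to the largest residual.

    x, y are log input and log output. Efficiency is exp(residual - max
    residual), so the best unit scores 1 and all others lie in (0, 1].
    """
    intercept, slope = ols_simple(x, y)
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    shift = max(residuals)
    efficiency = [math.exp(r - shift) for r in residuals]
    return intercept + shift, slope, efficiency

# Made-up log input and log output for five units.
log_input = [1.0, 2.0, 3.0, 4.0, 5.0]
log_output = [1.1, 1.9, 3.4, 3.8, 5.0]
frontier_intercept, slope, efficiency = cols_efficiency(log_input, log_output)
ols_intercept, _ = ols_simple(log_input, log_output)
```

Unlike the quantile-based measure, this construction keeps one slope for all units and attributes the whole gap to the intercept, which is why it cannot separate TFP inefficiency from input utilization inefficiency.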
These results are plausible for three reasons. First, the COLS efficiencies are estimated using the same functional form as the proposed method. Second, the DEA efficiencies are calculated with level data, whereas the proposed method and COLS use logarithmically transformed data. Third, the BCC model relaxes the CRS assumption that the CCR model shares with the proposed method. These results imply that the proposed method leads to results similar to those of the other approaches. However, the correlation coefficients are not close to one, so the proposed method also captures some features that the other methods appear unable to capture. For example, we compare the ten best and ten worst farms from our method with those from the COLS, BCC, and CCR approaches. As shown in Table 9, Farms 128 and 129 in Division 7 are efficient farms according to our quantile regression, but not according to the other methods (i.e., they are not included among the best ten). In contrast, among the worst ten farms by quantile regression, several are also efficient according to the COLS, BCC, and CCR models.

In this paper, we propose a method to measure the production inefficiency of each production unit relative to the target efficient technology by estimating a target production function and each unit's own production function using quantile regression. We also propose two types of decomposition for the estimated inefficiency: mean and variance decompositions. We apply this method to measure the inefficiencies of primary sector production in the XPCC after constructing capital stock using the sown area and other inputs. We estimate a Cobb-Douglas production function and its CRS version. We identify two theoretical advantages of the proposed inefficiency measure. One is that we can decompose each unit's inefficiency into several parts corresponding to Eqs. (3) or (6).
Another theoretical advantage is that we can quantify the relative influence of the decomposed parts of the inefficiencies using mean and variance decompositions. As for the empirical advantages, we find that labor inputs are large and capital equipment rates are low in most farms, because the calculated mean decomposition exhibits positive and negative contributions to the total and total labor productivity inefficiencies, respectively. These findings also imply that the proposed method for measuring relative inefficiencies, together with their mean and variance decompositions, provides useful information when analyzing productivities over production units, even when we use only cross-sectional observations. Additionally, where q = 0.1 and 0.25, the estimated coefficients for the labor input in Eq. (12) are not statistically significant. This could imply that the primary sector includes additional labor as hidden unemployment in the production of less efficient crops.

Taken together, these findings suggest several policy implications. Nonprofit production units, like those in the XPCC's primary sector, sometimes employ inefficient technologies. Our proposed measure sheds light on which inputs, or TFP, account for these inefficiencies. We can then use the results to improve each unit's inefficiencies and to identify the sources of inefficiency among the group of units through mean and variance decomposition.

Finally, we note some remaining problems. The first concerns the selection of the quantile level for the target production function. Wang et al. (2008) define efficient production by a quantile regression with q = 0.99 and apply it to more than 3000 observations. As we have only 144 observations available, we fix q at 0.9, but we should consider further how to select the appropriate quantile for small sample sizes. The second concerns the possible extension of this work to cost functions.
In our analysis, we estimate a production function because we were unable to obtain price data. As many previous studies use cost functions, we should extend our analysis to the cost function case. However, we would first need to conceptualize the economic meaning of the mean and variance decompositions in that setting. Finally, we emphasize that additional empirical research is needed to demonstrate the advantages of our proposed method in different contexts.

Regional diversity and sources of economic growth in China
Channels of interstate risk sharing: United States 1963-1990
Estimating most productive scale size using data envelopment analysis
Some models for estimating technical and scale inefficiencies in data envelopment analysis
Xinjiang in the nineties
Staged development in Xinjiang
Quantile regression for robust bank efficiency score estimation
The dominance of inefficiencies over scale and product mix economies in banking
Quantile estimation of frontier production function
Progress through regression: the life story of the empirical Cobb-Douglas production function
Creative accounting or creative destruction? Firm-level productivity growth in Chinese manufacturing
Semiparametric quantile regression estimation in dynamic models with partially varying coefficients
Measuring the efficiency of decision-making units
Moving off the farm: land institutions to facilitate structural transformation and agricultural productivity growth in China
Hidden unemployment 1953-62: a quantitative analysis by age and sex
Foreign ownership and production efficiency: a quantile regression analysis
Effects of technological change and institutional reform on production growth in Chinese agriculture
Research, productivity, and output growth in Chinese agriculture
Infrastructure and regional economic development in rural China
Economic transition, higher education and worker productivity in China
Non-parametric inference on (conditional) quantile differences and interquantile ranges, using L-statistics
Hayami Y (1969) Sources of agricultural productivity gap among selected countries
Agricultural productivity differences among countries
Foreign ownership and plant productivity in the Thai automobile industry in 1996 and 1998: a conditional quantile analysis
A decomposition of total factor productivity growth: the case of Chinese agricultural growth before and after reforms
A practitioner's guide to stochastic frontier analysis using STATA
Shadow prices and marginal abatement costs: convex quantile regression approach
Economic development with unlimited supplies of labour
Rural reforms and agricultural growth in China
Xinjiang and the production and construction corps: a Han organization in a non-Han region
The Lewis turning point of Chinese economy: comparison with Japanese experience
Data envelopment analysis: the mathematical programming approach to efficiency analysis
Effects of land fragmentation and returns to scale in the Chinese farming sector
The cost effects of government-subsidised credit: evidence from farmers' credit unions in Taiwan
Wage growth, landholding, and mechanization in Chinese agriculture
Efficiency analysis of Chinese industry: a directional distance function approach
Discussion of Mr. Farrell's paper
Productivity and exporting status of manufacturing firms: evidence from quantile regressions
Local linear quantile regression

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Conflict of interest The authors declare they have no conflicts of interest.

Informed consent The analysis did not require informed consent from any participants.

Using OLS estimation,

$$y_i = X_i'\hat{\beta} + \hat{\varepsilon}_i, \quad i = 1, 2, 3, \ldots, N,$$

where $\hat{\beta}$ is the vector of estimated coefficients and the $\hat{\varepsilon}_i$ are residuals, Winsten (1957) proposes a method to estimate the efficiency of production for the ith production unit as follows:

$$\hat{\varepsilon}_i - \max_{j} \hat{\varepsilon}_j,$$

where $\max_{j} \hat{\varepsilon}_j$ is the maximum of the residuals for i = 1, 2, 3, …, N. Based on the least squares estimates, this method treats the observation with the maximum residual as the most productive unit. The most productive unit therefore has an efficiency measure equal to zero, and the other units have negative efficiency measures.

The two most popular DEA models are the CCR and the BCC. The CCR model, after Charnes et al. (1978), is a flexible measurement approach to relative efficiency given the assumption of CRS. Assume that there are K inputs and M outputs for a sample of N DMUs. The input matrix X is K × N and the output matrix Y is M × N, together comprising the whole dataset. The column vectors for a specific production unit, such as the ith farm, are represented as $x_i$ for the input vector and $y_i$ for the output vector. The CCR model considers the following optimization problem:

$$\min_{\theta_{CCR},\,\lambda} \ \theta_{CCR}$$

subject to

$$-y_i + Y\lambda \ge 0, \qquad \theta_{CCR}\,x_i - X\lambda \ge 0, \qquad \lambda \ge 0,$$

where $\theta_{CCR}$ is the objective to be minimized and represents the total efficiency rating of the DMU, and $\lambda$ is an N × 1 nonnegative vector. The overall efficiency scores are obtained by minimizing $\theta_{CCR}$ with respect to $\lambda$, and $\theta_{CCR}$ operates only on the inputs in the constraints. We need to solve this problem N times, once for each DMU. The model minimizes input consumption while keeping output constant, which is the dual problem commonly solved rather than the primal problem of maximizing output.

The alternative BCC model, after Banker et al. (1984) and Banker (1984), extends the CCR model to include the assumption of variable returns-to-scale (VRS). As a result, the BCC model can address the situation where we can improve efficiency through varying scale, instead of the assumption of the CCR model where scale is fixed. The BCC model is as follows:

$$\min_{\theta_{BCC},\,\lambda} \ \theta_{BCC}$$

subject to

$$-y_i + Y\lambda \ge 0, \qquad \theta_{BCC}\,x_i - X\lambda \ge 0, \qquad e'\lambda = 1, \qquad \lambda \ge 0.$$

As shown in the BCC model, the only difference is that an additional constraint, $e'\lambda = 1$, is added to the CCR model, where $e$ is an N × 1 vector of ones.

Using the one-input (K = 1) and one-output (M = 1) case with N decision units, we can simply illustrate the efficiency measurements. Figure 8 plots the efficiency measures obtained using the CCR and BCC models. As shown, the frontier is a straight line under the CCR model and a polygon under the BCC model. The decision units located on the calculated frontiers are efficient DMUs, for which $\theta_{CCR}$ or $\theta_{BCC}$ equals one. The efficiency measures of other units, for example "Unit i," are calculated relative to these frontiers.
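The one-input, one-output illustration of the CCR and BCC models can be sketched in Python without a linear programming solver. This is a minimal sketch under stated assumptions, not the authors' implementation: it covers only the input-oriented case with K = 1 and M = 1, the data are made up, and the function names `ccr_scores` and `bcc_scores` are hypothetical. In this special case the CCR score is each unit's output-input ratio divided by the best ratio. The BCC score uses the fact that an optimal solution can be built from at most two units, so enumerating single units and pairs is enough.

```python
def ccr_scores(x, y):
    """Input-oriented CCR scores for one input and one output (CRS).

    The frontier is the ray through the unit with the highest y/x ratio.
    """
    best_ratio = max(yj / xj for xj, yj in zip(x, y))
    return [(yi / xi) / best_ratio for xi, yi in zip(x, y)]


def bcc_scores(x, y):
    """Input-oriented BCC scores for one input and one output (VRS).

    For each unit, find the smallest input among convex combinations of
    observed units that produce at least its output, then divide by its
    own input.
    """
    units = list(zip(x, y))
    scores = []
    for xi, yi in units:
        # Single units that already produce at least yi (includes unit i).
        candidates = [xj for xj, yj in units if yj >= yi]
        # Convex combinations of two units that produce exactly yi.
        for xj, yj in units:
            for xk, yk in units:
                if yj < yi < yk:
                    w = (yk - yi) / (yk - yj)  # weight on the smaller unit
                    candidates.append(w * xj + (1 - w) * xk)
        scores.append(min(candidates) / xi)
    return scores


# Made-up data for four decision units: input levels and output levels.
inputs = [2.0, 4.0, 6.0, 5.0]
outputs = [1.0, 4.0, 5.0, 2.0]

print(ccr_scores(inputs, outputs))
print(bcc_scores(inputs, outputs))
```

Because the BCC feasible set is contained in the CCR feasible set, each BCC score is at least as large as the corresponding CCR score, which matches the straight-line and polygon frontiers described for Figure 8. With several inputs or outputs, the problems would instead need a general linear programming solver.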