key: cord-0284302-puqrcood
authors: Wu, Yujia; Lan, Wei; Zou, Tao; Tsai, Chih-Ling
title: Inward and Outward Network Influence Analysis
date: 2022-05-15
journal: nan
DOI: 10.1080/07350015.2021.1953509
sha: f4f3ae7ce6f878292b31ad184ce0f3c56856987c
doc_id: 284302
cord_uid: puqrcood

Measuring heterogeneous influence across nodes in a network is critical in network analysis. This paper proposes an Inward and Outward Network Influence (IONI) model to assess nodal heterogeneity. Specifically, we allow for two types of influence parameters; one measures the magnitude of influence that each node exerts on others (outward influence), while we introduce a new parameter to quantify the receptivity of each node to being influenced by others (inward influence). Accordingly, these two types of influence measures naturally classify all nodes into four quadrants (high inward and high outward, low inward and high outward, low inward and low outward, high inward and low outward). To demonstrate our four-quadrant clustering method in practice, we apply the quasi-maximum likelihood approach to estimate the influence parameters, and we show the asymptotic properties of the resulting estimators. In addition, score tests are proposed to examine the homogeneity of the two types of influence parameters. To improve the accuracy of inferences about nodal influences, we introduce a Bayesian information criterion that selects the optimal influence model. The usefulness of the IONI model and the four-quadrant clustering method is illustrated via simulation studies and an empirical example involving customer segmentation.

Rapid digital technology advancement has produced large amounts of data that can be used for network analyses in various disciplines and professions such as business, engineering, health care, and science. For example, Valente (2010) In practice, one of the important tasks in network analysis is to identify influential nodes (or actors). For example, evaluating the effect of each user's behavior on other users or determining the extent of spread from each COVID-19 carrier to other people.

Hence, it is useful to measure the magnitude of the outward influence each node exerts on others; see, e.g., Zhu et al. (2019) and Zou et al. (2021) . On the other hand, it is also critical to assess the inward receptivity of each node to being influenced by others. Examples include analyzing how the behavior of social media influencers is itself affected by the other users or measuring the receptivity of each person who is exposed to COVID-19 carriers. While much research has focused on outward influence, such studies are limited in their predictive power as they neglect the compounding effect of heterogeneity across nodes in the level of receptivity to influence from others (i.e., inward influence). Our paper improves on prior literature by introducing heterogeneity in inward influence, in addition to heterogeneity of outward influence. We propose a model for working with these two influence factors simultaneously. Many existing models that study outward influence alone can be seen as a special case of our model.

To study both types of influences, we construct a network with n nodes by defining the adjacency matrix A = (a ij ) n×n ∈ R n×n to characterize the relationship between any two adjacent nodes in the network (see, e.g., Carrington et al. 2005 and Scott 2013 ).

Specifically, we define a ij = 1 if there is a direct connection from node i to node j and a ij = 0 otherwise. In addition, we denote a ii = 0 for any 1 ≤ i ≤ n. For example, we define a ij = 1 if user i in the Sina Weibo network follows user j and a ij = 0 otherwise; see the real data analysis in Section 3.2. Subsequently, we normalize a ij and obtain W = (w ij ) ∈ R n×n with w ij = a ij / n j=1 a ij , which is the weighted adjacency matrix.

After constructing the network structure, we then define the response variable Y = (Y 1 , · · · , Y n ) that usually depends on the target of the study. For example, suppose that our aim is to identify the specific users who can most influence others' activity in the Sina Weibo network given in Section 3.2. The Y i (i = 1, · · · , n) could be the number of posts made by the i-th user as a measure of user i's activity. Although the response of the i-th node, Y i , can be influenced by the other nodes' responses, it can also depend on its own attributes, such as gender and duration in the Sina Weibo network, and we name this vector of attributes the nodal covariate X i . Therefore, to identify influential nodes, for each node 1 ≤ i ≤ n, we consider the continuous response variable Y i ∈ R 1 and its associated p-dimensional nodal covariate X i = (X i1 , · · · , X ip ) ∈ R p . By integrating this nodal information with the network structure induced by the adjacency matrix A,

we propose the following model

where b ij is the function of w ij that characterizes the influence effect between nodes i and j for i, j = 1, · · · , n, η = (η 1 , · · · , η p ) ∈ R p is the p-dimensional unknown regression coefficient vector, and ε i is the random error for i = 1, · · · , n. If we parameterize b ij = λw ij by a single influence parameter λ, then the resulting autoregressive model has been studied for a long history in the field of economics (see, e.g., Anselin 1988 , Manski 1993a , Manski 1993b , Bramoullé et al. 2009 , LeSage and Pace 2009 Lee 2010, Gupta and Robinson 2015 , Zhou et al. 2017 , Zhang and Yu 2018 , Gao et al. 2019 , Kwok 2019 and Zhu et al. 2019 . It is of interest to note that this type of model was labeled as "mixed-regression", "spatial autoregressive", "spatial correlation", and "spatial autoregressive model (SAR)" by Anselin (1988) , Manski (1993a) , and LeSage and Pace (2009), respectively, although the term SAR is most commonly used in the literature.

As Gupta and Robinson (2015) and Lam and Souza (2019) Thus, the influence effect of node j on node i not only relies on the network structure w ij , but also depends on λ j . We name λ j the outward influence index of node j. It is worth noting that the influence of node j on node i does not take into account the magnitude of the response of node i. Hence, we cannot measure the true influence of node j accurately. For example, consider a Sina Weibo network with three nodes, j 1 , j 2 and j 3 . Suppose that j 1 is a famous star and both nodes j 2 and j 3 follow node j 1 , while node j 2 is quite active and node j 3 is inactive. Under this scenario, we expect that j 1 has larger influence on j 2 than on j 3 since j 2 renders more responses to j 1 .

This situation can also occur in the infectious diseases network. For instance, let j 1 be a super spreader, j 2 be an older person with chronic susceptibility, and j 3 be a younger person with a strong immune system. It is expected that j 1 will have more apparent influence on the health of j 2 than on the health of j 3 since j 2 is more likely than j 3 to yield an immune response to j 1 . Thus, we cannot detect potential collective outcomes accurately by considering the outward influence index alone. This motivates us to introduce an inward influence measure that can characterize each individual's own response to others' influence.

To determine the real influence, we introduce the Inward and Outward Network Influence (IONI) model given below by assuming b ij in (1.1) to be γ i w ij λ j .

where γ i measures the degree of node i's receptivity to being influenced by others and λ j determines the influence magnitude of node j on others. Hence, we name γ i the inward influence index, and λ j is the outward influence index mentioned earlier. In addition, we assume that 0 ≤ γ i ≤ 1 and 0 ≤ λ j ≤ 1. It is worth noting that the influence indices γ i and λ j are usually related to the nodes' attributes (see, e.g., Trusov et al. 2010 and Zou et al. 2021) , which we denote them Z i and V j , respectively. For instance, in the Sina Weibo example, Z i and V j can include the number of each user's followers and the number of each user's followees, i.e., other people that a given user is following. Accordingly, we parameterize γ i and λ j as the known functions of the nodes' attributes Z i and V j , respectively, in Section 2.1.

To illustrate the usefulness of the γ and λ indices, we consider customer segmentation in marketing as an application. Figure 1 depicts γ and λ, which increase in the directions of the x and y arrows, respectively, with the origin (γ 0 , λ 0 ). Then we classify the nodes of the network into four groups according to the values of γ and λ.

The first quadrant of Figure 1 is group I, which consists of customers with high γ and λ. Accordingly, those customers not only have strong influence over others, but also can be influenced greatly by others. This type of customer shares similar properties to "early adopters" in marketing; see, e.g., Wolf and Seebauer (2014), Catalini and Tucker (2017) . Hence, identifying this type of customer is essential for promoting new products. The second quadrant is group II, in which the customers have low γ and high λ. This group's customers tend to influence others, but not vice versa. They can be named "opinion leaders" or "social influencers;" see, e.g., Li et al. (2013) , Ma and Liu (2014) and Bamakan et al. (2019) . Group III is located in the third quadrant, in which the customers have both low γ and λ. This type of customer neither exerts influence nor is influenced by others. Hence, these customers do not create awareness or show interest in products, and they can be considered "inactive customers;" see Kumar and Reinartz (2018) . Finally, group IV is located in the fourth quadrant, in which the customers have high γ and low λ. This type of customer seems to be very susceptible but has little influence on others, and they can be treated as the "early majority;" see, e.g., Ram and Jung (1994) and Mattila et al. (2003) .

Based on the above four-quadrant clustering, practitioners can segment Sina Weibo's users into the four groups to improve product promotion or opinion dissemination. For example, if practitioners need to promote a new product on Sina Weibo, they could target the "early adopters" (i.e., the users with high γ and high λ). In contrast, if practitioners would like to extend the life cycle of mature products, they should focus on the "early majority" (i.e., the users with high γ and low λ). To effectively make use of the above two types of customers, decision makers should focus on the "opinion leaders" (i.e., the users with low γ and high λ) for promotions. Finally, removing or ignoring prolonged "inactive users" (i.e., the users with low γ and low λ) can cut down on promoting costs. We provide detailed illustrations in Section 3.2.

The aim of this paper is to propose the Inward and Outward Network Influence (IONI) model, which allows us to identify four types of influencers in network analysis.

To estimate the model and influence indices, we study identification conditions and parameterize λ and γ as known functions of the nodes' attributes, respectively. Then, we employ Lee's (2004) quasi-maximum likelihood approach to obtain parameter estimators. In addition, we demonstrate their asymptotic properties without imposing any specific error distributions. A Bayesian information criterion is also proposed for selecting optimal models, and its consistency is demonstrated. Moreover, score tests are provided to examine the homogeneity of λ and γ. Simulation studies and an empirical example for differentiating the four different types of users in the Sina Weibo network are presented to demonstrate the utility of the proposed model. The rest of the article is organized as follows. Section 2 describes the model structure in the matrix form, parameter estimators, four-quadrant clustering, selection criterion and test statistics.

Section 3 presents numerical studies. We provide a discussion with some concluding remarks in Section 4. All technical details are relegated to the supplementary material.

To effectively examine the Inward and Outward Network Influence (IONI) model,

we express the proposed model (1.2) in the matrix form as follows:

where Y = (Y 1 , · · · , Y n ) ∈ R n , X = (X 1 , · · · , X n ) ∈ R n×p , Γ = diag{γ 1 , · · · , γ n }, Λ = diag{λ 1 , · · · , λ n }, ε = (ε 1 , · · · , ε n ) ∈ R n , and ε i are independent and identically distributed with mean 0 and variance σ 2 for i = 1, · · · , n. In addition, γ i (and λ i ) are the inward (and outward) influence parameters or indices, respectively.

It is worth noting that there are 2n unknown parameters in (2.1), which cannot be estimated via the n observations. However, γ i and λ j can each be considered a function of the associated attributes of nodes i and j, respectively. To this end, we parameterize γ i with γ i (α) = F (Z i α) and λ j with λ j (β) = F (V j β), where F (·) is a strictly monotone and known function, and Z i and V j are the corresponding attributes associated with the inward and outward influence indices, γ i and λ j , respectively. In addition,

with v j1 ≡ 1, and the corresponding unknown regression

Moreover, define Z −1,i = (z i2 , · · · , z id 1 ) and V −1,j = (v j2 , · · · , v jd 2 ) , and assume that Z −1 = (Z −1,1 , · · · , Z −1,n ) and V −1 = (V −1,1 , · · · , V −1,n ) are of full rank. Since both Λ and Γ can be expressed as a function of α and β, respectively, we express (2.1) as:

To make the proposed model (2.2) practically useful, one needs to specify the function F (·) that links attributes to influence parameters. We first adopt the sigmoid function, which is often used in machine learning (see, e.g., Hastie et al, 2009) . That is,

. We next consider two related link functions: the inverse of log-log (i.e., F (Z i α) = 1 − e −e Z i α ) and the inverse of probit (i.e., F (Z i α) = Φ(Z i α)),

where Φ(·) is the distribution function of the standard normal distribution.

In order to estimate the unknown parameters in model (2.2), one needs to consider the identification problem of γ i and λ j . That is, for any Γ = diag{γ 1 , · · · , γ n }, Λ = diag{λ 1 , · · · , λ n }, Γ = diag{ γ 1 , · · · , γ n } and Λ = diag{ λ 1 , · · · , λ n }, ΓW Λ = ΓW Λ leads to Γ = Γ and Λ = Λ, where

Specifically, if ΓW Λ = ΓW Λ and w ij > 0 for some

To this end, we impose the constraint that either α 1 = 0 or β 1 = 0, to assure identifiability. To illustrate the necessity for this constraint, we define α −1 = (α 2 , · · · , α d 1 ) ∈ R d 1 −1 and β −1 = (β 2 , · · · , β d 1 ) ∈ R d 2 −1 .

Suppose that α −1 = 0 and β −1 = 0. Then F (Z i α)/F (Z i α) = c implies that α −1 = 0 and F (α 1 )/F ( α 1 ) = c. Similarly, we obtain β −1 = 0 and F ( β 1 )/F (β 1 ) = c. Thus, F (α 1 )F (β 1 ) = F ( α 1 )F ( β 1 ). Using the fact that F (·) is strictly monotone, the model cannot be identified unless we impose the constraint that either α 1 = 0 or β 1 = 0.

Without loss of generality, we assume α 1 = 0, or equivalently Z i α = Z −1,i α −1 .

For the sake of simplification, however, we slightly abuse notation and still use α = (α 1 , · · · , α d 1 ) ∈ R d 1 ×1 and Z i = (z i1 , · · · , z id 1 ) ∈ R d 1 ×1 hereafter, rather than α −1 and Z −1,i , as the parameter vector and its associated attributes, respectively.

Remark 1: In network analysis, most existing studies have focused on outward influence. To construct spatio-temporal models for panel data, however, Dou et al. (2016) introduced a pure spatial effect measure. Without considering the temporal structure, we can consider their measure to be inward influence by parameterizing b ij as γ i w ij . Dou et al. (2016) allowed the temporal observations to tend toward infinity and showed that their parameters are estimable without introducing a link function. However, our model does not consider the temporal structure. Hence, we introduce link functions for influence parameter estimation.

In model (2.2), we do not assume that the random errors are normally distributed.

To obtain nice properties of the estimators, we adopt Lee's (2004) approach by employing the quasi-maximum likelihood estimation (QMLE) method to estimate unknown parameters. Define S(α, β) = I n − Γ(α)W Λ(β). We then have ε = ε(η, α, β) = S(α, β)Y − Xη, and the normal log-likelihood function of (2.2) is

where θ = (η , α , β , σ 2 ) . We next apply the concentrated QMLE method to estimate the parameters. Specifically, given α and β, we maximize (θ) with respect to η and σ 2 , which leads to η(α, β) = (X X) −1 X S(α, β)Y and σ 2 ( η(α, β), α, β) = n −1 ε ( η(α, β), α, β)ε( η(α, β), α, β),

where ε( η(α, β), α, β) = M X S(α, β)Y and M X = I n − X(X X) −1 X . The resulting concentrated quasi-log-likelihood is

By employing the most commonly used quasi-Newton method to maximize the above equation with respect to α and β, we get the QMLE, ( α , β ) = argmax c (α, β). We then obtain the QMLEs of η and σ 2 , which are η = η( α, β) and σ 2 = σ 2 ( η, α, β).

To develop the asymptotic distribution of estimated parameters, we introduce some notation and equations below.

where F (·) is the first derivative of F . In addition, we denote the Fisher information matrix of the quasi-maximum likelihood l(θ) by I n (θ) = − 1 n E ∂ 2 (θ) ∂θ∂θ . To

show the asymptotic properties of parameter estimators, we define J n (θ, µ 3 , µ 4 ) = 1 n var ∂ (θ) ∂θ − I n (θ), where the third-order and fourth-order moments are µ 3 = E( 3 i ) and µ 4 = E( 4 i ), respectively. The detailed calculations of I n (θ) and J n (θ, µ 3 , µ 4 ) are given in Section 1 of the supplementary material. To further obtain the theoretical properties of γ and λ, we define l α = (0 d 1 ×p , I d 1 , 0 d 1 ×(d 2 +1) ) ∈ R d 1 ×(p+d 1 +d 2 +1) and

matrix, and denote D α,n (θ, µ 3 , µ 4 ) = l α I −1 n (θ) + I −1 n (θ)J n (θ, µ 3 , µ 4 )I −1 n (θ) l α and D β,n (θ, µ 3 , µ 4 ) = l β I −1 n (θ) + I −1 n (θ)J n (θ, µ 3 , µ 4 )I −1 n (θ) l β .

(2.4)

We next define the necessary norms and introduce conditions used in the theoretical proofs. Denote · s the s-norm of a vector (or a matrix) for 1 ≤ s ≤ ∞. Specifically, for any generic vector x = (x 1 , · · · , x q ) ∈ R q , x s = ( q i=1 |x i | s ) 1/s , and, for any generic matrix G ∈ R m×q ,

Gx s x s : x ∈ R q×1 and x = 0 .

In addition, define the element-wise ∞ norm for any generic matrix G as |G| ∞ = vec(G) ∞ , where vec(G) denotes the vectorization for any generic matrix G. We subsequently present five technical conditions, which are required to ensure the theoretical properties of the estimates.

(C1) Assume that the random errors ε i are independent and identically distributed with mean 0, and E|ε i | 4+ξ < ∞ for some ξ > 0.

(C2) Assume sup n≥1 ||W || 1 < ∞ and sup n≥1 ||W || ∞ < ∞.

(C3) Assume that S(α, β) = I n − Γ(α)W Λ(β) is nonsingular uniformly over α and β in the compact parameter space P, and the true values of α and β are also in the interior of P. In addition, assume that sup {α,β}∈P sup n≥1 ||S −1 (α, β)|| 1 < ∞ and sup {α,β}∈P sup n≥1 ||S −1 (α, β)|| ∞ < ∞ hold.

(C4) Assume sup n≥1 |X| ∞ < ∞. In addition, for the true parameters α and β, assume

for any k 1 , k 2 , k 3 ∈ {1, · · · , d 1 } and l 1 , l 2 , l 3 ∈ {1, · · · , d 2 }, where the link function F is assumed to be three times differentiable.

(C5) Assume I n (θ) → I(θ) and J n (θ, µ 3 , µ 4 ) → J (θ, µ 3 , µ 4 ) as n → ∞. We further assume that I(θ) and I(θ) + J (θ, µ 3 , µ 4 ) are finite and positive definite, where I n (θ) and J n (θ, µ 3 , µ 4 ) are defined in Section 1 of the supplementary material.

Condition (C1) is a moment condition, which means it is weaker than a distribu- are standard regularity conditions used in spatial literature (Lee 2004 and Yu et al. 2008 ). We can verify that these two conditions are satisfied as long as the network is sparse so that each individual node is connected only to a finite number of other nodes (i.e., n j=1 a ij < ∞ for any i = 1, · · · , n). Accordingly, sup n≥1 ||W || ∞ < ∞ and sup n≥1 ||W || 1 < ∞ are satisfied. In addition, 1 ≤ n j=1 |γ i λ j w ij | + 1 < ∞ for any i. This implies that 1

Condition (C4) imposes conditions on the link function and attributes, and it can be satisfied as long as the link function F (·) has bounded first three derivatives and the elements of attributes X i , Z i and V i are uniformly bounded constants for all i (see, e.g., Assumption 6 in Lee 2004). It is worth noting that the first three derivatives of the three link functions (i.e., sigmoid function, inverse of log-log, and inverse of probit) mentioned in Section 2.1 are bounded and they satisfy this condition. Condition (C5) is a type of law of large numbers assumption for ensuring the convergences of the Hessian matrix and the variance of the score function, in order to establish the asymptotic normality of θ. Similar conditions can be found in Fan and Li (2001) , Lee (2004) and Zhu et al. (2017) . Under Condition (C5), the concentrated quasi-log-likelihood function in (2.3) is concave and has a global maximizer. Based on the above notations and conditions, we then have the following results.

, and D α (θ, µ 3 , µ 4 ) and D β (θ, µ 3 , µ 4 ) are the correspondingly convergent matrices of D α,n (θ, µ 3 , µ 4 ) and D β,n (θ, µ 3 , µ 4 ) defined above.

In practice, we can estimate the unknown quantities D α (θ, µ 3 , µ 4 ) and D β (θ, µ 3 , µ 4 ) consistently by their corresponding estimators D α,n ( θ, µ 3 , µ 4 ) and D β,n ( θ, µ 3 , µ 4 ), where

Using the continuous mapping theorem, we can also estimate F (Z i α) and F (V i β) consistently by the estimators F (Z i α) and F (V i β), respectively.

To make use of the inward and outward influence indices, we can apply the fourquadrant clustering approach to partition all nodes into four groups with known threshold values γ 0 and λ 0 as depicted in Figure 1 . To achieve this end, a heuristic approach can be applied to select the threshold values γ 0 and λ 0 via the empirical quantiles and then classify nodes by comparing their estimated inward and outward influence indices with the threshold values, respectively. However, this simple heuristic approach does not take into account the variation in the estimated inward and outward influence indices. To this end, we consider the following two hypotheses for i = 1, · · · , n:

Their corresponding test statistics are

where D α,n ( θ, µ 3 , µ 4 ) and D β,n ( θ, µ 3 , µ 4 ) are given above in the discussion of Theorem 1. By Theorem 1, we can immediately obtain that T i,γ and T i,λ are asymptotically distributed as N (0, 1), which yields their respective p-values.

It is worth noting that the above testing procedure implicitly involves n multiple tests. This motivates us to control the false discovery rate (FDR) by selecting the threshold of the p-values obtained from the multiple tests and then identifying significant hypotheses and determining the rejection regions (see, e.g., Storey et al. 2004 ).

In sum, Theorem 1 not only provides the asymptotic properties of parameter estimators, but also allows us to classify existing nodes into adequate categories via inward and outward influence indices. Thus, the four-quadrant clustering method can play an important role in network applications.

The influence parameters γ i and λ j in model (2.2) are known functions of attributes.

To make better inferences and predictions, we employ Schwarz's Bayesian information criterion to select attributes in the correct model consistently. Define the full model

. Accordingly, the full model includes all possible attributes (i.e., covariates), while the true model contains relevant covariates. For the sake of convenience, we use generic notation S ⊆ S F to represent an arbitrary candidate model with S = {S α , S β }, and

Based on (2.3), the Bayesian information criterion is

where c ( α S , β S ) represents the concentrated quasi-loglikelihood function for any subset model S. α S = ( α k,Sα : k ∈ S α ) and β S = ( β l,S β : l ∈ S β ) are the maximum likelihood estimates of α S and β S , respectively, and df = |S| is the size of S. Removing the irrelevant constants involved in the concentrated log-likelihood function, we have

Then the optimal model selected by the Bayesian information criterion is S BIC = argmin S⊂S F BIC(S). To obtain the theoretical property of BIC, we introduce an additional condition here.

(C6) For any underfitted model, which means S ⊂ S F but S ⊃ S T , assume that there exists a positive constant c min > 0, such that min

Condition (C6) indicates that the mean of the concentrated quasi-loglikelihood function of any underfitted model is inferior to that of the true model (i.e., no underfitted model can fit better than the true model). Similar conditions can be found in linear regression models; see, e.g., Assumption 3 in Shi and Tsai (2002) and Condition 2 in Wang et al. (2007) . We now present the following result.

Theorem 2. Under Conditions (C1)-(C6), we have P (S BIC = S T ) → 1 as n → ∞.

The above theorem implies that the Bayesian information criterion can determine the true model consistently as long as n tends to infinity. To obtain the solution of S BIC , we apply the backward elimination method (see, e.g., Zhang and Wang 2011), which can reduce the computational complexity from O(2 d 1 +d 2 ) to O{(d 1 + d 2 ) 2 }. Thus, S BIC is computable when d 1 + d 2 is not very large.

After obtaining the parameter estimator of θ = (η , α , β , σ 2 ) , we perform the following three tests to examine the homogeneity of λ and γ.

Test I. We first test the homogeneity of λ and γ by considering the null and alternative hypotheses:

intercept is included in α, and Z and V −1 are of full rank as defined in Subsection 2.1, the above hypotheses are equivalent to

Recall that θ = (η , α , β , σ 2 ) and β 1 is an intercept. To test the null hypothesis, we let θ 1 = (η , β 1 , σ 2 ) and θ 2 = (α 1 , · · · , α d 1 , β 2 , · · · , β d 2 ) . With a slight abuse of notation, we reset θ = (θ 1 , θ 2 ) . The notation, functions and equations used in the following theorem and propositions are based on this new setting. Let θ (c) be the constrained QMLE under the null hypothesis. Then, the resulting quasi-Lagrangemultiplier test statistic is

where ∂ ( θ (c) ) ∂θ and I −1 n ( θ (c) ) are, respectively, the score function and the Fisher information matrix, both evaluated at θ (c) . Subsequently, the asymptotic distribution of T s is given below.

Theorem 3. Under Conditions (C1)-(C5) and the null hypothesis of H 0 , T s is asymp-

and I 1 (θ) are defined in Section 2 of the supplementary material, and χ 2 l (1) are independent chi-square random variables with 1 degree of freedom for l = 1, · · · , (p+d 1 +d 2 +1).

Furthermore, T s is asymptotically χ 2 (d 1 + d 2 − 1) when ε is normally distributed.

If the null hypothesis is not rejected, then we can consider the classical spatial autoregressive model. Otherwise, there exists heterogeneity either among inward or outward influence indices, or both, which motivates us to conduct the next two tests.

Test II. To test the homogeneity of γ, we consider the null and alternative hypotheses:

Assuming that γ i (α) = F (Z i α), F (·) is strictly monotone, no intercept is included in α, and Z is of full rank defined in Subsection 2.1, the above hypotheses are equivalent to H 0,α : α 1 = α 2 = · · · = α d 1 = 0 vs. H 1,α : α i = 0 for some i.

( 2.5) Under the null hypothesis of H 0,α , we can obtain the constrained QMLE θ (c) α , whose associated quasi-log likelihood function is ( θ (c) α ). Accordingly, the quasi-Lagrange Multiplier test statistic of α is

and its theoretical property is given below.

Proposition 1. Under Conditions (C1)-(C5) and the null hypothesis of H 0,α , T sα is

is defined in Section 2 of the supplementary material, and χ 2 l (1) are independent chisquare random variables with 1 degree of freedom for l = 1, · · · , (p + d 1 + d 2 + 1).

Furthermore, under the normality assumption of ε, T sα is asymptotically χ 2 (d 1 ).

If the null hypothesis is rejected, then there exists heterogeneity among inward influence indices. Otherwise, we can consider the network influence model proposed by Zou et al. (2021) .

Test III. To test the homogeneity of λ, we consider the null and alternative hy-potheses:

Assuming that λ j (β) = F (V j β), F (·) is strictly monotone, and V −1 is of full rank, the above hypotheses are equal to

Under the null hypothesis of H 0,β , we can obtain the constrained QMLE θ (c) β and its associated quasi-loglikelihood function ( θ (c) β ). Accordingly, the quasi-Lagrange Multiplier test statistic of β is

and its theoretical property is given below.

Proposition 2. Under Conditions (C1)-(C5) and the null hypothesis of H 0,β , T sβ is

defined in Section 2 of the supplementary material, and χ 2 l (1) are independent chisquare random variables with 1 degree of freedom for l = 1, · · · , (p + d 1 + d 2 + 1).

Furthermore, T sβ is asymptotically χ 2 (d 2 − 1) under the normality assumption of ε.

If the null hypothesis is rejected, then there exists heterogeneity among outward influence indices.

It is worth noting that the asymptotic results in Theorem 3 and Propositions 1 and 2 depend on unknown parameters. In practice, we can replace them by their corresponding consistent estimators. For example, in the test statistic T s , λ l (θ, µ 3 , µ 4 ) is unknown.

We can replace it by its consistent estimator λ n,l ( θ (c) , µ (3,c) , µ (4,c) ), which is the l-th

). In addition, I n1 ( θ (c) ) is a consistent estimator of I 1 (θ). Analogous replacements can be done for the asymptotic distributions of T sα and T sβ , which are omitted here.

In this section, we conduct simulation studies to investigate the finite sample performance of parameter estimators, variable selection, and test statistics. Let the off-diagonal elements, a ij , of the adjacency matrix A be independent and identically generated from the Bernoulli distribution with probability 5/n, and let the diagonal elements, a ii , of A be zeros. We then define the weighted adjacency matrix W as W = (w ij ) n×n ∈ R n×n with w ij = a ij / n j=1 a ij for i, j = 1, · · · , n. Next, the elements

and identically generated from the standard normal distribution N (0, 1) except v i1 = x i1 = 1. Their corresponding coefficients are α = (α 1 , α 2 , α 3 ) = (0.4, 0.6, 0.7) , β = (β 1 , β 2 , β 3 , β 4 ) = (1.2, 1.4, 1.3, 1.6) , and η = (η 1 , η 2 , η 3 ) = (1, 2, 3) . Subsequently, we have Γ = diag{F (Z 1 α), · · · , F (Z n α)} and Λ = diag{F (V 1 β), · · · , F (V n β)}. Because Z i does not include the intercept, the above covariate-parameter setting ensures the model is identifiable; see Subsection 2.1. In addition, the random errors ε i (i = 1, · · · , n) are independent and identically generated from two distributions: the standard normal distribution, N (0, 1), and the mixture normal distribution, 0.9N (0, 5/9) + 0.1N (0, 5), respectively. Finally, the response vector Y is generated from model (2.2) with Y = (I n − ΓW Λ) −1 (Xη + ) and parameter vector θ = (η , α , β , σ 2 ) . All of these settings satisfy Conditions (C1)-(C6).

In simulation studies, all results are based on 500 realizations with n=500, 1,000 and 2,000. Let θ (k) = ( η

4 , σ 2(k) ) ∈ R 11 be the QMLE of θ at the k-th realization and let θ (k) j be its j-th component. To evaluate the performance of the parameter estimates, we consider the measurement

j . Based on θ (k) , we can further obtain the estimates γ (k) and λ (k) of γ and λ, respectively. Their

j , we can obtain the estimated standard error SD (k) j . Thus, the average of the estimated standard errors for each θ j is SD j = 500 −1 k SD (k)

j . Subsequently, the averages of the estimated standard errors of the n nodes via 500 realizations for γ and λ are SD γ =

To obtain an overall measurement, we also consider the root mean squared error, RMSE = SD 2 + BIAS 2 , for θ, γ and λ. Table 1 presents the BIAS, SD and RMSE of θ , γ and λ via 500 realizations under normal random errors with three sample sizes. The results indicate that, for all three link functions, the absolute value of BIAS and the SD of the parameter estimates decrease when n gets large. In addition, the biases of γ and λ approach 0, and the average estimated errors are quite small. Hence, it is not surprising that RMSE reveals the same pattern. Notably, the accurate estimates of γ and λ allow us to subsequently conduct four-quadrant clustering, which is not reported here to save space. Moreover, we find that both the estimated Hessian matrix and the information matrix are positive definite at each iterative step across all realizations. Accordingly, θ is the global maximizer of the log-likelihood function, and θ is close to its true value, as shown in Table   1 . Under mixture normal errors, the results are qualitatively similar to those in Table   1 , and we relegate them to Table S1 in the supplementary material. As suggested by an anonymous reviewer, we consider the simulation setting with an overlapping design.

The results are qualitatively similar to those in Table 1 , and we present them in Table   S3 of the supplementary material.

We next study the finite sample performance of the BIC variable selection criterion.

To this end, we set up the full model size |S F α | = |S F β |=6 and the true model size |S T α | = 3, |S T β |=4. In addition, three measures are used to assess the selection performance given below: (i) The average percentage of correct fit (CF), i.e., I( S α = S T α ) and I( S β = S T β ); (ii) the average true positive rate (TPR), i.e., | S α ∩ S T α |/|S T α | and | S β ∩ S T β |/|S T β |; and (iii) the average false positive rate (FPR), i.e.,

where I is an indicator function, S α and S β are the selected models via BIC, S c T α = S F α \S T α and S c T β = S F β \S T β . Table 2 reports CF, TPR and FPR under normal random errors via 500 realizations.

The results indicate that, as the sample size increases, the average percentage of CF and the average TPR approach 1, while the average FPR is close to 0. Accordingly, the optimal model selected by the BIC is consistently approaching the true model, which supports Theorem 2. For the mixture normal errors, the results are qualitatively similar to those in Table 2 , and we present them in Table S2 of the supplementary material.

We finally investigate the performance of the three quasi Lagrange-multiplier tests we fit them with the model (2.2), and find that none of their associated coefficients α, β and η are significant at the 5% significance level. Thus, we iteratively eliminated the attribute with the largest p-value across Z, V and X until one of the attributes becomes significant. As a result, the remaining attributes in Z, V and X are ("Out-degree", "Duration" and "Self-Introduction"), ("In-degree", "Duration" and "Self-Introduction"), and ("Gender"), respectively.

In this empirical example, the sample size is large and the full model is finite. This motivates us to further employ the BIC consistent criterion to select the best model for each of the three link functions: sigmoid, inverse of log-log and inverse of probit.

Inspired by Vuong (1989), we further choose the link function with the smallest BIC value, which is the inverse of probit. Table 3 reports the QMLEs of the parameters and their corresponding t-statistics and p-values. For Z covariates, the attribute "Outdegree" is selected, and it is significant at the 5% level with a positive coefficient. This finding is sensible since users with more out-degree may have broad interests, outgoing personality, and strong curiosity to accept new things and are thus more likely to be influenced by others. As for V covariates, the attribute "Duration" is selected, and it is significant at the 5% level with a positive coefficient. This result is also reasonable because the users with longer duration are generally called "veteran users", who tend to have greater influence in the network. Finally, the p-value of the X variable "gender" is 0.1857. Thus, the users' activity level is not strongly related to gender.

It is worth noting that, without employing an initial variable elimination procedure, none of the selected variables are significant at the 5% significance level when using BIC to select relevant variables from the full model with fifteen candidate variables (five attributes in each of three covariates Z, V and X); the results are omitted to save space. In addition, we find that both the estimated Hessian matrix and information matrix are positive definite at each iterative step, which ensures that the QMLEs listed in Table 3 are global maximizers.

We next examine the heterogeneity of inward and outward influence among individuals via the three test statistics in Subsection 2.5. The p-values of all three tests are close to 0, which indicate high heterogeneity in both inward and outward influence among the 2,580 users. To evaluate these phenomena, we calculate the estimated influence indices γ i = Φ(Z i α) and λ j = Φ(V j β). We then sort them and obtain γ (1) > γ (2) > · · · > γ (n) and λ (1) > λ (2) > · · · > λ (n) . This allows us to conduct user segmentation. To this end, as shown in Figure 1 , we designate the origin as γ 0 = γ (k 1 ) and λ 0 = λ (k 2 ) , where k 1 and k 2 satisfy

5%, respectively. To illustrate the usefulness of our proposed clustering method, we choose the 12.5% quantile as the threshold for our user segmentation. In practice, decision makers can balance the benefits and costs of choosing a different threshold.

With the given threshold values γ 0 and λ 0 , we now employ the four quadrant method of classifying users, following two different approaches. First, we conduct the "indicescomparison (IC)" by comparing γ i and λ i with the corresponding threshold values γ 0 and λ 0 for i = 1, · · · , n. Second, we proceed with the "indices-test (IT)" by calculating the test statistics T i,γ and T i,λ for i = 1, · · · , n, and obtain the corresponding p-values p i,γ and p i,λ by controlling the false discovery rate (FDR) as mentioned in Subsection 2.3. Figure 5 depicts the percentage of users being classified into the four quadrants by the aforementioned two approaches. Because the results are similar, we focus only on the "indices-test" approach.

Based on user segmentation, practitioners can design marketing strategies to improve a product promotion or opinion dissemination. For example, if practitioners need to promote a new product on Weibo, they could focus on not just "influencers" but specifically the 4.0% "early adopters". In contrast, if practitioners want to conduct a promotion for a mature product, they should focus on the 7.7% "early majority".

To effectively utilize the above two types of users, decision makers can adopt the "fan economy" strategy to increase their conversion rate. With this strategy, they should first focus on the 15.2% "opinion leaders" to make such promotions. Finally, removing or ignoring the 73.1% who are prolonged "inactive users" can reduce promoting costs and improve marketing efficiency. In sum, this example demonstrates that obtaining the nodes' inward and outward influence indices can play an important role in social network applications.

In large scale networks, we propose an Inward and Outward Network Influence (IONI) model that not only captures the heterogeneity among nodes, but is also applicable to user segmentation. Without imposing any specific error distribution, we apply the quasi-maximum likelihood approach to obtain the estimators of the inward and outward influence indices. This allows us to employ the four-quadrant clustering method to segment users. The theoretical properties of parameter estimates are established, and simulation studies as well as an empirical analysis are presented to demonstrate the usefulness of the IONI model.

To broaden the usefulness of IONI and four-quadrant clustering, we identify the following four possible future research avenues. First, the adjacency matrix of the network is observed in our study. For an unobserved adjacency matrix, we could allow it to follow a probability distribution and estimate the adjacency matrix. Then we could employ IONI for studying influential power within networks. It is not surprising that indirectly connected nodes can influence each other in a network; see Lan et al.

(2018). Hence, the second avenue is to take into account higher-order adjacency matrices to enlarge the application of the IONI model. Third, motivated by an anonymous reviewer's comment, one can consider observations and network structures that vary with time. Hence, it is useful to extend IONI to dynamic network models; see, e.g., Dou et al. (2016) and Gao et al. (2019) . Lastly, based on an anonymous reviewer's suggestion, it is of great interest to generalize IONI to accommodate data with discrete responses (see Zhang et al. 2020) . We believe each of the above efforts would increase the value of IONI considerably. Zhu, X., Pan, R., Li, G., Liu, Y. and Wang, H. (2017) . "Network vector autoregression," The Annals of Statistics, 45, 1096-1123.

Zhu, X., Chang, X., Li, R. and Wang, H. (2019) . 2, 1.4δ, 1.3δ, 1.6δ) . The random errors are independently and identically simulated from the normal distribution N (0, 1).

The proportion (%) of each user segment in the Sina Weibo data as classified by the IC and IT methods. The Q1, Q2, Q3, and Q4 represent the first quadrant ("early adopters"), the second quadrant ("opinion leaders"), the third quadrant ("inactive users"), and the fourth quadrant ("early majority"), respectively.

In this supplementary material, we include six sections. Section 1 presents detailed expressions of the Fisher information matrix I n (θ) and the quantity J n (θ, µ 3 , µ 4 ) used in estimating the asymptotic covariance matrix of parameter estimators. Additional notations used for proving Theorem 3 and Propositions 1-2 are given in Section 2.

Section 3 provides five technical lemmas. Section 4 presents the proofs of theorems and propositions. Simulation studies for mixture normal errors and for an overlapping design are provided in Sections 5 and 6, respectively. Both additional studies are used to demonstrate the robustness of our proposed estimators.

Section 1: Detailed Expressions of I n (θ) and J n (θ, µ 3 , µ 4 )

and Λ β l (β) := ∂Λ(β)/∂β l = diag v 1l F (V 1 α), · · · , v nl F (V n β) for l = 1, · · · , d 2 .

The Fisher information matrix I(θ) of the quasi-maximum likelihood function (θ) is denoted as follows:

σ −2 n −1 X X I ηα,n I ηβ,n 0 p×1 I αη,n I αα,n I αβ,n I ασ 2 ,n I βη,n I βα,n I ββ,n I βσ 2 ,n 0 1×p I σ 2 α,n I σ 2 β,n 2 −1 σ −4

, I αη,n = I ηα,n , I βη,n = I ηβ,n , I βα,n = I αβ,n , I σ 2 α,n = I ασ 2 ,n , and I σ 2 β,n = I βσ 2 ,n .

Let • denote the Hadamard product of matrices, l n = (1, · · · , 1) ∈ R n×1 , X j = (X 1j , · · · , X nj ) ∈ R n for j = 1, · · · , p. In addition, denote the third and the fourth order moments by µ 3 = E( 3 i ) and µ 4 = E( 4 i ), respectively, for i = 1, · · · , n. After algebraic simplification, we have

J αη,n J αα,n J αβ,n J ασ 2 ,n J βη,n J βα,n J ββ,n J βσ 2 ,n µ 3 l n X 2nσ 6 J σ 2 α,n J σ 2 β,n

, J αη,n = J ηα,n , J βη,n = J ηβ,n , J βα,n = J αβ,n , J σ 2 α,n = J ασ 2 ,n , and J σ 2 β,n = J βσ 2 ,n .

Section 2: Additional Notation used for Theorem 3 and Propositions 1-2

Notation used for Theorem 3. Recall that θ = (η , α , β , σ 2 ) and β 1 is an intercept. In consideration for hypothesis testing, we let θ 1 = (η , β 1 , σ 2 ) and θ 2 = (α 1 , · · · , α d 1 , β 2 , · · · , β d 2 ) . With a slight abuse of notation, we reset θ = (θ 1 , θ 2 ) . In addition, let

and I 11 (θ) = ∆ T I(θ)∆ T . Then,

where l 11 (θ) ∈ R (p+1)×(p+1) , l 12 (θ) ∈ R (p+1)×1 , l 21 (θ) ∈ R 1×(p+1) and l 22 (θ) ∈ R 1×1 .

Finally, define

Notation for Propositions 1. Analogously, we reset θ = (θ 1 , θ 2 ) with θ 1 = (η , β , σ 2 ) and θ 2 = (α 1 , · · · , α d 1 ) in consideration for hypotheses testing in Test II. In addition,

and I 11 (θ α ) = ∆ Tα I(θ)∆ Tα . Then,

where l α 11 (θ) ∈ R (p+d 2 )×(p+d 2 ) , l α 12 (θ) ∈ R (p+d 2 )×1 , l α 21 (θ) ∈ R 1×(p+d 2 ) and l α 22 (θ) ∈ R 1×1 . Finally, define

Notation for Propositions 2. Based on the hypothesis testing in Test III, we reset θ = (θ 1 , θ 2 ) with θ 1 = (η , α , β 1 , σ 2 ) and θ 2 = (β 2 , · · · , β d 2 ) . In addition, let

and I 11 (θ β ) = ∆ T β I(θ)∆ T β . Then,

It is worth noting that we use K(θ, µ 3 , µ 4 ) = I(θ) + J (θ, µ 3 , µ 4 ) in all three tests. Lemma 1. For any generic vector x = (x 1 , · · · , x q ) ∈ R q and for 1 ≤ s 1 ≤ s 2 , we

have that

Lemma 2. For any generic vector x = (x 1 , · · · , x q ) ∈ R q and generic matrix G ∈ R m×q and U ∈ R m×m and for any s ≥ 1, we have that U Gx s ≤ m 1/s |G| ∞ U ∞ x 1 .

Lemma 3. Define ε = (ε 1 , · · · , ε n ) , where ε 1 , · · · , ε n are independent and identically distributed random variables with mean 0 and finite variance σ 2 . Let

where A = (a ij ) n×n ∈ R n×n and b = (b 1 , · · · , b n ) ∈ R n×1 . Suppose that:

(1) for i, j = 1, · · · , n, a ij = a ji ;

(2) sup n≥1 A 1 < ∞;

(3) for some ξ 1 > 0, sup n≥1 n −1 b 2+ξ 1 2+ξ 1 < ∞;

(4) for some ξ 2 > 0, E|ε i | 4+ξ 2 < ∞.

Then, we have E(Q n ) = 0 and

where µ s = E(ε s i ) for s = 3, 4. Moreover, suppose n −1 σ 2 Qn ≥ c for some c ≥ 0. Then, we obtain

Lemma 4. Let ε = (ε 1 , · · · , ε n ) , where ε 1 , · · · , ε n are independent and identically distributed with mean 0 and finite variance σ 2 . Define

ij ) n×n ∈ R n×n for l = 1, · · · , L with L < ∞ and B = (b il ) n×L ∈ R n×L . Suppose the following assumptions are satisfied,

(1) for all i, j = 1, · · · , n and l = 1, · · · , L, a (2) sup n≥1 A l 1 < ∞ for any l = 1, · · · , L;

(3) for some ξ 1 > 0, sup n≥1 n −1 vec(B) 2+ξ 1 2+ξ 1 < ∞;

(4) for some ξ 2 > 0, E|ε i | 4+ξ 2 < ∞.

Then, we have E(Q n ) = 0 and

where Ψ = (ψ 1 , · · · , ψ L ) ∈ R n×L with ψ l = (a l 11 , · · · , a l nn ) ∈ R n for l = 1, · · · , L, and µ s = E(ε s i ) for s = 3, 4. Moreover, assume that n −1 cov(Q n ) → D holds for a positive definite matrix D ∈ R L×L . Then we have

Lemma 5. Under Conditions (C1)-(C5), we have that, as n → ∞,

Proof of Theorem 1. To show this theorem, we take following three steps: (1) demonstrating that θ is n 1/2 -consistent, as n → ∞;

(2) verifying the asymptotically normal property of θ;

(3) showing that γ i and λ i are also asymptotically normal for each node i. Detailed procedures are given below.

Step 1. Applying the techniques of Fan and Li (2001) , we need to show that, for an arbitrarily small positive constant ξ > 0, there exists a constant M ξ > 0 that satisfies P sup

as n → ∞. To prove this, employing the Taylor series expansion and using the results of Lemma 5, we obtain sup u∈R (p+d 1 +d 2 +1) , u 2 =M ξ (θ + n −1/2 u) − (θ) = sup u∈R (p+d 1 +d 2 +1) , u 2 =M ξ

where λ min {·} is the smallest eigenvalue of the matrix inside the braces. Note that

is a quadratic function of M ξ . This, in conjunction with Condition (C5), implies that −2 −1 λ min {I(θ)} < 0. Thus, as long as M ξ is large enough, we have P sup

which demonstrates (S.1). The above equation indicates that there exists a local maximizer θ satisfying θ − θ 2 ≤ n −1/2 M ξ as n is large enough. This, together with (S.1), leads to

Accordingly, we obtain √ n θ − θ 2 = O p (1), which completes the first part of the proof.

Step 2. Applying the Taylor series expansion to the score function and the result of

Step 1, we have that 0

Combing this with the results of Lemma 5, we obtain

which completes the second part of the proof.

Step 3. Let l α = 0 d 1 ×p , I d 1 , 0 d 1 ×(d 2 +1) and l β = 0 d 2 ×(p+d 1 ) , I d 2 , 0 d 2 ×1 . Denote D α (θ, µ 3 , µ 4 ) = l α I −1 (θ) + I −1 (θ)J (θ, µ 3 , µ 4 )I −1 (θ) l α and D β (θ, µ 3 , µ 4 ) = l β I −1 (θ) + I −1 (θ)J (θ, µ 3 , µ 4 )I −1 (θ) l β . Using the result of (S.4), we have

Since γ i = F (Z i α) and λ i = F (V i β), we employ the delta method and obtain that

which completes the entire proof.

Proof of Theorem 2. To prove this theorem, we consider the following two scenarios,

i.e., the candidate model S ⊂ S F is overfitted or underfitted.

Overfitted Model: S ∈ Q + = {S : S T ⊂ S, S T = S}. Denote the QMLEs of S T and S by ( α S T , β S T ) ∈ R |S T | and ( α S , β S ) ∈ R |S| , respectively. Without loss of generality, we assume that the α and β components of S T correspond to the first |S T | components of S. Accordingly, (α S ,β S ) = ( α S T , 0 , β S T , 0 ) ∈ R |S| . By Conditions (C1)-(C5) and Theorem 1, we have that (α S ,β S ) is n 1/2 consistent and¨ c ( α S , β S ) is finite. Using the fact that˙ c ( α S , β S ) = 0, we then apply the Taylor series expansion and obtain

where * c (α S ,β S ) is the concentrated quasi-loglikelihood function of the overfitted model and it is evaluated at (α S ,β S ). Thus, we obtain that

Since Q + is a finite set with |S F | < ∞, the above result implies

Accordingly, for any S ∈ Q + , we have P (S BIC = S T ) → 1.

is a consistent estimator of (α S T , β S T ) as n → ∞. By the law of large numbers,

This, together with the overfitted result, leads to

which completes the entire proof.

Proof of Theorem 3. As mentioned in Section 2, we set θ = (θ 1 , θ 2 ) , where θ 1 = (η , β 1 , σ 2 ) and θ 2 = (α 1 , · · · , α d 1 , β 2 , · · · , β d 2 ) . The following notation, functions, and equations are based on this new setting. Under the null hypothesis, i.e., H 0 : θ 2 = (0, · · · , 0) ∈ R (d 1 +d 2 −1)×1 , the true parameter is θ = (θ 1 , 0 ) . Denote the resulting constrained and unconstrained QMLE of θ as θ (c) and θ, respectively. That is, θ (c) is obtained by maximizing the resulting quasi-loglikelihood function with the constraint that θ 2 = 0, while θ is obtained without such a constraint. Furthermore, define

where I ij (θ) is the convergence of the corresponding information matrix with respect to θ i and θ j for i, j ∈ {1, 2}.

Under the null hypothesis H 0 , we employ similar techniques to those used in proving (S.4) to obtain that

In addition, applying the Taylor series expansion, we obtain that

whereθ lies between θ (c) and θ. Since both θ (c) and θ are √ n-consistent estimators under the null hypothesis,θ is also √ n-consistent. By Condition (C5), Lemma 5 and the continuous mapping theorem, we have, under the null hypothesis,

Accordingly, we obtain that

By Lemma 5, we have

where K = K(θ, µ 3 , µ 4 ) = I(θ) + J (θ, µ 3 , µ 4 ) as defined in Section 2. This, together with (S.5) and (S.6), leads to

Using the fact that I 1 (θ)I(θ)I 1 (θ) = I(θ) and I 1 (θ)−I −1 (θ) I(θ) I 1 (θ)−I −1 (θ) = I −1 (θ) − I 1 (θ), we further obtain that

Subsequently, applying the continuous mapping theorem and Slutsky's theorem as well as Lemma 5, we can demonstrate that T s follows a weighted chi-square distribution p+d 1 +d 2 +1 l=1 λ l (θ, µ 3 , µ 4 )χ 2 l (1) asymptotically, where λ l (θ) are the l-th eigenvalues of K 1/2 I −1 (θ) − I 1 (θ) K 1/2 for l = 1, · · · , (p + d 1 + d 2 + 1). This completes the proof of the first part of Theorem 3.

When ε is normally distributed, the matrix J n (θ, µ 3 , µ 4 ) defined in Section 1 is zero.

As a result, its convergence J (θ, µ 3 , µ 4 ) defined in the Appendix of the paper should be zero. This implies that K = I(θ). Using the fact that K 1/2 I −1 (θ) − I 1 (θ) K 1/2 = I 1/2 (θ) I −1 (θ) − I 1 (θ) I 1/2 (θ) is symmetric and idempotent, we have tr T s = tr I 1/2 (θ) I −1 (θ) − I 1 (θ) I 1/2 (θ) = tr I p+d 1 +d 2 +1 − I(θ)I 1 (θ)

Accordingly, T s d − → χ 2 (d 1 + d 2 − 1), which completes the entire proof.

This section presents the simulation results for mixture normal errors in order to assess the robustness of our proposed method to non-normal errors. The model simulation setting is the same as that in Section 3 of the paper except for random errors.

Tables S1 and S2 report the results for parameter estimates and BIC selections, respectively. They yield similar patterns to those of Tables 1 and 2 in the manuscript. In addition, Figure S1 depicts the size and power of three tests with the inverse of probit link function (the results for sigmoid and log-log functions are similar and thus omitted). They yield similar findings to those obtained from Figures 2-4 of the manuscript.

In sum, our proposed estimates, selections and tests exhibit nice properties under mixture normal errors.

This section considers an overlapping design. We set p = 4, d 1 = 3 and d 2 = 4, where the covariates of Z are the same as the last three covariates of V and X. In addition, all covariates are independent and identically generated from the standard normal distribution N (0, 1) except for the intercepts v i1 = x i1 = 1 for i = 1, · · · , n.

The associated coefficients are respectively η = (1, 2, 3, 4) , α = (0.4, 0.6, 0.7) , β = (1.2, 1.4, 1.3, 1.6) and σ 2 = 1. The rest of simulation settings are the same as those in Section 3 of the paper. Table S3 reports the results of parameter estimates, which yields a similar pattern to that of Table 1 in the manuscript and supports Theorem 1.

In sum, our proposed estimates perform satisfactorily under the overlapping design. 

Spatial Econometrics: Methods and Models

Opinion leader detection: A methodological review

The diffusion of microfinance

Gossip: Identifying central individuals in a social network

Identification of peer effects through social networks

Models and Methods in Social Network Analysis

When early adopters don't adopt

Generalized Yule-Walker estimation for spatio-temporal models with unknown diagonal coefficients

Variable selection via nonconcave penalized likelihood and its oracle properties

Banded spatio-temporal autoregressions

An improved mix framework for opinion leader identification in online learning communities

GMM estimation of social interaction models with centrality

SuperedgeRank algorithm and its application in identifying opinion leader of online public opinion supernetwork

The relevance of broker networks for information diffusion in the stock market

Identification of endogenous social effects: The reflection problem

Identification problems in the social sciences

Internet banking adoption among mature customers: early majority or laggards?

Innovativeness in product usage: A comparison of early adopters and early majority

Social Network Analysis

Regression model selection-A residual likelihood approach

Strong control, conservative point estimation and simultaneous conservative consistency of false discovery