Bootstrapping Lasso estimators
A. Chatterjee∗
Stat-Math Unit
Indian Statistical Institute, New Delhi
S. N. Lahiri†
Department of Statistics
Texas A&M University
Abstract
In this paper, we consider bootstrapping the Lasso estimator of the regression parameter in a multiple
linear regression model. It is known that the standard bootstrap method fails to be consistent. Here, we
propose a modified bootstrap method, and show that it provides a valid approximation to the distribution of the
Lasso estimator, for all possible values of the unknown regression parameter vector, including the case where
some of the components are zero. Further, we establish consistency of the modified bootstrap method for
estimating the asymptotic bias and variance of the Lasso estimator. We also show that the residual bootstrap
can be used to consistently estimate the distribution and variance of the Adaptive Lasso estimator. Using
the former result, we formulate a novel data-based method for choosing the optimal regularization parameter
for the Lasso using the modified bootstrap. A numerical study is performed to investigate the finite sample
performance of the modified bootstrap. The methodology proposed in the paper is illustrated with a real
data example.
AMS (2000) Subject Classification: Primary: 62J07; Secondary: 62G09, 62E20.
Keywords and Phrases: Penalized Regression, bootstrap variance estimation, regularization, shrinkage.
1 Introduction
Consider the following regression model
y_i = x_i′β + ε_i,   i = 1, . . . , n,   (1.1)
∗email: cha@isid.ac.in
†email: snlahiri@stat.tamu.edu
where y_i is the response, x_i = (x_{i,1}, . . . , x_{i,p})′ is a p-dimensional covariate vector, β = (β_1, . . . , β_p)′ is the
regression parameter, and {ε_i : i = 1, . . . , n} are independent and identically distributed errors. We assume
that p is fixed. The Lasso estimator of β is defined as the minimizer of the l1-norm penalized least-squares
criterion function,
β̂_n = argmin_{u ∈ R^p} Σ_{i=1}^n (y_i − x_i′u)² + λ_n Σ_{j=1}^p |u_j|,   (1.2)
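For a concrete handle on (1.2), the criterion can be minimized by cyclic coordinate descent with soft-thresholding. The following sketch (plain numpy, simulated data; the design, the true β, and the choice of λ_n are illustrative assumptions, not taken from the paper) is a minimal solver, not the optimized algorithms of Efron et al. (2004) or Friedman et al. (2007):

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso(X, y, lam, n_sweeps=500):
    """Minimize sum_i (y_i - x_i'u)^2 + lam * sum_j |u_j|, i.e. criterion (1.2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)            # sum_i x_{ij}^2 for each column j
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual with coordinate j removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # stationarity of (1.2) in u_j gives a soft threshold at lam / 2
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 1.0, 0.0])   # two zero components
y = X @ beta_true + rng.normal(scale=0.5, size=n)
lam = 2.0 * np.sqrt(n)                       # lambda_n of order n^{1/2}, cf. (C.2)
beta_hat = lasso(X, y, lam)
```

With a regularization parameter of this order, the fitted vector typically keeps the two strong coefficients close to their true values while shrinking the two null coefficients toward (or exactly to) zero.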
where λ_n is a regularization parameter. The Lasso estimator was introduced by Tibshirani (1996) as an
estimation and variable selection method. The Lasso estimator has two nice properties, namely, (i) the
nature of regularization used in the Lasso leads to sparse solutions and (ii) it is also computationally feasible
(see Efron et al. (2004), Osborne et al. (2000), Friedman et al. (2007)). The sparse solutions obtained by
using the Lasso automatically lead to model selection. In the finite dimensional case, many authors have
studied the model-consistency properties of the Lasso and investigated conditions under which the Lasso can
recover the true sparsity pattern (see Zhao and Yu (2006), Wainwright (2006), Zou (2006)). Beyond the
linear model setup (1.1), Yuan and Lin (2007) have studied the neighborhood selection properties of the
Lasso in graphical models. Recently, Bach (2009) considered using bootstrap samples in order to improve
the model selection accuracy of the Lasso.
An important problem in this context is the estimation consistency of the Lasso. This was first studied
by Knight and Fu (2000) for the finite dimensional regression model (1.1). The asymptotic distribution was
found, and it was shown that the Lasso estimator is weakly consistent. They also showed that if λ_n is sufficiently
large, then some components of the Lasso estimate may be exactly zero. It was found that under appropriate
regularity conditions, the limiting distribution of the Lasso estimator assigns positive mass at zero for the
components where the true regression parameter has zero values. Since the limit distribution of the Lasso
estimator is complicated (cf. Knight and Fu (2000)), it is important to have alternative approximations to
the distribution of the Lasso estimator that can be used in practice to set confidence regions and to carry out
tests on the parameter vector. Knight and Fu (2000) considered using the bootstrap to generate alternative
approximations. More specifically, Knight and Fu (2000) considered the residual-based bootstrap method
(cf. Freedman (1981)) for the Lasso estimator and sketched out its asymptotic behavior. Recently, it was
investigated rigorously by Chatterjee and Lahiri (2010), who show that the asymptotic distribution
of the bootstrapped Lasso estimator is a random measure on R^p and that the bootstrap is inconsistent
whenever one or more components of the regression parameter are zero. Thus, in situations where the limit
distribution of the Lasso estimator is most complicated and alternative approximations are needed the most,
the usual bootstrap fails drastically! In this paper, we construct a suitable modification to the residual based
bootstrap method and show that under mild regularity conditions, the modified version of the bootstrap is
indeed consistent in estimating the limiting distribution of the Lasso estimator, even when some components
of β are zero.
Another important issue that has eluded a satisfactory solution to date is the problem of attaching
standard error estimates to the Lasso estimates. Initially, Tibshirani (1996) suggested an approximation
that had the drawback of providing zero standard error estimates when the estimated coefficient was zero.
Osborne et al. (2000) suggested an improved alternative but as pointed out by Knight and Fu (2000), all
these methods suffered from the drawback of considering the Lasso as an approximately linear transformation.
Other related methods (like the 'sandwich formula') for variance estimation in the penalized regression setup
were suggested by Fan and Li (2001) and Fan and Peng (2004), which only provided variance estimates
for the underlying non-zero components. One of the main contributions of this paper is to show that the
modified bootstrap method, that we propose here, gives a consistent variance estimator for the Lasso, for
both zero and nonzero parameter values. In particular, the bootstrap based variance estimate overcomes the
drawbacks of some of the earlier variance estimation techniques, like producing zero variance estimates for
estimated zero coefficients. As an application of this result, we provide bootstrap based confidence regions
for the true parameter vector.
From Knight and Fu (2000)’s work, it is known that the asymptotic distribution of the Lasso estimator
depends on the regularization parameter through λ_0, where λ_n n^{-1/2} → λ_0. In particular, the accuracy of
the Lasso estimator β̂n critically depends on the choice of the regularization parameter λn. In this context,
we formulate a new data-based method for selection of the regularization parameter. Since the modified
bootstrap is consistent for the mean squared error (MSE) of the Lasso estimator, we define a criterion
function based on the modified bootstrap estimator of the MSE as a rescaled function of λ_n. In a related
context, Hall et al. (2009) recently suggested using the m-out-of-n bootstrap to choose the optimal
regularization parameter in the Adaptive Lasso setup.
We also study the properties of the bootstrapped version of the Adaptive Lasso estimator (Zou (2006)).
We find that the residual bootstrap based version of the Adaptive Lasso estimator is consistent in estimating
the asymptotic distribution and variance of the Adaptive Lasso estimator. As with the Lasso estimator,
the question of the validity of the bootstrap for the Adaptive Lasso had been unresolved till now, and
our results show that the simple residual bootstrap can consistently estimate the distribution and provide
variance estimates for the Adaptive Lasso estimator. This is unlike the case of the Lasso, where the simple
residual bootstrap fails.
We conclude this section with a brief literature review. Knight and Fu (2000) derived the asymptotic
distribution of the Lasso estimator under model (1.1) in the case where the dimension p of the regression
is fixed. Properties of the standard bootstrap method have been investigated by Knight and Fu (2000) and
Chatterjee and Lahiri (2010), in the same set up. In the finite dimensional case, Pötscher and co-authors
have interesting results on the impossibility of estimating the distribution function of a Lasso estimator in
a uniform sense. The relevance of their results in the context of our work is discussed in Section 3.1. In the
high-dimensional case, where p is allowed to grow with n, work on estimation consistency of the Lasso is
limited; among them, Huang et al. (2008) considered the asymptotic properties of ℓ_q-norm penalized regression
estimators (0 < q < 1) for high dimensional regression models. There is a large amount of literature on the
variable selection properties of the Lasso and we do not attempt to summarize all the work. We refer to the
recent paper by Zhang et al. (2008), who describe a two-step procedure for variable selection using the Lasso
in the high dimensional setup, and also provide a clear picture of recent developments on variable selection
in the high dimensional setting.
The rest of the paper is organized as follows. In Section 2.1, we briefly review the residual based bootstrap
and motivate the intuitive reasons behind its failure. We formulate the modified bootstrap method in Section
2.2. The main results on the consistency of the modified bootstrap are stated in Section 3. In Section 4, the results on
the consistency of the residual bootstrap for the Adaptive Lasso estimator are presented. The data-based
method for the selection of the (MSE-optimal) regularization parameter is presented in Section 5. The finite
sample performance of the proposed modified bootstrapped Lasso estimator and the data-based methodology
of choosing the optimal regularization parameter is studied in Section 6 using a simulated data set. A real
data example is presented in Section 7. The proofs are provided in Section 8.
2 Formulation of the modified bootstrap method
2.1 Background and motivation
In a regression setup like (1.1), there are two approaches to bootstrapping, depending on whether the x_i's are
assumed to be random or not. In the case where the x_i's are random, it is of interest to study the joint distribution
of the covariates and the response, and hence the pairwise bootstrap is a relevant choice. In contrast, here we
assume that the xi’s are non-random. In this situation, the standard approach to bootstrapping is the
residual bootstrap (cf. Efron (1979), Freedman (1981)), which was considered by Knight and Fu (2000)
in the context of the Lasso estimator. To motivate the modified bootstrap method, we first give a brief
description of the residual bootstrap. Let β̂n denote the Lasso estimator of β given by (1.2). Define the
residuals
e_i = y_i − x_i′β̂_n,   i = 1, . . . , n.
Consider the set of centered residuals {e_i − ē_n : i = 1, . . . , n}, where ē_n = n^{-1} Σ_{i=1}^n e_i. For the residual
bootstrap, one selects a with-replacement sample of size n, {e*_i : i = 1, . . . , n}, from the set of centered
residuals and formulates the bootstrap version of (1.1) as

y*_i = x_i′β̂_n + e*_i,   i = 1, . . . , n.
Next, based on the bootstrap dataset {(y*_i, x_i) : i = 1, . . . , n}, the bootstrap version of the Lasso estimator
is defined as:

β*_n = argmin_{u ∈ R^p} Σ_{i=1}^n (y*_i − x_i′u)² + λ_n Σ_{j=1}^p |u_j|.   (2.1)
The bootstrap version of T_n ≡ n^{1/2}(β̂_n − β) is T*_n = n^{1/2}(β*_n − β̂_n). The residual bootstrap estimator
of the unknown distribution G_n (say) of T_n is the (conditional) distribution Ĝ_n(·) (say) of T*_n given the
observations {y_i : i = 1, . . . , n}, i.e.,

Ĝ_n(B) = P*(T*_n ∈ B),   B ∈ B(R^p),   (2.2)

where P* denotes conditional probability given the error variables {ε_i : i = 1, . . . , n} and B(R^p) denotes the
Borel σ-field on R^p.
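The steps leading to (2.1)-(2.2) can be sketched in code as follows; the coordinate-descent solver, the simulated data, and the number of bootstrap replicates B are illustrative assumptions:

```python
import numpy as np

def lasso(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the Lasso criterion (1.2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j])
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
    return beta

def residual_bootstrap(X, y, lam, B, rng):
    """Draw B replicates of T*_n = sqrt(n) (beta*_n - beta_hat_n)."""
    n, p = X.shape
    beta_hat = lasso(X, y, lam)
    e = y - X @ beta_hat                    # residuals
    e_centered = e - e.mean()               # centering, cf. Remark 1
    T_star = np.empty((B, p))
    for b in range(B):
        e_star = rng.choice(e_centered, size=n, replace=True)
        y_star = X @ beta_hat + e_star      # bootstrap responses
        beta_star = lasso(X, y_star, lam)   # bootstrap Lasso estimator (2.1)
        T_star[b] = np.sqrt(n) * (beta_star - beta_hat)
    return T_star

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)
T_star = residual_bootstrap(X, y, lam=np.sqrt(n), B=100, rng=rng)
```

The empirical distribution of the B rows of T_star plays the role of the estimator Ĝ_n in (2.2).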
For the bootstrap approximation to be useful, one would expect Ĝn(·) to be close to Gn(·). However,
this is not the case; Chatterjee and Lahiri (2010) show that the residual bootstrap estimator Ĝn(·), instead
of converging to the deterministic limit of Gn given by Knight and Fu (2000), converges weakly to a random
probability measure and therefore, it fails to provide a valid approximation to Gn(·). To appreciate why
the residual bootstrap approximation has a random limit and why it is inconsistent, first observe that the
Lasso estimators of the non-zero components of β estimate their signs correctly with high probability, but the
estimators of the zero-components take both positive and negative values with positive probabilities, thereby
failing to capture the target sign value (which is zero for such components). A close examination of
the proof of the main result (cf. Theorem 3.1 in Chatterjee and Lahiri (2010)) shows that although the
formulation of the residual bootstrap mimics the main features of the regression model closely, it fails to
reproduce the sign of the zero-components of β with sufficient accuracy in the formulation of the bootstrap
Lasso estimation criterion (2.1), leading to the random limit.
2.2 A modified bootstrap method
Based on the discussion of the last paragraph, we now propose a modified version of the bootstrapped Lasso
estimator that more closely reproduces the sign-vector corresponding to the unknown parameter β. As
seen in Chatterjee and Lahiri (2010), the inconsistency of the standard residual bootstrap arises when some
components of β are zero. The key idea behind the modified bootstrap is to force components of the Lasso
estimator β̂n to be exactly zero whenever they are close to zero. Since the original Lasso estimator is root-n
consistent, its fluctuations are of the order n^{-1/2} around the true value. This suggests that a neighborhood
of order larger than n^{-1/2} around the true value will contain the values of the Lasso estimator with high
probability. To that end, let {an} be a sequence of real numbers such that
a_n + (n^{-1/2} log n) a_n^{-1} → 0   as n → ∞.   (2.3)
For example, a_n = c n^{-δ} satisfies (2.3) for all c ∈ (0, ∞) and δ ∈ (0, 1/2). We threshold the components of
the Lasso estimator β̂_n at a_n and define the modified Lasso estimator as
β̃_n = (β̃_{n,1}, . . . , β̃_{n,p})′,   with   β̃_{n,j} = β̂_{n,j} 1(|β̂_{n,j}| ≥ a_n),   j = 1, . . . , p,   (2.4)
where β̂_n is the usual Lasso estimator defined in (1.2) and 1(·) denotes the indicator function. Note that for
a nonzero component β_j,

|β̂_{n,j}| = |β_j| + O_p(n^{-1/2}) > |β_j|/2 ≥ a_n

for n large, with high probability, and therefore β̃_{n,j} = β̂_{n,j} for n large, with probability tending to 1.
Thus, the thresholding does not have any significant effect on the non-zero components. However, for a zero
component β_j = 0,

|β̂_{n,j}| = |β_j| + O_p(n^{-1/2}) = O_p(n^{-1/2}) < a_n

with probability tending to 1 as n → ∞, and thus

β̃_{n,j} = β̂_{n,j} 1(|β̂_{n,j}| ≥ a_n) = 0   for large n,

with probability tending to 1. In particular, the thresholding at a_n accomplishes our main objective: namely,
to capture the signs of the zero components precisely with probability tending to 1, as the sample size n
goes to infinity.
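The threshold sequence in (2.3) and the modified estimator (2.4) amount to a few lines of code; the constants c = 1 and δ = 1/4 below are one illustrative choice satisfying (2.3), and the fitted vector β̂_n is hypothetical:

```python
import numpy as np

def a_n(n, c=1.0, delta=0.25):
    """Threshold sequence a_n = c * n^{-delta}, delta in (0, 1/2), cf. (2.3)."""
    return c * n ** (-delta)

def threshold_lasso(beta_hat, n, c=1.0, delta=0.25):
    """Modified estimator (2.4): zero out components smaller than a_n."""
    a = a_n(n, c, delta)
    return np.where(np.abs(beta_hat) >= a, beta_hat, 0.0)

# numerically check (2.3): the ratio (n^{-1/2} log n) / a_n = n^{delta-1/2} log n
# shrinks as n grows, since delta < 1/2
ratios = []
for n in (10**2, 10**4, 10**6):
    ratios.append((n ** -0.5 * np.log(n)) / a_n(n))

# a hypothetical Lasso fit with two components close to (but not exactly) zero
beta_hat = np.array([1.98, 0.012, -0.95, -0.004])
beta_tilde = threshold_lasso(beta_hat, n=100)   # a_100 = 100^{-1/4} ≈ 0.316
```

The two small components fall below a_n and are set exactly to zero, while the two strong components pass through unchanged.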
Next, we define the modified residuals {r_i : i = 1, . . . , n} based on this estimator β̃_n by

r_i = y_i − x_i′β̃_n,   i = 1, . . . , n.   (2.5)

Let r̄_n = n^{-1} Σ_{i=1}^n r_i. We select a random sample {e**_1, . . . , e**_n} of size n with replacement from the centered
residuals {r_i − r̄_n : i = 1, . . . , n} and set

y**_i = x_i′β̃_n + e**_i,   i = 1, . . . , n.
Then, the modified bootstrap Lasso estimator is

β**_n := argmin_{u ∈ R^p} Σ_{i=1}^n (y**_i − x_i′u)² + λ_n Σ_{j=1}^p |u_j|.   (2.6)

Let

T**_n = n^{1/2}(β**_n − β̃_n),   n ≥ 1,
and let G̃_n(·) denote the conditional distribution of T**_n given the observations (or the error variables {ε_i : i =
1, . . . , n}), i.e., G̃_n(B) = P*(T**_n ∈ B), B ∈ B(R^p). Thus, G̃_n(·) is the modified bootstrap approximation
to the unknown distribution G_n(·) of T_n. The modified bootstrap estimator of a population parameter
θ_n = ϕ(G_n), defined through a functional ϕ(·) of G_n, is ϕ(G̃_n). For example, the modified bootstrap
estimator of E(T_n), the scaled bias of β̂_n, is E*(T**_n), and similarly, that of Var(T_n) is Var*(T**_n), where
E* and Var* denote the expectation and variance under P*.
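Combining (2.4)-(2.6), one full run of the modified bootstrap can be sketched as follows; the solver, the simulated data, the tuning constants, and the Monte Carlo approximation of E*(T**_n) and Var*(T**_n) by averages over B replicates are all illustrative assumptions:

```python
import numpy as np

def lasso(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the Lasso criterion (1.2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            rho = X[:, j] @ (y - X @ beta + X[:, j] * beta[j])
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
    return beta

def modified_bootstrap(X, y, lam, B, rng, c=1.0, delta=0.25):
    """Return B draws of T**_n = sqrt(n) (beta**_n - beta_tilde_n)."""
    n, p = X.shape
    beta_hat = lasso(X, y, lam)
    a_n = c * n ** (-delta)                                        # cf. (2.3)
    beta_tilde = np.where(np.abs(beta_hat) >= a_n, beta_hat, 0.0)  # (2.4)
    r = y - X @ beta_tilde                                         # (2.5)
    r_centered = r - r.mean()
    T = np.empty((B, p))
    for b in range(B):
        e_bb = rng.choice(r_centered, size=n, replace=True)
        y_bb = X @ beta_tilde + e_bb
        beta_bb = lasso(X, y_bb, lam)                              # (2.6)
        T[b] = np.sqrt(n) * (beta_bb - beta_tilde)
    return T

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)
T = modified_bootstrap(X, y, lam=np.sqrt(n), B=100, rng=rng)
bias_est = T.mean(axis=0)   # Monte Carlo estimate of E_*(T**_n)
var_est = T.var(axis=0)     # Monte Carlo estimate of Var_*(T**_n)
```

The rows of T approximate G̃_n, and the Monte Carlo moments approximate the bias and variance functionals discussed above.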
Remark 1: It should be noted that centering the residuals {r_i : i = 1, . . . , n} is a must for the validity of
the residual bootstrap in general, as it ensures the bootstrap analogue of the model condition E(ε_1) = 0.
Note that this is a sufficient condition to ensure E*(Σ_{i=1}^n x_i e**_i) = 0. For centered x_i's, the 'centering of
residuals' step can be bypassed, as the conditional mean of the sum Σ_{i=1}^n x_i e**_i will still be zero. For real
data sets, where the responses and covariates are already known to be centered, centering the residuals
is not a required step.
On the other hand, scaling the residuals is not as critical: the residual bootstrap is known
to be consistent for the OLS estimator (and also for the Lasso, as shown here) without any scale adjustments. In the
literature, the rescaling factor (1 − p/n)^{-1/2} is sometimes used (see Efron (1982)) to improve finite sample
accuracy, but asymptotically this has a negligible effect, as p is fixed in this case.
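The centering point of Remark 1 can be checked numerically; the residual and covariate values below are hypothetical:

```python
import numpy as np

# Hypothetical residuals from a Lasso fit; centering them makes the bootstrap
# errors e**_i conditionally mean-zero under P_*, since E_*(e**_1) is just the
# average of the values being resampled.
r = np.array([0.3, -1.1, 0.4, 0.9, -0.2])
r_centered = r - r.mean()
cond_mean = r_centered.mean()                   # E_*(e**_1) = 0

# With centered covariates, the centering step can be bypassed:
# E_*(sum_i x_i e**_i) = (sum_i x_i) * E_*(e**_1), and sum_i x_i = 0 already.
x = np.array([1.2, -0.7, 0.1, 2.0, -0.4])
x_centered = x - x.mean()
uncentered_term = x_centered.sum() * r.mean()   # zero even without centering r
```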
Remark 2: In (2.4), the thresholded version β̃_n of the Lasso estimator has been constructed by thresholding
the usual Lasso estimator β̂_n. It is possible to replace β̂_n by any other √n-consistent estimator of
β, and this fact follows from the proof of Theorem 3.1. For example, the usual least-squares estimator of β
can be used. In such a situation, β̃_n can be similarly defined by thresholding each component of the selected
√n-consistent estimator of β. In terms of computational efficiency, the choice of an alternative estimator,
like the usual least-squares estimator, does provide some computational advantage over using the Lasso
estimator β̂_n. But, in practical situations, with the existence of extremely fast computational algorithms
(Efron et al. (2004), Friedman et al. (2007)) that can be used for computing Lasso estimators, the extra
effort in computing the Lasso estimator β̂_n becomes a minor issue.
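As a sketch of the alternative suggested in Remark 2, the least-squares estimator can be thresholded in place of the Lasso; the simulated data and the threshold constants below are illustrative assumptions:

```python
import numpy as np

# Per Remark 2, any sqrt(n)-consistent estimator can be thresholded in place of
# the Lasso; here we use the ordinary least-squares estimator on simulated data.
rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=n)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # sqrt(n)-consistent estimator
a_n = n ** (-0.25)                                # threshold satisfying (2.3)
beta_tilde = np.where(np.abs(beta_ols) >= a_n, beta_ols, 0.0)
```

Since the OLS fluctuations are O_p(n^{-1/2}) while a_n shrinks more slowly, the null coefficient is zeroed out with high probability, exactly as in the Lasso-based construction.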
In the next section, we show that under some mild conditions, the modified bootstrap estimators of the
distribution function of T_n and of its bias and variance functionals are consistent.
3 Bootstrapping the Lasso estimator
3.1 Consistency of the distributional approximation
The first result shows that the modified bootstrap gives a valid approximation to the distribution of T n:
Theorem 3.1 (consistency of the modified bootstrap). Suppose that the following assumptions hold:

(C.1) n^{-1} Σ_{i=1}^n x_i x_i′ → C, where C is positive definite. Further, n^{-1} Σ_{i=1}^n ‖x_i‖³ = O(1).

(C.2) λ_n n^{-1/2} → λ_0 ≥ 0.

(C.3) The errors {ε_i : i = 1, . . . , n} are independent and identically distributed with E(ε_1) = 0 and Var(ε_1) =
σ² ∈ (0, ∞).
Then

ρ(G̃_n(·), G_n(·)) → 0   as n → ∞, with probability 1,

where ρ(·, ·) denotes the Prohorov metric on the set of all probability measures on (R^p, B(R^p)).
Theorem 3.1 asserts strong consistency of the modified bootstrap distribution function estimator under
Assumptions (C.1)-(C.3). In contrast, for the standard version of the residual bootstrap, Chatterjee and
Lahiri (2010) show that under the same set of regularity assumptions, if λ_0 > 0 in Assumption (C.2) and
if β has at least one zero component, then
ρ(Ĝ_n(·), G_n(·)) ↛ 0   in probability, as n → ∞,
where Ĝ_n is the residual bootstrap estimate of G_n (cf. (2.2)). Thus, while the standard residual bootstrap
has limited success in the presence of zero components, the modified bootstrap removes this limitation of the
residual bootstrap, and provides a valid approximation to the distribution of the centered and scaled Lasso
estimator for all values of the regression parameter β.
Next let G_∞(·) denote the limit distribution of T_n (cf. Knight and Fu (2000)). Then, from Theorem 3.1,
it follows that for any Borel set B ⊂ R^p with G_∞(∂B) = 0,
P*(T**_n ∈ B) − P(T