LASSO Regression - p-values and coefficients
I've run a LASSO in R using cv.glmnet. I would like to generate p-values for the coefficients that are selected.

I found boot.lasso.proj in the hdi package, which produces bootstrapped p-values: https://rdrr.io/rforge/hdi/man/boot.lasso.proj.html

While boot.lasso.proj does produce p-values, I assume it is running its own lasso internally, and I don't see a way to get the coefficients it used.

Would it be safe to use the p-values from hdi for the coefficients produced by cv.glmnet?
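For concreteness, a minimal sketch of the workflow being described (X and y stand in for my predictor matrix and response; the hdi call is an assumption on my part, taking boot.lasso.proj to accept the design matrix and response directly and to return per-coefficient p-values - check the linked documentation for the exact interface):

library(glmnet)
library(hdi)

# X: numeric predictor matrix, y: response
cv_fit  <- cv.glmnet(X, y, alpha = 1)          # lasso with CV-chosen penalty
beta_cv <- coef(cv_fit, s = "lambda.min")      # coefficients at lambda.min

# Bootstrapped p-values via the lasso projection method in hdi.
# Assumption: boot.lasso.proj(x, y) returns an object with per-coefficient
# p-values (e.g. $pval / $pval.corr); it fits its own lasso internally, so
# these p-values are not tied to the cv.glmnet coefficients above.
bp <- boot.lasso.proj(x = X, y = y)
bp$pval.corr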
Tags: p-value, lasso

asked 12 hours ago by jpryan28 (new contributor)
Comments:

– Richard Hardy (9 hours ago): What is hdi here?

– jpryan28 (5 hours ago): HDI - the hdi package that contains boot.lasso.proj: rdrr.io/rforge/hdi
2 Answers
To expand on what Ben Bolker notes in a comment on the other answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?

These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.

Please don't simply take the p-values returned by those or any other methods for LASSO as plug-and-play results. It's important to think about why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.
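If you do decide to pursue post-selection inference, here is a rough sketch of what the selectiveInference route can look like for a lasso fit at a fixed penalty. Treat the details as assumptions to verify against the package documentation, in particular that fixedLassoInf expects the penalty on the unscaled-sum scale, so a glmnet lambda has to be multiplied by n:

library(glmnet)
library(selectiveInference)

# X: standardized predictor matrix, y: centered response, n rows
n <- nrow(X)

# Fit the lasso path without internal standardization, then extract the
# coefficients at one fixed lambda (here the cv.glmnet choice).
fit  <- glmnet(X, y, alpha = 1, standardize = FALSE)
lam  <- cv.glmnet(X, y, alpha = 1)$lambda.min
beta <- coef(fit, s = lam, exact = TRUE, x = X, y = y)[-1]   # drop intercept

# Post-selection (conditional) p-values and intervals for the active set.
# Assumption: fixedLassoInf takes lambda on the scale of glmnet's lambda times n.
out <- fixedLassoInf(X, y, beta, lambda = lam * n)
out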
answered 5 hours ago by EdM
Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to take the final selected feature space and feed it back into an lm call. This way, you can compute the statistical significance of the finally selected X variables. For instance, see the following code:
library(ISLR)
library(glmnet)

ds <- na.omit(Hitters)              # Hitters data, complete cases only
X  <- as.matrix(ds[, 1:10])

# Lasso with leave-one-out CV to select the penalty
# (parallel = TRUE requires a registered parallel backend, e.g. via doParallel)
lM_LASSO <- cv.glmnet(X, y = log(ds$Salary),
                      intercept = TRUE, alpha = 1, nfolds = nrow(ds),
                      parallel = TRUE)
opt_lam <- lM_LASSO$lambda.min

# Refit at the selected penalty and extract the coefficients
lM_LASSO <- glmnet(X, y = log(ds$Salary),
                   intercept = TRUE, alpha = 1, lambda = opt_lam)
W <- as.matrix(coef(lM_LASSO))
W
1
(Intercept) 4.5630727825
AtBat -0.0021567122
Hits 0.0115095746
HmRun 0.0055676901
Runs 0.0003147141
RBI 0.0001307846
Walks 0.0069978218
Years 0.0485039070
CHits 0.0003636287
keep_X <- rownames(W)[W!=0]
keep_X <- keep_X[!keep_X == "(Intercept)"]
X <- X[,keep_X]
summary(lm(log(ds$Salary)~X))
Call:
lm(formula = log(ds$Salary) ~ X)
Residuals:
Min 1Q Median 3Q Max
-2.23409 -0.45747 0.06435 0.40762 3.02005
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
XHits 0.0126216 0.0039645 3.184 0.00164 **
XHmRun 0.0057538 0.0103619 0.555 0.57919
XRuns 0.0003510 0.0048428 0.072 0.94228
XRBI 0.0002455 0.0045771 0.054 0.95727
XWalks 0.0072372 0.0026936 2.687 0.00769 **
XYears 0.0487293 0.0206030 2.365 0.01877 *
XCHits 0.0003622 0.0001564 2.316 0.02138 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6251 on 254 degrees of freedom
Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16
Note that the coefficients are a little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output a well-formatted table. In this case, we have

stargazer::stargazer(lm(log(ds$Salary) ~ X), type = "text")
===============================================
Dependent variable:
---------------------------
log(ds$Salary)
-----------------------------------------------
XAtBat -0.003**
(0.001)
XHits 0.013***
(0.004)
XHmRun 0.006
(0.010)
XRuns 0.0004
(0.005)
XRBI 0.0002
(0.005)
XWalks 0.007***
(0.003)
XYears 0.049**
(0.021)
XCHits 0.0004**
(0.0002)
Constant 4.580***
(0.156)
-----------------------------------------------
Observations 263
R2 0.521
Adjusted R2 0.506
Residual Std. Error 0.625 (df = 254)
F Statistic 34.521*** (df = 8; 254)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
Bootstrap

Using a bootstrap approach, I compare the above standard errors with bootstrapped ones as a robustness check:
library(boot)

# Statistic: lasso coefficients at the fixed penalty opt_lam, refit on each resample.
# (Both the predictors and the response must come from the resampled rows ds_boot,
# otherwise the X and y rows no longer correspond.)
W_boot <- function(ds, indices) {
  ds_boot <- ds[indices, ]
  X_boot  <- as.matrix(ds_boot[, 1:10])
  y_boot  <- log(ds_boot$Salary)
  lM_LASSO <- glmnet(X_boot, y = y_boot,
                     intercept = TRUE, alpha = 1, lambda = opt_lam)
  as.vector(coef(lM_LASSO))
}

results <- boot(data = ds, statistic = W_boot, R = 10000)
se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
se2 <- apply(results$t,2,sd)
se2 <- se2[W!=0]
plot(se2~se1)
abline(a=0,b=1)
There seems to be a small bias for the intercept. Otherwise, the ad hoc approach seems to be justified. In any case, you may want to check this thread for further discussion.
answered 7 hours ago by majeed simaan (edited 5 hours ago)

Comments:

– Ben Bolker (7 hours ago, +4): I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)

– majeed simaan (5 hours ago): Good point. I don't have a specific paper to justify this methodology, but my thinking is that it should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of the bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
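To follow up on this exchange: one way to fold the selection step into the bootstrap is to re-run the penalty selection (cv.glmnet) inside each resample, rather than fixing opt_lam in advance. This is an illustrative variation on the code in the answer, not a procedure validated by a reference:

# Re-select the penalty on each resampled data set, so the tuning step is resampled too
W_boot_full <- function(ds, indices) {
  ds_boot <- ds[indices, ]
  X_boot  <- as.matrix(ds_boot[, 1:10])
  y_boot  <- log(ds_boot$Salary)
  cv_fit  <- cv.glmnet(X_boot, y_boot, intercept = TRUE, alpha = 1)
  as.vector(coef(cv_fit, s = "lambda.min"))
}

# Fewer replicates, since each one now includes a full cross-validation
results_full <- boot(data = ds, statistic = W_boot_full, R = 1000)
apply(results_full$t, 2, sd)   # bootstrap SEs that reflect penalty selection as well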