LASSO Regression - p-values and coefficients


Question, score 3, asked by jpryan28 (new contributor), edited by Richard Hardy

I've run a LASSO in R using cv.glmnet. I would like to generate p-values for the coefficients that are selected.

I found that boot.lasso.proj produces bootstrapped p-values:
https://rdrr.io/rforge/hdi/man/boot.lasso.proj.html

While boot.lasso.proj does produce p-values, I assume it is fitting its own lasso internally, and I don't see a way to get the coefficients out of it.

Would it be safe to use the p-values from hdi for the coefficients produced by cv.glmnet?

Tags: p-value, lasso
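
(Editor's sketch of the two pieces being combined here, on toy data. The boot.lasso.proj call and its pval.corr field follow the linked hdi documentation; treat the exact argument and field names as assumptions.)

    library(glmnet)
    library(hdi)

    set.seed(1)
    x <- matrix(rnorm(100 * 10), 100, 10)  # toy design matrix, 10 candidate predictors
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)  # only the first two predictors matter

    # coefficients from the cross-validated lasso
    cv_fit <- cv.glmnet(x, y, alpha = 1)
    coef(cv_fit, s = "lambda.min")

    # bootstrapped p-values from hdi; note it fits its own (de-sparsified) lasso,
    # so these p-values are not tied to the cv.glmnet coefficients above
    bl <- boot.lasso.proj(x, y, B = 1000)
    bl$pval.corr                           # multiplicity-corrected p-values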










Comments:
  • What is hdi here? – Richard Hardy, 9 hours ago










  • HDI – the hdi package that contains boot.lasso.proj: rdrr.io/rforge/hdi – jpryan28, 5 hours ago


















2 Answers






Answer (score 2) – EdM, answered 5 hours ago

To expand on what Ben Bolker notes in a comment on another answer: the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all straightforward. What is the actual null hypothesis against which you are testing the coefficient values? How do you take into account that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the correlated predictors often seen in practice? And how do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty, and thus the number of retained predictors?



These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



Please don't simply use the p-values returned by these or any other methods for LASSO as plug-and-play results. It's important to think about why and whether you need p-values, and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.
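
(Editor's sketch: for concreteness, the selectiveInference package mentioned above provides fixedLassoInf, which computes p-values conditional on the model selected at a fixed penalty. The call below follows that package's documented usage, on toy data; note it conditions on a fixed lambda, not a CV-chosen one, which is precisely one of the caveats raised here.)

    library(glmnet)
    library(selectiveInference)

    set.seed(1)
    n <- 100; p <- 10
    x <- scale(matrix(rnorm(n * p), n, p))           # standardized predictors
    y <- as.numeric(x[, 1] - 2 * x[, 2] + rnorm(n))

    lam  <- 0.1                                      # a fixed (not CV-chosen) penalty
    gfit <- glmnet(x, y, standardize = FALSE)
    beta <- coef(gfit, s = lam, exact = TRUE, x = x, y = y)[-1]

    # fixedLassoInf takes the penalty on the summed-loss scale, i.e. n * lambda
    out <- fixedLassoInf(x, y, beta, lam * n)
    out$pv                                           # selection-adjusted p-values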






Answer (score 1) – majeed simaan, answered 7 hours ago, edited 5 hours ago

    Recall that the LASSO functions as an elimination process: it keeps the "best" feature space, with the penalty chosen by CV. One possible remedy is to take the final selected feature space and feed it back into an lm call. This way, you can compute the statistical significance of the final selected X variables. For instance, see the following code:



    library(ISLR)
    library(glmnet)

    ds <- na.omit(Hitters)
    X  <- as.matrix(ds[, 1:10])

    # lasso with leave-one-out CV to choose the penalty
    lM_LASSO <- cv.glmnet(X, y = log(ds$Salary),
                          intercept = TRUE, alpha = 1, nfolds = nrow(ds))
    opt_lam <- lM_LASSO$lambda.min

    # refit at the selected lambda and inspect the coefficients
    lM_LASSO <- glmnet(X, y = log(ds$Salary),
                       intercept = TRUE, alpha = 1, lambda = opt_lam)
    W <- as.matrix(coef(lM_LASSO))
    W

    1
    (Intercept) 4.5630727825
    AtBat -0.0021567122
    Hits 0.0115095746
    HmRun 0.0055676901
    Runs 0.0003147141
    RBI 0.0001307846
    Walks 0.0069978218
    Years 0.0485039070
    CHits 0.0003636287

    keep_X <- rownames(W)[W != 0]
    keep_X <- keep_X[keep_X != "(Intercept)"]
    X <- X[, keep_X]
    summary(lm(log(ds$Salary) ~ X))

    Call:
    lm(formula = log(ds$Salary) ~ X)

    Residuals:
    Min 1Q Median 3Q Max
    -2.23409 -0.45747 0.06435 0.40762 3.02005

    Coefficients:
    Estimate Std. Error t value Pr(>|t|)
    (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
    XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
    XHits 0.0126216 0.0039645 3.184 0.00164 **
    XHmRun 0.0057538 0.0103619 0.555 0.57919
    XRuns 0.0003510 0.0048428 0.072 0.94228
    XRBI 0.0002455 0.0045771 0.054 0.95727
    XWalks 0.0072372 0.0026936 2.687 0.00769 **
    XYears 0.0487293 0.0206030 2.365 0.01877 *
    XCHits 0.0003622 0.0001564 2.316 0.02138 *
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.6251 on 254 degrees of freedom
    Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
    F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


    Note that the coefficients are slightly different from those derived from the glmnet model. Finally, you can use the stargazer package to render the output as a well-formatted table. In this case, we have



    stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
    ===============================================
    Dependent variable:
    ---------------------------
    log(ds$Salary)
    -----------------------------------------------
    XAtBat -0.003**
    (0.001)

    XHits 0.013***
    (0.004)

    XHmRun 0.006
    (0.010)

    XRuns 0.0004
    (0.005)

    XRBI 0.0002
    (0.005)

    XWalks 0.007***
    (0.003)

    XYears 0.049**
    (0.021)

    XCHits 0.0004**
    (0.0002)

    Constant 4.580***
    (0.156)

    -----------------------------------------------
    Observations 263
    R2 0.521
    Adjusted R2 0.506
    Residual Std. Error 0.625 (df = 254)
    F Statistic 34.521*** (df = 8; 254)
    ===============================================
    Note: *p<0.1; **p<0.05; ***p<0.01


    Bootstrap



    Using a bootstrap approach, I compare the above standard errors with the bootstrapped ones as a robustness check:



    library(boot)

    W_boot <- function(ds, indices) {
      ds_boot <- ds[indices, ]                # bootstrap resample
      X <- as.matrix(ds_boot[, 1:10])
      y <- log(ds_boot$Salary)                # use the resampled outcome, not the original ds
      lM_LASSO <- glmnet(X, y = y,
                         intercept = TRUE, alpha = 1, lambda = opt_lam)
      as.vector(as.matrix(coef(lM_LASSO)))    # boot() expects a plain vector
    }

    results <- boot(data = ds, statistic = W_boot, R = 10000)

    se1 <- summary(lm(log(ds$Salary) ~ X))$coef[, 2]  # lm standard errors
    se2 <- apply(results$t, 2, sd)                    # bootstrap standard errors
    se2 <- se2[W != 0]                                # keep the selected coefficients
    plot(se2 ~ se1)
    abline(a = 0, b = 1)                              # 45-degree reference line


    [Plot: bootstrapped standard errors (se2) against the lm standard errors (se1), with the 45-degree line.]

    There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may want to check this thread for further discussion.






    Comments:
    • [+4] I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...) – Ben Bolker, 7 hours ago










    • Good point. I don't have a specific paper to justify this methodology, but my thinking is that it should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap serves as a more robust solution. I reran the code with the bootstrap added and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent. – majeed simaan, 5 hours ago
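
    (Editor's sketch of the concern above: the bootstrap in this answer holds opt_lam fixed, so it does not propagate the penalty-selection step. A hypothetical variant that re-runs the CV selection inside each replicate, reusing ds from the answer, could look like this; it is far slower and still inherits the other caveats discussed.)

    W_boot_cv <- function(ds, indices) {
      ds_boot <- ds[indices, ]
      X <- as.matrix(ds_boot[, 1:10])
      y <- log(ds_boot$Salary)
      cv_fit <- cv.glmnet(X, y, intercept = TRUE, alpha = 1)  # lambda re-chosen on each resample
      as.vector(as.matrix(coef(cv_fit, s = "lambda.min")))
    }

    # R kept small because every replicate runs a full cross-validation
    results_cv <- boot(data = ds, statistic = W_boot_cv, R = 500)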












    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "65"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f410173%2flasso-regression-p-values-and-coefficients%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2












    $begingroup$

    To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



    These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



    Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






    share|cite|improve this answer









    $endgroup$


















      2












      $begingroup$

      To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



      These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



      Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






      share|cite|improve this answer









      $endgroup$
















        2












        2








        2





        $begingroup$

        To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



        These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



        Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.






        share|cite|improve this answer









        $endgroup$



        To expand on what Ben Bolker notes in a comment on another answer, the issue of what a frequentist p-value means for a regression coefficient in LASSO is not at all easy. What's the actual null hypothesis against which you are testing the coefficient values? How do you take into account the fact that LASSO performed on multiple samples from the same population may return wholly different sets of predictors, particularly with the types of correlated predictors that often are seen in practice? How do you take into account that you have used the outcome values as part of the model-building process, for example in the cross-validation or other method you used to select the level of penalty and thus the number of retained predictors?



        These issues are discussed on this site. This page is one good place to start, with links to the R hdi package that you mention and also to the selectiveInference package, which is also discussed on this page. Statistical Learning with Sparsity covers inference for LASSO in Chapter 6, with references to the literature as of a few years ago.



        Please don't simply use the p-values returned by those or any other methods for LASSO as simple plug-and-play results. It's important to think why/whether you need p-values and what they really mean in LASSO. If your main interest is in prediction rather than inference, measures of predictive performance would be much more useful to you and to your audience.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered 5 hours ago









        EdMEdM

        23.4k234101




        23.4k234101

























            1












            $begingroup$

            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices) {
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)
            }

            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






            share|cite|improve this answer











            $endgroup$









            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago
















            1












            $begingroup$

            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices) {
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)
            }

            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






            share|cite|improve this answer











            $endgroup$









            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago














            1












            1








            1





            $begingroup$

            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices) {
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)
            }

            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.






            share|cite|improve this answer











            $endgroup$



            Recall that LASSO functions as an elimination process. In other words, it keeps the "best" feature space using CV. One possible remedy is to select the final feature space and feed it back into an lm command. This way, you would be able to compute the statistical significance of the final selected X variables. For instance, see the following code:



            library(ISLR)
            library(glmnet)
            ds <- na.omit(Hitters)
            X <- as.matrix(ds[,1:10])
            lM_LASSO <- cv.glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, nfolds=nrow(ds),
            parallel = T)
            opt_lam <- lM_LASSO$
            lambda.min
            lM_LASSO <- glmnet(X,y = log(ds$Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            W

            1
            (Intercept) 4.5630727825
            AtBat -0.0021567122
            Hits 0.0115095746
            HmRun 0.0055676901
            Runs 0.0003147141
            RBI 0.0001307846
            Walks 0.0069978218
            Years 0.0485039070
            CHits 0.0003636287

            keep_X <- rownames(W)[W!=0]
            keep_X <- keep_X[!keep_X == "(Intercept)"]
            X <- X[,keep_X]
            summary(lm(log(ds$Salary)~X))

            Call:
            lm(formula = log(ds$Salary) ~ X)

            Residuals:
            Min 1Q Median 3Q Max
            -2.23409 -0.45747 0.06435 0.40762 3.02005

            Coefficients:
            Estimate Std. Error t value Pr(>|t|)
            (Intercept) 4.5801734 0.1559086 29.377 < 2e-16 ***
            XAtBat -0.0025470 0.0010447 -2.438 0.01546 *
            XHits 0.0126216 0.0039645 3.184 0.00164 **
            XHmRun 0.0057538 0.0103619 0.555 0.57919
            XRuns 0.0003510 0.0048428 0.072 0.94228
            XRBI 0.0002455 0.0045771 0.054 0.95727
            XWalks 0.0072372 0.0026936 2.687 0.00769 **
            XYears 0.0487293 0.0206030 2.365 0.01877 *
            XCHits 0.0003622 0.0001564 2.316 0.02138 *
            ---
            Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

            Residual standard error: 0.6251 on 254 degrees of freedom
            Multiple R-squared: 0.5209, Adjusted R-squared: 0.5058
            F-statistic: 34.52 on 8 and 254 DF, p-value: < 2.2e-16


            Note that the coefficients are little different from the ones derived from the glmnet model. Finally, you can use the stargazer package to output into a well-formatted table. In this case, we have



            stargazer::stargazer(lm(log(ds$Salary)~X),type = "text")
            ===============================================
            Dependent variable:
            ---------------------------
            Salary)
            -----------------------------------------------
            XAtBat -0.003**
            (0.001)

            XHits 0.013***
            (0.004)

            XHmRun 0.006
            (0.010)

            XRuns 0.0004
            (0.005)

            XRBI 0.0002
            (0.005)

            XWalks 0.007***
            (0.003)

            XYears 0.049**
            (0.021)

            XCHits 0.0004**
            (0.0002)

            Constant 4.580***
            (0.156)

            -----------------------------------------------
            Observations 263
            R2 0.521
            Adjusted R2 0.506
            Residual Std. Error 0.625 (df = 254)
            F Statistic 34.521*** (df = 8; 254)
            ===============================================
            Note: *p<0.1; **p<0.05; ***p<0.01


            Bootstrap



            Using a bootstrap approach, I compare the above standard errors with the bootstrapped one as a robustness check:



            library(boot)

            W_boot <- function(ds, indices) {
            ds_boot <- ds[indices,]
            X <- as.matrix(ds_boot[,1:10])
            y <- log(ds$Salary)
            lM_LASSO <- glmnet(X,y = log(ds$
            Salary),
            intercept=TRUE, alpha=1, lambda = opt_lam)
            W <- as.matrix(coef(lM_LASSO))
            return(W)
            }

            results <- boot(data=ds, statistic=W_boot,
            R=10000)

            se1 <- summary(lm(log(ds$Salary)~X))$coef[,2]
            se2 <- apply(results$t,2,sd)
            se2 <- se2[W!=0]
            plot(se2~se1)
            abline(a=0,b=1)


            enter image description here
            There seems to be a small bias for the intercept. Otherwise, the ad-hoc approach seems to be justified. In any case, you may wanna check this thread for further discussion on this.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited 5 hours ago

























            answered 7 hours ago









            majeed simaanmajeed simaan

            1464




            1464








            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago














            • 4




              $begingroup$
              I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
              $endgroup$
              – Ben Bolker
              7 hours ago










            • $begingroup$
              Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
              $endgroup$
              – majeed simaan
              5 hours ago








            4




            4




            $begingroup$
            I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
            $endgroup$
            – Ben Bolker
            7 hours ago




            $begingroup$
            I have very serious concerns about this approach (I know it's widely done), because it doesn't take the selection process (nor the process of selecting the penalty by cross-validation) into account. Can you point to references that assert that this approach actually has correct frequentist properties? (I can imagine that it's OK in particular cases where you have lots of data and the parameters have either zero (or very small) effects or very large effects ...)
            $endgroup$
            – Ben Bolker
            7 hours ago












            $begingroup$
            Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
            $endgroup$
            – majeed simaan
            5 hours ago




            $begingroup$
            Good point. I don't have a specific paper to justify this methodology, but my thinking that this should not be that different from the "true" standard errors. As you mentioned above, running a bootstrap would serve as a more robust solution. I reran the code with the addition of bootstrap and compared the results with the lm standard errors. Ignoring the intercept, the results seem to be consistent.
            $endgroup$
            – majeed simaan
            5 hours ago










            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.










            draft saved

            draft discarded


















            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.













            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.












            jpryan28 is a new contributor. Be nice, and check out our Code of Conduct.
















            Thanks for contributing an answer to Cross Validated!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f410173%2flasso-regression-p-values-and-coefficients%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Taj Mahal Inhaltsverzeichnis Aufbau | Geschichte | 350-Jahr-Feier | Heutige Bedeutung | Siehe auch |...

            Baia Sprie Cuprins Etimologie | Istorie | Demografie | Politică și administrație | Arii naturale...

            Nicolae Petrescu-Găină Cuprins Biografie | Opera | In memoriam | Varia | Controverse, incertitudini...