Alternatives to minimizing loss in regression













$begingroup$


We know that loss (error) minimization originated with Gauss in the early 19th c.



More recently Friedman* extolled its virtues for use in predictive modeling:




“The aim of regression analysis is to use the data to construct a
function f̂(x) that can serve as a reasonable approximation of f(x)
over the domain D of interest. The notion of reasonableness depends on
the purpose for which the approximation is to be used. In nearly all
applications however accuracy is important...if the sole purpose of
the regression analysis is to obtain a rule for predicting future
values of the response y, given values for the covariates (x1,...,xn),
then accuracy is the only important virtue of the model...”




But is accuracy the only important virtue of a model?



My question is whether loss minimization is the only way to fit and evaluate a regression function, whatever the method: OLS, ML, gradient descent, quadratic loss, or least absolute deviation.



Clearly, it's possible to evaluate multiple metrics beyond loss a posteriori, i.e., after models have been built. But are there multivariate functions which evaluate multiple metrics a priori or during the optimization phase?



For example, suppose one wanted to use a regression function which optimized fit based not just on loss minimization but which also maximized nonlinear dependence or Shannon entropy?



Are such functions available, and/or have these issues been researched?



*Friedman, J.H., Multivariate Adaptive Regression Splines, The Annals of Statistics, 1991, vol. 19, no. 1 (March), pp. 1-67.



















$endgroup$














  • $begingroup$
    I think your question is missing a crucial detail. Friedman's statement has the "if ... then" format. You ask about the "then" portion, but never state whether you are accepting the "if" portion as a premise. That is, for the purpose of writing an answer, are we taking as given that "the sole purpose of the regression analysis is to obtain a rule for predicting future values of the response $y$, given values for the covariates $(x_1,\dots,x_n)$"? A model might forgo that premise in exchange for, e.g., interpretability of some phenomenon.
    $endgroup$
    – Sycorax
    8 hours ago












  • $begingroup$
    @Sycorax Apologies for any ambiguity. The question is more general than Friedman's "if". To your point, I'm interested in learning about "model(s) that might forgo that premise in exchange for, e.g., interpretability...". Thx.
    $endgroup$
    – user332577
    5 hours ago

























regression error
















asked 8 hours ago









user332577

























2 Answers

















































$begingroup$

Rational choice theory says that any rational preference can be modeled with a utility function.



Therefore any (rational) decision process can be encoded in a loss function and posed as an optimization problem.



For example, L1 and L2 regularization can be viewed as encoding a preference for smaller parameters or more parsimonious models into the loss function. Any preference can be similarly encoded, assuming it's not irrational.




For example, suppose one wanted to use a regression function which optimized fit based not just on loss minimization but which also maximized nonlinear dependence or Shannon entropy?




Then you would adjust your utility function to include a term rewarding those things (equivalently, a loss term penalizing their absence), just as we did for L1/L2 regularization.



Now, this might make the problem computationally intractable; for example, 0/1 loss is known to result in an NP-hard problem. In general, people prefer to study tractable problems so you won't find much off-the-shelf software that does this, but nothing is stopping you from writing down such a function and applying some sufficiently generalized optimizer to it.
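As a rough illustration of "writing down such a function and applying a generalized optimizer", here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the toy data, the choice of an absolute-value penalty on the slope as the extra preference term, and the hypothetical helper `compass_search`, a simple derivative-free coordinate search standing in for whatever general-purpose optimizer you prefer.

```python
def compass_search(f, x0, step=1.0, tol=1e-8, max_iter=100_000):
    """Derivative-free minimizer: probe +/- step along each coordinate,
    keep any improvement, and halve the step when nothing improves."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
        if not improved:
            step *= 0.5
    return x


# Toy data (assumed for illustration): y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]


def make_objective(lam):
    """Mean squared error plus lam * |slope|, an example 'extra preference' term."""
    def objective(params):
        a, b = params
        mse = sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        return mse + lam * abs(b)
    return objective


plain = compass_search(make_objective(0.0), [0.0, 0.0])      # loss only
penalized = compass_search(make_objective(1.0), [0.0, 0.0])  # loss + preference
print("loss only:      ", plain)
print("loss + penalty: ", penalized)
```

With the penalty switched on, the fitted slope is pulled below the slope that minimizes squared error alone, which is the trade-off the modified objective encodes. Swapping in a different term only means editing `objective`; whether the search remains tractable depends on that term.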



If you retort that you have a preference in mind which cannot be modeled by a loss function, even in principle, then all I can say is that such a preference is irrational. Don't blame me, that's just modus tollens from the above theorem. You are free to have such a preference (there is good empirical evidence that most preferences that people actually hold are irrational in one way or another) but you will not find much literature on solving such problems in a formal regression context.

















$endgroup$















  • $begingroup$
    "People prefer to study tractable problems so you won't find much off-the-shelf software that does this" Such software is exactly what I'm looking for. Suggestions? Thx.
    $endgroup$
    – user332577
    5 hours ago












  • $begingroup$
    It all depends on the form your modified loss function takes. If your modified loss function is still differentiable, you can use a stochastic gradient descent optimizer like Adam, for which many implementations exist. If you don't have a gradient but believe the function is still smooth, you can use Bayesian optimization. And for really intractable metrics like accuracy (0/1 loss function), you would need to do something like branch-and-bound.
    $endgroup$
    – olooney
    5 hours ago
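For the differentiable case mentioned in the last comment, a minimal sketch might look like the following. It uses plain full-batch gradient descent as a simple stand-in for a stochastic optimizer such as Adam, and the data, the L2 penalty weight `lam`, and the learning rate `lr` are all assumptions chosen for illustration.

```python
# Toy data (assumed): regression through the origin, y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
lam = 0.5  # weight of the extra (L2) preference term, chosen arbitrarily


def loss(b):
    """Modified loss: mean squared error plus an L2 penalty on the slope."""
    n = len(xs)
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / n + lam * b * b


def grad(b):
    """Derivative of the modified loss with respect to the slope b."""
    n = len(xs)
    data_term = -2.0 * sum(x * (y - b * x) for x, y in zip(xs, ys)) / n
    return data_term + 2.0 * lam * b


b = 0.0
lr = 0.05  # learning rate, small enough for this problem to converge
for _ in range(200):
    b -= lr * grad(b)

# For this particular penalty a closed form exists, which gives a check.
mean_xy = sum(x * y for x, y in zip(xs, ys)) / len(xs)
mean_xx = sum(x * x for x in xs) / len(xs)
closed_form = mean_xy / (mean_xx + lam)
print(b, closed_form)
```

The closed form is available only because the added term is quadratic; for a less convenient differentiable term, the same loop applies as long as `grad` is updated to match `loss`.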




























$begingroup$


But is accuracy the only important virtue of a model?




The practical question of what a model is for is too nuanced for a purely theoretical discussion. Interpretation and generalizability come to mind. "Who will use this model?" should be a top-line question in all statistical analyses.



Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a Y-hat that's going to be very close to the Y you observe in the future, then "build your model as big as a house" as John Tukey would say.



This does not excuse overfitting, unless your question is ill defined [plural "you" being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to build models that lack generality. We see it all the time.



Inadvertently you also ask a different question: "Alternatives to minimizing loss in regression". Minimax estimators, estimators that minimize a loss function, are a subset of the class of method of moments estimators: estimators defined as the zero (root) of an estimating function. The goal of the MOMs is to find an unbiased estimator, rather than one that minimizes a loss.
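To make the "zero of an estimating function" idea concrete, here is a small sketch under stated assumptions: a toy regression through the origin, the estimating function g(b) = Σ x_i (y_i − b x_i), and a hypothetical helper `bisect_root`. No loss is minimized anywhere; the estimate is found purely by root-finding. In this particular case the root happens to coincide with the least-squares slope, which is consistent with loss minimizers being a special case.

```python
# Toy data (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def estimating_function(b):
    """g(b) = sum of x_i * (y_i - b * x_i); the estimator is the b with g(b) = 0."""
    return sum(x * (y - b * x) for x, y in zip(xs, ys))


def bisect_root(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi] by bisection; g must change sign on the interval."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


b_hat = bisect_root(estimating_function, 0.0, 10.0)

# Check against the closed-form solution of g(b) = 0.
closed_form = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(b_hat, closed_form)
```

Replacing `estimating_function` with one that is not the derivative of any loss would give an estimator with no corresponding minimization problem, which is the sense in which this class is broader.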















$endgroup$














answered 8 hours ago









AdamO











  • $begingroup$
    Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
    $endgroup$
    – user332577
    5 hours ago












  • $begingroup$
    @user332577 Well, that's probably a fool's errand: Lagrange theory would state that, to optimize multiple losses, you would have to optimize a single loss formed as a linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization, because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
    $endgroup$
    – AdamO
    5 hours ago












  • $begingroup$
    That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
    $endgroup$
    – user332577
    3 hours ago
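The "single loss as a linear combination" idea discussed in these comments can be sketched as follows. This is only an illustration of the weighted-sum case, not a statement about what is or is not possible with nonlinear combinations. The data, the two losses (squared and absolute error for a single location parameter), and the hypothetical helper `minimize_convex_1d` are all assumptions made for the example.

```python
# Toy sample (assumed), with one large value so mean and median differ.
data = [0.0, 1.0, 2.0, 3.0, 14.0]


def squared_loss(m):
    return sum((x - m) ** 2 for x in data) / len(data)


def absolute_loss(m):
    return sum(abs(x - m) for x in data) / len(data)


def minimize_convex_1d(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def scalarized_fit(w):
    """Minimize w * squared_loss + (1 - w) * absolute_loss for a weight w in [0, 1]."""
    def combined(m):
        return w * squared_loss(m) + (1.0 - w) * absolute_loss(m)
    return minimize_convex_1d(combined, min(data), max(data))


fits = {w: scalarized_fit(w) for w in (0.0, 0.5, 1.0)}
for w, m in fits.items():
    print(f"w = {w}: fitted location = {m:.4f}")
```

At w = 1 the fit is the sample mean (4) and at w = 0 it is the sample median (2); intermediate weights land in between. Each weight yields a different compromise, so the combined problem is still a single optimization whose answer depends on how the losses are weighted.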




























$begingroup$
@user332577 well that's probably a fool's errand because Lagrange theory would state to optimize multiple losses, you would have to optimize a single loss of any linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
$endgroup$
– AdamO
5 hours ago






$begingroup$
@user332577 well that's probably a fool's errand because Lagrange theory would state to optimize multiple losses, you would have to optimize a single loss of any linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
$endgroup$
– AdamO
5 hours ago














$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
3 hours ago




$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
3 hours ago
















Rational choice theory says that any rational preference can be modeled with a utility function.



Therefore any (rational) decision process can be encoded in a loss function and posed as an optimization problem.



For example, L1 and L2 regularization can be viewed as encoding a preference for smaller parameters or more parsimonious models into the loss function. Any preference can be similarly encoded, assuming it's not irrational.
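For a linear model this is usually written as a penalized least-squares objective. The notation below (coefficients $\beta$, penalty weights $\lambda_1, \lambda_2 \ge 0$) is assumed here for illustration and is not taken from the answer:

$$
L(\beta) = \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda_1 \lVert \beta \rVert_1 + \lambda_2 \lVert \beta \rVert_2^2
$$

Setting $\lambda_2 = 0$ gives the lasso (L1) and setting $\lambda_1 = 0$ gives ridge regression (L2); with both positive it is the elastic net objective. Larger weights express a stronger preference for small coefficients.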




"For example, suppose one wanted to use a regression function which optimized fit based not just on loss minimization but which also maximized nonlinear dependence or Shannon entropy?"




Then you would adjust your utility function to include a term rewarding those things (equivalently, add to the loss a term penalizing their absence), just as we did for L1/L2 regularization.
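A rough sketch of what that could look like, under loudly stated assumptions: a no-intercept model y ≈ b·x, "Shannon entropy" taken to mean the entropy of a fixed-bin histogram of the residuals, a hypothetical weight lam, and a plain grid search as the optimizer. None of these choices come from the question or the answer; they only show that such a composite objective can be written down and minimized.

```python
import math
import random

def mse(b, x, y):
    """Mean squared error of the no-intercept fit y ~ b * x."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)

def histogram_entropy(values, lo=-3.0, hi=3.0, bins=12):
    """Shannon entropy (in nats) of values binned on a fixed grid.
    Values outside [lo, hi] are clipped into the end bins."""
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(bins - 1, max(0, k))] += 1
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def composite_loss(b, x, y, lam):
    """Squared-error loss minus lam times the residual entropy,
    so higher entropy is rewarded when lam > 0."""
    residuals = [yi - b * xi for xi, yi in zip(x, y)]
    return mse(b, x, y) - lam * histogram_entropy(residuals)

def grid_argmin(f, grid):
    """Return the grid point with the smallest objective value."""
    return min(grid, key=f)

# Simulated data (assumed): true slope 2.
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(300)]
y = [2.0 * xi + random.gauss(0, 0.5) for xi in x]
grid = [i / 100 for i in range(0, 401)]  # candidate slopes 0.00 .. 4.00

b_plain = grid_argmin(lambda b: composite_loss(b, x, y, 0.0), grid)  # loss only
b_mixed = grid_argmin(lambda b: composite_loss(b, x, y, 0.2), grid)  # loss and entropy
print(b_plain, b_mixed)
```

The histogram makes the objective piecewise constant in places, which is why the sketch uses a grid search rather than a gradient method.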



Now, this might make the problem computationally intractable; for example, 0/1 loss is known to result in an NP-hard problem. In general, people prefer to study tractable problems, so you won't find much off-the-shelf software that does this, but nothing is stopping you from writing down such a function and applying some sufficiently general optimizer to it.
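To see why 0/1 loss behaves differently from smooth losses, here is a toy sketch (the data and function names are made up for illustration). With a single feature and a threshold rule, exhaustive search over the handful of distinct splits is cheap; the hardness mentioned above arises for general linear classifiers in higher dimensions, where the piecewise-constant objective gives a gradient-based optimizer nothing to follow.

```python
def zero_one_loss(threshold, x, labels):
    """Number of misclassified points for the rule: predict 1 if x > threshold."""
    return sum((xi > threshold) != bool(li) for xi, li in zip(x, labels))

def best_threshold(x, labels):
    """Try one threshold below all points, one between each pair of
    neighbouring points, and one above all points; keep the best."""
    xs = sorted(x)
    candidates = (
        [xs[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
        + [xs[-1] + 1.0]
    )
    return min(candidates, key=lambda t: zero_one_loss(t, x, labels))

x = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
labels = [0, 0, 1, 0, 1, 1]  # not separable by a single threshold

t_best = best_threshold(x, labels)
errors = zero_one_loss(t_best, x, labels)
print(t_best, errors)
```

On this data no threshold classifies every point correctly, so the best achievable 0/1 loss is one error.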



If you retort that you have a preference in mind which cannot be modeled by a loss function, even in principle, then all I can say is that such a preference is irrational. Don't blame me, that's just modus tollens from the above theorem. You are free to have such a preference (there is good empirical evidence that most preferences that people actually hold are irrational in one way or another) but you will not find much literature on solving such problems in a formal regression context.















edited 7 hours ago

























answered 8 hours ago









olooney















  • "People prefer to study tractable problems so you won't find much off-the-shelf software that does this." Such software is exactly what I'm looking for. Suggestions? Thx.
    – user332577
    5 hours ago

  • It all depends on the form your modified loss function takes. If your modified loss function is still differentiable, you can use a stochastic gradient descent optimizer like Adam, for which many implementations exist. If you don't have a gradient but believe the function is still smooth, you can use Bayesian optimization. And for really intractable metrics like accuracy (0/1 loss function), you would need to do something like branch-and-bound.
    – olooney
    5 hours ago
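As a sketch of the first case (a differentiable custom loss), here is a hand-rolled, full-batch version of the Adam update applied to a penalized squared loss. Assumptions to note: the no-intercept model, the penalty weight lam, the finite-difference gradient, and the hyperparameters are all illustrative. Real work would normally use an existing implementation with stochastic mini-batches.

```python
import math
import random

def loss(b, x, y, lam=0.1):
    """Custom loss (assumed for illustration): mean squared error plus lam * b^2."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x) + lam * b * b

def numeric_grad(f, b, h=1e-6):
    """Central finite-difference approximation to f'(b)."""
    return (f(b + h) - f(b - h)) / (2 * h)

def adam_minimize(f, b0=0.0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Adam update rule for a single parameter, using the full data each step."""
    b, m, v = b0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = numeric_grad(f, b)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias corrections
        v_hat = v / (1 - beta2 ** t)
        b -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return b

# Simulated data (assumed): true slope 2.
random.seed(2)
x = [random.uniform(-2, 2) for _ in range(200)]
y = [2.0 * xi + random.gauss(0, 0.5) for xi in x]

b_adam = adam_minimize(lambda b: loss(b, x, y))

# Closed-form minimizer of this particular loss, used only as a check.
mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / len(x)
mean_xx = sum(xi * xi for xi in x) / len(x)
b_closed = mean_xy / (mean_xx + 0.1)
print(b_adam, b_closed)
```

The optimizer only needs to evaluate the loss, so swapping in a different differentiable objective means changing `loss` and nothing else.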










































