Why does the mean tend to be more stable across different samples than the median?
Section 1.7.2 of Discovering Statistics Using R by Andy Field et al., while listing the virtues of the mean versus the median, states:

... the mean tends to be stable in different samples.

This comes after explaining the median's many virtues, e.g.:

... The median is relatively unaffected by extreme scores at either end of the distribution ...

Given that the median is relatively unaffected by extreme scores, I would have thought it would be more stable across samples, so I was puzzled by the authors' assertion. To check, I ran a simulation: I generated one million random numbers, drew 1,000 samples of 100 numbers each, computed the mean and median of each sample, and then computed the standard deviation of those sample means and medians.
nums <- rnorm(n = 10^6, mean = 0, sd = 1)   # population: one million standard normal draws
hist(nums)
length(nums)
means   <- vector(mode = "numeric")
medians <- vector(mode = "numeric")
for (i in 1:10^3) {
  b <- sample(x = nums, 10^2)               # sample of size 100
  medians[i] <- median(b)
  means[i]   <- mean(b)
}
sd(means)
# [1] 0.0984519
sd(medians)
# [1] 0.1266079
p1 <- hist(means,   col = rgb(0, 0, 1, 1/4))
p2 <- hist(medians, col = rgb(1, 0, 0, 1/4), add = TRUE)
As you can see, the means are more tightly distributed than the medians. In the attached image the red histogram is for the medians: it is shorter and has fatter tails, which also confirms the authors' assertion.

I am baffled, though! How can the median, which is supposedly more stable, ultimately vary more across samples? It seems paradoxical. Any insights would be appreciated.
Tags: mean, median

– Alok Lal
Yeah, but try it by sampling from nums <- rt(n = 10^6, 1.1). That $t_{1.1}$ distribution will give a bunch of extreme values, not necessarily balanced between positive and negative (there is just as good a chance of getting another positive extreme value as a negative one to balance it), and that will cause a gigantic variance in $\bar{x}$. This is what the median shields against. The normal distribution is unlikely to give any especially extreme values that stretch the $\bar{x}$ distribution wider than the median's.

– Dave
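As a quick check on Dave's suggestion, here is a minimal sketch (my own, not from the comment; the sample and replication sizes are copied from the question):

set.seed(1)
nums <- rt(n = 10^6, df = 1.1)   # very heavy-tailed t distribution
means   <- numeric(10^3)
medians <- numeric(10^3)
for (i in 1:10^3) {
  b <- sample(x = nums, 10^2)
  means[i]   <- mean(b)
  medians[i] <- median(b)
}
sd(means)    # enormous: a few extreme draws dominate the mean
sd(medians)  # modest: the median ignores the extremes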
The author's statement is not generally true. (We have received many questions here related to errors in this author's books, so this is not a surprise.) The standard counterexamples are found among the "stable distributions", where the mean is anything but "stable" (in any reasonable sense of the term) and the median is far more stable.

– whuber♦
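One rough illustration of whuber's point (my sketch, not from the comment; the helper names are mine) uses the Cauchy distribution, a stable distribution: the mean of $n$ Cauchy draws is itself Cauchy, so it never concentrates as $n$ grows, while the median does. IQR is used instead of sd because the Cauchy sample mean has no finite variance.

set.seed(1)
mean_iqr   <- function(n) IQR(replicate(10^3, mean(rcauchy(n))))     # spread of sample means
median_iqr <- function(n) IQR(replicate(10^3, median(rcauchy(n))))   # spread of sample medians
mean_iqr(10); mean_iqr(1000)      # roughly the same spread at both sample sizes
median_iqr(10); median_iqr(1000)  # shrinks as n grows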
"... the mean tends to be stable in different samples." is a nonsense statement: "stability" is not well defined. The (sample) mean is indeed quite stable within a single sample, because there it is a nonrandom quantity. But if the data are "unstable" (highly variable?), the mean is "unstable" too.

– AdamO
3 Answers
Comment: Just to echo back your simulation, using a distribution for which the SDs of means and medians show the opposite result. Specifically, nums are now from a Laplace distribution (also called 'double exponential'), which can be simulated as the difference of two exponential distributions with the same rate (here the default rate 1). [Perhaps see Wikipedia on Laplace distributions.]
set.seed(2019)
nums <- rexp(10^6) - rexp(10^6)   # Laplace via difference of two exponentials
means   <- vector(mode = "numeric")
medians <- vector(mode = "numeric")
for (i in 1:10^3) {
  b <- sample(x = nums, 10^2)
  medians[i] <- median(b)
  means[i]   <- mean(b)
}
sd(means)
# [1] 0.1442126
sd(medians)
# [1] 0.1095946   <-- smaller
hist(nums, prob = TRUE, br = 70, ylim = c(0, .5), col = "skyblue2")
curve(.5 * exp(-abs(x)), add = TRUE, col = "red")   # overlay the Laplace density
Note: Another easy possibility, explicitly mentioned in @whuber's link, is the Cauchy distribution, which can be simulated as Student's t distribution with one degree of freedom, rt(10^6, 1). However, its tails are so heavy that making a nice histogram is problematic.
– BruceET
As @whuber and others have said, the statement is not true in general. And if you're willing to be more intuitive (I can't keep up with the deep math geeks around here), you might look at other ways the mean and median are, or are not, stable. For these examples, assume an odd number of points so I can keep my descriptions consistent and simple.
Imagine you have a spread of points on a number line. Now take all of the points above the middle one and move them up to 10x their values. The median is unchanged, while the mean moves significantly. So the median seems more stable.

Now imagine these points are fairly spread out. Move the center point up and down. A one-unit move changes the median by one but barely moves the mean. The median now seems less stable, more sensitive to small movements of a single point.
Now imagine taking the highest point and moving it smoothly down toward the lowest point. The mean moves smoothly the whole way. The median behaves very differently: it does not move at all until your point drops below the previous median, then it tracks the moving point until that point passes the next point below, at which stage the median stops there and again does not move as you continue downward. This is very digital-like: the median's rate of change jumps abruptly at each crossing, so it is not nearly as smooth as the mean. Unstable, in that sense.
So different transformations of your points make either the mean or the median look less smooth or stable in some sense. The math heavy-hitters here have shown you distributions you can sample from, which matches your experiment more closely, but hopefully this intuition helps as well.

– Wayne
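As an illustration of Wayne's third thought experiment, here is a minimal sketch (my own; the five point values are arbitrary) that slides the highest point downward and traces both statistics:

pts  <- c(1, 2, 3, 4)                    # four fixed points
path <- seq(10, 0, by = -0.1)            # positions of the moving fifth point
trace <- sapply(path, function(p) {
  x <- c(pts, p)
  c(mean = mean(x), median = median(x))
})
plot(path, trace["mean", ], type = "l", xlim = rev(range(path)),
     xlab = "position of moving point", ylab = "statistic")
lines(path, trace["median", ], lty = 2)  # flat, then tracking, then flat again
legend("topright", c("mean", "median"), lty = 1:2)

The mean traces a straight line; the median sits at 3, follows the moving point between 3 and 2, then stops at 2.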
Suppose you have $n$ data points from some underlying continuous distribution with mean $\mu$ and variance $\sigma^2 < \infty$. Let $f$ be the density function for this distribution and let $m$ be its median. To simplify this result further, let $\tilde{f}$ be the corresponding standardised density function, given by $\tilde{f}(z) = \sigma \cdot f(\mu + \sigma z)$ for all $z \in \mathbb{R}$. The asymptotic variances of the sample mean and sample median are given respectively by:

$$\mathbb{V}(\bar{X}_n) = \frac{\sigma^2}{n}
\quad \quad \quad \quad \quad
\mathbb{V}(\tilde{X}_n) \rightarrow \frac{\sigma^2}{n} \cdot \frac{1}{4} \cdot \tilde{f} \Big( \frac{m-\mu}{\sigma} \Big)^{-2}.$$

We therefore have:

$$\frac{\mathbb{V}(\bar{X}_n)}{\mathbb{V}(\tilde{X}_n)} \rightarrow 4 \cdot \tilde{f} \Big( \frac{m-\mu}{\sigma} \Big)^2.$$

As you can see, the relative size of the variance of the sample mean and sample median is determined (asymptotically) by the standardised density value at the true median. Thus, for large $n$ we have the asymptotic correspondence:

$$\mathbb{V}(\bar{X}_n) < \mathbb{V}(\tilde{X}_n)
\quad \quad \iff \quad \quad
f_* \equiv \tilde{f} \Big( \frac{m-\mu}{\sigma} \Big) < \frac{1}{2}.$$

That is, for large $n$, and speaking asymptotically, the variance of the sample mean will be lower than the variance of the sample median if and only if the standardised density at the standardised median value is less than one-half. The data in your simulation example were generated from a normal distribution, so you have $f_* = 1/\sqrt{2\pi} = 0.3989423 < 1/2$. Thus, it is unsurprising that you found a higher variance for the sample median in that example.
– Ben
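As a quick numeric check of Ben's result against the simulation in the question (using the sd values reported there, not re-run):

f_star <- dnorm(0)         # standardised normal density at its median, 1/sqrt(2*pi) ~ 0.399
4 * f_star^2               # asymptotic Var(mean)/Var(median) = 2/pi ~ 0.637
(0.0984519 / 0.1266079)^2  # ratio of simulated variances ~ 0.605, reasonably close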