Posts Tagged ‘statistics’

The Significance of Underlying Variance for Social Outcomes

Thursday, 8 August 2024

Measures — quantities to which some arithmetic can be meaningfully applied — can be fitted to some human attributes, even if not to others. When attempting to compare some populations to others, an assumption is made that the properties of the individuals within these populations are subject to quantification of some sort, and that the quantities are commensurable across populations. But usually these assumptions are implicit and unrecognized, and even those who have some awareness that they are dealing with quantities very often don't have a proper grasp of elementary issues.

Very often, people try to understand distinct populations in terms of some notion of averages. If each and every member of every population were exactly average in every regard, then averages would be perfect measures of the populations as such. More generally, if for any two populations the share of that population deviating from average by some specific amount were the same, averages would be sufficient for any comparison of the attributes of populations, except for population sizes. But if the overall variance from average in one population is different from that in another, then thinking in terms of averages can go very, very wrong.

Here are hypothetic distributions for some attribute within two populations, each population having the same number of members:[1] [two lognormal distributions of equal median but of different variance] For both Population A and Population B, the median[2] of the attribute is the same, but Population B has more variance from the arithmetic mean than does Population A. Even though each of these two populations have the same median, more members of the population of greater variance are below some some measure, and more members of that same population are above some measure. [two lognormal distributions of equal median but of different variance] Population B2 [two lognormal distributions of equal mean but of different variance] has the same variance as Population B, but the same arithmetic mean (rather than median) as Population A. Again, even though the center is, by some measure, the same for both populations, more members of the population of greater variance are below some some measure, and more members of that same population are above some measure.

Even if a population has a higher center than Population A, if it has a greater variance then it will dominate the lower range of measures below some measure. [two lognormal distributions of different mean and variance] And even if a population has a lower center than Population A, if it has a greater variance then it will dominate the higher range of measures beyond some measure. [two lognormal distributions of different mean and variance]

If a measurable attribute correlates positively with social success, then ceteris paribus, a population of higher variance is going to dominate both social winners beyond some level and social losers below some level. If a population generally has greater variance amongst its attributes, then — discarding the assumption of ceteris paribus — that population is going to dominate both social winners beyond some level and social losers below some level; even if it has the same median or same mean or even a lower center than another populations.

In fact, though I cannot readily graph the cases in which attributes are only partially ordered and not measurable, the reader should see that the underlying point does not depend upon the measurability of the attributes, but only upon one population having greater propensity for variance than another.

But

  • If one is only looking at the losers and thoughtlessly assuming that their numbers are explained by averages, then one is going inappropriately to infer that the population is generically inferior.
  • If one is only looking at the losers, while thoughtlessly assuming that the averages are the same and that nothing else about the population itself can explain the difference in outcomes, then one is going inappropriately to infer that the population are victims of systemic bias.
  • If one is only looking at the winners and thoughtlessly assuming that their numbers are explained by averages, then one is going inappropriately to infer that the population is generically superior.
  • If one is only looking at the winners, while thoughtlessly assuming that the averages are the same and that nothing else about the population itself can explain the difference in outcomes, then one is going inappropriately to infer that any rival population are victims of systemic bias.

In each case, if we look at the other end of the distribution, the thoughtless conclusion falls apart.

When the last of these errors is made, an attempt may be undertaken to offset illusory bias, by putting an institutional thumb on the scales to shift Population A generally forward, until the the number of social winners at every level is at least the same. But notice what is then really happening as the relative outcomes for most members of the population of greater variance fall increasingly below those of the population of less variance — at previously targetted levels the population of lower variance comes to enjoy greater social success than does the population of greater variance. And notice that the population of greater variance necessarily still dominates above some value, albeït that the value increases as the institutional thumb comes down ever harder in a misguided attempt to match the upper tails of the distribution.

Only actual systemic bias can bring the number of social winners across populations into equality above any given level of social success beyond the center; and, the larger the population, the greater the required bias for such an outcome, and the more that most of the population of greater variance are victimized.

If the two populations are not equal in size, then the foregoing analysis would simply need to entail talk of proportionality. But one might as well speak and write of two populations of the same size, because the real-world application involves two populations that are very close to the same size in most first-world nations. The greater variance of one of those two populations is a consequence of the greater chaos in the formation of one of that population's chromosomes and of a lack of redunancy for another.

Unfortunately, most of the attempts to analyze what has been happening has entailed ham-fisted theorizing about differing averages.


[1] A population with finite membership cannot perfectly conform to a continuous distribution function; but, the larger the population, the less the necessary non-conformance.

[2] The median for each population is the point such that as many members of the population are above it as are below it.

Franken Does Truth a Service

Thursday, 8 October 2009
Franken gets testy over statistics by Eric Roper of the Star Tribune

The senator spent the bulk of his time attempting to debunk the witness, particularly a statistic in his testimony that employees have a 63 percent chance of prevailing in arbitration compared to 43 percent in litigation.

[…]

De Bernardo eventually conceded that he did not know whether $50 would be considered "prevailing" in the statistic,[…].

While Franken has at times resorted to worse intellectual dishonesty than in this case he exposes, here he is right on the mark. As Franken's line of questioning and the answer that it elicts show, the statistic in question tells us virtually nothing about whether the outcomes of arbitration would be considered equally or more favorable to employees than are the outcomes of litigation. It is, in other words, a garbage statistic.

Firms are entitled to require arbitration as a condition of doing business with them, but those who deal with these firms are likewise entitled to require that there be no such imposition as their own condition of doing business.

Other Statistical Analysis

Sunday, 14 June 2009

Meanwhile, Andrew Sullivan notes the The Results as They Came In for the Iranian Presidential election. Specifically, as the official figures were up-dated, they fell almost perfectly along a straight line. The fit has an R2 of .998, which is a virtual impossibility. So the official figures are a bald lie. That doesn't mean that Ahmadinejad wouldn't have won under a fair count, but it means that he chose to steal the election rather than to risk losing under a fair count.

Tweaking the Truth

Friday, 9 January 2009

Here's an example of a wretched journalistic practice:

US job losses hit record in 2008 from the BBC
More US workers lost jobs last year than in any year since World War II, with employers axing 2.6 million posts and 524,000 in December alone.

The US jobless rate rose to 7.2% in December, the highest in 16 years.
One should not begin with a news story with a lie, and then correct it. The head-line here doesn't refer to a post-war record. A head-line is often the only part of the story read, and almost always the first part of the story read. So people with no sense of history — because their educators in the school system and in the main-stream media don't impart one to them — are filled with anxiety and rage. The emotional effect lingers even after the correction is given, and some never get the correction.

The economic news has been bad, but it simply doesn't compare to that of the Great Depression — which itself shouldn't be seen as necessarily our worse down-turn. (Those who uncritically presume that it was should look into the Depression of 1837.)


One should, BTW, be careful to distinguish amongst different statistics:

  • the number of jobs lost,
  • the unemployment rate,
  • the employment rate,
  • changes in these rates.
Note that the article acknowledges that the unemployment rate is not at a post-war high; it has, rather, climbed faster than at any time previous in the post-war period. The unemployment rate itself was worse at the end of the kinder, gentler Administration of GHW Bush.

Second, when there is job creätion as well as job loss, people may lose jobs but spend relatively little time unemployed. (Being dismissed from a job is still a stressful experience for most people, but not necessarily equivalent to being materially impoverished.)

Finally, the unemployment rate is not simply the complement of the employment rate. The unemployment rate, which tries to measure the number people who are seeking employment and unable to find it, is a fairly junky statistic. On the one hand, it doesn't count people who would choose to work if the were offered a job, but who have just given-up hope of finding one; and it doesn't count people who are under-employed, wanting full-time jobs but only able to secure part-time employment. On the other hand, it does count people who aren't sincerely seeking employment, but are going through the motions of seeking a job so that they can continue to collect benefits from programmes that require them to seek employment. The employment rate is simply the percentage of people of working age who have jobs. It has problems — including that it counts under-employed people — but it's less junky that the unemployment rate.