Supplementary statistical struggles

A reader writes:

First off, thank you for running such a great site. I love learning from you, and I love the comfortable way you write. (You are a net benefit to the planet, something not true of many of us.)

So I apologize for criticizing. But unless I'm seriously misunderstanding you, your bit on chances of getting hit by a rocket is off.

It seems perfectly reasonable that rocket hits will follow a Poisson distribution: i.e., for a given area and a given unit of time, a Poisson distribution will model the probability of that area getting hit once in that time, versus twice, versus n times, versus never. (And note that the expected number of hits scales linearly, as you would expect, with both area and length of time.)
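To make the reader's model concrete, here is a minimal Python sketch of those Poisson probabilities. It assumes hits arrive as a homogeneous Poisson process with a constant, hypothetical `rate` of hits per square kilometre per hour; the function name `poisson_pmf` and all the numbers are illustrative, not taken from any real rocket data.

```python
from math import exp, factorial

def poisson_pmf(n: int, rate: float, area: float, duration: float) -> float:
    """Probability of exactly n hits in a region of `area` over `duration`,
    assuming a homogeneous Poisson process with `rate` hits per unit area
    per unit time (a hypothetical, constant rate)."""
    mean = rate * area * duration  # expected hits: linear in area and in time
    return mean ** n * exp(-mean) / factorial(n)

# Hypothetical figures: 0.01 hits per km^2 per hour, 2 km^2, 10 hours.
for n in range(4):
    print(n, round(poisson_pmf(n, 0.01, 2.0, 10.0), 4))

# Doubling the area doubles the mean, but not P(at least one hit).
print(1 - poisson_pmf(0, 0.01, 2.0, 10.0))  # mean 0.2
print(1 - poisson_pmf(0, 0.01, 4.0, 10.0))  # mean 0.4
```

Doubling the area doubles the expected number of hits from 0.2 to 0.4, but the chance of at least one hit only goes from about 0.18 to about 0.33. The linearity is in the expected count, not in the probability of being hit.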

But the thing about a Poisson process is that it is (as the statisticians like to say) memoryless. The chance of an area getting hit in any unit of time is independent of how many times that area has been hit in the past.

So, sure, the chance of not getting hit in the next hour and then getting hit in the hour after that is lower, by multiplication of probabilities, than the chance of simply getting hit in the next hour. But that's not a question anyone cares about, is it? As I understand the writer's question, he's asking whether an area that has been hit in the last h hours is more or less likely to be hit in the next, say, h hours. And the answer, if the rockets are no more aimed than V-2s, is that it doesn't matter whether or not the area has been hit before.
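The reader's memorylessness point can be checked with a quick simulation. This sketch uses a hypothetical setup, not real rocket data: it scatters hits uniformly at random over a grid of cells in two consecutive periods, then compares the chance of a cell being hit in the second period given that it was, or was not, hit in the first. With nothing aimed, the two rates should come out essentially the same. The function name and grid sizes are illustrative assumptions.

```python
import random

def conditional_hit_rates(n_cells=100_000, hits_per_period=30_000, seed=1):
    """Scatter hits uniformly over n_cells cells in two consecutive periods
    and return (P(hit in period 2 | hit in period 1),
                P(hit in period 2 | not hit in period 1))."""
    rng = random.Random(seed)
    # Each period's hits are drawn independently: nothing is aimed.
    period1 = {rng.randrange(n_cells) for _ in range(hits_per_period)}
    period2 = {rng.randrange(n_cells) for _ in range(hits_per_period)}

    hit_before = len(period1)             # cells hit at least once in period 1
    missed_before = n_cells - hit_before  # cells not hit in period 1
    hit_again = len(period1 & period2)    # cells hit in both periods
    hit_fresh = len(period2 - period1)    # cells hit only in period 2
    return hit_again / hit_before, hit_fresh / missed_before

after_hit, after_miss = conditional_hit_rates()
print(f"P(hit | hit last period)    = {after_hit:.3f}")
print(f"P(hit | no hit last period) = {after_miss:.3f}")
```

With 30,000 hits spread over 100,000 cells, both numbers should land near 1 - e^-0.3, about 0.26; any gap between them is sampling noise, not memory.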

But, again, what a great site you run. With thanks,

Eric

You're exactly right about the characteristics of the Poisson distribution, and about the fact that the chance of a future hit does not depend in any way on whether a hit happened in the past, presuming hits truly are randomly distributed.

But the NEXT [arbitrary time period] in which any [arbitrary location] is likely to be hit is still the VERY NEXT [arbitrary time period], if the distribution is random. Because, as per the lightning-strike analogy, for a hit to happen the [time period] after next, it must NOT happen in the next [time period]. This gives lower and lower probabilities of the next hit being at a given time the further that time is in the future.
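This is the geometric distribution at work. In a minimal sketch that assumes each period independently carries the same hit probability p (the value 0.1 below is hypothetical), the chance of a hit in any particular period never changes, but the chance that period k holds the FIRST hit shrinks by a factor of (1 - p) with every step. The function name `first_hit_probability` is illustrative.

```python
def first_hit_probability(k: int, p: float) -> float:
    """Chance that the first hit lands in period k (k = 1, 2, 3, ...),
    assuming every period independently has hit probability p."""
    # Miss in each of periods 1..k-1, then hit in period k.
    return (1 - p) ** (k - 1) * p

p = 0.1  # hypothetical per-period hit probability
for k in range(1, 6):
    print(f"period {k}: any hit {p:.4f}, first hit {first_hit_probability(k, p):.4f}")
```

The "any hit" column stays at 0.1 throughout, while the "first hit" column falls 0.1, 0.09, 0.081 and so on, so the very next period is always the single most likely home of the first hit.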

(You said this too, but I'm repeating it yet again because it's one of the slipperier statistical concepts and has led to a lot of erroneous conclusions on a wide range of subjects. See also non-transitivity, mechanical failure stats, and tax brackets.)

As you say, this is still no help at all in figuring out where and who is going to get hit, or not. But it's the explanation for the "clusters" that often make random events look very NON-random, and my correspondent from Israel wanted to know whether this apparent un-randomness was of any predictive value. Which, as you say, it unfortunately is not.
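The clustering effect is easy to reproduce. This sketch uses a hypothetical grid and hit count, not real data: it drops 400 hits uniformly at random on a 400-cell grid and tallies how many cells received 0, 1, 2, ... hits, next to what a Poisson distribution with mean 1 predicts. Plenty of cells take two or three hits with no aiming at all, which is exactly the kind of "cluster" that looks meaningful and isn't. The function name `hits_per_cell` is illustrative.

```python
import random
from collections import Counter
from math import exp, factorial

def hits_per_cell(n_cells: int, n_hits: int, seed: int = 0) -> Counter:
    """Drop n_hits uniformly at random on n_cells cells and return a Counter
    mapping 'number of hits' -> 'number of cells with that many hits'."""
    rng = random.Random(seed)
    per_cell = Counter(rng.randrange(n_cells) for _ in range(n_hits))
    # Counter returns 0 for cells that were never hit.
    return Counter(per_cell[c] for c in range(n_cells))

n_cells, n_hits = 400, 400
observed = hits_per_cell(n_cells, n_hits)
mean = n_hits / n_cells
for k in range(max(observed) + 1):
    predicted = n_cells * mean ** k * exp(-mean) / factorial(k)
    print(f"{k} hits: {observed[k]:3d} cells observed, {predicted:6.1f} predicted")
```

The observed counts track the Poisson column closely, which is the signature of pure randomness rather than of targeting.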

(Also, in reality, human aiming of even unguided garage-built rockets may entirely swamp the random-clustering effect. So in reality a missile landing in some particular place probably does mean more missiles will land there, but not because of any abstract quirk of probability.)

6 Responses to “Supplementary statistical struggles”

  1. wumpus Says:

    Actually, one big requirement for getting struck by lightning is that there has to be a thunderstorm going on. The odds of tomorrow's weather being the same as today's are pretty good (and beating this is a hard requirement for weather forecasting), and a storm today also likely implies that you are in the local thunderstorm season, if there is one.

    If you computed both "chance of there being a thunderstorm" and "chance of getting struck by lightning in a thunderstorm", the effects could easily dwarf the tiny increment you are trying to show. Depending on the climate involved, the chance for being hit a few months later could fall to zero.

    Still, a fascinating look at tiny differences in odds.

    • wumpus Says:

      To be a bit more explicit: You are given one data point, at day 0 someone is struck by lightning (and presumably a thunderstorm occurred that day).

      Day 1 has a higher than average chance of a thunderstorm (and thus a higher than average chance of being struck by lightning), followed by days 2 through n quickly reverting to average chances of thunderstorms (which would also reflect seasonal climate, and that seasonal effect would apply to day 1's odds too).

      The fact that day 1 is likely to be the highest chance of "next time lightning strikes" is unsurprising considering it is likely the highest chance of lightning striking at all. You might want to switch to meteor strikes (ignoring the Perseids), but I've never heard "meteors don't strike the same place twice".

    • dan Says:

      Yeah, it's based on the Statistics Experiment Land premise that lightning strikes are exactly equally likely on any day, and that lightning never strikes more than once. Real-world examples outside the casino tend to be rather less useful for explaining the basic concepts.

  2. alan_cam Says:

    When am I most likely to win the lottery? This week.
    To win next week, multiply the odds of winning next week by the odds of losing this week.

    The kicker: the odds may be most favorable this week, but that doesn't make them good.

    • dan Says:

      Quite so.

      But again, real-world considerations butt in; your chance of winning the lottery this week if you neglect to purchase a ticket is substantially lower than the chance that a regular player will win the lottery after fifty years of their weekly tickets not winning :-).

      • Gary Says:

        Of course, by the same statistical analysis, and if we are living in Statistics World, the chance of NOT (being hit by lightning/winning the lottery etc.) is also highest in the following week. I think this only seems surprising because the problem is a conceptual one in that we somehow equate 'the chance of an event first happening in the next period' with 'the chance an event happens in a given period'. The problem is in understanding the implication of the question, not the answer.
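wumpus's objection about thunderstorm persistence can be sketched with a two-state Markov chain of stormy and calm days. The transition probabilities below (a 40% chance that a stormy day is followed by another, a 5% chance that a calm day turns stormy) are invented for illustration, not climate data, and the function name `storm_probability` is hypothetical. Given a storm on day 0, the chance of a storm on day 1 is sharply elevated and then decays toward the long-run average; if the chance of a strike during a storm is assumed constant, the strike probability follows the same curve.

```python
def storm_probability(day: int, p_stay: float = 0.4, p_start: float = 0.05) -> float:
    """P(thunderstorm on `day`), given a thunderstorm on day 0, under a
    hypothetical two-state (stormy/calm) Markov chain:
      p_stay  = P(stormy tomorrow | stormy today)
      p_start = P(stormy tomorrow | calm today)"""
    p = 1.0  # day 0: a storm was observed
    for _ in range(day):
        # Tomorrow is stormy by persistence or by a fresh storm starting.
        p = p * p_stay + (1 - p) * p_start
    return p

long_run = 0.05 / (1 - 0.4 + 0.05)  # stationary storm probability, about 0.077
for day in range(1, 6):
    print(day, round(storm_probability(day), 4))
print("long-run average:", round(long_run, 4))
```

Under these made-up numbers, day 1 carries a 0.4 storm chance, day 2 about 0.19, and the curve settles near 0.077, so any "next strike is most likely soon" effect from weather persistence would swamp the small geometric effect in the original post.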

