Stats Heads: What's the math?

iago · April 24, 2003, 08:02:23 PM

I'm trying to write up some quickie perl functions to run some probabilities on several dice methods, and I want to be able to avoid the ugly and memory-consumptive brute force methods that are far too easy to fall into.

Problem is, stats math makes my head swim a little, but I'll start out by showing what I _do_ know.

Okay, so I know that if I have 2 ten-sided dice, and I'm trying to work out the probability of rolling a single 5 on one, but not the other, looks like this:

Probability of rolling a 5 (P) = .1
Probability of not rolling a 5 (Q) = .9
Number of rolls (N) = 2
Number of times a 5 should occur (X) = 1
Number of times a 5 should not occur (F) = N - X = 1

Exact chance of rolling only one 5 (C):

C = ( N! / ( X! * F! ) ) * ( P * X ) * ( Q * F )

Okay, so that's great.

What I haven't figured out, easily at least, is the math to use to calculate the following things:

On a AdB, what's the chance of rolling a total T? (Simple example: 2d10, chance of rolling a total of 7)

On a AdB, selecting the best C, what's the chance of rolling a total T? (Simple example: 4d10, select best 2, chance of rolling a total of 7)

On a AdB, selecting the worst C, what's the chance of rolling a total T? (Simple example: 4d10, select worst 2, chance of rolling a total of 7)

Anyone out there have the kind of on-hand stats knowledge that can tell me what the formulae should look like? I want to get better at this, but I feel like I don't know the vocabulary to use in a google search to find an online tutorial that'll make sense to me.

bowlingm · April 24, 2003, 08:35:29 PM

Iago,

I'm not sure there's a closed form solution to these type of things. You can do a recursive definition and get the following. Let's assume the dice go between 0 and B. I know this isn't true, but one can just subtract A from T and decrease B by one to make this assumption true.

For case (1)

Pr(A, B, T) = 1 if T==0 and A==0, OR
Pr(A, B, T) = 0 if T<0 or A<=0, OR
Pr(A, B, T) = sum_{i=0..T} F(A-1, B, T-i)

You could define this as a perl function pretty easily, although it probably won't be very efficient. You could memoize it, if you happen to be a comp sci graduate, and so know what that is.

Alternatively, why not just estimate it. Computers are really fast, just let them roll the dice one million times and find out how often it happens. It's far easier to code, easier to modify to try different things, and probably faster than the recursive solution.

So, you just generate four random numbers and check if they sum to T. Or sort them and sum the C smallest to see if they sum to T. Repeat 10^6 times and your accuracy should be pretty good. In fact, if you want you can look up the stat accuracy on such a binomial distribution to know how many you need to roll for a certain accuracy. You can google for binomial hypothesis testing or confidence intervals if you want to be certain.

Mike

iago · April 24, 2003, 08:38:46 PM

Quote from: bowlingmAlternatively, why not just estimate it. Computers are really fast, just let them roll the dice one million times and find out how often it happens. It's far easier to code, easier to modify to try different things, and probably faster than the recursive solution.

My first stab at running through all the possible die combinations yielded out of memory errors -- beyond certain quantities of diec this just isn't practical. I'd expect the recursive solution, which (at first blush reading) has fewer than 10^(number of dice rolled) calculations per given set of initial arguments, would be the faster one.

Mike Holmes · April 24, 2003, 09:52:20 PM

Quote from: iagoI'm trying to write up some quickie perl functions to run some probabilities on several dice methods, and I want to be able to avoid the ugly and memory-consumptive brute force methods that are far too easy to fall into.

Don't I know it. I have a spreadsheet open at all times, just in case. Makes it too easy to go that rout rather than flexing the brain muscle.

Feeling that way today, too. So check out:
http://www.indie-rpgs.com/viewtopic.php?t=875

That might help, though I can't remember.

Also watch your phrasing on your questions.

QuoteOn a AdB, what's the chance of rolling a total T? (Simple example: 2d10, chance of rolling a total of 7)

Properly read that's the chance of rolling seven and only seven. Most people mean seven or higher, or somesuch. Make it clear.

If you haven't gotten it by tomorrow, I may take a stab at it. Sorry for being lazy today. :-)

Mike

iago · April 24, 2003, 09:57:29 PM

Quote from: Mike HolmesAlso watch your phrasing on your questions.

QuoteOn a AdB, what's the chance of rolling a total T? (Simple example: 2d10, chance of rolling a total of 7)
Properly read that's the chance of rolling seven and only seven. Most people mean seven or higher, or somesuch. Make it clear.

Meant it as I said it -- a chance of rolling a total of 7, not a chance of rolling a total of 7 or higher.

7 or higher's easy -- add up the chances of rolling exactly 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, and 20.

What I'm looking for here is the atom of exact totals -- once I've got those, doing a 'sum of all results in a range' is pretty easy aggregated math.

It's the exact-chance function that's the bitch. :)

Mike Holmes · April 24, 2003, 10:08:11 PM

Quote from: iagoIt's the exact-chance function that's the bitch. :)

Check the last post in the thread I cited above, including the link there.

Mike

iago · April 24, 2003, 10:10:00 PM

Quote from: Mike HolmesCheck the last post in the thread I cited above, including the link there.

Workin' on it.

The last post in it seems to have at least the core of what I'm going for, though I don't know yet if it evinces a strategy for the 'total of lowest/highest X' -- but it gets me started. :)

Walt Freitag · April 24, 2003, 10:26:06 PM

Hi Fred,

Haven't checked the heavy-math methods, just thinking about programming approaches.

The first one can be done very efficiently in both time and space using an array for the probability distribution across each possible total. You can accumulate this distribution one die at a time because only the totals matter.

A distribution of totals is a list in which the i-th element is the probablity of the total being i. The index of the first element is zero. So, for example, the distribution for 1d10 would be [0.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1].

Code Select


oldDist = [1.0]  -- before any dice are rolled, total is 0 with 100% probability

repeat with dieCounter = 1 to A
  newDist = []
  repeat with distIndex = (dieCounter - 1) to (dieCounter - 1) * B
    repeat with sideCounter = 1 to B
      newDist[distIndex + sideCounter] = 
             newDist[distIndex + sideCounter] + (oldDist[distIndex] / B)
    end repeat
  end repeat
  oldDist = newDist
end repeat

Depending on the language, you might have to initialize newDist to be an array of zeros with dieCounter * B + 1 elements instead of just an empty array.

This gives you a distribution of probabilities across all the possible totals, which you can easily look up for a specific value or add up totals to get e.g. the probability of rolling over or under a certain total. And it's very efficient in space (needs only two arrays of maximum A * B + 1 elements each) and time (takes less than (A * B)^2 steps to go through the three nested loops).

(BTW, there's something a little odd about Mike's recursive method. The results appear to be independent of B, which doesn't make sense. Should it be {i=0..B} in the last line instead of {i=0..T}?)

Unfortunately it won't work for the latter two problems, since for them the actual results of all previous dice have to be carried over, not just the totals.

For the other two problems, you can use a recursive method to step through all the possible die combinations and just tally up how many meet the success conditions and how many don't. This will get very slow at high numbers of dice (it takes B ^ A steps) but at least you won't get memory overflows from creating huge arrays.

In this case, a one-dimensional array is handed down representing a die roll to which each level of recursion adds another die; at the base case the die roll is evaluated by a separate routine (not included) that determines e.g. whether or not the C highest numbers in the roll add up to T, and returns 1.0 if yes, otherwise 0.0. The rolls will get checked in the order [1, 1, ... , 1], then [1, 1, ... , 2] to [B, B, ... , B].

You could do the same thing with nested loops but then in most languages you can't make the number of layers of nesting ( = A) a variable. The recursive format solves that.

Call the following function with roll = [] to start.

Code Select


Pr(A, B, C, T, roll)
  prob = 0
  if A = 1 then
    repeat with thisDieValue = 1 to B
      prob = prob + evaluateRoll (B, C, T, append (roll, thisDieValue)) / B
    end repeat
  else
    repeat with thisDieValue = 1 to B
      prob = prob + pr(A-1, B, C, T, append(roll, thisDieValue)) / B
    end repeat
  end if
return prob

Append adds the element to the list, so for example append ([3, 4, 5], 7) ==> [3, 4, 5, 7] and append([], 5) ==> [5].

It would be more traditional to make the recursive calls all the way to A = 0 (in which case the function would evaluateRoll on the roll passed to it with no further changes) but this way saves a few steps.

You could get a bit more efficient with a lot more complexity, by stepping through the rolls in such a way that equivalent rolls (such as [1, 2, 3] vs. [3, 1, 2]) are not repeated, and instead each roll is weighted by the number of different ways it can appear (e.g. [1, 2, 3] would have a weighting of 6, compared to a weighting of 1 for [1, 1, 1]).

Caution: the pseudocode is not tested, may need some tweaking.

- Walt

J B Bell · April 24, 2003, 10:48:35 PM

Combinatorics with dice beyond the very simplest systems are prone, I've found, to a lot of "operator error". Dice pools can get very hideous very fast.

I don't want to discourage you from really doing the math rather than crunching through a million trials, but I went this route a long time and mainly got a lot of contusions from slamming my head on my desk, even with the help of a professional mathematician*.

What I did recently after slogging through another syrupy run of many trials was to learn a little C. Programs to bash through a zillion trials are simple in any language, and C isn't hard for someone who knows Perl (except that you'll be howling in pain about all the convenience functions it lacks for a little while). C is also amazingly, super-duper fast. Stuff that was taking over a minute each for 20 different trials just whizzes by too fast to read now. Whee!

Consider giving it a shot. Think of the street cred; guys who code in C are the steroid-enhanced jocks of the programming world. You'll have your stats in record time and be able to spend what would have been boring hours waiting for results picking among your adoring, skimpily-attired fans. Oh, wait. Maybe not. But it is faster.

--JB

* He helped me with the math, not the contusions.

M. J. Young · April 25, 2003, 04:17:54 AM

I've skimmed a lot of what's here, so forgive me if I'm hitting something someone already said.

I often read someone suggesting that you set up a computer to run the dice a million times and see the results. I find this silly; it seems to be a lot more work for the computer for a lot less accuracy.

Whenever I've done a dice test, I've set up nested loops (I work in Basic, mostly, even though I'm sure there are faster ways to do it in better languages). Each loop represents a die, stepping from the lowest to the highest value.

The tests are done in the middle.

Thus for AdB, the chance of rolling T is rather easily determined. Since the total number of permutations is B^A, you don't have to track that. Set up A loops step B, and in the center of the loop add the sum of each loop value, compare it to T, and increase X=X+1. When it finishes, have it print X and also X/(B^A), and you've got your probability.

Usually when I do these, I dimension a matrix for the range of the possible totals (for AdB, that's generally from A to A*B, but it's easier to let it run from 1 to A*B, at least in Basic, because you can use the calculated value as the position in the matrix and have it report back the matrix variable value). From this, you can get it to report the probability of each outcome in one shot.

If you're using a lot of dice or particularly large dice, this can take a while; but I don't think it takes as long as running a million random rolls, and it will be more accurate. Also, it goes slower in Basic than it does in more advanced languages.

"Selecting the best/worst C" makes it more complicated, but you could incorporate that into the test with a few extra steps. In Basic, it would be a nightmare trying to create the test that would identify the two lowest or two highest, but I suspect you can do that much more easily in other languages.

I hope my meager input is of some help here.

--M. J. Young

iago · April 25, 2003, 04:30:48 AM

A lot of the suggestions revolve around brute-forcing the problem. I'd hoped to avoid it, but as it turns out, brute seems the way to go.

The program I ended up with started to really hyperventillate around about rolling 12d10 selecting the 2 best -- I think all of its problem was in the 12d10, not selecting the 2 best.

What I've got at least limits itself to generating the unique combinations of dice results, rather than generating a data set that's 10^12 in size, for example -- it goes for the 'weight' notion that an earlier poster suggested to make this work. I think I got it right. :)

Anyway, in case this is of use:

The script: http://www.iago.net/pigmath.pl.txt
The output: http://www.iago.net/pigmethod-2d10.txt

bowlingm · April 27, 2003, 02:14:50 AM

Iago,

There must be a bug somewhere in your program. In order to prove that one million die rolls isn't a big deal, I hacked the C program I used to generate various probabilities for the Sorcerer mechanic. I ended up with different probabilities for the first table than you gave in your results. So I tried to analytically compute just one entry.

Consider the first entry. Probability of rolling a 2 when taking the two worst dice from a 10d10... Well, we can easily caclulate the probability of not rolling a 2 when taking the two worst. It would mean you didn't roll two 1's on those dice. The probability of not rolling any 1's is 0.9^10. The probability of rolling exactly one 1 is (0.9^9 * 0.1 * 10). Combined that equals 73.6%, which means your first entry should be 26.4%, not 22.2% as you reported.

You can have a look at my output... http://reef.coral.cs.cmu.edu/~mhb/iago.txt

It only does the 2 worst side, not the 2 best, since you can just flip the table to get the other case. Probability of rolling a 20 with the 2 best is equal to the probability of rolling a 2 with the 2 worst. It did 10^6 rolls for each table. Computing the confidence intervals, this comes out to the true probability being within 0.2% of the estimate at 95% confidence.

BTW, it took under 16s to compute the entire table on my 700mhz machine.

Mike

The Forge Archives

News:

Stats Heads: What's the math?

iago

bowlingm

iago

Mike Holmes

iago

Mike Holmes

iago

Walt Freitag

J B Bell

M. J. Young

iago

bowlingm