Topic: Help with dice statistics
Started by: coxcomb
Started on: 2/10/2005
Board: Indie Game Design
On 2/10/2005 at 10:41pm, coxcomb wrote:
Help with dice statistics
I know there are people on this forum who are gifted with advanced math skills. Any help is appreciated!
So the end result of every roll is 3d6, each die result being taken individually. The average roll per die is (1+2+3+4+5+6)/6, which comes out to 3.5. That part is easy (unless I have been wrong all these years).
The difficulty is that players may be rolling more than three dice and selecting the highest three or the lowest three. At this point my math fails me. If it were a straight 4d6, then the average per die is still 3.5, right? But when you are selecting for high or low the average must be affected, right?
Any help?
Thanks in advance.
On 2/11/2005 at 3:29am, Brendan wrote:
RE: Help with dice statistics
I could swear the first edition DMG had some bell-curve tables that addressed this issue, in the section on character generation, but I don't have mine anywhere close at hand. Does anybody else?
On 2/11/2005 at 3:40am, coxcomb wrote:
RE: Help with dice statistics
Brendan wrote: I could swear the first edition DMG had some bell-curve tables that addressed this issue, in the section on character generation, but I don't have mine anywhere close at hand. Does anybody else?
I think those just showed the curve for 3d6 all added together, which is a snap to figure out even for math-challenged me.
What I'm looking for is specifically the effect of rolling more dice and selecting the highest or lowest on the average roll per die.
On 2/11/2005 at 4:15am, Brendan wrote:
RE: Help with dice statistics
Yeah, I thought they also showed the way the curve would lean if you did the drop-lowest-die method of attribute generation. Maybe I'm wrong.
At any rate, this got stuck in my head and I couldn't figure out an analytical way to solve it, because I'm bad at math. So I wrote a couple of PHP scripts to do it for me, taking advantage of the astounding redundancy produced by nested for loops; I'd be happy to reproduce the code if anybody wants it, but I don't want to fill this thread without provocation.
Anyway, you can see the results, all rolls listed and final totals at the bottom. I checked these a few times, but it's entirely possible I missed something--I'd be happy to be corrected.
Dropping the lowest die: http://www.xorph.com/dice_experiment_min.php
Dropping the highest: http://www.xorph.com/dice_experiment_max.php
It looks like die-dropping shifts the per-die average by about .58 either way.
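For anyone who wants to reproduce the numbers without the PHP, here is a minimal sketch of the same brute-force idea in Python. It is not the script linked above: the function name keep_average is made up for illustration, and it assumes the experiment was 4d6 with one die dropped, which is what the .58 figure suggests. It walks every possible roll, sorts it, and averages the kept dice.

```python
from itertools import product

def keep_average(n_dice, keep, highest=True, sides=6):
    """Exact per-die average when rolling n_dice and keeping `keep` of them.

    Hypothetical helper: enumerates all sides**n_dice rolls, so it is only
    practical for small pools.
    """
    total = 0
    for roll in product(range(1, sides + 1), repeat=n_dice):
        # Sort so the dice we keep are at the front of the list.
        ordered = sorted(roll, reverse=highest)
        total += sum(ordered[:keep])
    return total / (keep * sides ** n_dice)

# Assumed setup: 4d6, one die dropped.
print(round(keep_average(4, 3, highest=True), 6))   # drop the lowest
print(round(keep_average(4, 3, highest=False), 6))  # drop the highest
```

Since every roll is visited exactly once, the result is exact up to floating-point rounding rather than a random sample; the two averages land about .58 above and below 3.5.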
On 2/11/2005 at 4:31am, coxcomb wrote:
RE: Help with dice statistics
Cool, thanks!
If nobody can give me a happy algorithm for this, I'll just write a little program to do as you have done for various numbers of dice.
On 2/11/2005 at 6:23am, Eero Tuovinen wrote:
RE: Help with dice statistics
I've had this ready for hours now, but the web's been stormy... anyway, here's the analytical take. I'm not a native speaker, so my math terminology may be a little wonky. I do play a mathematician on TV, though. These should be reasonably correct, but it's not as if I'd care enough to double-check :D
This might seem a little messy, as the forum doesn't do mathematical symbols. S means summing over the given variables, [x over y] means the number of x-subsets in y (a basic combinatorics function, you'll be needing a calculator or spreadsheet for this anyway). I'm also lazy enough not to find a calculator, so you get to calculate the numbers yourself.
First, the number of equal combinations: for a pool of n dice there's A(n)=S[r=1, n][r over n] different subgroups of the dice. This number is important, because that's the number of different instances of a particular value being the highest in the rolled pool. Second, the number of accumulating combinations: for a pool of n dice of at most value m there's B(n,m)=m^n different combinations. This number is important, because it's the size of the probability space.
The probability of a n die pool rolling at most m is clearly (m/6)^n, which we'll use to our benefit. The probability of the highest die in a n die pool having value m is P(m)=(A(n)/B(n,m))(m/6)^(n-1). (That's because the former term is the percentage of instances with at least one die in a given value within the "less than m" space, while the latter is the percentage of instances where no die is over m in the whole result space. Thus we get the probability of all dice under m, at least one die at m.)
Now that we have a probability function, we can calculate the expected value for the highest die of a n die pool: E(n)=S[m=1,6](P(m)*m).
For the second highest die to be at m the probability is P(m)=(A(n-1)/B(n-1,m))(m/6)^(n-1)((7-m)/6), which is the same calculation, but with the assumption that the highest die value is already at least m. The expected value is calculated in the same way as with the highest die.
For the third highest deduct one point more from n and add another term to represent the second die being already over m (effectively squaring the last term in the above case).
When looking for the lowest values it's the same calculations, but with the complementing values. Most effective to just take the symmetrical values in relation to 3.5.
Basic probabilities, dear Watson. In general, discrete probability problems are easiest to solve with brute force and consideration of whole probability spaces - ask yourself how to represent the subset of wanted results numerically as compared to the space of all possible results, and you have an equation.
On 2/11/2005 at 2:13pm, coxcomb wrote:
RE: Help with dice statistics
Um...thanks Eero
I'm afraid I failed my comprehend math roll while reading your reply. Can you, or someone else, clarify for the simple-minded?
On 2/11/2005 at 6:53pm, shaheddy wrote:
RE: Help with dice statistics
Eero, your method is incorrect. Try both the highest and second highest formulas for 2d3, m=2. In both cases, the true probability is 1/3, but your formulas yield 1/6 and 2/9 respectively. You have two errors. The first is that in your first formula, you take into account the possibility that all the dice turn up the maximum, but then multiply by (m/6)^(n-1). The second is that in your second formula, you assume that the highest die is "marked", i.e., a distinguishable color. This is also what makes this problem so difficult! Also, by the way, A(n) is just 2^n - 1. You say either a die is in your group or not, which gives 2^n, and then you subtract off the empty group.
Here's an alternate explanation for the first formula, calculating the maximum value of n dice. Let's say p(m) is the chance that one particular die turns up m. Clearly that's 1/6. Let's define c(m) as the chance that one particular die turns up m or less. That's clearly m/6. Notice that p(m)=c(m)-c(m-1). (By the way, c stands for cumulative.)
Now we roll n dice. The chance that all of them turn up m or less, which I'll call C(m), is just the product of the probabilities that each of them turns up m or less, so in this case c(m)^n, or (m/6)^n. Then the chance that the maximum is exactly m is just P(m)=C(m)-C(m-1)=(m/6)^n - ((m-1)/6)^n.
The expected value is then the weighted average of all possible values, that is, the sum of m*P(m) for m from 1 to 6. Written out, that means summing terms like m^(n+1) over all values of m, for which there are some annoying formulas out there.
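A minimal sketch of that calculation, using exact fractions so nothing is lost to rounding. The names max_pmf and expected_max are made up for illustration; the formula is the P(m)=C(m)-C(m-1) one just described.

```python
from fractions import Fraction

def max_pmf(n, sides=6):
    """P(max of n dice == m) for m = 1..sides, as exact fractions."""
    def cdf(m):
        # C(m): all n dice show m or less.
        return Fraction(m, sides) ** n
    return {m: cdf(m) - cdf(m - 1) for m in range(1, sides + 1)}

def expected_max(n, sides=6):
    """Weighted average of the possible maximum values."""
    return sum(m * p for m, p in max_pmf(n, sides).items())

for n in (1, 2, 3, 4):
    e = expected_max(n)
    print(n, e, float(e))
```

If you would rather avoid the m^(n+1) terms, the same sum telescopes to 6 minus the sum of (m/6)^n for m from 1 to 5, which is easier to do by hand.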
Anyway, your original problem is quite difficult for the reason I mentioned above - that is, that the highest die is not marked. That makes any recursion really messy. Though I'm probably going to continue looking for a better solution myself, I'm not confident of finding anything.
On 2/11/2005 at 6:54pm, Eero Tuovinen wrote:
RE: Help with dice statistics
[Edit: crossposted with shaheddy, and I agree with him on all counts. This one should be better, though.]
OK... let's try it again. I think I made a few mistakes in that one, anyway. That's what you get for doing math at night. This one should be simpler; I'll try not to use the combinatorial function.
A="At least one die is at m."
B="All dice are at most m."
The die pool is n dice. Question: What is P(A and B), the probability of both happening?
A and B are dependent: if we know that B happens, that makes A somewhat more likely. Specifically:
The probability of B is P(B)=(m/6)^n. This is easy, because it's just n independent dice rolling at most m.
The probability of A when B has happened is P(A|B)=1-(1-1/m)^n. This is because A|B is the complement of C(A|B)="All dice roll under m, when they already are at most m.", which is the same as the dice actually only having m sides.
The probability rule of multiplication has it that P(A and B)=P(B)P(A|B). Thus we get P(A and B)=[(m/6)^n][1-(1-1/m)^n]. This is the probability of the highest die value being m in an n-die pool. Simple, ne?
When we know that, we can calculate the expected value: calculate P(A and B) for each value from 1 to 6, and multiply each probability by the value in question. Sum them together, and you have the expected highest roll in a pool of n dice.
Are you with me so far? If you understood that, I can next explain how to do the second-highest and third-highest.
On 2/11/2005 at 7:30pm, shaheddy wrote:
RE: Help with dice statistics
Eero, this looks like an excellent idea. I think I can see where this is going, and I'm definitely interested in seeing the rest of your calculation.
Also, the original question was (as I understood it) to come up with a closed form for the expected value in general. Do you see any way to do this? My own thought is that it will be best to approximate with a continuous probability, and use integrals.
On 2/11/2005 at 7:45pm, Eero Tuovinen wrote:
RE: Help with dice statistics
shaheddy wrote:
Also, the original question was (as I understood it) to come up with a closed form for the expected value in general. Do you see any way to do this? My own thought is that it will be best to approximate with a continuous probability, and use integrals.
Actually, I integrated yesterday like nobody's business, but decided then that that isn't worthy of us. Discrete solutions or death!
As for a generic solution, it's coming along. The expected value depends on the number of dice rolled, and the place of the given die in the order. We can calculate the expected value for the highest, second highest and so on, so a general solution for "the expected value for the three highest dice" is simple: it's just the average of the three individual values we get by the method here.
The solution will have pretty many calculations to it, though, that cannot be avoided. I suggest making a spreadsheet out of it: then you can just feed it the number of dice and let the program do the rest.
But let's let coxcomb think through the above partial solution before continuing with the elaborations.
On 2/11/2005 at 11:02pm, Walt Freitag wrote:
RE: Help with dice statistics
Meanwhile, here are some exact statistics for d6 pools, using brute force summation of all combinations (not monte carlo). Y'all can use these to check your formulas against.
The figures given are average per die; to get the actual average roll multiply by the number of dice kept.
roll 2 keep highest 1 4.472222
roll 3 keep highest 1 4.958333
roll 3 keep highest 2 4.229167
roll 4 keep highest 1 5.244599
roll 4 keep highest 2 4.672068
roll 4 keep highest 3 4.081533
roll 5 keep highest 1 5.430941
roll 5 keep highest 2 4.965085
roll 5 keep highest 3 4.476723
roll 5 keep highest 4 3.982735
roll 6 keep highest 1 5.560292
roll 6 keep highest 2 5.172239
roll 6 keep highest 3 4.757930
roll 6 keep highest 4 4.336120
roll 6 keep highest 5 3.912058
roll 7 keep highest 1 5.654117
roll 7 keep highest 2 5.325730
roll 7 keep highest 3 4.967585
roll 7 keep highest 4 4.600689
roll 7 keep highest 5 4.230292
roll 7 keep highest 6 3.859020
- Walt
On 2/12/2005 at 2:21am, Grand_Commander13 wrote:
RE: Help with dice statistics
... O_O
THANK YOU WALT!!!
No, not for the statistics; I already have a program that generates the distribution of different ways for rolling dice. Thank you for confirming that my statistics are right! I always wondered...
Anyway, my program's report for 4d6, keep highest 3:
[code]
 3:    1   0.0772%
 4:    4   0.3086%
 5:   10   0.7716%
 6:   21   1.6204%
 7:   38   2.9321%
 8:   62   4.7840%
 9:   91   7.0216%
10:  122   9.4136%
11:  148  11.4198%
12:  167  12.8858%
13:  172  13.2716%
14:  160  12.3457%
15:  131  10.1080%
16:   94   7.2531%
17:   54   4.1667%
18:   21   1.6204%
[/code]
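A report like this can be produced with a few lines of Python. This is only a sketch, not the Java program mentioned here: the function name keep_highest_distribution is invented for the example, and it assumes 1 <= keep <= n_dice. It counts how many of the 6^4 = 1296 possible rolls produce each kept total.

```python
from collections import Counter
from itertools import product

def keep_highest_distribution(n_dice, keep, sides=6):
    """Count how many of the sides**n_dice rolls give each kept total.

    Assumes 1 <= keep <= n_dice (hypothetical helper, no input checking).
    """
    counts = Counter()
    for roll in product(range(1, sides + 1), repeat=n_dice):
        # The last `keep` entries of the sorted roll are the highest dice.
        counts[sum(sorted(roll)[-keep:])] += 1
    return counts

counts = keep_highest_distribution(4, 3)
n_rolls = sum(counts.values())
for total in sorted(counts):
    share = 100 * counts[total] / n_rolls
    print(f"{total:>2}: {counts[total]:>4} {share:8.4f}%")
```

A Counter grows as new totals appear, so there is no need to size an array ahead of time.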
On 2/12/2005 at 3:04am, coxcomb wrote:
RE: Help with dice statistics
Well, ashamed as I am to admit it, the math being slung around above will not penetrate the density of my skull.
I guess I'll just write a program that brute-forces all the statistics I need. Unless someone with both math-fu and explain-to-the-dim-jutsu can help.
[edited for typo]
On 2/12/2005 at 3:32am, Grand_Commander13 wrote:
RE: Help with dice statistics
Well, my Java program got those stats by brute-forcing them. It was really annoying to write it. You think it'd be so simple. If only arrays could be dynamically sized...
On 2/12/2005 at 3:53am, Eero Tuovinen wrote:
RE: Help with dice statistics
OK, once more, slowly...
Above I derived the expected value of the highest die in an n-die pool. The probability function was P(m)=[(m/6)^n][1-(1-1/m)^n]. This is how you calculate the expected value for n=3:
P(1)=[(1/6)^3][1-(0/1)^3]
P(2)=[(2/6)^3][1-(1/2)^3]
P(3)=[(3/6)^3][1-(2/3)^3]
P(4)=[(4/6)^3][1-(3/4)^3]
P(5)=[(5/6)^3][1-(4/5)^3]
P(6)=[(6/6)^3][1-(5/6)^3]
Now, the expected value is calculated by multiplying each probability by the value it's the probability for. That's what you do in the one die case as well: that calculation you did in the first post, remember? The divisor, 6, was there because each of those results has 1/6 probability in a die roll. You just did the summing first and divided after that to get the result. Here we do the divisions first, because the probabilities of the different results are different. So:
E=P(1)+2P(2)+3P(3)+4P(4)+5P(5)+6P(6), where E is the expected value. You can check the calculation against Walt's values. This is really simple to do with a spreadsheet.
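If a spreadsheet isn't handy, the same steps can be sketched in Python. The helper name p_highest is made up for the example; it is just the probability function above, evaluated with exact fractions for n=3.

```python
from fractions import Fraction

n = 3  # dice in the pool

def p_highest(m, n):
    """[(m/6)^n][1-(1-1/m)^n]: all dice at most m, at least one exactly m."""
    return Fraction(m, 6) ** n * (1 - (1 - Fraction(1, m)) ** n)

# One probability per possible highest value, then the weighted sum.
probs = {m: p_highest(m, n) for m in range(1, 7)}
expected = sum(m * p for m, p in probs.items())

for m, p in probs.items():
    print(f"P({m}) = {p}   {m}*P({m}) = {m * p}")
print("sum of P(m) =", sum(probs.values()))
print("E =", expected, "=", float(expected))
```

The six probabilities add up to 1, and E works out to 119/24, or about 4.958333, which agrees with Walt's "roll 3 keep highest 1" entry.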
Now, the second die:
A="At least one die is at k."
B="All dice are at most k, except the highest."
C="Highest die is at least k."
The die pool is still n dice. Question: What is P(A and B and C), which is the probability for the second-highest die to hit k? Event C is necessary because otherwise it wouldn't be the second-highest die.
All three probabilities are dependent on each other. The probability of C is simple, because we already calculated it: P(C)=[sum of P(k)...P(6) above]. Probability of B happening if C happens is simple, too, because it's the same as probability of event B in the last case, except with a pool one die smaller. Thus P(B|C)=(k/6)^(n-1). (The one die is already assumed to be higher than k and out of the calculation.)
Probability of A when B and C happen is again the same case as with the highest die, the pool is just a little smaller. Thus P(A|B and C)=1-(1-1/k)^(n-1)
By the multiplication rule I mentioned earlier we get P(A and B and C)=P(A and B|C)P(C)=P(A|BC)P(B|C)P(C). This is again a readily calculable probability function, which we can use to find the expected value, just like with the highest die.
The third die can be found in the same manner. As you can see above, the calculation itself is just a matter of picking the already calculated components from the earlier calculation. Perhaps shaheddy will derive that one for kicks ;)
And the grail we're after, the expected average value of the three highest dice, is just a matter of averaging between the three results once they're in. After that we have a general, albeit long, function for a pool with n dice.
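For checking the second- and third-highest numbers, here is a sketch of a different route from the multiplication-rule derivation above. It uses the standard order-statistic identity: the j-th highest of n dice is at most m exactly when fewer than j dice show more than m. All dice are treated as interchangeable, so no die has to be singled out as "the highest" in advance. The function names are made up for illustration. The two routes need not agree: for 3 dice this one gives 16/216 for the second-highest die showing 6, which is what counting all 216 rolls gives.

```python
from fractions import Fraction
from math import comb

def cdf_jth_highest(j, m, n, sides=6):
    """P(j-th highest of n dice <= m): fewer than j dice show more than m."""
    above = Fraction(sides - m, sides)   # chance one die lands above m
    at_most = 1 - above                  # chance one die lands at m or below
    # Binomial sum over i = number of dice above m, for i = 0 .. j-1.
    return sum(comb(n, i) * above**i * at_most**(n - i) for i in range(j))

def expected_jth_highest(j, n, sides=6):
    # E[X] = sum of P(X > m) for m = 0 .. sides-1.
    return sum(1 - cdf_jth_highest(j, m, n, sides) for m in range(sides))

def keep_highest_average(n, keep, sides=6):
    """Per-die average of the `keep` highest dice out of n (keep <= n)."""
    total = sum(expected_jth_highest(j, n, sides) for j in range(1, keep + 1))
    return total / keep

print(float(keep_highest_average(4, 3)))
print(float(keep_highest_average(7, 3)))
```

The averaging step in keep_highest_average is the "average of the three results" idea, and the outputs can be compared line by line with Walt's brute-force table.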
But anyway, coxcomb: if you'll tell us more specifically at which point you're getting lost, we can perhaps explain it in more detail. It's difficult to explain these things when we don't know your background in math. Alternatively, we can just produce the results if you tell us how big your die pools are... or you can write that program, it's not like it'd produce any worse results than the analytical approach!
On 2/12/2005 at 7:01am, Brendan wrote:
RE: Help with dice statistics
Grand_Commander13 wrote: Well, my Java program got those stats by brute-forcing them. It was really annoying to write it. You think it'd be so simple. If only arrays could be dynamically sized...
That, my good man, is what PHP is for. (Or vectors.)
I'm working on a generalized version of the scripts I posted earlier, which should let you have the per-die average for any number and sides of dice.