Help with dice statistics

Started by coxcomb, February 10, 2005, 10:41:00 PM

coxcomb

I know there are people on this forum who are gifted with advanced math skills. Any help is appreciated!

So the end result of every roll is 3d6, each die result being taken individually. The average roll per die is (1+2+3+4+5+6)/6, which comes out to 3.5. That part is easy (unless I have been wrong all these years).

The difficulty is that players may be rolling more than three dice and selecting the highest three or the lowest three. At this point my math fails me. If it were a straight 4d6, then the average per die is still 3.5, right? But when you are selecting for high or low the average must be affected, right?

Any help?

Thanks in advance.
*****
Jay Loomis
Coxcomb Games
Check out my blog: http://bigd12.blogspot.com

Brendan

I could swear the first edition DMG had some bell-curve tables that addressed this issue, in the section on character generation, but I don't have mine anywhere close at hand.  Does anybody else?

coxcomb

Quote from: Brendan
I could swear the first edition DMG had some bell-curve tables that addressed this issue, in the section on character generation, but I don't have mine anywhere close at hand.  Does anybody else?

I think those just showed the curve for 3d6 all added together, which is a snap to figure out even for math-challenged me.

What I'm looking for is specifically the effect of rolling more dice and selecting the highest or lowest on the average roll per die.

Brendan

Yeah, I thought they also showed the way the curve would lean if you did the drop-lowest-die method of attribute generation.  Maybe I'm wrong.

At any rate, this got stuck in my head and I couldn't figure out an analytical way to solve it, because I'm bad at math.  So I wrote a couple of PHP scripts to do it for me, taking advantage of the astounding redundancy produced by nested for loops; I'd be happy to reproduce the code if anybody wants it, but I don't want to fill this thread without provocation.

Anyway, you can see the results, all rolls listed and final totals at the bottom.  I checked these a few times, but it's entirely possible I missed something--I'd be happy to be corrected.

Dropping the lowest die:  http://www.xorph.com/dice_experiment_min.php

Dropping the highest:  http://www.xorph.com/dice_experiment_max.php

It looks like dropping a die shifts the per-die average by about 0.58 either way: up when you drop the lowest, down when you drop the highest.
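For anyone who wants to reproduce this, here is a minimal sketch of the brute-force approach in Python. It is not the PHP script mentioned above, just an illustration of the same idea, and it assumes the 4d6 case from the original question. The function name `per_die_average` and its parameters are made up for this example.

```python
from itertools import product

def per_die_average(rolled, kept, keep_highest=True, sides=6):
    """Average value per kept die, found by enumerating every possible roll."""
    total = 0
    count = 0
    for roll in product(range(1, sides + 1), repeat=rolled):
        # Sort so the dice we keep come first, then sum only those.
        ordered = sorted(roll, reverse=keep_highest)
        total += sum(ordered[:kept])
        count += 1
    return total / (count * kept)

# Assumed case: roll 4d6, keep 3 (drop one die).
drop_lowest = per_die_average(4, 3, keep_highest=True)
drop_highest = per_die_average(4, 3, keep_highest=False)
print(f"4d6 drop lowest:  {drop_lowest:.6f}")
print(f"4d6 drop highest: {drop_highest:.6f}")
print(f"shift from 3.5:   {drop_lowest - 3.5:.6f}")
```

The two averages are mirror images around 3.5, so they always add up to 7.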

coxcomb

Cool, thanks!

If nobody can give me a happy algorithm for this, I'll just write a little program to do as you have done for various numbers of dice.

Eero Tuovinen

I've had this ready for hours now, but the web's been stormy... anyway, here's the analytical take. I'm not native, so my math terminology may be a little wonky. I do play a mathematician on TV, though. These should be reasonably correct, but it's not as if I'd care enough to double-check :D

This might seem a little messy, as the forum doesn't do mathematical symbols. S means summing over the given variables, [x over y] means the number of x-subsets in y (a basic combinatorics function, you'll be needing a calculator or spreadsheet for this anyway). I'm also lazy enough not to find a calculator, so you get to calculate the numbers yourself.

First, the number of equal combinations: for a pool of n dice there's A(n)=S[r=1, n][r over n] different subgroups of the dice. This number is important, because that's the number of different instances of a particular value being the highest in the rolled pool. Second, the number of accumulating combinations: for a pool of n dice of at most value m there's B(n,m)=m^n different combinations. This number is important, because it's the size of the probability space.

The probability of an n-die pool rolling at most m is clearly (m/6)^n, which we'll use to our benefit. The probability of the highest die in an n-die pool having value m is P(m)=(A(n)/B(n,m))(m/6)^(n-1). (That's because the former term is the percentage of instances with at least one die at a given value within the "less than m" space, while the latter is the percentage of instances where no die is over m in the whole result space. Thus we get the probability of all dice under m, at least one die at m.)

Now that we have a probability function, we can calculate the expected value for the highest die of an n-die pool: E(n)=S[m=1,6](P(m)*m).

For the second highest die to be at m the probability is P(m)=(A(n-1)/B(n-1,m))(m/6)^(n-1)((7-m)/6), which is the same calculation, but with the assumption that the highest die value is already at least m. The expected value is calculated in the same way as with the highest die.

For the third highest deduct one point more from n and add another term to represent the second die being already over m (effectively squaring the last term in the above case).

When looking for the lowest values it's the same calculations, but with the complementing values. Most effective to just take the symmetrical values in relation to 3.5.

Basic probabilities, dear Watson. In general, discrete probability problems are easiest to solve with brute force and consideration of whole probability spaces - ask yourself how to represent the subset of wanted results numerically as compared to the space of all possible results, and you have an equation.
Blogging at Game Design is about Structure.
Publishing Zombie Cinema and Solar System at Arkenstone Publishing.

coxcomb

Um...thanks Eero

I'm afraid I failed my comprehend math roll while reading your reply. Can you, or someone else, clarify for the simple-minded?

shaheddy

Eero, your method is incorrect. Try both the highest and second highest formulas for 2d3, m=2. In both cases, the probability comes out to 1/3, but your formulas yield 1/6 and 2/9 respectively. You have two errors. The first is that in your first formula, you take into account the possibility that all the dice turn up the maximum, but then multiply by (m/6)^(n-1). The second is that in your second formula, you assume that the highest die is "marked", ie a distinguishable color. This is also what makes this problem so difficult! Also, by the way, A(n) is just 2^n - 1. You say either a die is in your group or not, which gives 2^n, and then you subtract off the empty group.

Here's an alternate explanation for the first formula, calculating the maximum value of n dice. Let's say p(m) is the chance that one particular die turns up m. Clearly that's 1/6. Let's define c(m) as the chance that one particular die turns up m or less. That's clearly m/6. Notice that p(m)=c(m)-c(m-1). (By the way, c stands for cumulative.)

Now we roll n dice. The chance that all of them turn up m or less, which I'll call C(m), is just the product of the probabilities that each of them turns up m or less, so in this case c(m)^n, or (m/6)^n. Then the chances that the maximum is exactly m is just P(m)=C(m)-C(m-1)=(m/6)^n - ((m-1)/6)^n.

The expected value is then the weighted average of all possible values. You have to sum m^(n+1) over all values of m, for which there are some annoying formulas out there.
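If you would rather let a computer do the summing, the formula P(m)=(m/6)^n - ((m-1)/6)^n can be checked with a short Python sketch. Exact fractions are used to avoid rounding; the function names and the `sides` parameter are illustrative choices, not anything from the posts above.

```python
from fractions import Fraction

def max_probability(m, n, sides=6):
    """P(highest of n dice is exactly m) = C(m) - C(m-1)."""
    return Fraction(m, sides) ** n - Fraction(m - 1, sides) ** n

def expected_max(n, sides=6):
    """Weighted average of every possible highest value."""
    return sum(m * max_probability(m, n, sides) for m in range(1, sides + 1))

# The 2d3, m=2 check case mentioned earlier comes out to 1/3.
print(max_probability(2, 2, sides=3))

# Expected highest die for pools of 1 to 4 six-sided dice.
for n in range(1, 5):
    print(n, float(expected_max(n)))
```

For a single die this gives 7/2, and for two dice it gives 161/36, or roughly 4.4722.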

Anyway, your original problem is quite difficult for the reason I mentioned above - that is, that the highest die is not marked. That makes any recursion really messy. Though I'm probably going to continue looking for a better solution myself, I'm not confident of finding anything.

Eero Tuovinen

[Edit: crossposted with shaheddy, and I agree with him on all counts. This one should be better, though.]

OK... let's try it again. I think I made a few mistakes in that one, anyway. That's what you get by doing math at night. This one should be simpler; I'll try not to use the combinatorial function.

A="At least one die is at m."
B="All dice are at most m."
The die pool is n dice. Question: What is P(A and B), the probability of both happening?

A and B are dependent: if we know that B happens, that makes A somewhat more likely. Specifically:

The probability of B is P(B)=(m/6)^n. This is easy, because it's just n independent dice rolling at most m.

The probability of A when B has happened is P(A|B)=1-(1-1/m)^n. This is because A|B is the complement of C(A|B)="All dice roll under m, when they already are at most m.", which is the same as the dice actually only having m sides.

The probability rule of multiplication has it that P(A and B)=P(B)P(A|B). Thus we get P(A and B)=[(m/6)^n][1-(1-1/m)^n]. This is the probability of the highest die value being m in an n-die pool. Simple, ne?

When we know that, we can calculate the expected value: calculate P(A and B) for each value from 1 to 6, and multiply each probability by the value in question. Sum those products together, and you have the expected highest roll in a pool of n dice.
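As a rough illustration, the two steps (condition on B, then multiply) translate almost line for line into Python. This is only a sketch; `highest_is_m` and `expected_highest` are names invented here, and exact fractions are used so the results can be compared without rounding.

```python
from fractions import Fraction

def highest_is_m(m, n, sides=6):
    """P(A and B) = P(B) * P(A|B) for a pool of n dice."""
    p_b = Fraction(m, sides) ** n                 # all dice are at most m
    p_a_given_b = 1 - (1 - Fraction(1, m)) ** n   # at least one die is at m, given B
    return p_b * p_a_given_b

def expected_highest(n, sides=6):
    """Sum of value times probability over every possible highest value."""
    return sum(m * highest_is_m(m, n, sides) for m in range(1, sides + 1))

for n in (2, 3, 4):
    print(n, float(expected_highest(n)))
```

Multiplying the product out gives (m/6)^n - ((m-1)/6)^n, so this agrees with the cumulative-difference form; for two dice the expected highest roll is 161/36, about 4.4722.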

Are you with me so far? If you understood that, I can next explain how to do the second-highest and third-highest.

shaheddy

Eero, this looks like an excellent idea. I think I can see where this is going, and I'm definitely interested in seeing the rest of your calculation.

Also, the original question was (as I understood it) to come up with a closed form for the expected value in general. Do you see any way to do this? My own thought is that it will be best to approximate with a continuous probability, and use integrals.

Eero Tuovinen

Quote from: shaheddy
Also, the original question was (as I understood it) to come up with a closed form for the expected value in general. Do you see any way to do this? My own thought is that it will be best to approximate with a continuous probability, and use integrals.

Actually, I integrated yesterday like nobody's business, but decided then that that isn't worthy of us. Discrete solutions or death!

As for a generic solution, it's coming along. The expected value depends on the number of dice rolled, and the place of the given die in the order. We can calculate the expected value for the highest, second highest and so on, so a general solution for "the expected value for the three highest dice" is simple: it's just the average of the three individual values we get by the method here.

The solution will involve quite a few calculations, though; that cannot be avoided. I suggest making a spreadsheet out of it: then you can just feed it the number of dice and let the program do the rest.
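Here is one way the "average of the individual values" idea could be sketched in Python. Note the assumption: the formula for the k-th highest die below is a standard counting argument (the k-th highest die is at most m exactly when at most k-1 dice exceed m), swapped in for illustration. It is not the derivation promised in this thread, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def kth_highest_at_most(k, m, n, sides=6):
    """P(k-th highest of n dice <= m): at most k-1 of the dice exceed m."""
    p = Fraction(m, sides)  # chance that a single die is at most m
    return sum(comb(n, j) * (1 - p) ** j * p ** (n - j) for j in range(k))

def expected_kth_highest(k, n, sides=6):
    """E[X] as the sum of P(X > m) for m = 0 .. sides-1."""
    return sum(1 - kth_highest_at_most(k, m, n, sides) for m in range(sides))

def per_die_average_of_top(kept, n, sides=6):
    """Average of the expected values of the highest `kept` dice."""
    return sum(expected_kth_highest(k, n, sides) for k in range(1, kept + 1)) / kept

print(float(per_die_average_of_top(3, 4)))  # roll 4, keep highest 3
```

Keeping every die should give back the plain 3.5 average, which makes a handy sanity check.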

But let's let coxcomb think through the above partial solution before continuing with the elaborations.

Walt Freitag

Meanwhile, here are some exact statistics for d6 pools, using brute force summation of all combinations (not monte carlo). Y'all can use these to check your formulas against.

The figures given are average per die; to get the actual average roll multiply by the number of dice kept.

roll 2 keep highest 1  4.472222
roll 3 keep highest 1  4.958333
roll 3 keep highest 2  4.229167
roll 4 keep highest 1  5.244599
roll 4 keep highest 2  4.672068
roll 4 keep highest 3  4.081533
roll 5 keep highest 1  5.430941
roll 5 keep highest 2  4.965085
roll 5 keep highest 3  4.476723
roll 5 keep highest 4  3.982735
roll 6 keep highest 1  5.560292
roll 6 keep highest 2  5.172239
roll 6 keep highest 3  4.757930
roll 6 keep highest 4  4.336120
roll 6 keep highest 5  3.912058
roll 7 keep highest 1  5.654117
roll 7 keep highest 2  5.325730
roll 7 keep highest 3  4.967585
roll 7 keep highest 4  4.600689
roll 7 keep highest 5  4.230292
roll 7 keep highest 6  3.859020

- Walt
Wandering in the diasporosphere

Grand_Commander13

...  O_O

THANK YOU WALT!!!

No, not for the statistics; I already have a program that generates the distribution of different ways for rolling dice.  Thank you for confirming that my statistics are right!  I always wondered...

Anyway, my program's report for 4d6, keep highest 3:
3: 1 0.0772%
4: 4 0.3086%
5: 10 0.7716%
6: 21 1.6204%
7: 38 2.9321%
8: 62 4.7840%
9: 91 7.0216%
10: 122 9.4136%
11: 148 11.4198%
12: 167 12.8858%
13: 172 13.2716%
14: 160 12.3457%
15: 131 10.1080%
16: 94 7.2531%
17: 54 4.1667%
18: 21 1.6204%
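A report in this format can be produced with a few lines of brute force. The sketch below is in Python rather than the Java program mentioned in this thread, and the function name `keep_highest_distribution` is invented for the example; it simply counts every possible roll.

```python
from collections import Counter
from itertools import product

def keep_highest_distribution(rolled, kept, sides=6):
    """Count how many rolls produce each total of the highest `kept` dice."""
    counts = Counter()
    for roll in product(range(1, sides + 1), repeat=rolled):
        counts[sum(sorted(roll, reverse=True)[:kept])] += 1
    return counts

counts = keep_highest_distribution(4, 3)
total = sum(counts.values())  # 6^4 possible rolls
for value in sorted(counts):
    print(f"{value}: {counts[value]} {100 * counts[value] / total:.4f}%")
```

A `Counter` grows as new totals appear, so there is no need to size an array in advance.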

coxcomb

Well, ashamed as I am to admit it, the math being slung around above will not penetrate the density of my skull.

I guess I'll just write a program that brute-forces all the statistics I need. Unless someone with both math-fu and explain-to-the-dim-jutsu can help.

[edited for typo]

Grand_Commander13

Well, my Java program got those stats by brute-forcing them.  It was really annoying to write it.  You think it'd be so simple.  If only arrays could be dynamically sized...