Monty Testing Framework

Raw Statistics for test 69d3c8dc7c5b3724e29d2712

Unless otherwise specified, all Elo quantities below are logistic.

Context

Base TC	10+0.1
Test TC	10+0.1
Book	UHO_Lichess_4852_v1.epd
Threads	1
Base options	Hash=16
New options	Hash=16

SPRT parameters

Alpha	0.05
Beta	0.05
Elo0 (normalized)	0.0
Elo1 (normalized)	4.0
Batch size (games)	64

Draws

Draw ratio	0.42188
Pentanomial draw ratio	0.39062
DrawElo (BayesElo)	159.82
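
The DrawElo figure above can be reproduced from the win and loss counts listed under the trinomial statistics (58 wins, 108 draws, 90 losses). The sketch below assumes the standard BayesElo model, in which P(win) = 1/(1 + 10^((drawelo - elo)/400)) and P(loss) = 1/(1 + 10^((drawelo + elo)/400)). The function name is illustrative and not part of the framework.

```python
import math

def draw_elo(wins: int, draws: int, losses: int) -> float:
    """DrawElo under the BayesElo model, computed from raw game counts."""
    games = wins + draws + losses
    w = wins / games
    l = losses / games
    # Multiplying the odds (1 - w)/w and (1 - l)/l cancels the elo term and
    # leaves 10^(drawelo/200), hence the factor 200 below.
    return 200 * math.log10(((1 - l) / l) * ((1 - w) / w))

print(round(draw_elo(wins=58, draws=108, losses=90), 2))
```

With the counts from this test the result rounds to the reported 159.82.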

SPRT bounds

	Logistic	Normalized	BayesElo	Score
H0	-0.000	0.000	0.000	0.50000
H1	2.716	4.000	3.650	0.50391

Note: normalized Elo is inversely proportional to the square root of the number of games it takes, on average, to detect a given strength difference at a given level of significance. It is given by logistic_elo/(2*standard_deviation_per_game). In particular, if the draw ratio is zero and Elo differences are small, then the standard deviation per game is close to 0.5, so normalized Elo and logistic Elo coincide.
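
As a rough check of the note above, the following sketch converts between score, normalized Elo and logistic Elo. It assumes the form (score - 0.5)/stdev_per_game * 800/ln(10), which agrees with logistic_elo/(2*standard_deviation_per_game) to first order for small differences and which reproduces the figures in this report. The function names and the constant NELO_SCALE are illustrative, not the framework's own.

```python
import math

# 800 / ln(10): converts a per-game standardized score difference to Elo units.
NELO_SCALE = 800 / math.log(10)

def normalized_elo(score: float, stdev_per_game: float) -> float:
    """Normalized Elo from a score in (0, 1) and the per-game standard deviation."""
    return (score - 0.5) / stdev_per_game * NELO_SCALE

def score_from_normalized_elo(nelo: float, stdev_per_game: float) -> float:
    """Inverse of normalized_elo: the score that corresponds to a normalized Elo."""
    return 0.5 + nelo * stdev_per_game / NELO_SCALE

def logistic_elo(score: float) -> float:
    """Logistic Elo difference corresponding to a score in (0, 1)."""
    return -400 * math.log10(1 / score - 1)

# Pentanomial values from this report: score 0.43750, stdev/game 0.33946.
print(round(normalized_elo(0.43750, 0.33946), 2))

# H1 row of the SPRT bounds table: normalized Elo 4.0.
h1_score = score_from_normalized_elo(4.0, 0.33946)
print(round(h1_score, 5), round(logistic_elo(h1_score), 3))
```

The first value matches the reported pentanomial normalized Elo of -63.97, and the second line matches the H1 score (0.50391) and logistic bound (2.716) in the SPRT bounds table.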

Pentanomial statistics

Basic statistics

Elo	-43.6578 [-75.5244, -15.4045]
LOS(1-p)	0.00114
LLR	-0.5491 [-2.9444, 2.9444]
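
The LLR interval [-2.9444, 2.9444] is consistent with Wald's usual approximate SPRT stopping bounds for Alpha = Beta = 0.05. A minimal sketch, assuming those standard bounds are what the framework uses (the function name is illustrative):

```python
import math

def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's approximate SPRT bounds on the log-likelihood ratio.

    The test accepts H0 once the LLR falls below the lower bound and
    accepts H1 once it rises above the upper bound.
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

lower, upper = sprt_bounds(alpha=0.05, beta=0.05)
print(round(lower, 4), round(upper, 4))
```

With an LLR of -0.5491 the test has crossed neither bound, so it is still undecided.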

Generalized Log Likelihood Ratio

Logistic (exact)	-0.53772
Logistic (alt)	-0.52194
Logistic (alt2)	-0.55961
Normalized (exact)	-0.54912
Normalized (alt)	-0.54012

Note: The quantities labeled alt and alt2 are various approximations for the exact quantities. Simulations indicate that the exact quantities perform better under extreme conditions.

Auxiliary statistics

Games	256
Results [0-2]	[13, 37, 50, 25, 3]
Distribution	{0.00: 0.10156, 0.25: 0.28906, 0.50: 0.39062, 0.75: 0.19531, 1.00: 0.02344}
(DD,WL) split	(0.17969, 0.21094)
Expected value	0.43750
Variance	0.05762
Skewness	-0.01324
Excess kurtosis	-0.50187
Score	0.43750
Variance/game	0.11523 [0.09080, 0.13967]
Stdev/game	0.33946 [0.30133, 0.37372]
Normalized Elo	-63.97
LLR jumps [0-2]	[-0.022039, -0.015047, -0.004307, 0.014302, 0.054441]
Expected overshoot [H0,H1]	[0.07676, 0.00000]
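
Several of the pentanomial quantities above follow directly from the Results row. The sketch below is an illustration rather than the framework's code: it treats each game pair as one observation with average score 0, 0.25, 0.5, 0.75 or 1, and doubles the pair variance to obtain the per-game variance. All names are hypothetical.

```python
import math

PAIR_SCORES = [0.0, 0.25, 0.5, 0.75, 1.0]  # average score per game within a pair

def pentanomial_stats(results: list[int]) -> dict[str, float]:
    """Summary statistics from pentanomial counts (one count per pair outcome)."""
    pairs = sum(results)
    probs = [n / pairs for n in results]
    mean = sum(p * s for p, s in zip(probs, PAIR_SCORES))
    variance = sum(p * (s - mean) ** 2 for p, s in zip(probs, PAIR_SCORES))
    variance_per_game = 2 * variance  # a pair is the average of two games
    return {
        "games": 2 * pairs,
        "score": mean,
        "variance": variance,
        "variance_per_game": variance_per_game,
        "stdev_per_game": math.sqrt(variance_per_game),
        "logistic_elo": -400 * math.log10(1 / mean - 1),
    }

stats = pentanomial_stats([13, 37, 50, 25, 3])
for name, value in stats.items():
    print(f"{name}\t{value:.5f}")
```

For the counts in this test the output agrees with the Games, Score, Variance, Variance/game and Stdev/game rows, and with the Elo point estimate of -43.6578.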

Trinomial statistics

Note: The following quantities are computed using the incorrect trinomial model and so they should be taken with a grain of salt. The trinomial quantities are listed because they serve as a sanity check for the correct pentanomial quantities and moreover it is possible to extract some genuinely interesting information from the comparison between the two.

Basic statistics

Elo	-43.6578 [-77.5803, -11.9361]
LOS(1-p)	0.00341
LLR	-0.5014 [-2.9444, 2.9444]

Generalized Log Likelihood Ratio

Logistic (exact)	-0.49412
Logistic (alt)	-0.49350
Logistic (alt2)	-0.50818
Normalized (exact)	-0.50141
Normalized (alt)	-0.49350
BayesElo	-0.49709

Note: BayesElo is the LLR as computed using the BayesElo model. It is not clear how to generalize it to the pentanomial case.

Auxiliary statistics

Games	256
Results [losses, draws, wins]	[90, 108, 58]
Distribution {loss ratio, draw ratio, win ratio}	{0.00: 0.35156, 0.50: 0.42188, 1.00: 0.22656}
Expected value	0.43750
Variance	0.14062
Skewness	0.20833
Excess kurtosis	-1.20139
Score	0.43750
Variance/game	0.14062 [0.12523, 0.15602]
Stdev/game	0.37500 [0.35388, 0.39499]
Normalized Elo	-57.91
LLR jumps [loss, draw, win]	[-0.012289, -0.001792, 0.014718]

Comparison

Variance ratio (pentanomial/trinomial)	0.81944
Variance difference (trinomial-pentanomial)	0.02539
RMS bias	0.15934
RMS bias (Elo)	114.719
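
The comparison rows can be reproduced from the two result vectors. In the sketch below, the RMS bias is taken to be the square root of the variance difference, and its Elo value is the logistic Elo of the score 0.5 + RMS bias. Both are inferred from the numbers in this report rather than from the framework's source, and the function names are illustrative.

```python
import math

def variance_per_game_pentanomial(results: list[int]) -> float:
    """Per-game variance from pentanomial pair counts (pair variance times 2)."""
    scores = [0.0, 0.25, 0.5, 0.75, 1.0]
    pairs = sum(results)
    mean = sum(n * s for n, s in zip(results, scores)) / pairs
    return 2 * sum(n * (s - mean) ** 2 for n, s in zip(results, scores)) / pairs

def variance_per_game_trinomial(losses: int, draws: int, wins: int) -> float:
    """Per-game variance from loss/draw/win counts."""
    games = losses + draws + wins
    mean = (0.5 * draws + wins) / games
    second_moment = (0.25 * draws + wins) / games
    return second_moment - mean ** 2

penta = variance_per_game_pentanomial([13, 37, 50, 25, 3])
tri = variance_per_game_trinomial(losses=90, draws=108, wins=58)

ratio = penta / tri
difference = tri - penta
rms_bias = math.sqrt(difference)
rms_bias_elo = -400 * math.log10(1 / (0.5 + rms_bias) - 1)

print(round(ratio, 5), round(difference, 5), round(rms_bias, 5), round(rms_bias_elo, 3))
```

A variance ratio below 1 means the paired (pentanomial) results are less noisy than the trinomial model assumes.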