In the first article in this series we took a look at how other models that make publicly available predictions for sports events normally perform.
The second article focused on how we obtained Pinnacle’s odds and how we could use these odds to measure the performance of our biathlon model.
The third article included performance data and some brief thoughts on the performance.
This fourth and final article will conclude the series and will discuss the performance in more depth.
At first glance it may appear peculiar that a somewhat new website like Sportindepth can have a sports model with results far superior to FiveThirtyEight’s. Our results, a yield of some 7 to 23 percent when measured against the opening odds, certainly look very impressive next to FiveThirtyEight’s models, which lost more than betting on random outcomes would be expected to. There are, however, several reasons why things are not so straightforward.
Firstly, the sample used to measure FiveThirtyEight’s models is huge and clearly sufficient for the job. The sample used to measure our biathlon model is small and clearly insufficient for drawing any firm conclusions.
Secondly, the sports FiveThirtyEight’s models make forecasts for are the highest-turnover betting sports in the world. Beating the betting markets in such sports is far harder than making predictions for a sport like biathlon, which, even though it is rather popular in Northern Europe, attracts only a tiny fraction of the turnover of the sports FiveThirtyEight covers.
Thirdly, the article that dealt with FiveThirtyEight’s performance measured solely against the closing odds, which we know are harder to beat. We don’t know how FiveThirtyEight’s models would have performed if they had been tested against the opening rather than the closing odds.
Because of the above, it should be clear that this really is a case of comparing apples to oranges. Even if our model has produced superior results when measured against the betting market, we cannot use this to say anything definitive about which model is better.
Luckily, our ambition at the start was never to compare our model to any other model, but rather to examine the claim that our model’s predictions were “not very good” and to test whether it really is correct that our simulations are “incapable of beating the betting markets”. This we have done to the best of our abilities, and even if the sample size is, at the moment, simply too small for firm conclusions, the indications are that our model’s predictions are very good and that they are somewhat likely to be capable of beating the markets.
So we have established that the samples used are too small for drawing firm conclusions and that it is somewhat likely that our model is good enough to beat the betting market.
As we are not fully satisfied with these conclusions and would have preferred something more definitive, we will round off this article series by calculating, in the form of a P-value, how often selecting the outcomes to bet on at random would have produced a similar or better result for each of the four scenarios discussed in the previous article. In addition, we will give you the possibility to simulate the result of picking the bets randomly. Random betting should, on average, produce a loss similar to the margin Pinnacle employs of some 6.50%.
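To make the procedure concrete, here is a minimal sketch of how such a Monte Carlo P-value could be computed. The odds list, model profit, and margin handling below are hypothetical placeholders for illustration, not our actual betting history or the exact method used for the figures in this article.

```python
import random

def simulate_random_betting(decimal_odds, margin=0.065):
    """Profit (in unit stakes) of one random-betting run over the given odds.

    With a bookmaker margin m, the implied probability 1/o overstates the
    true win probability, so we deflate it by (1 + m). The expected loss
    per unit staked is then m / (1 + m), roughly the quoted 6.50% margin.
    """
    profit = 0.0
    for o in decimal_odds:
        p_win = (1.0 / o) / (1.0 + margin)  # assumed true win probability
        if random.random() < p_win:
            profit += o - 1.0   # win: net gain of (odds - 1) per unit stake
        else:
            profit -= 1.0       # loss: the unit stake
    return profit

def monte_carlo_p_value(decimal_odds, model_profit, n_sims=10_000, margin=0.065):
    """Share of random-betting runs doing at least as well as the model."""
    hits = sum(
        simulate_random_betting(decimal_odds, margin) >= model_profit
        for _ in range(n_sims)
    )
    return hits / n_sims

# Hypothetical example: 200 bets at made-up odds, model profit of 20 units.
odds = [1.8, 2.4, 3.1, 1.5, 2.0] * 40
print(monte_carlo_p_value(odds, model_profit=20.0))
```

A returned value of, say, 0.12 would mean that roughly 1,200 of 10,000 random runs matched or beat the model, which is how the P-values in the following paragraphs should be read.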
The data from the first tab of our Google doc, “Opening odds 0% min. value”, has a P-value of about 0.12. This means that random betting would, about 12 percent of the time, deliver a result equal to or better than the results of our model.
Below you can see result data and graphs for both the betting history of our model and one iteration of random betting. The betting history of our model will always remain the same, but every time you hit the “simulate” button, which is located to the left and below the graphs, a new set of results will appear for the simulated data. This simulation should give you a good “real-life” feel for how likely it is that the results of our model are caused simply by luck.
The data from the second tab of our Google doc, “Opening odds 5% min. value”, has a P-value of between 0.05 and 0.06. This means that random betting would, about five to six percent of the time, deliver a result equal to or better than the results of our model.
Below you can see result data and graphs for both the betting history of our model and one iteration of random betting. As before, the betting history of our model remains fixed, while every click on the “simulate” button generates a new set of results for the simulated data.
The data from the third tab of our Google doc, “Closing odds 0% min. value”, has a P-value of about 0.20. This means that random betting would, about one time in five, deliver a result equal to or better than the results of our model.
Below you can see result data and graphs for both the betting history of our model and one iteration of random betting. As before, the betting history of our model remains fixed, while every click on the “simulate” button generates a new set of results for the simulated data.
The data from the fourth tab of our Google doc, “Closing odds 5% min. value”, has a P-value of about 0.25. This means that random betting would, about one time in four, deliver a result equal to or better than the results of our model.
Below you can see result data and graphs for both the betting history of our model and one iteration of random betting. As before, the betting history of our model remains fixed, while every click on the “simulate” button generates a new set of results for the simulated data.
That it seems not unlikely that our model can beat the betting market is impressive in itself. However, we are not at all surprised by our findings. After all, we know very well how much work has gone into making the model and how complex it truly is. Had we understood at the start how many factors we would need to incorporate to produce quality predictions, and how much work that would entail, chances are we would never have started building it. Literally many thousands of working hours have been spent getting the model as good as it currently is.