The internet is wonderful. After my last post on the Scholz et al. replication of Bueno de Mesquita's (BDM) political forecasting model, I got an email from Jeremy McKibben-Sanders, who had independently replicated the same model at about the same time. His code is a lot cleaner and more elegant than mine, and -- more importantly -- actually replicated the model. Between his code and the original code, which Dr. Jason Scholz and the Australian Defence Science and Technology Organisation generously provided, I was able to fix my own implementation.
The replication process made clear to me how even small differences between my code and the original led to significant changes in the model's behavior, which helped motivate the following analysis.
I have a lot of open questions about BDM's model, as well as the Scholz et al. replication. Jeremy's code comments highlight many of them. But one thing that bothered me was that the model, like BDM's published results, provides a point forecast: for any set of inputs, there is exactly one deterministic output, with no attempt to introduce or quantify any uncertainty. Yet understanding, measuring and conveying uncertainty are some of the most important parts of political forecasting.
There are a few possible sources of uncertainty in the model. For example, it includes each actor's probability of successfully challenging other actors, even though the actual outcomes of those challenges end up being deterministic. For now, though, I look at another source of uncertainty: the input values. To use a BDM-type model, we need to map some real-world characteristics of the actors onto numbers. Sometimes these are more or less straightforward; in the example in the Scholz paper, each actor's desired outcome is the number of years before certain regulations go into effect. But most of the time, the real-world property is fuzzy. Just how much power does an actor have? How much do they care about an issue? Any such number is, ultimately, an estimate -- and as such, has some uncertainty attached to it. Are we sure that a given actor really cares about an issue at 0.4, not 0.41 or 0.39? How sure are we?
It might be the case that it just doesn't matter that much. If slightly different actor attributes lead to more or less the same results, rough estimates might be good enough. On the other hand, if it turns out to matter a great deal, it raises significant questions about the accuracy of the model, and particularly its point forecasts.
Fortunately, with the working replication code in hand, this is a relatively easy thing to check. I decided to focus on the salience parameter, since it seems to be the most subjective. I start with the example scenario given in the Scholz et al. paper, and add noise in order to perturb each actor's salience by a small percentage. The noise is drawn from a normal distribution, such that:
With the perturbed saliences, I run the model for 10 steps, as in the original, and at each step record the median position, which is the one with the greatest support. I repeat this analysis 100 times, with new random perturbations initializing each iteration. Below are box-and-whiskers plots showing the distribution of median positions at each step, along with a black line representing the median positions traced starting from the original, unperturbed parameters.
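The procedure above can be sketched in a few lines of Python. This is only an illustration of the perturb-and-rerun loop, not the replication code itself: `toy_model` is a hypothetical stand-in that just pulls actors toward the weighted median, the actor values are invented, and the multiplicative form of the noise and its relative standard deviation `rel_sd` are my assumptions for the sketch.

```python
import random

def weighted_median(positions, weights):
    """Position at which the cumulative weight first reaches half the total."""
    pairs = sorted(zip(positions, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for pos, w in pairs:
        running += w
        if running >= half:
            return pos
    return pairs[-1][0]

def toy_model(positions, capabilities, saliences, steps=10, pull=0.2):
    """Hypothetical stand-in, NOT the BDM/Scholz model: at each step every
    actor moves a fraction `pull` of the way toward the weighted median."""
    positions = list(positions)
    weights = [c * s for c, s in zip(capabilities, saliences)]
    medians = []
    for _ in range(steps):
        m = weighted_median(positions, weights)
        medians.append(m)  # record the median position at this step
        positions = [p + pull * (m - p) for p in positions]
    return medians

def perturb(saliences, rel_sd, rng):
    """Assumed noise form: scale each salience by (1 + e), e ~ Normal(0, rel_sd),
    then clip the result to [0, 1]."""
    return [min(1.0, max(0.0, s * (1.0 + rng.gauss(0.0, rel_sd))))
            for s in saliences]

def run_experiment(positions, capabilities, saliences,
                   runs=100, steps=10, rel_sd=0.05, seed=0):
    """Return the unperturbed trace plus one trace per perturbed run."""
    rng = random.Random(seed)
    baseline = toy_model(positions, capabilities, saliences, steps)
    perturbed = [
        toy_model(positions, capabilities, perturb(saliences, rel_sd, rng), steps)
        for _ in range(runs)
    ]
    return baseline, perturbed

# Invented example actors (not the values from the Scholz et al. paper).
positions = [4.0, 7.0, 10.0, 10.0, 7.0, 4.0]
capabilities = [0.8, 0.1, 1.0, 0.6, 0.4, 0.5]
saliences = [0.9, 0.4, 0.6, 0.8, 0.5, 0.4]

baseline, perturbed = run_experiment(positions, capabilities, saliences)
final_medians = [trace[-1] for trace in perturbed]
print("unperturbed final median:", baseline[-1])
print("perturbed final medians range:", min(final_medians), "to", max(final_medians))
```

Each inner list in `perturbed` is one run's median trace, so taking the same index across runs gives the per-step distribution that a box plot would summarize.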
In fact, it appears that these small perturbations matter a great deal. So much so that the 'original' final position is actually an outlier in the distribution of possible final positions.
Suppose that the BDM model, as implemented here, is actually an accurate model of the world -- and that the only uncertainty comes from the salience estimates. If this were the case, and we wanted to use the model to make a prediction, the plot above suggests that the point prediction from the estimated values would actually be pretty unlikely. Only by explicitly modeling the uncertainty can we get a distribution of final outcomes, which in turn we can use to find the most likely ones.
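As a rough sketch of what "using the distribution" could look like, the snippet below takes a list of final positions from repeated runs, finds the most populated bin, and checks where a point forecast falls within the sample. The function names, the bin width, and the numbers are all invented for illustration; they are not results from the model.

```python
from collections import Counter

def percentile_rank(samples, value):
    """Fraction of samples less than or equal to `value`."""
    return sum(1 for s in samples if s <= value) / len(samples)

def modal_bin(samples, width):
    """Return (low, high, count) for the most populated bin of the given width."""
    counts = Counter(int(s // width) for s in samples)
    idx, n = counts.most_common(1)[0]
    return idx * width, (idx + 1) * width, n

# Invented final positions from hypothetical perturbed runs.
final_positions = [1.0, 1.2, 1.4, 3.0, 5.0]
point_forecast = 5.0  # invented unperturbed result

low, high, n = modal_bin(final_positions, width=1.0)
print(f"most likely outcome range: [{low}, {high}) with {n} of {len(final_positions)} runs")
print("point forecast percentile rank:", percentile_rank(final_positions, point_forecast))
```

A point forecast with a percentile rank close to 0 or 1 sits in the tail of the simulated outcomes, which is the situation described above.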
I know that I'm not the first person to notice this. The Scholz et al. paper notes that "concerns with regard to the model’s sensitivity and convergence have been identified as areas of further work," though I haven't seen anything they've published on it. Someone who also replicated the model in a proprietary context told me that they discovered the same thing. I also suspect that the proprietary version of the BDM model that's used to consult for corporations and the CIA involves some sort of uncertainty measure. But as far as I can tell, this uncertainty isn't indicated in any of their publicity material.
Forecasting is hard. Building a general forecasting model, which can work for any given political sub-system with only a few numeric parameters and produce plausible behavior and results, is hard. If nothing else, this should be a reminder of that, and of the importance of not looking for the one 'correct' prediction. Articulate your assumptions (including in the form of code), estimate your uncertainty, and if possible provide a distribution of outcomes.