I'm trying something a bit new for this forecast (and soon for a couple of other forecasts with a numerical modelling component): sharing public Squiggle models for my forecasts. You can find my model here.
Some of the thought process:
- The number of entries is probably reasonably well modelled by a Poisson distribution. However, it's not clear what the rate should be.
- The rate won't necessarily remain constant throughout the year, but assuming it does seems a reasonable simplification, especially early in the forecast window.
- I've considered a few different ways of predicting the rate for this year. For example: a judgemental prediction; assuming the rate is equally likely to equal any of the rates from the past four years; weighting each of the past four years' rates in proportion to the likelihood of this year's data so far; assuming the rate follows a gamma distribution; etc. See the Squiggle page for a bit more detail.
- The biggest question is how to treat 2022/23, which had just 14 entries. What was going on that year? I'm not sure it actually reflects a drop in German-language propaganda; it could instead reflect a gap in German-language coverage by the EUvsDisinfo team. Since either possibility would affect the resolution of this question, we shouldn't just ignore it.
- When weighing up my different models, I have given more weight to those that I think handle 22/23 more reasonably.
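To make the likelihood-weighted option concrete, here is a minimal Python sketch of one way it could work; it is not a translation of the Squiggle model. It assumes a constant rate, so the count observed after a fraction of the window has elapsed is Poisson with mean `rate * fraction_elapsed`. Each past year's rate is weighted by the likelihood of the year-to-date count, and the rest of the year is a Poisson mixture over those rates. The function names and the example inputs are hypothetical placeholders (only the 14 for 2022/23 comes from the text above).

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(K = k) for K ~ Poisson(mu), computed in log space for stability."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def likelihood_weights(past_rates, observed, fraction_elapsed):
    """Weight each candidate annual rate by the Poisson likelihood of the
    count observed so far, assuming a constant rate through the year."""
    likes = [poisson_pmf(observed, r * fraction_elapsed) for r in past_rates]
    total = sum(likes)
    return [like / total for like in likes]


def full_year_pmf(k, past_rates, weights, observed, fraction_elapsed):
    """P(full-year total == k): the count already observed plus a Poisson
    remainder for the rest of the year, mixed over the candidate rates."""
    if k < observed:
        return 0.0
    remaining = 1.0 - fraction_elapsed
    return sum(w * poisson_pmf(k - observed, r * remaining)
               for r, w in zip(past_rates, weights))


def prob_more_than(threshold, past_rates, weights, observed, fraction_elapsed):
    """P(full-year total > threshold) via the complement of the CDF,
    clamped at 0 to absorb floating-point round-off."""
    cdf = sum(full_year_pmf(k, past_rates, weights, observed, fraction_elapsed)
              for k in range(threshold + 1))
    return max(0.0, 1.0 - cdf)


# Hypothetical inputs: only the 14 for 2022/23 comes from the text above.
past_rates = [40.0, 55.0, 14.0, 60.0]
observed, fraction_elapsed = 5, 0.1

uniform = [1.0 / len(past_rates)] * len(past_rates)
weighted = likelihood_weights(past_rates, observed, fraction_elapsed)
print("uniform  P(>90):", prob_more_than(90, past_rates, uniform, observed, fraction_elapsed))
print("weighted P(>90):", prob_more_than(90, past_rates, weighted, observed, fraction_elapsed))
```

Comparing the uniform and likelihood-weighted mixtures shows how strongly the year-to-date data pulls weight away from an outlier year such as 2022/23.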
My 'inside view' here is that since it's an election year in Germany, we should expect an increase both in the resources EUvsDisinfo devotes to German coverage and, possibly, in attempts to influence the German election via media. As a ballpark, I'd put this at between 0% and 20% more detections than in a normal year. For simplicity, I apply this uplift to the predictive distribution rather than to the distribution over the rate parameter.
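Applying the uplift to the predictive distribution rather than to the rate can be sketched by Monte Carlo: draw a rate, draw a Poisson count, then scale that count by a random factor. This is a minimal Python sketch assuming the uplift is uniform on [0%, 20%] and that scaled counts are rounded to the nearest integer; the candidate rates and weights are hypothetical placeholders, not values from the model.

```python
import math
import random


def sample_poisson(mu, rng):
    """Draw from Poisson(mu) with Knuth's multiplication method
    (adequate for the moderate rates used here)."""
    threshold = math.exp(-mu)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulate_counts(rates, weights, n_samples, uplift=(0.0, 0.2), seed=0):
    """Sample full-year counts from a mixture of Poissons, then apply a
    uniformly distributed multiplicative uplift to each sampled count."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        rate = rng.choices(rates, weights=weights)[0]
        base = sample_poisson(rate, rng)
        factor = 1.0 + rng.uniform(*uplift)  # assumed uniform 0-20% uplift
        counts.append(round(base * factor))
    return counts


# Hypothetical candidate rates and weights.
rates = [40.0, 55.0, 14.0, 60.0]
weights = [0.3, 0.3, 0.1, 0.3]
samples = simulate_counts(rates, weights, n_samples=20_000)
print("P(>90):", sum(c > 90 for c in samples) / len(samples))
```

Because the uplift multiplies sampled counts rather than the rate, it widens the upper tail directly, which is where a bucket such as ">90" is most sensitive.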
The numbers the model gives seem intuitively about right, if maybe a little high for the >90 bucket.
I'll be updating this model with new data throughout the window, and probably making tweaks and revisions. I welcome any suggestions/feedback.
Another approach I may consider later is to predict the number of articles in any language, and then predict the fraction we should expect to be in German.
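Under the extra assumption that each article is independently in German with some fixed probability, this alternative stays within the Poisson framework: thinning a Poisson(λ) total by a probability p gives a Poisson(λp) German count. A minimal Python sketch follows; the total rate, the candidate German fractions and their weights are hypothetical placeholders, and treating the total rate as known is a simplification.

```python
import math


def poisson_pmf(k, mu):
    """P(K = k) for K ~ Poisson(mu), with mu > 0."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def german_count_pmf(k, total_rate, fractions, weights):
    """P(German count == k) when the all-language count is Poisson(total_rate)
    and each article is German with probability f, where f takes the values
    in `fractions` with probabilities `weights` (Poisson thinning)."""
    return sum(w * poisson_pmf(k, total_rate * f)
               for f, w in zip(fractions, weights))


# Hypothetical inputs, not EUvsDisinfo figures.
total_rate = 500.0
fractions = [0.05, 0.10, 0.15]
weights = [0.25, 0.5, 0.25]
mean = sum(k * german_count_pmf(k, total_rate, fractions, weights)
           for k in range(300))
print("expected German entries:", mean)
```

Uncertainty about the German fraction enters as a mixture over `fractions`, which keeps each component a plain Poisson.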
Adjusting downwards for time. There still seem to be lots of research positions open in Beijing; whether they're within MSRA or not is difficult to tell, but especially with just 2 months to go it seems unlikely we'll get an announcement. And if I were Microsoft, I might wait until the next US administration takes office before making a decision like this.