We failed to predict the second wave
but the models may still be useful
Throughout this pandemic, numerous modelling teams have been working on projecting trends for COVID-19. One common prediction by almost all modelling groups in India was that, from roughly January 2021 onwards, the number of daily cases and fatalities would be very low. Our modelling team from the Tata Institute of Fundamental Research, Mumbai, released a report in October 2020 in which one of our primary predictions was that Mumbai would achieve some form of herd immunity around February 2021 if the city were to open up in November 2020 (there were additional caveats, but the projections were broadly the same; details further on in the article).
We were happy to see the numbers keep decreasing from November 2020 onwards to very low figures in January 2021. However, the situation changed quite dramatically from mid-February 2021. Maharashtra saw a sharp rise in COVID-19 cases, and almost every other state followed shortly after. In Mumbai (the city I follow most closely), reported daily cases hit close to 12,000 on 4th April 2021. To put this in perspective, the highest figure before this surge was about 2,800 (on 8th October 2020). Not all of this can be attributed to increased testing, as the test positivity also rose from about 2–3% in January 2021 to close to 30% during this surge (if we ignore RATs, the positivity rate was even higher). On 11th April 2021, Mumbai recorded 58 COVID-19 deaths.
None of the models, ours included, came anywhere close to predicting the steep rise in numbers that we have been seeing the past few weeks. We all got it wrong. There are criticisms of specific models in the public domain, but much of the criticism applies to all models.
What went wrong? What did we miss? Given the current situation, is the criticism that modelling efforts were just a data-fitting exercise (or as some would say “fiddling with knobs”) justified? Is there something we can learn from this?
If COVID-19 modelling is to be more than an intellectual exercise, particularly if it is to be used to assist public policy, it is important for all of us working with COVID-19 epidemiological models to reflect on this deeply. All of us must accept, in all humility, that there are many things that we got wrong. This article is an attempt to articulate what possibly went wrong in these modelling efforts and address the following questions:
- What is the purpose of these simulators?
- Do we know what went wrong with the simulations? If so, why did it go wrong?
- Is there still a redeeming feature to these simulations?
The purpose of a simulator
Dr. Mukund Thattai (NCBS, Bengaluru) once mentioned, in a panel on modelling efforts, that a simulator is nothing but a tool to obtain the logical consequences of a set of assumptions (also called a ‘model’).
Every simulator is essentially just that. We start with a model, which is a set of assumptions about the disease dynamics that is believed to be a “reasonable” simplification of reality. A simulation of the model informs us of the consequences of the model assumptions. The primary purpose of a simulator and its projections is to assist local authorities in making policy decisions.
Typically, models are under-determined systems: there are many more parameters in the model than the available data can pin down, so the parameters cannot be uniquely determined even if all the observed data were accurately available. Since the system is under-determined, a simulator tries to “fit” the model to observed data by setting its parameters so that the simulator output is close to the observed data. These parameters are then used to project trends for the future.
We may already encounter some key issues:
- There are multiple possible choices of parameter values that could all “appear” to be equally good with respect to the data being fit to.
- The underlying assumption that the model dynamics “adequately” capture the actual behaviour could be wrong.
- Even if the assumed model dynamics adequately capture reality at a certain point, they may not be applicable at a different point in time if the reality itself evolves.
Any modeller should try to address these issues honestly. In order to do that, it is important to at least recognise that there are issues in the first place. In the present situation, given the unprecedented rise in cases, it is clear that these issues exist in all modelling efforts. But even if these issues exist, the model may still be able to say something useful. This is particularly tricky, and it is easy to fool oneself and forget that there are issues with the model that need to be addressed.
What did the models get wrong, and why?
Let us return to the recent trend of substantially high numbers of cases and fatalities all across the country. None of the simulators predicted anything of this sort. Therefore, whatever the assumptions in the simulators were, they all failed to reliably incorporate the cause of the current dynamics. What went wrong?
There are multiple potential “explanations” for this. We could be seeing a more infectious strain. Or maybe we are looking at possible re-infections. Or maybe people are a lot more relaxed with respect to safety measures.
Which of these is it? We honestly don’t know at this point. The recent press release by the Ministry of Health and Family Welfare seems to give some credence to the first conjecture. There are multiple reported cases of reinfections, lending some evidence to the second conjecture.
It may however be that this increase is almost entirely due to behavioural patterns of individuals, in which case there is something fundamentally wrong about the model predicting the dynamics of the population. In our report from October 2020, we projected that Mumbai would likely reach herd immunity by February 2021. More precisely, the report predicted that even if the city were to completely open up from November 2020 onwards (all workplaces operating at 100% attendance, with the only restrictions being that people continue to wear masks and practice home-isolation on showing symptoms, at the same compliance rates as before), the rise in infections would be mild enough that the city’s medical infrastructure would be sufficient to absorb it. (The report also considered scenarios where the city opened up later, but the rise in infections was projected to be even lower in those cases.) The current trend shows that this projection was wrong, and quite likely one or more of the issues stated above is to blame.
However, if the presence of a new strain, or the possibility of reinfections, is a significant part of the “explanation”, then the simulator may still be in a position to provide useful predictions if we can reliably capture the new disease dynamics. At the moment, given that we do not know what the correct explanation is, any modelling effort will be making additional assumptions on what these new dynamics are. The conclusions obtained will be correspondingly more uncertain.
Modelling: Over-simplifying vs over-fitting
At the end of the day, every model makes assumptions about what is a good simplification of reality, and then chooses parameters that best fit the available data. This data is often just three time series — cases, recovered, fatalities. Therefore, the initial calibration is indeed turning a few “knobs” (parameters) to fit to observed data. Once this calibration is done, you use these parameters to make “predictions” for the future.
However, as mentioned earlier, most models often have a lot more knobs than data to fit to and so one can fit the model to essentially any data. Therefore, it is important to explicitly mention what the calibrated parameters are and, if subsequent adjustments were made, provide justifications for each of those adjustments.
Having more parameters is a double-edged sword. It allows the model to have more nuance but it also makes it easier to fit to data. If there are numerous different ways to fit to data, which of the conclusions (if any) is reliable? To better illustrate this issue, and especially understand the modelling efforts for COVID-19, we need to look a little deeper into how these models and simulation efforts typically work.
Compartmental models
The SIR model is the simplest model for the spread of a disease within a population. The SIR model assumes that the population can be neatly split into 3 compartments — susceptible, infective and removed — and eventually boils down to just two parameters, often called β and γ. The parameter β captures the likelihood of interactions as well as the transmissibility of the disease from one individual to another. The parameter γ, the removal rate, captures the chance that a currently infected person becomes non-infectious within a fixed period of time (say, because the person recovers or dies).
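To make this concrete, here is a minimal sketch in Python of the discrete-time SIR dynamics described above; the parameter values are illustrative only and not calibrated to any real data.

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, r0, days):
    """Simulate simple discrete-time SIR dynamics with a daily time step."""
    n = s0 + i0 + r0  # total population, assumed constant
    S, I, R = [s0], [i0], [r0]
    for _ in range(days):
        new_infections = beta * S[-1] * I[-1] / n  # S -> I
        new_removals = gamma * I[-1]               # I -> R
        S.append(S[-1] - new_infections)
        I.append(I[-1] + new_infections - new_removals)
        R.append(R[-1] + new_removals)
    return np.array(S), np.array(I), np.array(R)

# Illustrative values: beta = 0.3/day, gamma = 0.1/day (roughly a 10-day infectious period)
S, I, R = simulate_sir(beta=0.3, gamma=0.1, s0=999_990, i0=10, r0=0, days=180)
print(f"Projected peak of {I.max():,.0f} active infections on day {I.argmax()}")
```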
One usually fits such a model to data by, for instance, setting the infective compartment to equal the active reported cases, the removed compartment to equal the number of recovered and fatalities, and the susceptible compartment to be everyone else. One then chooses the values of β and γ, and the initial values of S, I and R, that best explain the data observed so far. This sets up the model, which can then be used for predictions.
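A calibration step along these lines might look roughly like the following sketch, which reuses simulate_sir from above and scipy’s least-squares routine to pick the β and γ whose simulated infective curve is closest to an observed series of active cases; the observed numbers here are made up purely for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# A hypothetical series of observed active cases, one value per day (illustration only)
observed_active = np.array([10, 14, 19, 27, 37, 52, 72, 100, 138, 190], dtype=float)

def residuals(params, s0, i0, r0, observed):
    """Difference between the simulated infective curve and the observed active cases."""
    beta, gamma = params
    _, I, _ = simulate_sir(beta, gamma, s0, i0, r0, days=len(observed) - 1)
    return I - observed

fit = least_squares(residuals, x0=[0.2, 0.1],
                    bounds=([0.0, 0.0], [2.0, 1.0]),
                    args=(999_990, 10, 0, observed_active))
beta_hat, gamma_hat = fit.x
print(f"Calibrated beta = {beta_hat:.3f}, gamma = {gamma_hat:.3f}")
```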
In this set-up, since there are so few parameters, we usually do not have the problem of an under-determined system. However, such simplistic dynamics are probably not a reliable simplification of the disease progression over an extended period of time. This may be due to a variety of reasons — interaction patterns may be far from homogeneous, there may be interventions such as lockdowns that change interaction patterns within a population, there may be improvements in medical care that change the removal rate, there may be changes in the number of people tested, and so on.
There are many approaches to addressing some of the shortcomings listed above. One is to create more compartments. For instance, in the case of COVID-19, it seems clear from various serological surveys that there is a large population of people who get exposed to the disease but show no symptoms and are unlikely to contribute to the reported cases at all. Or individuals, after being infected, may not develop symptoms right away, staying dormant (and possibly silently infectious) for a significant period. Adding more compartments to SIR models results in models such as SEIR (with E being the ‘exposed’ category), or further gradations based on severe or mild symptoms, and so on.
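As a sketch of what adding a compartment involves, here is an SEIR variant of the earlier SIR code; the extra parameter σ (the rate at which exposed individuals become infectious) is an illustrative assumption, not a value taken from any study.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days):
    """SEIR dynamics: susceptible -> exposed -> infective -> removed."""
    n = s0 + e0 + i0 + r0  # total population, assumed constant
    S, E, I, R = [s0], [e0], [i0], [r0]
    for _ in range(days):
        new_exposures = beta * S[-1] * I[-1] / n  # S -> E: infection events
        new_infectives = sigma * E[-1]            # E -> I: end of the latent period
        new_removals = gamma * I[-1]              # I -> R: recovery or death
        S.append(S[-1] - new_exposures)
        E.append(E[-1] + new_exposures - new_infectives)
        I.append(I[-1] + new_infectives - new_removals)
        R.append(R[-1] + new_removals)
    return S, E, I, R
```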
Another issue is that interaction and recovery patterns may not stay the same over a long duration, as local authorities may introduce “interventions”. One way in which many compartmental models address this is by fragmenting the observed data into shorter time frames and using the best fit for each time frame.
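One crude way to implement this, sketched below using the calibration machinery from the earlier SIR example, is to split the observed series at the intervention dates and fit each window separately; a fuller implementation would also carry the simulated state across windows, and the split points are whatever the modeller judges to be the relevant dates.

```python
def fit_piecewise(observed, split_points, s0):
    """Calibrate a separate (beta, gamma) for each window between split points."""
    window_fits = []
    for start, end in zip(split_points[:-1], split_points[1:]):
        window = observed[start:end]
        # Here each window is seeded from its own first observed value,
        # purely to keep the sketch short.
        fit = least_squares(residuals, x0=[0.2, 0.1],
                            bounds=([0.0, 0.0], [2.0, 1.0]),
                            args=(s0, window[0], 0, window))
        window_fits.append(tuple(fit.x))
    return window_fits

# For a longer observed series with interventions around days 30 and 75, one might call:
# fit_piecewise(observed_active, split_points=[0, 30, 75, len(observed_active)], s0=999_990)
```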
Agent-based models
Agent-based models sit at the other extreme from compartmental models: there is essentially a compartment for each individual, with their own characteristics. In an agent-based model, the simulator creates an agent for each individual, keeping in mind certain factors relevant to the disease spread. The model introduces an interaction graph where different agents can meet at certain places (say schools, workplaces, community centres, public transport, etc.), and each such interaction is an opportunity for an infectious person to infect a susceptible person.
The primary advantage of this approach is that it allows additional heterogeneity in how disease spread may actually happen. It allows one to study targeted interventions (say, what is the effect of reducing attendance in workplaces to 20%?). These can be incorporated into an agent-based simulator with a little additional work.
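To make this concrete, here is a heavily simplified sketch of the kind of daily loop an agent-based simulator runs; the structure, the parameter names (such as WORKPLACE_ATTENDANCE) and all numeric values are illustrative assumptions, not those of any actual simulator.

```python
import random

random.seed(1)

N_AGENTS = 10_000
HOMES = 3_000
WORKPLACES = 200

# Each agent has an infection state and the places it visits.
agents = [{"state": "S",
           "home": random.randrange(HOMES),
           "work": random.randrange(WORKPLACES)} for _ in range(N_AGENTS)]
for a in random.sample(agents, 10):
    a["state"] = "I"  # seed a few infections

P_TRANSMIT = 0.02           # per-contact daily transmission probability (illustrative)
P_RECOVER = 0.1             # daily recovery probability (illustrative)
WORKPLACE_ATTENDANCE = 0.2  # intervention knob: 20% of workers attend on a given day

def step(agents):
    """One simulated day: gather contacts at homes and workplaces, then transmit and recover."""
    groups = {}
    for a in agents:
        groups.setdefault(("home", a["home"]), []).append(a)
        if random.random() < WORKPLACE_ATTENDANCE:
            groups.setdefault(("work", a["work"]), []).append(a)
    for members in groups.values():
        infectives = sum(1 for a in members if a["state"] == "I")
        if infectives == 0:
            continue
        p_escape = (1 - P_TRANSMIT) ** infectives
        for a in members:
            if a["state"] == "S" and random.random() > p_escape:
                a["state"] = "I"
    for a in agents:
        if a["state"] == "I" and random.random() < P_RECOVER:
            a["state"] = "R"

for day in range(100):
    step(agents)

print("Fraction ever infected:",
      sum(a["state"] != "S" for a in agents) / N_AGENTS)
```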
Agent-based simulators are therefore more flexible than those based more closely on the SIR model. However, agent-based models come with a host of other issues. For instance, everything from our current understanding suggests that age and the presence of co-morbidities do influence the disease progression. But to incorporate this into the simulator, one needs to feed in specific numbers — numbers that are often not available! Often one relies on a handful of studies from a completely different context and uses that data to build the age-based transition matrix that is fed into the simulator. Public transportation might have a strong influence on transmission, but does the simulator have reliable data on commute patterns in the city? High-density areas such as the slums in Mumbai probably see a faster spread as a result of being densely packed, but different publicly available data sources give very different estimates of what fraction of Mumbai lives in such areas. So yes, the agent-based simulator could potentially make use of this nuance, but data at that level of granularity may not be available. These are all places where a model makes additional assumptions.
In other words, the issue of having an under-determined system is a much larger issue here. Agent-based simulators have so many knobs (parameters) that one can potentially fit anything by setting appropriate parameter values. It is therefore a lot more important to identify potential red flags in agent-based models because it is so much easier to fit to data.
(Another issue is that agent-based simulators tend to involve a much more complicated codebase than a simulator for a compartmental model, and hence there is a lot more scope for programming errors as well.)
Choice of parameters in under-determined systems
Both of the models mentioned above involve working with under-determined systems. Whatever the choices of model parameters, a key question is whether they “make sense.” This is where domain knowledge becomes very relevant. A person without domain knowledge may only be able to spot clear red flags indicating that a certain choice of parameters does not make sense. Large deviations in parameters across fits are one example of such a red flag. For instance, say the fit in one month had a much higher recovery rate than the next month; unless there is a plausible explanation for this sudden drop in recovery rate, this is a clear red flag. Or, say, we know that stricter measures were imposed during a period but the best-fit β value is larger than earlier — this is a red flag.
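A crude automated version of such a check might compare the best-fit parameters across consecutive calibration windows (as in the piecewise fitting sketched earlier); the numbers and month labels below are purely illustrative.

```python
# Suppose piecewise calibration gave these per-window fits (illustrative numbers only).
window_fits = [
    {"month": "September", "beta": 0.32, "gamma": 0.10},
    {"month": "October",   "beta": 0.38, "gamma": 0.04},  # stricter measures in force
]
prev, curr = window_fits
if curr["gamma"] < 0.5 * prev["gamma"]:
    print(f"Red flag: recovery rate dropped sharply from {prev['month']} to {curr['month']}")
if curr["beta"] > prev["beta"]:
    print(f"Red flag: best-fit beta rose in {curr['month']} despite stricter measures")
```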
Another way of spotting a red flag is via sensitivity analysis. If one were to change one of the parameters very slightly, without substantially affecting the fit to observed data, does that lead to a substantial change in predictions? If two “equally good” parameter settings lead to drastically different conclusions, we cannot have much confidence in the conclusions.
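A basic version of such a sensitivity check, reusing the simulate_sir sketch from the compartmental-models section, might look as follows; the 3% perturbation and the projection horizon are arbitrary choices for illustration.

```python
# Suppose calibration gave beta = 0.30, gamma = 0.10 (illustrative values).
# Perturb beta slightly and compare long-horizon projections.
horizon = 120
_, I_base, _ = simulate_sir(0.30, 0.10, 999_990, 10, 0, days=horizon)
_, I_pert, _ = simulate_sir(0.30 * 1.03, 0.10, 999_990, 10, 0, days=horizon)

peak_shift = abs(I_pert.max() - I_base.max()) / I_base.max()
timing_shift = abs(int(I_pert.argmax()) - int(I_base.argmax()))
print(f"A 3% change in beta moves the projected peak size by {peak_shift:.0%} "
      f"and its timing by {timing_shift} days")
# A large swing from a small, equally plausible change in a parameter is a
# warning that the projection should be reported with wide uncertainty, if at all.
```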
Even if we are unable to spot any red flags, that does not mean that the model or the predictions are correct. There is just no way, without substantial domain knowledge, to confidently assert that the parameter choices make sense. Modellers hence go with parameter choices that seem reasonable, while continuing to look for potential red flags.
What modellers should do
Given the potential concerns that arise when dealing with such under-determined systems, modellers can take steps towards being transparent about the simulator. This would go some way towards alleviating the legitimate concerns.
- Release the source code for the simulator with as much detail and documentation as possible. This would make it easier for others to spot unnoticed red flags, and also give others an opportunity to run the simulations themselves.
- Be precise about what is being modelled (is it daily cases, or fatalities, or hospitalisation/ICU requirements, or something else?), what the model assumptions are, and for what time period the model assumptions are presumed to remain valid.
- If there are factors that seem relevant (say, the possibility of reinfections, or the presence of more infectious strains) but are not taken into account in the model (possibly due to lack of data), state them clearly as assumptions made in the model.
- Include sensitivity analysis of the principal parameters of the model.
- When possible, list some explicit falsifiability criteria — what are observations that will indicate that there are issues in the modelling assumptions?
- Significant deviation of newly observed data from the simulator’s predictions is an indication that the model is no longer adequately capturing reality. One should not be hasty to tune additional parameters to fit the new trend; doing so without sufficient justification would lead to over-fitting, as alluded to earlier. In instances where such changes to the model are justifiable, explicitly mention what changed between versions, along with justifications for making these changes.
- Uncertainties in model parameters translate to uncertainties in the predictions, and the predictions are reliable only as long as the assumptions capture the real dynamics. One must be cognisant of these internal uncertainties, and the confidence with which predictions are reported must be calibrated accordingly.
- Reach out to field experts when possible. They may be able to identify red flags that went unnoticed, and also offer an alternate perspective from their experience on the ground.
Addressing some common criticisms
Below are some criticisms of modelling efforts that, I believe, are important to address.
- Your model failed to predict this recent wave. You even predicted herd immunity by February 2021.
Herd immunity is a nuanced and often dynamic notion. It roughly refers to the threshold fraction of people who have already recovered from the disease beyond which a population, under normal behaviour, starts seeing fewer daily infections, because sufficiently many immune people slow down the spread of the disease. The presence of a more infectious strain, or the possibility of reinfections, would certainly change this threshold.
If the presence of a more infectious strain, or reinfections are significant causes for the recent spikes, then the model may still be able to provide useful projections if these new dynamics are incorporated.
On the other hand, if the recent spike is merely due to people returning to normal behaviour, then the model assumptions leading to such herd-immunity claims are invalidated.
- You keep updating your predictions as and when new data arrives.
This is an absolutely valid criticism, and models that do this should be called out. Every single time a model needs to be “re-fit” to data, it risks over-fitting and thus reduces the confidence in its predictions. Sometimes these updates may be warranted, if there is a compelling justification for incorporating a change in dynamics. But a model that does this too freely is waving a red flag.
- Why should I trust “predictions” from people with no real epidemiological expertise?
As discussed earlier, domain expertise — be it epidemiological, or virological, or in public policy — is very important to make sure that the model is based on “reasonable” assumptions. The confidence in the model and its predictions should take into account the extent of inputs from domain experts.
However, this is one of many criteria. Does the modelling team have experience with simulations? How transparent are the modellers about the parameters of the model? How accurate is the model in predicting what it set out to do? These are some criteria that could be used to evaluate a model (and others were mentioned in the previous section).
It should be noted that merely having the simulator output match observed data does not mean that the model assumptions accurately capture reality. Misplaced confidence while making predictions from simulations, without adequate checks as described above, only adds to the distrust.
- How can you be sure that this is due to a more infectious strain, or due to reinfections?
From my limited understanding, the current data available is insufficient to substantiate these conjectures. We will have to wait until there are some studies shedding light on these conjectures.
There is also a deeper issue when it comes to models incorporating these effects. Even if reinfections or variant strains play a substantial role in the recent trend, until we have quantitative studies on how strong these effects are, any model incorporating them is forced to make additional assumptions. These additional uncertainties translate to uncertainties in the predictions as well.
- How can you make recommendations for certain interventions, such as closure of workplaces, without taking into account the economic or sociological costs of such actions?
This is a very valid criticism. Non-pharmaceutical interventions such as lockdowns will slow the spread of an epidemic like COVID-19, but there are several costs, both economic and social, associated with them. A report by the World Health Organisation noted that the effects of extended lockdowns were visible in various health care services such as routine immunisations, cancer diagnosis and treatment, tuberculosis detection and treatment, and antenatal care, to name just a few.
Simulations focus on projecting health care requirements for COVID-19 patients alone. Estimating the economic and sociological costs of interventions is often outside the scope of these simulations and models. Projections of hospital and ICU requirements for COVID-19 patients are valuable inputs that enable policy makers to make the necessary arrangements. However, public policy cannot be based only on these, as the other costs alluded to above need to be taken into account as well.
Any simulation that provides “recommendations” must clearly indicate that these are purely with respect to the metric of hospitalisation requirements of COVID-19 patients, and do not take into account the other economic and sociological costs of the interventions studied. It is up to the local authorities to eventually make policy decisions after taking all these factors into account.
Closing remarks
We all make our well-intentioned efforts to be of assistance in these difficult times. Moments like these serve as a good reality check to reflect critically on these efforts. It would be a tragedy to let the moment pass without this introspection.
It may be that these models can be fixed once we have a better understanding of what the underlying cause of the deviation is. When we do, we may be in a position to incorporate it into the model and give useful advice.
Or maybe these models can’t be fixed and the flaw is more fundamental. If the model cannot be redeemed, it is what it is and we should be willing to admit this openly.
The basic axiom of the scientific method is “falsifiability”. If there are no reliable ways in which we can say “there are issues with the model”, then we are not doing science. If we can’t even see now that there are issues to be addressed, we probably never will.
About me: I am one of the members of the IISc-TIFR City-Scale Agent-based COVID-19 Epidemic Simulator team and have been working on this project since March 2020. I am a theoretical computer scientist and have no background in epidemiology. My skills from the perspective of this simulator, if anything, are only in designing and coding the dynamics of the disease spread and heuristics for potential interventions. I acknowledge everyone who gave feedback on this article, including my team members. The views expressed in the article however are my own.