Researchers have developed predictive models that provide real-time estimates of flu activity and accurate forecasts of flu-like illness levels, according to Science Daily.
“There are many data sources and models that can be used to predict flu-like symptoms in the population,” said study lead author Mauricio Santillana, PhD, of Boston Children’s Computational Health Informatics Program and the Harvard John A. Paulson School of Engineering and Applied Sciences. “But our question was, if we have many models each predicting flu activity, do we gain anything by combining them?”
The team of Santillana and John Brownstein, PhD, started with four separate now-casting models of flu-like illness activity, each fed aggregated, anonymized, national-level data from one of four sources: a) search data from Google; b) Twitter data; c) near-real-time clinical data from electronic health record (EHR) manager athenahealth; and d) crowd-sourced flu data from Flu Near You, a participatory surveillance system developed by HealthMap. In an approach similar to that used by weather forecasters to predict hurricane tracks, the team then used machine-learning techniques to generate a set of “ensemble” models that incorporated the results produced by the four single-source models.
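The idea of combining several single-source estimates can be illustrated with a minimal sketch. It assumes a simple linear combination fitted by least squares, which is only one of many possible ensemble techniques and is not necessarily what the researchers used. The function names and all of the data are hypothetical and synthetic, generated purely for illustration.

```python
import numpy as np


def fit_ensemble_weights(predictions, observed):
    """Fit an intercept and one weight per model by ordinary least squares.

    predictions: array of shape (n_weeks, n_models), one column per model
    observed:    array of shape (n_weeks,), the reference illness levels
    """
    X = np.column_stack([np.ones(len(predictions)), predictions])
    coef, *_ = np.linalg.lstsq(X, observed, rcond=None)
    return coef


def ensemble_predict(coef, predictions):
    """Combine the model outputs using the fitted intercept and weights."""
    X = np.column_stack([np.ones(len(predictions)), predictions])
    return X @ coef


def rmse(a, b):
    """Root-mean-square error between two series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Synthetic stand-in for weekly flu-like illness levels (NOT real data).
rng = np.random.default_rng(0)
truth = 2.0 + np.sin(np.linspace(0, 3 * np.pi, 60))

# Four hypothetical single-source estimates: the truth plus independent noise.
noise_levels = (0.2, 0.3, 0.25, 0.4)
sources = np.column_stack(
    [truth + rng.normal(0.0, s, truth.size) for s in noise_levels]
)

# Fit the weights on the first 40 weeks, then combine the last 20 weeks.
train, test = slice(0, 40), slice(40, 60)
coef = fit_ensemble_weights(sources[train], truth[train])
combined = ensemble_predict(coef, sources[test])

print("ensemble RMSE on held-out weeks:", rmse(combined, truth[test]))
for j in range(sources.shape[1]):
    print(f"source {j} RMSE on held-out weeks:", rmse(sources[test, j], truth[test]))
```

On the weeks used for fitting, a least-squares combination can never do worse than any single input, because using one model alone is a special case of the weighted sum. Whether that advantage carries over to unseen weeks depends on the data, which is why the sketch evaluates on a held-out portion.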
To determine their ensemble models’ accuracy and robustness, Santillana and Brownstein’s team compared their results to those of each of the four real-time source models, as well as to both CDC’s historical flu-like illness reports and now-casts based on Google Flu Trends (GFT) from the 2013-14 and 2014-15 flu seasons. The ensemble models outperformed the four real-time source models. Compared against CDC’s historical flu-like illness reports, they also generated better forecasts of both the timing and the magnitude of flu-like illness activity at each time horizon measured (“this week,” “next week,” “in two weeks”) than models that rely on historical information only.
The ensemble predictions also accurately tracked CDC’s reports of actual flu activity, with a near-perfect Pearson correlation of 0.99 for real-time estimates and a slightly lower Pearson correlation of 0.90 at the two-week time horizon.
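For readers unfamiliar with the metric, the Pearson correlation measures how closely two series move together linearly, on a scale from -1 to 1. The sketch below shows the standard calculation. The function name and the toy numbers are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center each series on its mean
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Toy series (NOT real data): the second is exactly twice the first,
# so the two move together perfectly.
predicted = [1.0, 2.0, 3.0]
reported = [2.0, 4.0, 6.0]
print(pearson_r(predicted, reported))  # 1.0
```

A value near 1 means the predictions rise and fall with the reported values. It does not by itself show that the two agree in absolute level, since the toy series above differ by a factor of two and still correlate perfectly.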