
Forecasting diseases using Wikipedia

Analyzing page views of Wikipedia articles could make it possible to monitor and forecast diseases around the globe, according to research published in PLOS Computational Biology.

Dr Sara Del Valle and her team from Los Alamos National Laboratory successfully monitored influenza outbreaks in the United States, Poland, Japan and Thailand, dengue fever in Brazil and Thailand, and tuberculosis in China and Thailand.

The team was also able to forecast all but one of these outbreaks (tuberculosis in China) at least 28 days in advance. The results suggest that people start searching for disease-related information on Wikipedia before they seek medical attention.
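To make the forecasting idea concrete, here is a minimal Python sketch of a lagged linear model: page views in week t are used to predict case counts some weeks later. It is an illustration under stated assumptions, not the authors' code. The single aggregated page-view series, weekly resolution, ordinary least squares with an intercept, the four-week (roughly 28-day) lag, the function names and the toy numbers are all hypothetical.

```python
# Minimal sketch of forecasting with a lagged linear model.
# Assumptions (hypothetical, not taken from the paper): one weekly page-view
# series, ordinary least squares with an intercept, and a 4-week (~28-day) lag.

def fit_ols(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def forecast(views, cases, lag):
    """Fit cases[t + lag] ~ views[t] on history, then predict `lag` weeks ahead."""
    xs = views[: len(views) - lag]  # page views at week t
    ys = cases[lag:]                # cases observed `lag` weeks later
    intercept, slope = fit_ols(xs, ys)
    prediction = intercept + slope * views[-1]  # latest views -> future cases
    return prediction, r_squared(xs, ys, intercept, slope)


# Toy, made-up weekly series in which cases trail page views by four weeks.
views = [100, 120, 150, 200, 260, 300, 280, 240, 190, 150]
cases = [60, 60, 60, 60, 60, 70, 85, 110, 140, 160]

prediction, fit = forecast(views, cases, lag=4)
print(f"forecast for 4 weeks ahead: {prediction:.1f} cases (r^2 = {fit:.2f})")
```

The lag is what turns a monitoring model into a forecasting one: if reading about a disease tends to precede clinic visits, today's page views carry information about cases several weeks from now. A real system would test several lags and many articles rather than a single hand-picked series.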

The paper also shows the potential to transfer models across regions; that is, one can “train” a computer model using public health data from one location and apply it in another. For example, researchers could build models using data from Japan to track and forecast disease in Thailand. This is particularly important for countries that lack reliable disease data.
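A transferred model is simply a model fitted in one place and fed inputs from another without retraining. The Python sketch below illustrates that step under explicit assumptions that do not come from the paper: inputs are article views per million total views of the relevant language edition (so regions with very different Wikipedia audiences share a comparable scale), outputs are cases per 100,000 people, and the coefficients, counts and names such as `LinearModel` and `transfer` are hypothetical.

```python
# Minimal sketch of model transfer: reuse a model fitted in a source region
# on page views from a target region, with no retraining.
# Assumptions (hypothetical, not taken from the paper): inputs are article
# views per million total views of the language edition, outputs are cases
# per 100,000 people, and the coefficients below are invented for illustration.

from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    intercept: float
    slope: float  # cases per 100,000 people per (view per million)

    def predict(self, x: float) -> float:
        return self.intercept + self.slope * x


def per_million(article_views, total_views):
    """Normalize raw article views by the language edition's total views."""
    return [1e6 * a / t for a, t in zip(article_views, total_views)]


def transfer(model, article_views, total_views):
    """Estimate target-region incidence using the source model unchanged."""
    return [model.predict(x) for x in per_million(article_views, total_views)]


# Hypothetical model "trained" in the source region.
source_model = LinearModel(intercept=0.5, slope=0.02)

# Made-up weekly counts for the target region's language edition.
target_views = [200, 400, 800]
target_totals = [100_000_000, 100_000_000, 100_000_000]

for week, estimate in enumerate(transfer(source_model, target_views, target_totals)):
    print(f"week {week}: ~{estimate:.2f} cases per 100,000")
```

Normalizing both the inputs and the outputs is one plausible way to make coefficients meaningful across regions; whether a transferred model can be trusted still has to be checked against whatever local data exist.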

Sara Del Valle says: “A global disease-forecasting system will change the way we respond to epidemics. In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today’s forecast. The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code. This paper shows we can achieve that goal.”


Global Disease Monitoring and Forecasting with Wikipedia, Generous N, Fairchild G, Deshpande A, Del Valle SY, Priedhorsky R, PLoS Comput Biol, doi:10.1371/journal.pcbi.1003892, published 13 November 2014.

This work is supported in part by NIH/NIGMS/MIDAS under grant U01-GM097658-01 and the Defense Threat Reduction Agency (DTRA), Joint Science and Technology Office for Chemical and Biological Defense under project numbers CB3656 and CB10007. Data collected using QUAC; this functionality was supported by the U.S. Department of Energy through the LANL LDRD Program. Computation used HPC resources provided by the LANL Institutional Computing Program. LANL is operated by Los Alamos National Security, LLC for the Department of Energy under contract DE-AC52-06NA25396. Approved for public release: LA-UR-14-22535. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The authors have declared that no competing interests exist.