January 3-16, 2022
Large-scale studies over-estimate vaccination coverage
In the fight against COVID-19, public health officials rely on surveillance data collected either physically (during a visit to the doctor, for example) or through online surveys. Small-scale studies carry the risk of targeting an unrepresentative sample of the population, while large-scale surveillance is potentially more useful: with the right statistical analyses, it allows the behaviour of population subgroups to be studied so that targeted measures can be applied (to unvaccinated sectors of the community, for example). Because of their size, these large-scale analyses (Big Data) are supposed to reduce the risk of errors.
To test this premise, British and American researchers (from Oxford, Stanford and Harvard Universities) examined estimates of first-dose vaccine uptake in the United States between January and May 2021 (“have you been vaccinated?”, “do you want to be vaccinated?”, etc.). They analysed the results of 3 online surveys: Delphi-Facebook (4.5 million respondents), the Census Bureau’s Household Pulse (600,000 respondents) and Axios-Ipsos (10,000 respondents). In addition, vaccination data collected by the Centers for Disease Control and Prevention (CDC) from individual states and local health centres served as the baseline for the study.
The results, however, turned out to be highly variable. Compared with the CDC data, which put the proportion of vaccinated people at 53%, Delphi-Facebook over-estimated coverage by 17 percentage points, Household Pulse by 14 and Axios-Ipsos by 4.2. The surveys’ differing recruitment methods introduced selection bias into the estimates, errors that went far beyond the statistical uncertainty the size of the studies was supposed to eliminate, and the results did not reflect the real behaviour of the population.
The smallest study was also the most reliable. The researchers re-validated their observations using 3 additional small- to medium-sized surveys (Data For Progress, Morning Consult, Harris Poll), 2 of which still diverged widely from the CDC figures.
The studies analysed here all suffer from the “Big Data Paradox”: large-scale studies, which minimise certain sources of error, paradoxically amplify small biases (which receive less attention) as the data size grows. In this case, a biased survey of 250,000 respondents per week can be no more informative than a simple random sample of 10 respondents. If the data collection is biased from the start, no quantity of data can compensate for the error.
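The paradox can be illustrated with a small simulation (all numbers below are invented for illustration and are not taken from the study): a huge survey in which vaccinated people are only slightly more likely to respond drifts well away from the true coverage, while a far smaller random sample lands close to it.

```python
import random

random.seed(42)

# Hypothetical population of 1,000,000 people with true first-dose
# coverage of 53% (1 = vaccinated, 0 = not vaccinated).
N = 1_000_000
vaccinated = int(N * 0.53)
population = [1] * vaccinated + [0] * (N - vaccinated)

def biased_sample(pop, n, p_respond_vax=0.65, p_respond_unvax=0.50):
    """Draw n responses when vaccinated people are a bit more likely
    to answer the survey (a mild, plausible selection bias)."""
    sample = []
    while len(sample) < n:
        person = random.choice(pop)
        p = p_respond_vax if person else p_respond_unvax
        if random.random() < p:
            sample.append(person)
    return sample

big_biased = biased_sample(population, 250_000)                    # large but biased
small_random = [random.choice(population) for _ in range(1_000)]   # small but random

print(f"True coverage:       53.0%")
print(f"Big biased survey:   {100 * sum(big_biased) / len(big_biased):.1f}%")
print(f"Small random survey: {100 * sum(small_random) / len(small_random):.1f}%")
```

Despite being 250 times larger, the biased survey over-estimates coverage by several percentage points, and collecting even more biased responses would only make the estimate more precisely wrong.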
Given the importance of this type of study, investing in better data quality seems more sensible than trying to minimise errors by increasing the study size. In this case, an erroneous estimate of 70% vaccination coverage could have led politicians to relax public health measures, possibly triggering a new wave of the epidemic in certain states.