Vishal Misra - July 29, 2020

Vishal Misra, PhD, Professor, Dept. of Computer Science, Columbia University: “A synthetic controls analysis of post Memorial Day COVID outbreak in the US”

Synthetic controls is an empirical methodology for causal inference using observational data. In this analysis, a synthetic version of the treated unit is constructed as a linear combination of untreated donor units. The approach exploits the correlation between different units, and the synthetic series is then compared with the actual data. The virus reached different areas at different times, so for the purposes of COVID-19 analysis the time series are not aligned in terms of absolute dates. The data are therefore aligned on the lockdown dates provided by IHME (Institute for Health Metrics and Evaluation).

Using this approach, New York cases and deaths can be predicted with Western Europe as the donor pool for the synthetic version; after 5 days of training, 120 days of accurate predictions can be produced. The method also supports counterfactual analysis, for example predicting how the number of cases and deaths would have changed if lockdown measures had been implemented earlier.

The spread of COVID-19 cases in the United States follows 4 different patterns depending on geography, and regions were clustered according to how mobility changed after lockdown. For the post-Memorial Day analysis, the model for a county was trained on a donor pool of counties, with Memorial Day as the intervention date; the donor pool was filtered to include only counties at similar stages of COVID-19 spread, and the county's synthetic model was compared with its actual behavior. Deviations from the predictions, for instance in Florida or Georgia, might be due to air conditioning (AC) usage, the adoption of masks, or herd immunity, i.e. the spread slowing once enough people have been infected.

Overall, synthetic control analysis is a powerful technique for counterfactual prediction, enabling quick analysis of policy choices and of the resulting number of cases and deaths. It has a wide range of applications, from analyzing the impact of tariffs on consumer goods and optimizing the drug discovery and clinical trial process to predicting sports scores and network traffic.
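
The linear construction described above can be illustrated with a minimal sketch. It assumes the donor and treated series are already aligned on their intervention dates and fits the donor weights by ordinary least squares; the talk summary does not say which fitting procedure was used, so the least-squares choice, the function names, and the toy data are all assumptions made for illustration, not the speaker's implementation.

```python
import numpy as np


def fit_synthetic_control(donors_pre, treated_pre):
    """Least-squares weights so that donors_pre @ w approximates treated_pre.

    donors_pre:  shape (T_pre, n_donors), donor series before the intervention
    treated_pre: shape (T_pre,), treated-unit series before the intervention
    NOTE: plain least squares is an assumption of this sketch.
    """
    weights, *_ = np.linalg.lstsq(donors_pre, treated_pre, rcond=None)
    return weights


def predict_counterfactual(donors_post, weights):
    """Apply the fitted weights to the donors' post-intervention series."""
    return donors_post @ weights


# Toy data, made up for illustration: 3 donor units observed for 8 time steps,
# assumed to be already aligned on each unit's intervention date.
rng = np.random.default_rng(0)
donors = rng.uniform(1.0, 10.0, size=(8, 3))
true_weights = np.array([0.5, 0.3, 0.2])
treated = donors @ true_weights  # noiseless linear mix of the donors

n_train = 5  # length of the training window
w = fit_synthetic_control(donors[:n_train], treated[:n_train])

# Synthetic (counterfactual) series for the remaining steps, and its gap
# to the actual series; a large gap would indicate a deviation to explain.
synthetic_post = predict_counterfactual(donors[n_train:], w)
gap = treated[n_train:] - synthetic_post
```

Because the toy treated series is an exact linear mix of the donors, the fitted weights reproduce it and the gap is essentially zero. With real case counts the gap between the synthetic and actual series is the quantity of interest.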