COVID-19 has tested and transformed many elements of societies around the world. Some of these changes may be temporary, some may not; in the fog of a pandemic, much is uncertain. But one thing is clearer than ever: the importance of data in guiding decision-making.
“Flatten the curve,” one of the most ubiquitous phrases of the early stages of the pandemic, put a bell curve front and center in the public mind. Local governments everywhere are sharing data, both hopeful and harrowing, in real time with their regional populations.
At Pecan, we employ no epidemiologists, so displaying data related to health information around COVID-19 would be irresponsible. At the same time, we know many of our clients are concerned about the impact of COVID-19 on their businesses, and, as noted above, data is the clearest guiding light through this crisis.
So we decided to test our predictive models with COVID-19 data related to economic recovery, specifically addressing the easing of lockdowns by region.
Several months into a pandemic, businesses are asking the question: how do I plan for an uncertain future? Thanks to the combined efforts of healthcare workers and data professionals around the world, the initial stage of the pandemic saw an unprecedented proliferation of critical public data, from the official World Health Organization COVID-19 data to Johns Hopkins University’s widely used datasets.
Over the course of two days, we built a model using public data to predict which regions and markets in Europe were likely to start easing lockdowns soon.
Follow our process as an example of using predictive analytics to weather this crisis proactively.
Ideally, we would issue predictions for every city and region in Europe. For this kind of analysis, however, limiting the scope of the initial investigation is a valuable strategy when structuring machine learning models.
With that in mind, we chose to issue predictions for all the major European capitals, both to streamline our analysis and because capital cities are more likely to have COVID-19 data readily accessible.
Using the Pecan platform (see the snapshot above), we simply dragged and dropped the table of European cities into the Entity field, checked the columns we wanted to use as identifiers, and applied simple SQL conditions to filter for capitals.
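Outside the platform, the same capital-city filter can be sketched in a few lines of pandas (the table and column names here are illustrative, not Pecan's actual schema):

```python
import pandas as pd

# Hypothetical table of European cities; names and columns are illustrative
cities = pd.DataFrame({
    "city": ["Helsinki", "Tampere", "Paris", "Lyon", "Madrid"],
    "country": ["Finland", "Finland", "France", "France", "Spain"],
    "is_capital": [True, False, True, False, True],
})

# Keep only capital cities, mirroring a SQL-style WHERE condition
capitals = cities[cities["is_capital"]].reset_index(drop=True)
```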
Initially, we decided on predicting lockdown easing announcements one week ahead.
After considering many variables, we decided to focus on nitrogen dioxide (NO2) pollution levels, as they are generally a good proxy for the level of quarantine. Moreover, pollutant measurements are easily accessible, so the approach can later be generalized to many cities around the globe, provided they have air-quality measurements.
To double-check, we compared a few other air-quality indicators for a handful of cities. NO2 levels indeed correlate best with the COVID-19 spread timeline; see the graph below comparing different pollutants.
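For intuition, here is a toy illustration, on entirely synthetic data rather than our actual measurements, of how one pollutant series can track a lockdown-stringency proxy far more closely than another:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2020-01-01", periods=120, freq="D")

# Synthetic lockdown-stringency proxy rising over time (0 = none, 1 = full)
stringency = np.clip(
    np.linspace(0, 1, len(days)) + rng.normal(0, 0.05, len(days)), 0, 1
)

pollutants = pd.DataFrame({
    "no2": 40 - 25 * stringency + rng.normal(0, 2, len(days)),  # tracks lockdown
    "o3": 30 + rng.normal(0, 5, len(days)),                     # largely unrelated
}, index=days)

# Correlate each pollutant series with the stringency proxy
corrs = pollutants.corrwith(pd.Series(stringency, index=days))
```

Here NO2 shows a strong negative correlation with stringency (traffic drops, NO2 drops), while the unrelated series hovers near zero.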
As the graph above makes clear, pollution data contains a lot of noise caused by interference from other variables, which makes it hard to track its overall evolution.
For instance, wind might quickly disperse pollutants in the atmosphere, and an observed day being a holiday might change the number of cars releasing pollutants into the air, altering the NO2 concentration. For that reason, it is important to connect all of that data to the Entities in our model. Pecan automatically spots correlations and selects which features from the data are relevant to the prediction we are trying to achieve.
Subtracting the noise (the residual component) and the normal weekly fluctuations in pollution levels (the seasonality component) leaves a line that represents the overall behaviour, or trend, of pollution levels over time.
The trend component then reveals an early declining pattern in NO2 levels for all the cities above since the COVID-19 pandemic hit Europe (around December 2019 and January 2020). Even for Stockholm, Sweden, where a herd-immunity approach was favored over lockdowns, the trend component revealed a (milder) decline.
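The decomposition described above can be sketched with plain pandas: a centered seven-day rolling mean smooths out the weekly cycle to give the trend, day-of-week averages capture the seasonality, and whatever is left is the residual. All numbers below are synthetic illustrations, not real measurements:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
days = pd.date_range("2019-12-01", periods=140, freq="D")
weekly = 5 * np.sin(2 * np.pi * np.arange(len(days)) / 7)  # weekly cycle
decline = np.linspace(45, 20, len(days))                   # underlying decline
series = pd.Series(decline + weekly + rng.normal(0, 1.5, len(days)), index=days)

# Classic additive decomposition: observed = trend + seasonality + residual
trend = series.rolling(7, center=True).mean()        # averages out the weekly cycle
detrended = series - trend
seasonality = detrended.groupby(days.dayofweek).transform("mean")
residual = detrended - seasonality
```

On data like this, `trend` isolates the long-run decline in NO2 once the weekly rhythm and day-to-day noise are stripped away.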
On the Pecan platform, we simply created a connector to the original Parquet file containing daily weather measurements for many cities worldwide, dragged and dropped the data onto the Target field, and set the no2_median column as our Label.
And that was it! We were ready to advance to the next step! No need for complex data cleaning, outlier detection, merging issues or other adjustments: the platform performs all that on the fly.
Since the start of the pandemic, Pecan has been collecting third-party datasets that could help our clients enrich their models and boost their predictive capabilities. These datasets include both directly COVID-related data (numbers of hospitalizations, deaths, and so on) and indirectly related data (pollution levels, weather, traffic, etc.).
With the Entity and Target defined (as in our previous steps), we connect them to one additional data source (Attribute) at a time, checking for performance improvements and model integrity at each iteration.
After adding more than ten different data sources, we felt confident enough to move on and check preliminary results. Adding further data sources is then simply up to the user!
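A toy sketch of that iteration loop, using an ordinary least-squares fit as a stand-in for model training (the source names, synthetic data, and improvement threshold are all illustrative, not Pecan's internals):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
target = rng.normal(size=n)

# Candidate data sources (Attributes); constructed so one is informative,
# one weakly informative, and one pure noise
candidates = {
    "weather": 0.6 * target + 0.8 * rng.normal(size=n),
    "mobility": 0.3 * target + 0.9 * rng.normal(size=n),
    "holidays": rng.normal(size=n),
}

def fit_mse(features):
    """Mean squared error of a least-squares fit on the given feature columns."""
    if not features:
        return float(np.var(target))
    X = np.column_stack(list(features.values()))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(np.mean((target - X @ beta) ** 2))

# Add one source at a time; keep it only if the fit clearly improves
kept = {}
for name, col in candidates.items():
    if fit_mse({**kept, name: col}) < fit_mse(kept) - 0.05:
        kept[name] = col
```

The loop retains informative sources and rejects the pure-noise one, mirroring the check for performance improvements at each iteration.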
Once our model completed training, we could analyze the predicted NO2 levels for the following week:
Good, so we have initial predictions for a week ahead in record time!
But different cities had different levels of pollution before COVID-19 started, so how can we have a single unified cross-city measurement to indicate economic activity? In other words, what is our baseline?
We turned to our pollutant-measurements dataset and calculated the median daily NO2 emission for each city in December 2019 as a simple baseline, representing “normalcy” in a right-before-COVID-19 scenario. Then we divided our predicted values by that baseline.
The result: the closer a predicted value is to 1, the more “normal” it is. In other words, the more likely quarantine is to be eased.
To reduce noise in the predictions, we took the median of emissions across all days of the coming week.
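Putting these last two steps together, here is a minimal sketch, with made-up numbers, of the baseline normalization and weekly median:

```python
import pandas as pd

# Hypothetical predicted daily NO2 levels for one city over the coming week
predicted = pd.Series([18.0, 17.5, 19.2, 18.4, 17.9, 16.8, 18.1])

# Baseline: median daily NO2 in December 2019, taken as "normal" activity
# (30.0 is an illustrative value, not a real measurement)
baseline = 30.0

# Weekly median dampens day-to-day noise; a ratio near 1 means near-normal
recovery_index = predicted.median() / baseline
```

A city with `recovery_index` close to 1 is operating near its pre-COVID pollution baseline, suggesting lockdown is likely to be eased.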
See below a preliminary table of results from our model, issuing predictions on 2020-04-19 for one week later (2020-04-26):
Given our accuracy levels, relying on exact ranking positions or to-the-digit values would not be fruitful; instead, looking at cities in groups within the ranking yields valuable insights. Indeed, we were able to corroborate the predicted values with information in the media:
The top three cities announced measures to quickly ease lockdown: Helsinki lifted roadblocks, Tallinn reopened schools and healthcare services, and Athens was praised for the way it handled its lockdown.
The bottom three cities announced measures extending lockdown: Zagreb, which had the misfortune of a major earthquake during lockdown, extended its restrictions while studying how to ease them; Paris kept its lockdown in place while facing riots; and Madrid, hit especially hard by COVID-19, faced further lockdown extensions.
In times of uncertainty, analyzing data is more important than ever. With the Pecan platform, our customers can enrich their predictive models with COVID-19 data by uncovering hidden correlations, patterns and trends that could affect aspects of their business (regardless of how subtle they might be).
Not in recent memory has planning for the very next month seemed so critical. As we saw in our example, even one week of foresight can help businesses prepare in a turbulent economic landscape that changes day by day. It’s this kind of data-driven adaptability both businesses and governments need to employ to hasten the economic recovery globally.