How COVID-19 changed our machine learning algorithms

The fight against coronavirus has data science at its very core. Only by analysing the data can we understand how the virus spreads and predict the potential outcomes of various actions (or inactions).

Data science is one of the most important tools available to governments and healthcare bodies around the world right now, but it’s also of vital importance to public transport operators that want to manage profitability, support key workers and plan for recovery.

Based on evidence from countries afflicted by coronavirus earlier than the UK, it was clear that passenger demand would drop quickly and significantly. The scale of the drop would be beyond anything UK transport networks had previously experienced.

Data science challenges

This introduced new challenges for CitySwift's data scientists. There was no historical data that could accurately inform our algorithms, and many of the metrics usually relied on by bus networks were no longer relevant.

To help our clients through the pandemic, we had to provide accurate, relevant insights that would aid their decision-making in a constantly changing environment. This required new techniques and daily re-training of our machine learning models as additional data became available.

Our forecasts accurately predicted that the decline would bottom out at the end of March, with passenger demand averaging less than 15% of the seasonal norm. Based on this ‘new normal’, we updated our deep learning architecture and set about quantifying the correlation between demand, run time and dwell time, along with external big data sources such as confirmed COVID-19 cases, news feeds and transport data from cities across Europe and Asia.
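As a rough illustration of what measuring the correlation between demand and run time can look like, here is a minimal sketch using a plain Pearson coefficient. It is not the deep learning architecture described above; the function name and the daily figures are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily observations for one route (illustrative numbers only).
boardings = [420, 310, 150, 90, 60]
run_time_min = [48.0, 44.5, 40.0, 38.5, 37.0]

r = pearson(boardings, run_time_min)
print(f"demand vs run time: r = {r:.3f}")
```

A coefficient close to 1 would suggest that run time falls along with demand, although a single coefficient says nothing about causation or about the other external data sources.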

Journey times had the potential to become much shorter, due to the huge drops in traffic congestion and the lower number of passengers boarding and alighting. However, without proper timetable adjustments, the only result would be a wasteful increase in dwell time as drivers waited at stops to avoid getting ahead of schedule.
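The arithmetic behind that wasted time can be sketched as follows. This is a simplified illustration, assuming a driver holds at the end of each timetable segment until the scheduled time; the function name and the segment figures are hypothetical.

```python
def holding_time(scheduled_min, achievable_min):
    """Minutes a bus waits on one segment to avoid running ahead of schedule."""
    return max(0.0, scheduled_min - achievable_min)


# Hypothetical segments: timetabled run time vs. what low traffic now allows.
scheduled = [10.0, 12.0, 8.0]
achievable = [7.0, 9.0, 8.5]

holds = [holding_time(s, a) for s, a in zip(scheduled, achievable)]
total_hold = sum(holds)
print(f"holding per segment: {holds}, total: {total_hold} min")
```

Retiming the timetable to the achievable run times would, under these assumptions, recover the total holding time as operating time.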

An important consideration was the ability to support NHS staff, so our analysts examined passenger demand and ticketing data for all bus routes serving hospitals. This provided insight into workers’ shift patterns. A simple switch to Sunday timetables would not necessarily ensure buses were available where and when they were most needed, so it was important for emergency timetables to take this information into account.

To support key workers, allow for social distancing on board (via lower target load factors) and save valuable operating hours, we encouraged clients to optimise their timetables using SwiftSchedule, which we updated to predict run times based on our new algorithms.
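To make the effect of a lower target load factor concrete, here is a minimal sketch of the underlying arithmetic. It is not how SwiftSchedule works internally; the function name, vehicle capacity and demand figures are assumptions for illustration.

```python
from math import ceil


def trips_required(passengers_per_hour, vehicle_capacity, target_load_factor):
    """Trips per hour needed so average occupancy stays at or below the target."""
    if not 0 < target_load_factor <= 1:
        raise ValueError("target_load_factor must be in (0, 1]")
    if vehicle_capacity <= 0 or passengers_per_hour < 0:
        raise ValueError("capacity must be positive and demand non-negative")
    usable_seats = vehicle_capacity * target_load_factor
    return ceil(passengers_per_hour / usable_seats)


# Hypothetical route: 70-capacity vehicles, before and during social distancing.
normal = trips_required(400, 70, 0.85)      # typical demand, high load factor
distanced = trips_required(60, 70, 0.25)    # reduced demand, low load factor
print(normal, distanced)
```

Even with demand at a fraction of normal, a low load factor means the number of trips does not fall in proportion, which is why matching timetables to actual demand matters.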

The future

As lockdown restrictions begin to lift, it will be important for clients to continue using SwiftSchedule on a regular basis, so that they can make incremental changes to their timetables based on the gradual re-growth in passenger demand and the corresponding drop in bus speeds as traffic starts to return to pre-coronavirus levels.

From a data science perspective, the process of tagging dates with key events has been vital to our understanding of their impact, particularly given the short timeframes involved and the magnitude of their effect. Key stimuli such as schools restarting; shops, bars and restaurants reopening; and staged worker reinstatement will all affect predicted passenger demand and run time.
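One simple way to tag dates with key events is to turn each event into model features. The sketch below is an assumption about how this could be done, not a description of our production pipeline; the event names and dates are placeholders.

```python
from datetime import date

# Hypothetical event calendar: names and start dates are placeholders.
EVENTS = {
    "schools_reopen": date(2020, 9, 1),
    "shops_reopen": date(2020, 6, 15),
}


def tag_date(day, events=EVENTS):
    """Return, for each event, an 'active' flag and days elapsed since it began."""
    features = {}
    for name, start in events.items():
        elapsed = (day - start).days
        features[f"{name}_active"] = int(elapsed >= 0)
        features[f"{name}_days_since"] = max(elapsed, 0)
    return features


tags = tag_date(date(2020, 6, 20))
print(tags)
```

Features like these let a model distinguish an ordinary day from one that falls shortly after a reopening, where demand and run time may still be adjusting.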

It’s crucial that bus operators are ready to act on these changes. To ensure the right levels of passenger capacity are reintroduced at the right times, we will be incorporating recovery data sources from around the world into our predictive modelling.

Looking forward, the work conducted (from home) during the coronavirus pandemic has enriched our machine learning capabilities, making our predictions more robust to extreme events and improving our understanding of their impact. Data modelled during this highly unusual period will act as baselines for fastest run times and lowest demand, with the insights gained held in memorandum as code, sweat and predictions.

Matthew Doodes is a Senior Data Scientist at CitySwift. He has previously helped Barclays predict fraud using big data and worked on a project to optimise the logistics of the UK’s fastest growing port and deep-sea container terminal.