It’s crazy how much TfL can learn about us from our mobile data (James O’Malley)

Big Brother meets Big Data

It’s a well-known maxim in the tech industry that if you’re not paying for the product, then you are the product. We get to use incredible services like Gmail, Facebook and Twitter1 for free – and in return, the big tech firms sell access to our eyeballs to advertisers2.

But this isn’t always the case. Sometimes, even when we pay for a service, we’re also the product being sold.

For example, something that EE, O2 and Vodafone all do, but don’t really love to shout about, is sell anonymised, aggregated data on our physical movements to local authorities, transit agencies and any other companies with a chequebook large enough.

And that’s why today I’m going to tell you about some of the really mad things that Transport for London (TfL) can figure out about us by using our location data, provided by the O2 mobile network.

Using the Freedom of Information Act, I’ve managed to obtain the Data Protection Impact Assessment, and the Statement of Work for TfL’s Project EDMOND – which stands for “Estimating Demand from Mobile Network Data”3.

That’s right, this week’s newsletter is dangerously close to being actual reporting instead of just my usual bloviating. And having now fallen down the rabbit-hole digging into it, I’m amazed by the quality of information it gives transport planners and policy makers. And honestly, I’m a little freaked out.

So let’s dive in and explore it together.

Careful now

The way EDMOND works is very clever. TfL isn’t actually monitoring all of our phones all of the time, presumably because it knows that to do so would be hugely controversial.

So instead, it contracts with O24 to license data over shorter periods of time. For example, in 2023, it took data from ‘up to’ 40 normal weekdays between the start of April and end of June, when nothing weird was happening like school holidays or bank holidays5.

This is an enormous dataset, with potentially up to 25 million phones included in it6, but it still doesn’t include everyone in London because some people use other networks like EE, Vodafone, and so on.

So it’s crucial to understand that EDMOND isn’t just a pile of data – it is a model, where TfL has taken the data from O2, and has done some clever maths to scale it up to estimate the movements of everyone in London over the age of 12.
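
To make the "scale it up" idea concrete, here is a minimal sketch of the simplest possible version: dividing the trips seen on one network by that network's share of phones. This is an illustration only, not TfL's actual method, which the documents suggest is considerably more sophisticated. The function name, the area names and the 25% share are all made up for the example.

```python
def scale_to_population(observed_trips, network_share):
    """Scale trip counts seen on one network up to a whole-population estimate.

    observed_trips: dict mapping (origin, destination) to trips seen on the network.
    network_share: assumed fraction of all phones on that network (hypothetical).
    """
    if not 0 < network_share <= 1:
        raise ValueError("network_share must be between 0 (exclusive) and 1")
    # Each observed trip stands in for 1 / network_share trips overall.
    return {pair: count / network_share for pair, count in observed_trips.items()}


# Made-up numbers: 300 trips observed, with an assumed 25% network share.
observed = {("Camden", "Westminster"): 300, ("Hackney", "Islington"): 120}
estimated = scale_to_population(observed, network_share=0.25)
```

A real model would presumably vary the scaling factor by area and by demographic group rather than using one flat number, since a network's customers are not spread evenly across London.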

There is also the elephant in the room. Though it might be surprising to learn that O2 is selling data insights on its users, it is not selling personal data7. What’s being sold by O2 and licensed by TfL is aggregated, anonymised data.

This means TfL can’t see the movements of individual people, and of course everything is fully GDPR-compliant and above board – as you’d expect for a major corporation and a transport agency.

In fact, according to the 2018 Travel in London report, any time the data suggested there were fewer than ten phones in a given statistical area, that data was automatically excluded, so as to avoid inadvertently unmasking people based on their metadata.
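
The "fewer than ten phones" rule is a standard privacy technique usually called small-count suppression. A minimal sketch of how it might work is below. The threshold of ten comes from the report quoted above; the function name, the area labels and the counts are hypothetical.

```python
MIN_PHONES = 10  # threshold quoted from the 2018 Travel in London report


def suppress_small_counts(phones_by_area, minimum=MIN_PHONES):
    """Drop any area where fewer than `minimum` phones were observed."""
    return {
        area: count
        for area, count in phones_by_area.items()
        if count >= minimum
    }


# Made-up counts for three hypothetical areas.
counts = {"Area A": 250, "Area B": 9, "Area C": 10}
published = suppress_small_counts(counts)
```

Here "Area B" disappears from the published data entirely, while "Area C" survives because exactly ten phones is not "fewer than ten".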

So to be absolutely clear, there’s no big scandal here8. In fact, using this sort of data is increasingly routine for local authorities and others9, to the extent that O2 even has a brand name for this line of its business – “O2 Motion”.

But that doesn’t mean what’s happening isn’t interesting. In fact, I’m willing to bet that most people outside of the mobile industry are completely unaware their movement data is being used in this way.

What TfL knows

Now let’s get to the good stuff. What does all of this data do for TfL, and what data do they have to play with?

Because of the aforementioned privacy restrictions, they don’t simply get dots on a map showing them where everyone was. Instead, the data is broken down into hundreds of “Middle layer Super Output Areas (MSOAs)” – this is a statistical standard that divides up the country into groups of between 2,000 and 6,000 homes.

Here’s a map showing London’s MSOAs:

Looking at this, you can see why data on this level might be useful.

Using the aggregated data from O2, TfL can see which areas of London people are travelling from and where they are travelling to – which is exactly the sort of information you might need if you were, for example, planning where to run buses or impose an Ultra-Low Emissions Zone that disincentivises car use.
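
Transport planners usually call this kind of "from where, to where" table an origin-destination matrix. As a rough sketch of the idea, assuming each trip has already been reduced to a pair of area codes, it amounts to counting how often each pair occurs. The area codes and trips below are invented, and in practice TfL receives data that has already been aggregated rather than a list of individual trips.

```python
from collections import Counter


def build_od_matrix(trips):
    """Count trips for each (origin, destination) pair of areas."""
    return Counter((origin, destination) for origin, destination in trips)


# Invented trips between three hypothetical area codes.
trips = [
    ("MSOA_A", "MSOA_B"),
    ("MSOA_A", "MSOA_B"),
    ("MSOA_A", "MSOA_C"),
    ("MSOA_B", "MSOA_A"),
]
od_matrix = build_od_matrix(trips)

# The busiest flow is the sort of thing a bus planner would look at first.
busiest_pair, busiest_count = od_matrix.most_common(1)[0]
```

Note that direction matters: trips from A to B are counted separately from trips from B to A, which is what lets planners tell a morning commute from an evening one.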
