When half a million participants signed up to UK Biobank, a major health database containing genetic and medical information from volunteers, they thought they were handing over their data to support academic research and unlock new insights into how we get ill and, crucially, how we can prevent it. So it was a shock when researchers discovered attempts to sell the data on third-party platforms, including Chinese online marketplaces. The controversy has raised broader concerns about consent, oversight, and the blurred line between public-interest research and commercial use.
UK Biobank’s data wasn’t directly identifiable, but critics argue that the incident risked undermining public trust in large-scale health data projects. “When people originally signed up for UK Biobank, they were told that data was for non-profit research, and then they found that it was sold to industry,” says Luc Rocher, associate professor at the Oxford Internet Institute, who has tracked the repeated posting of Biobank data online, in ways those who signed up might not have expected. “So it’s really about setting a line and telling people that this line will be held.”
For its part, Professor Sir Rory Collins, CEO and principal investigator at UK Biobank, said in a statement when it was revealed that data was being sold on Chinese marketplaces that it was “a clear breach of the contract signed” by several academic institutions who had their access suspended as a result. “We are sorry that this incident has occurred and hope you are reassured by the swift and decisive action we have taken,” Collins added.
The incident was damaging: to UK Biobank, and to the broader cause of digitising healthcare data to try to improve the speed at which people can be treated within the NHS. Linking up healthcare data and attaching it to a single identity addresses one of the key stumbling blocks of the present-day NHS: continuity of care. A patient who arrives unconscious at A&E shouldn’t have to rely on memory, family members or a delayed GP letter for doctors to know they are allergic to a drug. Such an approach has widespread public support, with 95 per cent of patients saying they are comfortable with data being used to improve individual care.
NHS England has plans for a Single Patient Record, announced in the King’s Speech yesterday, that will bring medical history, test results, treatments and prescriptions into one place, accessible through the NHS app from 2028. A similar approach underpins the NHS Federated Data Platform, which NHS England says is already live in 123 hospital trusts and is being used to coordinate theatres and waiting lists. Yet it is feared that incidents like the UK Biobank brouhaha could challenge public perception of the NHS’s data transformation. Some 83 per cent of people trust the NHS to keep their data secure, according to a May 2024 survey by the NHS.
That trust is jeopardised by Biobank’s issues, and by the drip-drip of headlines, including NHS England giving staff from Palantir, the US software company founded by the controversial democracy sceptic Peter Thiel, access to patient data while working on parts of the Federated Data Platform. Both NHS England and Palantir have said data is accessed strictly in line with the policies in place.
“Few people would dispute the benefits of lawful and responsible use of data to drive health improvements and efficiencies,” says Jon Baines, senior data protection specialist at law firm Mishcon de Reya. “And few could argue that UK Biobank does not present continuing huge potential benefits for the NHS and for health research more widely.”
Perhaps it’s about accepting that the process of digitising sensitive healthcare information was never going to be seamless. “It is crucial that all involved are aware not just of the risks, but of the technological and legal complexities,” says Baines.
Finding the data gold standard
Giving that access without compromising people’s privacy isn’t impossible, Rocher points out. “There are programmes around the world that allow researchers to access very sensitive, but also very useful, data, like financial records and healthcare data in a lot of countries, and often they’re not in the news because there’s no security breach of the system.” Rocher believes we need to look at those gold-standard approaches, understand what they are doing differently, and adopt the elements of their model that work.
“Those who allow their data to be used for research must be able to trust that their rights will be safeguarded”
Rebuilding trust will also be crucial to ensure that people believe in the potential of linking up their data across different health services, says Baines. “Those who allow their data to be used for research must be able to trust that their rights will be safeguarded — as long as that trust can be achieved, then the NHS Data Strategy should not be too threatened by the concerns over exposure of Biobank information.”
Rocher says a safe model is one where approved researchers can analyse sensitive datasets without being able to download the original files. Within the NHS, traceable systems that record who accessed which record, when, and for what reason would be a similar approach. By providing that paper trail and showing that access can be trusted, it’s possible to win back a public that may have grown sceptical after a publicly embarrassing incident.
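The kind of paper trail described here can be pictured as an append-only access log. The sketch below is purely illustrative, assuming hypothetical names (`AccessEvent`, `AuditLog`) rather than any real NHS or UK Biobank system: every read of a record captures who looked at it, which record, when, and the stated reason, and the log can later be queried per record.

```python
# Illustrative sketch only: a minimal append-only access audit trail.
# Names here (AccessEvent, AuditLog) are hypothetical, not a real NHS API.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AccessEvent:
    user_id: str    # the approved researcher or clinician
    record_id: str  # which patient record was read
    reason: str     # stated purpose of the access
    at: datetime    # when the access happened

class AuditLog:
    """Append-only log: events are added and queried, never altered."""
    def __init__(self) -> None:
        self._events: list[AccessEvent] = []

    def record_access(self, user_id: str, record_id: str, reason: str) -> AccessEvent:
        event = AccessEvent(user_id, record_id, reason, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def accesses_to(self, record_id: str) -> list[AccessEvent]:
        # The "paper trail": every access to a given record, in order.
        return [e for e in self._events if e.record_id == record_id]

log = AuditLog()
log.record_access("researcher-42", "patient-001", "approved cardiology study")
log.record_access("clinician-07", "patient-001", "A&E admission")
trail = log.accesses_to("patient-001")
print(len(trail))  # 2
```

In practice such a log would also need to be tamper-evident and independently reviewable, but the core idea is the same: access leaves a trace that can be audited.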
“People are not stupid,” Rocher says. “The public can see the difference between a good scheme and a scheme that has poor security practices.”