Opening Clinical Trial Data Vaults

One and a half years ago, this blog discussed some problematic aspects of making patient-level clinical trial data available to anybody beyond the segment that can see them today, i.e., to parties other than the sponsor who pays for their generation, the statisticians analyzing them, and the regulatory authorities who base their decisions on them. Matters have taken their course since then.

“Clinical Data Transparency” Is Near

Mandatory clinical data disclosure is definitely going to happen, under the “transparency” and “Open Data” labels. Those who have designed a clinical study and generated the data, at very considerable cost to them alone, will no longer own their data. There are too many powerful parties interested in getting their hands on this treasure trove of information: healthcare providers; insurance companies; activists with various agendas; lawyers serving these parties (and themselves); and of course the media (also serving themselves). Obviously the motivations differ wildly, but the combined thrust of these players swings an awful lot of power in societies such as ours.

Very few companies, such as AbbVie and Roche, are still fighting rearguard actions. Most others (not only the Biggies but also niche players such as Leo Pharmaceuticals) have resigned themselves to the view that there is nothing to be won by resisting, and hope that some momentum can be salvaged by taking the lead in organizing data disclosures on their own terms before being forced to do so. “This shows the power of the media and academia, the BMJ and the Cochrane group,” bragged Tom Jefferson, an epidemiologist with the Cochrane Collaboration, which has a reputation for taking the stance that the majority of drugs are essentially worthless.

All Data Can Be Leaked Or Hacked…

Sure, whatever compulsory disclosure is going to be implemented will come with administrative checks and balances. Not everybody will be given access to all raw data, and only anonymized data will be released. But what does clinical data security mean in today’s world of hackers and cybersurveillance – in a world where the NSA listens in on the German chancellor’s “secure” phone and monitors the Mexican and Brazilian presidents’ “secure” email, stopping only when confronted with proof beyond all deniability? In a world where third-rate government contractors such as Mr. Snowden, or army privates such as Mr. (now Ms.) Manning, can copy the most sensitive files, take them home, and disclose them to the media?

Once patient-level clinical study data are distributed beyond what passes as “secure datarooms” (let’s say, into the academic community, where cybersecurity tends not to be at its greatest), these data are essentially up for grabs. Not only by the occasional whistleblower but more so by hackers with above-average skills and stamina doing commissioned data raids. (Remember Climategate. How do you think climate change skeptics gained access to thousands of internal emails and document files from the University of East Anglia?)

But let’s assume – just for the sake of argument – that hackers will go elsewhere or see the errors of their ways; and let’s assume that the whistles would somehow be taken from those who feel compelled to blow them. Let’s pretend that, by some miraculous touch of a magic wand, person-level clinical trial raw data could actually remain secure after authorized and controlled distribution.

Obviously there is a lot of legitimate research that can be done with such anonymized data, and making this type of research possible is the lever that Open Data advocates use to lobby for mandatory clinical trial data disclosure. In a recent editorial in the New England Journal of Medicine, the European Medicines Agency’s Senior Medical Officer and others argue that “Contrary to industry fears, access to full — though appropriately deidentified — data sets from clinical trials will benefit the research-based biopharmaceutical industry.”

This is quite possible, although it means corporate-level socialism: companies that have not paid for the clinical data will get them for free. You can debate this any way you want, but please forget all talk about anonymity and de-identification.

…And Hacked Or Not, All Data Are Analyzed

The fact that a given person has participated in a particular clinical trial can be deduced with high probability without having to “hack” (or decrypt) anything. All you need to do is correlate the key data of the trial (as disclosed in public databases such as clinicaltrials.gov or the EU Clinical Trials Register) plus the de-identified person-level clinical trial data with the person’s communication data as collected by the NSA et al.

[Image caption: Plug in anonymized clinical trial data here…]

This works nicely with only the famed “metadata”: the logged connection data for phone calls, electronic text messages of all sorts, and even data from the envelopes of snail-mail letters. Scanning these databases against each other is sufficient for Big Brother’s advanced algorithms to join the dots, without cracking the content of the communication. Throw in the person’s web access data for good measure, and things become even better.
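To make the “joining the dots” idea concrete, here is a toy sketch in Python. It is an illustration under invented assumptions, not a description of how any agency actually works: the record layout, the names (trial_records, metadata, overlap_scores) and all the data are hypothetical. It simply counts how many of a de-identified subject's visit dates coincide with a person's logged contacts with the same trial site.

```python
from datetime import date

# Hypothetical de-identified trial records: subject code, site, visit dates.
trial_records = [
    {"subject": "S-001", "site": "Site A",
     "visits": {date(2013, 3, 4), date(2013, 4, 2), date(2013, 5, 6)}},
    {"subject": "S-002", "site": "Site A",
     "visits": {date(2013, 3, 11), date(2013, 4, 9), date(2013, 5, 13)}},
]

# Hypothetical connection metadata: (person, contacted site, day of contact).
# No message content is involved, only who contacted whom and when.
metadata = [
    ("person_x", "Site A", date(2013, 3, 4)),
    ("person_x", "Site A", date(2013, 4, 2)),
    ("person_x", "Site A", date(2013, 5, 6)),
    ("person_y", "Site A", date(2013, 3, 11)),
    ("person_y", "Site A", date(2013, 4, 20)),
    ("person_z", "Site B", date(2013, 3, 4)),
]

def overlap_scores(records, log):
    """For each subject, map each person to the fraction of the subject's
    visit dates on which that person contacted the same site."""
    scores = {}
    for rec in records:
        matched = {}
        for person, site, day in log:
            if site == rec["site"] and day in rec["visits"]:
                matched.setdefault(person, set()).add(day)
        scores[rec["subject"]] = {
            person: len(days) / len(rec["visits"])
            for person, days in matched.items()
        }
    return scores

scores = overlap_scores(trial_records, metadata)
for subject, candidates in scores.items():
    print(subject, candidates)
```

In this made-up data, one person's contacts line up with every visit date of one subject, while the others match partially or not at all. Real data would be far noisier, but the principle is the same: the more visit dates a record contains, the fewer people fit the pattern.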

Knowing that a certain person has participated in a certain clinical trial can provide anything from a clue to hard legal leverage. First, it establishes that the person has a particular medical condition. Second, it implies exposure to certain drugs and/or drug candidates, with a degree of probability that follows from the randomization ratio between investigational drug, “gold standard” comparator drug, or placebo as defined in the trial design. All that should be very interesting for insurance companies and employers. Did the person discontinue his or her participation in the study prematurely? If so, what can be deduced from this? An adverse drug reaction? Family reasons? A lack of perseverance – weakness of character?
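The exposure probability mentioned above follows directly from the randomization ratio. A minimal sketch, assuming a hypothetical 2:1:1 allocation between investigational drug, comparator and placebo (the ratio and the function name arm_probabilities are invented for illustration; the real ratio would come from the registered trial design):

```python
from fractions import Fraction

def arm_probabilities(ratio):
    """Convert a randomization ratio (arm -> allocation weight)
    into the probability of being assigned to each arm."""
    total = sum(ratio.values())
    return {arm: Fraction(weight, total) for arm, weight in ratio.items()}

# Hypothetical 2:1:1 design.
probs = arm_probabilities({"investigational": 2, "comparator": 1, "placebo": 1})

# Probability that a known participant received any active drug.
p_active = probs["investigational"] + probs["comparator"]

for arm, p in probs.items():
    print(arm, p)
print("any active drug:", p_active)
```

Under this assumed design, anyone who learns that a person took part in the trial also knows that the person had an even chance of receiving the investigational drug and a three-in-four chance of receiving some active drug.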

Welcome to 1984, with a 30-year delay.