> This strategic partnership will combine Google’s cloud and AI capabilities and Mayo’s world-leading clinical expertise to improve the health of people—and entire communities—through the transformative impact of understanding insights at scale. Ultimately, we will work together to solve humanity’s most serious and complex medical challenges.
I'm sorry, but this is just pure Silicon Valley speak. Are Mayo's patients really going to know what Google is doing with their clinical data? When I hear "partnering with Google to create machine-learning models for serious and complex disease", I have a hard time believing Mayo patients know what they are signing away when they consent to this (if they are asked to consent at all, which is not mentioned).
I have 2 rare diseases, one of which was discovered via NIH funds at the Mayo Clinic in the early 2000s. I also have type 1 diabetes, and I can attest to the veracity of the claims made in the blog post.
For somebody like me, the situation is unwinnable, if I want to live. HIPAA is a joke because it is perfectly legal to combine other data with the HIPAA anonymized source to identify the individual. Every day, leaving the US looks better.
How would you combine HIPAA-de-identified data with another data source to identify the individual? Not suggesting it can't be done, just wondering how one might do that. Being able to link data that can identify a person to some de-identified data would only be possible if the original data was not properly de-identified, right?
There is no such thing as "proper de-identification" in general; it's all a matter of what other data sets the re-identifying party has at its disposal.
Consider the following de-identified data sets:
- [date, time, clinic, procedure or test being done, insurer] - as collected by the clinic chain so that it can get money from insurers
- [month, clinic, test name, test result] - for all tests made in the last year, collected for statistical purposes
- [date, time, latitude, longitude, phone number] - because AFAIR telcos sell this data
- [name, surname, phone number, ...] - some insurance company's list of customers
If you can get your hands on these datasets, you can trivially re-identify patients and even assign test results to them with high probability (how high depends on how many tests of a given type are made in any given clinic per the unit of time used to group the second data set).
Real-world data sets may be less clear-cut than this, but there are more of them, and you can apply statistical methods to find correlations. You don't need to be 100% sure customer X has diabetes for the information to be useful to you; 70% or 60% is useful too.
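A toy sketch of the linkage described above might look like the following. Everything in it is hypothetical: the records are invented, the clinic coordinates are assumed to be public, and the "same place, within 10 minutes" rule is a deliberately crude stand-in for the statistical matching a real re-identifying party would use. The probability attached to each result is 1 divided by the number of tests of that type the clinic ran that month, which is the dependence noted above.

```python
from collections import Counter
from datetime import datetime

# Invented toy records mirroring the four data sets above.
clinic_visits = [  # (date+time, clinic, test, insurer)
    (datetime(2019, 3, 4, 9, 30), "north", "HbA1c", "AcmeCare"),
    (datetime(2019, 3, 4, 11, 0), "south", "HbA1c", "AcmeCare"),
    (datetime(2019, 3, 9, 14, 15), "north", "lipid panel", "AcmeCare"),
]
monthly_results = [  # (month, clinic, test, result)
    ("2019-03", "north", "HbA1c", "9.1%"),
    ("2019-03", "north", "HbA1c", "5.6%"),
    ("2019-03", "south", "HbA1c", "5.2%"),
    ("2019-03", "north", "lipid panel", "normal"),
]
phone_pings = [  # (date+time, latitude, longitude, phone)
    (datetime(2019, 3, 4, 9, 32), 44.02, -92.47, "555-0101"),
    (datetime(2019, 3, 4, 11, 1), 44.00, -92.46, "555-0102"),
]
customers = {"555-0101": "Alice Example", "555-0102": "Bob Example"}  # phone -> name

# Hypothetical clinic coordinates; clinic addresses are public in practice.
CLINIC_LOCATION = {"north": (44.02, -92.47), "south": (44.00, -92.46)}

def near(a, b, tol=0.005):
    """Crude proximity test in degrees of latitude/longitude."""
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

def reidentify(window_minutes=10):
    # Tests of each type per clinic per month: the fewer there are, the more
    # confidently a result can be pinned on one matched visit.
    tests_per_cell = Counter((m, c, t) for m, c, t, _ in monthly_results)
    matches = []
    for when, clinic, test, _insurer in clinic_visits:
        for ping_time, lat, lon, phone in phone_pings:
            in_window = abs((ping_time - when).total_seconds()) <= window_minutes * 60
            at_clinic = near((lat, lon), CLINIC_LOCATION[clinic])
            if not (in_window and at_clinic and phone in customers):
                continue
            cell = (when.strftime("%Y-%m"), clinic, test)
            for month, c, t, result in monthly_results:
                if (month, c, t) == cell:
                    matches.append((customers[phone], test, result, 1 / tests_per_cell[cell]))
    return matches

for match in reidentify():
    print(match)
```

Each printed tuple is (name, test, candidate result, probability). With two HbA1c tests at the north clinic that month, each candidate result is attached to the matched person with probability 0.5; at the south clinic there was only one, so the match is certain. The whole pipeline is a few joins, which is why the marginal cost of doing this at scale is close to zero.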
"The following identifiers of the individual or of relatives, employers, or household members of the individual, are removed:
(B) All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code
(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
This is the "Safe Harbor" method.
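Items (B) and (C) are mechanical enough to sketch. The function below is a minimal illustration, not a compliance tool: its name is hypothetical, it handles only ZIP codes, dates, and ages, and it leaves the other Safe Harbor identifiers untouched. The full text of (B), elided in the quote above, also requires three-digit ZIP prefixes covering 20,000 or fewer people to become 000; which prefixes qualify depends on census data, so here the caller passes them in as a hypothetical `low_population_zip3` parameter.

```python
from datetime import date

def safe_harbor_generalize(zip_code, birth_date, admission_date, age,
                           low_population_zip3=frozenset()):
    """Apply Safe Harbor items (B) and (C) to one record's ZIP code, dates and age."""
    # (B): keep only the first three ZIP digits; prefixes covering 20,000 or
    # fewer people become "000" (which prefixes qualify is supplied by the caller).
    zip3 = zip_code[:3]
    if zip3 in low_population_zip3:
        zip3 = "000"
    # (C): keep only the year of each date directly related to the individual.
    record = {
        "zip3": zip3,
        "birth_year": birth_date.year,
        "admission_year": admission_date.year,
        "age": str(age),
    }
    # (C): ages over 89, and date elements (even the year) indicative of such
    # an age, collapse into a single "90 or older" category.
    if age > 89:
        record["age"] = "90+"
        record["birth_year"] = None
    return record

print(safe_harbor_generalize("55905", date(1950, 6, 1), date(2019, 9, 10), 69))
# {'zip3': '559', 'birth_year': 1950, 'admission_year': 2019, 'age': '69'}
```

Note what survives: a three-digit ZIP prefix, years, and an age bucket. Those are exactly the kind of coarse quasi-identifiers that can still be joined against other data sets.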
You could use the "Expert Determination" method. However, date + time + location attached to health information in your first data set definitely doesn't meet the criteria. I'll eat my hat if you find a supposed "non-PHI" data set with those.
In fact, the criterion for expert determination is literally that re-identification cannot be performed (without already having PHI-type information).
> HIPAA is a joke because it is perfectly legal to combine other data with the HIPAA anonymized source to identify the individual.
HIPAA may be a joke, but not for this reason.
If information can be re-identified as PHI in any way (including matching phone numbers, birth date, IP addresses, patient account #s, etc.) it doesn't meet the de-identification standard.
You must remove the 18 types of identifiers, or receive a certification:
"A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information;"
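The regulation does not say how an expert should measure a "very small" risk. One commonly used illustrative measure, swapped in here as an assumption rather than anything HIPAA prescribes, is k-anonymity: every combination of quasi-identifier values must be shared by at least k records, so linking on those columns narrows a person down to no fewer than k candidates. A minimal sketch, with invented records and a hypothetical `k_anonymity` helper:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing identical values on the
    given quasi-identifier columns (0 for an empty data set)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values(), default=0)

# Invented records that have already had Safe Harbor-style generalization applied.
records = [
    {"zip3": "559", "birth_year": 1950, "sex": "F", "dx": "type 1 diabetes"},
    {"zip3": "559", "birth_year": 1950, "sex": "F", "dx": "asthma"},
    {"zip3": "559", "birth_year": 1971, "sex": "M", "dx": "rare neuropathy"},
]

print(k_anonymity(records, ["zip3"]))                       # 3
print(k_anonymity(records, ["zip3", "birth_year", "sex"]))  # 1
```

A k of 1 means at least one record is unique on those columns, so anyone holding another data set with the same columns can single that person out. The result depends entirely on which columns the assessor assumes a recipient can link on, which is the judgment call the certification rests on.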
Moreover, your information can only be used for research if you give written permission (Section 164.508). If you have given this permission, you may revoke it for the future.
I gotta say, I wonder what the balance of any discoveries will actually be. I am not a doctor, but I wonder if machine learning will identify anything other than geographic clusters of patients. I suppose it is possible that this kind of info could help in diagnosis, but machine learning seems unlikely to result in cures given the lack of any actual trials. Of course, improved incidence data might truly help earmark funds for research.
Personally, I hope that my data could help people. I am not so concerned with my privacy that I would withhold my medical records if they could help. But, in exchange for that contribution, I do want protection against misuse of that data by my insurers or other institutions. I recognize that the only way I can enforce that protection is by withholding my data entirely.
The reality is that we cannot, as a society, afford to have no trust in our institutions. We need instead to focus on providing oversight and guidance so that they can actively contribute to the public good. Private contributions to institutions are mandatory for the public to succeed. Maintaining that trust is the responsibility of those institutions, and they should seek whatever audits and oversight they can to keep that trust and fulfill their charters.
I’m curious, are you worried that if everyone in your position doesn’t share data there will never be research on your condition and therefore no cures or better treatments?
I worry that one tradeoff of more privacy and less trust is that researchers won’t get the information they need to produce cures and treatments. It’s a Faustian bargain that people who are sick have to either risk having their information leak, or risk science ignoring their condition entirely.
> I’m curious, are you worried that if everyone in your position doesn’t share data there will never be research on your condition and therefore no cures or better treatments?
You have data sharers to blame for that. They're the ones that are destroying possible cooperation (everyone with that condition sharing data).
> It’s a Faustian bargain that people who are sick have to either risk having their information leak, or risk science ignoring their condition entirely.
Indeed, and I think the way to solve it is to go after the leakers and the sharers and the "entrepreneurs". If I give my data for medical research, I mean bona fide research, as in scientists and labs and tax-funded scientific papers, and not "research" into lowering operational costs by selling data, or "research" done by startups partnering with the clinic.
I opted out of EHCR here (apparently, judging from a conversation with my GP's receptionist, I was one of the very few who did) because I simply don't trust "It'll save us money" as a reason (I also don't trust my government to be competent).
ML won't be used to find cures, but to diagnose and classify disease. For example, CNN features could be applied to medical imaging to learn to detect various diseases, and that algorithm could replace the "quick read" that physicians order when the Radiologist isn't available to do a full read.
Another possible use would be to identify septic or nearly septic patients and alert a clinician to intervene.
I read this more as: oh boy, two heavily corrupt industries are forming a partnership. I'm not throwing shade at Mayo Clinic specifically (the research there is vital), but at the healthcare industry in general. I can tell you I don't want a data-mining and advertising giant focused only on revenue in an industry that can directly affect my longevity and is already focused only on revenue. Healthcare is corrupt enough; I don't want it to get worse (and it will).
Neither tech nor healthcare (pharma, insurance, and many hospitals) wants anything more than money, and fending off sharks is not something I want to deal with at the most vulnerable points imaginable (most people will be fairly sick at some point in their lives, even if only near death).
As somebody who lives with 2 rare diseases that affect my peripheral nervous system, I "love" how Google just throws "rare diseases" into this in their press release. Actually, one of the diseases I have was discovered in the early 2000s on NIH grant funds at the Mayo Clinic.
This sounds wild, but it is true: rare diseases are an absolute cash cow, and everyone should watch this. Our healthcare system in the US will be unsustainable if orphan drugs are not regulated (which is why I naturalized as a European Union citizen, in addition to being American; I fret and worry about getting proper access to medical care every single day): https://www.nytimes.com/2019/08/23/the-weekly/rare-diseases-...
(I do not believe that healthcare for all is unsustainable, but an unregulated free market will make it unsustainable.)
Just in case anyone was wondering: it is common to have a rare disease, and rare diseases are unfathomably expensive to have. In the US, the definition of rare disease (which really should be called "orphan condition" based on the law) is tied to "orphan drugs", which in theory can collectively benefit 10% of the general population. A ton of orphan drugs are being approved at the moment, costing anywhere from hundreds of thousands to millions of dollars per year in the US. The European estimate on rare diseases is more realistic and 6-8% of the general population has a rare disease.
So, do not think that it cannot happen to you. You are naive to believe otherwise.
> The European estimate on rare diseases is more realistic and 6-8% of the general population has a rare disease.
But not the same rare disease.
Orphan drugs collectively may benefit 10% of the population, but not individually.
Not that I disagree with what I think is your main point: a profit driven medical industry hurts those at the extremes, relative to those in the median. I don't know how awful that is or isn't. 100 years ago, those folks would just have suffered. It's a profit motive, in some part at least, which has fueled advanced treatments. If all healthcare and healthcare research were socialized, maybe the expensive treatments wouldn't exist at all.
What I'm not sure about is if you are critical of, or in favor of, this collaboration. Bringing "commercial-grade" AI to healthcare sounds like a good thing to me on its face. I've read here and there (perhaps it's sensational, but still) how some AI can be order or orders of magnitude more accurate than doctors when evaluating x-rays, or scans, or other diagnostics.
My worry here is in the profit motive of Google and the fact that, well, they suck these days in that they do not care about user privacy.
It sounds like you're saying that "orphan drugs" can benefit 10% of the population and are also "cash cows." Do I have that right? I ask because if that's true, it seems to me that this is a good thing. If it wasn't a cash cow, would companies still be incentivized to take on the risk of researching these drugs and going through the work of having them tested and approved?
Your post seems critical of Google, but I don't know if that makes sense. Isn't your main criticism with the healthcare system and laws surrounding it? I don't know if we can blame companies that operate within existing law. The blame should instead be passed to policy makers and their voters.
I’m pretty excited about this. Although it’s in Rochester, it will boost the MN tech scene. There’s a lot of opportunity here for Google to learn about healthcare as well. Their AI tools should help improve outcomes. I can think of half a dozen promising projects just based on the current scientific literature. I don’t think Google is going after EHRs here. Epic is too dominant. Also, Google could just buy an EHR company if they really wanted to get into that space. Interestingly, Epic is pitching AI tools as well.
Wouldn't medical data be covered under HIPAA? (I really don't know; it's possible anonymized data etc. might get around those restrictions.) I would hope that, when it comes to medical data, laws would prevent some of the usual privacy concerns around Google's data collection.
HIPAA anonymized data sets can be combined with any other data set(s) to reidentify the individual, and it is 100% legal to do so. In fact, data brokers (there are 4,000-8,000 of them in the US) will sell lists of people with, for example, 150 columns of data tied to them, with one of the columns being "presumed medical conditions". Social media companies and other marketers use these lists.
In theory, HIPAA anonymized data sets should be nearly impossible to reidentify in combination with any other data set.
In practice, that's probably still true for “safe harbor” deidentification, but less true for “expert determination” deidentification, which doesn't need to follow the safe harbor rules. The latter option should be eliminated.
The practice spooky23 reports in the thread you cite appears to be blatantly illegal, and the assurance that it was not came from someone who was paid to protect the company; no complaint was made to anyone responsible for enforcing the law.
That's probably the biggest problem with HIPAA, not the law or supporting regs (which have problems, like the one I address upthread), but that most people's first and only complaint of a problem will be to the wrongdoer themselves, not anyone with an interest in enforcing the law. (In Spooky23’s case, there was some effort to go beyond that, but not to an entity actually responsible for enforcing the law in question, or even an agency of the right sovereign entity.)
In any case, while the practices spooky23 raises are, legal or not, a real concern, they in no way justify characterizing my criticism of the specific problems with HIPAA deidentification rules as naive in the context of a pre-existing discussion of reidentification of deidentified data (which is a completely different issue from sharing, legally or not, data which is not deidentified, as in spooky23’s case). Again, it's a real issue, just not a germane one to where it was aggressively thrown into the discussion.
Did you read the second comment on that link from Spooky23? It is legal. Also, the second linked article treats the practice as legal; it was a ProPublica report:
"Yes, you are. The events surrounding what happened to my wife was very painful (an ectopic pregnancy that nearly killed her), and a thoughtless reminder was very unwelcome. I still feel violated and betrayed.
In our case, I found out the marketing list from Enfamil and bought it for my zip code. _I complained to the hospitals’ privacy officer and the state regulator and found that everything was legal._
In our case, the hospital pharmacy issued drugs to her indicative of a pregnancy. The pharmacy or insurer provides that information in real time to data brokers. The pharmaceutical companies assign quotas and send salespeople for certain drugs. There are other ways for data to get out that we’re not certain of. Perhaps the insurer “anonymizes” and sells subrogation information. Or the lab. In any case, they knew that my wife was admitted to an OB floor of a hospital, but didn’t know the outcome.
It’s not going away. The US government uses these same techniques with companies like Google to combat extremism or terrorist conversions — they actually use factors like this to target potential recruits with counter-information via ads. "
> Did you read the second comment on that link from Spooky23?
Yeah, as you can tell by the fact that I responded to the post pointing out that the two people complained to were:
(1) A person whose job it is to make sure the hospital doesn't get sued, who is never going to admit wrongdoing, and
(2) An official from the wrong agency (and even the wrong government) when it comes to the law in question.
Also, note that the link you've copied that isn't a 404 is only tangentially related: it is about gathering and sharing data that never comes under the protection of HIPAA, not resharing PHI as addressed in spooky23’s post, which again is a different issue from the reidentification of HIPAA-deidentified data that I was responding to here. There are lots of different issues around health data, and it isn't helpful to conflate them, much less to hurl abuse at people for failing to conflate the different issues.
I am not so certain about this. Specifically, the claim that HIPAA-anonymized datasets can be used to reidentify people: yes, we know this is technically possible. But the implication that it's legal, I don't think that is correct. By the terms of the law (with which I am far too familiar), if you did this you would generate PHI, which would fall under the Privacy Rule (and could not be resold).
I don't know the specifics about the data brokers you're describing, and this is a huge and complicated area, but I think it's correct to say that, under HIPAA, companies cannot legally re-identify de-identified data and then resell it as identified data.
Wouldn't the entities you're describing be Health Clearinghouses?
"""Health Care Clearinghouse – A public or private entity, including a billing service, repricing company, community health management information system or community health information system, and “value-added” networks and switches that either process or facilitate the processing of health information received from another entity in a nonstandard format or containing nonstandard data content into standard data elements or a standard transaction, or receive a standard transaction from another entity and process or facilitate the processing of health information into a nonstandard format or nonstandard data content for the receiving entity."""
My read is that the entities I'm describing would fall under this. If you can point to a specific example which you believe violates this (not an anecdote; I'm talking about investigative journalism or a court case or an academic with credentials in this area), I'd love to hear about it.
> Wouldn't the entities you're describing be Health Clearinghouses?
No, a clearinghouse is (to summarize the definition you posted from the regs) an intermediary between providers and/or payers in handling transactions for which standards exist under HIPAA.
They receive PHI in either standard or nonstandard forms, transform it to or from standard forms if necessary, and transmit it on; it's PHI the whole time through that function.
An entity acquiring deidentified data (which is explicitly not PHI under HIPAA, that's the whole point of deidentification) is not (for that reason) a clearinghouse, and if they can get other data and reidentify the deidentified data, they can do whatever they want with it.
The theory of deidentification is that the risk of this is minimal (indeed, other than scrubbing virtually everything that could possibly be used to reassociate the data, the only way for PHI to be deidentified is to get a notionally-qualified expert to certify a very low risk of reidentification.)
The problem is that all such certifications are based on a faulty premise: if data is not completely scrubbed so that reidentification without having essentially the equivalent to the original PHI is impossible, the risk of reassociation is almost never very low, because the process is automatable and the marginal cost is near zero.
OK, so do you have examples of "An entity acquiring deidentified data, if they can get other data and reidentify the deidentified data, they can do whatever they want with it." actually happening, outside of academic articles?
Specifically: can I go to a data broker, today, in the US, and obtain records under my name that were derived from entirely de-identified data, that has been re-identified by the data broker?
I've been talking about legality, not what is in the wild (other people have made claims about what's happening in the wild, but some of those seem to be conflating direct release of PHI, reidentification of deidentified data, and other issues.)
> from entirely de-identified data
What do you mean by “entirely de-identified”? That sounds like you are referring to the HIPAA safe harbor option (which specifies an extensive array of things which must be completely purged), rather than the alternative HIPAA “expert certification of low risk” option. The problem is that the latter has the exact same legal effect as the former, though the only reason to ever use it is because the data isn't entirely de-identified.
The risk is with legally de-identified data, which is not restricted to entirely de-identified data.
The Constitution protects against warrantless search and seizure, specifically of "their persons, houses, papers, and effects". Courts have overwhelmingly interpreted that to mean cops can take paper money from you without a warrant.
You trust them to do the right thing for HIPAA in regards to a multi billion dollar enterprise?
I have worked on medical/improving-patient-outcomes consulting projects three or four times, enough to get a feeling for how difficult it is to accomplish all of the following: keep patient data private, convince data partners to spend the resources to sanitize data, share the benefits of research among partners, and work with different data formats.
I heard a keynote at NACL a few years ago that was a call to arms to solve these problems.
They mention "digital transformation" at the end and man Google is not the right partner for that. They are a software company, not business consultants. I'm sure they have no respect for what it takes to make non-tech people change their way of thinking.
Healthcare has many forces driving less than optimal outcomes.
The for profit status of treatment in "western" medicine.
Laws like HIPAA that are well-intentioned, but written by lobbyists and by out-of-field, out-of-date lawyers/politicians who don't understand the actual nature of data protection or the need for a patient to be in control of their medical records in meaningful ways.
There's also the lack of a national / international identity and legal / data security infrastructure: this makes it very difficult to associate government issued IDs to patient records and requests / authorizations for limited sharing of those records.
In a less crazy world the outcome might look something like this:
Everyone has a Digital ID: a government-issued or government-signed, PKI-based key for approving contracts. It would be stored in a dedicated wallet, with open hardware, firmware, and software, that is used only for making strong signatures.
The Digital ID allows the patient to log in to government websites and associate their healthcare coverage (ideally single payer, but if they're rich and have a luxury plan that could be linked as well) at various medical centers to their (emergency) care records. They can also actively choose to, or passively allow, the sharing of specific records from one provider to anyone else, as well as obtain personal copies of all of their records from all of their providers. Any time a provider is no longer covering a given patient stewardship of those records transfers to the government agency providing this service (and is paid for out of a general fund based on taxing providers so they don't have to deal with this).
A management matrix might also allow for general records access approval, in the case that the patient just wants their entire medical history and ongoing updates to be provided to their pool of physicians.
Through that framework, outside entities can also obtain access keys and links for the records at other providers that they are authorized to view.
Also, of course, all of the records would be required to be in "open, patent free, free to implement record formats as standardized by the medical industry software and equipment providers"; a specific format wouldn't be legally mandated, but the use of formats that are intended to be interchangeable would be.
What you describe more or less exists here in Spain. The (mandatory) national ID is an NFC smart card with a PKI key. With that (among a lot of other things) I can log in to my regional government website and look at my (single-payer) healthcare medical history, download it, see who has requested access to it, restrict access to some parts of it, make an appointment, ask for a new doctor, etc. Of course, it is sadly more closed-source, proprietary, lowest-bidder work than your vision, but at least the idea is out there and available to millions of people already.
What if a "patient" happened to grab the records that were faxed from one office to another, or worse, someone intentionally got a common transposition of fax number for an office and captured medical records transferred by exemption?
There are many what-ifs. The intent of the system I outlined is to make good data-hygiene practices easier and thus more likely.
I'll also point out that most EHR systems aren't 'airgapped' like paper records of old, but are still connected to the internet at least loosely for security updates if not limited remote access.
If there's some specific attack scenario that you feel is worth discussing as a topic that positively enhances knowledge and the exchange of information, please outline such a concern in a proper venue, which might or might not be this comments thread depending on the specific concerns. I merely provided a back-of-napkin idea to start from.
Trusting Mayo a bit to do the right thing here, I believe they will rein in Google's lust for reselling personal data, and limit the partnership to providing labeled data Google will use to train their latest pattern recognizers.
I suspect Mayo also hopes Google can break some new ground in analytics beyond munging mere patterns. No doubt Mayo would love to explore all kinds of advances in medical practice using novel monitoring and instrumentation, esp. in the clinic.
No telling what Google has proposed: maybe a lot, maybe only a little. Their announcement says nought.