
  • btilly 1766 days ago
    This is not exactly new. I remember seeing models that did really well many years ago. And again, they caught many that humans had missed.

    The problem is that they fail differently than humans do, in a way that humans wind up not trusting the results.

    It turns out that there are parts of the breast that are easy to spot tumors in, and parts that are hard. A human scans quickly over the easy areas, and focuses on the hard. The result is that humans make careless errors on the easy areas, and catch hard tumors. Computers make no careless errors, but can't catch the hard ones. Thus when a human sees what the computer caught that the human did not, the mistake is easily dismissed. But when the human sees the ones that the computer missed, it becomes, "It doesn't know how to do the real work."

    Ideally the two would be used together for better results than either alone. But humans wind up resenting the computer...

    • taneq 1765 days ago
      > The problem is that they fail differently than humans do, in a way that humans wind up not trusting the results.

      Not only that, but human fallibility is accepted where machine fallibility is not. There's something about being a "person" which makes it acceptable for you to just take the blame for something. A senior radiologist makes a glaring error and "it happens, people make mistakes". A computer makes the same error and it's a problem which must be fixed before the computer can be trusted.

      Ultimately I believe this is a cognitive bias that we're just going to have to learn to let go of.

      • semi-extrinsic 1765 days ago
        > Ultimately I believe this is a cognitive bias that we're just going to have to learn to let go of.

        Unfortunately I don't think this is merely cognitive bias. It's actually built into our legal system at a pretty fundamental level: machines are held to a higher standard than humans when it comes to failures with grave consequences.

        And keep in mind, this system only achieves the same accuracy as doctors. What is the end benefit, other than shifting where money flows?

        Do you really think this benefit is substantial enough that we will see major overhauls of tort law in all US states and in every country in the rest of the world?

        • psychoslave 1765 days ago
          > Unfortunately I don't think this is merely cognitive bias. It's actually built into our legal system at a pretty fundamental level: machines are held to a higher standard than humans when it comes to failures with grave consequences.

          Well, laws reflect culturo-cognitive biases, don't they? And also they evolve.

          For the last part, that seems to generally be a wise "bias": when a machine fails, chances are good that all the identical machines will fail the same way.

          When a human errs, chances are good that it won't be in a way you'd expect to happen identically across all their peers, although common cultural biases obviously exist.

          Also, so far, our machines are far less apt to self-correct their behavior when they err, especially when it comes to unstated social expectations that get violated through unexpected side effects.

          Ultimately, machines don't care about the consequences of their actions, because they have no feeling of responsibility toward anyone they love, not even themselves.

    • ska 1765 days ago
      I worked on one of those systems. For some screening tasks, it has been at or beating average-radiologist performance since the mid or late 90s.

      There are a number of issues, but it's true that raw algorithmic performance is a small part of the whole picture.

      • RosanaAnaDana 1765 days ago
        This is the case in almost all ML-augmented workflows. The augmentation begins and ends with people looking at things, because the machine only knows what you told it, and doesn't care.

        The real work is building systems around people who do the interpretation and labeling to make their jobs easier.

        • ska 1765 days ago
          Yes, very much this. And the importance of workflow impact cannot be overstated.

          In the (clinical) system I mentioned above, the ML part was a few percent of the total effort, max.

      • agumonkey 1765 days ago
        Can you tell us more about the rest?
    • neaden 1765 days ago
      Pigeons can do better than humans when they work as a team: https://www.scientificamerican.com/article/using-pigeons-to-... but obviously no one is going to trust pigeon diagnosis anytime soon. Computers have a bit more credibility.
    • twanvl 1765 days ago
      > The problem is that they fail differently than humans do

      That is a great argument for giving such a model as an aid to a human doctor. Together they will be better than either one alone.

      • onlyrealcuzzo 1765 days ago
        In Thinking, Fast and Slow, the author details a double-blind trial where they did this. It was worse with humans plus AI than with just AI. Humans think they can use the AI as a guide and move it in the right direction, but the adjustments they made were, on average, bad.
        • notahacker 1765 days ago
          Surely in this type of instance (looking at a scan to answer a yes/no question) the human and AI act independently, with the computer being a useful aid because it separately picks up a few of the human's false negatives. Assuming false negatives are a lot worse than false positives, this can only be a good thing.
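
          A rough back-of-the-envelope illustration of that point, with made-up numbers and the (strong) assumption that the two readers' misses really are independent:

              # Illustrative figures only, not from the paper: suppose the human
              # misses 10% of true tumors and the model independently misses 15%.
              p_miss_human = 0.10
              p_miss_model = 0.15

              # With an "either reader flags it" rule, a tumor slips through only
              # if both readers miss it.
              p_miss_combined = p_miss_human * p_miss_model
              print(p_miss_combined)  # 0.015, i.e. roughly a 1.5% combined miss rate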
          • chongli 1765 days ago
            If they lead to an unnecessary mastectomy then false positives are pretty bad. Not as bad as dying, obviously, but still a severe blow to a woman's identity and sense of self worth.

            It's going to be a hard pill to swallow if you have to tell a woman "sorry, we removed your healthy breast because the computer made a mistake."

            • glangdale 1765 days ago
              I think the idea of "screening" is that you don't just race off to a mastectomy the minute some AI model goes off. Of course, putting more false positives through a fallible process of review does run the risk you speak of.
              • AstralStorm 1765 days ago
                It does cause unnecessary biopsies for sure. And some stress on the patients.
            • irq11 1765 days ago
              Even a false positive that leads to telling the patient that they may have cancer is bad. It leads to life-long anxiety for many people.
        • stubish 1765 days ago
          It sounds like a smart hospital would run a patient through both human and AI screenings separately, with a different doctor examining both results and evaluating the discrepancies. This way you keep the strengths of both approaches and lower the failure rate, and depending on the country's health care funding it can be good business from the hospital's POV: they get to charge for the extra work, and the better success rate drives business.

          And I wonder what happens if you apply machine learning to looking at the difference between AI and human screening results.

      • bitL 1765 days ago
        Radiologists are really bad at detection, even after many years of study. That's quite often due to the coarse level of detail in scans, where only large tumors can be observed or recognized with any certainty. Surpassing humans there is not so difficult, but improving accuracy from e.g. 32% to 34% doesn't really sound like a win :(
        • carlmr 1765 days ago
          2% more accuracy could still be millions of people if it's a common enough cancer like breast cancer.
        • schwurb 1765 days ago
          > 32% to 34% doesn't really sound like a win :(

          We are talking about human lives here, not about beating some CPU benchmark. A 2% improvement in detection is huge for almost any disease.
      • baq 1765 days ago
        Remember when ensembles were the cool word, before they got erased from collective consciousness and replaced with deep things? It can't even have been a decade; was it 2012 or something?
        • krisoft 1765 days ago
          They haven't been erased, more like subsumed? If you use dropout to train your model, that is roughly equivalent to using an ensemble of deep neural networks.
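
          One concrete place the "implicit ensemble" view shows up is Monte Carlo dropout: keep dropout active at inference and average several stochastic forward passes, so each pass samples a different "thinned" sub-network. A minimal sketch with a hypothetical toy model (PyTorch assumed; nothing here is from the paper):

              import torch
              import torch.nn as nn

              # Hypothetical toy classifier with dropout, purely for illustration.
              model = nn.Sequential(
                  nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.5),
                  nn.Linear(128, 2),
              )

              x = torch.randn(1, 64)

              model.train()  # keep dropout active so each pass uses a different mask
              with torch.no_grad():
                  # Average the predictions of many stochastic passes: effectively
                  # an ensemble over dropout masks.
                  probs = torch.stack([model(x).softmax(dim=-1) for _ in range(30)]).mean(dim=0)
              print(probs)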
          • irq11 1765 days ago
            That is not even close to the same thing.

            If you train an ensemble of models with random dropout, you have an ensemble. Models trained with dropout will still have significant variation from run to run.

            • sooheon 1765 days ago
              > That is not even close to the same thing.

              It's a common interpretation: https://arxiv.org/abs/1706.06859

              • irq11 1764 days ago
                There may be a paper on it, but it’s not a common view.

                In particular, this paper neglected to do the obvious thing: ensemble networks trained with dropout. It improves performance over dropout alone.

        • rerx 1765 days ago
          Why shouldn't you employ an ensemble of deep neural networks?
          • salty_biscuits 1765 days ago
            Correlated errors. Naive averaging will lead to overconfidence and it is not trivial to model the correlation. Boosting is worth a shot though.
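
            A small synthetic illustration of why the correlation matters (invented numbers, nothing to do with the paper): the variance of an average of n estimators with individual variance s^2 and pairwise correlation rho is s^2 * (1/n + (1 - 1/n) * rho), so with strongly correlated errors the ensemble barely gains anything.

                import numpy as np

                # n "models" whose errors share a common component, giving
                # pairwise correlation rho between their predictions.
                rng = np.random.default_rng(0)
                n_models, n_samples, rho = 10, 100_000, 0.7

                common = rng.normal(size=n_samples)               # shared error
                private = rng.normal(size=(n_models, n_samples))  # per-model error
                preds = np.sqrt(rho) * common + np.sqrt(1 - rho) * private

                print(preds[0].var())            # ~1.0 for a single model
                print(preds.mean(axis=0).var())  # ~0.73, not the ~0.1 that independence would give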
            • rerx 1764 days ago
              My point was that ensembles of deep neural networks are commonly used and yield higher accuracies.
          • AstralStorm 1765 days ago
            More importantly, what happens if you put a radiologist opinion (or multiple) in such an ensemble?
      • acangiano 1765 days ago
        This. The argument was never to replace doctors. These are valuable tools that augment what doctors can do.
        • TeMPOraL 1765 days ago
          No, the point very much is to eventually replace doctors. You just can't easily get there before first going through a doctor-machine cooperation period.

          Automation is a friend of society, but is not a friend of individuals working particular jobs. I think doctors are acutely aware of that.

    • WhitneyLand 1765 days ago
      I don't get the relevance of comments like this which I hear all the time. Say everything in the comment is true. It's all within the noise of a decade of research from now till 2030.

      The big picture is, this patchy performance is the writing on the wall. It's over for radiologists, for the most part.

      The nature of this problem is a great fit for ML, and in short order (10 years) it will be superior to expert-level humans in the vast majority of scenarios.

      People say, but psychology, fear, unknowns, will require human supervision indefinitely. Of course that's true.

      The problem is, radiologists will effectively be relegated to proofreaders. The number of minutes required of them per patient will plummet and so will their job market (unless changes allow many more untreated people to get imaged).

      What about the researchers? Even they will take a hit as the imaging analysis part of radiology research moves more along the spectrum toward yet another computer science problem.

      • btilly 1765 days ago
        Algorithms have been competitive with humans for 20 years on radiology.

        And yet humans have not yet been replaced.

        Tell me what you expect to be different about the next 10 years that wasn't in the last 20? I'm open to being convinced. But you have to not just say that computers are going to be better - you have to explain why there wasn't already a switch.

        • WhitneyLand 1765 days ago
          Have you ever done anything entrepreneurial in healthcare, or just tried to introduce a new or different treatment standard?

          1) It just takes longer in this field. Takes longer, but not necessarily more difficult (technically).

          It's a nightmare, and when I consulted for a couple of teams for a bit I was shocked at how slowly everything gets done. There are many reasons for that, but it's hard to describe how unlike a typical fast-moving startup it is (unless you're not changing patient protocols).

          2) ML is moving faster than it was 10 years ago. Way faster. So comparing to work done 20 years ago is difficult; they just didn't have the same resources, ecosystem, and momentum.

          3) It's only a hard problem in practice, not in principle, and it's well suited to ML. Unlike problems where difficulty in practice doesn't match difficulty in principle (like warp-drive space travel), the remaining engineering and open research questions don't suggest anything that will hit a wall or prevent ultimate success on roughly the time frame suggested.

          • rstuart4133 1764 days ago
            > It just takes longer in this field. Takes longer, but not necessarily more difficult (technically).

            This immediately made me think "yeah, they still use faxes", yet I don't think anyone doubts faxes are on the way out. Still, "it just takes longer" understates just how slow it is. It looks to me like a critical mass of older fax-using doctors will have to die off before the change can happen.

            It does make you wonder how they get away with this. In most industries competition and cost pressures will force the change. Not so in medicine, apparently.

      • rscho 1765 days ago
        Over for radiologists?

        Call me back when your machine can produce a comprehensive analysis of an anatomic configuration in relation to every element in the patient's file.

        The real problem with those ML systems is that people don't understand what a radiologist is. Let's perhaps solve that problem first?

        • WhitneyLand 1765 days ago
          1) I understand there's more to it than that, which is why I called out imaging analysis at one point in my post.

          2) I said over "for the most part". There are still travel agents today you know.

          Please understand, the argument is not that they will not exist, but that there will be a greatly diminished job market unless imaging demand grows tremendously. And also that the nature of the job will be much different and probably see salaries stagnate.

          3) Given your "call me when" scenario I would need to ask for specifics to properly respond. However, at a superficial level I'm not sure I see any kind of millennium-prize-class problem at first glance.

          • rscho 1765 days ago
            The job of a radiologist is to construct a story that holds water around the diagnosis and communicate that to other specialists, so they can use that to understand the problem at hand.

            Concretely, a radiologist may say "there is a lesion compatible with your presumption of diagnosis X, however another neighbouring lesion Y speaks against diagnosis X. In light of patient history, further examination of lesion Y by modality Z is advised". That's the minimum we expect, otherwise we wouldn't need radiologists because many specialists can analyze the images relevant to their specialty themselves.

            I don't deny it will be possible to automate in the future, but it's currently not possible. Radiologists are useful, and their job lies beyond matters of image description.

            • WhitneyLand 1764 days ago
              Sounds like we're in perfect agreement on the critical points and the only debate would be around the timeframe (10 years or ?).

              - It will be possible to automate a large % of a radiologist's job in the future; it's currently not possible.

              - Radiologists are useful and their job lies beyond matters of image description

      • module0000 1765 days ago
        > People say, but psychology, fear, unknowns, will require human supervision indefinitely. Of course that's true.

        The humans that supervise that will become the new radiologists. The "best" of those humans will have cross-field disciplines in ML model development and traditional radiology. My capitalist side sees a huge opportunity here in consulting and helping the existing radiology departments (who are interested) bridge the gap from current practice to a hybrid approach.

        Adapt or die, etc.

        • WhitneyLand 1765 days ago
          - What about the point that the total headcount demand for these supervisors/proofreaders will likely be drastically reduced?

          - The idea of cross-training radiologists to understand ML model development might be an ideal, but it's hard to see more than single-digit percentages of folks making that transition. The jobs are way different and require years of ramp-up. Even the mindset and the cultural approach to the work seem different, based on the couple of times I've worked on cross-discipline teams.

          But hey, it's still unfolding, I could be wrong about the consulting. If you figure out a model and how to sell it ping me, I'd be glad to provide feedback to help you refine it quickly.

    • wjn0 1765 days ago
      This is a crucial point. There's a good chance these are problems that could be solved with more data. However, I think philosophically the challenge of integrating these models in a modern healthcare system is more fundamentally related to explainability. A crucial part of a radiologist's job is the "why?" - why did you make this diagnosis, why did you flag this patient, etc. While there are models (especially in image processing) that tackle this problem, I'm not personally aware of them being used in a clinical setting. It's difficult from not only a machine learning perspective, but also in terms of HCI/UX.
    • hathawsh 1765 days ago
      That's a helpful perspective. Would it be possible for IBM to create a service that allows patients to submit their own pictures for scanning?
      • lostlogin 1765 days ago
        We are at a tricky stage with this. Images are too big to be emailed and many people no longer have CD drives. There is increasing use of tomographic imaging in examinations too, and the files are pretty big.
        • hathawsh 1765 days ago
          On the other hand, cheap micro SD cards and thumb drives hold far more than CDs or DVDs.
          • lostlogin 1765 days ago
            We prefer to send the images to the PACS the patient wants them on, as for most imaging that’s all that’s wanted. Obstetric imaging is one massive exception.
        • toomuchtodo 1765 days ago
          An auth workflow, so whoever does the imaging stores the image, and the patient can grant access and a pointer to whatever systems consume the imaging.
    • cma 1765 days ago
      > Thus when a human sees what the computer caught that the human did not, the mistake is easily dismissed.

      You mean dismissed as in not believing the result or not being impressive? (unclear if that's what you mean later by resenting and if it is tied to this statement)

      • btilly 1765 days ago
        I mean dismissed as in not being impressive. "Yeah, yeah, I should have seen that. It is obvious. No big deal."

        And if you use the software as a backup, people don't respond well to their dumb mistakes being pointed out. The result is that people put effort into not making dumb mistakes... and therefore either slow down or have less time for what humans can do better than computers.

        In theory it should work a lot better than it actually does.

        • inimino 1765 days ago
          What if you use the human as the backup? So they are looking for things the ML system missed. (As well as confirming the things it found.)
    • Symmetry 1765 days ago
      So a lot like the centaur era in chess competitions.
    • sadness2 1765 days ago
      > humans wind up resenting the computer...

      basis for this assertion?

    • benatkin 1765 days ago
      I think this could be a new milestone, compared to what you saw, because it uses deep neural networks.

      I'm also not sure that saying computers can't catch the hard ones holds true in light of this. It seems like a deep neural network would be useful for catching the hard ones.

      I agree with using the two together, but I don't see why the two can't be different AI subsystems. That seems to be what they're going for at IBM. Some of the power comes from a scientific model, while more power comes from well-trained deep learning networks.

      Before AlphaZero, advances in self-driving cars, and advances in face detection, I'd have agreed with you.

  • rayuela 1765 days ago
    Anything related to AI coming out of IBM should be viewed with a huge dose of skepticism. They're honestly one of the worst offenders in overselling the capabilities of their products, bordering on outright fraud. There is certainly a lot of promise in applying recent computer vision algos to medical imaging data, but I wouldn't bet much on IBM being anything close to a leader in this space.
    • Myrmornis 1765 days ago
      I don't doubt what you say. Just want to point out that this is published in a peer-reviewed journal, so hopefully the academic community will judge it objectively.

      https://pubs.rsna.org/doi/10.1148/radiol.2019182622

      • dragandj 1765 days ago
        OTOH, can even radiologists (or anyone else) predict cancer at all before it happens? I thought that radiologists diagnose cancer once it is already there.
        • thaumasiotes 1765 days ago
          > can even radiologists (or anyone else) predict cancer at all before it happens?

          Sure, some of the time it's easy.

          Let's all recall the words of my mother's medical school instructor, "There's a bit of cancer in everyone's prostate".

          (The context was a lab exercise in which medical students were supposed to find which of a set of slides of prostate tissue was cancerous. The reminder was necessary because many of the slides were cancerous, just not at levels high enough to be considered medically alarming.)

          Predicting that a man will develop prostate cancer is basically the same thing as predicting that he'll experience old age.

        • the8472 1765 days ago
          not all tumors are malignant
  • baybal2 1765 days ago
    I was lucky to date a girl who was into math, and who was coding those "machine learning" algorithms for a radiology startup here in Shenzhen.

    She had a lot of scepticism for what she did. One of the biggest showstoppers, she said, was the unpredictability of errors.

    An algo can catch 99% of tumors, including tiny ones, but can randomly pass over very obvious ones that a human radiologist would spot with his eyes closed.

    They had a demo day with radiologists throwing tricky edge-case X-rays at the computer. The edge cases were all OK, but one radiologist pulled his own X-ray from his bag, with a 100% obvious, terminal-stage tumor, and to the company's embarrassment the algo failed to detect it no matter how they twisted and scaled the X-ray. The guy then just walked out.

    • ska 1765 days ago
      Had a similar problem ages ago, and ended up adding a "blindingly obvious tumor" detector pass before the regular pipeline, just to avoid this cognitive dissonance.

      This is one of the (many) reasons that practical classification systems, as against research systems, tend to become Frankenstein's monsterish over time. It's naive to think that a single approach and pipeline will cover your domain well.
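
      In shape, that kind of cascade is just something like this (a hypothetical sketch, not their actual system):

          def screen(image, obvious_tumor_detector, main_pipeline):
              """Two-stage cascade: a cheap, very-high-recall pre-pass for glaring
              lesions, then the regular (slower, subtler) pipeline."""
              if obvious_tumor_detector(image):  # tuned to essentially never miss the blindingly obvious
                  return "suspicious"
              return main_pipeline(image)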

    • WalterBright 1765 days ago
      It seems to me the use case should be to have the radiologist look at a scan for tumors. Then, the algo should look. If they disagree, then the radiologist should look at the difference.

      It'll be the best of both.

      And in the scans where the algo is wrong, have the scan added to the machine learning database of the algo.

      • ScottFree 1765 days ago
        Unfortunately, a lot of the time hospitals can only afford one or the other. These systems are very expensive, and radiologists and cytologists aren't exactly cheap either. But I agree, both would be good, especially considering the volume a cytologist is expected to screen in a single day.
        • WalterBright 1765 days ago
          > These systems are very expensive

          Seems like a business opportunity for a cloud AI screening provider.

          • ScottFree 1765 days ago
            You point out another business opportunity: a developer who understands exactly what regulatory hurdles you need to jump in order to release medical software. I'm not sure exactly what's required in this case, but I'm doubtful there are many cloud providers who are HIPAA compliant.
            • WalterBright 1765 days ago
              I'm not sure it would need to be regulated, any more than a medical textbook needs to be regulated. The radiologist would still be the one making any decisions. As for privacy, an x-ray is sent. No personal information whatsoever.
      • ttlei 1765 days ago
        If the radiologist has to look at and double-check every scan that the algo looked at, then what is the point of the algo? Seems like a useless middleman that gets in the way.
        • ska 1765 days ago
          Screening is hard work and tedious, so even trained professionals regularly miss things. TP incidence rate is under 1% in the screening population.

          There have been studies showing significant improvement from double-reading mammo, for example (i.e. two radiologists, independently). Using an ML approach is trying to give you some or all of this benefit without the cost of redundant reads.

        • navigatesol 1765 days ago
          >then what is the point of the algo?

          The point is that the algorithm can improve results. This isn't ad placement, it's peoples' lives. Checking and double checking should be the norm.

        • telchar 1765 days ago
          Better to implement a system with a high rate of false positives (more importantly, a low rate of false negatives) from the machine learning component, with all positive findings passed on to the radiologist. If the system can reliably (big if) filter out 98% of the chaff, then the radiologist can spend a lot more time separating the false positives from the true positives. This approach has worked well for me so far.
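
          A sketch of how you might pick that kind of operating point from validation scores (scikit-learn and made-up data assumed; nothing here comes from the article):

              import numpy as np
              from sklearn.metrics import roc_curve

              # Stand-in labels and scores, purely to make the snippet runnable;
              # in practice these would come from a held-out validation set.
              rng = np.random.default_rng(0)
              y_true = rng.integers(0, 2, size=1000)
              y_score = y_true * 0.5 + rng.random(1000) * 0.8

              fpr, tpr, thresholds = roc_curve(y_true, y_score)

              # Pick the highest threshold that still gives >= 99% sensitivity,
              # i.e. a very low false-negative rate; everything flagged at or
              # above it goes to the radiologist.
              idx = np.argmax(tpr >= 0.99)
              print(thresholds[idx], tpr[idx], fpr[idx])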
          • ska 1765 days ago
            This approach is problematic in medical screening applications, mainly because you don't want to increase the work-up rate from false positives: when work-ups involve biopsy and a large screening population, eventually you will (indirectly) kill people this way, so there is pressure to control the FP rate.
        • bradstewart 1765 days ago
          Because the scan check by the radiologist becomes a _double_ check.
    • StreamBright 1765 days ago
      I guess this falls into the category where the ML algo learns a particular type very well but cannot recognise obvious cases if they look different from the training data. Human intelligence is a mix of pattern matching and attention focus that is hard to replicate with a single pattern-matching ML project. Aren't there projects that try to combine multiple pattern-matching models to decrease the number of false negatives?
    • navigatesol 1765 days ago

        1. 99% > 95%, or whatever the radiologist's accuracy is.
        2. Combine both systems for obvious gains.
  • dontreact 1765 days ago
    The reason I'm skeptical of this is that there is no actual comparison to human-level performance. I.e., they didn't have radiologists actually read their images to compare against the model. Notice that the title of the paper is "Predicting Breast Cancer by Applying Deep Learning to Linked Health Records and Mammograms"; it's only in the press release that they seem to imply a comparison to radiologists was actually done.
    • thatcantbeit 1765 days ago
      https://pubs.rsna.org/doi/full/10.1148/radiol.2016161174

      This is their comparison point for actual radiologists. Citation number 6. It doesn't look comparable, though. Radiologists are around 90% specificity and sensitivity, which varies a good amount from the model's 77.3% and 87%, respectively.

      • dontreact 1765 days ago
        This is not on this dataset though (right?), so not really a solid comparison point. Plus, like you mentioned, they seem to be doing worse than this benchmark.
  • michaelhoffman 1765 days ago
    No positive predictive value reported, imbalanced test data, IBM. Garbage.
  • nkurz 1765 days ago
    What makes this an "AI Model" instead of just a "Model"? That is, in what way does it have "artificial intelligence"?
    • TuringNYC 1765 days ago
      "Model" ---> [[marketing department]] --> "AI Model"
      • avgDev 1765 days ago
        [Student in school] Implemented MinMax algorithm in checkers ---> [student looking for work] Implemented state of the art AI algorithm, which successfully will beat the human opponent EVERY time. ---> [HR/Marketing dept at some corp] Wow you are HIRED!!!!!!!! ---> [Lead dev] Oof this guy can't program for shit.
        • TeMPOraL 1765 days ago
          ---> [student about to get fired from work] Why on Earth did they put "AI expertise" in the job requirements if all they want me to do is to shovel CSS and JS, and the closest thing to AI they have in the office is a 1960s thermostat?
      • raxxorrax 1765 days ago
        Can I call myself an AI specialist if I successfully fed a plain support vector machine once or twice for diagnostic support? Feels like driving an old timer here...
    • Myrmornis 1765 days ago
      "AI" has been used routinely as a synonym for machine learning for the last decade or more, hasn't it? This employed neural networks.
    • amelius 1765 days ago
      Probably because it was obtained using some kind of machine learning.
    • onemoresoop 1765 days ago
      Yes, the AI part is marketing/fluff
  • professorTuring 1765 days ago
    The real problem here is when society will allow a machine to diagnose them, and whether society is ready to accept that most diagnoses are made probabilistically.

    To date we allow humans to be at a 70% error level without problems, but we ask machines to be 100% effective.

    The very same happens with autopilot: the big numbers say it drives better than humans, but...

    • petschge 1765 days ago
      I remember seeing a statistical analysis here on HN that said the numbers for Tesla's Autopilot are neither great compared to drivers of Teslas, nor do they seem to be fairly computed. (They found a case where a human driver had a crash in what would have been counted as 0 miles in the analysis, indicating that something is inflating the "crashes per mile" metric.)
    • stubish 1765 days ago
      This isn't about group think. It is about when individuals will allow diagnosis. If you give a woman two options, diagnosis by a machine with a record of 80% or a human with a record of 70%, it is a really easy decision to make. The desire not to suffer cancer is strong enough to override almost all emotional arguments. And if you can afford it, you will likely choose both or get a second opinion.
    • anthony_doan 1765 days ago
      These types of algorithms don't give a confidence interval for their predictions, so I don't believe these diagnostics are based on probability at all.

      Having a confusion matrix for what the model predicts correctly or not is not the same as having a CI for the model's predictions.

  • blueyes 1765 days ago
    Old news from a major source of AI hype.

    Here are some previous results: https://med.stanford.edu/news/all-news/2018/11/ai-outperform...

  • Myrmornis 1765 days ago
    @moderators: would it make sense to change the link to the journal article rather than IBM's article? It's free access.

    https://pubs.rsna.org/doi/10.1148/radiol.2019182622

  • NikolaeVarius 1766 days ago
    Is this new AI Model under Watson?
    • layoutIfNeeded 1765 days ago
      There’s no such thing as “Watson”. IBM have put the Watson name on basically everything, to the point where its information content was reduced to zero bits.

      Watson for IBM is like the i-prefix for Apple.

    • noelsusman 1765 days ago
      Watson is a brand, so that doesn't really mean anything. If Watson refers to anything it would be the NLP functionality that IBM sells, and that's not relevant here.
  • mtgx 1766 days ago
    What's the False Positive Rate?
    • ska 1765 days ago
      FP = system (or person) flags this as true when it is not

      TP = ... flags as true and it is

      FN = ... flags as false but it is true

      TN = ... flags as false and it is false

      To turn these into rates, you normalize them.

      e.g. TPR = TP/P = TP/(TP + FN) = 1 - FNR

      etc.

      These are characteristics of a classification system

      You will also hear sensitivity (TPR) vs. specificity (TNR) often, particularly in medical contexts. In other contexts you'll hear Type I (FP) vs. Type II (FN) error.

      In most cases you have a set of trade-offs in your algorithm, and will need to pick a balance between sensitivity and specificity.

      c.f. ROC: https://en.wikipedia.org/wiki/Receiver_operating_characteris...
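
      The same thing as a tiny piece of Python, in case the algebra is easier to read that way (the counts are placeholders, chosen so the rates land near the operating point quoted from the paper further down the thread):

          # Placeholder counts for a hypothetical screening test.
          TP, FP, FN, TN = 87, 227, 13, 773

          sensitivity = TP / (TP + FN)  # TPR: of the truly positive, how many were flagged
          specificity = TN / (TN + FP)  # TNR: of the truly negative, how many were cleared
          fpr = FP / (FP + TN)          # false positive rate = 1 - specificity
          fnr = FN / (FN + TP)          # false negative rate = 1 - sensitivity

          print(sensitivity, specificity, fpr, fnr)  # 0.87, 0.773, 0.227, 0.13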

      • adyavanapalli 1765 days ago
        I think the OP is asking about the _value_ of the FPR instead of the definition.
        • ska 1765 days ago
          Ah, if I misread then from the results section of the linked paper:

          For the malignancy prediction objective, the algorithm obtained an area under the receiver operating characteristic curve (AUC) of 0.91 (95% CI: 0.89, 0.93), with specificity of 77.3% (95% CI: 69.2%, 85.4%) at a sensitivity of 87%.

          That corresponds to a false positive rate of 1 - specificity, i.e. about 22.7%, at that operating point. I haven't read the paper's methods, but the data set size is small-ish for this sort of analysis.

        • RosanaAnaDana 1765 days ago
          Knowing the false positive rate tells you how often the system raises a false alarm. Depending on the classification exercise, you may be more concerned with false negatives, where the consequence of a missed call is significantly greater than that of an unwarranted checkup from a human doctor.
          • shawnz 1765 days ago
            The numeric value
    • TuringNYC 1765 days ago
      • mdorazio 1765 days ago
        Thanks for linking. If I'm reading that correctly, it's pretty bad in comparison to an average human radiologist at ~6% false positive rate [1]. There's probably a bias factor in there, however, where humans are hesitant to predict potential cancer due to the cost/time involved in follow-up screening.

        [1] https://www.ncbi.nlm.nih.gov/pubmed/21643887

  • mlcrypto 1765 days ago
    Doctors will be some of the first to be replaced by AI. My physicians already walk around with a computer, checking all the boxes for symptoms and seeing what it says. I wish I could find one with a true intuition for medicine.
    • caraffle 1765 days ago
      Apparently you're not familiar with the documentation burden in the medical field. EHRs don't diagnose for you.

      There is no "true" intuition in medicine, just years of study and practice leading to quick recognition of common problems, as in any other field.