7 comments

  • atq2119 1625 days ago
    The headline feels a bit too click-baity, but this really is serious stuff.

    One of the most fundamental instructions of x86, the Jcc conditional jump, can apparently be executed incorrectly by most Intel CPUs out there.

    What's worse, part of Intel's strategy for mitigation is to push compiler toolchain changes that, if deployed as they desire, will punish the performance of everybody on x86, including those who are using unaffected processors (cough AMD cough).

    • damageboy 1625 days ago
      OP from Twitter here. I'll take the blame for the click-baity-ness, but in my defense, I was really shocked and this was in real time.

      I'm assuming it will get resolved, as the errata suggests, by shoving in lots of 0x2e (CS segment override) prefixes to re-align those jumps.

      The price won't be 0, but it will definitely be less than the very extreme 20% edge case I stumbled upon.

      I'm more "worried" or annoyed by older binaries that will never get updated. But hey... (;⌣̀_⌣́)

    • my123 1625 days ago
      If only the microcode workaround affected just Jcc...

      > The MCU prevents jump instructions from being cached in the Decoded ICache when the jump instructions cross a 32-byte boundary or when they end on a 32-byte boundary. In this context, Jump Instructions include all jump types: conditional jump (Jcc), macro-fused op-Jcc (where op is one of cmp, test, add, sub, and, inc, or dec), direct unconditional jump, indirect jump, direct/indirect call, and return

      These patches are just to recover some of the performance penalty...
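
      To make the quoted condition concrete, here is a minimal sketch of the address check. The function name is made up for illustration, and it assumes "ends on a 32-byte boundary" means the byte right after the instruction is 32-byte aligned:

```python
BOUNDARY = 32  # bytes, taken from the quoted MCU description


def is_affected(addr: int, length: int) -> bool:
    """Return True if an instruction occupying [addr, addr + length)
    crosses a 32-byte boundary or ends exactly on one.

    Hypothetical helper; the "ends on" interpretation is an assumption.
    """
    first_block = addr // BOUNDARY
    last_block = (addr + length - 1) // BOUNDARY
    crosses = first_block != last_block
    ends_on = (addr + length) % BOUNDARY == 0
    return crosses or ends_on
```

      For example, a 2-byte jump at 0x1e ends on the boundary and one at 0x1f crosses it, while the same jump at 0x10 or 0x20 is fine.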

    • cesarb 1624 days ago
      > including those who are using unaffected processors (cough AMD cough).

      A more recent Phoronix article ("Intel's Assembler Changes For JCC Erratum Are Not Hurting AMD", https://www.phoronix.com/scan.php?page=news_item&px=AMD-With...) says otherwise.

    • icebraining 1625 days ago
      I wonder if it'll start making sense to ship different binaries for Intel CPUs.
      • bArray 1625 days ago
        It'll confuse the hell out of people, but I think it's probably the way to go. I hope, for example, that Linux-based package managers split "amd64" so that we don't all get hit by this crap.
        • yabadabadoes 1625 days ago
          Naming it AMD64 was visionary, but what do we call broken x64?
          • cassianoleal 1625 days ago
            Clearly, intel64
            • saagarjha 1624 days ago
              Why not “Intel Architecture 64-bit” to be clear? You can even shorten it to “IA-64”.
              • roblabla 1624 days ago
                Unsure if this is a joke and I'm about to be wooshed, but for anyone who doesn't know:

                Unfortunately, IA-64 is already taken for the Itanium architecture. That's why the Intel Software Developer Manual names it IA-32e and not IA-64. IA-32 is x86, IA-32e is x86 Extended, or x86_64.

                Thanks, Intel, I guess.

          • ajb 1624 days ago
            x64_broken. Tell it like it is.
    • smitty1e 1625 days ago
      This is why we must protect our right to bear ARMs.
      • cesarb 1624 days ago
        ARM is no better. One of the most popular ARM boards (the Raspberry Pi) uses a core which is affected by an errata which also needs a workaround on the assembler/linker. I don't have my RPi with me at the moment, so I don't know the errata number (it's one of the two printed by the kernel during boot).
        • yabadabadoes 1624 days ago
          One errata on a Broadcom(?) chip makes ARM no better than Intel? I think RPi has had to explain a few times why they choose problematic chips.
          • cesarb 1624 days ago
            The errata is not from Broadcom, it's on the Cortex-A53 core from ARM it uses. From a quick search, it's probably errata 843419 (see https://static.docs.arm.com/epm048406/200/Cortex-A53_MPCore_...), worked around by passing the --fix-cortex-a53-843419 parameter to the linker.

            And like this Intel errata, it's instruction-address-dependent: one of the conditions to trigger it is when the relevant instruction is near the end of a 4K page. One workaround being "put the instruction somewhere else".
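
            A rough sketch of that address condition only. It assumes "near the end of a 4K page" means the last two 4-byte instruction slots (page offsets 0xff8 and 0xffc); that slot count is my assumption, the helper name is made up, and the real erratum has further conditions that are not modeled here:

```python
PAGE_SIZE = 4096  # 4K page, as described above
INSN_SIZE = 4     # AArch64 instructions are a fixed 4 bytes


def near_page_end(addr: int, slots: int = 2) -> bool:
    """Return True if the instruction at addr sits in the last `slots`
    instruction slots of its 4K page.

    Hypothetical helper; slots=2 is an assumption, and this covers only
    the address-dependent part of the trigger.
    """
    offset = addr % PAGE_SIZE
    return offset >= PAGE_SIZE - slots * INSN_SIZE
```

            A linker-side workaround would look for candidates with a check like this and then move or patch the instruction.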

            • yabadabadoes 1624 days ago
              Well, that is an ARM errata, but if I understand correctly, it basically only triggers if you drop caching in a way that only something privileged like a hypervisor is allowed to do, so this workaround is only needed in some specific code..

              The initial reports and secrecy around the Intel issue make it sound like they need to be mitigating all the time in normal modes.. But I guess it will take some time to see what is really going on.

  • mmastrac 1625 days ago
    This was pretty terrible. I wouldn't be surprised if this was responsible for a good number of heisenbugs on a few platforms.

    Not only can you not use a jump instruction ending on or crossing a 32-byte boundary, but you can't use a "macro-fused" pair of instructions like cmp+jump either! [1]

    The fix is _relatively_ simple [2] on affected platforms as they will likely be able to pad an appropriate level of NOPs to avoid that boundary before emitting a problematic set of instructions, but you will be taking a perf hit equivalent to losing a bunch of cache memory to useless NOPs and a (likely small) amount of power/time to skip those NOPs in the instruction stream.

    Where this gets ugly: people shipping binaries will have the choice of mitigating this for everyone (padding out code+small perf impact on non-affected platforms) or hitting a very slow code path on affected systems.

    [1] https://www.intel.com/content/dam/support/us/en/documents/pr...

    [2] Edit: the recommended fix from the whitepaper appears to be padding out earlier instructions via a "benign" prefix rather than adding NOPs.
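
    As a minimal sketch of the padding arithmetic (not Intel's or any assembler's actual algorithm; the function name is made up), this computes the smallest number of bytes a jump has to be shifted forward so that it neither crosses nor ends on a 32-byte boundary. The count is the same whether the bytes come from NOPs or from extra prefixes on earlier instructions:

```python
def padding_needed(addr: int, length: int, boundary: int = 32) -> int:
    """Smallest shift (in bytes) so that an instruction of `length` bytes,
    originally at `addr`, neither crosses nor ends on a `boundary`-byte
    boundary. Hypothetical helper for illustration only.
    """
    for pad in range(boundary):
        start = addr + pad
        end = start + length  # address of the byte after the instruction
        crosses = start // boundary != (end - 1) // boundary
        ends_on = end % boundary == 0
        if not (crosses or ends_on):
            return pad
    raise ValueError("instruction cannot fit strictly inside one block")
```

    For a 2-byte jump at 0x1e this returns 2, and for a 6-byte jump at 0x1b it returns 5, which gives a feel for how many bytes of code size each fix-up can cost.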

  • lousken 1624 days ago
    Intel is becoming a meme with those 30-50 page errata in their documents. When they can't improve IPC or nm tech, why don't they at least focus on fixing bugs?
    • raxxorrax 1624 days ago
      Well, if they fix this bug the hardware will become slower in any case. Honestly I don't understand the issue and wonder why it didn't manifest constantly.

      > unpredictable behavior could happen when jump instructions cross cache lines

      Under which kind of circumstances? What does it mean anyway? The jump-instruction itself with arguments crossing cache lines? Wouldn't that happen quite often?
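
      On the "how often" part, a back-of-the-envelope sketch. It uses the 32-byte boundary from the MCU text quoted elsewhere in this thread, and it assumes jumps land at uniformly random offsets, which real code layout does not guarantee. The function name is made up:

```python
def affected_fraction(length: int, boundary: int = 32) -> float:
    """Fraction of start offsets within a block at which an instruction of
    `length` bytes crosses or ends on the boundary.

    Assumes every offset is equally likely (an idealization).
    """
    hits = 0
    for offset in range(boundary):
        end = offset + length  # offset of the byte after the instruction
        crosses = (end - 1) // boundary != 0
        ends_on = end % boundary == 0
        if crosses or ends_on:
            hits += 1
    return hits / boundary
```

      Under that assumption the fraction works out to length/32, so a 2-byte short jump is hit at 2 of 32 offsets and a 6-byte near Jcc at 6 of 32. So yes, it would happen quite often.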

  • giomasce 1625 days ago
    I am not really following the details, but it seems that Intel is making one blunder after another. Will we ever be able to trust their CPUs again?
    • orbital-decay 1625 days ago
      We have been able to, and pretty quickly, after the F00F and FDIV bugs.
      • giomasce 1624 days ago
        Yeah, but those two were two isolated issues in a decade. Now we're having a continuous stream of microarchitectural bugs that require more and more complicated fixes in toolchain, kernel and microcode. I am an outsider, but the impression is that the thing is exploding.
        • mschuster91 1624 days ago
          > I am an outsider, but the impression is that the thing is exploding.

          To be honest it's the entire architecture that's exploding. There's a reason why ARM has all but taken over the smartphone market (and it's not just power efficiency), the only thing keeping x86 alive is the massive demand for backwards compatibility. Good riddance to x86/x64 once someone makes an ARM core capable of current Ryzen performance and attaches a runtime binary translator to it.

          • Crinus 1624 days ago
            > Good riddance to x86/x64 once someone makes an ARM core capable of current Ryzen performance and attaches a runtime binary translator to it.

            What you expect to happen: developers will target fastARM for new code and users will use the x86/x64 translator for existing/old code.

            What will really happen: outside of open source circles (where they do not need the translator anyway) developers will keep targeting x86/x64 for new code since their existing users will keep using their existing x86/x64 machines and anyone using the new fastARM machines will still be compatible thanks to the translator.

            This of course assumes that fastARM will be fast enough for the x86/x64 translator to provide better performance than a real x86/x64 machine. Otherwise users won't have much of an incentive to upgrade, as all of their existing software will become slower. At least unless they are forced to (see Apple).

  • executesorder66 1625 days ago
    What is the purpose of "censoring" the word shitstorm, if it is undoubtedly obvious which word you meant to the point that it makes no difference either way?

    Or to phrase my question differently:

      Wh*t is t*e pur**se of "cens*ring" th* w*rd sh**storm, *f *t is u*doub*edly ob*ious wh*ch wo*d y*u me*nt to t*e poin*t th*t i* mak*s n* di**erence ei*her w*y?
    • bil7 1625 days ago
      seems to me like you are questioning the whole concept of censoring expletives
      • chongli 1625 days ago
        I think the GP’s point is this: if you’re against the use of an expletive, then don’t use it. Putting it behind a fig leaf is embarrassing and silly.
      • lidHanteyk 1625 days ago
        Sure, Bowdlerization is harmful. It is a lossy transformation, it doesn't protect children, and ultimately the only people protected are the sort of people that don't want other people to be free to explore the world, leading to censorious and stifling behaviors.
    • grzte 1625 days ago
      Americanisms. Like how European reality shows are starting to bleep expletives when that's completely unnecessary here.
  • stefan_ 1625 days ago
    Instead of another fucking Twitter thread, just read the article from Phoronix from yesterday:

    https://www.phoronix.com/scan.php?page=news_item&px=GNU-Asse...

  • platz 1625 days ago
    I don't appreciate the aggressiveness and clear attempt to induce outrage with the title. This is the outrage algorithm working.

    * Edit - the previous title referenced a Twitter thread, not phoronix. The new title is fine.

    • omnimus 1625 days ago
      I do appreciate it. It's quite serious.
      • platz 1624 days ago
        Good for you. Please don't send any more "s__tstorms" my way, deal with them yourself, thanks.