What can we learn from the matrix.org compromise?

(medium.com)

84 points | by cyber 1827 days ago

11 comments

  • pm90 1827 days ago
    This is such a poorly written article:

    * no detailed analysis of how the attack was undertaken. It's not even clear how the attacker managed to get in (was it a publicly exposed Jenkins? a vulnerable bastion? what?)

    * no analysis of what the existing matrix.org security perimeter looked like or how it could be made better.

    * repetition of security tropes: use a VPN, use GitHub Enterprise (wait, wtf? Why not private repos on GitHub?), don't use Ansible, use Salt.

    Ridiculous. I was looking forward to a nice long read about how this breach was undertaken. Hugely disappointed.

    • bifrost 1827 days ago
      If you click through to the GH issues I linked to, there are some pretty good data points as to what happened. I didn't feel the need to copypasta.

      But yes, publicly exposed jenkins and repos lead to the compromise, not an uncommon story unfortunately.

      Perimeter - I didn't see much evidence of one existing and I didn't go probing their networks to find out.

      Security tropes are real for a reason, you don't have to believe me though.

      Private repos in GitHub are still publicly hosted and are orders of magnitude easier to get into than an in-perimeter repo. They've leaked before and they'll keep on leaking. GitHub even made it harder for people to fork private repos to their own public accounts but it still happens.

      • pm90 1827 days ago
        > They've leaked before and they'll keep on leaking. GitHub even made it harder for people to fork private repos to their own public accounts but it still happens

        Can you provide some actual instances of this happening? Genuinely curious, as my org is currently migrating from enterprise to cloud.

        • bifrost 1827 days ago
          I've mostly seen this reported in forums and during discussions; if you Google around, you'll find some pretty useful hits.

          Here's a good one from reddit: https://www.reddit.com/r/github/comments/9odnvw/someone_fork...

          It's also discussed reasonably well in the infosec community. Basically, GitHub is a great place to find other people's passwords and API keys.

          • baroffoos 1827 days ago
            That's unrelated to GitHub though. It sounds like the person did a git clone and then created a new repo and pushed it. You could do that with a self-hosted git repo as well. To stop that, you would have to have your git server block logins from non-company machines and have some serious logging on all company machines to stop anyone moving it off via USB.
          • bifrost 1827 days ago
            baroffoos: That's pretty close to what I'm suggesting: no public repo access. It works.
      • bobwaycott 1827 days ago
        > But yes, publicly exposed jenkins and repos lead to the compromise …

        You mean the past-tense verb led, not its metallic homonym lead. :)

    • nobatron 1827 days ago
      There's a lot wrong with this article.

      Firstly, having a private network for your infrastructure isn't a one-stop solution for keeping attackers out.

      Secondly, using GitHub Enterprise or self-hosted GitLab doesn't make up for storing secrets in Git.

      Looking forward to the proper write-up.

      • bifrost 1826 days ago
        I've never claimed it was a "one-stop" solution, but it certainly keeps random internet users to a minimum.

        And yes, using GHE or self-hosted GitLab doesn't make up for storing secrets, but it at least keeps them out of the public eye, so the effects are less brutal. It's still bad to store secrets in a code repository.

        My whole point is that you can reduce risks easily, yet some people don't for some reason.

    • netsectoday 1827 days ago
      * This idiot claimed "Ansible was used to keep the attacker in the system", when in reality Ansible did what it was supposed to do by altering the correct authorized_keys file; the attacker leveraged an old default in the sshd config. This is an sshd config issue, not an Ansible one.

      The sales-pitch for Salt (against Ansible) is ridiculous and misguided.

      I just checked out the Salt SSH module, and even if they used Salt they would still have this issue. The answer here is to not use the default /etc/ssh/sshd_config value of #AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2. Uncomment it and remove authorized_keys2.
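
      Roughly, the fix looks like this in /etc/ssh/sshd_config (a sketch, not a drop-in; exact defaults vary by OpenSSH version):

          # the commented-out default still applies if left alone:
          #AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2

          # only honor the one file your CM actually manages:
          AuthorizedKeysFile .ssh/authorized_keys

      Then reload sshd (e.g. systemctl reload sshd).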

      • viralpoetry 1827 days ago
        Why are you downvoted? Your Ansible explanation is indeed correct. It's an SSH default problem, not a deployment toolkit one.
  • KirinDave 1827 days ago
    Why aren't people reporting the fact that Matrix.org actually lost control of their network a second time, within hours of sounding their first all-clear?

    I feel like this is an important part of the story for anyone looking for teachable infosec moments.

    • bifrost 1827 days ago
      I guess I technically glossed over that, but I did say "One of the more interesting pieces of this was how Ansible was used to keep the attacker in the system". The attacker was persisted via CM and their public repo; I'm actually surprised this doesn't happen more often.
      • bifrost 1827 days ago
        I should clarify this comment a bit since it seems to be the most controversial.

        When I say the attacker was persisted via CM, I'm pointing at his own notes, nodding to broken CM, the requirements of supporting the CM, and the availability of the config files.

        I also sanity-checked the sshd_config file on my systems; they're all set to a sane default:

        "AuthorizedKeysFile .ssh/authorized_keys"

        FWIW I prefer to treat CM data as "valuable" information for this reason.

    • driminicus 1827 days ago
      Because the second time was a DNS hijack, not a network compromise. I'm a little fuzzy on the details, but it had something to do with Cloudflare's API not revoking some access token.

      Either way, a DNS hijack is not great, but not nearly as bad as the initial compromise.

      • bifrost 1827 days ago
        It wasn't Cloudflare's API failing to revoke a token; they just didn't revoke all the tokens. Basically human error.

        "The API key was known compromised in the original attack, and during the rebuild the key was theoretically replaced. However, unfortunately only personal keys were rotated, enabling the defacement."

      • KirinDave 1827 days ago
        See, I'd like to know more too.
    • Arathorn 1827 days ago
      The rebuilt infra wasn’t compromised; what happened was that we rotated the cloudflare API key whilst logged into CF with a personal account but then masquerading as the master admin user. Turns out that rotating the API key rotates your personal one, not the one you’re masquerading as, and we didn’t think to manually compare the secret before confirming it had the right value. Hence the attacker was able to briefly hijack DNS to their defacement site until we fixed it.

      We will write this up in a full postmortem in the next 1-2 weeks.

  • Arathorn 1827 days ago
    If it wasn’t clear, this article wasn’t written by the Matrix.org team, nor did the author discuss any of it with us to our knowledge.

    We’ll publish our own full post-mortem in the next 1-2 weeks.

    • Arathorn 1827 days ago
      Also, reading this article more carefully, much of it is just plain wrong:

      > One of the more interesting pieces of this was how Ansible was used to keep the attacker in the system.

      Fwiw the infra that was compromised was not managed by Ansible; if it had been we would likely have spotted the malicious changes much sooner.

  • nisa 1827 days ago
    It's been a few years since I last used SaltStack, but if you have access to the master you have instant root on all minions, or did that somehow change? salt '*' cmd.run 'find / -delete' and game over?
    • bifrost 1827 days ago
      Very true; however, I'd rather have that problem than an ever-multiplying number of user accounts that can su/sudo on systems.
      • verdverm 1827 days ago
        Make golden images with Packer, or something similar, and then roll your fleet over.

        You should not be running package managers on production servers, or doing any of the other things Salt, Ansible, Chef, or Puppet can do there.
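
        A rough sketch of that flow (tool choice and file names here are just placeholders):

            # bake a golden image; all provisioning happens once, at build time
            packer build golden.json

            # then roll the fleet onto the new image instead of mutating running
            # hosts (the exact mechanism depends on your platform, e.g. swapping
            # the image in an autoscaling group or re-creating VMs from it)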

        • yjftsjthsd-h 1827 days ago
          As in, no human should run a package manager in prod? (But salt/ansible/etc. running it is fine) Same idea as "if you're SSHing to prod, something is wrong" (where provisioning tools make all changes, logs are all aggregated and delivered in their own tool, and even debugging is built into the app or logging system).
          • verdverm 1827 days ago
            More as, you should not modify images running in production. By human or machine.

            Rather build new images and roll over the fleet. If you need to debug, remove from production (quarantine) and work on it there.

            Don't run master / agent setups for ansible / salt anymore. You can still use them for creating images, which are later turned into running VMs. Think about it like containers. Do you update the contents of your running containers, or log into your containers to make changes?

            Better yet, use OSes that cannot be modified.

            • Spivak 1827 days ago
              But golden images on Linux are, well, messy. It's very annoying to make a clean VM template without some post-provisioning like cloud-init. And for most shops if you're running cloud-init you could do that post-provisioning with Ansible or Salt. And since your images are built with Ansible/Salt in the first place you might as well just build each VM fresh and use the vendor's ISO. One less thing to maintain and update.

              Plus when you're in a pinch, which never happens of course, you can make changes without having to roll your VMs.

              I feel like Atomic distributions are basically the happy medium between the two worlds.

              • verdverm 1827 days ago
                It's a trade-off. The point is that the Matrix security lapse turned worse because they ran this master/agent setup. You can still use Ansible (or similar), just run it against localhost during the build process, e.g. the sketch below.
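
                Concretely, that can be as simple as a provisioning step like this inside your image build (the playbook name is just an example):

                    # runs inside the image being built: no master, no agents
                    ansible-playbook -i localhost, -c local site.yml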

                Yeah, it's easier not to do these things, because good security posture takes work to set up. Once you're on the immutable train, you'll find it's not actually harder day to day. You learn to deal with issues in a pinch another way.

                On the point of building VMs fresh each time vs. building golden images: with golden images you'll find your boot time reduced, your rollover more reliable, and autoscaling more responsive. Why build the same thing dozens or hundreds of times? What happens if a remote package is updated in the middle of your upgrade? Does this sound messier to you?

        • bifrost 1827 days ago
          I can tell you this is not rare (humans in prod). Ansible often enables this behavior rather than removing it.
      • _frkl 1827 days ago
        How does saltstack do tasks that require root access? Use the root user directly?
        • bifrost 1827 days ago
          Correct, the minion runs as root and doesn't require interactive SSH access. It's controlled by a remote master that you ideally protect properly.
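
          Roughly, the moving parts: the minion daemon (root) is pointed at the master in /etc/salt/minion, and the master has to accept the minion's key before it can push anything (hostnames below are made up):

              # /etc/salt/minion
              master: salt-master.internal.example

              # on the master, accept the new minion's key:
              #   salt-key -a <minion-id>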
          • throwaway_391 1827 days ago
            So it's a less-audited application than *SSH that the author is recommending over SSH, because it doesn't require user authentication but runs as a daemon with root privs?
            • bifrost 1827 days ago
              Not quite. In this case Ansible is enabled by a user logging into the system; the user should not be allowed to log in to the system in the first place. Ideally you want your configuration system independent of user logins. Ansible has its place; my argument is that it doesn't belong here.
  • ubercow13 1827 days ago
    Why is it considered safer to expose a VPN to the internet than SSH? Is it just that there is one exposed service for the organisation rather than one per machine?
    • bifrost 1827 days ago
      SSH tunneling is handy, but if you want to push anything else over it, it's a pain for the "layperson". You're not going to have a great time supporting people with it. I've done it; it sucks. Scripts and special SSH config files are the pits. VPNs are way easier: they can support multiple access levels and roles, are often not blocked by other people's packet filters and firewalls, and the good ones can even validate that a host is in "compliance" before it's allowed onto the network.
    • closeparen 1827 days ago
      You can expose one SSH box per organization (a “bastion”) and deploy SSH configs to clients that make it look like you have direct access to the hosts behind it.
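
      A minimal sketch of that in ~/.ssh/config (host names are made up; ProxyJump needs a reasonably recent OpenSSH):

          Host bastion
              HostName bastion.example.com
              User alice

          # hosts on the private network are reached through the bastion
          Host 10.0.*
              ProxyJump bastion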
    • acct1771 1827 days ago
      That'd probably be a solid question for the people implementing WireGuard in the Linux kernel / supporting it to cover.
  • krupan 1827 days ago
    Can anyone explain the Jenkins vulnerability that was used to initially gain access? Reading the CVEs didn't give me the impression that they enabled remote exploits
    • bifrost 1827 days ago
      My 5 second lazy summaries of the CVEs:

      CVE-2019-1003001, CVE-2019-1003002 -> Anyone with read access to Jenkins can own the build environment.

      CVE-2019-1003000 -> I didn't get a lot of the details on this but it basically looks like "broken sandboxing, you can run bad scripts".

      This is also a good resource: https://packetstormsecurity.com/files/152132/Jenkins-ACL-Byp...

  • zimbatm 1827 days ago
    The attacker gained network access through Jenkins.

    Don't deploy a public-facing Jenkins, especially if it has credentials attached to it. It's really hard to secure, especially if pull requests can run arbitrary code on your agents.

    Jenkins / CI is the sudo access to most organizations.

    • bifrost 1826 days ago
      I agree with you 100% here; I would not deploy any CI publicly unless it's heavily fenced off into "read-only" territory.
  • r1ch 1827 days ago
    One thing I learned was where to modify the Pageant source code (the Windows equivalent of ssh-agent) to make my agent prompt before signing (with the default focus on "no"). This feels much safer and is a very minor inconvenience. I wonder why more agents don't have this built in.

    Example: https://twitter.com/R1CH_TL/status/1118559239084158977

  • forgotmypw 1826 days ago
    I'd like to take this opportunity to plug my in-development decentralized, distributed, completely open forum, using PGP as the "account" system, and text files as the data store.

    So any reasonably competent hacker can re-validate the entire forum's content and votes, reasonably quickly reimplement the whole thing, and/or fork the forum at any time.

    http://shitmyself.com/

    • ficklepickle 1826 days ago
      This is very interesting! I have so many questions. If you see this, kindly send me an email. It's in my profile. I love the idea!
    • bifrost 1826 days ago
      Very Cool! I'll check it out!
  • mjevans 1827 days ago
    That medium.com has a paywall and doesn't want to share content? (is what I learned)
  • inetknght 1827 days ago
    I have gone on some long verbal rants about the dark patterns (bordering on malicious behavior) exhibited by key agents such as SSH agent, GPG agent, Pageant, and the like.

    What can you learn from the compromise? Never use an agent. Kill it with fire^H^H^H^H -9.

    • bifrost 1827 days ago
      The attacker still would have gotten their key in. TBH, if you kill the agent, people are just going to copy their keys around with no passphrases. Ask me how I know...
      • inetknght 1826 days ago
        If only a password manager could add a sane agent and UI
        • bifrost 1826 days ago
          Someday we will have tools that save us from human problems!
    • nine_k 1827 days ago
      How about using hardware tokens instead? With the right setup, private keys never leave the token.
      • inetknght 1827 days ago
        Hardware tokens are pretty alright until you need to use GPG-agent to enable their use.
      • bifrost 1827 days ago
        Hardware tokens are great but most people don't know how to use them, so they don't.
      • scurvy 1827 days ago
        Smart cards? They were designed for this.
        • bifrost 1827 days ago
          Ever seen anyone working in a coffee shop using one? Me neither.

          Good security technology exists; the problem is that people don't want to use it because it's easier to ignore it.

          • andrewshadura 1827 days ago
            I have, and I myself used one until it was stolen from me along with the bag it was in.
    • yjftsjthsd-h 1827 days ago
      Okay, I'll bite. What are you calling a dark pattern in assorted agents? Especially given that dark pattern implies intent to harm. (And I say this as someone looking at using an agent: If there's a gotcha, I'd like to know about it)
      • inetknght 1827 days ago
        1) SSH agent will cache your passphrase. While that's the whole purpose of SSH agent, remember that nothing is more insecure than an unlocked secret.

        2) SSH agent often starts automatically, frequently without user interaction (even if you specify `-i keyfile`). The SSH client and DBus are both culprits here; there are other culprits too.

        3) There are often multiple different agents installed on Linux desktop systems. For example, ssh-agent, gnome-keyring, seahorse, gpg-agent ... the list goes on. Good luck auditing that.

        4) Without `-i keyfile`, SSH will present all the keys cached in your agent to the remote server, in sequence (and will cause trouble with active firewalls from too many authentication attempts).

        5) If the keyfile you specified with `-i keyfile` does not authenticate, then SSH will fall back to using keys cached in your agent. That's especially frustrating, since you might want to know that the key you specified was rejected!

        6) Removing the executable flag from ssh-agent is not a permanent solution: updates will often overwrite the program with a new file and reset the executable bit. Obviously the same goes for renaming the program (that one causes a hell of a lot more noise in logs, btw; programs seem to complain more when a program can't be found than when it just isn't executable).

        7) See also (related) concerns I posted about GPG agent on Stack Overflow [1]

        Last, but not least: 8) Hope you don't use a system where agent forwarding or agent caching is turned on in the system settings!

        [1] https://stackoverflow.com/q/47273922/1111557

        • naniwaduni 1827 days ago
          > 1) SSH agent will cache your passphrase. While that's the whole purpose of SSH agent, remember that nothing is more insecure than an unlocked secret.

          There's one thing more insecure than an unlocked secret: a "secret" sitting in plain text on the filesystem.

          Which is a common outcome if you advise people against using an agent and they don't share your opsec priorities.

        • tialaramex 1827 days ago
          For (4) and (5) set IdentitiesOnly as well as Identity (IdentityFile or IdentityAgent). This tells SSH that you've specified the exact identity you want used, not just a hint at an identity that might help.
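
          In ~/.ssh/config that combination looks something like this (host and key names are examples):

              Host prod-box
                  HostName prod-box.example.com
                  IdentityFile ~/.ssh/id_ed25519_prod
                  IdentitiesOnly yes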

          Note that having "trouble with active firewalls" is a sign that the security posture is garbage; those aren't "authentication attempts". The SSH protocol explicitly has a step where the client proposes authentication keys it's interested in trying WITHOUT authenticating. Counting each such key as an "attempt" is like counting how many keys a person has in their pockets and arresting them for attempted burglary if they have more than ten.

          • inetknght 1827 days ago
            That's an interesting perspective. Nonetheless, if you load up your agent with a dozen keys and try to log in to a remote server, it will deny you after (typically) three keys have been presented. That will show up in the logs as a failed login attempt, and something such as fail2ban will then spot the failed attempts and take action.

            Edit: +1 about IdentitiesOnly and Identity. I use that in my ssh_config, particularly when I need to alias one name to another.

            • tialaramex 1826 days ago
              By default OpenSSH _logs_ after three keys but it only gives up (if you don't have a fail2ban script blowing everything up) after six keys. And you can reconfigure the server as appropriate, unlike whatever this "active firewall" (which by the sound of things may just be a fail2ban script) does.

              This is a bad fail2ban script: it's inconveniencing real users rather than targeting the bots you care about, since those are doing password guessing anyway.
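
              The server-side knob, if you do want to tune it, is in sshd_config:

                  # max authentication attempts per connection; OpenSSH's default
                  # is 6, and failures are logged once half of them are used up
                  MaxAuthTries 6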

        • andrewshadura 1827 days ago
          Seahorse is not an agent.
          • inetknght 1827 days ago
            https://linux.die.net/man/1/seahorse-daemon

            Seahorse is a GNOME application for managing encryption keys. This is the daemon program which provides services to other parts of Seahorse, and shares your keys over the network if so desired.

            It doesn't have agent in its name but it sure sounds like the behavior of a key agent.

      • giggles_giggles 1827 days ago
        If you use ssh-agent with default settings, it's very easy to accidentally expose access, via the agent, to systems you would not expect.

        This seems to be a good post about the problem: https://heipei.io/2015/02/26/SSH-Agent-Forwarding-considered...

        The key takeaway is that using ssh -A with default settings allows root on the system you've connected to "to impersonate you to any host as long as you’re connected".

        • yjftsjthsd-h 1827 days ago
          Ah, okay; so it's not the agent that's the problem, but the agent with forwarding. That is a fair point, and probably needs saner defaults or messaging. That said, since I don't use forwarding that should be fine for me.
          • deathanatos 1827 days ago
            Agent forwarding defaults to off, AFAIK. You have to ask for it specifically by either requesting it at the CLI with -A, or adding it to .ssh/config.

            It would be nice if people understood the consequences of it, and I do find in my conversations with developers that people generally do not understand that by forwarding the agent, anyone with sufficient access on the remote can use the agent. (E.g., another user w/ sudo.)
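
            Concretely, a root user on the remote end can just borrow the forwarded socket, something along these lines (the socket path varies per session):

                # point at the victim's forwarded agent socket...
                export SSH_AUTH_SOCK=/tmp/ssh-XXXXXXXX/agent.12345
                ssh-add -l                 # ...list their loaded keys
                ssh victim@another-host    # ...and authenticate as them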

            • tialaramex 1827 days ago
              It would be nice if GUI desktop environments that already have miscellaneous notification APIs would give you a transient notification when the agent gets a request. That's a low impact change (you can just ignore it) that highlights to users what's actually going on. It improves security passively by giving users awareness.

              Your agent _should_ always be where you are (ie not inside a container, a bastion host, or whatever else) because otherwise that means you aren't actually in possession of the key material and there's plenty of opportunity for much _worse_ surprises than with SSH agents if somebody else has the key material.

              Because it's where you are, and you're probably not on a 1970s video terminal link but a laptop or something, the agent could just ask you to OK each request out of band, e.g. popping up a "Really log into machine X?" request. Once such a mechanism existed it could be refined (should it let you say "Yes always to requests from machine X" ? How about "Yes always for the next five minutes" ?) and if necessary SSH auth could even be tweaked to better support any real world behaviours that are popular (e.g. I don't recall off the top of my head if the agent can tell from what it's signing either where you're signing in, or where that sign-in was used, but the binding mechanism in SSH auth could certainly enforce either of those if they're determined to be important and don't exist today)
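
              (For what it's worth, OpenSSH's agent already has a crude version of this: keys added with the confirmation flag trigger an ssh-askpass prompt on every use, though it's nowhere near the UX described above.)

                  # ask before every use of this key
                  ssh-add -c ~/.ssh/id_ed25519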

              • inetknght 1827 days ago
                It's not just about giving you a transient notification when the agent gets a request. I'd go several steps further:

                1) clearly display which "local" machine is making the connection, and under which user

                2) clearly display which remote machine the local machine is connecting to, and as which user

                3) allow me to select a specific key for that connection pair, and only present one key to the remote

                4) if the key is unlocked (or, gasp, not passphrase-protected), then allow me to accept/deny the agent request

                5) give me some mechanism to permanently disable the agent for my user if I decide I don't want to risk some software "accidentally" forwarding an agent (pebcak, bug, malice, whatever)

          • schoen 1827 days ago
            And probably it should have been described as an antipattern rather than a dark pattern.
            • inetknght 1827 days ago
              I think you're correct; that would be a better description. Unfortunately, I think it's too late to edit my original comment: the edit link doesn't show up for me.
            • bifrost 1827 days ago
              I'll vote yes for this one :)
        • bifrost 1827 days ago
          Yep! Or worse if you're in a VM/Container...