* no detailed analysis of how the attack was undertaken. It's not even clear how the attacker managed to get in (was it a publicly exposed Jenkins? vulnerable bastion? what?)
* no analysis of what the existing matrix.org security perimeter looked like or how it could be made better.
* repetition of security tropes. Use VPN. Use Github Enterprise (wait wtf? Why not private repos in Github?). Don't use Ansible, use salt.
Ridiculous. I was looking forward to a nice long read about how this breach was undertaken. Hugely disappointed.
If you click through to the GH Issues I linked to there are some pretty good data points as to what happened. I didn't feel the need to copypasta.
But yes, publicly exposed jenkins and repos lead to the compromise, not an uncommon story unfortunately.
Perimeter - I didn't see much evidence of one existing and I didn't go probing their networks to find out.
Security tropes are real for a reason, you don't have to believe me though.
Private repos in GitHub are still publicly hosted and are orders of magnitude easier to get into than an in-perimeter repo. They've leaked before and they'll keep on leaking. GitHub even made it harder for people to fork private repos to their own public accounts but it still happens.
> They've leaked before and they'll keep on leaking. GitHub even made it harder for people to fork private repos to their own public accounts but it still happens
Can you provide some actual instances of this happening? Genuinely curious, as my org is currently migrating from enterprise to cloud.
That's unrelated to GitHub though. It sounds like the person did a git clone and then created a new repo and pushed it. You could do that with a self-hosted git repo as well. To stop that you would have to have your git server block logins from non-company machines and have some serious logging on all company machines to stop anyone moving it off via USB.
I've never claimed it was a "one stop", but it certainly keeps the random internet users to a minimum.
And yes, using GHE or self hosted GitLab doesn't make up for storing secrets, but it at least keeps them out of the public eye so the effects are less brutal. It's still bad to store secrets in a code repository.
My whole point is that you can reduce risks easily, yet some people don't for some reason.
* this idiot claimed "Ansible was used to keep the attacker in the system", when in reality Ansible did what it was supposed to do by altering the correct authorized_keys file, and the attacker leveraged an old default in the sshd config. This is an sshd config issue, not an Ansible one.
The sales-pitch for Salt (against Ansible) is ridiculous and misguided.
I just checked out the Salt SSH module and even if they used Salt they would still have this issue. The answer here is to not use the default /etc/ssh/sshd_config value of `#AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2`. Uncomment it and remove authorized_keys2.
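For concreteness, here's a minimal sketch of that hardening step. It runs against a throwaway sample file; on a real host you'd edit /etc/ssh/sshd_config and reload sshd.

```shell
cfg=./sshd_config.sample

# Reproduce the risky default: the line is commented out, but it's still
# exactly what sshd uses when no directive is set.
cat > "$cfg" <<'EOF'
#AuthorizedKeysFile .ssh/authorized_keys .ssh/authorized_keys2
EOF

# Set the directive explicitly and drop the authorized_keys2 fallback:
printf 'AuthorizedKeysFile .ssh/authorized_keys\n' > "$cfg"

if grep -q 'authorized_keys2' "$cfg"; then
    echo "legacy authorized_keys2 fallback still present"
else
    echo "hardened: sshd will only read .ssh/authorized_keys"
fi
```

After changing the real config, `sshd -t` validates it before you reload.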
Why aren't people reporting the fact that Matrix.org actually lost control of their network a second time within hours of their first all clear sounding?
I feel like this is an important part of the story for anyone looking for teachable infosec moments.
I guess I technically glossed over that but I did say "One of the more interesting pieces of this was how Ansible was used to keep the attacker in the system".
The attacker was persisted via CM and their public repo, I'm actually surprised this doesn't happen more often.
I should clarify this comment a bit since it seems to be the most controversial.
When I say the attacker was persisted via CM, I'm pointing at the attacker's own notes, nodding to broken CM, the requirements of supporting the CM, and the availability of the config files.
I also sanity checked the sshd_config file on my systems, they're all set to a sane default:
"AuthorizedKeysFile .ssh/authorized_keys"
FWIW I prefer to treat CM data as "valuable" information for this reason.
Because the second time was a DNS hijack, not a network compromise. I'm a little fuzzy on the details, but it had something to do with Cloudflare's API not revoking some access token.
Either way, a DNS hijack is not great, but not nearly as bad as the initial compromise.
It wasn't Cloudflare's API failing to revoke a token; they just didn't revoke all the tokens. Basically human error.
"The API key was known compromised in the original attack, and during the rebuild the key was theoretically replaced. However, unfortunately only personal keys were rotated, enabling the defacement."
The rebuilt infra wasn’t compromised; what happened was that we rotated the cloudflare API key whilst logged into CF with a personal account but then masquerading as the master admin user. Turns out that rotating the API key rotates your personal one, not the one you’re masquerading as, and we didn’t think to manually compare the secret before confirming it had the right value. Hence the attacker was able to briefly hijack DNS to their defacement site until we fixed it.
We will write this up in a full postmortem in the next 1-2 weeks.
It's been a few years since I last used Saltstack but if you have access to the master you have instant root on all minions, or did that somehow change? `salt '*' cmd.run 'find / -delete'` and game over?
As in, no human should run a package manager in prod? (But salt/ansible/etc. running it is fine) Same idea as "if you're SSHing to prod, something is wrong" (where provisioning tools make all changes, logs are all aggregated and delivered in their own tool, and even debugging is built into the app or logging system).
More as, you should not modify images running in production. By human or machine.
Rather build new images and roll over the fleet. If you need to debug, remove from production (quarantine) and work on it there.
Don't run master / agent setups for ansible / salt anymore. You can still use them for creating images, which are later turned into running VMs. Think about it like containers. Do you update the contents of your running containers, or log into your containers to make changes?
But golden images on Linux are, well, messy. It's very annoying to make a clean VM template without some post-provisioning like cloud-init. And for most shops if you're running cloud-init you could do that post-provisioning with Ansible or Salt. And since your images are built with Ansible/Salt in the first place you might as well just build each VM fresh and use the vendor's ISO. One less thing to maintain and update.
Plus when you're in a pinch, which never happens of course, you can make changes without having to roll your VMs.
I feel like Atomic distributions are basically the happy medium between the two worlds.
It's a trade-off. The point is the Matrix security lapse turned worse because they ran this master / agent setup. You can still use ansible (or similar), just run it against localhost during the build process.
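A rough sketch of that masterless, build-time pattern (the playbook name and build script are made up; this just stages the script rather than actually running Ansible):

```shell
# Stage a build-time CM script. The playbook runs against the build VM
# itself; the VM is then snapshotted into an image, so production hosts
# never carry a CM agent or an inbound channel back to a master.
cat > ./build-image.sh <<'EOF'
#!/bin/sh
set -e
# masterless run: hypothetical site.yml applied to localhost only
ansible-playbook --connection=local --inventory 'localhost,' site.yml
# ...then snapshot this VM into an image and roll the fleet onto it
EOF
chmod +x ./build-image.sh
echo "build script staged"
```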
Yea it's easier to not do these things, because good security posture takes work to set up. Once you're on the immutable train, you'll find it's not actually harder day to day. You learn to deal with issues in the pinch another way.
On the point of building VMs fresh each time vs building golden images, you'll find your boot times reduced, your rollovers more reliable and autoscaling more responsive. Why build the same thing dozens or hundreds of times? What happens if a remote package is updated in the middle of your upgrade? Does this sound messier to you?
So it's a less audited application than *SSH that the author is recommending over SSH because it doesn't require user authentication but runs in a daemon with root privs?
Not quite.
In this case Ansible is enabled by a user logging into the system; the user should not be allowed to log in to the system in the first place. Ideally you want your configuration system independent of user logins.
Ansible has its place, my argument is that it doesn't belong here.
Why is it considered safer to expose a VPN to the internet than SSH? Is it just that there is one exposed service for the organisation rather than one per machine?
SSH tunneling is handy but if you want to push anything else over it, it's a pain for the "layperson". You're not going to have a great time supporting people with it. I've done it, it sucks. Scripts and special SSH config files are the pits. VPNs are way easier, they can support multiple access levels and roles, are often not blocked by other people's packet filters and firewalls and the good ones can even validate that a host is in "compliance" before they're allowed onto the network.
You can expose one SSH box per organization (a “bastion”) and deploy SSH configs to clients that make it look like you have direct access to the hosts behind it.
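Roughly, the client side of that bastion setup looks like the config below. Host names are invented, and it's written to a local file here instead of ~/.ssh/config to keep the example side-effect free.

```shell
cat > ./ssh_config.example <<'EOF'
Host bastion
    HostName bastion.example.com
    User alice
    ForwardAgent no            # don't expose the agent to a shared host

# Everything behind the bastion looks like direct access to the client:
# "ssh db1.internal.example.com" transparently hops through the bastion.
Host *.internal.example.com
    ProxyJump bastion
EOF
echo "example config written"
```

`ProxyJump` (OpenSSH 7.3+) replaces the older `ProxyCommand ssh -W` dance and composes cleanly with per-host settings.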
Can anyone explain the Jenkins vulnerability that was used to initially gain access? Reading the CVEs didn't give me the impression that they enabled remote exploits
The attacker gained network access through Jenkins.
Don't deploy a public-facing Jenkins, especially if it has credentials attached to it. It's really hard to secure, especially if pull-requests can run arbitrary code on your agents.
Jenkins / CI is the sudo access to most organizations.
One thing I learned was where to modify the pageant source code (Windows equivalent of ssh-agent) to make my agent prompt before signing (with the default focus on "no"). This feels much safer and is a very minor inconvenience. I wonder why more agents don't have this built in.
I'd like to take this opportunity to plug my in-development decentralized, distributed, completely open forum, using PGP as the "account" system, and text files as the data store.
So any reasonably competent hacker can re-validate the entire forum's content and votes, reasonably quickly reimplement the whole thing, and/or fork the forum at any time.
I have gone on some long verbal rants about the dark patterns (bordering on malicious behavior) exhibited by key agents such as SSH agent, GPG agent, Pageant, and the like.
What can you learn from the compromise? Never use an agent. Kill it with fire^H^H^H^H -9.
The attacker still would have gotten their key in.
TBH if you kill the agent people are just going to copy their keys with no passphrases around. Ask me how I know...
Okay, I'll bite. What are you calling a dark pattern in assorted agents? Especially given that dark pattern implies intent to harm. (And I say this as someone looking at using an agent: If there's a gotcha, I'd like to know about it)
1) SSH agent will cache your passphrase. While that's the whole purpose of SSH agent, remember that nothing is more insecure than an unlocked secret.
2) SSH agent often starts automatically, frequently without user interaction (even if you specify `-i keyfile`). SSH client and DBus are both culprits here; there are other culprits too.
3) There are often multiple different agents installed on Linux desktop systems. For example, ssh-agent, gnome-keyring, seahorse, gpg-agent ... the list goes on. Good luck auditing that.
4) Without `-i keyfile`, SSH will present to the remote server all keys, in sequence, cached in your agent (and will cause trouble with active firewalls from too many authentication attempts)
5) If the keyfile you specified in `-i keyfile` does not authenticate, then SSH will fall back to using keys cached in your agent. That's especially frustrating since you might want to know that the key you specified was rejected!
6) Removing the executable flag from ssh-agent is not a permanent solution: updates will often overwrite the program with a new file and reset the executable bit. Obviously the same goes for renaming the program (that one causes a hell of a lot more noise in logs btw; programs seem to complain more if a program can't be found instead of just not being executable)
7) See also (related) concerns I posted about GPG agent on Stack Overflow [1]
Last, but not least:
8) Hope you don't use a system where agent forwarding or agent caching is turned on in the system settings!
> 1) SSH agent will cache your passphrase. While that's the whole purpose of SSH agent, remember that nothing is more insecure than an unlocked secret.
There's one thing more insecure than an unlocked secret: a "secret" sitting in plain text on the filesystem.
Which is a common outcome if you advise people against using an agent and they don't share your opsec priorities.
For (4) and (5) set IdentitiesOnly as well as Identity (IdentityFile or IdentityAgent). This tells SSH that you've specified the exact identity you want used, not just a hint at an identity that might help.
Note that having "trouble with active firewalls" is a sign that the security posture is garbage; those aren't "authentication attempts". The SSH protocol explicitly has a step where the client proposes authentication keys it's interested in trying WITHOUT authenticating. Counting each such key as an "attempt" is like counting up how many keys a person has in their pockets and arresting them for attempted burglary if they have more than ten different keys.
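A minimal ssh_config fragment for the IdentitiesOnly suggestion above (host and key names are invented; written to a local file so nothing real is touched):

```shell
cat > ./ssh_config.identities <<'EOF'
Host git.example.com
    IdentityFile ~/.ssh/id_ed25519_git
    IdentitiesOnly yes    # offer only this key, not every key in the agent
EOF
echo "fragment written"
```

With `IdentitiesOnly yes`, the client stops walking through every cached agent key, which also sidesteps the "too many attempts" lockouts described in (4).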
That's an interesting perspective. Nonetheless, if you load up your agent with a dozen keys and try to log in to a remote server, it will deny you after (typically) three keys being presented. That will show up in the logs as a failed login attempt. Something such as fail2ban will then spot failed login attempts and take action.
Edit: +1 about IdentitiesOnly and Identity. I use that in my ssh_config, particularly when I need to alias one name to another.
By default OpenSSH _logs_ after three keys but it only gives up (if you don't have a fail2ban script blowing everything up) after six keys. And you can reconfigure the server as appropriate, unlike whatever this "active firewall" (which by the sound of things may just be a fail2ban script) does.
This is a bad fail2ban script, it's inconveniencing real users rather than targeting the bots you care about since they are doing password guessing anyway.
Seahorse is a GNOME application for managing encryption keys. This is the daemon program which provides services to other parts of Seahorse, and shares your keys over the network if so desired.
It doesn't have agent in its name but it sure sounds like the behavior of a key agent.
The key takeaway is that using ssh -A with default settings allows root on the system you've connected to "to impersonate you to any host as long as you’re connected".
Ah, okay; so it's not the agent that's the problem, but the agent with forwarding. That is a fair point, and probably needs saner defaults or messaging. That said, since I don't use forwarding that should be fine for me.
Agent forwarding defaults to off, AFAIK. You have to ask for it specifically by either requesting it at the CLI with -A, or adding it to .ssh/config.
It would be nice if people understood the consequences of it, and I do find in my conversations with developers that people generally do not understand that by forwarding the agent, anyone with sufficient access on the remote can use the agent. (E.g., another user w/ sudo.)
It would be nice if GUI desktop environments that already have miscellaneous notification APIs would give you a transient notification when the agent gets a request. That's a low impact change (you can just ignore it) that highlights to users what's actually going on. It improves security passively by giving users awareness.
Your agent _should_ always be where you are (ie not inside a container, a bastion host, or whatever else) because otherwise that means you aren't actually in possession of the key material and there's plenty of opportunity for much _worse_ surprises than with SSH agents if somebody else has the key material.
Because it's where you are, and you're probably not on a 1970s video terminal link but a laptop or something, the agent could just ask you to OK each request out of band, e.g. popping up a "Really log into machine X?" request. Once such a mechanism existed it could be refined (should it let you say "Yes always to requests from machine X" ? How about "Yes always for the next five minutes" ?) and if necessary SSH auth could even be tweaked to better support any real world behaviours that are popular (e.g. I don't recall off the top of my head if the agent can tell from what it's signing either where you're signing in, or where that sign-in was used, but the binding mechanism in SSH auth could certainly enforce either of those if they're determined to be important and don't exist today)
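OpenSSH already has a rough version of this confirm-each-use mechanism: keys added with `ssh-add -c` require an SSH_ASKPASS confirmation before every signature. A throwaway-agent sketch (key and file names are illustrative):

```shell
# Start a disposable agent and add a freshly generated, passphrase-less
# demo key with the per-use confirmation constraint.
eval "$(ssh-agent -s)" > /dev/null
rm -f ./demo_key ./demo_key.pub
ssh-keygen -q -t ed25519 -N '' -f ./demo_key
ssh-add -c ./demo_key 2>/dev/null || ssh-add ./demo_key 2>/dev/null
ssh-add -l | tee ./agent_keys.txt   # list what the agent now holds
ssh-agent -k > /dev/null            # kill the disposable agent
rm -f ./demo_key ./demo_key.pub
```

The confirmation prompt itself only appears at signing time, via whatever SSH_ASKPASS helper the desktop provides, which is exactly the "Really log into machine X?" hook described above.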
It's not just giving you a transient notification when the agent gets a request. I'd go several steps further:
1) clearly display what "local" machine is making the connection, under what user
2) clearly display what remote machine the local is connecting to, and as what user
3) allow me to select a specific key for that connection pair, and only present one key to the remote
4) if the key is unlocked (or, gasp, not passphrase-protected), then allow me to accept/deny the agent request
5) give me some mechanism to permanently disable the agent for my user if I decide I don't want to risk some software "accidentally" forwarding an agent (pebcak, bug, malice, whatever)
I think you're correct, that would be a better description. Unfortunately I think it's too late to edit my original comment: the edit link doesn't show up for me.
Here's a good one from reddit: https://www.reddit.com/r/github/comments/9odnvw/someone_fork...
It's also discussed reasonably well in the infosec community. Basically GitHub is a great place to find other people's passwords and API keys.
You mean the past-tense verb led, not its metallic homonym lead. :)
Firstly, having a private network for your infrastructure isn't a one-stop solution for keeping attackers out.
Secondly, using GitHub Enterprise or self-hosted GitLab doesn't make up for storing secrets in Git.
Looking forward to the proper write up.
We’ll publish our own full post-mortem in the next 1-2 weeks.
> One of the more interesting pieces of this was how Ansible was used to keep the attacker in the system.
Fwiw the infra that was compromised was not managed by Ansible; if it had been we would likely have spotted the malicious changes much sooner.
You should not be running package managers on production servers. Or any of the other things salt, ansible, chef, puppet can do.
Better yet, use OSes that cannot be modified.
CVE-2019-1003001, CVE-2019-1003002 -> Anyone with read access to Jenkins can own the build environment.
CVE-2019-1003000 -> I didn't get a lot of the details on this but it basically looks like "broken sandboxing, you can run bad scripts".
This is also a good resource: https://packetstormsecurity.com/files/152132/Jenkins-ACL-Byp...
Example: https://twitter.com/R1CH_TL/status/1118559239084158977
http://shitmyself.com/
Good security technology exists; the problem is that people don't want to use it because it's easier to ignore it.
[1] https://stackoverflow.com/q/47273922/1111557
This seems to be a good post about the problem: https://heipei.io/2015/02/26/SSH-Agent-Forwarding-considered...