The current solution in this space, that actually works really well at scale, is ElastAlert[0]. The problem is that ElastAlert is kind of a mess to work with. Lots of documentation, but you need to get into the weeds with it to figure out how it really functions.
Once you get it going it's a great tool. Scaling it out (we run hundreds of rules pretty frequently, some upwards of 15 times an hour) is just a matter of standing up more instances, each with its own separate set of rules.
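For a sense of what those rules look like, here is a minimal sketch of an ElastAlert frequency rule in its YAML format; the index pattern, query, and email address are placeholders, not from the thread:

```yaml
# Sketch of an ElastAlert rule: alert if an index sees 20 matching
# events within 5 minutes. All names below are illustrative.
name: ssh-brute-force-example
type: frequency
index: logstash-*
num_events: 20
timeframe:
  minutes: 5
filter:
- query:
    query_string:
      query: "event_type: ssh_failed_login"
alert:
- email
email:
- "soc@example.com"
```

Each instance loads its own directory of rule files like this one, which is what makes the "more instances, separate rules" scaling approach straightforward.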
Heard the same about ElastAlert, that it is difficult to manage. Any idea how much time/effort is spent per day/week/month managing the ElastAlert rules, and what level of expertise is required?
I always find that Elastic assumes decent familiarity with their products even when "introducing" them. Everything looks beautiful but I can't quite tell what different tools do.
A SIEM is one of those things where you'll know if you need it. Most of the companies I know using a SIEM do it because audit and compliance requires them to have one.
Just heard about this today at AWS re:Inforce. Looks pretty interesting and integrates well with cloud logs or on-prem. Definitely going to look into it more.
Just a heads up: there are some rough edges. That said, to get it going you can just do a pull and then a `make run`. It's got zero auth in this mode, but if it's ephemeral, whatever.
Seems like the natural evolution of the already popular ELK stack. I hope they add popular SIEM features like archiving, alerting, central configuration management, etc. I'll stick with Graylog for now.
I grew up on Splunk and can’t seem to figure out how to get the same level of aggregations and analysis of ad hoc data out of ELK. Sometimes I think I’d be better served with Jupyter and Spark or similar.
That's because most of the Elasticsearch data model is materialized indexes. You need to reindex (or use one of the other ops that amounts to building a composite index) to create different aggregates. Otherwise you need to use constructed JSON queries instead of ad hoc Lucene string searches to build the more complicated searches on those fields. Kibana provides tools that can help you visually build those queries if you don't want to write them from scratch, but it's definitely a different workflow than Splunk or anything implementing a more traditional query language.
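To make "constructed JSON queries" concrete, here is a sketch of an Elasticsearch search body that counts events per source IP; the index, field names, and query string are made up for illustration. You would POST this to `/<index>/_search`:

```python
import json

# Hypothetical Elasticsearch search body: restrict to failed logins,
# then bucket the matching events by source IP. In Splunk this would
# be a one-liner like `... | stats count by src_ip`; here it is an
# explicit JSON structure.
query = {
    "size": 0,  # we only want aggregation buckets, not the raw hits
    "query": {
        "query_string": {"query": "event_type:failed_login"}
    },
    "aggs": {
        "by_source_ip": {
            "terms": {"field": "source.ip", "size": 10}
        }
    },
}

print(json.dumps(query, indent=2))
```

The nesting is the point of the parent comment: every pipeline stage you would chain in a Splunk search becomes another layer of this JSON structure.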
In my day job I'm a security consultant and I work mostly with SIEM technologies. The other comments are right, but maybe I can give a bigger view. Basically you send syslog from your Linux servers, Security Event logs from your Windows servers, access logs from your critical infrastructure, authentication logs from Active Directory, IDS/IPS and firewall logs from your firewalls, etc. The SIEM collects all this data, but what separates a SIEM from a simple log aggregator is the intelligence it applies. Many popular SIEMs have rules you can define (or that come pre-defined) that fire alerts when a potential security breach is detected.
For example, if someone hacks your Internet-facing web server, your IDS might detect that. They then brute force the password to your production database server, which Active Directory might see. They then use nmap to trace your internal network, which would show up on your internal firewalls. Then they hop server-to-server until they get to a critical server. They then download a payload, infect that server (which your AV might pick up) and start exfiltrating data (which the firewalls and proxy might pick up).
You have all of these security tools, but without some intelligence linking all the logs together and correlating the data, you're stuck tailing these logs individually, hoping you catch the right log at the right time and can remember everything you've ever seen. And we're talking tens of thousands of events every second. A SIEM takes all of that data, does the searching for you, correlates all of the events across different security technologies and vendors, and alerts you when it detects someone doing something they should not be doing.
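The correlation step described above can be sketched as a toy rule: flag a source that fails authentication repeatedly and then succeeds. Real SIEM rules correlate across many log sources; the event shape, field names, and threshold here are all illustrative:

```python
from collections import defaultdict

def correlate(events, threshold=5):
    """Toy SIEM-style correlation rule: alert when a source IP
    accumulates `threshold` failed logins and then succeeds."""
    failures = defaultdict(int)
    alerts = []
    for ev in events:  # events assumed ordered by time
        if ev["type"] == "auth_failure":
            failures[ev["src"]] += 1
        elif ev["type"] == "auth_success":
            if failures[ev["src"]] >= threshold:
                alerts.append(f"possible brute force from {ev['src']}")
            failures[ev["src"]] = 0  # reset after a successful login
    return alerts

# Six failures followed by a success from the same source trips the rule.
events = (
    [{"src": "10.0.0.9", "type": "auth_failure"}] * 6
    + [{"src": "10.0.0.9", "type": "auth_success"}]
)
print(correlate(events))  # -> ['possible brute force from 10.0.0.9']
```

A production SIEM does this kind of stateful matching across millions of events, with time windows and cross-source joins, but the shape of the logic is the same.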
Popular SIEMs are tools like Splunk, QRadar, ArcSight, LogRhythm, etc.
It is two technologies combined: SIM (security information management) & SEM (security event management) == SIEM
The SIM came from the requirement to collect events (operations).
SEM stems from the desire to monitor / detect (security).
When we (the operators & security people) realized we were both collecting roughly the same data, the plan was hatched to combine resources and build something that satisfied both our needs. Marketing recognized this as a great way to pitch a story that actually made sense to everyone involved: the SIEM market exploded.
It's more than just log management. A SIEM usually implies a case management system and some kind of rule-based system for correlating different events into a case/incident, hence the name.
Search has been suffering under Elastic for a long time. Only a small percentage of Elasticsearch users use it for search. Then again, Elastic employs many Lucene committers, so they indirectly help search by being a major maintainer of Lucene.
If you want search, most of the NLP crowd is using Solr.
[0] https://github.com/Yelp/elastalert
https://mozdef.readthedocs.io/en/latest/
They have a set of Docker containers which I find very handy for spinning up deploy-specific logging sinks or full-on SIEMs.
ELK wants you to massage the data more first.
Seems a logical progression from Kibana and Logstash - but sometimes I worry search will suffer for all this other stuff.