
Understanding Your License Model

SIEM license models can vary significantly. Some are based simply on the average amount of data ingested per day, while others combine multiple factors such as data ingested per day, the number of end users, and the number of devices the SIEM collects data from. Regardless of the license model, it's critical to understand how it works so that you allocate sufficient funds for it. A misunderstood license model can unexpectedly consume more security budget than anticipated, and thus increase risk to your organization by limiting the resources available for both the SIEM and other security services.

Additionally, as most companies are constantly growing and changing, it's pivotal to understand how the license model can be augmented or changed, and what the penalties are for any violations.

While simpler license models are generally better, there's nothing wrong with a license model based on several factors as long as it's well understood and meets your organization's requirements. After a requirements gathering exercise, you should be able to tell your vendor the expected ingestion rates per day, how many users there will be, and the expected growth rates.

There are other, less obvious factors that can also significantly affect license models. Two often-overlooked factors are how the vendor charges for filtering/dropping unneeded data, and whether ingested data rates are based on raw or aggregated/coalesced amounts. For example, if you're planning on dropping a significant amount of data at the Processing Layer, Product A (which doesn't charge for dropped data) would have lower license costs than Product B (which can drop data but includes the dropped amount in license costs), all else equal. Likewise, Product C, which aggregates/coalesces data and determines license costs based on the aggregated/coalesced EPS, would have lower license costs than Product D, which aggregates/coalesces data but determines license costs based on raw EPS rates, all else equal.

If you’re comparing different SIEMs, you should ensure that you’re performing an accurate comparison, as SIEMs can vary significantly. A license model for a full SIEM solution from Company A is likely to be more expensive than a log management-only solution from Company B.

SIEMs can be expensive and consume a significant portion of your security budget. Misunderstanding your requirements and then signing a contract with a license model that’s unclear or difficult to understand is a major risk. Reduce that risk by spending the resources necessary to understand it and choose one that aligns best with your organization.

Timestamp Management

One of the most critical components of a SIEM environment is accurate timestamps. In order to perform any investigation, analysts will need to know when all the applicable events occurred. Given that SIEM applications can have many timestamps, it’s critical for staff to know which are used for what. Improper use of timestamps can have many repercussions and add risk to your organization.

The two most important timestamps in an event are the time which the event was generated on the data source, and the time the event was received by the SIEM. The time which the event was generated on the data source is commonly known as the Event Time. The time the event was received by the SIEM is commonly known as the Receipt Time.

A common question posed by junior staff is why they can't find the events they're looking for. Once I rule out a case-sensitivity searching issue, I check whether the correct dates/times are being used in the search. For example, a brute-force alert was triggered after a user generated 50 failed logins. While the alert was generated in the past 24 hours, analysts can't find any failed logins for the user that triggered the alert. Upon closer inspection, we see that the alert was generated based on the Receipt Time, but the Event Time timestamps are from one week ago. After a quick investigation, we discover that the system owner took the system offline last week and only reconnected it to the network 24 hours ago.

Timestamp discrepancies can be even more severe when using dashboards or monitoring for alerts. For example, a SOC uses a dashboard that monitors for particular IPS alerts that were set up to detect exploit attempts against a new, high-priority vulnerability. The SIEM has been experiencing caching lately, and events are currently delayed 28 hours. The dashboard the SOC analysts are using is configured to use the Event Time timestamp and shows events from the past 24 hours. An alert that occurred 26 hours ago finally clears the cache queue but fails to show up on the dashboard because it doesn't match the timestamp criteria. Thus, the critical alert goes unnoticed.

While this can be a severe issue, the fix is simple. The SOC analysts could configure the dashboard to use the Receipt Time timestamp instead, so all alerts received in the past 24 hours would show up and be noticeable regardless of when they were generated. Dashboards should generally show both timestamps, and staff should be aware of the various timestamps used by the SIEM. Another advantage of using both timestamps is that staff can see when there are delays in receiving log data. Minor delays can be expected, but significant delays can be caught immediately and actioned before large caches form.
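
As a simple illustration, the sketch below (Python, with hypothetical field names) flags events whose Receipt Time lags their Event Time beyond a threshold, which is one way staff could spot delayed log data early:

from datetime import datetime, timedelta

# Hypothetical normalized events carrying both timestamps (field names are illustrative).
events = [
    {"name": "Brute Force Alert", "event_time": datetime(2023, 5, 1, 9, 0),
     "receipt_time": datetime(2023, 5, 8, 9, 30)},
    {"name": "Firewall Accept", "event_time": datetime(2023, 5, 8, 9, 15),
     "receipt_time": datetime(2023, 5, 8, 9, 16)},
]

MAX_LAG = timedelta(hours=1)  # what your organization considers an acceptable delay

for event in events:
    lag = event["receipt_time"] - event["event_time"]
    if lag > MAX_LAG:
        print(f"{event['name']}: received {lag} after it was generated -- "
              "check for offline sources, caching, or clock skew")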

Some timestamp discrepancies can be normal, especially in large organizations. Out of thousands of servers, you're bound to have at least a few with misconfigured dates, development machines logging to the SIEM when they shouldn't be, or network outages that cause transmission delays. Having staff aware of the various timestamps and potential discrepancies reduces the risk that they turn into larger issues in your organization.

Monitor for Caching

Caching is a sign that the system is unable to keep up with the volume of data. While some caching can be expected and considered normal, frequent occurrences are an indication of an undersized architecture or application misconfigurations. Excessive caching can result in major delays in receiving log data and ultimately data loss.

Given the risks of data caching, most SIEMs come with monitoring capabilities to alert when caching occurs. These should be implemented to alert when caching is beyond what is considered acceptable. For example, you may expect some minor caching during the day at peak hours, and thus don’t need alerts during this time, but alerts should be generated whenever there is caching outside this period.

Caching can also be detected from the server operating system, where you would see cache files build up in the applicable application directory. Thus, if your SIEM application doesn’t support alerting when caching occurs, you should be able to detect and alert via the OS.
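
As a rough sketch of that OS-level approach, the following Python snippet sums the size of files in an assumed cache directory and warns when it crosses a threshold. Both the path and the threshold are assumptions you would adjust for your product and environment, and the check could be scheduled via cron or a monitoring agent:

import os

CACHE_DIR = "/opt/siem/connector/cache"   # hypothetical path; use your SIEM's actual cache directory
THRESHOLD_BYTES = 5 * 1024**3             # e.g. alert once more than 5 GB of cache files accumulate

total_bytes = 0
for root, _dirs, files in os.walk(CACHE_DIR):
    for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))

if total_bytes > THRESHOLD_BYTES:
    print(f"WARNING: {total_bytes / 1024**3:.1f} GB of cached events in {CACHE_DIR}")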

Regardless of how it’s implemented, ensure your environment has appropriate alerting when caching is detected.

Calculate and Configure Caches

Until someone invents a technology that guarantees one hundred percent uptime, we'll need to accept that at some point in a SIEM environment there will be an application or system failure. Additionally, we'll need to take the application offline at least a few times per year for scheduled maintenance and upgrades. While most SIEM applications have caching capabilities built in, it's critical to ensure the environment has appropriate cache sizes configured and sufficient storage. Insufficient storage or misconfigured caches can result in data loss.

Typically in SIEM environments, the Processing Layer (Connectors/Collectors/Parser Layer) is designed to send to the Analytics Layer via TCP, and if the Analytics Layer is unavailable, data is cached locally on the Processing Layer servers until it becomes available again. Thus, the Processing Layer servers will need sufficient local disk space to house the expected caches.

To determine an appropriate cache size, we need to look at your organization's requirements, SLAs, and other factors that help determine how long an outage could last and how long your IT department typically takes to resolve issues. If you're certain an outage would last no longer than three days, then we need to ensure the Processing Layer servers can support three days' worth of cached log data. Caches can also grow quickly, as they typically contain raw, uncompressed data.

To calculate how much storage we'll need for caching, we can simply take the Average Sustained 24h EPS rate and multiply it by the average event size and the number of seconds in a day. For example, if your Average Sustained 24h EPS is 5,000 and your normalized event size is 2,000 bytes, then we'll need about 864 GB of space per day. So if we have two servers in the Processing Layer and we expect an outage to last no longer than three days, we'll need 1.3 TB of free storage per server to meet the cache space requirements (864 GB/day × 3 days ≈ 2.6 TB, or 1.3 TB per server across two servers).
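
The same arithmetic can be expressed as a few lines of Python using the figures above:

# Cache storage estimate using the figures above.
avg_sustained_eps = 5_000          # Average Sustained 24h EPS
event_size_bytes = 2_000           # normalized event size
seconds_per_day = 86_400
outage_days = 3
processing_servers = 2

bytes_per_day = avg_sustained_eps * event_size_bytes * seconds_per_day
gb_per_day = bytes_per_day / 1_000_000_000            # 864 GB/day
total_tb = gb_per_day * outage_days / 1_000           # ~2.6 TB for the full outage
per_server_tb = total_tb / processing_servers         # ~1.3 TB per Processing Layer server

print(f"{gb_per_day:.0f} GB/day, {total_tb:.1f} TB total, {per_server_tb:.1f} TB per server")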

We’ll also need to ensure the application is configured to use the appropriate cache size as well. Many SIEM applications are configured with a default cache size, which may not be sufficient for your environment.

Log Source Verification

A critical function of any SIEM environment is verifying that the intended systems are logging successfully. Systems that are believed to be logging to a SIEM but aren't pose a significant risk to an organization, creating a false sense of security and limiting the amount of data available for an investigation.

A common mistake made while verifying whether a particular system is logging to a SIEM is using the incorrect fields for confirmation. For example, most SIEMs have several IP address and hostname fields, such as source IP address, destination IP address, and device IP address. Given the multiple fields, it can be confusing to know which is used for what, especially for new staff. This leaves the possibility that staff pull incorrect data and provide inaccurate results when performing verification or searching in general.

As an example, Company A is implementing a new Linux server, and the SOC is being asked if they can see logs coming from it, 172.16.2.1. The SIEM application has three IP address fields: Device IP Address, Source IP Address, and Destination IP Address. The Device IP Address field contains the IP address of the server generating the log event. The Source IP Address is the source of the event, and the Destination IP Address field is the target of the event.

One of the SOC analysts performs the verification, searches for “172.16.2.1,” and gets one result:

Event Name: Accept
Source IP Address: 10.1.1.1
Source Port: 22
Device IP Address: 172.16.50.25
Destination IP Address: 172.16.2.1
Destination Port: 22

Without paying attention to the field names, the analyst mistakenly reports that the new Linux server is logging, when in fact what he's looking at is an accept traffic event generated by firewall 172.16.50.25, not an event from the new server. The project to implement the new server is now considered complete, and Company A has a security gap.
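
A minimal sketch of the distinction, using hypothetical field names, shows why verification has to match on the Device IP Address field rather than on any field that happens to contain the address:

# The single event returned by the analyst's search (field names are illustrative).
event = {
    "name": "Accept",
    "source_ip": "10.1.1.1",
    "device_ip": "172.16.50.25",     # the system that generated the log (the firewall)
    "destination_ip": "172.16.2.1",
}

new_server_ip = "172.16.2.1"

# Searching every field finds a match, which suggests the server is logging...
matches_any_field = new_server_ip in event.values()

# ...but the field that proves the server itself is sending logs is the device IP.
is_logging = event["device_ip"] == new_server_ip

print(matches_any_field, is_logging)   # True False -- the new server is NOT logging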

While this can be a major issue and add risk to an organization, a simple process can show staff which fields to use for verification. Your SIEM vendor can also easily tell you which fields to use. Learning and education sessions on searching can likewise address this and ensure staff know how to search effectively.

Alerting On Quiet Log Sources

Data sources that stop logging to your SIEM put your organization at risk. If one of your organization’s firewalls stops logging to the SIEM, your SOC will be blind to malicious traffic traversing it. If your endpoint protection application stops logging, your analysts won’t be able to see if malicious files are being executed on one of your billing servers.

In a perfect world, your SIEM should alert when any data source stops logging to it. While this is feasible in smaller organizations, it can become daunting in large ones. It's easier for your SOC to follow up with one system owner who sits a few cubicles over than with 100 system owners from different lines of business. The task of remediating several hundred systems not logging to a SIEM can easily consume an entire full-time resource. In large organizations, network outages, system upgrades, and maintenance windows can be a regular occurrence. Should you alert on any data source that stops logging to your SIEM in a predefined period, you could easily end up flooding your SOC, and in a worst-case scenario, your analysts will develop a practice of ignoring these alerts.

As a best practice, especially in large organizations, a SIEM should be configured to alert when critical data sources stop logging. These data sources should at minimum include servers critical to the business (e.g. client-facing applications), firewalls, proxies, and security applications. A threshold of less than an hour may generate excessive alerts in your organization, as some file-based sources are delayed by design, for example when files are copied to the SIEM every 30 minutes. However, data sources that haven't logged in one hour may well warrant an alert.
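
A minimal sketch of such a check, assuming you can pull a last-seen timestamp per critical source (for example from the SIEM's device status view or a scheduled search), might look like this:

from datetime import datetime, timedelta

# Hypothetical last-seen times for critical data sources.
last_seen = {
    "fw-edge-01": datetime.now() - timedelta(minutes=12),
    "proxy-01": datetime.now() - timedelta(hours=3),
    "billing-app-01": datetime.now() - timedelta(minutes=45),
}

THRESHOLD = timedelta(hours=1)   # the one-hour threshold discussed above

for source, seen in last_seen.items():
    quiet_for = datetime.now() - seen
    if quiet_for > THRESHOLD:
        print(f"ALERT: {source} has not logged for {quiet_for} -- follow up with the system owner")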

Another thing to consider when remediating systems not logging to the SIEM is that malware experts and threat intelligence specialists may not be the best resources to chase system owners down. While they may not mind the odd alert, they're unlikely to have the time to chase down 100 system owners and get them to configure their systems properly, or the patience to continuously follow up with them. Thus, in larger organizations, project management may be a good fit for this task.

Having all your systems log to your SIEM is a critical part of reducing your organization’s risk. Having a practical, manageable task for remediating systems that stop logging will ensure the process is followed and the risk is reduced.

Disable Unused Content

When building new SIEM environments or working with existing ones, one of the quickest ways you can improve the performance and stability of the environment is to remove unused content. While this may seem obvious to experienced SIEM resources, it’s common to find reports or rules running in the background that don’t serve a purpose. In some environments, unused content can be slowing the system down and contributing to application instability. Unused content is especially common in environments that don’t have enough staff to manage the SIEM.

Default rules provided by the vendor are often enabled but unused. The first indication that a rule is unused is that it has no action and isn't used for informational purposes. If a rule's output is never brought to the attention of an investigator or SIEM engineer, the rule may simply be running in the background consuming system resources. A rule that triggers an alert when someone logs into the SIEM may be useful, but an ad-hoc report providing the same information may suffice. A significant number of inefficient rules that match a large percentage of events can adversely affect the performance of the environment.

Reports can be another source of unused content. In many environments, I find reports that were originally set up to be used temporarily but are no longer being used by the recipient. It's common for the recipient to forget to follow up with the SIEM staff to note that the reports are no longer required. Over a period of several years, this can easily amount to several dozen reports running on a regular basis, putting a significant strain on the system for no benefit.

All SIEM environments are different, and there’s no set of content that must be enabled or disabled. But there’s very likely content in your environment that can be disabled, and the system resources can instead be used to provide security analysts better search response times. So on a regular basis or whenever there’s a complaint about search response times or application instability, determine if there’s any content that can be safely disabled.

Effective Searching

There are two critical reasons end users should learn how to search their SIEM effectively. First, ineffective searching is a risk to your organization: end users can produce inaccurate data and thus provide inaccurate investigation results. Second, ineffective searching can degrade the SIEM's performance, increasing the time analysts need to obtain data while affecting the overall stability of the system.

If a security analyst is asked to perform an investigation and searches incorrectly, the results for a query on malicious traffic may return null when in fact there are matches. A compromised user account may be generating significant log data, but your analysts can’t obtain logs for it because they are searching for “jsmith” instead of a case-sensitive “JSmith.” End users can also match on incorrect fields, believing they are finding the correct data when they are not.

Ineffective queries take longer to complete and increase the system resources used by the SIEM. Many queries can be improved to significantly increase their performance, giving the end user a faster response time and leaving the system with more CPU and RAM for other tasks. A simple rule of thumb is to match as early in the query as possible to limit the amount of data the system searches through. Searching particular fields rather than all fields also reduces the amount of processing the system must do. Additionally, some SIEM tools allow you to easily check for poorly performing queries. For example, Splunk's Search Job Inspector can show you not only which queries are taking the longest, but even which parts of a query take longer than others, allowing you to optimize accordingly.
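
As a toy illustration of those two principles (not any particular SIEM's query engine), the Python sketch below contrasts a case-insensitive scan across every field with a match on the one field that matters; both find the event, but the second touches far less data:

# Build a synthetic data set of events (values are illustrative).
events = [
    {"user": f"user{i}", "src_ip": "10.0.0.1", "msg": "authentication failure " * 10}
    for i in range(100_000)
]
events[42]["user"] = "JSmith"

# Inefficient: case-insensitive match across every field of every event.
slow_hits = [e for e in events
             if any("jsmith" in str(v).lower() for v in e.values())]

# Better: match on the specific field you care about, as early as possible.
fast_hits = [e for e in events if e["user"] == "JSmith"]

print(len(slow_hits), len(fast_hits))   # both find the one event; the second does far less work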

It's also common for security analysts to get requests for excessive data. In many cases, requestors ask for more information than they need so they can drill down into it themselves, rather than submitting multiple requests for data. For example, there may be a request to pull log data on a user for the past two months, when all that's required is a few days of proxy traffic. These types of requests can be resource-intensive on SIEMs, especially if there are multiple queries running simultaneously. The impact can be more severe when the queries are scheduled reports. Scheduling multiple, large, inefficient queries on a regular basis can consume a significant amount of system resources. With a few inquiries to the requestor, the security analyst may be able to significantly reduce the amount of data searched for.

While ineffective searching is a risk, it’s a simple one to reduce. Training sessions, lunch-and-learns, or workshops can significantly reduce the risks of analysts searching incorrectly and consuming unnecessary system resources. I find a simple three-page deck can provide enough information to assist analysts with searching, highlighting the tool’s case sensitivity, common fields, and sample queries. Nearly all SIEM vendors offer complimentary documentation that will show you how to search best with their product. Thus, a few hours of effort can reduce searching risks while optimizing your SIEM environment.

SSISS: A SIEM Requirements Gathering Case Study

Implementing a SIEM is a challenge for any organization. As SIEMs can take years to implement, it's critical to build them appropriately, as even minor changes can result in months of re-work. SIEMs can be resource-intensive applications, reading and writing several thousand events per second, and thus inadequate RAM or CPU, or slow storage, can mean poor performance and application instability. SIEM environments can have many stakeholders, so changing them can require approvals from many departments and subject matter experts within an organization.

To reduce risks associated with a SIEM implementation, a thorough requirements gathering exercise should be performed. We need to know who is going to be using the tool, how often it will be used, what type of data it will collect, how long the data will be stored for, and more. This will allow solution designers to best architect a solution, determine what product is most suitable for your organization, select an appropriate licensing model, and in general eliminate any assumptions made in the design. We also need to provide the requirements to the vendors in order for them to provide an accurate, high-performing solution.

One of the major things to keep in mind during a requirements gathering exercise is to challenge all requirements. If a requirement significantly raises the cost of the solution or makes it more difficult to maintain, there should be a justification for it. Common misconceptions about using log data as legal evidence and about data retention periods can lead solution designers to unnecessarily increase the complexity and cost of a SIEM. For example, some organizations mandate that log data must be retained for seven years, but this is typically a requirement for transactional data, not system log data.

Let’s dive into a sample requirements gathering exercise to highlight some of the major items we need to capture in order to design an appropriate solution. SSISS (Streamlined Seamless Integrated Security Solutions, not a real company to my knowledge, but what a name), one of the industry’s leading IT security companies, has hired me to assist one of their clients, Company A. Numerous problems exist within the environment and Company A isn’t sure where to even start.

The following Monday I arrive at Company A and meet with the VP of Security Operations. I learn that the SIEM was implemented a couple of years ago by a provider Company A is no longer doing business with. Complaints about the SIEM range from slow search response times and data loss to application instability and audit deficiencies. The VP has budget to build a new SIEM and doesn't want to spend effort fixing the existing one. I ask the VP for all stakeholders and a point of contact within each group that I can work with. The stakeholders are the security operations team, network team, server team, and the compliance team.

I first meet with the security operations team. Their main complaints with the system are that it's unstable, slow, and overall difficult to get work done with. Searches and reports time out, and the system stops working at least once a week, forcing a reboot. As I note the issues down, I ask if there's anything else the system doesn't do today. They reply that the data sources they need are there and they can create the required correlation rules; the problem is simply that the system is slow and unstable. Log data growth of 20% over the next five years seems reasonable, and they don't currently forward any of the log data to another application, nor do they plan to. They agree to send me an email with their documented requirements.

Next is a session with the network team. They use the SIEM mainly to troubleshoot firewall and router issues, and note the searching is slow. They are mandated to log all of their devices to the SIEM, but there have been no integration issues to date. Log data growth of 20% over the next five years seems reasonable to them as well. I ask that they send me an email documenting what they need the SIEM to do along with the quantity of devices they intend to integrate.

I next meet with the server team. They generally don’t use the SIEM, but they are mandated to configure all servers to log to it. The only output they get from it is a monthly report to verify that all systems are logging. When asked about growth, they by chance give the same number used by the security operations and network teams. They agree to send me a note documenting the type and quantity of servers that they own.

Finally, a session with the compliance team lands me with some large requirements: seven years of log data and encryption of data at rest. I reply that most industry standards mandate one year of log data, and that seven years is typically required for transactional data, which is not stored in the SIEM. Additionally, there is no personally identifiable or financial data within the logs, so while many SIEMs offer masking capabilities, enabling them would only decrease performance. The compliance team replies that as long as there is no financial data, one year of data suffices, and access to the data needs to be restricted, but not encrypted or masked. They agree to send me their documented requirements.

The next day, I receive responses from all teams. Everything looks okay with the security operations requirements:

I note that since we'll be using Syslog for many data sources, the logs will be unencrypted from the source to the Processing Layer, but encrypted from the Processing Layer to the Analytics Layer. The team confirms this will not be an issue. I also ask if there are any opportunities for filtering data, and discover the team doesn't need any of the events I'm proposing to drop in the new solution. The team does a quick count and finds that dropping these events will reduce EPS rates by 20%. They also note that their current SIEM doesn't do any aggregation.

The only additional request I have for them is to compile the number of distinct systems logging to the SIEM, each device type's average sustained and peak sustained EPS rates, and the total number of correlation rules they intend to have in production. This will help me produce accurate architecture and storage requirements.

Requirements for the network team look good with the quantity of network devices:

The server team requirements are documented as expected with the total quantity of servers:

Fortunately, the conversation with the compliance team worked in my favour, and the only requirement from them is to retain log data for one year.

The security operations team responds to my request a few days later, and I’m now able to populate my SIEM Architecture Sizing, Storage and Infrastructure Costs Calculator spreadsheet, which I’ll be sending to each vendor.

The first tab lists the data source requirements:

The two most important numbers in the above table are the Total Average Sustained 24h EPS (events per second) rate, which is the total number of events received in a day divided by the number of seconds in a day, and the Peak Sustained EPS rate, which is the maximum EPS rate the system processes in a day, typically during business hours. The Total Average Sustained 24h EPS rate will be used to determine storage requirements and, depending on the product licensing model, licensing costs, while the Peak Sustained EPS rate will be used more to size an appropriate architecture. One of the biggest mistakes many SIEM consultants make is using the sustained EPS rates to size an architecture. This results in the system being undersized during peak hours. For example, proxy traffic at night can be a mere fraction of what it is during the day. Thus, an architecture designed using the sustained rate will cause the system to slow down and cache data during the day, when it's dealing with 10,000 peak EPS rather than the sustained 1,500 EPS it was designed for.
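
To make the two numbers concrete, here is a small sketch (the daily event count is assumed purely for illustration) showing how the sustained rate is derived and how it compares to a measured peak:

seconds_per_day = 86_400
events_per_day = 129_600_000        # total events received in one day (assumed for illustration)

avg_sustained_24h_eps = events_per_day / seconds_per_day   # 1,500 EPS -> storage and licensing
peak_sustained_eps = 10_000                                # busiest measured rate -> architecture sizing

print(f"Sustained: {avg_sustained_24h_eps:,.0f} EPS, Peak: {peak_sustained_eps:,} EPS")
# Sizing to the sustained 1,500 EPS would leave the system badly undersized at the 10,000 EPS peak.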

The second table lists the functional requirements (what the system must do):

The third table lists the Environment Parameters, which are other design requirements that need to be built into the architecture.

The first item is the Processing Layer Forwarding Factor, which is how many copies of each event the Connectors/Collectors/Forwarders will forward to the Analytics Layer. Many SIEM products forward a copy of each event to two destinations to make the solution highly available. If I want a highly available solution, I'm going to need to enter a value of two or greater. The second item is the Analytics Layer Replication Factor, which indicates how many copies of each event will be copied to another server for high availability. Given that these two items can vary per SIEM product, I'm going to let the vendor enter these numbers.

The third item is the Analytics Layer Forwarding Factor, which indicates how many systems the Analytics Layer will need to forward to. As confirmed by the Security Operations team, we will not be forwarding data from the SIEM to another system. This is a critical design consideration, as missing this requirement can cause the architecture to be severely undersized.

The fourth item, Filtering Benefit, will reduce the amount of data sent from the Processing Layer to the Analytics layer by the entered percentage. I’m going to enter a value of 20%.

The fifth item, Aggregation Benefit, will depend on whether the SIEM product can aggregate. I'll leave that value for the vendor as well.

The Processing Layer Spike Buffer and Analytics Layer Spike Buffer are designed to prevent caches from being formed when there are surges of log data. Why this is an important consideration is detailed in the article A SIEM Odyssey: How Albert Einstein Would Have Designed Your SIEM Architecture. I’m going to use a value of 25% for the Processing Layer and 15% for the Analytics Layer.

Finally, as discussed with many Company A teams, log data growth of 20% will be assumed.

After all the parameters are input, I get the following table from the Architecture Requirements tab. The two key values are the Total Processing Layer EPS Requirement and the Total Analytics Layer EPS Requirement. The SIEM architecture will need to be sized based on these numbers. However, these values may change per vendor depending on how their product works.
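
To give a feel for the arithmetic, here is one plausible way such a calculator might combine the environment parameters. The exact formula varies per product and vendor, and the raw peak EPS figure below is an assumption chosen for illustration (under these assumptions it lands near the 15,000 and 11,000 EPS figures the vendors quote below):

# One plausible sizing calculation -- illustrative only; real calculators differ per product.
raw_peak_eps = 10_000                  # total Peak Sustained EPS across all data sources (assumed)

processing_forwarding_factor = 1       # copies forwarded to the Analytics Layer (vendor-specific)
filtering_benefit = 0.20               # 20% of events dropped at the Processing Layer
aggregation_benefit = 0.0              # left for the vendor to fill in
processing_spike_buffer = 0.25
analytics_spike_buffer = 0.15
growth = 0.20                          # the assumed 20% log data growth

processing_eps = raw_peak_eps * (1 + processing_spike_buffer) * (1 + growth)
analytics_eps = (raw_peak_eps * (1 - filtering_benefit) * (1 - aggregation_benefit)
                 * processing_forwarding_factor
                 * (1 + analytics_spike_buffer) * (1 + growth))

print(f"Processing Layer requirement: {processing_eps:,.0f} EPS")   # 15,000 EPS
print(f"Analytics Layer requirement:  {analytics_eps:,.0f} EPS")    # ~11,000 EPS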

I’m now ready to send the SIEM Architecture Sizing and Storage Costs Calculator spreadsheet to the three shortlisted vendors.

While we should be focused on meeting Company A's requirements, we shouldn't forget to be equally focused on meeting the vendor's requirements. We need to ensure the proper system requirements are met, including CPU, RAM, and storage. Failure to meet these requirements can result in numerous complications, including slow search response times, application instability, data loss, operational nightmares, and ultimately increased risk to the organization.

The first vendor to respond is Vendor 1. Their SIEM product doesn't structure logs during ingestion and doesn't have the ability to aggregate data, but retaining the logs in raw format should keep the message size low at 700 bytes. Their product provides high availability by replicating data across multiple Analytics Layer servers. For the non-HA solution, the Processing Layer will need to process 15,000 EPS, while the Analytics Layer will process 11,000 EPS. The average event size of 700 bytes produces offline storage costs of $21,300/year, while the online storage will be provided by the local disks on the server. For the HA solution, the Processing Layer requirement stays the same, but the Analytics Layer requirement jumps to 22,000 EPS, and storage costs double to $42,600/year.

Vendor 1 recommends four servers in the Processing Layer, which is more than enough to handle the anticipated 15,000 EPS. This should prevent any caching or log data delays when there's a surge of log data. For a highly available Processing Layer, they recommend doubling the number of Processing Layer servers, but note that even with the base four servers, if one is unavailable, the remaining three should still be more than sufficient to handle the 15,000 EPS. They also note a load balancer can be used in front of the Processing Layer servers to balance traffic and provide high availability. The 500 GB of local storage on each server should provide 3.5 days of cached data in the event the Analytics Layer is unavailable, and can be bumped up if desired.

For the Analytics Layer, Vendor 1 recommends a large physical server with 48 cores, 256GB of RAM, and highly recommends a combination of solid state and local hard disks for the storage medium. Should all the server requirements be met, they have no concerns with meeting the search speed and correlation rule requirements. And to meet the high availability requirement, a second server can be added, and the primary Analytics server will replicate a copy of each event to the secondary server. Vendor 1 can also provide the entire solution as a cloud-based service.

The licensing model for Vendor 1 is based solely on stored GB per day. You can add as many servers or end users as needed without any additional costs.

Vendor 2 responds next with a SIEM product that structures data as it's ingested and can provide significant aggregation benefits, but produces an average event size of 2,000 bytes. The product provides high availability by having the Processing Layer send a copy of each event to a second destination, commonly known as “dual-feeding.” For the non-HA environment, the Processing Layer will need to process 15,000 EPS, but given the product's aggregation functionality, the Analytics Layer will only need to process 5,500 EPS. However, the average event size of 2,000 bytes bumps up the offline storage costs to $30,400/year. For the HA solution, the Processing Layer requirement doubles to 30,000 EPS, the Analytics Layer requirement to 11,000 EPS, and storage costs to $60,800/year.

For the non-HA solution, Vendor 2 claims four servers in the Processing Layer will be more than adequate to handle the expected 15,000 EPS, and can be augmented with a load balancer and additional servers for better performance and high availability.

The Analytics Layer is a single server with significant resources. Vendor 2 claims it should provide all required functionality and as well provide fast search response times for end users. To make the solution highly available we can simply add another server and configure the Processing Layer to “dual-feed.”

Vendor 2's licensing model is a combination of CPU cores, peak EPS rates, and the number of end users logged in simultaneously.

Last in line is Vendor 3, who provides an appliance-only SIEM solution that structures data as it’s ingested, can provide aggregation benefits, and produces the highest event size of 2,300 bytes due to the large field set the product uses. Like Vendor 2, they also anticipate an aggregation benefit of 50%, which will produce a non-HA Processing Layer requirement of 15,000 EPS and Analytics Layer requirement of 5,500 EPS, but the increased message size will bump up offline storage costs to $35,000/year. For HA, the values can simply be doubled.

Vendor 3 proposes only two servers in the Processing Layer, which they are certain can handle the proposed EPS rates, and which can be scaled horizontally for high availability.

The Analytics layer is a single appliance the vendor claims can meet all the searching, reporting and analytics requirements. The high availability architecture is similar to Vendor 2, where a second Analytics Layer server can be added and the Processing Layer can be configured to “dual-feed.”

The licensing model is simply based on the average daily EPS rate at the Analytics Layer. Vendor 3 supports the entire appliance, from the hardware and OS to the application, and issues patches for all of it. Patches and upgrades seem simple: the vendor provides single-file updates that can be uploaded at the click of a button.

The appliance-based solution sticks out in my mind, given the statements the VP of SecOps made about the revamping of the server teams. Previously, server issues could take months to resolve, and patching was practically non-existent due to the shortage of resources within the team. Additionally, there have been cases where the server resources promised were not delivered.

While I haven't discussed cloud options with the VP of SecOps, I'm going to follow up with him to see if this fits into his roadmap and verify that sufficient bandwidth is available to an external site. Vendor 1's track record of managed cloud SIEM environments is also impressive and will likely be the quickest to deploy. I'm also going to confirm that there are no other stakeholders we missed, such as data scientists, to ensure the solution can handle any large-scale computations if necessary.

All solutions appear to meet Company A’s requirements, so I’ve got some more work to do to see which will be the best fit. I like Vendor 1’s simple licensing model, architecture, low storage costs, and think the cloud solution may be the best bet for Company A. I like Vendor 2’s blazing fast searches, as there was nothing that frustrated me more as an analyst than slow search response times, making a simple investigation take hours. However, I’m confused with their licensing model, and I’ll have to follow up with them to understand it better. I like Vendor 3’s solution as well, similar to Vendor 2 but appliance-based with a simpler license model, but I’m not sure two servers in the Processing Layer will suffice.

When I look at the operational costs of all vendors, Vendor 1 produces the lowest storage costs. Vendor 3 produces the lowest support cost, but that’s due to the two servers in the Processing Layer compared to the four for the other vendors, which again I’m not convinced will be sufficient.

As you can see, selecting a SIEM product for your organization can be a complex task. There are many high-quality SIEM products on the market, but depending on your organization and how it operates, one product may be more suitable than another, and the cost can differ significantly. Seemingly minor details, from how the solution will process the required data sources to the number of supported parsers, can make one solution drastically more feasible than the others.

A thorough requirements gathering exercise can reduce the risks associated with a SIEM implementation. It will help your organization select a product that maps best to your requirements. It will allow vendors to provide an accurate, high-performing solution. Ultimately, it will pave the way for an efficient and effective solution that lowers implementation and operational costs, reduces waste and rework, and provides end users with a tool that increases your organization’s security posture while providing business value.

What Is SIEM and How It Differs from Other Security Tools

Now that we understand what log data is, let’s discuss the technology that will allow your organization to collect and use it.

Security Information and Event Management is a technology that will process log data from your various systems, analyze it, make it available for searching, and store it. SIEM itself is a combination of two more abbreviations: Security Information Management (SIM) and Security Event Management (SEM). SIM is focused on the collection of log data for investigative and compliance purposes. SEM is focused on alerting and analytics: threat detection, pattern anomalies, and correlating different data sources.

SIEM tools can vary in architecture, but generally have two layers: A Processing Layer and an Analytics Layer. The Processing Layer is where data is structured, aggregated, and forwarded to the Analytics Layer. The Analytics Layer is where data is stored, made available for searching, and where security analytics is performed.

Using the above diagram as an example, the data sources are your various systems that produce log data and either send (push) it to your Processing Layer or are queried (pulled) by your Processing Layer via a database or API call. Depending on the SIEM product, the Processing Layer will structure the log data by normalizing it into a standard format, aggregate it by combining similar events into one, or simply add an index to it and forward it to the Analytics Layer. The Processing Layer is strictly used for processing; the only data typically retained there are caches created when the Analytics Layer is unavailable.

The Analytics Layer is where end users will search for data, create reports and use cases. Depending on the SIEM product, it may structure log data, and may act as a long-term retention repository.

While SIEM is defined as a security application, it differs significantly from your other security tools. Your SIEM will process log data, while your proxy, IDS/IPS, and some malware detection tools will typically process network traffic. Packets going over your network use the same protocols, so you don't need to customize your firewall or IPS to detect TCP traffic. Your SIEM, however, will need to process log data in various formats, many of which may not be supported by your SIEM vendor.

Your IDS/IPS and malware tools provide you with a list of signatures that is automatically updated on a regular basis. Some IDS/IPS tools allow you to implement custom signatures, but for the most part your analysts won't have to write custom signatures for known vulnerabilities, exploits, and attacks. Your SIEM staff, by contrast, will need to create custom use cases and update them regularly, as you're unlikely to use much of your SIEM vendor's default content.

SIEM vendors support many log sources, but your engineers will need to ensure the right parsers are being used, update them regularly, and write any that are not supported by your SIEM vendor. This is in stark contrast to your network devices and other security tools that only have to work with limited protocols such as TCP/IP, HTTP, and HTTPS.

Staff will be logging into your proxy, firewall, IDS/IPS, and malware tools often, but it will mostly be for administrative purposes. Your SIEM will have many end users, ranging from admins to users searching for data. In large environments, it’s common to have several users searching for data simultaneously. For MSPs, your customers may be logging in to search for data as well.

While most of your other security tools can block a malicious host from egressing your network or block users going to an uncategorized site, SIEMs don’t have the capability to block.

The following table summarizes the differences between SIEM and your other security products.

As you can see from the above table, a SIEM differs significantly from your black-boxed IPS or malware tools. While it may seem to be simply a log aggregator, a SIEM is a complex tool that will need significant customization. The environment can have many stakeholders, from security analysts to compliance and access management teams. Ultimately, it will need to be implemented, operated, and maintained differently.