Category: Architecture-Design

SIEM Lists and Design Considerations


Those familiar with creating use cases in SIEMs have likely at some point worked with “Lists.” These are commonly known in SIEMs as “Active Lists” (ArcSight), “Reference Sets” (QRadar), “Lookups” (Splunk), “Lookup Tables” (Securonix, Devo), and similar in other tools. Lists are essentially tables of data; you can think of them as an Excel-like table with multiple rows and columns. List implementations differ across the SIEMs on the market: some support only a single column that you can use for, e.g., IP addresses, while others support up to 20 columns and can hold a significant amount of data. Log retention policies typically don’t apply to Lists, so you can keep them for as long as needed.

SIEM Lists have three main drivers: limitations with real-time rule engines, limited retention policies, and external reference data.

Limitations with Real-Time Rule Engines

SIEMs with real-time rule engines have the advantage of triggering your use cases as data is ingested (versus running a query every 5 minutes). But the real-time advantage turns into a disadvantage when a use case spans a longer timeframe. The longer the timeframe of the use case, the more system resources the real-time engine consumes, making some use cases impossible. For example, real-time rule engines can easily detect 10 or more failed logins in 24 hours, but not over three months; that would be far too much data to keep in memory. To compensate, Lists can be used to store the data required by use cases that can’t be handled by the real-time rule engine. The List can store, for example, RDP logins over a much longer period, e.g. one year, including the login type, username, hostname, IP address, and possibly more depending on your SIEM. You can then trigger an alert when the count for a particular user reaches the desired threshold based on the number of entries in the List.
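To make the mechanics concrete, here is a minimal sketch of the pattern in Python. It is not any particular SIEM’s API; the List is modeled as a plain table of login records, and the field names and the 10-in-a-year threshold are illustrative values only.

```python
# Illustrative sketch only (not any specific SIEM's API): a List modeled as a
# table of login records, used to alert when a user's failed-RDP count over a
# long window reaches a threshold the real-time engine couldn't keep in memory.
from datetime import datetime, timedelta

failed_rdp_list = []          # each entry: {"user", "host", "ip", "time"}
THRESHOLD = 10                # example threshold
WINDOW = timedelta(days=365)  # look-back far beyond the real-time engine's reach

def record_failed_rdp(user, host, ip, when=None):
    """Append a failed RDP login to the List and return True if the user
    has reached the alert threshold within the look-back window."""
    when = when or datetime.utcnow()
    failed_rdp_list.append({"user": user, "host": host, "ip": ip, "time": when})
    cutoff = when - WINDOW
    count = sum(1 for e in failed_rdp_list
                if e["user"] == user and e["time"] >= cutoff)
    return count >= THRESHOLD
```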

Limited Retention Policies

Limited retention policies were also a large driver for Lists. Most SIEM environments only have three months of online data. Accessing anything older typically requires restoring it from an archive or backup, which is inconvenient enough that an analyst often won’t even ask for it. With Lists, you can store selected data outside of your retention policies. If you want to store RDP logins for longer than your retention policy allows, you can simply add the values to a List.

External Reference Data

SIEMs are extremely effective at matching data in log files. The advent of threat intelligence data brought security teams massive lists of malicious IP addresses, hostnames, email addresses, and file hashes that needed to be correlated with firewall, proxy, and endpoint protection logs. These threat intel lists can be easily put into a List and then correlated with all applicable logs. Most (if not all) SIEM products support these types of Lists.
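As a rough illustration of the matching logic (not any specific SIEM’s lookup syntax), here is a hedged sketch where a threat-intel List is held as a set and checked against firewall events; the field names and IP values are made up for the example.

```python
# Illustrative sketch only: a threat-intel List loaded as a lookup set and
# matched against firewall events. Field names and IPs are example assumptions.
malicious_ips = {"203.0.113.7", "198.51.100.23"}   # e.g. imported from a threat feed

def matches_threat_intel(event):
    """Return True if the event's destination IP appears in the threat-intel List."""
    return event.get("dest_ip") in malicious_ips

firewall_events = [{"src_ip": "10.0.0.5", "dest_ip": "203.0.113.7", "action": "allow"}]
hits = [e for e in firewall_events if matches_threat_intel(e)]
```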

Other List Uses

Lists can often enhance your use case capabilities. If your SIEM product can’t meet all of the conditions of a use case with its real-time engine or query, you can sometimes use Lists to compensate. For example, you can put the results of the first use case into a List, and then build a second use case that combines the real-time engine with the values in the List.

Lists can be useful for whitelisting or suppressing duplicate alerts. For example, you can add the IP, username or hostname of the event that triggered an alert to a List (e.g. users/domains that are already being investigated), and use the List to suppress subsequent alerts from the same IP, username, or hostname.

Lists can also help simplify long and complicated queries. Instead of writing a single query, you can put the results of the first part of a query into a List, and then have the second query run against the values in the List.


As you can see, Lists can be very useful for SIEM end users. Overlooking List functionality during a SIEM design can have profound impacts. While List functionality differs per SIEM, it’s important to understand how your SIEM works and ensure it meets your requirements.

Standard-Size your SIEM HA and DR Environments

A common decision made when designing a highly available or disaster recovery enabled SIEM solution is to under-size the secondary environment with fewer resources than the main production environment. Many believe that losing a server to hardware failure or an application to corruption is highly unlikely, and that if such a situation were to occur, a system with fewer resources would suffice while the primary system is down. Thus, with a minute probability of a server or application failure, many believe they can get away with less RAM, fewer cores, and less disk space on the HA or DR server(s). After all, none of us want to bet on something that isn’t likely to happen.

For many organizations, this may be an acceptable risk for a number of reasons, including budgetary restrictions, compensating security controls, and risk appetite. For example, a small company simply may not have the budget for a highly available SIEM solution. For others, it may be that their other security applications provide compensating controls, where their analysts can obtain log data from other sources.

For organizations looking to implement high availability in some fashion, it’s important to understand how small differences can have a major impact.

To highlight what can happen, let’s use Company A as an example. Company A is designing a new SIEM estimated to process 10,000 EPS. The company has requested additional budget for HA capabilities but wants to save some of the security budget for another investment. They find an unused server in the data centre that has less RAM, fewer cores, and less disk space, and decide to use it as the DR server. They ask their SIEM vendor for their thoughts, and the vendor replies that the reduced system resources should only result in slower search response times, and that the 2 TB hard drive should provide just over four days of storage (10,000 EPS × 2,500 bytes per normalized event × 86,400 seconds/day, at 80% compression, ≈ 432 GB/day stored). Company A accepts the risk, thinking that any system issue should be addressed within their 24-hour hardware SLA, and that they should have the application back up in two days at most.
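A quick back-of-the-envelope check of the vendor’s math, using the figures quoted above:

```python
# Back-of-the-envelope check of Company A's numbers (values from the text).
eps = 10_000                  # events per second
event_size = 2_500            # bytes per normalized event
compression = 0.80            # 80% compression -> 20% of raw size is stored

raw_per_day = eps * event_size * 86_400            # ~2.16 TB/day raw
stored_per_day = raw_per_day * (1 - compression)   # ~432 GB/day after compression
disk = 2 * 10**12                                  # 2 TB DR disk
days_until_full = disk / stored_per_day            # ~4.6 days
print(f"{stored_per_day / 10**9:.0f} GB/day stored, full in {days_until_full:.1f} days")
```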

A few months down the road, the Production SIEM system fails. The operations staff at Company A quickly reconfigure the SIEM Processing Layer (Connectors/Collectors/Forwarders) to point to the DR server. Log data begins ingesting into the DR server and is searchable by the end users. The security team pats themselves on the back for averting disaster.

However, things go sour when the server admins learn that the hardware vendor won’t be able to meet the 24-hour SLA. Three days pass and the main Production server still remains offline. The DR server is still processing data, and searches, while slower, are completing, but the security team becomes anxious as the 2 TB hard drive approaches 90% utilization on the fourth day.

When disk capacity is fully utilized early the following morning, the SIEM Analytics Layer (where data is stored) begins purging data older than four days, diminishing the security team’s visibility. The purging jobs add stress to the already fully utilized cores, which now cause searches to time out. The Analytics Layer begins refusing new data, forcing the SIEM Processing Layer to cache data locally on the server. That is another concern for the security team, since the single Processing Layer server only has 500 GB of disk space, and the 400 GB of available cache would, at this rate, be fully utilized in roughly four hours (10,000 EPS × 2,500 bytes per normalized event × 14,400 seconds = 360 GB uncompressed), as the SIEM’s processing application can’t compress data (this SIEM can only compress at the Analytics Layer).

As you can see, overlooking a somewhat minor design consideration, making assumptions, relying on SLAs, and so on, can have major impacts on your environment and reduce the utility of your SIEM in a disaster. As SIEMs consist of multiple applications (Connectors, Analytics), high availability requirements should be considered for each of the components. Forgoing high availability may be necessary for budgetary or other reasons, but it’s critical to ensure that the environment is aligned to your requirements and risk appetite.

Managing Your SIEM Internally or Outsourcing

Given the advent of cloud computing, many companies are now outsourcing at least some parts of their IT infrastructure and applications to third parties, allowing them to focus on their core business. Many organizations don’t want to invest in an IT department or data center, and can’t match the speed and efficiency that Amazon AWS or Microsoft Azure can provide.

While your SIEM environment is likely one of your more complicated security applications to manage, you can outsource one or more parts of it to third parties, including content development, application management, and infrastructure management. For example, you can have content developed internally by your security analysts, have the application set up and maintained by the vendor or another third party, and have the infrastructure hosted in AWS or Azure.

Before outsourcing one or more parts of your SIEM environment, there are some major considerations to work through.

Determining which parts to outsource

In order to determine which parts of your SIEM environment to keep or outsource, you’ll need to understand your competencies. Do you have a significant number of staff with strong security competencies, but not SIEM expertise specifically? Do you have a SOC that is proficient in both SIEM content development and investigations? Are you willing to invest in obtaining and retaining scarce SIEM staff? Do your line of business and its regulations require you to keep all data internally? Do you want to keep your SIEM staff internal for a few years and then consider outsourcing? Can your data centre rack-and-stack a server within a day, or does it take six months?

While these can be difficult questions to answer, the good news is that you’ll be able to outsource any part of your SIEM environment. Many of the major consulting firms can offer you SIEM content developers, engineers, investigators, and application experts to manage everything from your alerts to your SIEM application.

A summary of the SIEM environment functions that can be outsourced:
Content Development
-SIEM correlation rules, reports, metrics, dashboards, and other analytics.
Investigations
-Performing security investigations, working with the outputs of alerts, reports, etc.
Engineering
-Integrating data into the SIEM, parser development.
Application Management & Support
-Installing, updating, and maintaining the SIEM application.
Infrastructure
-Setup and maintenance of SIEM hardware, servers, and storage.

Privacy Standards and Regulations

Depending on what country and jurisdiction your organization falls under, you may be subject to laws that restrict the processing and storing of data in particular geographic regions. Azure and AWS have expanded significantly and have infrastructure available in many countries, allowing your data to be stored “locally.” Additionally, Azure and AWS can provide secure, private networks, segregating your data from other organizations.

While it can seem risky to store your data with another entity, many organizations would rather take advantage of Amazon’s and Microsoft’s network and security teams than build the required teams internally.

Internet Pipe

You’ll need an adequate, fault-tolerant Internet pipe between your organization and your hosting company. The amount of bandwidth needed depends on the SIEM architecture. For example, you may want to collect log data locally at the Processing Layer, where it will be collected, structured/modified, and then sent to the Analytics Layer. If we’re using Product A, which structures log data into an average event size of 2,000 bytes, and our sustained EPS rate is 5,000, then we’ll need 10 MB/s (roughly 80 Mbps) of available bandwidth to forward the data over the link without introducing latency.
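The arithmetic behind that estimate, using the figures above:

```python
# Rough bandwidth estimate for the Processing -> Analytics link (values from the text).
eps = 5_000                      # sustained events per second
event_size = 2_000               # bytes per structured event (Product A)

bytes_per_sec = eps * event_size                 # 10,000,000 B/s = 10 MB/s
megabits_per_sec = bytes_per_sec * 8 / 10**6     # ~80 Mbps
print(f"~{bytes_per_sec / 10**6:.0f} MB/s (~{megabits_per_sec:.0f} Mbps) sustained")
```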

A SIEM application that supports your desired outsourced architecture

Let’s say for example that you want to collect and consolidate log data locally, and then have that consolidated data sent to a SIEM in the cloud, and another copy to your local, long-term data store. Does the application that will be collecting and consolidating log data locally support this model? Is there a limitation that it can only send to one destination, and thus can’t meet the requirement to send to two destinations?

Other Considerations:
Ownership
-Regardless of how you build and operate your SIEM environment, ensure that there’s an owner with a direct interest in maintaining and improving its condition, especially if working with multiple vendors. Multiple parties can point the finger at each other if there are issues within the environment, so it’s critical to have an entity that can prevent stalemates, resolve issues, improve the SIEM’s condition, and ultimately increase the value it provides to your organization.
Contract Flexibility
-If you’re going to be using the RFP process, understand that the third parties you’re submitting the proposal to are there to win business and compete on price. As a result, some third parties may under-size a solution to create a price attractive to the purchaser. While this can simply be considered business as usual, it’s important to understand that your environment may need to be augmented or adjusted over its lifetime, and the service provider may ask for additional funds to ensure the environment’s health. Additionally, requirements can change significantly in a short period of time, which can change the infrastructure required for the environment.

Overall, there are many ways to structure your SIEM environment to take advantage of outsourcing. There are many organizations that can help you manage any part of it. How to do it best will depend on your organization’s requirements, line of business, relationships with third parties, competencies, strategy, and growth.

Selecting a SIEM Storage Medium

Given that SIEMs can process tremendous amounts of log data, a critical foundation of your SIEM environment is storage. In order to provide analytical services and fast search response times, SIEMs need dedicated, high-performing storage. Inadequate storage will result in poor query response times, application instability, frustrated end users, and ultimately make the SIEM a bad investment for your organization.

Before running out and buying the latest and greatest storage medium, the first step should be understanding your retention requirements. Does your organization need the typical three months of online and nine months of offline data? Or do you have larger requirements, such as six months of online followed by two years of offline data? Do you want all of your data “hot”? Answering these questions first is critical to keeping costs as low as possible, especially if you have large storage requirements.

Once we understand the storage requirements, we can better determine which medium(s) to use for the environment. If we have a six-month online retention requirement for a 50,000 event per second processing rate, we’re going to need dedicated, high-speed storage to handle the throughput. While we definitely need to meet the IOPS requirement vendors specify, we also need to ensure the storage medium is dedicated. Even if the storage has the required IOPS, if the application can’t access the storage medium when required, the IOPS will be irrelevant. Thus, if using a technology such as SAN, ensure that the required storage is dedicated to the application and that the SAN is configured accordingly.

Another factor to consider when designing your storage architecture for your SIEM environment is what storage will be used per SIEM layer. The Processing Layer (Connectors/Collectors/Forwarders) typically doesn’t store data locally unless the Analytics Layer (where data is stored) is unavailable. However, when the Analytics Layer is unavailable, the Processing Layer should have the appropriate storage to meet the processing requirements. Dedicated, high-speed storage should be used to process large EPS rates, and should have the required capacity to meet caching requirements.

To save on storage costs, slower, shared storage can be used to meet offline retention requirements. When access to historical data is needed, the data can be copied back to local storage on the Analytics Layer for searching.

Ensuring you have the right storage for your SIEM environment is a simple but fundamental task. As SIEMs can take years to fully implement and equally long to change, selecting the correct storage is critical. For medium-to-large enterprises, dedicated, high-speed storage should be used to obtain fast read and write performance. While smaller organizations should ideally make the same investment, there are cases where slower, more cost-effective storage can suffice for low processing rates and minimal end user usage of the SIEM.

Understanding Your License Model

SIEM license models can vary significantly. Some are based simply on the average ingested data per day, while others combine multiple factors such as ingested data per day, number of end users, and the number of devices data is collected from. Regardless of the license model, it’s critical to understand how it works so that you allocate sufficient funds for it. A misunderstanding of your license model can unexpectedly consume more security budget than anticipated, and thus increase risk to your organization by limiting the resources available for both the SIEM and other security services.

Additionally, as most companies are constantly growing and changing, it’s pivotal to understand how the license model can be augmented or changed, and what the penalties are for any violations.

While the simpler the license model the better, there’s nothing wrong with a license model with various factors as long as it’s well understood and meets your organization’s requirements. After a requirements gathering exercise, you should be able to tell your vendor the expected ingestion rates per day, how many users there will be, and the expected growth rates.

There are other, less obvious factors that can also significantly affect license models. Two often overlooked factors are how the vendor charges for filtering/dropping unneeded data, and whether the ingested data rates are based on raw or aggregated/coalesced amounts. For example, if you’re planning on dropping a significant amount of data at the Processing Layer, Product A (which doesn’t charge for dropped data) would have lower license costs than Product B (which can drop data, but includes the dropped amount in license costs), all else equal. Product C, which aggregates/coalesces data and determines license costs based on the aggregated/coalesced EPS, would have lower license costs than Product D, which aggregates/coalesces data but determines license costs based on raw EPS rates, all else equal.
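A small sketch of how the same environment can translate into very different billable volumes depending on what the license model counts; the 20% filtering and 50% aggregation figures are example values, not any vendor’s actual terms.

```python
# Illustrative sketch only: how one environment produces different billable volumes
# under different (hypothetical) license models. Percentages are example assumptions.
raw_eps = 10_000
filtered_fraction = 0.20      # data dropped at the Processing Layer
aggregation_benefit = 0.50    # reduction from aggregation/coalescing

billable = {
    "raw EPS (dropped data still counted)": raw_eps,
    "post-filtering EPS": raw_eps * (1 - filtered_fraction),
    "post-aggregation EPS": raw_eps * (1 - filtered_fraction) * (1 - aggregation_benefit),
}
for model, eps in billable.items():
    print(f"{model}: {eps:,.0f} EPS")
```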

If you’re comparing different SIEMs, you should ensure that you’re performing an accurate comparison, as SIEMs can vary significantly. A license model for a full SIEM solution from Company A is likely to be more expensive than a log management-only solution from Company B.

SIEMs can be expensive and consume a significant portion of your security budget. Misunderstanding your requirements and then signing a contract with a license model that’s unclear or difficult to understand is a major risk. Reduce that risk by spending the resources necessary to understand it and choose one that aligns best with your organization.

Calculate and Configure Caches

Until someone invents a technology that guarantees one hundred percent uptime, we’ll need to accept that at some point in a SIEM environment there will be an application or system failure. Additionally, we’ll need to take the application offline at least a few times per year for scheduled maintenance and upgrades. While most SIEM applications have caching capabilities built in, it’s critical to ensure the environment has appropriate cache sizes configured and sufficient storage. Insufficient storage or misconfigured caches can result in data loss.

Typically in SIEM environments, the Processing Layer (Connectors/Collectors/Parser Layer) is designed to send to the Analytics Layer via TCP, and if it’s unavailable, data will be cached locally on the Processing Layer servers until the Analytics Layer is available again. Thus, the Processing Layer servers will need sufficient local disk space to house the expected caches.

In order to determine an appropriate cache size, we need to look at your organization’s requirements, SLAs, and other factors that help determine how long an outage could last and how long it typically takes your IT department to resolve issues. If you’re confident an outage would last no longer than three days, then the Processing Layer servers need to support three days’ worth of cached log data. Caches can also grow large quickly, as they typically hold raw, uncompressed data.

To calculate how much storage we’ll need for caching, we can simply take the Average Sustained 24h EPS rate and multiply it by the average event size and the number of seconds per day. For example, if your Average Sustained 24h EPS is 5,000 and your normalized event size is 2,000 bytes, then we’ll need about 864 GB of space per day. So if we have two servers in the Processing Layer and we expect an outage to last no longer than three days, then we’ll need 1.3 TB of free storage per server to meet the cache space requirements (864 GB/day × 3 days ≈ 2.6 TB, or 1.3 TB per server across two servers).
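The same calculation expressed as a short script, using the figures from the example above:

```python
# Cache sizing from the text's example (values as stated above).
eps = 5_000
event_size = 2_000            # bytes, normalized and uncompressed
outage_days = 3
servers = 2

cache_per_day = eps * event_size * 86_400          # ~864 GB/day
total_cache = cache_per_day * outage_days          # ~2.6 TB
per_server = total_cache / servers                 # ~1.3 TB per Processing Layer server
print(f"{per_server / 10**12:.1f} TB of free cache space per server")
```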

We’ll also need to ensure the application is configured to use the appropriate cache size as well. Many SIEM applications are configured with a default cache size, which may not be sufficient for your environment.

SSISS: A SIEM Requirements Gathering Case Study


Implementing a SIEM is a challenge for any organization. As SIEMs can take years to implement, it’s critical to build them appropriately, as even minor changes can result in months of re-work. SIEMs can be resource-intensive applications, reading and writing several thousand events per second, and thus an inadequate amount of RAM, CPU, or slow storage can mean poor performance and application instability. SIEM environments can have many stakeholders, so changing them can require approvals from many departments and subject matter experts within an organization.

To reduce risks associated with a SIEM implementation, a thorough requirements gathering exercise should be performed. We need to know who is going to be using the tool, how often it will be used, what type of data it will collect, how long the data will be stored for, and more. This will allow solution designers to best architect a solution, determine what product is most suitable for your organization, select an appropriate licensing model, and in general eliminate any assumptions made in the design. We also need to provide the requirements to the vendors in order for them to provide an accurate, high-performing solution.

One of the major things to keep in mind during a requirements gathering exercise is to challenge all requirements. If a requirement significantly raises the cost of the solution or makes it more difficult to maintain, then there should be a justification for it. Common misconceptions about using log data as legal evidence and about data retention periods can lead solution designers to unnecessarily increase the complexity and cost of a SIEM. For example, some organizations mandate that log data must be retained for seven years, but this is typically a requirement for transactional data, not system log data.

Let’s dive into a sample requirements gathering exercise to highlight some of the major items we need to capture in order to design an appropriate solution. SSISS (Streamlined Seamless Integrated Security Solutions, not a real company to my knowledge, but what a name), one of the industry’s leading IT security companies, has hired me to assist one of their clients, Company A. Numerous problems exist within the environment and Company A isn’t sure where to even start.

The following Monday I arrive at Company A and meet with the VP of Security Operations. I learn that the SIEM was implemented a couple of years ago by a provider Company A is no longer doing business with. Complaints about the SIEM range from slow search response times and data loss to application instability and audit deficiencies. The VP has budget to build a new SIEM, and doesn’t want to spend effort on fixing the existing one. I ask the VP for all stakeholders and a point of contact within each group that I can work with. The stakeholders are the security operations team, network team, server team, and the compliance team.

I first meet with the security operations team. Their main complaints are that the system is unstable, slow, and overall difficult to get work done in. Searches and reports time out, and the system stops working at least once a week, forcing a reboot. As I note the issues down, I ask if there’s anything else the system doesn’t do today. They reply that the data sources they need are there and they can create the required correlation rules; it’s just that the system is slow and unstable. Log data growth of 20% over the next five years seems reasonable, and they don’t currently forward any of the log data to another application, nor do they plan to. They agree to send me an email with their documented requirements.

Next is a session with the network team. They use the SIEM mainly to troubleshoot firewall and router issues, and note that searching is slow. They are mandated to log all of their devices to the SIEM, but there have been no integration issues to date. Log data growth of 20% over the next five years seems reasonable to them as well. I ask that they send me an email documenting what they need the SIEM to do, along with the quantity of devices they intend to integrate.

I next meet with the server team. They generally don’t use the SIEM, but they are mandated to configure all servers to log to it. The only output they get from it is a monthly report to verify that all systems are logging. When asked about growth, they by chance give the same number used by the security operations and network teams. They agree to send me a note documenting the type and quantity of servers that they own.

Finally, a session with the compliance team lands me with some large requirements: seven years of log data and encryption of data at rest. I reply that most industry standards mandate one year of log data, and that seven years is typically for transactional data, which is not stored in the SIEM. Additionally, there is no personally identifiable or financial data within the logs, so while many SIEMs offer masking capabilities, enabling them would only decrease performance. The compliance team replies that as long as there is no financial data, one year of data suffices, and access to the data needs to be restricted, but not encrypted or masked. They agree to send me their documented requirements.

The next day, I receive responses from all teams. Everything looks okay with the security operations requirements:

I note that since we’ll be using Syslog for many data sources, the logs from the source to the Processing Layer will be unencrypted, but they will be encrypted from the Processing Layer to the Analytics Layer. The team confirms this will not be an issue. I also ask if there are any opportunities for filtering data, and discover the team doesn’t need any of the events I’m proposing to drop in the new solution. The team does a quick count and finds that dropping these events will reduce EPS rates by 20%. They also note that their current SIEM doesn’t do any aggregation.

The only additional request I have for them is to compile the distinct number of systems logging to the SIEM, each device type’s average sustained and peak sustained EPS rates, and the total number of correlation rules they intend to have in production. This will help me provide an accurate architecture and storage requirements.

Requirements for the network team look good with the quantity of network devices:

The server team requirements are documented as expected with the total quantity of servers:

Fortunately, the conversation with the compliance team worked in my favour, and the only requirement from them is to retain log data for one year.

The security operations team responds to my request a few days later, and I’m now able to populate my SIEM Architecture Sizing, Storage and Infrastructure Costs Calculator spreadsheet, which I’ll be sending to each vendor.

The first tab lists the data source requirements:

The two most important numbers in the above table are the Total Average Sustained 24h EPS (events per second) rate, which is the total number of events received in a day divided by the number of seconds in a day, and the Peak Sustained EPS rate, which is the maximum EPS rate processed by the system in a day, typically during business hours. The Total Average Sustained 24h EPS rate will be used to determine storage requirements and licensing costs depending on the product licensing model, while the Peak Sustained EPS rate will be used more to size an appropriate architecture. One of the biggest mistakes many SIEM consultants make is using the sustained EPS rate to size an architecture. This results in the system being undersized during peak hours. For example, proxy traffic at night can be a mere fraction of what it is during the day. Thus, an architecture designed using the sustained rate will cause the system to slow down and cache data during the day, when it’s dealing with 10,000 peak EPS rather than the sustained 1,500 EPS it was designed for.
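A small illustration of the two rates, using the example figures above (the daily event count is an assumed value chosen to produce the 1,500 EPS sustained rate):

```python
# Illustrative sketch only: deriving the two key rates from hypothetical counts.
events_per_day = 129_600_000                        # total events received in a day (assumed)
avg_sustained_24h_eps = events_per_day / 86_400     # = 1,500 EPS -> storage and licensing
peak_sustained_eps = 10_000                         # busiest sustained rate observed -> architecture sizing
print(f"Sustained: {avg_sustained_24h_eps:,.0f} EPS, Peak: {peak_sustained_eps:,} EPS")
```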

The second table lists the functional requirements (what the system must do):

The third table lists the Environment Parameters, which are other design requirements that need to be built into the architecture.

The first item is the Processing Layer Forwarding Factor, which is how many copies of each event the Connectors/Collectors/Forwarders will forward to the Analytics Layer. Many SIEM products can forward a copy of each event to a second destination to make the solution highly available. If I want a highly available solution, I’m going to need to enter a value of two or greater. The second item is the Analytics Layer Replication Factor, which indicates how many copies of each event will be replicated to another server for high availability. Given that these two items can vary per SIEM product, I’m going to let the vendor enter these numbers.

The third item is the Analytics Layer Forwarding Factor, which indicates how many systems the Analytics Layer will need to forward to. As confirmed by the Security Operations team, we will not be forwarding data from the SIEM to another system. This is a critical design consideration, as missing this requirement can cause the architecture to be severely undersized.

The fourth item, Filtering Benefit, will reduce the amount of data sent from the Processing Layer to the Analytics layer by the entered percentage. I’m going to enter a value of 20%.

The fifth item, Aggregation Benefit, will depend on whether the SIEM product can aggregate. I’ll leave that value for the vendor as well.

The Processing Layer Spike Buffer and Analytics Layer Spike Buffer are designed to prevent caches from being formed when there are surges of log data. Why this is an important consideration is detailed in the article A SIEM Odyssey: How Albert Einstein Would Have Designed Your SIEM Architecture. I’m going to use a value of 25% for the Processing Layer and 15% for the Analytics Layer.

Finally, as discussed with many Company A teams, log data growth of 20% will be assumed.

After all the parameters are input, I get the following table from the Architecture Requirements tab. The two key values are the Total Processing Layer EPS Requirement and the Total Analytics Layer EPS Requirement. The SIEM architecture will need to be sized based on these numbers. However, these values may change per vendor depending on how their product works.
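The spreadsheet’s formulas aren’t reproduced here, but the sketch below shows one plausible way such a calculator could combine the parameters; every formula and default in it is an assumption. With these assumptions the output lands close to the non-HA figures quoted later (roughly 15,000 EPS at the Processing Layer and 11,000 EPS at the Analytics Layer for a product with no aggregation, or about 5,500 EPS at the Analytics Layer with a 50% aggregation benefit), but the real calculator may work differently.

```python
# Illustrative sketch only: one plausible way a sizing calculator could combine
# the parameters above. The actual spreadsheet formulas aren't shown in the text,
# so treat every formula and value here as an assumption.
peak_eps = 10_000
growth = 0.20                     # assumed 5-year log data growth
proc_forwarding_factor = 1        # copies forwarded by the Processing Layer (vendor-specific)
proc_spike_buffer = 0.25
filtering_benefit = 0.20          # data dropped before the Analytics Layer
aggregation_benefit = 0.0         # vendor-specific; 0 if the product can't aggregate
analytics_replication = 1         # copies stored at the Analytics Layer (vendor-specific)
analytics_spike_buffer = 0.15

processing_eps_req = peak_eps * (1 + growth) * proc_forwarding_factor * (1 + proc_spike_buffer)
analytics_eps_req = (peak_eps * (1 + growth) * (1 - filtering_benefit)
                     * (1 - aggregation_benefit) * analytics_replication
                     * (1 + analytics_spike_buffer))
print(f"Processing Layer: {processing_eps_req:,.0f} EPS, Analytics Layer: {analytics_eps_req:,.0f} EPS")
```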

I’m now ready to send the SIEM Architecture Sizing and Storage Costs Calculator spreadsheet to the three shortlisted vendors.

While we should be focused on meeting Company A’s requirements, we shouldn’t forget to be equally focused on meeting the vendor’s requirements. We need to ensure the proper system requirements are met, including CPU, RAM, and storage. Failure to meet these requirements can result in numerous complications, including slower search response times, application instability, data loss, operational nightmares, and ultimately increased risk to the organization.

The first vendor to respond is Vendor 1. Their SIEM product doesn’t structure logs during ingestion and doesn’t have the ability to aggregate data, but retaining the logs in raw format should keep the message size low at 700 bytes. Their product provides high availability by replicating data across multiple Analytics Layer servers. For the non-HA solution, the Processing Layer will need to process 15,000 EPS, while the Analytics Layer will process 11,000 EPS. The average event size of 700 bytes produces offline storage costs of $21,300/year, while the online storage will be provided by the local disks on the server. For the HA solution, the Processing Layer requirement stays the same, but the Analytics Layer requirement jumps to 22,000 EPS, and storage requirements double to $42,600/year.

Vendor 1 recommends four servers in the Processing Layer, which is more than enough to handle the anticipated 15,000 EPS. This should prevent any caching or log data delays when there’s a surge of log data. For a highly available Processing Layer, they recommend doubling the number of Processing Layer servers, but note that even with the original four, if one server is unavailable, the remaining three should still be more than sufficient to handle the 15,000 EPS. They also note that a load balancer can be placed in front of the Processing Layer servers to provide balanced traffic and high availability. The 500 GB of local storage on each server should provide 3.5 days of cached data in the event the Analytics Layer is unavailable, and can be bumped up if desired.

For the Analytics Layer, Vendor 1 recommends a large physical server with 48 cores, 256GB of RAM, and highly recommends a combination of solid state and local hard disks for the storage medium. Should all the server requirements be met, they have no concerns with meeting the search speed and correlation rule requirements. And to meet the high availability requirement, a second server can be added, and the primary Analytics server will replicate a copy of each event to the secondary server. Vendor 1 can also provide the entire solution as a cloud-based service.

The licensing model for Vendor 1 is based solely on stored GB per day. You can add as many servers or end users as needed without any additional costs.

Vendor 2 responds next; their SIEM product structures data as it’s ingested and can provide significant aggregation benefits, but it produces an average event size of 2,000 bytes. The product can provide high availability by having the Processing Layer send a copy of each event to another destination, commonly known as “dual-feeding.” For the non-HA environment, the Processing Layer will need to process 15,000 EPS, but given the product’s aggregation functionality, the Analytics Layer will only need to process 5,500 EPS. However, the average event size of 2,000 bytes bumps the offline storage costs up to $30,400/year. For the HA solution, the Processing Layer requirement doubles to 30,000 EPS, the Analytics Layer requirement to 11,000 EPS, and storage costs to $60,800/year.

For the non-HA solution, Vendor 2 claims four servers in the Processing Layer will be more than adequate to handle the expected 15,000 EPS, and can be augmented with a load balancer and additional servers for better performance and high availability.

The Analytics Layer is a single server with significant resources. Vendor 2 claims it should provide all required functionality as well as fast search response times for end users. To make the solution highly available, we can simply add another server and configure the Processing Layer to “dual-feed.”

Vendor 2’s licensing model is a combination of CPU cores, peak EPS rates, and the number of end users logged in simultaneously.

Last in line is Vendor 3, who provides an appliance-only SIEM solution that structures data as it’s ingested, can provide aggregation benefits, and produces the largest event size of 2,300 bytes due to the large field set the product uses. Like Vendor 2, they anticipate an aggregation benefit of 50%, which produces a non-HA Processing Layer requirement of 15,000 EPS and an Analytics Layer requirement of 5,500 EPS, but the larger message size bumps offline storage costs up to $35,000/year. For HA, the values can simply be doubled.
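For comparison purposes, the three quoted offline storage costs track the volume each product would store (post-aggregation Analytics Layer EPS times event size). The per-TB rate below is inferred from the quotes, not stated by any vendor, so treat it as an assumption.

```python
# Illustrative sketch only: relating the quoted offline storage costs to the
# Analytics Layer volume each product stores. The $/TB-year rate is inferred.
vendors = {
    "Vendor 1": {"analytics_eps": 11_000, "event_size": 700,   "quoted": 21_300},
    "Vendor 2": {"analytics_eps": 5_500,  "event_size": 2_000, "quoted": 30_400},
    "Vendor 3": {"analytics_eps": 5_500,  "event_size": 2_300, "quoted": 35_000},
}
for name, v in vendors.items():
    tb_per_year = v["analytics_eps"] * v["event_size"] * 86_400 * 365 / 10**12
    implied_rate = v["quoted"] / tb_per_year      # roughly the same $/TB-year for all three
    print(f"{name}: ~{tb_per_year:,.0f} TB/year stored, ~${implied_rate:.0f}/TB-year implied")
```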

Vendor 3 proposes only two servers in the Processing Layer, which they are certain can handle the proposed EPS rates and which can be scaled horizontally for high availability.

The Analytics layer is a single appliance the vendor claims can meet all the searching, reporting and analytics requirements. The high availability architecture is similar to Vendor 2, where a second Analytics Layer server can be added and the Processing Layer can be configured to “dual-feed.”

The licensing model is based simply on the average daily EPS rate at the Analytics Layer. Vendor 3 supports the entire appliance, from the hardware and OS to the application, and issues patches for all of it. Patches and upgrades seem simple: the vendor provides single files that can be uploaded at the click of a button.

The appliance-based solution sticks out in my mind, given the statements the VP of SecOps made about the revamping of the server teams. Previously, server issues could take months to resolve, and patching was practically non-existent due to the shortage of resources within the team. Additionally, there have been cases where the server resources promised were not delivered.

While I haven’t discussed cloud options with the VP of SecOps, I’m going to follow up with him to see if this will fit into his roadmap and verify that sufficient bandwidth is available to an external site. Vendor 1’s track record of managed cloud SIEM environments is also impressive and would likely be the quickest to deploy. I’m also going to confirm that there are no other stakeholders we missed, such as data scientists, to ensure the solution can handle any large-scale calculations if necessary.

All solutions appear to meet Company A’s requirements, so I’ve got some more work to do to see which will be the best fit. I like Vendor 1’s simple licensing model, architecture, and low storage costs, and think the cloud solution may be the best bet for Company A. I like Vendor 2’s blazing fast searches, as nothing frustrated me more as an analyst than slow search response times turning a simple investigation into hours of work. However, I’m confused by their licensing model, and I’ll have to follow up with them to understand it better. I also like Vendor 3’s solution, which is similar to Vendor 2’s but appliance-based with a simpler license model, though I’m not sure two servers in the Processing Layer will suffice.

When I look at the operational costs of all vendors, Vendor 1 produces the lowest storage costs. Vendor 3 produces the lowest support cost, but that’s due to the two servers in the Processing Layer compared to the four for the other vendors, which again I’m not convinced will be sufficient.

As you can see, selecting a SIEM product for your organization can be a complex task. There are many high-quality SIEM products on the market, but depending on your organization and how it operates, one product may be more suitable than another, and the cost can differ significantly. Minor functionality differences, from how the solution processes the required data sources to the number of supported parsers, can make one solution drastically more feasible than another.

A thorough requirements gathering exercise can reduce the risks associated with a SIEM implementation. It will help your organization select a product that maps best to your requirements. It will allow vendors to provide an accurate, high-performing solution. Ultimately, it will pave the way for an efficient and effective solution that lowers implementation and operational costs, reduces waste and rework, and provides end users with a tool that increases your organization’s security posture while providing business value.

If Milton Friedman Created Your SIEM Team

When you mix an economist with the Godfather, you get an offer you can’t understand. But when you mix the philosophy of a famous economist with your SIEM team, you can create a high-performing team that continuously improves the environment, plans accordingly, creates better use cases, and ultimately reduces the probability of your phone ringing on a Friday afternoon for a SIEM issue.

Milton Friedman was one of the 20th Century’s most influential economists. Without going into detail or starting a debate on economic policy, he argued that a single owner would take better care of something than multiple entities or an unclear entity. The single owner likely has a direct interest in the value of it and will maintain it better than an entity that doesn’t. And thus his famous quote:

“When everybody owns something, nobody owns it, and nobody has a direct interest in maintaining or improving its condition.”
– Milton Friedman

A SIEM is likely one of your more complicated security products to manage, and needs extensive customization compared to the other black-box security applications your vendors manage for you. Not only do you need to manage the content and use cases, you need to manage the data feeds, ensure data is parsing correctly, troubleshoot issues with the application, support SIEM end users, and plan for growth. All this effort requires input from various teams within your organization. Given the multiple teams involved, it’s critical to establish accountability and know who is responsible for which part of the environment.

SIEM Environment Requirements

The first requirement of any SIEM solution is clear, single ownership: an entity that has a direct interest in improving and maintaining the overall SIEM environment, and is ultimately accountable for its entire operation. Without clear ownership, staff and end users will be discouraged from escalating issues. Teams will not have a dispute mechanism, and instead of resolving issues, they will point the finger at each other. Those issues will then be brushed under the rug, and will result in a major outage or security issue down the road for leadership to deal with. Work will not be distributed appropriately, and highly skilled staff who are overworked will leave, taking valuable knowledge and training investments with them. Relationships between the teams will be strained, and ultimately entropy will overrun your environment, requiring significant investment to turn it around.

The second requirement of a SIEM solution is a healthy, teamwork-oriented environment. Given that many teams are involved in the implementation and operation of your organization’s SIEM, positive and open communication between the teams is required for issues to be raised, work to be assigned to the appropriate teams, and for knowledge to be shared. Healthy teams will raise pertinent issues to leadership and resolve them quicker than teams that don’t. Healthy teams share valuable knowledge and train each other. All of this contributes to a work environment that retains staff, and attracts new talent into the team.

The third requirement of a SIEM solution is a strong skillset. SIEM environments are complicated, and you’ll need many skills to manage them, from architecture and design planning, parser development, and rule logic development to the social skills required to obtain and maintain data feeds from other teams. Before making investments in your SIEM skillset, the first two requirements should be met, or else you risk losing highly skilled staff who are hard to find and retain.

The fourth requirement of a SIEM solution is documented roles and responsibilities. Many mistake this for the first requirement, but a RACI, for example, will not be followed or enforced if the first three requirements are not met. If your staff don’t have the proper skillset, one or two employees may end up doing everyone else’s work, and leave when they burn out. If your teams have poor communication with each other, issues may go unresolved and unnoticed by leadership, leading to an outage down the road.

Where practical, entire SIEM teams should be under one VP or line of business. Having one VP accountable for both the implementation and operation of your SIEM gives that VP an incentive to ensure the solution isn’t rushed into production and that it has adequate resources for operations. A single VP has more incentive to ensure the health of the SIEM environment than an organization that makes one VP accountable for the implementation only and another VP for the support of it. Such a split can incentivize the implementing VP to get it in as quickly and cheaply as possible and leave the supporting VP with a mess. Given that SIEMs can take years to fully implement, this should be avoided at all costs. The single VP also acts as a single escalation point and can’t deflect an issue to another VP or line of business. When there are two VPs and the roles and responsibilities aren’t clear, disputes can arise or the issue can be ignored. Again, it’s ideal to have your entire SIEM environment under a single VP, but in organizations with a good working environment, having different parts of it owned by different VPs or lines of business can work out well. There are also some roles and responsibilities, such as server and storage administration, that commonly sit outside your security organization.

RACI Matrix Overview

One of the industry’s most common roles and responsibilities documents is a RACI Matrix, which stands for Responsible, Accountable, Consulted, and Informed. The goal of a RACI is to list all stakeholders involved in the solution and the required tasks, and then assign one of those four designations to one or more stakeholders for each task.

While a RACI is designed to document roles and responsibilities, it has another valuable benefit: quantifying work efforts. Once you see all the various tasks involved in your SIEM environment, you can see how much work effort the various stakeholders are assigned. For example, if Engineering is responsible for Parser Management, and they spend 20 hours per week maintaining the 40 custom parsers, they can justify the half of an FTE they’re requesting.

It’s easy for a SIEM RACI to span several hundred lines given the number of tasks and teams involved, so I’d recommend keeping it as high-level as possible. The objective should be to assign tasks to the teams, and then leave the teams responsible for figuring out how the work is managed. This avoids the SIEM Owner having to resolve disputes within teams. The SIEM Owner should have a single point of contact within each team to work with directly.

A SIEM environment should have at minimum an overall RACI that defines the roles and responsibilities of all stakeholders. Additionally, each team may want to create an internal RACI that clarifies who within the team is doing what. This is optional, but highly recommended, as it can help employees understand their tasks, assist management in understanding the required tasks and work efforts, and most importantly, establish accountability. For example, if you have 100 correlation rules and leave it up to “the team” to manage them, you may find that the task of keeping the rules relevant is being ignored. When you break up the rules so that the first 40 are “owned” by Bill, the next 40 by Bob, and the final 20 by Joe (who also gets to own reporting), you may find rule updates happening more frequently. There is accountability, and you can follow up with Bill, Bob, and Joe to check the status of the tasks. If there isn’t progress, you can narrow down the issue, whether it’s a skillset gap or work overload, and then coach the employee accordingly.

Many argue that assigning work to an individual rather than a team introduces a skillset gap when that employee leaves. The advantages of assigning it to an individual are a better understanding of the task through specialization and repetition, better documentation of the task as a result of that understanding, and ultimately a position for the individual to improve the condition of the SIEM in relation to the task, for example correlation rule updates. Having a group manage something that is not well understood leads to the team ignoring the task, something it can do when no one is accountable for it. A group that doesn’t understand a task cannot document it properly or improve its condition. There’s nothing more frustrating than working on something you don’t understand.

An overall RACI is a requirement for any SIEM environment, but as all organizations are different, how a team manages tasks within itself should be at the discretion of leadership.

Sample SIEM RACI

We’ll walk through a sample SIEM RACI to give you an idea of what it may look like in your organization. The RACI will be divided into subsections below by Category and commented on individually. A link to the full RACI Matrix is available at the end of the article.

The Stakeholders in this sample RACI are the SIEM Owner, Engineering, the Content Team, and Incident Response, who all fall under the Security Operations team. The Server Support and Storage Support teams fall under a different line of business, Infrastructure Services.

The first Category is Governance, and you can clearly see how the SIEM Owner is both Accountable and Responsible for the overall SIEM solution, dealing with the vendor, and internal escalations from any stakeholders.

The second Category is Architecture and Design, for which the SIEM Owner is also Accountable and Responsible, but Consults the Engineering, Content, and Incident Response teams. The SIEM Owner needs to work with them to make sure their requirements are met, the search speeds are adequate, the required data sources are available, and that the SIEM solution adequately meets all these requirements, and, if not, that they are built into future versions.

For the Logging Configuration category, the SIEM Owner needs to make sure that the required log sources are not only logging to the SIEM, but also logging the correct data. Engineering needs to be Consulted to ensure correct parsing, and the Content and Incident Response teams need to make sure the data they need is available within the logs.

The SIEM Owner is also Accountable and Responsible for leading all new projects, and ensuring the SIEM solution is compliant with the organization’s compliance and governance standards. You can also see at this point the SIEM Owner isn’t a mere decision maker; he or she will be active in the management of your company’s SIEM.

The Engineers are Accountable and Responsible for the health and stability of the SIEM solution, and for ensuring data feeds are integrated into the SIEM correctly. They do everything from application support to patching. The only two support-related tasks they are not Accountable and Responsible for are Server and Storage Support, for which they will be Consulted when necessary.

The Content Team are the SIEM end users, and are strictly Accountable and Responsible for developing and maintaining rules and reports. They are also active in providing input for new use cases, but the Accountability and Responsibility for that task falls on the SIEM owner.

The Incident Response Team is Accountable and Responsible for responding to the alerts generated by the correlation rules, and reviewing reports. They are also Accountable and Responsible to provide tuning recommendations for the rules and reports based on their investigations and observations.

The Engineers tried to get the Content Team to manage user accounts, but lost the battle and ended up with the task themselves.

 

As you can see, a RACI is a simple document that can clarify who is responsible for what part of the SIEM environment. It can also be used by leadership to quantify work efforts, assist in understanding the various tasks employees do, and identify areas that require improvement. Issues can be raised and be visible to leadership on Monday morning instead of Friday afternoon, or during a breach.

A RACI is not practical without three other major requirements: clear ownership, a teamwork-oriented environment, and a strong SIEM skillset. Clear ownership gives the owner an incentive to maintain and improve the SIEM, and prevents issues from being ignored or assuming they’re the responsibility of another entity. A high-performing team maintains and improves the environment, retains highly-skilled staff, and attracts new talent into the team. A strong SIEM skillset allows staff to execute the required tasks. All of this contributes to a better return on investment the SIEM will provide your organization, and ultimately a better security posture.

 

Link to a sample SIEM RACI Matrix: Sample_SIEM_RACI


A SIEM Odyssey: How Albert Einstein Would Have Designed Your SIEM Architecture

Albert Einstein taught us that there are four dimensions: the three physical dimensions plus time. Light being generated by the sun right now exists, but it will take about eight minutes to reach the earth before it exists in our environment. Many of the lights you see in the sky at night were generated by stars millions of years ago, and those stars may no longer exist today.

The four dimensions of spacetime can teach us a lot about the universe, and they also offer a good lesson in SIEM architecture design. In SIEM environments, log data is sent through various layers, introducing a delay between the data source and the destination. In a properly designed SIEM architecture, the delay between source and destination should be minimal, a few minutes at most. But in an undersized SIEM architecture, delays between source and destination can be high, and in the worst cases, data may not reach the destination at all.

SIEM environments have three main layers. The first is the data sources: the various Windows servers, firewalls, and security tools that will either send data to your SIEM or have data pulled from them by your SIEM. The second layer is the Processing Layer, which consists of applications (Connectors/Forwarders/Ingestion Nodes) designed to process and structure log data and forward it on. The final layer is the Analytics Layer, which is where log data is stored, security analytics are performed, and end users search for data.

To highlight the risk introduced into your organization by an undersized SIEM architecture, we’ll use a DDoS attack against your organization as an example. The Bad Guyz Group has found a clever way to funnel millions out of your organization. Before they initiate the fraud scheme, they want to distract your organization from what is actually going on in order to buy themselves time, and thus launch a large-scale DDoS against your web servers.

Your DDoS protections begin sending out alerts, notifying your SOC of the attack and that there’s no concern at the moment. The amount of traffic directed at your web servers seems to be increasing, but is nowhere near a level that will overwhelm your DDoS protections. Your SOC notifies leadership that even though there’s an active attack in progress, there’s nothing to be worried about.

While your DDoS protection is working as expected, your SIEM Processing Layer is being flooded with a 400% increase in firewall and proxy traffic. Your SIEM Processing Layer was only designed to process 10,000 events per second (EPS), and is now struggling to process a surge of 40,000 EPS. Cache files start to appear within minutes, growing at a rate of 100GB per hour, which will exhaust cache space within eight hours.
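To see how quickly that adds up, here is a rough sketch of the surge math; the per-event size and total cache capacity are assumptions chosen to reproduce the figures in the scenario.

```python
# Rough consistency check of the surge numbers (event size and cache capacity are assumptions).
designed_eps = 10_000
surge_eps = 40_000
event_size = 925                  # bytes per event, assumed to roughly reproduce ~100 GB/hour

overflow_eps = surge_eps - designed_eps                               # 30,000 EPS that can't be processed
cache_growth_per_hour = overflow_eps * event_size * 3_600 / 10**9     # ~100 GB/hour
cache_capacity_gb = 800                                               # implied by "exhausted within eight hours"
hours_to_exhaust = cache_capacity_gb / cache_growth_per_hour
print(f"~{cache_growth_per_hour:.0f} GB/hour, cache exhausted in ~{hours_to_exhaust:.1f} hours")
```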

Hours later, SOC Analysts notice that the timestamps on most of the log data are several hours behind. They send an email to the SIEM Engineers, asking whether there’s anything wrong with the SIEM application. When the SIEM Engineers get out of their project meeting a few hours later, they log in to the servers and notice the cache files, extremely high EPS rates, and maxed-out RAM/CPU usage. They then notice that the surge in data is being caused by firewall and proxy logs. After a conversation with the SOC, the SIEM Engineers are informed of the DDoS attack that happened earlier in the day.

Later in the evening, the SIEM Engineers warn leadership that the Processing Layer is dropping cache files of log data and is refusing new connections, resulting in data loss. The average log data delay now stands at eight hours as the DDoS attack continues.

Fortunately, the DDoS attack stops the following morning, and the SIEM Processing Layer begins working through the cache files on the servers. The SIEM Engineers anticipate that the cache files should be completely cleared by 5PM.

Later that morning, the SOC Manager gets a call from the Fraud team, asking if they can see traffic to several IP addresses. The SOC Analysts begin searching, but response times are very slow and the latest data available is from last night. Just as the SIEM Engineers expected, all cache files are cleared by 5PM and analysts are searching data in real time again. They find only one hit from one of the IPs provided. The Fraud team insists there should be more than that, but the SIEM Engineers note that the other hits may have been dropped while the Processing Layer was refusing new connections during the surge in data. Leadership isn’t happy, and calls for an immediate review of the SIEM environment.

The bad news is that many SIEM environments are not sized to deal with such scenarios, or with legitimate data surges in general. These situations can leave your organization blind to an attack in progress: the data required for an investigation exists, but is not yet available to your analysts, or worse, has been dropped from existence.

The good news is that you can significantly reduce the probability and severity of this scenario. While SIEM environments can be expensive, the costly part is typically the Analytics Layer, and for many organizations over-sizing that layer isn’t an option. The Processing Layer, however, tends to be much less expensive; in some cases the only additional cost is the physical or virtual servers required.

A SIEM Processing Layer should be sized significantly larger than your sustained average events-per-second rate. That number is what determines SIEM application licensing costs, and many organizations make the mistake of sizing their entire architecture to it. In addition to spikes, the amount of traffic received by your SIEM during the day is likely to be much higher than at night. If your sustained rate is 20,000 EPS, daytime traffic may run at 30,000 EPS while overnight traffic drops to the 5,000-10,000 EPS range. A spike during the day can then turn 30,000 EPS into 60,000 EPS. In many SIEM environments, this would cause the Processing Layer to quickly exhaust its caches and begin dropping data. Supporting a large spike can be as simple as adding more devices (Connectors/Forwarders/Ingestion Nodes) to the Processing Layer; the added processing power and overall cache availability reduce the risk of log transmission delays and dropped data.
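As a rough illustration of that sizing logic, the following sketch estimates how many processing nodes such an environment might deploy. The per-node capacity and the assumption that a surge doubles daytime traffic are hypothetical values used only for illustration.

import math

SUSTAINED_EPS = 20_000       # the licensing-style sustained average
DAYTIME_PEAK_EPS = 30_000    # typical business-hours rate
SPIKE_MULTIPLIER = 2         # assume a surge can double the daytime rate
NODE_CAPACITY_EPS = 10_000   # assumed throughput of a single Connector/Forwarder

design_eps = DAYTIME_PEAK_EPS * SPIKE_MULTIPLIER              # 60,000 EPS design target
nodes_needed = math.ceil(design_eps / NODE_CAPACITY_EPS) + 1  # +1 node for failures and upgrades

print(f"Design target: {design_eps} EPS -> deploy {nodes_needed} processing nodes")
print(f"Headroom over sustained rate: {nodes_needed * NODE_CAPACITY_EPS / SUSTAINED_EPS:.1f}x")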

In addition to reducing the probability of data delay and loss, an over-sized Processing Layer brings high-availability benefits and can make migrations and upgrades easier. With a single point of failure, you can lose your Processing Layer entirely if that one device fails. If you only have enough Connectors/Forwarders to process your sustained EPS rate, you risk the scenario above whenever one of the devices in the Processing Layer fails, because the remaining devices have to absorb the extra load. And when you need to upgrade the Processing Layer, the extra devices make the upgrade smoother and transparent to operations.

While we may have solved the issue at the Processing Layer, a surge in data can also cause transmission delays, data loss, slower end-user search response times, and system stability issues at the Analytics Layer. It may seem logical to build an Analytics Layer that supports double the sustained EPS rate, but that can be cost prohibitive for many organizations. An adequately-sized Processing Layer can assist during surges by aggregating data (combining similar events into one, for SIEM products that support aggregation), caching data locally, and limiting the EPS-out rate to the Analytics Layer. Your SIEM Engineers can also prioritize which data is sent: if there’s a dire need for a particular data source, staff can throttle other data sources so that the pertinent sources reach the Analytics Layer first, as sketched below.
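Below is a minimal sketch of that prioritization idea. The source names, priorities, and EPS figures are all hypothetical; real SIEM products expose this as Connector/Forwarder configuration rather than code, so treat this purely as an illustration of the logic.

EPS_OUT_CAP = 10_000   # assumed maximum EPS the Analytics Layer should receive during a surge

# (source, priority, pending EPS) -- lower priority number = more important
pending = [
    ("windows_security", 1, 6_000),
    ("proxy", 2, 9_000),
    ("firewall", 3, 25_000),
]

budget = EPS_OUT_CAP
for source, _priority, eps in sorted(pending, key=lambda s: s[1]):
    send = min(eps, budget)
    budget -= send
    print(f"{source}: forwarding {send} EPS, caching {eps - send} EPS locally")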

In summary, there’s a strong return on investment in building an adequate SIEM Processing Layer, given the low cost, the risk reduction, and the security benefits. Even with a minimal Analytics Layer, a properly-sized Processing Layer will be able to absorb a surge of data, cache data that can’t be forwarded, prevent data loss, and reduce the risks caused by large increases in log data. Make the investment and leave the spacetime issues to the physicists!

Please like, share, or comment if you enjoyed this article. Thank you!

The Pros and Cons of Structuring Log Data at Ingestion Time with SIEMs

Another important but often overlooked part of a SIEM architecture and design or product analysis exercise is whether the product structures (parses) the data as it’s ingested/processed, and how that affects your organization’s environment. This seemingly minuscule piece of functionality can have a significant effect on your SIEM environment, and can even introduce risk into your organization.

Let’s start with the advantages of structuring (parsing) log events at ingestion time. In general, structuring data as it’s ingested/processed gives you many opportunities to manipulate the data in useful ways.

Advantages of Structuring Log Data at Ingestion Time

1. Ability to Aggregate

Aggregation gives the SIEM tool the ability to combine multiple similar events into a single event. The biggest advantages are the reduction in the EPS rate processed by the SIEM and the reduced storage requirements. Ten firewall deny events from the same source and destination can be combined into one, using 1,000 bytes of storage instead of 10,000. SIEM tools can typically aggregate hundreds of events over a window of thirty seconds or more, so it’s common to see aggregation successfully combine hundreds of events into one. Use caution with aggregation settings, however: the longer the aggregation window (the maximum amount of time the Connector/Log Processor will wait for similar events to combine into one), the higher the memory requirement. A simplified sketch of the idea follows.
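The sketch below collapses identical firewall events within a single batch, assuming events arrive as simple Python dicts. Real Connectors aggregate on configurable fields over a time window; the field names here are illustrative.

from collections import Counter

events = [
    {"action": "deny", "src": "10.1.1.5", "dst": "8.8.8.8", "port": 53},
    {"action": "deny", "src": "10.1.1.5", "dst": "8.8.8.8", "port": 53},
    {"action": "deny", "src": "10.1.1.5", "dst": "8.8.8.8", "port": 53},
    {"action": "allow", "src": "10.1.1.9", "dst": "10.2.0.1", "port": 443},
]

# Count identical (action, src, dst, port) tuples and emit one event per tuple
counts = Counter((e["action"], e["src"], e["dst"], e["port"]) for e in events)
aggregated = [
    {"action": a, "src": s, "dst": d, "port": p, "count": n}
    for (a, s, d, p), n in counts.items()
]

print(len(events), "raw events reduced to", len(aggregated), "aggregated events")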

2. Ability to Standardize Casing

Most SIEM tools can easily standardize the casing of all fields parsed by the Connector/Log Processor. This is an often overlooked benefit of structuring data: the various data sources in your environment will log in various casings, and that inconsistency introduces a potential security risk. The risk is that your security analysts may get null search results when the data they need is actually there.

In several investigations, SOC analysts had trouble getting hits for a particular user’s data. Upon closer inspection, we found the desired data and discovered that the initial searches came up null because the casing in the searches did not match that of the log data. The SOC analysts were searching for “frank,” but the SIEM tool was configured to be case-sensitive, and the Windows logs recorded the user in uppercase as “FRANK.” Simply having the Connector standardize casing for particular fields, as in the sketch below, minimizes these scenarios.
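Here is a minimal sketch of that normalization step, assuming parsed events arrive as Python dicts. The field names and the choice of lowercase are illustrative assumptions.

FIELDS_TO_LOWERCASE = {"username", "hostname", "domain"}

def normalize_casing(event: dict) -> dict:
    # Lowercase only the selected string fields; leave everything else untouched
    return {
        key: value.lower() if key in FIELDS_TO_LOWERCASE and isinstance(value, str) else value
        for key, value in event.items()
    }

print(normalize_casing({"username": "FRANK", "hostname": "SERVER01", "event_id": 4624}))
# {'username': 'frank', 'hostname': 'server01', 'event_id': 4624}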

Standardizing casing can also improve search performance. When the SIEM tool only has to match a single casing, the search space shrinks: a search for “frank” only has to match one form instead of the thirty-two possible case variations of a five-letter name (FRANK, frank, Frank, FRank, FRAnk, etc.).

Why not simply disable case-sensitivity for searches? This seems like the best option, as it mitigates the risk of missing data. The major disadvantage is that it increases the processing power required for searches. In smaller environments this is practical and the effect will likely be negligible. In larger environments, however, where the SIEM is processing several thousand events per second and there are many end users, the impact can be noticeable.

A practical workaround is to disable case-sensitivity for particular searches at the discretion of the analyst. Many SIEM tools offer this option for this very reason, typically as a setting applied before the search is run.

A best practice that mitigates missing data due to casing issues while maximizing performance is to start with a narrow, case-insensitive search, and once you get hits on the data you’re looking for, switch to the casing you see. For example, if you’re looking for user Frank’s Windows logs, start with a short (e.g. a few minutes) case-insensitive search for “frank,” and once you see that the Windows logs record it as “Frank,” switch to the proper casing and expand the search window. This practical approach helps analysts avoid missing data without requiring the tool to be configured case-insensitive across the board.

Regardless of how you choose to configure case-sensitivity, simply ensure your staff understand how your environment works and the best practices for searching your data.

3. Ability to Add and Modify Fields

Many SIEMs can append data to existing fields, override fields with new data, and modify the values placed into fields. A common nuisance when searching log data is that some systems log their FQDN (e.g. server01.ca.companya.com) while others log only the server name (server01). This introduces a risk similar to the case-sensitivity issue: SOC analysts search for Device Host Name = “server01” and get no results because the server appears in the logs as “server01.ca.companya.com”. This forces the analyst into a wildcard search such as Device Host Name = “server01*”, which ultimately requires more processing power from the SIEM.

When data is structured/parsed at ingestion time, the SIEM can simply look for the first period, strip whatever follows it, and place that remainder into another field. Using “server01.ca.companya.com” as an example, the parser can leave server01 in the host name field and put the stripped ca.companya.com into, for example, a domain field. Analysts then know they only need to search the host name field for the server name, and the domain field if they want to know the server’s domain. A minimal sketch of this follows.
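The following sketch shows the split on the first period; the field names and function name are illustrative.

def split_fqdn(value: str) -> tuple[str, str]:
    # Return (host_name, domain); domain is empty when only a short name is logged
    host, _, domain = value.partition(".")
    return host, domain

print(split_fqdn("server01.ca.companya.com"))   # ('server01', 'ca.companya.com')
print(split_fqdn("server01"))                   # ('server01', '')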

 

Now that we’ve fallen in love with the advantages of structuring data at ingestion time, let’s look at the disadvantages before we leave for the honeymoon.

The Disadvantages of Structuring Log Data at Ingestion Time

1. Increased Event Size

The first, and potentially most costly, aspect of structuring data at ingestion time is the increase in event size. When you structure the data, you increase the size of the event, in many cases doubling it or more. Please see my related article, The Million Dollar SIEM Question: To Parse or Not To Parse, for more on this.

2. Potential Data Loss and Integrity Issues

Given that your parser is instructed to place values into specific fields (for example, taking the value after the second comma and putting it into the user name field), there are potential integrity and data loss issues if your parser is not updated at the same time the logging format for a particular data source changes.

Let’s take a look at a sample log event:

01-11-2018 14:12:22, 10.1.1.1, frank, authentication, interactive login, successful

The parser takes the timestamp from the characters before the first comma, the IP address after the first comma, the user name after the second comma, the type of event after the third comma, the type of login after the fourth comma, and finally the outcome after the fifth comma. All is well until the vendor decides to add a new field, an event code, and change the order of the fields:

01-11-2018 14:12:22, 4390000, 10.1.1.1, authentication, interactive login, successful, frank

This is a simple change, but your parser needs to be updated to ensure the values are put into the correct fields and that the new field is added. Should this new log format be implemented without a corresponding parser change, not only will data land in the wrong fields, we also won’t know who performed the login, because the value “frank” will no longer be visible to the parser. The sketch below illustrates the failure mode.
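Here is a minimal sketch of a positional parser for the sample events above, and what happens when the vendor reorders the fields without a parser update. The field names are illustrative.

FIELD_ORDER = ["timestamp", "src_ip", "username", "event_type", "login_type", "outcome"]

def parse(line: str) -> dict:
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELD_ORDER, values))   # values beyond the known fields are silently dropped

old_format = "01-11-2018 14:12:22, 10.1.1.1, frank, authentication, interactive login, successful"
new_format = "01-11-2018 14:12:22, 4390000, 10.1.1.1, authentication, interactive login, successful, frank"

print(parse(old_format)["username"])   # 'frank' -- correct
print(parse(new_format)["username"])   # '10.1.1.1' -- wrong field, and 'frank' is dropped entirely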

3. Increased System Requirements

The more modifications the parser has to make to an event, the more processing power the Connector/Forwarder/Processor will consume. Ensure there are sufficient system resources to handle the required modifications.

 

A Summary of the Pros and Cons of Structuring Data at Ingestion Time: