Month: December 2018

Timestamp Management

Accurate timestamps are one of the most critical components of a SIEM environment. To perform any investigation, analysts need to know when each relevant event occurred. Because SIEM applications can record several timestamps for each event, staff must know which timestamp is used for what. Using the wrong timestamp can cause analysts to miss events and alerts, adding risk to your organization.

The two most important timestamps in an event are the time at which the event was generated on the data source and the time at which the SIEM received it. The time the event was generated on the data source is commonly known as the Event Time. The time the SIEM received the event is commonly known as the Receipt Time.

A common question from junior staff is why they can’t find the events they’re looking for. Once I rule out a case-sensitive search issue, I check whether the search uses the correct dates, times, and timestamp. For example, a brute-force alert was generated after a user generated 50 failed logins. Although the alert was generated in the past 24 hours, analysts searching the past 24 hours by Event Time can’t find any failed logins for the user who triggered it. On closer inspection, we see that the alert was generated from the Receipt Time, while the Event Time timestamps of the failed logins are from one week ago. A quick investigation revealed that the system owner had taken the system offline last week and reconnected it to the network only 24 hours ago, so the week-old events reached the SIEM only after the system came back online.
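To make the mismatch concrete, here is a minimal Python sketch of a search that filters on a chosen timestamp field. The `Event` class, its `event_time`/`receipt_time` attributes, the `search` helper, and the user `jsmith` are hypothetical stand-ins for your SIEM’s own fields and query language. It replays the scenario above: 50 failed logins generated a week ago but received in the past 24 hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Event:
    name: str
    user: str
    event_time: datetime    # when the data source generated the event
    receipt_time: datetime  # when the SIEM received the event

def search(events, user, field, start, end):
    """Return the user's events whose chosen timestamp falls in [start, end)."""
    return [e for e in events
            if e.user == user and start <= getattr(e, field) < end]

now = datetime(2018, 12, 10, 12, 0)
# 50 failed logins generated a week ago while the system was offline,
# received by the SIEM only after it reconnected within the past 24 hours.
events = [
    Event("Failed Login", "jsmith",
          event_time=now - timedelta(days=7, minutes=i),
          receipt_time=now - timedelta(hours=23, minutes=i))
    for i in range(50)
]

window_start = now - timedelta(hours=24)
by_event_time = search(events, "jsmith", "event_time", window_start, now)
by_receipt_time = search(events, "jsmith", "receipt_time", window_start, now)
print(len(by_event_time), len(by_receipt_time))  # 0 50
```

The same 24-hour window returns nothing when filtered on Event Time and all 50 events when filtered on Receipt Time, which is exactly the confusion described above.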

Timestamp discrepancies can be even more severe when using dashboards or monitoring for alerts. For example, a SOC uses a dashboard that monitors for specific IPS alerts set up to detect exploit attempts against a new, high-priority vulnerability. The SIEM has been experiencing caching lately, and events are currently delayed by 28 hours. The dashboard the SOC analysts are using is configured to use the Event Time timestamp and shows events from the past 24 hours. An alert that occurred 26 hours ago finally clears the cache queue but never appears on the dashboard, because its Event Time falls outside the 24-hour window. As a result, the critical alert goes unnoticed.

While this can be a severe issue, the fix is simple. The SOC analysts could configure the dashboard to use the Receipt Time timestamp instead, so that all alerts received in the past 24 hours show up regardless of when they were generated. In general, dashboards should show both timestamps, and staff should be aware of the various timestamps the SIEM uses. Showing both timestamps also makes delays in receiving log data visible, since the gap between Event Time and Receipt Time is the delay. Minor delays are expected, but significant delays can be caught immediately and addressed before large caches form.
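The delay check described above can be sketched as a comparison of the two timestamps per data source. This is a minimal Python illustration under assumed inputs: events arrive as `(source, event_time, receipt_time)` tuples, the host names are hypothetical, and the 15-minute threshold is an arbitrary placeholder for whatever delay you consider minor.

```python
from datetime import datetime, timedelta

def worst_lag_by_source(events):
    """Return the largest Receipt Time minus Event Time gap seen per source."""
    worst = {}
    for source, event_time, receipt_time in events:
        lag = receipt_time - event_time
        if source not in worst or lag > worst[source]:
            worst[source] = lag
    return worst

def delayed_sources(events, threshold):
    """Sources whose worst delay exceeds the tolerated threshold, sorted by name."""
    return sorted(src for src, lag in worst_lag_by_source(events).items()
                  if lag > threshold)

now = datetime(2018, 12, 10, 12, 0)
events = [
    ("fw01", now - timedelta(minutes=2), now),   # minor, expected delay
    ("ips01", now - timedelta(hours=28), now),   # backlog draining from a cache
    ("web01", now - timedelta(seconds=30), now),
]
late = delayed_sources(events, threshold=timedelta(minutes=15))
print(late)  # ['ips01']
```

A negative gap (Event Time later than Receipt Time) can never exceed the threshold, so it won’t trigger this check; such gaps usually indicate a misconfigured clock rather than a delay.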

Some timestamp discrepancies can be normal, especially in large organizations. Out of thousands of servers, you’re bound to have some with misconfigured dates, development machines logging to the SIEM when they shouldn’t be, or network outages that delay transmission. Making staff aware of the various timestamps and potential discrepancies reduces the risk that these discrepancies turn into larger issues for your organization.

Monitor for Caching

Caching is a sign that the system can’t keep up with the volume of data. While some caching is expected and normal, frequent caching indicates an undersized architecture or a misconfigured application. Excessive caching can cause major delays in receiving log data and, ultimately, data loss.

Given the risks of data caching, most SIEMs include monitoring capabilities that can alert when caching occurs. Configure these to alert when caching exceeds what you consider acceptable. For example, you may expect some minor caching during peak hours of the day and not need alerts then, but alerts should be generated whenever caching occurs outside that period.
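A minimal sketch of that alerting rule in Python, assuming the current cache size is already available in megabytes (how you obtain it depends on your SIEM). The 08:00 to 18:00 peak window and the 500 MB peak allowance are hypothetical values, not recommendations.

```python
from datetime import time

def should_alert(cache_mb: float, now: time,
                 peak_start: time = time(8, 0),   # assumed peak window start
                 peak_end: time = time(18, 0),    # assumed peak window end
                 peak_allowance_mb: float = 500.0) -> bool:
    """Decide whether the current cache size warrants an alert.

    During peak hours, minor caching up to peak_allowance_mb is tolerated.
    Outside peak hours, any caching at all raises an alert.
    """
    if peak_start <= now < peak_end:
        return cache_mb > peak_allowance_mb
    return cache_mb > 0

print(should_alert(120, time(10, 30)))  # False: minor caching at peak
print(should_alert(900, time(10, 30)))  # True: heavy caching even at peak
print(should_alert(5, time(2, 0)))      # True: caching outside peak hours
```

This simple comparison assumes the peak window does not cross midnight.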

Caching can also be detected at the operating system level, where cache files build up in the application’s cache directory. If your SIEM application doesn’t support alerting when caching occurs, you should be able to detect it and alert through the OS instead.
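If you fall back on the OS, a check like the following Python sketch could run on a schedule (for example from cron) and hand its result to whatever alerting you already have. It sums the size of every file under a cache directory and flags it when it exceeds a limit. The directory location, file name, and limits are assumptions; the demo uses a throwaway temporary directory in place of a real cache directory, whose path depends on your SIEM application.

```python
import os
import tempfile

def directory_size_bytes(path: str) -> int:
    """Total size of all files under path, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file was removed between listing and stat; skip it
    return total

def check_cache_dir(path: str, limit_bytes: int) -> bool:
    """Print an alert line and return True if the cache directory exceeds limit_bytes."""
    size = directory_size_bytes(path)
    if size > limit_bytes:
        print(f"ALERT: cache directory {path} holds {size} bytes (limit {limit_bytes})")
        return True
    return False

# Demo: a temporary directory stands in for the SIEM's real cache directory.
with tempfile.TemporaryDirectory() as cache_dir:
    with open(os.path.join(cache_dir, "queue-0001.dat"), "wb") as f:
        f.write(b"\0" * 4096)
    demo_size = directory_size_bytes(cache_dir)
    over_limit = check_cache_dir(cache_dir, limit_bytes=1024)
    under_limit = check_cache_dir(cache_dir, limit_bytes=10_000)
```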

Regardless of how it’s implemented, ensure your environment has appropriate alerting when caching is detected.

Calculate and Configure Caches

Until someone invents a technology that guarantees one hundred percent uptime, we’ll need to accept that at some point a SIEM environment will suffer an application or system failure. We’ll also need to take the application offline at least a few times per year for scheduled maintenance and upgrades. While most SIEM applications have built-in caching capabilities, it’s critical to ensure the environment has appropriate cache sizes configured and sufficient storage to hold them. Insufficient storage or misconfigured cache settings can result in data loss.

Typically in SIEM environments, the Processing Layer (Connectors/Collectors/Parser Layer) sends data to the Analytics Layer over TCP. If the Analytics Layer is unavailable, data is cached locally on the Processing Layer servers until the Analytics Layer is available again. Thus, the Processing Layer servers need enough local disk space to hold the expected caches.
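To illustrate the store-and-forward behavior described above, here is a minimal Python sketch. It is not how any particular SIEM connector works: the `StoreAndForward` class, the JSON-lines cache file, and the `fake_send` stand-in for the TCP connection to the Analytics Layer are all hypothetical. Events that can’t be sent are appended to a local file, and the backlog is replayed in order before new events once the Analytics Layer is reachable again.

```python
import json
import os
import tempfile

class StoreAndForward:
    """Hypothetical Processing Layer sender with a local disk cache."""

    def __init__(self, send, cache_path):
        self.send = send              # callable that raises OSError on failure
        self.cache_path = cache_path  # local cache file, one JSON event per line

    def pending(self):
        """Events currently waiting in the local cache."""
        if not os.path.exists(self.cache_path):
            return []
        with open(self.cache_path) as f:
            return [json.loads(line) for line in f if line.strip()]

    def _append(self, event):
        with open(self.cache_path, "a") as f:
            f.write(json.dumps(event) + "\n")

    def flush(self):
        """Replay cached events in order; return True if the cache is drained."""
        backlog = self.pending()
        for i, event in enumerate(backlog):
            try:
                self.send(event)
            except OSError:
                # Keep only the events that were not delivered yet.
                with open(self.cache_path, "w") as f:
                    f.writelines(json.dumps(e) + "\n" for e in backlog[i:])
                return False
        if os.path.exists(self.cache_path):
            os.remove(self.cache_path)
        return True

    def submit(self, event):
        # Preserve ordering: if a backlog exists and can't be drained, queue behind it.
        if os.path.exists(self.cache_path) and not self.flush():
            self._append(event)
            return
        try:
            self.send(event)
        except OSError:  # ConnectionError and socket errors are OSError subclasses
            self._append(event)

delivered = []
analytics_up = False

def fake_send(event):
    # Stand-in for a TCP send to the Analytics Layer.
    if not analytics_up:
        raise ConnectionRefusedError("Analytics Layer unavailable")
    delivered.append(event)

with tempfile.TemporaryDirectory() as tmp:
    sender = StoreAndForward(fake_send, os.path.join(tmp, "cache.jsonl"))
    sender.submit({"id": 1})  # outage: cached to disk
    sender.submit({"id": 2})  # outage: cached to disk
    cached_during_outage = len(sender.pending())
    analytics_up = True
    sender.submit({"id": 3})  # backlog replayed first, then event 3
    cache_left = os.path.exists(sender.cache_path)
print([e["id"] for e in delivered])  # [1, 2, 3]
```

The key design point is ordering: new events queue behind any existing backlog, so the Analytics Layer receives events in the order they were collected. Because the backlog lives on local disk, its size is bounded by the free space on the Processing Layer server.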

To determine an appropriate cache size, we need to look at your organization’s requirements, SLAs, and other factors that indicate how long an outage can last and how long your IT department typically takes to resolve issues. If you’re certain an outage would last no longer than 3 days, then the Processing Layer servers must be able to hold 3 days’ worth of cached log data. Caches can also grow large quickly, because they typically hold raw, uncompressed data.

To calculate how much storage we’ll need for caching, we can multiply the Average Sustained 24h EPS rate by the average event size and by the number of seconds per day (86,400). For example, if your Average Sustained 24h EPS is 5,000 and your normalized event size is 2,000 bytes, we’ll need about 864 GB of space per day (5,000 × 2,000 bytes × 86,400 seconds). So if we have 2 servers in the Processing Layer and expect an outage to last no longer than 3 days, we’ll need 1.3 TB of free storage per server to meet the cache space requirements (864 GB/day × 3 days ≈ 2.6 TB, split evenly across 2 servers = 1.3 TB each).
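The same calculation as a small Python helper, using decimal units (1 GB = 10^9 bytes) to match the figures above. The function names are illustrative, and the per-server figure assumes the load is split evenly across the Processing Layer servers.

```python
SECONDS_PER_DAY = 86_400

def cache_bytes_per_day(avg_eps, event_size_bytes):
    """Bytes of cache generated per day at the Average Sustained 24h EPS rate."""
    return avg_eps * event_size_bytes * SECONDS_PER_DAY

def cache_bytes_per_server(avg_eps, event_size_bytes, outage_days, servers):
    """Cache space each server needs, assuming events are spread evenly across servers."""
    return cache_bytes_per_day(avg_eps, event_size_bytes) * outage_days / servers

per_day = cache_bytes_per_day(5_000, 2_000)
per_server = cache_bytes_per_server(5_000, 2_000, outage_days=3, servers=2)
print(f"{per_day / 1e9:.0f} GB/day, {per_server / 1e12:.2f} TB per server")
# 864 GB/day, 1.30 TB per server
```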

We’ll also need to ensure the application is configured with the appropriate cache size. Many SIEM applications ship with a default cache size that may not be sufficient for your environment.

Log Source Verification

A critical function of any SIEM environment is verifying that the intended systems are logging successfully. Systems believed to be logging to a SIEM that actually aren’t pose a significant risk to an organization: they create a false sense of security and limit the data available for an investigation.

A common mistake when verifying whether a particular system is logging to a SIEM is using the wrong fields for confirmation. For example, most SIEMs have several IP address and hostname fields, such as source IP address, destination IP address, and device IP address. With so many fields, it can be confusing to know which is used for what, especially for new staff. As a result, staff may pull incorrect data and report inaccurate results when performing verification, or when searching in general.

As an example, Company A is implementing a new Linux server, 172.16.2.1, and the SOC is asked whether they can see logs coming from it. The SIEM application has three IP address fields: Device IP Address, Source IP Address, and Destination IP Address. The Device IP Address field contains the IP address of the system generating the log event. The Source IP Address field contains the source of the event, and the Destination IP Address field contains the target of the event.

One of the SOC analysts performs the verification, searches for “172.16.2.1,” and gets one result:

Event Name: Accept
Source IP Address: 10.1.1.1
Source Port: 22
Device IP Address: 172.16.50.25
Destination IP Address: 172.16.2.1
Destination Port: 22

Without paying attention to the field names, the analyst mistakenly reports that the new Linux server is logging. In fact, he’s looking at an accept traffic event generated by firewall 172.16.50.25, in which the new server appears only as the destination; it is not an event from the new server itself. The project to implement the new server is marked complete, and Company A now has a security gap.
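The difference between a loose search and a proper verification check can be shown with a short Python sketch. The event is modeled as a dictionary keyed by the field names from the example; your SIEM’s field names and query syntax will differ, and the helper functions are hypothetical.

```python
def matches_any_ip(event, ip):
    """Loose search: the IP appears in any of the IP address fields."""
    return ip in (event.get("Source IP Address"),
                  event.get("Destination IP Address"),
                  event.get("Device IP Address"))

def is_logging_source(event, ip):
    """Verification check: the system at ip generated this event itself."""
    return event.get("Device IP Address") == ip

def verify_logging(events, ip):
    """True only if at least one event was generated by the system at ip."""
    return any(is_logging_source(e, ip) for e in events)

firewall_event = {
    "Event Name": "Accept",
    "Source IP Address": "10.1.1.1",
    "Source Port": 22,
    "Device IP Address": "172.16.50.25",
    "Destination IP Address": "172.16.2.1",
    "Destination Port": 22,
}

print(matches_any_ip(firewall_event, "172.16.2.1"))    # True
print(verify_logging([firewall_event], "172.16.2.1"))  # False
```

Matching the IP in any field finds the firewall event, but only the Device IP Address check answers the real question: is 172.16.2.1 itself sending logs?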

While this can be a major issue that adds risk to an organization, it can be addressed with a simple process that shows staff which fields to use for verification. Your SIEM vendor can also tell you which fields to use. In addition, training sessions on searching can help ensure staff know how to search effectively.