Category: Operations

Timestamp Management

Accurate timestamps are one of the most critical components of a SIEM environment. To perform any investigation, analysts need to know when the applicable events occurred. Given that SIEM applications can carry many timestamps, it’s critical for staff to know which timestamp is used for what. Improper use of timestamps can have many repercussions and adds risk to your organization.

The two most important timestamps in an event are the time the event was generated on the data source, commonly known as the Event Time, and the time the event was received by the SIEM, commonly known as the Receipt Time.

A common question posed by junior staff is why they can’t find the events they’re looking for. Once I rule out a case-sensitivity searching issue, I check whether the correct dates and times are being used in the search. For example, a brute-force alert fires after a user generates 50 failed logins. While the alert was generated in the past 24 hours, analysts can’t find any failed logins for the user that triggered it. Upon closer inspection, we see that the alert was generated off the Receipt Time, but the Event Time timestamps are from one week ago. A quick investigation reveals that the system owner took the system offline last week and only reconnected it to the network 24 hours ago.
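One way to make this failure mode visible is to lay the two timestamps side by side for the events behind the alert. The minimal Python sketch below is illustrative only: the field names and the hard-coded sample record are assumptions, and in practice the records would come from a Receipt Time search in your SIEM.

    from datetime import datetime, timedelta

    def show_both_timestamps(events):
        """Print Event Time, Receipt Time, and the gap between them for each record."""
        for e in events:
            lag = e["receipt_time"] - e["event_time"]
            print(f'{e["event_name"]}: event_time={e["event_time"]}  '
                  f'receipt_time={e["receipt_time"]}  lag={lag}')

    # Failed logins generated a week ago on the then-offline server, but only
    # received by the SIEM 20 hours ago once it was reconnected.
    now = datetime.now()
    sample = [{
        "event_name": "Failed Login",
        "event_time": now - timedelta(days=7),
        "receipt_time": now - timedelta(hours=20),
    }]
    show_both_timestamps(sample)  # a lag of roughly six days explains the "missing" events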

Timestamp discrepancies can be even more severe when using dashboards or monitoring for alerts. For example, a SOC uses a dashboard that monitors for particular IPS alerts set up to detect exploit attempts against a new, high-priority vulnerability. The SIEM has been experiencing caching lately, and events are currently delayed 28 hours. The dashboard the SOC analysts are using is configured to use the Event Time timestamp and shows events from the past 24 hours. An alert that occurred 26 hours ago finally clears the cache queue but fails to show up on the dashboard because it doesn’t match the timestamp criteria. Thus, the critical alert goes unnoticed.

While this can be a severe issue, the fix is simple. The SOC analysts could configure the dashboard to use the Receipt Time timestamp instead, so any alert received in the past 24 hours would show up and be noticeable regardless of when it was generated. Ideally, dashboards should display both timestamps, and staff should be aware of the various timestamps used by the SIEM. Another advantage of showing both timestamps is that staff notice when there are delays in receiving log data. Minor delays can be expected, but significant delays can be caught immediately and actioned before large caches form.
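The same comparison can run continuously to flag sources whose log data is arriving late, before a large cache builds. This is only a sketch: the event fields and the acceptable-lag threshold are assumptions to adapt to your environment, and many SIEMs expose an equivalent health metric natively.

    from datetime import timedelta

    ACCEPTABLE_LAG = timedelta(minutes=30)  # tune to what is normal in your environment

    def flag_delayed_sources(events):
        """Return sources whose worst Receipt Time minus Event Time gap exceeds the threshold."""
        worst = {}
        for e in events:
            lag = e["receipt_time"] - e["event_time"]
            worst[e["device_ip"]] = max(worst.get(e["device_ip"], timedelta(0)), lag)
        return {source: lag for source, lag in worst.items() if lag > ACCEPTABLE_LAG}

    # Feed this the last hour of events (pulled by Receipt Time) and wire the result
    # into whatever alerting mechanism your SOC already watches.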

Some timestamp discrepancies can be normal, especially in large organizations. Out of thousands of servers, you’re bound to have at least a few with misconfigured dates, development machines logging to the SIEM when they shouldn’t be, or network outages that cause transmission delays. Having staff aware of the various timestamps and potential discrepancies reduces the risk that they turn into larger issues in your organization.

Monitor for Caching

Caching is a sign that the system is unable to keep up with the volume of data. While some caching can be expected and considered normal, frequent occurrences are an indication of an undersized architecture or application misconfigurations. Excessive caching can result in major delays in receiving log data and ultimately data loss.

Given the risks of data caching, most SIEMs come with monitoring capabilities to alert when caching occurs. These should be implemented to alert when caching is beyond what is considered acceptable. For example, you may expect some minor caching during the day at peak hours, and thus don’t need alerts during this time, but alerts should be generated whenever there is caching outside this period.

Caching can also be detected from the server operating system, where you would see cache files build up in the applicable application directory. Thus, if your SIEM application doesn’t support alerting when caching occurs, you should be able to detect and alert via the OS.
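Where the SIEM can’t alert on caching natively, a small OS-level check over the cache directory can fill the gap. In this sketch, the path and thresholds are placeholders; substitute your product’s actual cache or spool location and whatever volume you consider acceptable.

    from pathlib import Path

    CACHE_DIR = Path("/opt/siem/var/cache")  # placeholder; use your product's cache directory
    MAX_FILES = 100                          # what "normal" looks like in your environment
    MAX_BYTES = 5 * 1024**3                  # e.g. alert once the cache exceeds 5 GB

    files = [f for f in CACHE_DIR.rglob("*") if f.is_file()]
    total_bytes = sum(f.stat().st_size for f in files)

    if len(files) > MAX_FILES or total_bytes > MAX_BYTES:
        # Replace print() with your alerting mechanism (email, ticket, monitoring agent).
        print(f"ALERT: SIEM caching detected: {len(files)} files, {total_bytes} bytes in {CACHE_DIR}")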

Regardless of how it’s implemented, ensure your environment has appropriate alerting when caching is detected.

Log Source Verification

A critical function of any SIEM environment is verifying that the intended systems are logging successfully. Systems that are believed to be logging to a SIEM but aren’t pose a significant risk to an organization, creating a false sense of security and limiting the amount of data available for an investigation.

A common mistake when verifying whether a particular system is logging to a SIEM is using the incorrect fields for confirmation. For example, most SIEMs have several IP address and hostname fields, such as source IP address, destination IP address, and device IP address. Given the multiple fields, it can be confusing to know which is used for what, especially for new staff. This leaves the possibility that staff pull incorrect data and provide inaccurate results when performing verification or searching in general.

As an example, Company A is implementing a new Linux server, and the SOC is asked to confirm that they can see logs coming from it at 172.16.2.1. The SIEM application has three IP address fields: Device IP Address, Source IP Address, and Destination IP Address. The Device IP Address field contains the IP address of the device generating the log event. The Source IP Address field contains the source of the event, and the Destination IP Address field contains the target of the event.

One of the SOC analysts performs the verification, searches for “172.16.2.1,” and gets one result:

Event Name: Accept
Source IP Address: 10.1.1.1
Source Port: 22
Device IP Address: 172.16.50.25
Destination IP Address: 172.16.2.1
Destination Port: 22

Without paying attention to the field names, the analyst mistakenly reports that the new Linux server is logging, when in fact the result is a traffic accept event generated by the firewall at 172.16.50.25, not an event from the new server. The project to implement the new server is now considered complete, and Company A has a security gap.

While this can be a major issue and add risk to an organization, a simple process can show staff which fields to use for verification, and your SIEM vendor can tell you which fields to use. Learning and education sessions on searching can also address this and ensure staff know how to search effectively.
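That process can be as simple as requiring the verification to match on the field that identifies the reporting device, not on any field that happens to contain the IP. The check below is a sketch; the field names (device_ip and so on) are assumptions to be mapped onto your SIEM’s actual schema.

    NEW_SERVER_IP = "172.16.2.1"

    def is_logging(events, device_ip):
        """True only if the device itself generated at least one of the events.

        Matching on the source or destination IP fields would also count other
        devices' traffic logs that merely mention the address, which is the
        mistake made in the example above.
        """
        return any(e.get("device_ip") == device_ip for e in events)

    # The single result from the analyst's search, expressed as a dictionary.
    events = [{
        "event_name": "Accept",
        "source_ip": "10.1.1.1",
        "device_ip": "172.16.50.25",     # the firewall generated this log...
        "destination_ip": "172.16.2.1",  # ...about traffic destined for the new server
    }]
    print(is_logging(events, NEW_SERVER_IP))  # False: the new server is not logging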

Alerting On Quiet Log Sources

Data sources that stop logging to your SIEM put your organization at risk. If one of your organization’s firewalls stops logging to the SIEM, your SOC will be blind to malicious traffic traversing it. If your endpoint protection application stops logging, your analysts won’t be able to see if malicious files are being executed on one of your billing servers.

In a perfect world, your SIEM would alert whenever any data source stops logging. While this is feasible in smaller organizations, it can become daunting in large ones. It’s easier for your SOC to follow up with one system owner who sits a few cubicles over than with 100 system owners from different lines of business. The task of remediating several hundred systems not logging to a SIEM can easily consume an entire resource. In large organizations, network outages, system upgrades, and maintenance windows can be a regular occurrence. Should you alert on every data source that stops logging within a predefined period, you could easily end up flooding your SOC, and in a worst-case scenario, your analysts will develop a practice of ignoring these alerts.

As a best practice, especially in large organizations, a SIEM should be configured to alert when critical data sources stop logging. These should at minimum include servers critical to the business (e.g., client-facing applications), firewalls, proxies, and security applications. A threshold of less than an hour may generate excessive alerts, as some file-based sources are delayed by design, for example by copying the file to the SIEM every 30 minutes. However, data sources that haven’t logged in one hour may well warrant an alert in your organization.
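A quiet-source check then becomes a short loop over the critical sources and their individual thresholds. The source names, thresholds, and the way last-seen times are obtained are all assumptions in this sketch; most SIEMs can report the latest Receipt Time per source natively.

    from datetime import datetime, timedelta

    # Critical sources and how long each may reasonably stay quiet. File-based
    # sources copied on a schedule get a more generous threshold.
    CRITICAL_SOURCES = {
        "fw-edge-01":      timedelta(hours=1),
        "proxy-01":        timedelta(hours=1),
        "billing-app-01":  timedelta(hours=1),
        "hr-batch-export": timedelta(hours=2),  # file copied to the SIEM every 30 minutes
    }

    def quiet_sources(last_seen, now=None):
        """Return critical sources whose most recent event is older than their threshold."""
        now = now or datetime.now()
        return [
            source for source, threshold in CRITICAL_SOURCES.items()
            if last_seen.get(source) is None or now - last_seen[source] > threshold
        ]

    # last_seen maps source name -> latest Receipt Time, however your SIEM exposes it.
    # Every source returned here should generate an alert or ticket for follow-up.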

Another thing to consider when remediating systems not logging to the SIEM is that malware experts and threat intelligence specialists may not be the best resources to chase system owners down. While they may not mind the odd alert, they’re unlikely to have the time to track down 100 system owners and get them to configure their systems properly, or the patience to continuously follow up with them. Thus, in larger organizations, project management may be a better fit for this task.

Having all your systems log to your SIEM is a critical part of reducing your organization’s risk. Having a practical, manageable task for remediating systems that stop logging will ensure the process is followed and the risk is reduced.

Disable Unused Content

When building new SIEM environments or working with existing ones, one of the quickest ways you can improve the performance and stability of the environment is to remove unused content. While this may seem obvious to experienced SIEM resources, it’s common to find reports or rules running in the background that don’t serve a purpose. In some environments, unused content can be slowing the system down and contributing to application instability. Unused content is especially common in environments that don’t have enough staff to manage the SIEM.

Default rules provided by the vendor are often enabled but unused. The first indication that a rule is unused is that it has no action attached and isn’t used for informational purposes. If a rule never brings anything to the attention of an investigator or SIEM engineer, it may simply be running in the background consuming system resources. A rule that triggers an alert when someone logs into the SIEM may be useful, but an ad-hoc report to obtain the same information may suffice. A significant number of inefficient rules that match a large percentage of events can adversely affect the performance of the environment.
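A periodic review of rule metadata makes those candidates easy to spot. The sketch below assumes the rules have been exported to JSON and that fields such as enabled, actions, and match_percentage exist; the actual export format and field names vary by product.

    import json

    # Placeholder export file and field names; the real format varies by product.
    with open("rule_export.json") as fh:
        rules = json.load(fh)

    candidates = []
    for rule in rules:
        no_action = not rule.get("actions")           # no alert, email, ticket, etc.
        noisy = rule.get("match_percentage", 0) > 10  # matches a large share of events
        if rule.get("enabled") and (no_action or noisy):
            candidates.append((rule["name"], rule.get("match_percentage", 0)))

    # Review the list with content owners before disabling anything.
    for name, pct in sorted(candidates, key=lambda item: item[1], reverse=True):
        print(f"Review: {name} (matches {pct}% of events)")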

Reports can be another source of unused content. In many environments, I find reports that were originally set up to be used temporarily but are no longer being used by the recipient. It’s common for the recipient to forget to follow up with the SIEM staff to note that the reports are no longer required. Over a period of several years, this can easily amount to several dozen reports running on a regular basis, putting a significant strain on the system for no benefit.

All SIEM environments are different, and there’s no set of content that must be enabled or disabled. But there is very likely content in your environment that can be disabled, freeing system resources to give security analysts better search response times. So on a regular basis, or whenever there’s a complaint about search response times or application instability, determine whether there’s any content that can be safely disabled.

Effective Searching

There are two critical reasons end users should learn how to search their SIEM effectively. First, ineffective searching is a risk to your organization: end users can produce inaccurate data and thus provide inaccurate investigation results. Second, ineffective searching can degrade the SIEM’s performance, increasing the time required for analysts to obtain data while affecting the overall stability of the system.

If a security analyst is asked to perform an investigation and searches incorrectly, a query for malicious traffic may return no results when in fact there are matches. A compromised user account may be generating significant log data, but your analysts can’t obtain logs for it because they are searching for “jsmith” instead of the case-sensitive “JSmith.” End users can also match on incorrect fields, believing they are finding the correct data when they are not.

Ineffective queries take longer to complete and increase the system resources the SIEM consumes. Many queries can be improved to significantly increase their performance, leaving the end user happier with faster response times and the system healthier, with more CPU and RAM available for other tasks. A simple rule of thumb is to match as early in the query as possible to limit the amount of data the system searches through. Searching particular fields rather than all fields is another way to reduce the processing the system must do. Additionally, some SIEM tools let you easily check for poorly performing queries. For example, Splunk’s Search Job Inspector can show you not only which queries are taking the longest, but also which parts of a query take longer than others, allowing you to optimize accordingly.

It’s also common for security analysts to receive requests for excessive data. In many cases requestors ask for more information than is required so they can drill down into the information they need, instead of having to submit multiple requests. For example, there may be a request to pull two months of log data on a user when all that is required is a few days of proxy traffic. These types of requests can be resource-intensive on SIEMs, especially when multiple queries run simultaneously, and the impact can be more severe when the queries are scheduled reports. Scheduling multiple large, inefficient queries on a regular basis can consume a significant amount of system resources. With a few inquiries to the requestor, the security analyst may be able to significantly reduce the amount of data searched for.
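The guidance in the last two paragraphs, matching early, on specific fields, over the narrowest useful window, translates directly into how a query is written. The comparison below uses a generic, made-up query syntax held in Python strings purely to illustrate the shape of a better request; your product’s syntax will differ.

    # Generic, made-up query syntax assigned to Python strings purely for illustration;
    # adapt it to your SIEM's actual search language.

    # Inefficient: free-text match across every field and log source, two months of data.
    broad_query = 'search "*jsmith*" earliest=-60d'

    # Efficient: constrain the log source and field up front, respect case sensitivity
    # ("JSmith", not "jsmith"), and keep the window to the few days actually required.
    narrow_query = 'search log_source=proxy user="JSmith" earliest=-3d latest=now'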

While ineffective searching is a risk, it’s a simple one to reduce. Training sessions, lunch-and-learns, or workshops can significantly reduce the risks of analysts searching incorrectly and consuming unnecessary system resources. I find a simple three-page deck can provide enough information to assist analysts with searching, highlighting the tool’s case sensitivity, common fields, and sample queries. Nearly all SIEM vendors offer complimentary documentation that will show you how to search best with their product. Thus, a few hours of effort can reduce searching risks while optimizing your SIEM environment.