Azure Sentinel Lists and Rules

One of the first questions I had about Azure Sentinel was whether it supports “Lists.” Lists are available in most (if not all) SIEMs, though how they work differs in each. Lists can help end users create use cases, store selected data outside of retention policies, maintain blacklists/whitelists, and more. You can read more about the utility of SIEM Lists in a previous post here.

For Sentinel, the answer is yes. It supports two main types of lists: temporary lists that are created and used within queries, and external lists (e.g. CSV files hosted in Azure Storage) that can be used for lookups. You can also create a custom log source via the CEF Connector and use it as a pseudo list.

In this post we’ll create a couple of lists and analytics rules that will trigger off values in the lists. We’ll use the data generated from the custom CEF Connector created in a previous post here.

The first use case will detect when the same user account has logged in from two or more different IP addresses within 24 hours, a common use case to flag potential account sharing or compromised accounts. The second use case will trigger when a login is detected from an IP found in a list of malicious IP addresses.

First, let’s create a query that puts the users logging in from two or more different IP addresses into a list called ‘suspiciousUsers’.
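
A minimal sketch of the aggregation behind that list (the full query below wraps this in a let statement):

CommonSecurityLog
| where TimeGenerated >= ago(1d)
| where DeviceProduct == "Streamlined Security Product"
| where Message == "Login_Success"
| summarize dcount(SourceIP) by DestinationUserName
| where dcount_SourceIP > 1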

Next, let’s take the users from the list and then query the same log source to show the applicable events generated by those users. The results show us all the “Login Success” events generated by the users in the list. We could also use this list to query other data sources in Sentinel.

Query:

let suspiciousUsers =
CommonSecurityLog
| where TimeGenerated >= ago(1d)
| where DeviceProduct == "Streamlined Security Product"
| where Message == "Login_Success"
| summarize dcount(SourceIP) by DestinationUserName
| where dcount_SourceIP > 1
| summarize make_list(DestinationUserName);
CommonSecurityLog
| where TimeGenerated >= ago(1d)
| where DeviceProduct == "Streamlined Security Product"
| where Message == "Login_Success"
| where DestinationUserName in (suspiciousUsers)

So instead of adding the applicable events to the list as they occur and then having a rule query the list, we are simply creating the list in real time and using the results in another part of the query. Since the list is temporary, the main thing to consider is whether your retention policy is in line with your use case. That’s not an issue here, as we are only looking at the past 24 hours, but if you wanted to track, say, RDP authentication events over six months, you would need six months of online data.
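
For illustration, widening the window of the same query is just a matter of changing the ago() value, provided the workspace actually retains that much data:

CommonSecurityLog
| where TimeGenerated >= ago(180d) // requires roughly six months of online data in the workspace
| where DeviceProduct == "Streamlined Security Product"
| where Message == "Login_Success"
| summarize dcount(SourceIP) by DestinationUserName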

For the next list, we’ll use our CEF Connector to ingest a list of malicious IPs from a threat intelligence feed. We’ll use a simple Python script to write the values in the file to the local Syslog file on a Linux server, which will then be forwarded to Sentinel by the CEF Connector. The IPs in the file were randomly generated by me.

The CSV file has three columns: Vendor, Product, and IP. The values look as follows:
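
For example (illustrative rows only; the Vendor value matches the DeviceVendor filter used in the queries below, while the Product value and IPs are placeholders):

Open Threat Intel Feed,IP List,203.0.113.14
Open Threat Intel Feed,IP List,198.51.100.7
Open Threat Intel Feed,IP List,192.0.2.201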

Using an FTP Client (e.g. WinSCP), copy the CSV file to the server.
Next, let’s create a file, give it execute permissions, and then open it.

touch process_ti.py
chmod +x process_ti.py
vi process_ti.py

Paste the script found here into the file, save and close, then run it.

./process_ti.py

Let’s check that there are 300 entries from our CSV file:

CommonSecurityLog
| where DeviceVendor == "Open Threat Intel Feed"
| summarize count()

Now that we’ve confirmed the ingestion was successful, let’s make a list named ‘maliciousIPs’. We’ll use this list to match IPs found in the Streamlined Security Product logs.

let maliciousIPs =
CommonSecurityLog
| where TimeGenerated >= ago(1d)
| where DeviceVendor == "Open Threat Intel Feed"
| summarize make_list(SourceIP);
CommonSecurityLog
| where TimeGenerated >= ago(1d)
| where DeviceProduct == "Streamlined Security Product"
| where SourceIP in (maliciousIPs)

Output should look as follows, showing the authentication events from IPs in the ‘maliciousIPs’ list.


Now that we can look up the data with these queries, let’s create a couple of analytics rules that will detect these use cases in near real time.

From the Analytics menu, select ‘Create’, then ‘Scheduled query rule’.


Enter a name and description, then select ‘Next: Set rule logic >’.


Enter the query used for the first list (suspiciousUsers), then map the DestinationUserName field to the ‘Account’ entity type and the SourceIP field to the ‘IP’ entity type. You need to click ‘Add’ for each mapping to be added to the rule query; once it’s added, the column value will say ‘Defined in query’.


For Query scheduling, run the query every five minutes and look up data from the last hour. Set the alert threshold to greater than 0, as the threshold for this use case (two or more IPs for the same user) is already enforced in the query. We’ll leave suppression off.

One of the nice things about creating rules in Sentinel is that it shows you how many hits your rule would have generated based on your parameters. The graph saves you from having to test this yourself, which you would otherwise likely do when creating a use case.

We’ll leave the default settings for the ‘Incident settings (Preview)’ and ‘Automated response’ tabs, and then click ‘Create’ on the ‘Review and create’ tab.


Once the rule is created, we can go to the ‘Incidents’ tab to see triggered alerts. We can see that the rule was already triggered by three user accounts.


Next, let’s create a rule that triggers when a user logs in from an IP in the ‘maliciousIPs’ list we created.


We’ll add the query and map the entities as we did in the prior rule.


We’ll schedule the query and set the threshold as follows.


We’ll leave the default settings for the ‘Incident settings (Preview)’ and ‘Automated response’ tabs, and then click ‘Create’ on the ‘Review and create’ tab.


Once the rule is created, we can go back to the ‘Incidents’ page and see that it has already been triggered by three users.


As you can see, lists are easy to create and can be useful when writing queries and developing use cases. For further reading on this topic, there are some helpful posts available on the Microsoft site here, and here. You can also use an external file hosted in Azure Storage and access it directly within a query, as sketched below.
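
As a hedged sketch of that option, KQL’s externaldata operator can read a CSV directly from blob storage; the storage URL, SAS token, and column layout below are placeholders based on the three-column file used earlier in this post:

let maliciousIPs = externaldata (Vendor:string, Product:string, IP:string)
    [@"https://<storageaccount>.blob.core.windows.net/<container>/sample_malicious_IPs.csv?<SAS-token>"]
    with (format="csv", ignoreFirstRecord=true);
CommonSecurityLog
| where DeviceProduct == "Streamlined Security Product"
| where SourceIP in ((maliciousIPs | project IP))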

Script to read from CSV file and write to Syslog in CEF Format

Sample Python script that opens a CSV file and writes the values in CEF format to the local Syslog file on a Linux server. Designed to be used with this post.

#!/usr/bin/python
## Simple Python script designed to read a CSV file and write the values to the local Syslog file in CEF format.
## Frank Cardinale, April 2020

## Importing the libraries used in the script
import syslog
import csv
with open('sample_malicious_IPs.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
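    # Note: if your CSV file includes a header row, skip it here (e.g. with next(readCSV)) so the header isn't logged as an event.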
    for row in readCSV:

        #Creating a value that will be used to write to the Syslog file. Rows added to applicable CEF fields.
        syslog_message = "CEF:0|" + row[0] + "|" + row[1] + "|1.0|1000|ThreatIntelFeed|10|src=" + row[2]

        #Writing the event to the Syslog file.
        syslog.openlog(facility=syslog.LOG_LOCAL7)
        syslog.syslog(syslog.LOG_NOTICE, syslog_message)

SIEM Lists and Design Considerations


Those familiar with creating use cases in SIEMs have likely at some point worked with “Lists.” These are commonly known as “Active Lists” (ArcSight), “Reference Sets” (QRadar), “Lookups” (Splunk), “Lookup Tables” (Securonix, Devo), and similar names in other tools. Lists are essentially tables of data; you can think of them as an Excel-like table with multiple rows and columns. Lists differ in each of the SIEMs on the market: some are a single column you can use for e.g. IP addresses, while others support up to 20 columns and a significant amount of data. Log retention policies typically don’t apply to Lists, so you can keep them for as long as needed.

SIEM Lists have three main drivers: limitations with real-time rule engines, limited retention policies, and external reference data.

Limitations with Real-Time Rule Engines

SIEMs with real-time rule engines have the advantage of triggering your use cases as data is ingested (versus running a query every 5 minutes). But the real-time advantage turns into a disadvantage when a use case spans a greater timeframe. The longer the timeframe of the use case, the more system resources the real-time engine consumes, making some use cases impossible. For example, real-time rule engines can easily detect 10 or more failed logins in 24 hours, but not over three months; that would be far too much data to keep in memory. To compensate, Lists can be used to store the data required by use cases that can’t be handled by the real-time rule engine. A List can store, for example, RDP logins over a much longer period, e.g. one year, including the login type, username, hostname, IP address, and possibly more depending on your SIEM. You can then trigger an alert when the number of entries in the List for a particular user reaches the desired threshold.
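
In Sentinel terms, a hypothetical sketch of that pattern, assuming RDP logons have been accumulated into a custom table named RDPLogins_CL with a SourceUserName_s column (both names are made up for illustration):

RDPLogins_CL
| where TimeGenerated >= ago(365d)
| summarize LoginCount = count() by SourceUserName_s
| where LoginCount >= 10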

Limited Retention Policies

Limited retention policies were also a large driver for Lists. Most SIEM environments only have three months of online data. To access anything older, the data must be restored from an archive or backup, which is typically inconvenient enough that an analyst won’t even ask for it. With Lists, you can store selected data outside of your retention policies. If you want to store RDP logins for longer than your retention policy allows, you can simply add the values to a List.

External Reference Data

SIEMs are extremely effective at matching data in log files. The advent of threat intelligence data brought security teams massive lists of malicious IP addresses, hostnames, email addresses, and file hashes that needed to be correlated with firewall, proxy, and endpoint protection logs. These threat intel lists can be easily put into a List and then correlated with all applicable logs. Most (if not all) SIEM products support these types of Lists.

Other List Uses

Lists can often enhance your use case capabilities. If your SIEM product can’t meet all of the conditions of a use case with its real-time engine or query, you can sometimes use Lists to compensate. For example, you can put the result of the first use case into a List and then build a second use case that combines the real-time engine with the values in the List.

Lists can be useful for whitelisting or suppressing duplicate alerts. For example, you can add the IP, username or hostname of the event that triggered an alert to a List (e.g. users/domains that are already being investigated), and use the List to suppress subsequent alerts from the same IP, username, or hostname.
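
A hypothetical KQL sketch of that suppression pattern, using an inline datatable as the “already being investigated” list (the usernames are placeholders):

let investigatedUsers = datatable(DestinationUserName:string) ["Frank", "Mary"];
CommonSecurityLog
| where DeviceProduct == "Streamlined Security Product"
| where Message == "Login_Failure"
| where DestinationUserName !in (investigatedUsers)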

Lists can also help simplify long and complicated queries. Instead of writing a single query, you can put the results of the first part of a query into a List, and then have the second query run against the values in the List.


As you can see, Lists can be very useful for SIEM end users. Overlooking List functionality during a SIEM design can have profound impacts. While List functionality differs per SIEM, it’s important to understand how your SIEM works and ensure it meets your requirements.

Integrate Custom Data Sources with Azure Sentinel’s CEF Connector

Microsoft Azure Sentinel allows you to ingest custom data sources with its CEF Connector. For those not familiar with CEF (Common Event Format), it was created to standardize logging formats. Different applications can log in wildly different formats, leaving SIEM engineers to spend a large portion of their time writing parsers and mapping them to different schemas. Thus, CEF was introduced to help standardize the format in which applications log, encourage vendors to follow the standard, and ultimately reduce the amount of time SIEM resources spend writing and updating parsers. You can find more information on CEF on the Micro Focus website.

With Azure Sentinel, you can ingest custom logs by simply writing in CEF format to the local Syslog file. Many data sources already support Sentinel’s CEF Connector, and given how simple it is, I’m sure your developers or vendors wouldn’t mind logging in this format if asked. Once the data source is logging in CEF and integrated with Sentinel, you can use the searching, threat hunting, and other functionality provided by Sentinel.

To highlight this, we’re going to write to the Syslog file on a default Azure Ubuntu VM, and then query the data in Sentinel. This activity is simple enough to be done with basic Azure and Linux knowledge, and can be done with Azure’s free subscription, so I would encourage anyone to try it.

Requirements:
-Azure subscription (Free one suffices)
-Basic Azure knowledge
-Basic Linux knowledge
-Azure Sentinel instance (default)
-Azure Ubuntu VM (A0 Basic with 1 vCPU and 0.75 GB RAM suffices)

Once you have an Azure subscription, the first step is to create an Azure Sentinel instance. If you don’t already have one, see the “Enable Azure Sentinel” section on the Microsoft Sentinel website.

Once you have an Azure Sentinel instance running, create an Ubuntu VM. Select ‘Virtual Machines’ from the Azure services menu.

Select ‘Add’.

Add in the required parameters:

At the bottom of the page, select ‘Next: Disks’.

Leave all default values for the Disks, Networking, Management, Advanced, and Tags sections, and then select ‘Create’ on the ‘Review and create’ tab.

Add a firewall rule that will allow you to SSH to the server. For example, you can add your IP to access the server on port 22.

Next, select ‘Data connectors’, choose the ‘Common Event Format (CEF)’ connector from the list, then select ‘Open connector page’.

Copy the command provided. This will be used on the Ubuntu server.

Connect to the Ubuntu server using an SSH client (e.g. PuTTY).

Once logged in, paste the command, then press Enter.

Wait for the install to complete. As noted on the CEF Connector page, it can take up to 20 minutes for data to be searchable in Sentinel.

You can check if the integration was successful by searching for ‘Heartbeat’ in the query window.
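
For example, a quick query along these lines (the Heartbeat table is populated by the Log Analytics agent that the connector command installs):

Heartbeat
| where TimeGenerated >= ago(1h)
| summarize LastHeartbeat = max(TimeGenerated) by Computer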

Next, we’re going to use the Linux logger command to generate a test authentication CEF message before using a script to automate the process. We’ll use the standard CEF header fields and add the ‘src’ (source address), ‘dst’ (destination address), ‘duser’ (destination username), and ‘msg’ (message) extension fields. You can add additional fields as listed in the CEF guide linked at the beginning of this post. Command:

logger "CEF:0|Seamless Security Solutions|Streamlined Security Product|1.0|1000|Authentication Event|10|src=10.1.2.3 dst=10.4.5.6 duser=Test msg=Test"

where the CEF header is:
CEF:Version|Device Vendor|Device Product|Device Version|Device Event Class ID|Name|Severity|
followed by the extension fields (src=source address, dst=destination address, duser=destination username, msg=message).

The event appears as expected when searching the ‘CommonSecurityLog’, where events ingested from the CEF Connector are stored:
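
A query along these lines should return the test event (the vendor and product values come from the logger command above):

CommonSecurityLog
| where TimeGenerated >= ago(1h)
| where DeviceVendor == "Seamless Security Solutions"
| project TimeGenerated, DeviceVendor, DeviceProduct, SourceIP, DestinationIP, DestinationUserName, Message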

Now we’re going to use the Python script at the end of this post that will generate a sample authentication event every 5 minutes. Simply create a file, give it execute permissions, then open it with vi. Be sure to put the file in a more appropriate location if you plan on using it longer-term.

mkdir /tmp/azure
touch /tmp/azure/CEF_generator.py
chmod +x /tmp/azure/CEF_generator.py
vi /tmp/azure/CEF_generator.py

Paste the script into the file by pressing ‘i’ to enter insert mode, then pasting. When finished, press Esc, then save and quit with ‘:wq’.

Run the file in the background by running the following command:

nohup python /tmp/azure/CEF_generator.py &

As expected, events generated on the Ubuntu server are now searchable in Sentinel:

In less than an hour, you now have searchable data in a standard format from a custom application using Sentinel’s CEF Connector. Additionally, you can set up alerts based on the ingested events and work with the data in other Sentinel tools, such as incident management and playbooks.

CEF generator Python script link here.

CEF Event Generator

This is a sample CEF generator Python script that will log sample authentication events to the Syslog file. Created and tested on an Azure Ubuntu 18.04 VM. Please check indentation when copying/pasting.

#!/usr/bin/python
# Simple Python script designed to write to the local Syslog file in CEF format on an Azure Ubuntu 18.04 VM.
# Frank Cardinale, April 2020

# Importing the libraries used in the script
import random
import syslog
import time

# Simple list that contains usernames that will be randomly selected and then output to the "duser" CEF field.
usernames = ['Frank', 'John', 'Joe', 'Tony', 'Mario', 'James', 'Chris', 'Mary', 'Rose', 'Jennifer', 'Amanda', 'Andrea', 'Lina']

# Simple list that contains authentication event outcomes that will be randomly selected and then output to the CEF "msg" field.
message = ['Login_Success', 'Login_Failure']

# Endless loop that will run the below every five minutes.
while True:

    # Assigning a random value from the above lists to the two variables that will be used to write to the Syslog file.
    selected_user = random.choice(usernames)
    selected_message = random.choice(message)

    # Assigning random integer values from 1-255 that will be appended to the IP addresses written to the Syslog file.
    ip = str(random.randint(1,255))
    ip2 = str(random.randint(1,255))

    # The full Syslog message that will be written.
    syslog_message = "CEF:0|Seamless Security Solutions|Streamlined Security Product|1.0|1000|Authentication Event|10|src=167.0.0." + ip + " dst=10.0.0." + ip2 + " duser=" + selected_user + " msg=" + selected_message

    # Writing the event to the Syslog file.
    syslog.openlog(facility=syslog.LOG_LOCAL7)
    syslog.syslog(syslog.LOG_NOTICE, syslog_message)

    # Pausing the loop for five minutes.
    time.sleep(300)

# End of script

Standard-Size your SIEM HA and DR Environments

A common decision when designing a highly available or disaster-recovery-enabled SIEM solution is to under-size the secondary environment with fewer resources than the main production environment. Many believe that losing a server to a hardware failure, or an application to corruption, is highly unlikely, and that if such a situation were to occur, a system with fewer resources would suffice while the primary system is down. Thus, with a minute probability of a server or application failure, many believe they can get away with less RAM, fewer cores, and less disk space on the HA or DR server(s). After all, none of us want to bet on something that isn’t likely to happen.

For many organizations, this may be an acceptable risk for several reasons, including budgetary restrictions, compensating security controls, and risk appetite. For example, a small company simply may not have the budget for a highly available SIEM solution. For others, it may be that other security applications provide compensating controls, allowing their analysts to obtain log data from other sources.

For organizations looking to implement high availability in some fashion, it’s important to understand how small differences can have a major impact.

To highlight what can happen, let’s use Company A as an example. Company A is designing a new SIEM estimated to process 10,000 EPS. The company has requested additional budget for HA capabilities but wants to save some of the security budget for another investment. They find an unused server in the data centre that has less RAM, fewer cores, and less disk space, and decide to use it as the DR server. They ask their SIEM vendor for their thoughts, and the vendor replies that the reduced system resources should only result in slower search response times, and that the 2 TB hard drive should provide just over four days of storage (10,000 EPS × 2,500 bytes per normalized event × 86,400 seconds/day ≈ 2.16 TB/day raw, or about 432 GB/day after 80% compression). Company A accepts the risk, thinking that any system issue should be addressed within their 24-hour hardware SLA and that they should have the application back up in two days at most.

A few months down the road, the Production SIEM system fails. The operations staff at Company A quickly reconfigure the SIEM Processing Layer (Connectors/Collectors/Forwarders) to point to the DR server. Log data begins ingesting into the DR server and is searchable by the end users. The security team pats themselves on the back for averting disaster.

However, things go sour when the server admins learn that the hardware vendor won’t be able to meet the 24-hour SLA. Three days pass and the main Production server remains offline. The DR server is still processing data, and searches are slower but completing; even so, the security team becomes anxious as the 2 TB hard drive approaches 90% utilization on the fourth day.

When disk capacity is fully utilized early the following morning, the SIEM Analytics Layer (where data is stored) begins purging data older than four days, diminishing the security team’s visibility. The purging jobs also add stress to the already fully utilized cores, which causes searches to time out. The Analytics Layer then begins refusing new data, forcing the SIEM Processing Layer to cache data locally on the server. That is also a concern for the security team, since the single Processing Layer server only has 500 GB of disk space, and the 400 GB of available cache would be fully utilized in roughly four hours at this rate (10,000 EPS × 2,500 bytes per normalized event × 14,400 seconds = 360 GB uncompressed), as the SIEM’s processing application can’t compress data (this SIEM can only compress at the Analytics Layer).

As you can see, overlooking a somewhat minor design consideration, making assumptions, relying on SLAs, and so on, can have major impacts on your environment and reduce the utility of your SIEM in a disaster. As SIEMs consist of multiple applications (Connectors, Analytics), the required high availability should be considered for the various components. Forgoing high availability may need to be done for budgetary or other reasons, but it’s critical to ensure that the environment is aligned to your requirements and risk appetite.

Managing Your SIEM Internally or Outsourcing

Given the advent of cloud computing, many companies now outsource at least some parts of their IT infrastructure and applications to third parties, allowing them to focus on their core business. Many organizations don’t want to invest in an IT department or data center, and can’t match the speed and efficiency that Amazon AWS or Microsoft Azure can provide.

While your SIEM environment is likely one of your more complicated security applications to manage, you can outsource one or more parts of it to third parties, including content development, application management, and infrastructure management. For example, you can have content developed internally by your security analysts, have the application set up and maintained by the vendor or another third party, and have the infrastructure hosted in AWS or Azure.

Before embarking on an outsourcing initiative, there are some major considerations to work through.

Determining which parts to outsource

To determine which parts of your SIEM environment to keep or outsource, you’ll need to understand your competencies. Do you have a significant number of staff with solid security competencies, but not SIEM skills specifically? Do you have a SOC that is proficient in both SIEM content development and investigations? Are you willing to invest in obtaining and retaining scarce SIEM staff? Does your line of business and its regulations force you to keep all data internally? Do you want to keep your SIEM staff internal for a few years and then consider outsourcing? Can your data centre rack-and-stack a server within a day, or does it take six months?

While these can be difficult questions to answer, the good news is that you’ll be able to outsource any part of your SIEM environment. Many of the major consulting firms can offer you SIEM content developers, engineers, investigators, and application experts to manage everything from your alerts to your SIEM application.

A summary of the SIEM environment functions that can be outsourced:
Content Development
-SIEM correlation rules, reports, metrics, dashboards, and other analytics.
Investigations
-Performing security investigations, working with the outputs of alerts, reports, etc.
Engineering
-Integrating data into the SIEM, parser development.
Application Management & Support
-Installing, updating, and maintaining the SIEM application.
Infrastructure
-Setup and maintenance of SIEM hardware, servers, and storage.

Privacy Standards and Regulations

Depending on what country and jurisdiction your organization falls under, you may be subject to laws that restrict the processing and storage of data to particular geographic regions. Azure and AWS have expanded significantly and have infrastructure available in many countries, allowing your data to be stored “locally.” Additionally, Azure and AWS can provide secure, private networks, segregating your data from that of other organizations.

While it can seem risky to store your data with another entity, many organizations take advantage of Amazon’s and Microsoft’s network and security teams rather than building the required teams internally.

Internet Pipe

You’ll need an adequate, fault-tolerant Internet pipe between your organization and your hosting company. The amount of bandwidth needed depends on the SIEM architecture. For example, you may want to collect log data locally with the Processing Layer, where it is collected, structured/modified, and then sent to the Analytics Layer. If we’re using Product A, which structures log data into an average event size of 2,000 bytes, and our sustained rate is 5,000 EPS, then we’ll need roughly 10 MB/s (about 80 Mbps) of available bandwidth to forward over the link without introducing latency.

A SIEM Application that Supports Your Desired Outsourced Architecture

Let’s say for example that you want to collect and consolidate log data locally, and then have that consolidated data sent to a SIEM in the cloud, and another copy to your local, long-term data store. Does the application that will be collecting and consolidating log data locally support this model? Is there a limitation that it can only send to one destination, and thus can’t meet the requirement to send to two destinations?

Other Considerations:
Ownership
-Regardless of how you build and operate your SIEM environment, ensure that there’s an owner with a direct interest in maintaining and improving its condition, especially if working with multiple vendors. Multiple parties can point the finger at each other if there are issues within the environment, so it’s critical to have an entity that can prevent stalemates, resolve issues, improve the SIEM’s condition, and ultimately increase the value it provides to your organization.
Contract Flexibility
-If you’re going to be using the RFP process, understand that the third parties you’re submitting the proposal to are there to win business and compete on price. As a result, some third parties may under-size a solution to create a price attractive to the purchaser. While this can simply be considered business as usual, it’s important to understand that your environment may need to be augmented or adjusted during its tenure, and the service provider may ask for additional funds to ensure the environment’s health. Additionally, requirements can change significantly in a short period of time, which can change the infrastructure required for the environment.

Overall, there are many ways to structure your SIEM environment to take advantage of outsourcing. There are many organizations that can help you manage any part of it. How to do it best will depend on your organization’s requirements, line of business, relationships with third parties, competencies, strategy, and growth.

Selecting a SIEM Storage Medium

Given that SIEMs can process tremendous amounts of log data, a critical foundation of your SIEM environment is storage. In order to provide analytical services and fast search response times, SIEMs need dedicated, high-performing storage. Inadequate storage will result in poor query response times, application instability, frustrated end users, and ultimately make the SIEM a bad investment for your organization.

Before running out and buying the latest and greatest storage medium, understanding your retention requirements should be the first step. Does your organization need the typical three months of online and nine months of offline data? Or do you have larger requirements, such as six months of online followed by two years of offline data? Do you want all of your data “hot”? Answering these questions first is critical to keep costs as low as possible, especially if you have large storage requirements.

Once we understand the storage requirements, we can better understand which medium(s) to use for the environment. If we have a six-month online retention requirement at a 50,000 event-per-second processing rate, we’re going to need dedicated, high-speed storage to handle the throughput. While we definitely need to meet the IOPS requirements vendors specify, we also need to ensure the storage medium is dedicated. Even if the storage has the required IOPS, if the application can’t access the storage medium when required, the IOPS are irrelevant. Thus, if using a technology such as SAN, ensure that the required storage is dedicated to the application and that the SAN is configured accordingly.

Another factor to consider when designing your storage architecture for your SIEM environment is what storage will be used per SIEM layer. The Processing Layer (Connectors/Collectors/Forwarders) typically doesn’t store data locally unless the Analytics Layer (where data is stored) is unavailable. However, when the Analytics Layer is unavailable, the Processing Layer should have the appropriate storage to meet the processing requirements. Dedicated, high-speed storage should be used to process large EPS rates, and should have the required capacity to meet caching requirements.

To save on storage costs, slower, shared storage can be used to meet offline retention requirements. When needing access to historical data, the data can be copied back locally to the Analytics layer for searching.

Ensuring you have the right storage for your SIEM environment is a simple but fundamental task. As SIEMs can take years to fully implement and equally long to change, selecting the correct storage is critical. For medium-to-large enterprises, dedicated, high-speed storage should be used to obtain fast read and write performance. While smaller organizations should ideally make the same investment, there are cases where slower, more cost-effective storage can suffice for low processing rates and minimal end-user usage of the SIEM.

Understanding Your License Model

SIEM license models can vary significantly. Some are based simply on the average ingested data per day, while others involve multiple factors such as ingested data per day, the number of end users, and the number of devices data is collected from. Regardless of the license model, it’s critical to understand how it works to ensure you allocate sufficient funds for it. A misunderstanding of your license model can unexpectedly consume more security budget than anticipated, and thus increase risk to your organization by limiting the resources available for both the SIEM and other security services.

Additionally, as most companies are constantly growing and changing, it’s pivotal to understand how the license model can be augmented, changed, and what the penalties are for any violations.

The simpler the license model the better, but there’s nothing wrong with a model based on several factors as long as it’s well understood and meets your organization’s requirements. After a requirements-gathering exercise, you should be able to tell your vendor the expected ingestion rates per day, how many users there will be, and the expected growth rates.

There are other less-obvious factors that can also significantly affect license models. Two often overlooked factors are how the vendor charges for filtering/dropping unneeded data, and if the ingested data rates are based on raw or aggregated/coalesced amounts. For example, if you’re planning on dropping a significant amount of data by the Processing Layer, Product A (which doesn’t charge for dropped data) would have lower license costs than Product B (which can drop data, but includes the dropped amount in license costs), all else equal. Product C, which aggregates/coalesces data and determines license costs based on the aggregated/coalesced EPS, would have lower license costs than Product D, which aggregates/coalesces data but determines license costs based on raw EPS rates, all else equal.

If you’re comparing different SIEMs, you should ensure that you’re performing an accurate comparison, as SIEMs can vary significantly. A license model for a full SIEM solution from Company A is likely to be more expensive than a log management-only solution from Company B.

SIEMs can be expensive and consume a significant portion of your security budget. Misunderstanding your requirements and then signing a contract with a license model that’s unclear or difficult to understand is a major risk. Reduce that risk by spending the resources necessary to understand it and choose one that aligns best with your organization.

Timestamp Management

One of the most critical components of a SIEM environment is accurate timestamps. In order to perform any investigation, analysts will need to know when all the applicable events occurred. Given that SIEM applications can have many timestamps, it’s critical for staff to know which are used for what. Improper use of timestamps can have many repercussions and add risk to your organization.

The two most important timestamps in an event are the time the event was generated on the data source, commonly known as the Event Time, and the time the event was received by the SIEM, commonly known as the Receipt Time.

A common question posed by junior staff is why they can’t find the events they’re looking for. Once I rule out a case-sensitivity searching issue, I then check to see if the correct dates/times are being used in the search. For example, a brute-force alert was generated after a user generated 50 failed logins. While the alert was generated in the past 24 hours, analysts can’t find any failed logins for the user that triggered the alert. Upon closer inspection, we see that the alert was generated off the Receipt Time, but the Event Time timestamps are from one week ago. After a quick investigation, we discovered that the system owner took the system offline last week, and just reconnected it to the network 24 hours ago.

Timestamp discrepancies can be even more severe when using dashboards or monitoring for alerts. For example, a SOC uses a dashboard that monitors for particular IPS alerts set up to detect exploit attempts against a new, high-priority vulnerability. The SIEM has been caching events lately, and they are currently delayed by 28 hours. The dashboard the SOC analysts are using is configured to use the Event Time timestamp and shows events from the past 24 hours. An alert that occurred 26 hours ago finally clears the cache queue but fails to show up on the dashboard because it doesn’t match the timestamp criteria. Thus, the critical alert goes unnoticed.

While this can be a severe issue, the fix is simple. The SOC analysts could configure the dashboard to use the Receipt Time timestamp instead, so all alerts received in the past 24 hours would show up regardless of when they were generated. Dashboards in general should show both timestamps, and staff should be aware of the various timestamps used by the SIEM. Another advantage of showing both timestamps is that staff become aware of delays in receiving log data. Minor delays can be expected, but significant delays can be caught immediately and actioned before large caches form.
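
In Sentinel/Log Analytics terms, a rough sketch of that idea is to compare each record’s TimeGenerated with ingestion_time(), the time the record actually landed in the workspace, to surface delays (an approximation of the Event Time/Receipt Time distinction, not an exact mapping):

CommonSecurityLog
| where TimeGenerated >= ago(1d)
| extend IngestionDelay = ingestion_time() - TimeGenerated
| summarize avg(IngestionDelay), max(IngestionDelay) by DeviceProduct
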

Some timestamp discrepancies can be normal, especially in large organizations. You’re bound to have at least some out of thousands of servers with misconfigured dates, development machines logging to the SIEM when they shouldn’t be, or network outages that cause transmission delays. Having staff aware of the various timestamps and potential discrepancies can reduce the risk that they turn into larger issues in your organization.