What’s the big deal with log data?

So what’s the big deal with all this log data, and why on earth should I spend a large chunk of my budget to collect it? Aren’t the other security tools I have good enough? What exactly is in all this log data, anyway?

Log data is, above all, data, and data is one of today’s most valuable assets. Google, Twitter and Facebook collect enough data on people to detect flu outbreaks faster than medical professionals can. Using GPS data, a software company became the world’s largest taxi service without owning a single taxi. A computer algorithm can recommend a movie you’d like to watch and spare you from having to read the critics’ reviews. Amazon can tell you what book you’d like to read next or which household products you may be running low on.

In the context of cyber security, log data contains records of activity from your various IT systems. These records can help you understand what goes on inside your network. They can show you which user accounts are being used. They can show you which users are consistently visiting blocked websites. They can show you the suspicious files being blocked by your endpoint protection application. They can highlight suspicious processes running on your servers. They can tell you which exploits your web servers are vulnerable to and if anyone is trying to attack them. Ultimately, they can uncover activity in your network that is adding risk to your organization.

Log data is typically output to a file or database, where it was traditionally used for troubleshooting purposes. If someone couldn’t log into a particular application, the system admins would check the log files to see if they could find out why. If a customer application was down, the support team would check the log files to see if they could find out the cause of the crash.

As the amount of log data grew, many saw that the files sitting on their servers contained invaluable data. Many applications were born to manage all of this data, helping organizations search through it and detect issues before they became outages. In the early 2000s, some programmers with a security mindset thought of creating an application that would act as a centralized repository of log data for security investigators and alert in real time when particular values or suspicious patterns were detected in the log data. The result was the birth of SIEM: Security Information and Event Management.

Let’s take a quick peek at some log data. Here’s a small sample of authentication activity: a user fails to log in twice and then successfully logs in to their workstation.

-May 1 2018 1:00PM, IP=10.1.1.1, User=Bob, Message=login failure
-May 1 2018 1:01PM, IP=10.1.1.1, User=Bob, Message=login failure
-May 1 2018 1:02PM, IP=10.1.1.1, User=Bob, Message=login success

Most log files will at minimum answer who, what, when, where, why, and how. Given the advent of SIEMs, most vendors now provide detailed logging for their applications, and some even allow you to customize what is output.
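
To make the who, what, when, and where concrete, here is a minimal Python sketch that splits one of the sample lines above into a timestamp and named fields. It assumes the simplified comma-separated key=value layout used in this article; real log formats vary by vendor, and the function name parse_line is only illustrative.

```python
from datetime import datetime

def parse_line(line):
    """Split one sample log line into a timestamp and a dict of fields.

    Assumes the simplified layout used in this article:
    "-<date time>, Key=value, Key=value, ..."
    """
    # Drop the leading "-" used in the samples, then split on ", ".
    parts = line.lstrip("-").strip().split(", ")
    # "When": e.g. "May 1 2018 1:00PM" (assumes English month names).
    timestamp = datetime.strptime(parts[0], "%b %d %Y %I:%M%p")
    fields = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        # Normalize key case so "Message" and "message" compare equal.
        fields[key.strip().lower()] = value.strip()
    return timestamp, fields

when, fields = parse_line(
    "-May 1 2018 1:00PM, IP=10.1.1.1, User=Bob, Message=login failure"
)
print(when, fields["user"], fields["ip"], fields["message"])
```

Once each entry is broken into fields like this, it can be searched, counted, and compared with other entries, which is the kind of work a SIEM automates at scale.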

Here you can see a couple of punctual users logging into their company network in the morning, generating VPN login data:

-May 1 2018 8:50AM, IP=23.91.128.44, User=John, message=VPN Login Success
-May 1 2018 8:54AM, IP=23.95.148.12, User=Bob, message=VPN Login Success

Log files can also be specific to an application. Here we have some startup activity on the billing server:

-May 1 2018 9:54AM, hostname=billingserver01, message:NOTICE: Application starting
-May 1 2018 9:55AM, hostname=billingserver01, message:NOTICE: Running startup scripts

That’s great, you may think, but why should you devote resources to collect and manage this data? Let’s expand the above entries and see what the big deal is.

Using the authentication activity again:

-May 1 2018 1:00PM, IP=10.1.1.1, User=asmith, Message=login failure
-May 1 2018 1:01PM, IP=10.1.1.1, User=bsmith, Message=login failure
-May 1 2018 1:02PM, IP=10.1.1.1, User=csmith, Message=login failure
-May 1 2018 1:03PM, IP=10.1.1.1, User=dsmith, Message=login failure
-May 1 2018 1:04PM, IP=10.1.1.1, User=esmith, Message=login failure
-May 1 2018 1:05PM, IP=10.1.1.1, User=fsmith, Message=login failure
-May 1 2018 1:06PM, IP=10.1.1.1, User=gsmith, Message=login failure
-May 1 2018 1:07PM, IP=10.1.1.1, User=hsmith, Message=login failure

These log entries are more interesting: within eight minutes, the same IP address fails to log in with eight different usernames, each a variation on “smith” (asmith, bsmith, csmith, and so on). This small story could be many things: a developer testing something, a script running in the background, or someone trying to guess a valid username in order to gain unauthorized access to the server.
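
A pattern like this can be checked for automatically. The following is a rough sketch, not how any particular SIEM implements it: it groups failed logins by source IP and flags any IP that has failed against many different usernames. The function name and the threshold of five usernames are assumptions chosen for illustration.

```python
from collections import defaultdict

def flag_username_guessing(events, threshold=5):
    """Return source IPs whose failed logins span many distinct usernames.

    `events` is an iterable of (ip, user, message) tuples.
    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    users_by_ip = defaultdict(set)
    for ip, user, message in events:
        if message == "login failure":
            users_by_ip[ip].add(user)
    return {
        ip: sorted(users)
        for ip, users in users_by_ip.items()
        if len(users) >= threshold
    }

# The eight failures from the sample above, plus one unrelated failure.
events = [("10.1.1.1", f"{c}smith", "login failure") for c in "abcdefgh"]
events.append(("10.1.1.2", "Bob", "login failure"))

print(flag_username_guessing(events))
```

With the sample data, only 10.1.1.1 is reported. A single failed login from another address stays below the threshold and is ignored.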

Let’s take a look at the VPN log again:

-May 1 2018 8:50AM, IP=23.91.128.44, User=Bob, message=VPN Login Success
-May 8 2018 8:55AM, IP=23.91.128.44, User=Bob, message=VPN Login Success
-May 15 2018 8:52AM, IP=23.91.128.44, User=Bob, message=VPN Login Success
-May 22 2018 8:59AM, IP=23.91.128.44, User=Bob, message=VPN Login Success
-May 29 2018 8:44AM, IP=23.91.128.44, User=Bob, message=VPN Login Success
-May 29 2018 9:30PM, IP=62.176.64.51, User=Bob, message=VPN Login Success

Nothing is unusual about Bob being his punctual self and logging in to work, except that “he” also logged in from Bulgaria at about 9:30PM on May 29. A scenario like this could be Bob on a business trip, or not Bob at all.
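
One simple way to surface a login like this is to remember which source IPs each user normally connects from and flag the first login from an address that has not been seen before. The sketch below is a simplified illustration under that assumption. The function name and the min_history parameter are hypothetical, and a real deployment would typically also consider geolocation and time of day.

```python
def flag_new_source_ips(logins, min_history=3):
    """Flag successful logins from an IP not previously seen for that user.

    `logins` is an iterable of (when, user, ip) tuples in chronological order.
    A user needs at least `min_history` earlier logins before alerts fire,
    so that the very first logins do not all look suspicious.
    """
    seen_ips = {}     # user -> set of IPs seen so far
    login_count = {}  # user -> number of logins seen so far
    alerts = []
    for when, user, ip in logins:
        known = seen_ips.setdefault(user, set())
        if ip not in known and login_count.get(user, 0) >= min_history:
            alerts.append((when, user, ip))
        known.add(ip)
        login_count[user] = login_count.get(user, 0) + 1
    return alerts

# Bob's VPN logins from the sample above.
logins = [
    ("May 1 2018 8:50AM", "Bob", "23.91.128.44"),
    ("May 8 2018 8:55AM", "Bob", "23.91.128.44"),
    ("May 15 2018 8:52AM", "Bob", "23.91.128.44"),
    ("May 22 2018 8:59AM", "Bob", "23.91.128.44"),
    ("May 29 2018 8:44AM", "Bob", "23.91.128.44"),
    ("May 29 2018 9:30PM", "Bob", "62.176.64.51"),
]

print(flag_new_source_ips(logins))
```

Only the 9:30PM login from 62.176.64.51 is flagged. Whether it is a business trip or a compromised account is still a question for an investigator, but the log data is what raises it.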

Finally, let’s take a look at some file executions in a log file. Here is a sample system updating itself, but for some reason the last file executed doesn’t seem to be a standard update file, which could be indicative of a malicious file being executed.

-May 4 2018 1:10AM, hostname=billingserver01, msg=file “update_01.exe” executed
-May 4 2018 1:13AM, hostname=billingserver01, msg=file “update_02.exe” executed
-May 4 2018 1:15AM, hostname=billingserver01, msg=file “update_03.exe” executed
-May 4 2018 1:50AM, hostname=billingserver01, msg=file “A2.exe” executed
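
A check for the odd file out could look like the sketch below. It assumes that legitimate update files always follow the update_NN.exe naming seen in this sample, which is an assumption made for illustration only. In practice, file hashes or signed-publisher checks are more reliable than file names.

```python
import re

# Hypothetical naming convention inferred from the sample above.
EXPECTED_UPDATE_NAME = re.compile(r"update_\d+\.exe")

def unexpected_executions(filenames):
    """Return executed file names that do not match the expected pattern."""
    return [
        name for name in filenames
        if not EXPECTED_UPDATE_NAME.fullmatch(name)
    ]

executed = ["update_01.exe", "update_02.exe", "update_03.exe", "A2.exe"]
print(unexpected_executions(executed))
```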

As you can see, log data can contain invaluable information that helps your organization investigate suspicious activity and detect attacks in real time. It can also reveal issues brewing in your systems, so they can be caught before an outage or breach occurs. SIEM is a technology that centralizes log data, makes it available for searching, allows staff to alert on suspicious activity, and ultimately enhances the efficiency and effectiveness of your organization’s security operations.