PFSense, Suricata, and Splunk: mildly complicated, but very doable

I run a home lab with a bunch of VMs running vaguely security-related tools, and a PFSense router in front of everything. On PFSense, I am running Suricata on several interfaces. I also collect the data into a Splunk instance (❤️ developer license).

The Problem

You can forward data into Splunk with syslog. Ingesting syslog into Splunk is not the easiest way to collect data – ideally you want to use a Splunk Universal Forwarder (n.b. do not configure Splunk indexers or forwarders to listen for syslog directly! Use a dedicated syslog server!) – but PFSense can forward its syslog natively, and Suricata alerts get written to syslog, so why not use that?

Well, because the data that Suricata puts in a syslog event is next to useless.

02/28/2023-20:18:38.695473  [**] [1:2016683:3] ET WEB_SERVER WebShell Generic - wget http - POST [**] [Classification: Potentially Bad Traffic] [Priority: 2] {TCP} 118.232.97.242:56937 -> <REDACTED>:80

A signature name, a source, and a destination. That’s it. Nothing to really let you understand what was happening underneath. What you really want is the Suricata eve.json output, which PFSense very helpfully allows you to enable. This contains a wealth of data: the raw packet (in base64), decoded protocol elements like HTTP request headers, the URL, or a DNS query – but although you can view these in the PFSense web UI, they are not covered by the native log management. What else could we use?
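Before answering that, here is a flavour of what an eve.json alert record can contain, abridged for readability. The interface name and the HTTP details below are invented for illustration; the alert itself matches the syslog line above:

{
  "timestamp": "2023-02-28T20:18:38.695473+0000",
  "in_iface": "igb0",
  "event_type": "alert",
  "src_ip": "118.232.97.242",
  "src_port": 56937,
  "dest_ip": "<REDACTED>",
  "dest_port": 80,
  "proto": "TCP",
  "alert": {
    "signature_id": 2016683,
    "rev": 3,
    "signature": "ET WEB_SERVER WebShell Generic - wget http - POST",
    "category": "Potentially Bad Traffic",
    "severity": 2
  },
  "http": {
    "hostname": "<REDACTED>",
    "url": "/cgi-bin/shell.php",
    "http_user_agent": "Wget/1.21",
    "http_method": "POST",
    "status": 404
  },
  "packet": "..."
}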

Splunk is easiest when you use a Splunk forwarder. Always use one when you can! However, sometimes it’s not the right choice… technically you can run the Forwarder on PFSense, as it’s FreeBSD, and there is a FreeBSD build of the forwarder. Unfortunately, it has some drawbacks: although you appear to be able to enable boot-start, it doesn’t actually work, because PFSense manages the set of services that run at boot and anything that is not an official PFSense package won’t get started. I’d been running it like this for ages, and given that I only rebooted my firewall once every six months or so, it was only a minor headache. If you’re running it as a production box, though, that’s very much not ideal!

So, syslog is out, and a Splunk forwarder is problematic. What next?

Enter syslog-ng

PFSense is extensible, with a number of officially maintained add-on packages. Suricata is one. Another is the versatile syslog management engine, syslog-ng. Almost any way you can imagine of moving a log from one place to another, syslog-ng can do it for you. It could certainly scoop up files that aren’t covered by PFSense’s default syslog forwarding, and send them over to a syslog receiver. But. But.

syslog-ng can also write directly to the Splunk HTTP Event Collector (HEC). In fact, this is exactly what the Splunk Connect for Syslog (SC4S) package is under the hood – syslog-ng with a bunch of wrapping around it to make configuring it for a large number of Splunk inputs a bit less work. We don’t have that handy wrapping in PFSense, but the syslog-ng package will absolutely let us do the necessary config manually, using an officially supported PFSense method. This post is therefore a step-by-step on how to set that up.

1. Groundwork

1.1 Splunk

You will need to have Splunk HEC set up. This part is covered by Splunk training (look in particular at the System Admin, Data Admin, and Architecting courses) and documentation so I will not rehash it here. However, as a brief summary, you will need to:

  • enable HEC and generate tokens
  • configure a load balancer (this is a non-Splunk item, but it is a critical step if you have a Splunk deployment with multiple indexers; do not direct HEC output at just one indexer in a cluster, it will do bad things to your Splunk deployment)

Tokens for a single-instance deployment can be found in Settings > Data Inputs > HTTP Event Collector. A HEC token looks like this in Splunk Web:

[Screenshot: Splunk Web showing a token named "sample_token", with edit/disable/delete links and the token value itself – a GUID-style string of hexadecimal blocks joined by dashes]

Consult the documentation mentioned above for information about how to obtain the token if HEC is set up in a distributed deployment.

1.2 PFSense

  • Go to the System > Package Manager screen, search for the syslog-ng package, and install it

1.3 Network

Your PFSense device needs to be able to connect to the address of the HEC endpoint, on the appropriate port. The default port Splunk uses for this purpose is 8088. If using a load balancer, it must be able to connect to all Splunk indexers on the relevant port, and your PFSense device must also be able to connect to the load balancer. Test both of these things before starting to configure syslog-ng.

If you are intending to use a hostname to specify the Splunk instance / load balancer address, make sure that PFSense can resolve the hostname.
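A quick way to test the whole path from PFSense is to post a test event to HEC with curl. This is a hedged example: the placeholders are the same ones used later in this guide, the sourcetype is arbitrary, and -k skips certificate verification in the same way peer-verify(no) will later.

curl -k "https://<splunk_hec_endpoint>:8088/services/collector/event" \
  -H "Authorization: Splunk <generated_hec_token>" \
  -d '{"event": "hello from pfsense", "sourcetype": "hec_test"}'

If the network path, DNS, and token are all good, Splunk should answer with {"text":"Success","code":0}.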

2. Configure syslog-ng

The most basic syslog-ng configuration has 3 components: a source, a destination, and a log directive that instructs syslog-ng to send source X to destination Y. The configuration needed for this use case is only a few minor tweaks away from this baseline. To begin configuring, navigate to Services > syslog-ng in the PFSense admin interface. You quite likely will not need to alter anything under the General tab. Configuring specific logging settings is done under the Advanced tab. Click Add to start writing a config.

2.1 The source

The “Object Type” for a source must be… Source. Sorry, no prizes for guessing that one.

[Screenshot: editing a syslog-ng config in PFSense, with blank "Object Name", "Object Parameters" and "Description" fields, and the "Object Type" dropdown set to "Source"]

The “Object Name” is a unique identifier for the config stanza you are defining. There are few strict limitations, but it is a recommended convention to prefix source config names with “s_”, destination names with “d_” etc. The remainder should be brief, but descriptive. This config is to read the Suricata eve.json log files, so I have named it “s_suricata_eve_json”.

The “Object Parameters” define what is actually going to happen. To determine what to set, we must understand where and how the data we want to send exists.

PFSense stores Suricata logs in /var/log/suricata. It can run multiple instances of Suricata, one for each firewall interface. Every instance of Suricata gets its own directory within this path, and the logs are in these subdirectories.

[Screenshot: command-line listing of /var/log/suricata showing multiple directories named after interfaces]

We could write a separate source stanza for each individual file, manually specifying the interface name. However, that way you would need to edit the config whenever you enable Suricata on a new interface. We can instead watch all the directories at once.

[Screenshot: editing a syslog-ng config in PFSense, showing an object of type "Source" named s_suricata_eve_json]

The wildcard-file option allows collecting multiple files, and can recursively search directories from a specified base path. That’s perfect for our use case. We set the base-dir option to /var/log/suricata, enable recursion, and read all files named “eve.json”.

Additionally, the “no-parse” flag is set. This is because the default behaviour of syslog-ng is to attempt to interpret all messages as RFC-compliant syslog messages, where there is a set of default header fields such as syslog priority, timestamp, and host. Suricata eve.json events consist of a JSON object, with no header; trying to parse a syslog header from this results in improperly formatted JSON (and we need it to be valid JSON when it is sent to Splunk). This is the resulting definition:

{
  wildcard-file(
    base-dir("/var/log/suricata/")
    filename-pattern("eve.json")
    recursive(yes)
    flags(no-parse)
  );
};

Write a brief description, save this configuration, then click Add again for the next one.

2.2 The destination

You need to direct the events which are found in the source to your Splunk HEC receiver. Set the “Object Type” as “Destination”. My destination is labelled “d_splunk_suricata_hec”.

The syslog-ng option that allows sending data to HEC is the http() function. In this function we will define the destination (the HEC endpoint host plus the path “/services/collector/event”, which is where the Splunk HEC listener receives data), the token generated in step 1.1, and the HTTP body. The body is a JSON object with a specific set of fields that Splunk expects.

[Screenshot: editing a syslog-ng config in PFSense, set up to send to Splunk HEC]

You must replace several elements of this with values specific to your environment:

  • <splunk_hec_endpoint> should be the IP or hostname of your load balancer, or of the Splunk instance if it is an all-in-one instance
  • <generated_hec_token> should be replaced with the token generated in step 1.1
  • In an ideal environment, you will be using proper certificate management with PKI; instead of setting peer-verify(no), you would load your organisation’s certificates into PFSense
  • <index> should be changed to the Splunk index you wish the logs to be sent to

After changing the values it should look something like this:

{
    http(url("https://172.16.3.9:8088/services/collector/event")
        method("POST")
        user_agent("syslog-ng User Agent")
        user("user")
        password("eb2cb049-3091-4b82-af1e-da11349a5e2b")
        peer-verify(no)
        body("{ \"time\": ${S_UNIXTIME},
                \"host\": \"${HOST}\",
                \"source\": \"${FILE_NAME}\",
                \"sourcetype\": \"suricata\",
                \"index\": \"suricata\", 
                \"event\":  ${MSG} }\n")
  );
};

Write a brief Description, save the configuration, and click Add to start writing the final part.

2.3 The log directive

Now that a source and destination have been defined, they can be connected together with a third stanza, where the “Object Type” is “Log”. This is the simplest of the three, and looks like so:

[Screenshot: PFSense admin page showing a syslog-ng log stanza being configured]

The source() function uses the Object Name chosen in step 2.1; the destination() function takes the name chosen in 2.2. Add these in, set an object name and description for this stanza, and save – and you should be rolling!

{ 
  source(s_suricata_eve_json);
  destination(d_splunk_suricata_hec); 
};

3. Checking your work

The first place to look is in the index you set as destination for Suricata events. Depending on how busy the device is, you might get dozens of events a minute, or only a few per hour. If you don’t see anything, try looking in the following places to see why:

3.1 Suricata logs on PFSense

You can see the events as they are written on the device under Services > Suricata > Logs View. If you have command line access you can also look in the filesystem at /var/log/suricata/<interface name>/eve.json.
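For example, to watch alert events as they are appended on one interface (a hypothetical one-liner – eve.json also contains flow, DNS, and other event types, hence the grep):

tail -f /var/log/suricata/<interface name>/eve.json | grep '"event_type": *"alert"'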

3.2 syslog-ng logs on PFSense

Under Services > syslog-ng > Log Viewer, you can see recent messages from the syslog-ng service. Possible errors you could see here, and their causes include:

error sending HTTP request; url='https://<your host>:8088/services/collector/event', error='Couldn\'t resolve host name' 

PFSense could not look up the specified hostname via DNS; redo step 1.3

curl: error sending HTTP request; url='https://<your host>:8088/services/collector/event', error='SSL peer certificate or SSH remote key was not OK' 

the certificate presented by the endpoint is not trusted; if this is expected (e.g. a self-signed certificate), set peer-verify(no) as in the example above

Server returned with a 4XX (client errors) status code, which means we are not authorized or the URL is not found.; url='https://<your host>:8088/services/collector/event', status_code='400' 

the request wasn’t formatted correctly; you may have made a typo when constructing the text in the body() function

3.3 HEC logs in Splunk

If syslog-ng connects successfully but submits bad information, Splunk HEC will log an error. You can search for this with:

index=_internal component=HttpInputDataHandler

If the problem is badly formatted data, the messages aren’t hugely informative, but they are at least enough to confirm roughly what’s going on.

10-11-2023 20:31:45.580 +0100 ERROR HttpInputDataHandler [23746 HttpDedicatedIoThread-0] - Failed processing http input, token name=pfsense_syslog_ng, channel=n/a, source_IP=<syslog-ng source IP>, reply=6, events_processed=0, http_input_body_size=1046, parsing_err="While expecting event object key: Unexpected character: ':', totalRequestSize=1046"

When I encountered this, the only way I could find to see what the problem was involved writing a second destination for syslog-ng, writing events to a local file using the same formatting text used in the body() function of the http() destination. I could then read the file and figure out which bit of the JSON was incorrect.
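For reference, such a debug destination might look something like the sketch below – the output path is arbitrary, and you would pair it with an extra Log stanza pointing s_suricata_eve_json at it:

{
  file("/var/log/syslog-ng-hec-debug.log"
    template("{ \"time\": ${S_UNIXTIME},
                \"host\": \"${HOST}\",
                \"source\": \"${FILE_NAME}\",
                \"sourcetype\": \"suricata\",
                \"index\": \"suricata\",
                \"event\":  ${MSG} }\n")
  );
};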

Hopefully, now that I’ve shown exactly which bits to alter in this guide, you won’t have a need for that level of debugging! If you find messages like this, first re-read section 2.2 and check your destination stanza very carefully against the example, for missing or extra characters.

4. Wrap up

If all went well, you now have all the eve.json events in Splunk, in all their lovely detail. If this has been helpful, I’d love to hear from you – or if there’s anything wrong or missing, please let me know. Happy Splunking!

tstats: afterburners for your Splunk threat hunting

Recently, @da_667 posted an excellent introduction to threat hunting in Splunk. The information in Sysmon EID 1 and Windows EID 4688 process execution events is invaluable for this task. Depending on your environment, however, you might find these searches frustratingly slow, especially if you are trying to look at a large time window. You may also have noticed that although these logs concern the same underlying event, you are using two different searches to find the same thing. Is there anything we can do to improve our searches? Spoiler: yes!

One of the biggest advantages Splunk grants is in the way it turns the traditional model of indexing SIEM events on its head. Instead of parsing all the fields from every event as they arrive for insertion into a behemoth of a SQL database, they decided it was far more efficient to just sort them by the originating host, source type, and time, and extract everything else on the fly when you search. It’s a superb model, but does come with some drawbacks.

Some searches that might have been fast in a database are not so rapid here. And because there is no database, you are not constrained to predefined fields set by the SIEM vendor – but there is also nothing to ensure that fields containing similar data share the same name, so every type of data has its own naming conventions. Putting together a search that covers three different sources of similar data can mean having to know three different field names and event codes specific to each product… it can get to be quite a hassle!

The answer to these problems is datamodels, and in particular, Splunk’s Common Information Model (CIM). Datamodels allow you to define a schema to address similar events across diverse sources. For example, instead of searching

index=wineventlog EventCode=4688 New_Process_Name="*powershell.exe" 

and

index=sysmon EventCode=1 Image=*powershell.exe 

separately, you can search the Endpoint.Processes datamodel for process_name=powershell.exe and get results for both. The CIM is a set of predefined datamodels for, as the name implies, types of information that are common. Once you have defined a datamodel and mapped a sourcetype to it, you can “accelerate” it, which generates indexes of the fields in the model. This process carries a storage, CPU and RAM cost and is not on by default, so you need to understand the implications before enabling it.
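For example, a way to run that combined search without acceleration might look like the sketch below – the datamodel command expands the model at search time, so it is convenient but not fast:

| datamodel Endpoint Processes search | search Processes.process_name="powershell.exe"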

Turn it up to 11

Let’s take this example, based on the fourth search in @da_667’s blog. In my (very limited) data set, according to the Job Inspector, it took 10.232 seconds to search 30 days’ worth of data. That’s not so bad, but I only have a few thousand events here, and you might be searching millions, or tens of millions – or more!

[Screenshot: Splunk Job Inspector showing the search time and cost breakdown]

What happens if we try searching an accelerated datamodel instead? Is there much of a difference?

[Screenshot: Splunk Job Inspector information for the accelerated datamodel search]

Holy shitballs, yes it does. This search returned in 0.038 seconds – that’s nearly 270x faster! What sorcery is this? Well, the command used was:

| tstats summariesonly=true count from datamodel=Endpoint.Processes where Processes.process_name=powershell.exe by Processes.process_path Processes.process

What’s going on in this command? First of all, instead of going to a Splunk index and running all events that match the time range through filters to find “*powershell.exe”, my tstats command is telling Splunk to search just the tsidx files – the accelerated indexes mentioned earlier – related to the Endpoint datamodel. Part of the indexing operation has broken out the process name into a separate field, so we can search for an explicit name rather than wildcarding the path.

The statistics argument count and the by clause work similarly to the traditional stats command, but you will note that the search specifies Processes.process_name. A quirk of the structure of datamodels means that when you are searching a subset of a datamodel (a dataset, in Splunk parlance), you need to specify your search in the form

datamodel=DatamodelName[.DatasetName] where [DatasetName.]field_name=somevalue by [DatasetName.]field2_name [DatasetName.]field3_name

The DatasetName components are not always needed – it depends whether you’re searching fields that are part of the root datamodel or not (it took me ages to get the hang of this so please don’t feel stupid if you’re struggling with it).
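A quick way to see the prefix rule in action (a hedged example): _time is available in every datamodel and needs no prefix, while process_name belongs to the Processes dataset and does:

| tstats summariesonly=true count from datamodel=Endpoint.Processes where Processes.process_name=powershell.exe by _time span=1h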

Filtered, like my coffee

Just as in the Hurricane Labs blog, you can filter and manipulate tstats output with the same sorts of operations.

| tstats summariesonly=true count from datamodel=Endpoint.Processes where Processes.process_name=powershell.exe NOT Processes.parent_process_name IN ("code.exe", "officeclicktorun.exe") by Processes.process_path Processes.process | `drop_dm_object_name("Processes")`

You can filter on any of the fields present in the data model, and also by time, and the original index and sourcetype. The resulting data can be piped to whatever other manipulation/visualisation commands you want, which is particularly handy for charts and other dashboard features – your dashboards will be vastly sped up if you can base them on tstats searches.

You’ll also note the macro drop_dm_object_name – this reformats the field names to exclude the Processes prefix, which is handy when you want to manipulate the data further as it makes the field names simpler to reference.

A need for speed

How do I get me some of this sweet, sweet acceleration, I hear you ask? The first thing to understand is that it needs to be done carefully. You will see an increase in CPU and I/O on your indexers and search heads, because acceleration works by having the search head run background searches that populate the summary indexes. There will also be a noticeable increase in storage use, with the amount depending on the summary range (i.e. the time period covered by detailed indexing) and how busy your data sources are.
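Under the hood this is controlled by settings in datamodels.conf – you would normally toggle it through the CIM app setup page or Settings > Data models rather than editing the file directly, but as a rough sketch (assuming a 30-day summary range), the stanza looks something like this:

[Endpoint]
# turn on acceleration and build summaries covering the last 30 days
acceleration = true
acceleration.earliest_time = -30d
# how often the background summary-building searches run
acceleration.cron_schedule = */5 * * * *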

With this in mind, you can start looking at the Common Information Model app and the documentation on accelerating data models. I highly recommend consulting Splunk’s Professional Services before forging ahead, unless your admins are particularly experienced. The basic process is as follows:

  • Ensure that your sourcetypes are CIM compliant. For most Splunk-supported apps, this is already done.
  • Ensure that you have sufficient resources to handle the increased load
  • Deploy the CIM app
  • Enable acceleration for the desired datamodels, and specify the indexes to be included (blank = all indexes. Inefficient – do not do this)
  • Wait for the summary indexes to build – you can view progress in Settings > Data models
  • Start your glorious tstats journey
[Screenshot: configuration for the Endpoint datamodel in the Splunk CIM app]
[Screenshot: detail from Settings > Data models]

Datamodels are hugely powerful and if you skim through the documentation you will see they can be applied to far more than just process execution. You can gather all of your IDS platforms under one roof, no matter the vendor. Get email logs from both Exchange and another platform? No problem! One search for all your email! One search for all your proxy logs, inbound and outbound! Endless possibilities are yours.
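For instance, assuming your IDS add-ons (Suricata included) are CIM-mapped and the Intrusion_Detection datamodel is accelerated, one hedged sketch of a vendor-agnostic IDS search might be:

| tstats summariesonly=true count from datamodel=Intrusion_Detection.IDS_Attacks where IDS_Attacks.severity=high by IDS_Attacks.signature IDS_Attacks.src IDS_Attacks.dest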

One search to rule them all, one search to find them… happy Splunking!