Category Archives: Log Analysis

BSides Workshop “My log obeys commands – Parse!”

Better late than never, as they say. Last week I went to BSides London, which was pretty awesome. In between hanging out with all sorts of awesome people and downing mojitos, I had the opportunity to present a workshop. It seemed to go pretty well – though I have definitely learned enough to improve it for next time.

The short version is that it was an introduction to the basic principles and techniques of log parsing, for people at the level of a junior SOC analyst. Minimal regex knowledge required.

Although I don’t have a recording of the workshop, I’m putting the slides up here in case they’re of use to anyone. Enjoy! If you have any questions, please tweet @http_error_418 😊

My log obeys commands – Parse!

Collecting Netscaler appflow reverse proxy logs

TL;DR: Python script to combine Netscaler reverse proxy HTTP request and response objects and output the result as an Apache-style log. Github repo is here: https://github.com/scherma/ipfixwatch

So the time came where your organisation decided it needs a new and shiny reverse proxy, hopefully before the current bucket of bolts went to the great big datacentre in the sky. It’ll be fun, they said. I told them we needed to talk about the definition of fun. They said they’d schedule a meeting.

This is not the right place (nor am I really qualified) to give an in-depth explanation of appflow; the short version is that it is a binary protocol for logging application-level information in a form more flexible than syslog. It has the benefit of a well-defined structure, which is a plus from a log collection perspective, but being binary means parsing it is tricky and requires specialised tools.

So how can you get the juicy details out of that content? Easier said than done. Citrix will happily sell you an appliance; I leave it to the reader to imagine how many arms and legs the price might be. Ditto Splunk. Then there are the free/OSS options, which is where we arrive at Logstash.
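Getting Logstash to listen for the Netscaler's appflow export can be done with the netflow codec (IPFIX is NetFlow v10 on the wire). This is only a rough sketch; the port, file path, and codec options here are assumptions you'd need to match to your own Netscaler export settings and Logstash version:

```conf
# Sketch of a Logstash pipeline: listen for IPFIX on UDP 4739 and
# write each decoded record out as one JSON object per line.
input {
  udp {
    port  => 4739                          # standard IPFIX port; match your Netscaler config
    codec => netflow { versions => [10] }  # IPFIX = NetFlow version 10
  }
}
output {
  file {
    path  => "/var/log/logstash/ipfix.json"
    codec => json_lines
  }
}
```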

Logstash can receive appflow (or ipfix/netflow) data, parse it, and output information summarising each appflow record. This is great and works (mostly). But when one starts looking at the output, a fly appears in the ointment: requests and responses are logged in separate records. This means that if you're looking to replace your existing logs like for like, you could have a problem on your hands. Let's take a look at some of the data. Here is the output in Logstash's standard JSON format for an HTTP request:

{
    "@version":"1",
    "host":"10.0.50.4",
    "netflow":{
        "destinationIPv4Address":"10.0.50.5",
        "netscalerHttpReqUserAgent":"Mozilla/5.0",
        "destinationTransportPort":443,
        "netscalerHttpReqUrl":"/some/random/path?with=parameters&other=stuff",
        "sourceIPv4Address":"123.234.123.234",
        "netscalerHttpReqMethod":"GET",
        "netscalerHttpReqHost":"internalhost.unsafehex.com",
        "netscalerHttpReqReferer":"",
        "sourceTransportPort":1337,
        "netscalerHttpDomainName":"netscalerdemo.unsafehex.com",
        "netscalerTransactionId":2459675,
        "netscalerHttpReqXForwardedFor":""
    },
    "@timestamp":"2017-11-08T14:59:58.000Z",
    "type":"ipfix"
}

Plenty of useful information, but as you can see, nothing to indicate what response this request got. This is because the Netscaler emits the request and response as separate appflow records, and Logstash does a literal translation of each binary record into a separate JSON object. Now let's have a look at the corresponding response record:

{
    "@version":"1",
    "host":"10.0.50.4",
    "netflow":{
        "destinationIPv4Address":"123.234.123.234",
        "destinationTransportPort":1337,
        "sourceIPv4Address":"10.0.50.5",
        "netscalerHttpRspStatus":200,
        "sourceTransportPort":443,
        "netscalerHttpRspLen":35721,
        "netscalerTransactionId":2459675
    },
    "@timestamp":"2017-11-08T14:59:59.000Z",
    "type":"ipfix"
}

Fortunately for us, although the Netscaler does not provide the request and response in a single record, it does mark them as belonging to the same transaction, via the netscalerTransactionId field. This means we can combine the two records and produce the unified entry you might expect in an HTTP log.

Having discovered this, I was able to throw together a Python script which reads the JSON output of Logstash and rebuilds the separate records into a unified message. At the time the Apache Extended format was the most suited to my requirements, so the current version of my script (here) writes this out to hourly log files. Given that the data becomes a Python dict, it would be very easy to adapt this to whatever other format you are interested in.
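The core of the joining logic is simple enough to sketch. To be clear, this is an illustrative rewrite, not the actual ipfixwatch code: the field names come straight from the JSON records above, but the function names and the shape of the combined dict are my own invention.

```python
import json

# Pending halves keyed by netscalerTransactionId, each waiting for its
# counterpart to arrive on a later line of Logstash output.
pending_requests = {}
pending_responses = {}

def handle_record(line):
    """Take one line of Logstash JSON output; return a combined dict
    once both halves of a transaction have been seen, else None."""
    record = json.loads(line)
    flow = record["netflow"]
    txid = flow["netscalerTransactionId"]

    if "netscalerHttpReqMethod" in flow:        # it's a request record
        if txid in pending_responses:
            return combine(flow, pending_responses.pop(txid))
        pending_requests[txid] = flow
    elif "netscalerHttpRspStatus" in flow:      # it's a response record
        if txid in pending_requests:
            return combine(pending_requests.pop(txid), flow)
        pending_responses[txid] = flow
    return None

def combine(req, rsp):
    # Assemble Apache-style fields from the two halves of the transaction
    return {
        "client": req["sourceIPv4Address"],
        "method": req["netscalerHttpReqMethod"],
        "url": req["netscalerHttpReqUrl"],
        "host": req["netscalerHttpReqHost"],
        "status": rsp["netscalerHttpRspStatus"],
        "bytes": rsp["netscalerHttpRspLen"],
        "referer": req["netscalerHttpReqReferer"],
        "user_agent": req["netscalerHttpReqUserAgent"],
    }
```

Feeding the two example records above through this produces a single dict with the client IP, method, URL, status 200 and response length 35721, which is then trivial to format as an Apache-style log line.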

The code’s clearly a hack job so if anyone feels like turning it into good code, I’d welcome the education. In any case, happy parsing!

A few notes:

  • ipfix, and hence appflow, are UDP protocols. There is no guarantee that Logstash will capture the corresponding response for every request, or vice versa. The script maintains a dict of currently-unmatched requests and responses, with a size limit to prevent it eating all of your memory. While Logstash is operating normally I have not seen any unpaired requests/responses, but it is theoretically possible.
  • If the script cannot match a request with a response, the unmatched record will stay in memory until the script is stopped (with SIGINT or a keyboard interrupt) or the table size limit is reached. At that point, unpaired requests will be written to the output with a response code of 418; unpaired responses will be discarded.
  • It won’t auto-create the output directory at this point. Sorry, forgot that part, I made them manually and didn’t think of it.
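For the curious, the size-limit behaviour described in those notes could look something like the sketch below. This is illustrative only; MAX_PENDING, the function name, and the writer callback are made-up names, not the repo's actual code.

```python
# Sketch: cap the pending-request table; when it overflows, flush the
# oldest unmatched request with a synthetic 418 status. Relies on dicts
# preserving insertion order (Python 3.7+).
MAX_PENDING = 10000

def add_pending_request(pending_requests, txid, flow, writer):
    """Store an unmatched request; evict and write out the oldest one
    with a 418 response code if the table has grown too large."""
    pending_requests[txid] = flow
    if len(pending_requests) > MAX_PENDING:
        oldest_txid = next(iter(pending_requests))
        evicted = pending_requests.pop(oldest_txid)
        evicted["netscalerHttpRspStatus"] = 418  # mark as never answered
        writer(evicted)
```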

syslog-ng flat file collection: where did my program go?!

Using syslog-ng to forward logs is pretty nice: there's plenty of documentation, and the configuration is relatively easy to understand compared to other stuff out there (looking at you, rsyslog). But that doesn't mean everything is completely obvious. If you search for information on how to read a text file log with syslog-ng, you might come up with something like this:

source s_squid3access {
    file("/var/log/squid3/access.log" follow-freq(1)); };

Which checks the file /var/log/squid3/access.log for new entries every second. However, if you send this on as is, you might end up with a message similar to the one below arriving at your syslog destination (note that I've modified my squid instance to log in the Apache Combined log format):

<13>1 2017-11-07T19:07:44+00:00 myproxy 192.168.1.4 - - [meta sequenceId="84"] - - [07/Nov/2017:19:07:43 +0000] "CONNECT www.netflix.com:443 HTTP/1.1" 200 12237 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" TCP_MISS:HIER_DIRECT

which corresponds to the following line in the log file:

192.168.1.4 - - [07/Nov/2017:19:09:03 +0000] "CONNECT www.netflix.com:443 HTTP/1.1" 200 5101 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" TCP_MISS:HIER_DIRECT

Note the position of the IP address in the syslog message – it is in the syslog header section, in the position of the syslog program field. If you want to collect and parse this information in a SIEM, for example, this will cause you quite the headache, as the message it attempts to interpret will begin “- - [07/Nov/2017:19:07:43 +0000]” rather than “192.168.1.4 - - [07/Nov/2017:19:07:43 +0000]”. It will no longer match the format of the data the collector is expecting (probably resulting in the message not getting parsed at all), and even if you tried fixing that, the missing data is a fairly crucial bit of info – there’s not much use knowing that someone visited a site if you can’t find out which user or host it was.

The answer is given in the syslog-ng documentation, although it is not immediately obvious. The section on collecting messages from text files hints that if the message does not have a syslog header, it may behave in an unusual way, but it does not explain in detail what will happen; for that you must look at the options for the file() method. In the description of the flags() option “no-parse” it notes that by default, syslog-ng assumes the first value in a log line is the syslog program. If you set this flag, your originating IP will again be part of the message section, and your SIEM/parsing will be happy again.

You can also set the program_override() option so that the program field is populated, as it is useful in certain SIEM/collection tools to have this info. Now your config file might look a bit like this:

source s_squid3access {
    file("/var/log/squid3/access.log" follow-freq(1) flags(no-parse) program_override("squid3")); };

and all should be well. Happy logging!

ELK and Plex’s DNS barf

Recently I’ve been playing around with ELK – Elasticsearch, Logstash and Kibana, a set of tools for collecting, indexing, searching and representing data. It’s particularly good for handling logs, and if there’s anything I am all about, it’s logs. It’s pretty fun, and once you get over the initial terminology hurdles, surprisingly easy (at least, for my modest home lab’s requirements). I don’t expect to be writing a guide about how to set it up since there are plenty of decent ones out there already, but thought people might like a nice little example of the kind of thing you can do once it’s running.

One thing that it can do very readily is flag up unusual events on your network – providing you are collecting the relevant logs of course. For instance, so far I have it receiving logs from all of my *nix boxes, firewall, IDS and VPN. I would give it proxy logs too, if I could persuade squid not to keep breaking everything, but that’s another story. Anyway, in passing I noticed that the number of events received had spiked massively over a one hour period last night.

Logspike

Well that got my attention! I figured it was probably something to do with the fact that I had been migrating my media server, but was a little surprised to see just how big a jump it was. Fortunately one thing that Elasticsearch (the index/log storage backend) and Kibana (the shiny web interface and visualisation tool) make absurdly easy is turning a big number like this into something readily understandable.

First I needed to understand how the logs in this time period differed from the rest of the day, which as you can see has a pretty constant level of about 8,000 events per hour. This is as simple as adding what Kibana terms a ‘sub-bucket’, which allows you to split the count up based on various criteria. In my case I selected one of the fields that I have indexed, to show how much of that volume is coming from which program.

Logspike breakdown

From this I see that the traffic through the firewall (filterlog) has jumped by two or three times – somewhat interesting, but the real eye-opener is that nice mauve colour. That would be bind, my DNS service, and quick back-of-envelope maths says that’s a 20-fold increase. What’s the deal, yo?

Yet again, you can make this nice and obvious with Kibana. If I filter down to just the logs coming from bind, I can then start breaking it down by one of the fields I’m extracting from the bind logs, the queried hostname.
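Behind the scenes, that Kibana view boils down to an Elasticsearch query along these rough lines. The field names (program, hostname) are whatever your Logstash filters happen to extract, so treat this as a sketch rather than a copy-paste query:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term":  { "program": "bind" } },
        { "range": { "@timestamp": { "gte": "now-24h" } } }
      ]
    }
  },
  "aggs": {
    "per_hour": {
      "date_histogram": { "field": "@timestamp", "interval": "1h" },
      "aggs": {
        "top_hostnames": {
          "terms": { "field": "hostname", "size": 5 }
        }
      }
    }
  }
}
```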

Analyzed

And here we have two runaway culprits, lyricfind.plex.tv and lastfm-z.plexapp.com. That would be because when I stood up a new Plex server I hadn’t moved my old library data over. Plex therefore decided it needed to look up every item in my library, with DNS queries going out for each file.

This is a very small example and I’ve barely scraped the surface of what Kibana or the other tools are capable of, but hopefully it shows you how you can rapidly dig in to your data to get the story behind something that happened, and see it in a way that doesn’t need Cypher to tell you which symbols are the blondes, brunettes and redheads.