Collecting Netscaler appflow reverse proxy logs

TL;DR: Python script to combine Netscaler reverse proxy HTTP request and response objects and output the result as an Apache-style log. The GitHub repo is here: https://github.com/scherma/ipfixwatch

So the time came when your organisation decided it needed a new and shiny reverse proxy, hopefully before the current bucket of bolts went to the great big datacentre in the sky. It’ll be fun, they said. I told them we needed to talk about the definition of fun. They said they’d schedule a meeting.

This is not the right place (nor am I really qualified) to give an in-depth explanation of appflow; the short version is that it is a binary protocol for logging application-level information in a form that is more flexible than syslog. It has the benefit of a well-defined structure, which is a plus from a log collection perspective, but being binary means parsing it is tricky and requires specialised tools.

So how can you get the juicy details out of that content? Easier said than done. Citrix will happily sell you an appliance; I leave it to the reader to imagine how many arms and legs the price might be. Ditto Splunk. Then there are the free/OSS options, which is where we arrive at Logstash.

Logstash can receive appflow (or ipfix/netflow) data, parse it, and output information summarising each appflow record. This is great and works (mostly). But when one starts looking at the output, a fly appears in the ointment: requests and responses are logged in separate records. This means that if you’re looking to replace your existing logs like for like, you could have a problem on your hands. Let’s take a look at some of the data. Here is the output in Logstash’s standard JSON format for an HTTP request:

{
    "@version":"1",
    "host":"10.0.50.4",
    "netflow":{
        "destinationIPv4Address":"10.0.50.5",
        "netscalerHttpReqUserAgent":"Mozilla/5.0",
        "destinationTransportPort":443,
        "netscalerHttpReqUrl":"/some/random/path?with=parameters&other=stuff",
        "sourceIPv4Address":"123.234.123.234",
        "netscalerHttpReqMethod":"GET",
        "netscalerHttpReqHost":"internalhost.unsafehex.com",
        "netscalerHttpReqReferer":"",
        "sourceTransportPort":1337,
        "netscalerHttpDomainName":"netscalerdemo.unsafehex.com",
        "netscalerTransactionId":2459675,
        "netscalerHttpReqXForwardedFor":""
    },
    "@timestamp":"2017-11-08T14:59:58.000Z",
    "type":"ipfix"
}

Plenty of useful information, but as you can see, nothing to indicate what response this request got. This is because the raw appflow packets from the Netscaler output the request and response as separate records, and Logstash is doing a literal translation of each binary record into a separate JSON object. Now let’s have a look at the corresponding response record:

{
    "@version":"1",
    "host":"10.0.50.4",
    "netflow":{
        "destinationIPv4Address":"123.234.123.234",
        "destinationTransportPort":1337,
        "sourceIPv4Address":"10.0.50.5",
        "netscalerHttpRspStatus":200,
        "sourceTransportPort":443,
        "netscalerHttpRspLen":35721,
        "netscalerTransactionId":2459675
    },
    "@timestamp":"2017-11-08T14:59:59.000Z",
    "type":"ipfix"
}

Fortunately for us, although the Netscaler does not provide the request and response in a single record, it does mark them as belonging to the same transaction via the netscalerTransactionId field. This means we can combine the two records and produce the kind of entry you might expect in an HTTP log.
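The core of the matching is straightforward: buffer each record by transaction ID until its partner arrives, then emit a combined line. Here is a minimal sketch of the idea (field names are taken from the Logstash output above; the file path in the usage comment is hypothetical, and the real script handles more fields and edge cases):

import json

pending = {}  # unmatched half-transactions, keyed by netscalerTransactionId

def apache_line(req, rsp):
    # combine a request record and a response record into one log line;
    # the HTTP version is not in the appflow record, so it is hardcoded,
    # and the timestamp is used as-is rather than reformatted to CLF style
    r = req["netflow"]
    return '{ip} - - [{ts}] "{method} {url} HTTP/1.1" {status} {size} "{ref}" "{ua}"'.format(
        ip=r["sourceIPv4Address"],
        ts=req["@timestamp"],
        method=r["netscalerHttpReqMethod"],
        url=r["netscalerHttpReqUrl"],
        status=rsp["netflow"]["netscalerHttpRspStatus"],
        size=rsp["netflow"]["netscalerHttpRspLen"],
        ref=r["netscalerHttpReqReferer"] or "-",
        ua=r["netscalerHttpReqUserAgent"] or "-")

def handle(record):
    nf = record["netflow"]
    txid = nf["netscalerTransactionId"]
    partner = pending.pop(txid, None)
    if partner is None:
        pending[txid] = record  # first half seen; wait for the other
        return None
    # requests carry netscalerHttpReqUrl, responses carry netscalerHttpRspStatus
    req, rsp = (record, partner) if "netscalerHttpReqUrl" in nf else (partner, record)
    return apache_line(req, rsp)

# feed it Logstash's JSON-lines output, e.g.:
# for line in open("/var/log/logstash/netscaler.json"):
#     combined = handle(json.loads(line))
#     if combined:
#         print(combined)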

Having discovered this, I was able to throw together a Python script which reads the JSON output of Logstash and rebuilds the separate records into a unified message. At the time, the Apache Extended format was the best fit for my requirements, so the current version of my script (here) writes it out to hourly log files. Given that the data becomes a Python dict, it would be very easy to adapt this to whatever other format you are interested in.

The code’s clearly a hack job, so if anyone feels like turning it into good code, I’d welcome the education. In any case, happy parsing!

A few notes:

  • ipfix, and hence appflow, are UDP protocols. There is no guarantee that Logstash will capture the corresponding response for every request, or vice versa. The script maintains a dict of currently unmatched requests and responses, with a size limit to prevent it from eating all of your memory. While Logstash is operating normally I have not seen any unpaired requests or responses, but it is theoretically possible.
  • If the script cannot match a request with a response, the record stays in memory until the script is stopped with SIGINT or a keyboard interrupt, or until the table size limit is reached. At that point, unpaired requests are written to the output with a response code of 418; unpaired responses are discarded. (A sketch of this eviction logic follows the list.)
  • It won’t auto-create the output directory at this point. Sorry about that – I made mine manually and didn’t think of it.
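For illustration, here is roughly what that bounded table might look like; the cap and eviction policy below are made up for the example (reusing apache_line from the sketch above):

from collections import OrderedDict

MAX_PENDING = 10000  # illustrative cap, not the script's actual value

pending = OrderedDict()

def add_pending(txid, record, outfile):
    pending[txid] = record
    if len(pending) > MAX_PENDING:
        _, oldest = pending.popitem(last=False)  # evict the oldest entry
        if "netscalerHttpReqUrl" in oldest["netflow"]:
            # unpaired request: write it out with a placeholder 418 status
            stub = {"netflow": {"netscalerHttpRspStatus": 418, "netscalerHttpRspLen": 0}}
            outfile.write(apache_line(oldest, stub) + "\n")
        # unpaired responses are simply dropped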

syslog-ng flat file collection: where did my program go?!

Using syslog-ng to forward logs is pretty nice: there’s plenty of documentation, and the configuration is relatively easy to understand compared to other stuff out there (looking at you, rsyslog). But that doesn’t mean everything is completely obvious. If you search for information on how to read a text file log with syslog-ng, you might come up with something like this:

source s_squid3access {
    file("/var/log/squid3/access.log" follow-freq(1)); };

This checks the file /var/log/squid3/access.log for new entries every second. However, if you simply send this on as is, you might end up with a message similar to the one below arriving at your syslog destination (note that I’ve modified my squid instance to log in the Apache Combined log format):

<13>1 2017-11-07T19:07:44+00:00 myproxy 192.168.1.4 - - [meta sequenceId="84"] - - [07/Nov/2017:19:07:43 +0000] "CONNECT www.netflix.com:443 HTTP/1.1" 200 12237 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" TCP_MISS:HIER_DIRECT

which corresponds to the following line in the log file:

192.168.1.4 - - [07/Nov/2017:19:09:03 +0000] "CONNECT www.netflix.com:443 HTTP/1.1" 200 5101 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36" TCP_MISS:HIER_DIRECT

Note the position of the IP address in the syslog message – it is in the syslog header section, in the position of the syslog program field. If you want to collect and parse this information in a SIEM, for example, this will cause you quite the headache, as the message it attempts to interpret will begin "- - [07/Nov/2017:19:07:43 +0000]" rather than "192.168.1.4 - - [07/Nov/2017:19:07:43 +0000]". It will no longer match the format the collector is expecting (probably resulting in the message not getting parsed at all), and even if you tried fixing that, the missing data is a fairly crucial bit of info – there’s not much use knowing that someone visited a site if you can’t find out which user or host it was.

The answer is given in the syslog-ng documentation, although it is not immediately obvious. The section on collecting messages from text files hints that if the message does not have a syslog header, it may behave in an unusual way, but it does not explain in detail what will happen; for that you must look at the options of the file() source driver. The description of the "no-parse" flag for the flags() option notes that, by default, syslog-ng assumes the first value in a log line is the syslog program field. Set this flag and your originating IP will once again be part of the message section, and your SIEM/parser will be happy again.

You can also set the program_override() option so that the program field is populated, as it is useful in certain SIEM/collection tools to have this info. Now your config file might look a bit like this:

source s_squid3access {
    file("/var/log/squid3/access.log" follow-freq(1) flags(no-parse) program_override("squid3")); };

and all should be well. Happy logging!

Bonding rituals

I may have been quiet on here, but not because I haven’t been doing lots of fun nerdy stuff. Unfortunately, a fair amount of it can’t be blogged about, hence the lack of new material. But a problem came up the other day that was pretty fun and interesting (read: a royal pain in the ass), and maybe some folks out there are scratching their heads over it and will appreciate there being something in the depths of t’interwebs to explain it.

Bonding is a pretty damn useful thing, especially to us NSM folks. Take a 1×1 tap and run the output cables up to a nice bit of tin running $distro_of_choice, a few minutes of tweaking interface config files, and hey presto! a bonded interface with both directions of traffic for Snort/Suricata/Bro/whatever to listen to, and your kit is safely out of line where the sysadmins can’t blame you when something breaks and takes out the internet (they’ll probably still try though).

So far, so standard. The other day I needed to do this in a VM – no problem, I thought. VMware will let you pass traffic through to the guest; you just need to put the vSwitch into promiscuous mode (done via the vSwitch Security Policy), because the interface in your guest sniffer won’t have an IP assigned.

With each output of the tap assigned its own vSwitch, each attached to an individual interface on the guest, I created a bond interface to combine the two. In the very best tradition of “here’s one I let someone else make earlier and plagiarised shamelessly”, you can read a good guide here. One notable exception: use mode 0 (round-robin) and not active/passive – we want to combine the outputs, not have the second interface only work if the first fails.
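For reference, a minimal sketch of the sort of interface config I mean, assuming a Debian-style system with the ifenslave package (interface names here are examples; adjust to taste):

# /etc/network/interfaces
auto bond0
iface bond0 inet manual
    bond-mode 0            # round-robin: merge both tap outputs
    bond-slaves eth1 eth2
    bond-miimon 100
    up ip link set bond0 promisc on    # no IP; sniffing only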

So, having done that, I brought up the bond0 interface and… weirdness happened. I was only seeing one side of the traffic: tcpdump on the bond0 interface showed only the responses, not the requests. The slaved interfaces told a similar story: one had traffic (inbound) and the other was silent. Odd. Next check: was the ESXi host seeing the traffic but not passing it through? Checking this requires pktcap-uw rather than VMware’s implementation of tcpdump, which will not let you look at traffic on individual vSwitches. This showed the traffic was indeed present.

Proper head-scratching time now. The interface settings were all correct, and the problem persisted through restarts of the interfaces, the networking service, even the OS. The next step was bringing up each interface manually, one at a time; now it got even weirder. eth1 showed responses as expected. eth2 showed requests – awesome! bond0 showed… just the responses. I checked eth2 and it was now silent as the grave. Curses! This didn’t change when bond0 was shut down again; outbound traffic would only reappear when eth2 was brought up without bond0 running, and enabling bond0 killed it again every time. What the hell?

Having pretty much run out of ideas, I resorted to a bit of experimentation, starting with the ESXi config settings. This was clearly a stroke of genius, because upon setting ‘MAC address changes’ to Accept, it instantly started working. Why would this be?

When bonding is enabled, the bond0 interface defaults to starting with the MAC of the first interface to join the bond. In round-robin mode, it then shuttles that MAC address around each slave interface in order to receive frames. VMware’s (sensible) default is to reject changes like this, and as a result it stops transmitting traffic to the interface it sees as having violated the restriction until that interface is bounced. Thus the first slave to join receives traffic because its MAC stays the same, and the second stops being sent data because the vSwitch has seen its MAC change. Permitting MAC address changes on the vSwitch means the MAC can be reassigned as necessary.

TL;DR: If you want to use a bonded interface in an ESXi guest like this, you must set ‘MAC address changes’ to Accept on the vSwitches the slave interfaces connect to.

ELK and Plex’s DNS barf

Recently I’ve been playing around with ELK – Elasticsearch, Logstash and Kibana, a set of tools for collecting, indexing, searching and representing data. It’s particularly good for handling logs, and if there’s anything I am all about, it’s logs. It’s pretty fun, and once you get over the initial terminology hurdles, surprisingly easy (at least, for my modest home lab’s requirements). I don’t expect to be writing a guide about how to set it up since there are plenty of decent ones out there already, but thought people might like a nice little example of the kind of thing you can do once it’s running.

One thing that it can do very readily is flag up unusual events on your network – provided you are collecting the relevant logs, of course. For instance, so far I have it receiving logs from all of my *nix boxes, firewall, IDS and VPN. I would give it proxy logs too, if I could persuade squid not to keep breaking everything, but that’s another story. Anyway, in passing I noticed that the number of events received had spiked massively over a one-hour period last night.

[Image: Logspike]

Well that got my attention! I figured it was probably something to do with the fact that I had been migrating my media server, but was a little surprised to see just how big a jump it was. Fortunately one thing that Elasticsearch (the index/log storage backend) and Kibana (the shiny web interface and visualisation tool) make absurdly easy is turning a big number like this into something readily understandable.

First I needed to understand how the logs in this time period differed from the rest of the day, which as you can see has a pretty constant level of about 8,000 events per hour. This is as simple as adding what Kibana terms a ‘sub_bucket’, which allows you to split the count up based on various criteria. In my case I selected one of the fields that I have indexed, to show how much of that volume was coming from which program.

[Image: Logspike breakdown]

From this I see that the traffic through the firewall (filterlog) has jumped to two or three times its usual level – somewhat interesting, but the real eye-opener is that nice mauve colour. That would be bind, my DNS service, and some quick back-of-the-envelope maths says that’s a 20-fold increase. What’s the deal, yo?

Yet again, you can make this nice and obvious with Kibana. If I filter down to just the logs coming from bind, I can then start breaking them down by one of the fields I’m extracting from the bind logs: the queried hostname.

[Image: Analyzed]

And here we have two runaway culprits, lyricfind.plex.tv and lastfm-z.plexapp.com. That would be because when I stood up a new Plex server I hadn’t moved my old library data over. Plex therefore decided it needed to look up every item in my library, with DNS queries going out for each file.

This is a very small example and I’ve barely scraped the surface of what Kibana or the other tools are capable of, but hopefully it shows how you can rapidly dig into your data to get the story behind something that happened, and see it in a way that doesn’t need Cypher to tell you which symbols are the blondes, brunettes and redheads.

Angler knows when you’re fakin’ it

A brief introduction to Angler for those who are new to it: Angler is a framework of malicious code that criminals can purchase/rent and deploy on web servers they control – usually ones they have compromised rather than rented for themselves. It typically comprises an initial component injected into a site you are likely to visit, which redirects you to a second location, known as the exploit kit landing page. The landing page has code which tests what features and capabilities your browser has, in order to identify whether you are vulnerable to any of the exploits the criminals have access to; this is known as “browser enumeration”. This page often contains the exploits themselves, but they can sometimes be called from another location. Finally, the exploit will contain commands that attempt to download and run a program, which could be anything – a remote access tool, ransomware, you name it. If you want to know a bit more about what comes out of the business end of an Angler chain, there are many examples on places like Malware Don’t Need Coffee and Cisco Talos.

Angler has developed steadily, adding new features in its efforts to more effectively select vulnerable users, better evade detection, and increase the difficulty of analysing it. The creators have added polymorphism at all stages of the code so that it changes every time the page loads, making it much more difficult for antivirus and intrusion detection systems to recognise. There are up to three layers of obfuscation in the javascript. The most recent trick I have seen is a little call that doesn’t seem to do much in terms of sorting potential victims from the crowd, but made analysing it a bit more of a challenge.

[Image 1: raw page]

The above image will be familiar to anyone who has investigated Angler: dozens of variables with completely random names, and an eval to turn them into some valid code. It looks like this one has been inserted into a header PHP file of some sort, since it appears even before the opening HTML tags. This particular group becomes:

eval(String.fromCharCode.apply(null,document.getElementById("xndotnjmjwlj").innerHTML.split(" ")))

This gets the text from a hidden div, turns it from a series of character codes back into text, then evaluates it. The hidden-div part is a bit new but nothing particularly special (previously the text to decode was placed directly in a string variable).
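If you want to reproduce that first stage outside the browser, the decode is trivial in Python (the div contents below are a short stand-in, since the real payload only appears in the screenshots):

# the hidden div holds space-separated decimal character codes
div_text = "118 97 114 32 120 61 49 59"  # stand-in; decodes to "var x=1;"
decoded = "".join(chr(int(code)) for code in div_text.split(" "))
print(decoded)

Run against the real div contents, that text becomes the following: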

[Image 2: Angler decoded div 1]

On line 14 it gets the code from another div and feeds it through a loop which does some maths on the characters and constructs another string. That doesn’t sound too complicated, but try as I might, I could not get this code to do its stuff – not until I started stepping through it and watching what values were assigned to the variables at each step. As it turns out, the problem is on the very first line.

kaypkiafgibrwvp = (+[window.sidebar])

The lines that follow it create a loop with this value as the start, and test whether the strings “rv:11” and “MSIE” are present in the user agent – a fairly standard way of detecting what browser is coming to call. Why was this code not running properly? Presumably, if it’s a for loop, kaypkiafgibrwvp should start as 0, so let’s check that bit – in fact you can try this out yourself. Bring up the developer console in your browser (Firefox: ctrl+shift+k; Chrome: ctrl+shift+i; IE: F12) and paste the line above into the console. In IE, Chrome and Safari, you’ll get “0”. But in Firefox and some javascript debugging tools, you’ll get “NaN”.

That “NaN” breaks the code – it will never send your browser anywhere. So in order to get this code to produce something useful, I needed to replace at least that value with 0 rather than the output of the expression. The other value set here from browser variables, othddelxtnfae, is used as a modulus a bit further down, so it’s necessary to make sure that one’s correct too; a little tinkering showed that the correct value for this variable was 2 – indicating the UA must contain “rv:11” or “MSIE10”, but not both.

[Image 3: final redirect, formatted]

And there you have it, an iFrame hidden off the top of the page, going to the next part of Angler.

Given that the call to window.sidebar only eliminates one of the major browsers from contention, it seemed unlikely to me that this was the reason for including it – an anti-analysis technique seemed more likely. A bit of searching confirms it: some popular analysis tools depend on open-source code from the Firefox javascript engine… and Trustwave basically wrote everything in this post two days ago. Lesson learned: analysis first, beauty sleep later.

Beginning your journey into javascript deobfuscation

Obfuscation is an interesting technique used by malware authors to conceal what their code is doing from anyone looking at it. Finding out what it does can be tricky, especially if you’re not sure how to start. So, when a wave of WordPress compromises broke out a few months back and Zscaler wrote a nice overview of what the compromised sites were trying to do, I thought it was an excellent opportunity to go through the javascript they quoted and show exactly how to unwrap it, especially given that it’s a relatively simple example.

The first thing you need to do is to find the page with the obfuscated javascript on it. I can’t tell you exactly how you should do this; it will depend on how you are encountering the issue – it could be that it’s your WordPress install that has been owned, or it could be that you’re seeing alerts from your intrusion detection or antivirus systems that you need to investigate – any number of reasons. Once you have identified the source of the alert (proceed very carefully from here on in!) you need to get the source code of the page that you’re investigating. Use something like curl or wget in Linux/OSX for safety. I would never suggest pointing a browser at it unless you really know what you’re doing.

Obfuscated code tends to stand out to human eyes because it creates patterns that are very distinct from the code on the rest of the page. There are different styles, so this example won’t teach you to spot all of them but believe me when I say that once you’ve seen a few you will be able to rapidly scroll through a page and spot the blocks of evil code. In my example, the evil code looks like this:

[Image 1: Raw JS]

Distinguishing features include:

  • large wall of text where code above and below is relatively short and spaced out
  • random looking letters that make no sense
  • lots of little groups of numbers
  • a “document.write” command
  • not seen here, but an “eval” command is often associated with malicious javascript – worth mentioning because this one is relatively rare in benign code whereas “document.write” is common.

None of these is evil in itself but they’re common things to see in evil javascript.

Next, break out the stuff between the script tags into a separate file to play around with it. I’m going to do my decoding in Python because that’s what I’m used to, but you can use pretty much any language for this. Find statements separated by semicolons and split them onto new lines so the code is easier to follow (for large, complicated blocks you might want a tool like JSPretty, but this one is quite simple).
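A quick-and-dirty way to do that split (file names here are hypothetical, and a proper beautifier also handles semicolons inside string literals, which this does not):

# naive reflow: put each semicolon-terminated statement on its own line
with open("evil.js") as f:
    src = f.read()
with open("evil_split.js", "w") as f:
    f.write(src.replace(";", ";\n"))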

[Image 2: JS line by line]

Once you’ve done that you can start to get an idea of what is in there. In this example, there are variables ‘a’, ‘b’, ‘c’, ‘clen’, then a for loop, and finally an unescape statement. Breaking down the for loop further, it does the following:

  • iterates through every character in ‘a’
  • for each character, gets its numeric ASCII value
  • bitwise XORs that value with 2
  • converts the result back into an ASCII character
  • appends the new character to a string

This is something that converts very readily to Python, like so:

[Image 3: JS to Python]
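In text form, the conversion amounts to this (the value of ‘a’ is a short stand-in here, since the real payload only appears in the screenshot):

a = "'4:'4;"  # stand-in for the obfuscated string from the page
b = ""
for ch in a:       # iterate through every character in 'a'
    c = ord(ch)    # numeric ASCII value of the character
    c = c ^ 2      # bitwise XOR with 2
    b += chr(c)    # back to a character, appended to the result

print(b)  # -> %68%69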

Followed by the output:

[Image 4: decode stage 1 output]

This gives us a URL-encoded string (character codes preceded by % symbols). Python has a built-in library, urllib, which can decode this for us.

[Image 5: URL encoding handling]
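In Python 3 the relevant function lives in urllib.parse (in Python 2 it was urllib.unquote); applied to the stand-in string from above:

from urllib.parse import unquote

print(unquote("%68%69"))  # -> "hi"; the real string decodes to the iframe HTML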

Finally, we get a human-readable version of the content.

[Image 6: final output]

The “document.write” command would have written the decoded content to the browser, which, as you can now see, would have created a 0x0-sized iframe calling a resource from teaserguide[.]com. If you haven’t already read Zscaler’s article showing what happens after that, I highly recommend that you do so now.

This was a fairly basic example; some malicious code has obfuscation three or more levels deep. Others use weird javascript tricks to produce code that is mind-bendingly hard to pick apart, such as the JJEncode technique (warning: you might need a stiff drink and a lie down after looking at that one – I did, and I take my hat off to whoever wrote the script to reverse it). I hope this was helpful in getting you started.