Tag Archives: forensics

Mounting image with python-guestfs

Automating a sandbox: Evidence Collection

20,000 Leagues Under The Sand: Part 5

read part 4

You may have a tricked-out sandbox that logs host activity, does packet capture and IDS, and will make you a slice of toast, but none of the bells and whistles will do you any good without collecting the information and putting it in front of your eyes. The techniques required will test your knowledge of network and file system forensics, as well as your skill with code. Let’s start with an easy one.

Suricata logs

If you have followed the suggestions made earlier in this series, Suricata will be writing events to files in /var/log/suricata/ in JSON form, one object per line. This lends itself to ease of use; pretty much any language will have a good JSON parsing library. All you will need to do is filter for entries based on the timestamp being within the period you were running your malware sample.

Be aware that the Suricata log does not get truncated unless you have specified. If you read and filter the log using  the simplest method (line-by-line read from the start, parsing each event then filtering), this will eventually become very slow. You should consider rotating the file, either yourself or using Suricata’s built in rotation, and make sure that your parsing and filtering takes account of this rotation.

Packet capture

As mentioned in the post discussing networking, you can either create a per-run packet capture as part of your code (assuming your language has the appropriate libraries), or a systemwide one which you can then extract portions of.

If you only ever plan to have one guest VM sandboxing malware at a time, the per-run capture should be fine and relatively simple. If you are slightly nuts ambitious like me and want to design for the possibility of several in parallel, a systemwide capture would be more suitable. Again, depending on the way you have organised capture, you should make sure your code accounts for the rotation of the pcaps.

Host activity/event logs

Early on in this series I waxed lyrical about the advantages of Sysmon. I am not going to contradict any of that here, but collecting its output is not as simple as you might think. Windows event logs get written to EVTX files, but not necessarily immediately. Therefore although an event may be generated, its presence in the EVTX file is not guaranteed. Under testing I have found that not even a shutdown is a guarantee of the events being written to the file. The only method I have found to be 100% reliable is to query the Windows Event Log API¹. Therefore, to collect Sysmon logs in a reliable fashion, you need to be able to use the Windows API.

I am aware of two methods for doing this. The first is to write a program which queries the API, and run that in your sandbox. You can then write the data to a file, or send it out of the sandbox immediately. To send it out of the sandbox you could have a service on the host listening on the virtual network interface, such as an FTP or HTTP server.

The second method would be to use Windows Event Forwarding. This is a tremendously useful technique for blue teamers and comes highly recommended by Microsoft staff. It does, however, require you to have a second Windows host on which to collect the events, which may not be an option for you. Most documentation you will find on this will refer to setting it up in an Active Directory environment, however it is also capable of running in workgroup-only systems.

¹ I strongly suspect that the events are being written to temporary files but at the time of writing this is little better than a hunch. I’ll chase down my suspicion at some point and if it’s right there’ll be a new post about my findings.

Filesystem collection

Getting events is a huge win, and might well be all you need; but why not go one step further? Malware drops and modifies files and writes to the registry, and if you could get your hands on that evidence, it could be invaluable. Another of the reasons for choosing LibVirt/QEMU as my hypervisor was the availability of python bindings for LibGuestFS, allowing me to directly mount and read QEMU disk images. However, you should still be fine with other hypervisors: VMWare also provides a utility for this, and VirtualBox can apparently be mounted as a… network block device? Please can I have some of whatever Oracle have been smoking, because it’s clearly the good shit.

Detailed coverage of the options for filesystem evidence collection could run to several blog posts of its own, so I won’t go into everything here. However, I will describe three approaches, each with their own advantages and drawbacks.

  • Diffing from a known-good state

The slowest, but most comprehensive method. Requires building a comprehensive catalogue of the hashes of all files on the disk prior to malware execution, and another one after, and identifying the differences. Not recommended unless you are truly desperate to roast your CPU with hash calculations.

  • Metatadata-based selection

Since you know the lower and upper time bounds for possible activity by the malicious sample, you can walk the directory tree and select only items which have been changed or created in that period. Relatively quick, but some malware is known to modify the MFT record with false created/modified values, known as ‘timestomping’.

  • Key items and locations

The majority of malware activity is limited to just a few locations. Taking a copy of the user directory, and SYSTEM and SOFTWARE registry hives, plus a couple of other items, would capture the traces left by most samples you might ever run.

There is a final option for collection of file-based evidence, and that is to use a host agent which collects the files as the malware writes them. The above methods would fail to capture a file that has been created and subsequently removed. In an earlier post I mentioned that if you were so inclined, you could write code which would monitor API calls yourself. Doing this would also give you the ability to capture temporary files in addition to the ones which are left behind.

Hopefully you now have an idea of the approaches you can use to gather useful information from the execution of a malware sample without the need for manual intervention. The final post in my series considers anti-analysis techniques and countering sandbox evasion.

Host activity monitoring

20,000 Leagues Under The Sand – Part 2

read part 1

As a newbie sandboxer, the biggest obstacle for me was finding a way of getting in-depth information on what actions were being performed by malware I wanted to test. In particular, I wanted to be able to drop some samples, go away and make lunch, then come back and be looking at some results. That meant stepping through it in a debugger was out, or at least a lesson for another day. You’ve probably already seen that I ended up using Sysmon, but let’s have a look at the alternatives for a moment.

Built in Windows logging

Filesystem forensics

  • The files in C:\Windows\Prefetch\ can show if executables were run
  • The AppCompatCache registry key and AmCache.hve hive contain more detailed information on program execution, though neither logs individual execution instances or command line options
  • You can diff the filesystem – have a clean copy, either of the Master File Table or of the entire structure – and compare to see what’s changed; this is a fairly intensive operation, especially if you intend to see if a known good file has been replaced with a malicious version
  • There are tools for parsing registry hives so identifying new/modified keys is possible

Creating your own API call logging

  • If you’re a good enough programmer to write code that logs API calls, this is the gold standard. I am not (yet) up to this. It is possible to monitor for most of the interesting events such as process and file creation, registry modification etc. using filter drivers. If you want to go a step further and monitor (or even intercept and change) system calls, you need to be looking at DLL injection. This is the method used by Cuckoo sandbox, among many others.

Building monitoring in to the virtualisation

  • Technically this is all just code simulating hardware running other code. If you’re smart enough to modify a hypervisor so that it can recognise and log API calls within its guests, go for it. Please excuse me for thinking you’re a bit mad though!

Options #1, #2 and #4 hold an additional advantage of being difficult or impossible for sandbox evasion techniques to pick up on.

And then we get to Sysmon, which is in effect a version of #3, but it has a big advantage: somebody else did all the work for us! Hooray for Mark Russinovich and Thomas Garnier. Many sandboxes do API call monitoring; sometimes it can be a little bit excessively detailed (hello Cuckoo) but as far as understanding what malware is doing goes, it’s the bee’s knees. Let’s have a look at what you can get out of it.

Sysmon ProcessCreate event output

Sysmon Process Created event

We’ll ignore for now how much my UI leaves to be desired. Here is perhaps the most commonly of interest event to you: Process Created. In this event you have a wealth of data including not only the location of the executable, launch command and parent processes, but the MD5 and SHA256 hashes of the file. You can also get the import hash here too – though I’d forgotten to turn it on for this run. You can see what ran, from where, by whom, and how it was run, in a glance.

Sysmon File Created event output

Sysmon File Created event

Next up we can log the act of creating a file; in this case a trojan makes new copy of itself which is placed in C:\Users\<username>\System\Library\mshost.exe.

Sysmon Registry Value Set event data

Sysmon Registry Value Set event

You can also monitor for interesting things happening in the registry. This is one of the primary methods by which malware achieves “persistence”, i.e. the ability to remain active on the system it infects. Here we can see a new entry being created in one of the user’s Run keys.

Sysmon Network Connection output

Sysmon Network Connection event

In a final example, Sysmon allows you to detect initiation of network connections; not only do we have the network level data of the destination IP and port captured, but the destination hostname is also identified.

In just four event types, Sysmon is able to record the malware starting, hiding itself, achieving persistence, and contacting its Command and Control server. This is the power of logging API calls. But wait – there’s more! This only scratches the surface of what Sysmon can do. It is also capable of identifying:

  • A process changing the creation time of a file
  • Process termination
  • Loading of drivers
  • Loading of additional modules in to existing processes
  • Creation of threads within other running processes
  • Raw access to the disk (as opposed to using the file system APIs)
  • Access to another process’s memory
  • Creation of alternate data streams
  • Use of named pipes (a method of communicating between processes)
  • Use of Windows Management Instrumentation

As you can see, it’s a fantastic tool which would be pretty hard to top if you decided to try doing this yourself. If you are thinking of experimenting with malware – or looking for something to help you keep a closer eye on your systems in general – I can’t recommend it enough.

In part 3 I will discuss the use of IDS and packet capture tools to get detailed information on the malware’s communication.