20,000 Leagues Under The Sand – part 6
As long as there are robbers, there are going to be cops. The bad guys know perfectly well that people will be trying to identify their malware, and have all sorts of anti-analysis tricks up their sleeves to evade detection. Malware will very often perform checks for common analysis tools and stop running if it identifies their presence. Since one of the most fundamental tools for a malware analyst is the use of a virtual machine, it is the subject of numerous and varied detection attempts in many families of malware.
In its default configuration, a virtual machine is likely to have a wide range of indicators of its true nature. For example, it is common for standard peripheral names to contain hints (or outright declarations) that they are virtual.
This is likewise the case for QEMU/KVM among others. As well as peripheral devices, the CPU vendor information may also be an indicator:
Less obvious to casual browsing but still perfectly simple for code running on the system to detect are features such as the CPUID Hypervisor Bit, MAC address, and registry key indications such as the presence of BOCHS or VirtualBox BIOS information in the registry.
These detections depend on the code of the hypervisor; in some cases they can be overcome by specifying particular configuration values, and in others they can only be solved by modifying the source code of the hypervisor and recompiling it. Fortunately for my choice of QEMU/KVM, many people have already looked at this particular problem and some have been generous enough to publish their solutions. There is also a fair amount of information out there for VirtualBox (partly because Cuckoo favours this hypervisor), and some for VMWare ESXi.
Another means of detecting an analysis environment is to collect information indicating the behaviour and use of the system. As discussed in part 4 of this series, simulating the presence of a user is an important ability to counter this evasion method. You should also consider environmental factors such as uptime (it is improbable that a user would boot a system and immediately run malware; some samples look for a minimum uptime period).
Presence of the abnormal, absence of the normal
One of the side effects of Windows being engineered to be helpful, is that it leaves behind traces of a user’s activity everywhere. In addition, people are messy. They leave crap scattered all over their desktop, fill their document folders with junk, and run all sorts of unnecessary processes. Looking for evidence of this is quite simple, and malware has been known to refuse to run if there are insufficient recent documents, or very few running processes.
Malware may also attempt to evade detection by searching for running and installed services and the presence of files linked to debuggers, analysis tools and the sandbox itself (e.g. VirtualBox Guest Additions).
Anti-analysis could be a series all of its own, and my understanding of it is still quite narrow. I strongly encourage you to research the topic yourself as there are tons of excellent articles out there written by authors with far more experience.
Although it is not specific to sandboxing, I do not feel this series would be complete without some mention of the delivery of the output. You can write the best code to manage, control, and extract data from your sandbox in the world, but it is worthless if you cannot deliver it to your users in a helpful fashion. Think about what types of data are most important (IDS alerts/process instantiation/HTTP requests?), what particular feature of that data it is that makes it useful (HTTP hostname? destination IP? Alert signature name, signature definition?) and make sure that it is clearly highlighted – but you MUST allow the user to reach the raw data.
I cannot stress this enough. Sandboxes are a wonderful tool to get the information you need as a defender (though not everyone is so enthusiastic), but they are imprecise instruments. The more summarised and filtered the information is, the higher the chance it has to lead the analyst to false conclusions.
You should look at other sandboxes out there and draw inspiration, as well as learn what you want to avoid (whether because it’s too complicated for you right now, or you just think it’s a bad way of doing things) when making one yourself. Start by looking at Cuckoo, because it’s free and open source. Take a peek at the blogs and feature sheets of the commercial offerings like VMRay, Joe Sandbox, Hybrid Analysis, and the very new and shiny any.run.
Sandboxing is a huge topic and I haven’t begun to scratch the surface with this series. However, I hope that I have done enough to introduce the major areas of concern and provide some direction for people interested in dabbling in (or diving into) this fascinating world. I didn’t realise quite how much work it would be to reach the stage that I have; getting on for 18 months in, I’m still very much a novice and my creation, whilst operational, is distinctly rough around the edges. And on all of the flat surfaces. But it works! And I had fun doing it, and learned a ton – and that’s the main thing. I hope you do too.