Tag Archives: malware

Via squid and second host running tor

Intercepting SSL with squid proxy and routing to tor

There was a time when practically all malware communicated with its command and control (C2) servers unencrypted. Those days are long gone, and now much of what we would wish to see is hidden under HTTPS.  What are we to do if we want to know what is going on within that traffic?


(for those who are unfamiliar with the HTTPS protocol and public key encryption)

The foundation of HTTPS is the Public Key Infrastructure. When traffic is to be encrypted, the destination server provides a public key with which to encrypt a message. Only that server, which is in possession of the linked private key, can decrypt the message. Public key, or asymmetric encryption, is relatively slow so instead of all traffic being secured with this, the client and server use this stage only to negotiate a new key in secret for a symmetrically encrypted connection. If we wish to be able to read the traffic, we need to obtain the symmetric encryption key.

How can this be achieved? If we are in a position to intercept the traffic, we could provide a public key that we are in control of to the client, and establish our own connection to the server. The traffic would be decrypted at our interception point with our key, and re-encrypted as we pass it to the server with the server’s key. However, because HTTPS must be able to keep information confidential, it has defences designed with this attack in mind. A key issued by a server is normally provided along with the means to verify that it is genuine, not falsified as we wish to do. The key is accompanied by a cryptographic signature from a Certificate Authority (CA), and computers and other devices using HTTPS to communicate hold a list of CAs which are considered trustworthy and authorised to verify that keys are valid. Comparing the signature against the client’s stored list enables the client to verify the authenticity of the public key.

If we wish to inspect encrypted communication, we must both intercept the secret key during the exchange, and convince the client that the certificate it receives is genuine. This post will walk through the process needed to achieve those two goals.


Starting point

I have already been running a sandbox that routes traffic via tor. It is loosely based on Sean Whalen’s Cuckoo guide, and implements the tor routing without going via privoxy, as shown below.

Initial setup

Using this method allows me to run malware without revealing the public IP of my lab environment. It has certain drawbacks; some malware will recognise that it is being routed via tor and stop functioning, however the tradeoff is acceptable to me.

squid | tor

Using squid with tor comes with some caveats that make the eventual configuration a little complicated. The version of squid I am using (3.5.23) cannot directly connect to a tor process running on the local host. In order to route via tor locally you will need a parent cache peer to which the connection can be forwarded. Privoxy is capable of serving this purpose, so initially I attempted the setup shown below:

Via privoxy

This configuration will function just fine if all you want is to proxy via squid. Unfortunately, this version of squid does not support SSL/TLS interception when a parent cache is being used. So, since we cannot use privoxy, and squid cannot route to tor on the same host, what can we do? Run tor on a different host!

Via squid and second host running tor


squid with ssl intercept/ssl-bump

In order to use squid with ssl-bump, you must have compiled squid with the –with-openssl and –enable-ssl-crtd options. The default package on Debian is not compiled this way, so to save you some time I have provided the commands I used to compile it:

apt-get source squid
cd squid3-3.5.23/
./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libexecdir=${prefix}/lib/squid3 --srcdir=. --disable-maintainer-mode --disable-dependency-tracking --disable-silent-rules 'BUILDCXXFLAGS=-g -O2 -fdebug-prefix-map=/build/squid3-4PillG/squid3-3.5.23=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -Wl,--as-needed' --datadir=/usr/share/squid --sysconfdir=/etc/squid --libexecdir=/usr/lib/squid --mandir=/usr/share/man --enable-inline --disable-arch-native --enable-async-io=8 --enable-storeio=ufs,aufs,diskd,rock --enable-removal-policies=lru,heap --enable-delay-pools --enable-cache-digests --enable-icap-client --enable-follow-x-forwarded-for --enable-auth-basic=DB,fake,getpwnam,LDAP,NCSA,NIS,PAM,POP3,RADIUS,SASL,SMB --enable-auth-digest=file,LDAP --enable-auth-negotiate=kerberos,wrapper --enable-auth-ntlm=fake,smb_lm --enable-external-acl-helpers=file_userip,kerberos_ldap_group,LDAP_group,session,SQL_session,time_quota,unix_group,wbinfo_group --enable-url-rewrite-helpers=fake --enable-eui --enable-esi --enable-icmp --enable-zph-qos --enable-ecap --disable-translation --with-swapdir=/var/spool/squid --with-logdir=/var/log/squid --with-pidfile=/var/run/squid.pid --with-filedescriptors=65536 --with-large-files --with-default-user=proxy --enable-build-info='Debian linux' --enable-linux-netfilter build_alias=x86_64-linux-gnu 'CFLAGS=-g -O2 -fdebug-prefix-map=/build/squid3-4PillG/squid3-3.5.23=. -fstack-protector-strong -Wformat -Werror=format-security -Wall' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -Wl,--as-needed' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/squid3-4PillG/squid3-3.5.23=. -fstack-protector-strong -Wformat -Werror=format-security' --with-openssl --enable-ssl-crtd
make && make install

The configuration above is identical to the precompiled one in the Debian Stretch repository, apart from the addition of the SSL options. If you are using a different distro the above command may not work.

Most of my configuration is based on the guide in the official squid documentation. My squid configuration is as follows:

acl ftp proto FTP
acl SSL_ports port 443
acl SSL_ports port 1025-65535
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl LANnet src # local network for virtual machines
acl step1 at_step SslBump1
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
http_access allow LANnet
http_access deny manager
http_access allow localhost
http_access deny all
http_port 3128 intercept # intercept required for transparent proxy
https_port 3129 intercept ssl-bump \
    cert=/etc/squid/antfarm.pem \
    generate-host-certificates=on dynamic_cert_mem_cache_size=4MB
ssl_bump peek step1
ssl_bump bump all
sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/ssl_db -M 4MB
sslcrtd_children 8 startup=1 idle=1
access_log daemon:/var/log/squid/access.log logformat=combined
pid_filename /var/run/squid/squid.pid
coredump_dir /var/spool/squid
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
request_header_access X-Forwarded-For deny all
httpd_suppress_version_string on
always_direct allow all

Use the SSL certificate generation process shown in the linked guide. Once you have created the .pem file, copy the section from —–BEGIN CERTIFICATE—– to —–END CERTIFICATE—– into a new file with the extension .crt.

A few notes here:

  • The ‘intercept’ keyword is necessary if you are using iptables to redirect ports to squid as a transparent proxy. If you configure your client to explicitly use a proxy, you should not use it.
  • The always_direct clause is used because we are routing squid’s output to another host (running tor) as the default gateway. If you wanted to use the squid → privoxy → tor configuration locally, you would use ‘never_direct’ instead.
  • The path for the ssl_crtd tool in Debian is /usr/local/squid/ssl_crtd – no libexec.
  • When setting permissions for the cache directories in Debian, use “proxy:proxy” instead of “squid:squid” as this is the default user that Debian creates to run the squid service.

In order for the virtual machine to treat the falsified public keys as genuine, we must instruct it to trust the certificate as created above. For a Windows 7 host like mine, double click the .crt file and import the certificate in to the Trusted Root Certification Authorities store.

Importing a cert

With squid set up and certificate imported, you must then configure iptables on the hypervisor host to redirect traffic through squid.

iptables -t nat -A PREROUTING -i virbr0 -p tcp --dport 80 -j REDIRECT --to-port 3128
iptables -t nat -A PREROUTING -i virbr0 -p tcp --dport 443 -j REDIRECT --to-port 3129

where virbr0 is the name of the virtual interface in QEMU. You should adjust interface name and destination ports as required for your setup.

tor service

On the second host I have installed tor (version from Debian Stretch repo). This is configured with ports to listen for TCP and DNS connections in /etc/tor/torrc:


Then with iptables, inbound traffic from the hypervisor host is redirected to tor:

-A PREROUTING -s -i eth0 -p tcp -j REDIRECT --to-ports 8081

Since the objective is to keep my real IP hidden, care must be taken to ensure the host’s routing does not leak information. In /etc/network/interfaces, instead of specifying a gateway, I added two routes:

up route add -net netmask gw
up route add -net netmask gw

This causes all traffic not intended for my internal network to be routed to the host running the tor service (on I have then configured my firewall so that it only allows connections reaching in to this VLAN, or from the tor host, not from the malware VM hypervisor.  When updates are required, connectivity can be enabled temporarily, with the VMs paused or shut off. Alternative techniques include allowing the hypervisor host to update via tor (if I didn’t mind it being slow), or routing the traffic from the VMs without NAT and denying anything outbound from the VM network on my core router, but that’s something to look at another day.

With the gateways set up, the routing for the VM interface can then be applied on the hypervisor host:

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i virbr0 -j ACCEPT

After applying these rules you should have a fully functioning TLS/SSL intercept routed via tor. To test, start by attempting to resolve a few hostnames from the VM and verify that the traffic is hitting your tor service host BEFORE giving any web pages a spin. Move on to HTTP/HTTPS traffic once you are sure DNS is working correctly.


Once you have a functioning setup you should expect to see both HTTP and HTTPS URLs appearing in your squid access log. In addition, if you perform a packet capture on the hypervisor virtual interface (virbr0 in my case), you can use the key generated earlier to view the decrypted traffic in Wireshark. You will need to copy the private key section of the .pem file to a new file to use in Wireshark. When entering the protocol as described in the link above, use ‘http’ in lowercase – uppercase will not work.

importing an SSL key in wireshark

decrypted output of call to https://ipapi.co

Sandbox stealth mode: countering anti-analysis

20,000 Leagues Under The Sand – part 6

read part 5

As long as there are robbers, there are going to be cops. The bad guys know perfectly well that people will be trying to identify their malware, and have all sorts of anti-analysis tricks up their sleeves to evade detection. Malware will very often perform checks for common analysis tools and stop running if it identifies their presence. Since one of the most fundamental tools for a malware analyst is the use of a virtual machine, it is the subject of numerous and varied detection attempts in many families of malware.

Simple strings

In its default configuration, a virtual machine is likely to have a wide range of indicators of its true nature. For example, it is common for standard peripheral names to contain hints (or outright declarations) that they are virtual.

VirtualBox DVD indicator

VirtualBox DVD drive

This is likewise the case for QEMU/KVM among others. As well as peripheral devices, the CPU vendor information may also be an indicator:

Device Manager processor info in QEMU/KVM

Device Manager processor info in QEMU/KVM

Less obvious to casual browsing but still perfectly simple for code running on the system to detect are features such as the CPUID Hypervisor Bit, MAC address, and registry key indications such as the presence of BOCHS or VirtualBox BIOS information in the registry.

SystemBiosVersion registry value

SystemBiosVersion registry value

These detections depend on the code of the hypervisor; in some cases they can be overcome by specifying particular configuration values, and in others they can only be solved by modifying the source code of the hypervisor and recompiling it. Fortunately for my choice of QEMU/KVM, many people have already looked at this particular problem and some have been generous enough to publish their solutions. There is also a fair amount of information out there for VirtualBox (partly because Cuckoo favours this hypervisor), and some for VMWare ESXi.

Bad behaviour

Another means of detecting an analysis environment is to collect information indicating the behaviour and use of the system. As discussed in part 4 of this series, simulating the presence of a user is an important ability to counter this evasion method. You should also consider environmental factors such as uptime (it is improbable that a user would boot a system and immediately run malware; some samples look for a minimum uptime period).

Presence of the abnormal, absence of the normal

One of the side effects of Windows being engineered to be helpful, is that it leaves behind traces of a user’s activity everywhere. In addition, people are messy. They leave crap scattered all over their desktop, fill their document folders with junk, and run all sorts of unnecessary processes. Looking for evidence of this is quite simple, and malware has been known to refuse to run if there are insufficient recent documents, or very few running processes.

Malware may also attempt to evade detection by searching for running and installed services and the presence of files linked to debuggers, analysis tools and the sandbox itself (e.g. VirtualBox Guest Additions).

Anti-analysis could be a series all of its own, and my understanding of it is still quite narrow. I strongly encourage you to research the topic yourself as there are tons of excellent articles out there written by authors with far more experience.


Although it is not specific to sandboxing, I do not feel this series would be complete without some mention of the delivery of the output. You can write the best code to manage, control, and extract data from your sandbox in the world, but it is worthless if you cannot deliver it to your users in a helpful fashion. Think about what types of data are most important (IDS alerts/process instantiation/HTTP requests?), what particular feature of that data it is that makes it useful (HTTP hostname? destination IP? Alert signature name, signature definition?) and make sure that it is clearly highlighted – but you MUST allow the user to reach the raw data.

I cannot stress this enough. Sandboxes are a wonderful tool to get the information you need as a defender (though not everyone is so enthusiastic), but they are imprecise instruments. The more summarised and filtered the information is, the higher the chance it has to lead the analyst to false conclusions.

You should look at other sandboxes out there and draw inspiration, as well as learn what you want to avoid (whether because it’s too complicated for you right now, or you just think it’s a bad way of doing things) when making one yourself. Start by looking at Cuckoo, because it’s free and open source. Take a peek at the blogs and feature sheets of the commercial offerings like VMRay, Joe Sandbox, Hybrid Analysis, and the very new and shiny any.run.


Sandboxing is a huge topic and I haven’t begun to scratch the surface with this series. However, I hope that I have done enough to introduce the major areas of concern and provide some direction for people interested in dabbling in (or diving into) this fascinating world. I didn’t realise quite how much work it would be to reach the stage that I have; getting on for 18 months in, I’m still very much a novice and my creation, whilst operational, is distinctly rough around the edges. And on all of the flat surfaces. But it works! And I had fun doing it, and learned a ton – and that’s the main thing. I hope you do too.

Host activity monitoring

20,000 Leagues Under The Sand – Part 2

read part 1

As a newbie sandboxer, the biggest obstacle for me was finding a way of getting in-depth information on what actions were being performed by malware I wanted to test. In particular, I wanted to be able to drop some samples, go away and make lunch, then come back and be looking at some results. That meant stepping through it in a debugger was out, or at least a lesson for another day. You’ve probably already seen that I ended up using Sysmon, but let’s have a look at the alternatives for a moment.

Built in Windows logging

Filesystem forensics

  • The files in C:\Windows\Prefetch\ can show if executables were run
  • The AppCompatCache registry key and AmCache.hve hive contain more detailed information on program execution, though neither logs individual execution instances or command line options
  • You can diff the filesystem – have a clean copy, either of the Master File Table or of the entire structure – and compare to see what’s changed; this is a fairly intensive operation, especially if you intend to see if a known good file has been replaced with a malicious version
  • There are tools for parsing registry hives so identifying new/modified keys is possible

Creating your own API call logging

  • If you’re a good enough programmer to write code that logs API calls, this is the gold standard. I am not (yet) up to this. It is possible to monitor for most of the interesting events such as process and file creation, registry modification etc. using filter drivers. If you want to go a step further and monitor (or even intercept and change) system calls, you need to be looking at DLL injection. This is the method used by Cuckoo sandbox, among many others.

Building monitoring in to the virtualisation

  • Technically this is all just code simulating hardware running other code. If you’re smart enough to modify a hypervisor so that it can recognise and log API calls within its guests, go for it. Please excuse me for thinking you’re a bit mad though!

Options #1, #2 and #4 hold an additional advantage of being difficult or impossible for sandbox evasion techniques to pick up on.

And then we get to Sysmon, which is in effect a version of #3, but it has a big advantage: somebody else did all the work for us! Hooray for Mark Russinovich and Thomas Garnier. Many sandboxes do API call monitoring; sometimes it can be a little bit excessively detailed (hello Cuckoo) but as far as understanding what malware is doing goes, it’s the bee’s knees. Let’s have a look at what you can get out of it.

Sysmon ProcessCreate event output

Sysmon Process Created event

We’ll ignore for now how much my UI leaves to be desired. Here is perhaps the most commonly of interest event to you: Process Created. In this event you have a wealth of data including not only the location of the executable, launch command and parent processes, but the MD5 and SHA256 hashes of the file. You can also get the import hash here too – though I’d forgotten to turn it on for this run. You can see what ran, from where, by whom, and how it was run, in a glance.

Sysmon File Created event output

Sysmon File Created event

Next up we can log the act of creating a file; in this case a trojan makes new copy of itself which is placed in C:\Users\<username>\System\Library\mshost.exe.

Sysmon Registry Value Set event data

Sysmon Registry Value Set event

You can also monitor for interesting things happening in the registry. This is one of the primary methods by which malware achieves “persistence”, i.e. the ability to remain active on the system it infects. Here we can see a new entry being created in one of the user’s Run keys.

Sysmon Network Connection output

Sysmon Network Connection event

In a final example, Sysmon allows you to detect initiation of network connections; not only do we have the network level data of the destination IP and port captured, but the destination hostname is also identified.

In just four event types, Sysmon is able to record the malware starting, hiding itself, achieving persistence, and contacting its Command and Control server. This is the power of logging API calls. But wait – there’s more! This only scratches the surface of what Sysmon can do. It is also capable of identifying:

  • A process changing the creation time of a file
  • Process termination
  • Loading of drivers
  • Loading of additional modules in to existing processes
  • Creation of threads within other running processes
  • Raw access to the disk (as opposed to using the file system APIs)
  • Access to another process’s memory
  • Creation of alternate data streams
  • Use of named pipes (a method of communicating between processes)
  • Use of Windows Management Instrumentation

As you can see, it’s a fantastic tool which would be pretty hard to top if you decided to try doing this yourself. If you are thinking of experimenting with malware – or looking for something to help you keep a closer eye on your systems in general – I can’t recommend it enough.

In part 3 I will discuss the use of IDS and packet capture tools to get detailed information on the malware’s communication.

20,000 leagues under the sand, part 1

Greetings, malware junkies!

Welcome to the first part in my mini epic documenting my journey of discovery into the world of sandboxing. If you come here expecting groundbreaking advances in the field, you may be disappointed. If however you want to see some of the ideas a newbie had so that you don’t have to think of them yourself, you might be in the right place. Also maybe seeing the dumb mistakes I made so you can avoid ’em 😊

This series is not intended to be a technical instruction manual on sandbox creation. What I intend to do is introduce and discuss the core problems and issues and outline potential approaches for solving them. Along the way I will give specific examples with some detail from the solutions I created for my own sandbox.

A long time ago in a SOC far, far away…

I started my project a little over a year ago, having spent at least that long watching someone else do this roll-your-own sandbox thing with no small amount of awe. Although I was fair with python and could bumble my way around Linux, Snort rules, pcaps and the like, the idea of reproducing this kind of feat even on the most modest scale seemed like a pipe dream. I saw malware go in, and not just pcaps and Snort alerts come out, but a wealth of host activity like file creation, shell commands, remote threads… you name it, it was there. I had the barest scraps of understanding about the Windows API and far less than that about how one might go about tracking something using it.

Without having had this project to look up to, I might have set my sights a little lower, but I was hooked and I wanted to do All Of The Things. I had a laundry list of features in mind based on the aforementioned project and other sandboxes I was beginning to learn about, including but not limited to:

  • Host activity logging
  • Full network capture
  • NIDS alerting
  • AV detection
  • Cross referencing samples on lots of IOCs
  • Screen capture
  • User behaviour simulation
  • Countering sandbox evasion

You might be forgiven for thinking I was a little mad with ambition. No, you’d definitely be forgiven, I was bananas.

However, around Christmas 2016, the biggest obstacle suddenly got a lot smaller when I realised that I already knew of a tool that could do most of the things that my lack of C/C++/WinAPI coding knowledge was preventing, a tool that was continually praised by a twitter account I follow whom I’m sure you have never heard of – Sysmon. I was (perhaps a little optimistically) certain that I could find ways to get my code working for all of the other components I wanted, so when that realisation hit I immediately started coding. If I had known how much of my time it would eat, I might have had second thoughts…

Anyway, from this moment my pet project was born. It’s clunky, ugly, and unreliable, but I’ve learned a lot! Folks, may I present The Antfarm.

In my next post I will talk about my starting place for this somewhat chaotic adventure: how one can detect and log actions and events on a host that may be malicious.

part 2 – Host activity monitoring

Angler knows when you’re fakin’ it

A brief introduction to Angler for those who are new to it: Angler is a framework of malicious code that criminals can purchase/rent and deploy on web servers they have control over – usually ones they have compromised rather than renting for themselves. It is typically comprised of an initial component which is injected into a site you are likely to visit, which redirects you to the second location, known as the exploit kit landing page. The landing page has code which tests what features and capabilities your browser has, in order to identify whether you are vulnerable to any of the exploits the criminals have access to; this is known as “browser enumeration”. This page often contains the exploits themselves, but they can sometimes be called from another location. Finally the exploit will contain commands that will attempt to download and run a program, which could be anything – a remote access tool, ransomware, you name it. If you want to know a bit more about what comes out the business end of an Angler chain, there are many examples on places like Malware Don’t Need Coffee and Cisco Talos.

Angler has developed steadily, adding new features in its efforts to more effectively select vulnerable users, better evade detection, and increase the difficulty of analysing it. The creators have added polymorphism on all stages of the code so that it changes every time the page loads, making it much more difficult for antivirus and intrusion detection systems to recognise. There are up to three layers of obfuscation in the javascript. The most recent trick I have seen is a little call that doesn’t seem to do much in terms of sorting potential victims from the crowd, but made analysing it a bit more of a challenge.

1 raw page

The above image will be familiar to anyone who has investigated Angler. Dozens of variables with completely random names, and an eval to turn them into some valid code. Looks like this one has been inserted into a header PHP file of some sort as it’s appearing even before the opening HTML tags. This particular group becomes:

eval(String.fromCharCode.apply(null,document.getElementById("xndotnjmjwlj").innerHTML.split(" ")))

Getting the text from a hidden div and turning it from a series of character codes into text, then evaluating. The hidden div part is a bit new but nothing particularly special (previously it just put the text to decode directly in a string variable). That text becomes the following:

2 angler decoded div 1

On line 14 it is getting the code from another div and then feeding it through a loop which does some maths on the characters and constructs another string. That doesn’t sound too complicated, but try as I might I just could not get this code to do its stuff, not until I started stepping through it and watching what values got assigned to the variables in each step; and as it turns out, the problem is on the very first line.

kaypkiafgibrwvp = (+[window.sidebar])

The lines that follow it create a loop with this value as the start, and test whether the strings “rv:11” and “MSIE” are present in the user agent – a fairly standard way of detecting what browser is coming to call. Why was this code not running properly? Presumably if it’s a for loop this code should start with kaypkiafgibrwvp being 0, so let’s check that bit – in fact you can try this out yourself. Bring up the developer console in your browser (Firefox: ctrl+shift+k; Chrome: ctrl+shift+i; IE: F12) and put the text in the box into the console. In IE, Chrome and Safari, you’ll get “0”. But in Firefox and some javascript debugging tools, you’ll get “NaN”.

That “NaN” breaks the code – it will never send your browser anywhere. So in order to get this code to turn out something useful, I’ll need to replace at least that value with 0 rather than the output of the expression. The other value that’s set here from browser variables, othddelxtnfae, is used as a modulus a bit further down so it’s necessary to make sure that one’s correct too; a little tinkering showed that the correct value for this variable was 2 – indicating the UA must contain “rv:11” or “MSIE10”, but not both.

3 final redirect formatted

And there you have it, an iFrame hidden off the top of the page, going to the next part of Angler.

Given that the call to window.sidebar only eliminates one of the major browsers from contention it seemed unlikely to me that this would be the reason for including it – an anti-analysis technique seemed more likely. A bit of searching reveals that this is because some popular analysis tools depend on open source code from the Firefox javascript engine… and Trustwave basically wrote everything in this post two days ago. Lesson learned: analysis first, beauty sleep later.

Beginning your journey into javascript deobfuscation

Obfuscation of code is an interesting technique used by malware authors to conceal what their code is doing from anyone looking at it. Finding out what it does can be tricky, especially if you’re not sure how to start. So, when a wave of WordPress compromises broke out a few months back and Zscaler wrote a nice overview of what the compromised sites were trying to do, I thought it was an excellent opportunity to go through the javascript they quoted and show exactly how to unwrap it; especially given that it’s a relatively simple example.

The first thing you need to do is to find the page with the obfuscated javascript on it. I can’t tell you exactly how you should do this; it will depend on how you are encountering the issue – it could be that it’s your WordPress install that has been owned, or it could be that you’re seeing alerts from your intrusion detection or antivirus systems that you need to investigate – any number of reasons. Once you have identified the source of the alert (proceed very carefully from here on in!) you need to get the source code of the page that you’re investigating. Use something like curl or wget in Linux/OSX for safety. I would never suggest pointing a browser at it unless you really know what you’re doing.

Obfuscated code tends to stand out to human eyes because it creates patterns that are very distinct from the code on the rest of the page. There are different styles, so this example won’t teach you to spot all of them but believe me when I say that once you’ve seen a few you will be able to rapidly scroll through a page and spot the blocks of evil code. In my example, the evil code looks like this:

1 Raw JS

Distinguishing features include:

  • large wall of text where code above and below is relatively short and spaced out
  • random looking letters that make no sense
  • lots of little groups of numbers
  • a “document.write” command
  • not seen here, but an “eval” command is often associated with malicious javascript – worth mentioning because this one is relatively rare in benign code whereas “document.write” is common.

None of these is evil in itself but they’re common things to see in evil javascript.

The first thing you need to do is break out the stuff between the script tags into a separate file to play around with it. I’m going to do my decoding using Python because that’s what I’m used to but you can use pretty much any language for this. Find statements separated by semicolons and split them on to new lines so it’s easier to follow (for large, complicated blocks you might want to use a tool like JSPretty but this one is quite simple).

2 JS line by line

Once you’ve done that you can start to get an idea of what is in there. In this example, there are variables ‘a’, ‘b’, ‘c’, ‘clen’, then a for loop, and finally an unescape statement. Breaking down the for loop further, it does the following:

  • iterates through every character in ‘a’
  • in each iteration, get the numeric ASCII value of the character
  • bitwise XOR that value with 2
  • convert the resulting value into a new ASCII character
  • append the new ASCII character to a string

This is something that converts very readily to Python, like so:

3 JS to python

Followed by the output:

4 decode stage 1 output

This gives us a URL-encoded string (character codes preceded by % symbols). Python has a built-in library, urllib, which can decode this for us.

5 url encoding handling

Finally, we get a human-readable version of the content.

6 final output

The “document.write” command would have written the decoded content to the browser which as you can now see would have created an 0x0 sized iframe calling a resource from teaserguide[.]com. If you haven’t already read Zscaler’s article showing what happens after that, I highly recommend that you do so now.

This was a pretty low-level example; some malicious code has obfuscation three or more levels deep. Others use weird tricks in javascript to produce code that is mind-bendingly hard to pick apart such as the JJEncode technique (warning: you might need a stiff drink and a lie down after looking at that one – I did, and I take my hat off to whoever wrote the script to reverse it). I hope this was helpful to get you started.