Monday, June 22, 2020

Comparative analysis between Bindiff and Diaphora - Patched Smokeloader Study Case

This article presents a comparative study case of diffing binaries using two technologies: Bindiff [1] and Diaphora [2]. We approached this topic in a Malware Analysis perspective by analyzing a (guess which malware family?) Smokeloader (! :D) campaign.

In August 2019, I spotted this campaign using patched samples of Smokeloader 2018 samples. This specific actor patched binaries to add new controllers URLs without needing to pay extra money (Smokeloader's seller charges extra-fee for C2 URL updates). This campaign was described in more detail in this previous article [3].

More details about the original samples [4][5] analyzed in this article can be found in the following tables:

 Filename:                        smokeloader_2018_unpatched.bin 
 33792 Bytes
 File type: PE32 executable (GUI) Intel 80386, for Microsoft Windows
 md5: 76d9c9d7a779005f6caeaa72dbdde445
 sha1: 34efc6312c7bff374563b1e429e2e29b5da119c2
 sha256: b61991e6b19229de40323d7e15e1b710a9e7f5fafe5d0ebdfc08918e373967d3  

 Filename:                        smokeloader_2018_patched.bin
 Size: 1202732 Bytes
 File type: PE32 executable (GUI) Intel 80386, for Microsoft Windows 
 md5: 7ba7a0d8d3e09be16291d5e7f37dcadb
 sha1: 933d532332c9d3c2e41f8871768e0b1c08aaed0c
 sha256: 6632e26a6970d8269a9d36594c07bc87d266d898bc7f99198ed081d9ff183b3f  

The following tables hold details about the unpacked code dumped from "explorer.exe" used in this article [6][7]. 

 Filename:                          explorer.exe.7e8e32c0.0x02ee0000-0x02ef3fff.dmp 
 Size: 81920 Bytes
 File type: data
 md5: 711c02bec678b9dace09bed151d4cedd
 sha1: 84d6b468fed7dd7a40a1eeba8bdc025e05538f3c
 sha256: 865c18d1dd13eaa77fabf2e03610e8eb405e2baa39bf68906d856af946e5ffe1  

 Filename:                       explorer.exe.7e8df030.0x00be0000-0x00bf3fff_patched.dmp
 Size: 81920 Bytes (yes, same size)
 File type: data
 md5: d8f23c399f8de9490e808d71d00763ef
 sha1: e1daad6cb696966c5ced8b7d6a2425ff249bf227
 sha256: 421482d292700639c27025db06a858aafee24d89737410571faf40d8dcb53288  

Summarizing the main changes implemented by this patch are: 
  • wipes out the code for decrypting C2 URLs;
  • replaces it with NOPs and hardcoded C2 URL string; and
  • preserves the original size of decryption function to not disrupt offsets;
Figure 01 and 02 presents the graph of the original code. Figure 01 is the code used for indexing a table of encrypted C2 URLs payloads. Figure 02 lists the code used for decrypting the C2 URLs. This function is called in other parts of the code (not only by the function shown in Figure 01) - this is why they are not merged in one function. We labeled functions (e.g. "__decrypt_C2_url" and "__decrypt_c2_algorithm") in this assembly code to make it easier to read. 

Figure 01 - Original code used for decrypting C2 URLs.

Wednesday, June 10, 2020

Unpacking Smokeloader and Reconstructing PE Programatically using LIEF

This article holds notes on my experience unpacking a Smokeloader 2020 sample. The unpacked payload is further used for composing a valid PE file. The outcome is a PE32 executable containing clean code ready for reversing. 

First things first, here is the sample used in this research:

 Size308.17 KB (315568 bytes) 
 TypePE32 executable for MS Windows (GUI) Intel 80386 32-bit 
 First seen                     2020-03-06 21:45:11 
 md5 c067e0a2d7fc6092bb77abc7f7156b60
 sha256   25959cfe4619126ab554d3111b875218f1dbfadd79eed1ed0f6a8c1900fa36e0 

You can find it in VirusTotal [1]. 

This sample does regular already documented Smokeloader checks before unpacking the main payload, such as: 
  • checks if the process is running in the context of a debugger using "kernel32.isDebuggerPresent" function [2];
  • makes a copy of ntdll.dll, loads it and uses it instead. This technique helps to evade some sandboxes and has been described already in this article here [3];
  • looks for specific patterns in registry keys to check if the sample is running under a virtualised environment.
It also performs a small profiling of the hosting machine in order to decide which payload to inject. Smokeloader has specific code for both main architectures x86 and x64. In this article, we gonna unpack the x86 payload of the above mentioned sample. 

Smokeloader has been using various techniques to inject its final payload into the user file management process "explorer.exe". The sample analysed uses RtlCreateUserThread approach in order to copy the final payload to the targeted process. This injection method is better described in this Endgame/Elastic article [4].

So our game plan is: 
  1. pause execution before the unpacked payload is executed by "explorer.exe";
  2. transplant this code to a dummy PE shell;
  3. fix PE header values and section boundaries;
  4. patching Smokeloader code preamble;
  5. test unpacked Smokeloader PE;
  6. how to do all this programatically using LIEF [5].

Smokeloader 1st stage decompresses its payload using ntdll.RtlDecompressBuffer [6] after few anti-analysis checks described above. It does not call this function from the initially loaded ntdll.dll but from a copy of it loaded afterwards. So breakpoints should be set after the binary loads the copy of ntdll.dll. Figure 01 presents a screenshot of this specific code IDA.

Figure 01: Smokeloader first stage decompression code

Wednesday, December 18, 2019

Inline Loop Detection for Compressing API Call Traces

I have been working on a solution for compressing files containing trace of API calls coming out of a sandbox (Cuckoo sandbox [1]). This file holds events generated by a monitor module (DLL) which is injected into processes and hooks a pre-configured set of API calls. The main goal of this log is to describe interaction between processes and Operating System (Behavioural data).

This file can grow in size (memory demanding), complexity (CPU demanding) and noise (difficult to read) depending on two main factors: 
  • how long an analysed process is executed; and 
  • presence of specific control flow structures (e.g. loops).
This article proposes two methods for detecting looping patterns. The outcome of these methods is further used for compressing the above described sandbox log files without changing its structure.

Detecting loops by observing its execution is a complex task since not always a well defined pattern emerges. This article uses the terms "repeated element" and "loop body" to identify a content inside a loop statement. This element can be composed by a single call (repeated multiple times), multiple sequential calls and nested calls; its execution output can also be affected by control flow elements such as conditionals.

Repeated elements can also be affected by Pseudo-Random Number Generators. In this case, the final log presents more complex patterns. Algorithms described in this article do not consider this less common scenario and target detection of more general looping patterns.

.::[ Context and Example

Python-like pseudo-code snips are presented along this article for illustrating the proposed techniques. The following code snip describes a loop executing calls to functions labeled from 1 ("call_001") to 5 ("call_005"):

The body of the loop statement (line 2) executes functions "002" and "0035 times and function "004" every time "i" is even. The behavioural log produced by executing the script above would be:

Each entry in this log contains a position (line number "line[x]") and an identifier ("c_00[x]") to a call. A new entry is added to this file every time a function is executed. Taking a closer look, it is possible to observe a repetition pattern located between lines 2 and 14. In this specific example, these lines describes the execution of a loop and could be compressed to the following form:

Thursday, October 31, 2019

Dynamic Imports and Working Around Indirect Calls - Smokeloader Study Case

When reversing malware it is common to find an injected payload loading references to external resources (DLL functions). This happens for two main reasons:
  1. The hosting process does not have all resources necessary to the execution of the injected payload; 
  2. Making reversing engineering the malware trickier since the dumped segment will have all calls pointing to a meaningless address table. 
This article explains how to revert this trick and get back API call names annotations in an IDApro database. A sample of Smokeloader was used for illustrating the ideas described in this post.

This article is divided in three main parts:
  1. Explaining the observed technique;
  2. How it works; and
  3. How to circumventing it in order to facilitate reversing. 
First of all, shout out to Sergei Frankoff from Open Analysis for this amazing video tutorial on this same topic which inspired me to write about my analyses. Regards also to Mark Lim who also wrote a very interesting article about labelling indirect calls in 2018. His article uses structures instead of patching the code (which is also a good approach) but I think it lacks important details and I will try to cover these points in here.

Examples presented in this article were extracted from the following Smokeloader sample:

Filename:   p0n36i2d.exe
MD5:         a8cc396b6f5568e94f28ca3381c7f9df
SHA1:       12948e36584e1677e80f78b8cc5c20576024c13f
SHA256:   17b548f9c8077f8ba66b70d55c383f87f92676520e2749850e555abb4d5f80a5
Size:           215.5 KB (220672 bytes)
Type:          PE32 executable for MS Windows (GUI) Intel 80386 32-bit

Explaining what is going on in the first stage (packer/crypter) is out of scope; this article focuses on characteristics found in the final payload. This sample injects the main payload in "explorer.exe" as it is possible to observe in this AnyRun sandbox analysis.

Figure 01 shows how the code looks immediately after the execution control passes to the injected code.

Figure 01 - Smokeloader's final payload.
Three points were marked in this code snip (1, 2 and 3).  The first point (1) is the call to the main function (located at 0x002F1853). This function expects to receive an address through ECX register. This address points to a data segment where all temporary structures will be stored. 

Monday, August 5, 2019

Smokeloader's Hardcoded Domains - Sneaky Third Party Vendor or Cheap Buyer?

Smokeloader is a small modular bot first seen in 2011 [1] mainly used as a dropper for other malware families. Although mainly used for delivering a second stage stage, Smokeloader implements several malicious capabilities through its modules, such as: keylogging, process monitoring, DDOS, DNS redirection and form grabbing. These modules are often used for profiling and accessing infected machines before deploying a final malware increasing effectiveness of campaigns.  

- So here comes the main story -

Last week I saw a tweet [2] with an image of this server hosting few quite large executables (~1.2MB) claimed to be Smokeloader samples. These binaries were accessible through an Open Directory.

Figure 01: Open Directory exposing modified Smokeloader samples.

Along the 30th and 31st of July these files were changed few times. Here are the hashes found and analysed during the time of this research:

Wednesday, November 25, 2015

CryptoWall 3.0

This post presents an analysis for a sample of the crypto-virus known as CryptoWall 3.0. 

Table 1 shows general information about the analyzed binary.

Table 1: Artifact General Information
File name
220 KB
Binary type

The analyzed sample represents a variant of the CryptoWall malware delivered as payload by Angler Exploit Kit. There are some more elaborated analyses about this malware around the Internet [1][2][3]. 

CryptoWall is categorized as a ransomware by most anti-virus technologies. Its main behavior is: 
  • run some "anti-debugging" techniques (e.g. verifies if the caller process is Perl or Python);
  • deactivate native OS protections;
  • deactivate backup and file snapshotting policies;
  • start communication with C&C server;
  • generates a temporary crypto key for each request;
  • Uses this key to encrypt (RC4) the communication between the infected machine and the C&C server. This key is passed as parameter inside each HTTP request;
  • sends an unique identification of the infected machine to the server (called "CUUID");
  • the server sends back a public key and an unique image used to link the infected machine to its server side session;
  • this process uses this public key to encrypt specific files inside the disk of the victim (according an extensions table);
  • the malware redirects the victim to a web page where the user can pay (in Bitcoins) to obtain the private key and recover the encrypted files;
  • the malware deletes itself (files remain encrypted);
Once the amount requested is paid the user receives a private key which can be used for recovering the encrypted files. 

Monday, November 16, 2015

Fireeye - FlareOn 2015 (Challenge #4), Relocation Table and ASLR

Continuing the series of posts about the FlareOn reversing challenges 2015 [1]. If you interested, you can find my previous post with solutions for the first three challenges here [2]. 

I had some "extra work" due to infra-structure issues during this challenge and ended up learning new things. So I decided to dedicate a whole post for the challenge #4 not only explaining the solution but also explaining these issues. 

As the previous challenges the objective in this task is to recover the flag (an e-mail address) from a PE-32 binary file. The main insight for this challenge is that the binary was packed using UPX [3]. There are many ways to raise evidences that a binary was packed using UPX, the main is to examine sections. We could obtain this information using rabin2 by the following command. Figure 01 shows the sections labels for the analyzed binary.

Figure 01: Sections information for analyzed binary.
As we can notice there are 2 sections (marked with the red rectangles) called "UPX0" and "UPX1". These sections are used by UPX to store compressed data of the original binary.