Wednesday, December 18, 2019

Inline Loop Detection for Compressing API Call Traces


I have been working on a solution for compressing files containing trace of API calls coming out of a sandbox (Cuckoo sandbox [1]). This file holds events generated by a monitor module (DLL) which is injected into processes and hooks a pre-configured set of API calls. The main goal of this log is to describe interaction between processes and Operating System (Behavioural data).

This file can grow in size (memory demanding), complexity (CPU demanding) and noise (difficult to read) depending on two main factors: 
  • how long an analysed process is executed; and 
  • presence of specific control flow structures (e.g. loops).
This article proposes two methods for detecting looping patterns. The outcome of these methods is further used for compressing the above described sandbox log files without changing its structure.

Detecting loops by observing its execution is a complex task since not always a well defined pattern emerges. This article uses the terms "repeated element" and "loop body" to identify a content inside a loop statement. This element can be composed by a single call (repeated multiple times), multiple sequential calls and nested calls; its execution output can also be affected by control flow elements such as conditionals.

Repeated elements can also be affected by Pseudo-Random Number Generators. In this case, the final log presents more complex patterns. Algorithms described in this article do not consider this less common scenario and target detection of more general looping patterns.

.::[ Context and Example

Python-like pseudo-code snips are presented along this article for illustrating the proposed techniques. The following code snip describes a loop executing calls to functions labeled from 1 ("call_001") to 5 ("call_005"):

The body of the loop statement (line 2) executes functions "002" and "0035 times and function "004" every time "i" is even. The behavioural log produced by executing the script above would be:

Each entry in this log contains a position (line number "line[x]") and an identifier ("c_00[x]") to a call. A new entry is added to this file every time a function is executed. Taking a closer look, it is possible to observe a repetition pattern located between lines 2 and 14. In this specific example, these lines describes the execution of a loop and could be compressed to the following form:

Thursday, October 31, 2019

Dynamic Imports and Working Around Indirect Calls - Smokeloader Study Case

When reversing malware it is common to find an injected payload loading references to external resources (DLL functions). This happens for two main reasons:
  1. The hosting process does not have all resources necessary to the execution of the injected payload; 
  2. Making reversing engineering the malware trickier since the dumped segment will have all calls pointing to a meaningless address table. 
This article explains how to revert this trick and get back API call names annotations in an IDApro database. A sample of Smokeloader was used for illustrating the ideas described in this post.

This article is divided in three main parts:
  1. Explaining the observed technique;
  2. How it works; and
  3. How to circumventing it in order to facilitate reversing. 
First of all, shout out to Sergei Frankoff from Open Analysis for this amazing video tutorial on this same topic which inspired me to write about my analyses. Regards also to Mark Lim who also wrote a very interesting article about labelling indirect calls in 2018. His article uses structures instead of patching the code (which is also a good approach) but I think it lacks important details and I will try to cover these points in here.

Examples presented in this article were extracted from the following Smokeloader sample:

Filename:   p0n36i2d.exe
MD5:         a8cc396b6f5568e94f28ca3381c7f9df
SHA1:       12948e36584e1677e80f78b8cc5c20576024c13f
SHA256:   17b548f9c8077f8ba66b70d55c383f87f92676520e2749850e555abb4d5f80a5
Size:           215.5 KB (220672 bytes)
Type:          PE32 executable for MS Windows (GUI) Intel 80386 32-bit

Explaining what is going on in the first stage (packer/crypter) is out of scope; this article focuses on characteristics found in the final payload. This sample injects the main payload in "explorer.exe" as it is possible to observe in this AnyRun sandbox analysis.

Figure 01 shows how the code looks immediately after the execution control passes to the injected code.

Figure 01 - Smokeloader's final payload.
Three points were marked in this code snip (1, 2 and 3).  The first point (1) is the call to the main function (located at 0x002F1853). This function expects to receive an address through ECX register. This address points to a data segment where all temporary structures will be stored. 

Monday, August 5, 2019

Smokeloader's Hardcoded Domains - Sneaky Third Party Vendor or Cheap Buyer?

Smokeloader is a small modular bot first seen in 2011 [1] mainly used as a dropper for other malware families. Although mainly used for delivering a second stage stage, Smokeloader implements several malicious capabilities through its modules, such as: keylogging, process monitoring, DDOS, DNS redirection and form grabbing. These modules are often used for profiling and accessing infected machines before deploying a final malware increasing effectiveness of campaigns.  

- So here comes the main story -

Last week I saw a tweet [2] with an image of this server hosting few quite large executables (~1.2MB) claimed to be Smokeloader samples. These binaries were accessible through an Open Directory.

Figure 01: Open Directory exposing modified Smokeloader samples.

Along the 30th and 31st of July these files were changed few times. Here are the hashes found and analysed during the time of this research:


Wednesday, November 25, 2015

CryptoWall 3.0

This post presents an analysis for a sample of the crypto-virus known as CryptoWall 3.0. 

.::[ ARTIFACT CHARACTERISTICS
Table 1 shows general information about the analyzed binary.

Table 1: Artifact General Information
Characteristic
Value
File name
sample1.xxx
MD5
15914886232c164bb2521af59aa0e06e
SHA1
cdff1327225a6a721e32f204ad1e4eb3a4a1d44e
Size
220 KB
Binary type
PE 
Architecture
x86
Bits
32


.::[ GENERAL DESCRIPTION
The analyzed sample represents a variant of the CryptoWall malware delivered as payload by Angler Exploit Kit. There are some more elaborated analyses about this malware around the Internet [1][2][3]. 

CryptoWall is categorized as a ransomware by most anti-virus technologies. Its main behavior is: 
  • run some "anti-debugging" techniques (e.g. verifies if the caller process is Perl or Python);
  • deactivate native OS protections;
  • deactivate backup and file snapshotting policies;
  • start communication with C&C server;
  • generates a temporary crypto key for each request;
  • Uses this key to encrypt (RC4) the communication between the infected machine and the C&C server. This key is passed as parameter inside each HTTP request;
  • sends an unique identification of the infected machine to the server (called "CUUID");
  • the server sends back a public key and an unique image used to link the infected machine to its server side session;
  • this process uses this public key to encrypt specific files inside the disk of the victim (according an extensions table);
  • the malware redirects the victim to a web page where the user can pay (in Bitcoins) to obtain the private key and recover the encrypted files;
  • the malware deletes itself (files remain encrypted);
Once the amount requested is paid the user receives a private key which can be used for recovering the encrypted files. 

Monday, November 16, 2015

Fireeye - FlareOn 2015 (Challenge #4), Relocation Table and ASLR

Continuing the series of posts about the FlareOn reversing challenges 2015 [1]. If you interested, you can find my previous post with solutions for the first three challenges here [2]. 

I had some "extra work" due to infra-structure issues during this challenge and ended up learning new things. So I decided to dedicate a whole post for the challenge #4 not only explaining the solution but also explaining these issues. 

.::[ PREAMBLE
As the previous challenges the objective in this task is to recover the flag (an e-mail address) from a PE-32 binary file. The main insight for this challenge is that the binary was packed using UPX [3]. There are many ways to raise evidences that a binary was packed using UPX, the main is to examine sections. We could obtain this information using rabin2 by the following command. Figure 01 shows the sections labels for the analyzed binary.

Figure 01: Sections information for analyzed binary.
As we can notice there are 2 sections (marked with the red rectangles) called "UPX0" and "UPX1". These sections are used by UPX to store compressed data of the original binary.

Sunday, October 11, 2015

Fireeye - FlareOn 2015 (Challenges 1-3)

I took the day off today to solve the Fire Eye Reversing FlareOn challenges [1] and decided to publish my notes here. These challenges were very entertaining and I strongly recommend to anyone who is interested in reversing for fun.

In total, there are 11 challenges with different levels of difficult and covering the most diverse kind of technologies (from .NET to mobile reversing). This post goes through solutions for the first challenges. 


.::[ Challenge 01
This challenge is a win32 executable and basically validates a password inputed by the user. If the password is correct the binary outputs a flag otherwise an error.

Figure 01: Challenge 01 XOR encryption scheme
By analyzing the code we realize that the binary compares the user input with an encrypted string located at address "0x00402140". This string has 24 characters (according comparison at the address "0x0040105e") and is the targeted flag in this challenge.

Wednesday, October 7, 2015

Radare 2 - An Open Source alternative to IDA

Radare [1] is an open source and multi-platform framework for Reverse Engineering activities which supports assembly and disassembly many architectures and binary formats [2]. As any other reversing framework, Radare framework aims to recognize high level features on machine code, such as: data structures, functions and execution flows. Radare has buildings for the most populars Operating Systems, such as: Microsoft Windows, Mac OS X, Linux, BSD, iPhone OS, Solaris and MeeGo. Figure 01 presents the main command line interface for Radare

Figure 01: Radare command line interface
Radare offers few options of interactive graphical interfaces, such as: Web, GTK (Python) and ASCII-Art graph. Another very useful characteristic due to its designing is the capacity to easily implement new architectures, binary formats and analyses [3][4]. Radare provides an open API and with many bindings for many  programming languages, such as: Python, Java, Ruby, Go and Perl. Radare is also integrated with the most popular debuggers supporting local and remote debugging [5], such as: gdb, rap, webui, r2pipe, winedbg and windbg.

This post aims to presenting a comparison between reverse engineering features from IDA pro [6] and Radare 2. We are going to discourse in which situation it is more appropriate to use each tool. This article can be used as a condensed user guide for Radare (a complete guide is available online [7]). This post can be used also as a reference guide once the above mentioned official guide is outdated and most examples do not work with the most recent version of Radare any longer. In this small tutorial we show the main functionalities of Radare in practice by solving a small "crack me" challenge.