
Analyzing Vietnam APT attack originating from China (Part 1)

We previously reported on a vulnerability affecting Microsoft Word, tracked as CVE-2012-0158; see [1]. Using the results from Virus Total, we analyzed a piece of malware that exploits CVE-2012-0158. Last month, I paid close attention to the following email:


Figure 1. E-mail in the inbox

This email was sent from an unknown address (**) with an attached .DOC file, Lịch-các-ngày-lễ.doc (Vietnamese for “Calendar of holidays”). Suspecting that the attachment carried malware such as a keylogger, we uploaded it to the Virus Total site to check whether it was malicious. As expected, 28 of 57 antivirus engines on Virus Total, including Kaspersky, Bitdefender and ESET-NOD32, flagged it as an exploit for CVE-2012-0158. The exploit document was attributed to an author named Trần Duy Linh.

Next, I set out to locate the shellcode contained in the .DOC file. I scanned the file with Frank Boldewin’s OfficeMalscanner toolkit [2], which reported an OLE2 Compound Format file embedded in it.


Figure 2. The result returned by the OfficeMalscanner toolkit
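OfficeMalscanner does this detection for us, but the underlying idea is simple: every OLE2 compound file begins with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The Python sketch below is not how OfficeMalscanner is implemented; it only lists every offset where that signature occurs. The helper names `find_ole2_headers` and `scan_file` are our own.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound-file signature


def find_ole2_headers(data: bytes) -> list:
    """Return every offset at which an OLE2 signature starts."""
    offsets = []
    pos = data.find(OLE2_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(OLE2_MAGIC, pos + 1)
    return offsets


def scan_file(path: str) -> None:
    """Print the OLE2 header offsets found in a file."""
    with open(path, "rb") as f:
        blob = f.read()
    for off in find_ole2_headers(blob):
        # A legacy .DOC is itself OLE2, so offset 0 is the container;
        # any later hit is a candidate embedded object.
        print(f"OLE2 header at 0x{off:X}")
```

For example, `scan_file("Lịch-các-ngày-lễ.doc")` would print the container header at 0x0 plus any embedded OLE2 headers.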

We continued by scanning the embedded OLE2 file with the OfficeMalscanner toolkit:


Figure 3. The result of scanning the OLE file with the OfficeMalscanner toolkit.

Scanning the OLE file did not detect any malicious content, so we decided to locate the shellcode by hand.

We used 010 Editor [3] to analyze the .DOC file. Because this file is structured differently from the RTF file we analyzed earlier, we searched for the NOP sequence (90 90 90 90) with which shellcode often begins. The search returned two offsets where a NOP sequence started, and I was particularly interested in offset 0x6DD0.


Figure 4. Signs of shellcode in the .DOC file.
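The manual 010 Editor search can be reproduced with a short script. This Python sketch (the helper name `find_nop_runs` is ours) groups consecutive 0x90 bytes into runs, so each sled is reported once with its length rather than as many overlapping 4-byte matches; `min_len=4` mirrors the 90 90 90 90 pattern searched above. Runs of 0x90 can also occur in harmless data, so every hit still needs a manual look, as we did for offset 0x6DD0.

```python
def find_nop_runs(data: bytes, min_len: int = 4) -> list:
    """Return (start_offset, length) for every run of at least min_len 0x90 bytes."""
    runs = []
    i = 0
    n = len(data)
    while i < n:
        if data[i] == 0x90:
            start = i
            # Consume the whole run of consecutive NOPs.
            while i < n and data[i] == 0x90:
                i += 1
            if i - start >= min_len:
                runs.append((start, i - start))
        else:
            i += 1
    return runs
```

Printing `[(hex(start), length) for start, length in find_nop_runs(doc_bytes)]` gives a compact list to compare with the editor's search results.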

Before the NOP sled, we noticed the 4-byte value 0x27583C30 (little-endian), the address of a JMP ESP instruction located in MSCOMCTL.OCX on Windows XP SP3. A notable byte string after the NOP sled matched familiar opcodes (PUSHAD and JMP [offset]).
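Because x86 stores values little-endian, the address 0x27583C30 appears in the file as the bytes 30 3C 58 27. The Python sketch below (our own helper, `find_le_dword`) packs the value the same way and searches for it. In a typical stack-overflow exploit of this kind, such an address overwrites a saved return address, so the return lands on JMP ESP and execution continues on the stack where the NOP sled sits; we assume, but have not shown here, that this sample follows that pattern.

```python
import struct

JMP_ESP_MSCOMCTL = 0x27583C30  # JMP ESP address reported above (MSCOMCTL.OCX, XP SP3)


def find_le_dword(data: bytes, value: int) -> list:
    """Return every offset where `value` appears as a 32-bit little-endian DWORD."""
    needle = struct.pack("<I", value)  # 0x27583C30 -> b"\x30\x3C\x58\x27"
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits
```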

To confirm this, we disassembled the hex code starting at offset 0x6E00 with an online disassembler [4]. The code begins with PUSHAD, and its first 0x1F bytes decode the following 0x167 bytes by XORing them with 0xCC.


Figure 5. The shellcode decodes itself by XOR with 0xCC

We extracted the 0x167 bytes starting at offset 0x6E1F (0x6E00 + 0x1F) into a .bin file and used FileInsight [5] to XOR them with 0xCC.
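The same carve-and-XOR step can be scripted instead of done in FileInsight. This Python sketch uses the offsets and key from the analysis above; the function names are our own, and it is an offline equivalent of the decoder stub's effect, not a reproduction of the stub itself. Single-byte XOR is its own inverse, which is why XORing the carved bytes with 0xCC again yields the decoded shellcode.

```python
from pathlib import Path

DECODER_START = 0x6E00  # start of the shellcode (PUSHAD decoder stub)
STUB_LEN = 0x1F         # size of the XOR decoder stub
ENCODED_LEN = 0x167     # number of bytes the stub decodes
XOR_KEY = 0xCC


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice returns the original data."""
    return bytes(b ^ key for b in data)


def carve_and_decode(doc: bytes) -> bytes:
    """Cut the encoded body out of the document and XOR-decode it."""
    start = DECODER_START + STUB_LEN  # 0x6E1F
    encoded = doc[start:start + ENCODED_LEN]
    if len(encoded) != ENCODED_LEN:
        raise ValueError("file too short for the expected shellcode range")
    return xor_bytes(encoded, XOR_KEY)


def dump_decoded(doc_path: str, out_path: str) -> None:
    """Write the decoded bytes to a .bin file for loading into IDA Pro."""
    Path(out_path).write_bytes(carve_and_decode(Path(doc_path).read_bytes()))
```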

We then disassembled the decoded bytes in IDA Pro and, after pressing the “C” key (convert to code) several times, recovered the following:


Figure 6. The “kernel32” string stored on the stack.
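Shellcode often builds strings such as “kernel32” on the stack by pushing 4-byte immediates, so no readable string appears in the file. The figure is not reproduced here, so we cannot confirm that this sample uses PUSH instructions; the Python sketch below simply assumes that common pattern and shows how to turn the pushed values back into text. The helper name `stack_string` is hypothetical.

```python
import struct


def stack_string(pushed_dwords: list) -> str:
    """Rebuild a string from 32-bit immediates, given in the order they were PUSHed.

    The stack grows downward, so the last value pushed sits at the lowest
    address (where ESP points) and holds the start of the string.
    """
    raw = b"".join(struct.pack("<I", v) for v in reversed(pushed_dwords))
    return raw.split(b"\x00", 1)[0].decode("ascii")
```

For instance, the sequence `push 0`, `push 0x32336C65`, `push 0x6E72656B` leaves “kernel32” at ESP, and `stack_string([0, 0x32336C65, 0x6E72656B])` reconstructs it.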

As a result, we confirmed the following about this shellcode:

  • Starting position: offset 0x6E00 of the .DOC file.
  • Size of shellcode: 0x187
  • Decoding: the first 0x1F bytes decode the rest of the shellcode by XOR with 0xCC.

Analyzing shellcode:

We then analyzed the shellcode more thoroughly. There were two possible ways to analyze it dynamically:

  • Method 1: Extract the shellcode by hand, then either convert it into an .exe file with a tool or write a program that jumps into the shellcode.
  • Method 2: Change one byte in the .DOC file to 0xCC (the INT3 breakpoint instruction).

We chose the second method and changed one of the 0x90 (NOP) bytes to 0xCC. We then debugged the .DOC file by loading Microsoft Office 2007 SP3 into IDA Pro on a virtual machine running Windows XP SP3. The debugger stopped at the byte we had changed to 0xCC.
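The byte patch itself is easy to script. This Python sketch (function names are ours) writes a patched copy so the original sample stays untouched, and refuses to patch anything that is not a NOP. The offset is a placeholder for whichever byte inside the NOP sled you pick. Replacing a NOP is convenient because the original instruction did nothing, so after the debugger stops on the INT3 you can continue without restoring it.

```python
from pathlib import Path

INT3 = 0xCC
NOP = 0x90


def patch_breakpoint(doc: bytes, offset: int) -> bytes:
    """Return a copy of the document with the NOP at `offset` replaced by INT3."""
    if doc[offset] != NOP:
        raise ValueError(f"byte at 0x{offset:X} is 0x{doc[offset]:02X}, not a NOP")
    patched = bytearray(doc)
    patched[offset] = INT3
    return bytes(patched)


def patch_file(src: str, dst: str, offset: int) -> None:
    """Write a patched copy of `src` to `dst`, leaving the original sample unchanged."""
    Path(dst).write_bytes(patch_breakpoint(Path(src).read_bytes(), offset))
```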

We set a breakpoint on the RET instruction of the decryption routine and continued debugging. The shellcode decrypted itself with the XOR algorithm and then began its main work.

First, the shellcode parsed the PEB (Process Environment Block) to get the base address of kernel32.dll.
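The disassembly of this step is not reproduced here, so the sketch below assumes the classic 32-bit technique common in XP-era shellcode: read the PEB from FS:[0x30], follow PEB.Ldr (+0x0C) to InInitializationOrderModuleList (+0x1C), skip the first entry (ntdll.dll), and read DllBase (+0x08 from the list link) of the second entry, which is kernel32.dll on Windows XP. Python cannot read FS:[0x30], so the walk runs over a toy memory model; `FakeMemory` and `kernel32_base` are hypothetical names, and each comment gives the equivalent instruction of that classic sequence.

```python
import struct

# 32-bit Windows XP structure offsets (assumed classic layout).
PEB_LDR = 0x0C               # PEB.Ldr
LDR_INIT_ORDER = 0x1C        # PEB_LDR_DATA.InInitializationOrderModuleList
INIT_LINK_TO_DLLBASE = 0x08  # InInitializationOrderLinks (+0x10) -> DllBase (+0x18)


class FakeMemory:
    """Toy 32-bit address space that stores bytes and reads little-endian DWORDs."""

    def __init__(self):
        self.mem = {}

    def write_dword(self, addr: int, value: int) -> None:
        for i, b in enumerate(struct.pack("<I", value)):
            self.mem[addr + i] = b

    def read_dword(self, addr: int) -> int:
        return struct.unpack("<I", bytes(self.mem[addr + i] for i in range(4)))[0]


def kernel32_base(mem: FakeMemory, peb_addr: int) -> int:
    ldr = mem.read_dword(peb_addr + PEB_LDR)              # mov eax, [eax+0x0C]
    first = mem.read_dword(ldr + LDR_INIT_ORDER)          # mov esi, [eax+0x1C]  (ntdll entry)
    second = mem.read_dword(first)                        # lodsd               (next entry: kernel32 on XP)
    return mem.read_dword(second + INIT_LINK_TO_DLLBASE)  # mov eax, [eax+0x08] (DllBase)
```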


After obtaining the base address of kernel32.dll, the shellcode used a decryption function to find the addresses of 6 APIs whose names are stored as encrypted strings.


The decryption function performed the following tasks:

  • Parsing kernel32.dll’s export directory to get the address of its ENT (Export Name Table).
  • Iterating over each API name and decrypting it with the algorithm below.
  • Returning the address of the function whose decrypted name matches the input value.
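The three steps above follow the generic export-lookup pattern, sketched here in Python over a simplified export directory. The actual name algorithm of this sample is in its assembly and is not reproduced in this sketch: `transform_name` and its single-byte XOR with `HYPOTHETICAL_KEY = 0x5A` are placeholders, and all names are ours. The lookup itself follows the PE format: a match at index i in AddressOfNames gives an index into AddressOfFunctions through AddressOfNameOrdinals, and adding the module base turns that RVA into an address.

```python
from dataclasses import dataclass
from typing import List, Optional

HYPOTHETICAL_KEY = 0x5A  # placeholder key; NOT the sample's real algorithm


@dataclass
class ExportDirectory:
    """Simplified PE export directory, with the name RVAs already resolved to strings."""
    base: int                 # module base address (e.g. kernel32.dll)
    names: List[str]          # AddressOfNames
    name_ordinals: List[int]  # AddressOfNameOrdinals: index into `functions`
    functions: List[int]      # AddressOfFunctions: function RVAs


def transform_name(name: str) -> bytes:
    """Hypothetical stand-in for the shellcode's name encoding (single-byte XOR)."""
    return bytes(b ^ HYPOTHETICAL_KEY for b in name.encode("ascii"))


def resolve(exports: ExportDirectory, wanted: bytes) -> Optional[int]:
    """Walk the export names, transform each one and compare it with `wanted`."""
    for i, name in enumerate(exports.names):
        if transform_name(name) == wanted:
            ordinal = exports.name_ordinals[i]                # name index -> function index
            return exports.base + exports.functions[ordinal]  # RVA -> absolute address
    return None
```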

This is the assembly code of the decryption function:

We set a breakpoint after the decryption function and kept tracing, which yielded the following 6 APIs in order:

  • GetFileSize
  • LoadLibrary
  • SetFilePointer
  • ReadFile
  • GetModuleHandle
  • GlobalAlloc

By storing the API names encrypted and decrypting them only at runtime, the shellcode makes analysis more difficult.

After resolving these API addresses, the shellcode allocated memory and retrieved the handle of kernel32. We wondered why the shellcode’s author had repeatedly called the decryption function to resolve API addresses before allocating memory and reading data from the .DOC file.

During the analysis, we found that the shellcode executed additional code stored in the .DOC file (we call this new code shellcode2): it copied 0x7B2 bytes from location 0x1A830 into the allocated buffer and then jumped directly into that buffer.

Because IDA Pro produced errors when we debugged shellcode2, we extracted it with 010 Editor instead and, for convenience, named it the “Dropper”. We saved the Dropper’s bytes into a .BIN file and began analyzing it.
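The 010 Editor extraction can also be scripted. This Python sketch assumes that 0x1A830 is a file offset in the .DOC, which fits the shellcode’s use of SetFilePointer and ReadFile but is not confirmed above. The function names are ours, and the range check makes a wrong offset fail loudly instead of silently producing a short file.

```python
from pathlib import Path

DROPPER_OFFSET = 0x1A830  # location from which the shellcode copies shellcode2 (assumed file offset)
DROPPER_SIZE = 0x7B2      # number of bytes copied into the allocated buffer


def carve(data: bytes, offset: int, size: int) -> bytes:
    """Return data[offset:offset + size], refusing to silently truncate."""
    if offset + size > len(data):
        raise ValueError(f"range 0x{offset:X}+0x{size:X} exceeds file size 0x{len(data):X}")
    return data[offset:offset + size]


def extract_dropper(doc_path: str, out_path: str) -> None:
    """Save the Dropper bytes to a .BIN file for static analysis."""
    Path(out_path).write_bytes(carve(Path(doc_path).read_bytes(), DROPPER_OFFSET, DROPPER_SIZE))
```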

(Part 2 link)
