<img src="https://ws.zoominfo.com/pixel/JrRu3vUM8j33QSR7Bwxw" width="1" height="1" style="display: none;">

PoorWeb - Hitching a Ride on Hangul

Tearing Down HWP Files To Expose a Trojan

Robert Simmons
Blog Author

Robert Simmons, Independent malware researcher and threat researcher at ReversingLabs. Read More...

PoorWeb - Exploiting Document Formats

Hangul Office is a popular office software suite in South Korea. 1 It shares the same compound file format as older versions of Microsoft Office, but has unique features that are abused to form malicious documents. The landscape of this type of attack has been analyzed closely in the VirusBulletin talk "DOKKAEBI: Documents of Korean and Evil Binary". 2 This type of malicious document is the first stage of an attack chain often leading to a PE executable trojan. Here we start with a set of three malicious Hangul Word Processor (HWP) documents targeting one victim organization, each with a slightly different set of stages, but ultimately leading to payloads in one malware family: PoorWeb.3 Pivoting outwards from these three, a large number of related attacks is found. This amount of data can be confusing especially when the attacks are so similar. However, looking at similarities and differences in malware behavior and code, delineations can be drawn between campaigns. Armed with this knowledge, the PoorWeb payloads and the various stages that deliver them are distinct from previous campaigns operated by the same adversary. Within this campaign, the earliest sample 4 was first seen on March 19, 2019 and the most recent sample on September 16, 2020. 5

HWP Summary Information Stream

HWP files use the Microsoft compound file format specification. This means they are made up of various streams. For the samples analyzed here, two types of streams are very important to analyze: "HwpSummaryInformation" and "BinData". The purpose of the former is to store metadata about the document and can include information about the title and author of the document. Starting with one HWP document 6, Figure 1 shows the stream that contains the string "HighExpert" as extracted by Cerbero Profiler.7

HighExpert String in HwpSummaryInfomation Stream

Figure 1: "HighExpert" String in HwpSummaryInfomation Stream

According to an ESTsecurity blog, this string appears in a variety of samples in a campaign dubbed "Operation High Expert".8 As shown below, this string appears in more than one location in a variety of samples related to this one.

The Hwp Summary Information (HSI) stream is derived from the specifications for the Document Summary Information stream seen in other types of compound files.9 Therefore, rather than writing YARA rules that detect text strings, one needs to use hexadecimal strings that include the structure of the HSI property in addition to the string. The rule starts with the text string "HighExpert" in wide ASCII:

"HighExpert" string in wide ASCII

Prepended to these bytes are two components of the HSI property: 1F 00 00 00 and 0B 00 00 00. The former signifies the start of the property record. The latter is the character length of the property data plus one trailing null byte terminator. In the example here, the string "HighExpert" has 10 characters and 0x0B is 11 in decimal. The resulting YARA rule is shown in Figure 2. The rule is provided at the end of the blog.

HighExpert HWP YARA Rule

Figure 2: HighExpert HWP YARA Rule

Using this rule, 11 other HWP files are identified. These files are shown in the Titanium Platform in Figure 3. File hashes for these samples are provided at the end of the blog.

HighExpert HWP Samples

Figure 3: HighExpert HWP Samples

However, these are not a complete picture of all the malicious HWP files that are used to deliver the same payload as the one we started with. To find more, a wider net must be cast. To find those additional files, we start with all the HWP documents in the Titanium Platform file lake that are detected as malicious and search among them for files with the same structure as the ones we just identified. From the list of files above, there are two general types of file. The first type is similar to the sample we started with which has a malicious encapsulated PostScript BinData stream. The second type contains an embedded compound file in a BinData stream that in turn contains the PE executable dropper.

PostScript Shellcode Loader

The first general type of malicious HWP document is one that contains an encapsulated PostScript BinData stream. The stream is zlib compressed, so the first step is to extract it. This step is shown in Figure 4.

Extract Encapsulated PostScript Stream

Figure 4: Extract Encapsulated PostScript Stream

Instructions for analysis of these types of HWP files can be found here10 and here11. Once the encapsulated PostScript (EPS) has been extracted, one can see that there is a large chunk of data encoded as hexadecimal ASCII as well as an XOR key. A truncated image of this EPS is shown in Figure 5.

Encapsulated PostScript with XOR Encoded Data

Figure 5: Encapsulated PostScript with XOR Encoded Data

Next, one needs to decode the XOR encoded data. This can be done using Ghostscript as shown in the blog above by replacing the "exec" instruction with a "print" instruction. The resulting output is a shellcode loader as can be seen in Figure 6 with calls to VirtualProtect and ExitProcess highlighted. This image is also truncated.

PostScript Shellcode Loader

Figure 6: PostScript Shellcode Loader

An alternative method for extracting the shellcode loader is to use CyberChef12 to convert the data from hexadecimal representation and then apply the XOR key. This is shown in Figure 7.

CyberChef XOR Decoding

Figure 7: CyberChef XOR Decoding

In addition to this sample, the other13 two14 HWP files analyzed use variants of this same shellcode loader. The latter, in fact, contains the exact same loader and shellcode as the file analyzed. This loader appears to be a template reused in other malicious HWP files that do not deliver the same eventual PE payload as these three. Figure 8 shows the change in variable names from one of these different samples to the variable names used in the sample analyzed above. The string "label" in one sample is "Y" in the other sample.

PostScript Shellcode Loader Template

Figure 8: PostScript Shellcode Loader Template

An ESTsecurity analysis of the sample15 with the string "label" can be read here.16 The other sample 17 that uses this same template also uses the same variable name "Y", but it does not deliver the same payload as the samples analyzed here.


Across these three samples, there are two different shellcodes loaded by the PostScript analyzed above. One of the two shellcodes is very simple with the download URL visible18. This shellcode is seen in the hex editor Hex Fiend19 in Figure 9. An analysis of this file can be found on ESTsecurity's blog20.

Smaller Shellcode with Visible Download URL

Figure 9: Smaller Shellcode with Visible Download URL

The other shellcode is much larger and more complex. It begins with a tiny decoder stub which decodes the rest of the shellcode in place. It additionally contains two URLs that appear to be download URLs, but are in fact decoys that dress the shellcode up to look like others attributed to a different adversary. Details of this subterfuge found in a similar sample can be read in another ESTsecurity blog here21. These decoy URLs can be seen in Figure 10.

Decoy Download URLs

Figure 10: Decoy Download URLs

So that this shellcode is easier to analyze, it needs to be converted to a full PE executable. One method for making this conversion is detailed here22. The output of this conversion is seen in Figure 11.

Conversion of Shellcode to PE Executable

Figure 11: Conversion of Shellcode to PE Executable

The new executable's process of extracting the rest of the malicious code can best be seen in Figure 12.



Figure 12: Video of Stub Extracting Malicious Code

The next stage is downloaded from a URL hosted on hpc[.]kau[.]ac[.]kr.23 This action is shown in Figure 13. The download is performed by URLDownloadToFileA24.

Download Next Stage

Figure 13: Download Next Stage

Once the download is complete, the shellcode executes the file using the WinExec API. This is MITRE ATT&CK technique "Native API" (T1106)25. This API is executed via a jump rather than a call instruction. This can be seen in Figure 14.

Execution via WinExec

Figure 14: Execution via WinExec

In a lab environment, this shellcode is observed executing the default binary downloaded from Inetsim26. This can be seen in Figure 15.

Shellcode Execution of Dummy Binary in Lab Environment

Figure 15: Shellcode Execution of Dummy Binary in Lab Environment


Returning to the set of HighExpert HWP documents, there is a second pattern in addition to the encapsulated PostScript stream with the shellcode loader seen above. This second pattern is a pure dropper. The first stage PE executable is nested inside a second compound file which is located in another BinData stream. Figure 16 shows this malicious stream with the compound file magic number 0xD0CF11E0A1B11AE1 highlighted. The file shown here has the same PE27 embedded as was downloaded from the URL28 shown above.

BinData Stream with Compound File

Figure 16: BinData Stream with Compound File

Going one more layer deep, this compound file is not very complex, but it does contain a path that includes a user directory name in DOS short name format. The full directory name is almost certainly "HighExpert". These bytes are followed immediately by the malicious PE. Both of these are highlighted in Figure 17. A full list of files from the large group of malicious HWP documents that match either of the two patterns of structure, shellcode/downloader and dropper, is provided at the end of the blog.

Path with Username and Embedded PE

Figure 17: Path with Username and Embedded PE

All of the droppers have a similar execution pattern. One example of this pattern from the embedded PE29 shown above can be seen in Figure 18.

Execution Graph

Figure 18: Execution Graph

Cast a Wide Net

With the malicious HWP files enumerated, the next step is to find more droppers, downloaders, and payloads that are potentially related to the ones found during HWP analysis. One avenue of inquiry is to find files that call out to the same infrastructure as the known samples. Another is to look at files that share one or more AV detection names. The results of one of these infrastructure searches in the Titanium Platform is shown in Figure 19 and an example of the results from a search for threat name is shown in Figure 20.

Domain Search

Figure 19: Domain Search

Threat Name Search

Figure 20: Threat Name Search


Once a wide enough net has been cast, the resulting file set must be whittled down to just the files that have similar behavior or structure to the ones analyzed above. First step is to send all the samples to a sandbox to observe the dynamic behavior. Fortunately, this can be done automatically in the Titanium Platform in the Cuckoo Sandbox feature. Some of the files from this large set have a very similar behavior pattern. One example30 of this is shown in Figure 21. The main difference is the filename of the dropped file, sorvices.exe.

Behavioral Analyses

Figure 21: Behavioral Analyses


Behavioral analysis shows which samples are related, but may not be precisely the same malware. Two variants or revisions of a particular malware may exhibit identical or similar behavior patterns. To really create more accurate groupings, one examines the control flow graphs of the PE executables. To make sure that apples are compared to apples and oranges to oranges, all the control flow graphs are generated using radare2 and Cutter32. Focusing on the main function as detected by radare231 or the very first subroutine with adversary code, a set of thumbnail images were generated and the resulting graphs separated into groups based on similar control flow. For the payload files as identified via automated dynamic analysis, there emerge three basic patterns of control flow. These three graphs are shown in Figure 22.

Control Flow Graph Comparison

Figure 22: Control Flow Graph Comparison

Using this same process on the droppers and downloaders, sets of similar control flow are grouped together. A full list of hashes of these groups are provided at the end of the blog.

Programming Errors

During dynamic analysis of the set of droppers, downloaders, and payloads, a specific, repeated programming error is observed within one set of droppers that all share similar control flow and execution graphs. The very first string that the malware decodes is treated like a download URL and is used as a call parameter to the InternetOpenUrlA33 function. This flaw is divided into two groups. In the first group, the string is a command line command. An example of this in one file34 is shown in Figure 23.

Not a URL

Figure 23: Not a URL

The second pattern is a bit better effort, but is still not a complete URL. This other pattern can be seen in Figure 24 in a different file35.

URL Fragment Not a URL

Figure 24: URL Fragment Not a URL

In the Titanium Platform, these same failed API calls can be surfaced easily by navigating to the Cuckoo Sandbox Behavioral Analysis, and then sorting the resulting table by status. Failed API calls will then be visible at the top of the table. The same two failed API calls shown above can be seen in Figures 25 and 26 in the Titanium Platform.

Failed API Call to InternetOpenUrlA

Figure 25: Failed API Call to InternetOpenUrlA

Failed API Call to InternetOpenUrlA

Figure 26: Failed API Call to InternetOpenUrlA

The file above which has a URL fragment rather than a full URL also is observed to drop a payload that is not in the group 1 according to control flow. This file is the odd one out and the only dropper within its group of similar control flow that drops a payload with a different control flow than the rest in the group. All groups and file hashes are provided at the end of the blog.

Evolution of String Obfuscation

Examining control flow is a decent method for differentiating among samples at a macro level, but one must also dig down to the code level to see what other similarities and differences can be found. As a rule of thumb, loops and code that obfuscates or decodes adversary strings are excellent locations to focus on. With this in mind, there are two basic patterns that can be seen in the groups of samples. Looking first at the group of samples that include the payloads from the HWP files analyzed above (control flow pattern 1), a two stage decoding and deobfuscation process is observed. An example of this process being used to decode a C2 hostname in one file from control flow group 1 is shown in Figure 27.    An example of this process being used to decode a C2 hostname in one file36 from control flow group 3 is shown in Figure 27.

C2 Hostname Decoding Process

Figure 27: C2 Hostname Decoding Process

A very similar process is used in payloads from control flow group 2 and 3, but it is a single XOR step applied to chunks data rather than character by character as seen above. Examples of these are shown in Figures 28 and 29. These two examples are from control flow group 2 and 3 respectively. Both are slight modifications of a single theme.

String Decoding Process

Figure 28: String Decoding Process

String Decoding Process

Figure 29: String Decoding Process

The two-step XOR pattern appears in a few different variants among control flow group 3.


According to the September 2020 Microsoft Digital Defense Report regarding nation state threats, "the most frequently targeted sector has been non-governmental organizations (NGOs), such as advocacy groups, human rights organizations, nonprofit organizations, and think tanks focused on public policy, international affairs, or security."37 This may explain why malware families such as the one analyzed above do not garner as much focus in the media today as do ransomware or attacks targeting critical infrastructure. However, these attacks are very important. Hopefully, the processes shown above can assist in enumerating and categorizing samples from this type of attack with the goal of detecting and preventing future attacks.

Campaign Indicators38

HWP Documents (Droppers)


HWP Documents (Downloaders)


Executables (Downloaders)


Executables (Droppers)


Executables (Payloads)


Outgroup Indicators

Executables (Droppers)


Executables (Payloads)


Executables (Payloads)


YARA Rules

rule HighExpert_HWP_HSIProp
author = "Malware Utkonos"
date = "2020-08-24"
description = "HWP summary information property entry in malicious Hangul Word Processor document: Operation High Expert."
reference = "https://blog.alyac.co.kr/2226"
$a = { 1F 00 00 00 0B 00 00 00 48 00 69 00 67 00 68 00 45 00 78 00 70 00 65 00 72 00 74 00 }
uint32(0) == 0xE011CFD0 and uint32(4) == 0xE11AB1A1 and $a

2  https://www.virusbulletin.com/conference/vb2018/abstracts/dokkaebi-documents-korean-and-evil-binary
3  https://malpedia.caad.fkie.fraunhofer.de/details/win.poorweb
4  fa7c09036e545cb4898df21e284d81aded9d1d86e85af899bfb14d16a19b625c
5  ec8cf2570f869c897ca9d898279d10b9c3b137eb4db6b7d68c7f524dc5332af9
6  ea91f1e475ab4eb54971a0e7adbd61d690136abf1b2ab76b94a246515f65b9a3
7  https://cerbero-blog.com/
8  https://blog.alyac.co.kr/2226
9  https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-doc/7dc15eb9-c84d-4eb5-844b-0e78e072214f
10  https://norfolkinfosec.com/how-to-analyzing-a-malicious-hangul-word-processor-document-from-a-dprk-th
11  https://www.fortinet.com/blog/threat-research/debugging-postscript-with-ghostscript
12  https://github.com/gchq/CyberChef
13  aa461e70ab464a503d1e647e693df7ececfff86d497ee0c57c9448302878a05a
14  6f111be4a0cb4f033639f906f512b7feb1632630bec58285c9cd5969ae8a6ff3
15  7c5db78537f3a28b9bcfe8f75e86c36038e5929d2206d59827fca4fb524d41c1
16  https://blog.alyac.co.kr/m/2336
17  2196a88f27c3f813e5b359b9be31ed5122a678bcec447827a98e5f2078b2d666
18  hxxp[://]ub-farm[.]com/admin/tmp/banner.gif
19  https://ridiculousfish.com/hexfiend/
20  https://blog.alyac.co.kr/2281
21  https://blog.alyac.co.kr/2453
22  https://www.hexacorn.com/blog/2015/12/10/converting-shellcode-to-portable-executable-32-and-64-bit/
23  hxxp[://]hpc[.]kau[.]ac[.]kr/rolling_banner/tmp4c5ae3[.]p3a
24  https://docs.microsoft.com/en-us/previous-versions/windows/internet-explorer/ie-developer/platform-apis/
25  https://attack.mitre.org/techniques/T1106/
26  https://www.inetsim.org/
27  4b249546ff2cab9ea49a98a10b200f7ebef76a5de116cdf31af31a045e743bfa
28  hxxp[://]hpc[.]kau[.]ac[.]kr/rolling_banner/tmp4c5ae3[.]p3a
29  73d65cd0b513cadaaa76b559ada28996eb06b68954538fc628e03893f5ff85e8
30  ce5cbdc387a4b988b8ed3caacc4ac2414e80258d2ee9b7889d6064c4cb436a23
31  https://github.com/radareorg/radare2
32  https://github.com/radareorg/cutter
33  https://docs.microsoft.com/en-us/windows/win32/api/wininet/nf-wininet-internetopenurla
34  7510b511093b09fe2bb0e9f7b60b80a40fabde9a6914842e10cf702b51393298
35  dc69b98da87a7b6b683359082b63d1e945cbef17a32d432d11c5f488399460ab
36  90f0582453f49d3b38da03b289d0ffcad4f691ab89f6acb922511e081d472667
37  https://blogs.microsoft.com/on-the-issues/2020/09/29/microsoft-digital-defense-report-cyber-threats/
38  This grouping of indicators is based on the payloads of type 1 and the stages that deliver them.
39  This file is only associated with the campaign through OSINT, not behavioral observation. The two
sources of this association are https://otx.alienvault.com/pulse/5c99ee9aa9e60a68a9b55aec and
40  This file is structurally different from the majority of the others in this grouping.
41  Ibid.

Watch our Webinar: Understanding attacks like Ryuk before it’s too late