Writing detailed YARA rules for malware detection

Laura Dabelić, Threat analyst at ReversingLabs

New malware appears or evolves daily, so updating tools like YARA rules for detection is critical. Here's how my research team develops YARA rules.

The purpose of YARA rules is to improve our methods of malware detection. New malware families appear and evolve every day, so it is important to give our clients tools to protect themselves. This is why the ReversingLabs threat research team continually writes YARA rules: to deliver an open-source, working tool that detects the latest malware families.

The rules must also be as precise and detailed as possible to minimize false positives. High-quality YARA rules allow our clients to keep their defenses up to date, giving them the best chance of preventing security incidents.

Since the threat landscape is constantly changing, the research team at ReversingLabs continuously updates the company's public YARA rules repository on GitHub with new and current threats. This blog post describes how we write our high-quality YARA rules, using the YARA rule for the GwisinLocker ransomware as an example.

Choose the target

Writing high-quality YARA rules is a time-consuming process, which means that our team must choose its battles. There are many criteria for choosing a malware family, but the samples chosen for analysis are usually those known to have a big impact and to be highly prevalent in the threat landscape, such as:

  • New ransomware which targets big companies and businesses
  • Destructive wipers used as a means of cyber warfare
  • Spyware and backdoors used by various APT groups

Most malware in the modern threat landscape is packed with custom or off-the-shelf packers to make analysis and signature matching harder. This is why our team checks whether samples are packed before writing a YARA rule. YARA rules should match the malicious code, not the packing layer, so we write them with the second, unpacked layer in mind. This also makes them suitable for deployment in dynamic analysis solutions for runtime inspection. ReversingLabs TitaniumCore can automatically unpack more than 250 executable packer formats.

When malware is packed by an unidentified custom packer, the unpacking must be done manually. This typically involves using a debugger to analyze the packer layer, identify where execution is passed to the second layer, and extract the payload. One common technique that packers use to execute the packed code is process injection, which comes in several variants, including self-injection, PE injection, and process hollowing. All of these variants can be recognized by the typical pattern of API calls that must occur during unpacking.

In a nutshell, the process into which the packer injects the payload needs to be created or opened (using the CreateProcess or OpenProcess APIs). Additional memory in the process might then be allocated with VirtualAllocEx and populated with the payload by using WriteProcessMemory. Other APIs might also be invoked, including:

  • VirtualProtectEx
  • ReadProcessMemory
  • CreateRemoteThread
  • ResumeThread
  • NtResumeThread

Execution then resumes with the malicious payload in place of the original process’s contents. The payload can be obtained from memory in several ways, and it’s important to dump it in an executable format for later analysis.
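The API pattern described above can be sketched as a simple ordered check over a call trace. This is only an illustration, not our tooling: the trace format (a plain list of API names, as a debugger or sandbox log might provide) and the function names are assumptions, and allocation with VirtualAllocEx is treated as optional because it does not always occur.

```python
# Hypothetical sketch: spot the create/open -> write -> resume sequence
# in a list of API names. The trace format is an assumption.
STAGES = [
    ("open target", {"CreateProcess", "OpenProcess"}),
    ("write payload", {"WriteProcessMemory"}),
    ("start execution", {"CreateRemoteThread", "ResumeThread", "NtResumeThread"}),
]


def injection_stages_seen(trace):
    """Return the stage names matched in order while walking the trace."""
    seen = []
    stage = 0
    for call in trace:
        if stage < len(STAGES) and call in STAGES[stage][1]:
            seen.append(STAGES[stage][0])
            stage += 1
    return seen


def looks_like_injection(trace):
    """True when every stage was observed, in order."""
    return len(injection_stages_seen(trace)) == len(STAGES)


if __name__ == "__main__":
    trace = ["CreateProcess", "VirtualAllocEx", "WriteProcessMemory", "ResumeThread"]
    print(injection_stages_seen(trace), looks_like_injection(trace))
```

A match only tells the analyst where to set breakpoints; the payload still has to be dumped and checked by hand.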

To make sure that we don't duplicate effort, every unpacked sample is matched against our entire YARA signature collection, to see which, if any, patterns are matched. This enables us to easily track novel malware, as well as new malware versions.

Do detailed, in-depth analysis

Every malware family has its own characteristics and set of behaviors. The way these are implemented in the code differs from one malware family to another. However, the behavior of malware types (like ransomware or backdoors, among others) can usually be described by a set of common actions that all malware families of a certain type share. For ransomware like GwisinLocker, the behaviors we are interested in are:

  • Finding the files
  • Encrypting files
  • Dropping the ransom note
  • Establishing a remote connection with the C2 server
  • Decrypting the malware configuration

One of the more interesting behaviors we found in GwisinLocker is that it shuts down VMware ESXi machines before encryption. The code that implements this behavior can be seen in the picture below. The constants visible in the picture are stack strings, which are used as a method of obfuscation. They represent the following command:

esxcli vm process kill --type=force --world-id="[ESXi] Shutting down - %s"

Stack strings are a method of obfuscation in which a string is built on the stack one or a few characters at a time. The purpose of this technique is to confuse the reverse engineer and slow down the reversing process. We will use this part of the code to create a behavior-focused pattern. Hardcoded stack strings are a good choice for a byte pattern because they make the pattern more unique and specific, which reduces the probability of false positives once the YARA rule is deployed. The created pattern can be seen in the picture below.
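To illustrate why stack strings suit byte patterns, here is a minimal sketch. It assumes the simplest possible case, one x86 `mov byte ptr [esp+disp8], imm8` instruction (bytes `C6 44 24 disp imm`) per character. The actual GwisinLocker code may move several characters per instruction, and the function names and starting offset are hypothetical.

```python
def stack_string_code(text, first_offset=0x10):
    """Encode one `mov byte ptr [esp+disp8], imm8` per character of text."""
    code = bytearray()
    for i, ch in enumerate(text.encode("ascii")):
        disp = first_offset + i
        if disp > 0x7F:
            raise ValueError("offset does not fit this sketch's positive disp8")
        code += bytes([0xC6, 0x44, 0x24, disp, ch])
    return bytes(code)


def to_hex_pattern(code, mask_displacement=True):
    """Render the instructions as a YARA-style hex string.

    The stack offset is masked with ?? because it can change between
    builds, while the character bytes stay fixed and act as anchors.
    """
    tokens = []
    for i in range(0, len(code), 5):
        op, modrm, sib, disp, imm = code[i:i + 5]
        tokens += [
            f"{op:02X}", f"{modrm:02X}", f"{sib:02X}",
            "??" if mask_displacement else f"{disp:02X}",
            f"{imm:02X}",
        ]
    return "{ " + " ".join(tokens) + " }"


if __name__ == "__main__":
    code = stack_string_code("esxcli")
    print(b"esxcli" in code)  # the plain string never appears contiguously
    print(to_hex_pattern(code))
```

The string is invisible to a plain string search, yet every character survives as a fixed immediate byte that a hex pattern can match.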

This small rule, which represents the behavior-focused pattern, is evaluated against samples in our cloud to identify other potentially interesting samples with similar behavior that might have been missed during the initial sample collection stages. The results should be analyzed to see how similar (or different) the matched samples are. The possible conclusions derived from this step are:

  • The samples are very similar. This means that we are on the right track and that they probably belong to the same malware family.
  • The samples are notably different. This means that the code pattern is not unique to the malware being analyzed, or it might be a part of a common library which is reused among different malware. Either way, the pattern needs to be expanded with more specific data, or supplemented with other parts of the code which are more unique to this malware family.

YARA rule structure matters

Every rule consists of the "meta" section, the "strings" section, and the "condition" section. They are described in detail below.

The meat of the 'meta' section

Every rule needs to have a "meta" section, which is divided into two parts:

CCCS YARA metadata

We've decided to conform to the publicly available CCCS YARA validator. The specification requires several fields to be present, the most important of which are "sharing" and "malware." The "sharing" field describes the sharing limitations of the YARA rule; the value "TLP:WHITE" means that the rule can be freely distributed. The "malware" field identifies what the rule detects: our rules cover samples that belong to the "MALWARE" category, and the field holds the malware family name.

  • author - Always set to "ReversingLabs"
  • source - Always set to "ReversingLabs"
  • status - Always set to "RELEASED"
  • sharing - Always set to "TLP:WHITE"
  • category - Always set to "MALWARE"
  • malware - Malware family name, in uppercase in the form MALWAREFAMILYNAME
  • description - Always needs to begin with "YARA rule that detects...", only the malware family name and malware family type are changed

If you're interested in the more detailed explanations of the fields, you can check out the CCCS YARA standard configuration page, and see how they’re used in our public YARA rules.

TitaniumCore-specific YARA metadata

ReversingLabs’ YARA rules are one of TitaniumCore’s classification methods, and they supplement more complex classifiers for added protection. In order for TitaniumCore to correctly classify files using YARA rules, additional metadata must be present. The required metadata has the following structure:

  • tc_detection_type - MalwareFamilyType from the rule name
  • tc_detection_name - MalwareFamilyName from the rule name
  • tc_detection_factor - Usually set to 5, but often depends on the threat type

The example of the "meta" field for the GwisinLocker ransomware can be seen in the following image:

Another example can be seen in the YARA rule for the HermeticWiper malware which was covered in one of our previous From the Labs blog posts.
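As a rough sketch, the two metadata groups described above could be assembled as follows. The helper names `build_meta` and `render_meta` are hypothetical, and the exact description wording and indentation are assumptions based only on the field conventions listed in this post, not on the published rule.

```python
def build_meta(family_name, family_type, detection_factor=5):
    """Collect the CCCS fields and the TitaniumCore fields for one rule."""
    return {
        # CCCS YARA metadata
        "author": "ReversingLabs",
        "source": "ReversingLabs",
        "status": "RELEASED",
        "sharing": "TLP:WHITE",
        "category": "MALWARE",
        "malware": family_name.upper(),
        "description": f"YARA rule that detects {family_name} {family_type.lower()}.",
        # TitaniumCore-specific metadata
        "tc_detection_type": family_type,
        "tc_detection_name": family_name,
        "tc_detection_factor": detection_factor,
    }


def render_meta(meta, indent="    "):
    """Render the fields as the text of a YARA meta section."""
    lines = ["meta:"]
    for key, value in meta.items():
        # Integers are written bare, strings are quoted.
        rendered = str(value) if isinstance(value, int) else f'"{value}"'
        lines.append(f"{indent}{key} = {rendered}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_meta(build_meta("GwisinLocker", "Ransomware")))
```

Generating the fixed fields from one place keeps them identical across rules, so only the family name, family type, and detection factor need review.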

The "strings" section

As analysts, we commonly need to update each other’s rules, and we must be mindful of how fast the rules are evaluated, given the millions of files TitaniumScale processes daily. There are some good practices which should be followed to improve the readability and evaluation speed of a YARA rule:

  • Standardize the indentation and be consistent with it. For example, if you use one tab for the indentation, make sure it applies in all your rules.
  • Break longer patterns into several sequentially named subpatterns (e.g. $encrypt_files_p1, $encrypt_files_p2, ...).
  • The pattern shouldn't start or end with optional, masked bytes (question marks).
  • Use patterns with longer sequences of exact (non-optional) bytes, as they serve as anchors.
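The pattern guidelines above can be checked mechanically. The following is a minimal sketch under stated assumptions: patterns are given as space-separated hex tokens without braces, alternatives such as `( A | B )` are not handled, and the four-byte anchor threshold and all function names are hypothetical choices for illustration.

```python
import re


def is_exact(token):
    """True for a fully specified byte such as 4D (not ??, 4?, or [2-4])."""
    return re.fullmatch(r"[0-9A-Fa-f]{2}", token) is not None


def longest_exact_run(tokens):
    """Length of the longest run of consecutive exact bytes."""
    best = run = 0
    for token in tokens:
        run = run + 1 if is_exact(token) else 0
        best = max(best, run)
    return best


def lint_pattern(hex_pattern, min_anchor=4):
    """Return a list of guideline violations for one hex pattern."""
    tokens = hex_pattern.split()
    if not tokens:
        return ["pattern is empty"]
    problems = []
    if not is_exact(tokens[0]):
        problems.append("starts with a masked byte")
    if not is_exact(tokens[-1]):
        problems.append("ends with a masked byte")
    if longest_exact_run(tokens) < min_anchor:
        problems.append(f"longest exact run is shorter than {min_anchor} bytes")
    return problems


def split_pattern(name, hex_pattern, max_tokens=8):
    """Split a long pattern into $name_p1, $name_p2, ... subpatterns."""
    tokens = hex_pattern.split()
    parts, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        # Move the cut forward until both sides of it are exact bytes,
        # so no subpattern starts or ends with a masked byte.
        while end < len(tokens) and not (is_exact(tokens[end - 1]) and is_exact(tokens[end])):
            end += 1
        parts.append(tokens[start:end])
        start = end
    return {f"${name}_p{i}": "{ " + " ".join(p) + " }" for i, p in enumerate(parts, 1)}
```

Note that splitting drops the adjacency between the parts, so the condition has to require all of the subpatterns together.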

The example of correctly written and split patterns is the kill_processes pattern from the GwisinLocker YARA rule, which can be seen in the following image:

The "condition" section

The rules are evaluated on PE and ELF files, so the "magic" bytes at the beginning of each file need to be checked:

  • uint16(0) == 0x5A4D - The "MZ" header for the PE files
  • uint32(0) == 0x464C457F - The "\x7FELF" header for the ELF files
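The two constants look reversed because YARA's uint16 and uint32 functions read little-endian integers. The short sketch below reproduces the same comparisons with Python's struct module; the function name `file_kind` is hypothetical.

```python
import struct


def file_kind(header):
    """Classify a file by its first bytes, mirroring the two magic checks."""
    # "<H" reads a little-endian 16-bit value: bytes 4D 5A ("MZ") become 0x5A4D.
    if len(header) >= 2 and struct.unpack_from("<H", header, 0)[0] == 0x5A4D:
        return "PE"
    # "<I" reads a little-endian 32-bit value: bytes 7F 45 4C 46 become 0x464C457F.
    if len(header) >= 4 and struct.unpack_from("<I", header, 0)[0] == 0x464C457F:
        return "ELF"
    return "other"


if __name__ == "__main__":
    print(file_kind(b"MZ\x90\x00"))
    print(file_kind(b"\x7fELF\x02\x01\x01"))
```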

When writing conditions, the team uses a whitespace-heavy style, to keep the rules consistent and readable. Additionally, we split the blocks by logical operators, to make it visually easy to see how the patterns are grouped. This organization makes it easy to troubleshoot and fix signatures as new versions appear, without compromising the logical validity of the condition.

The example of the GwisinLocker condition can be seen in the picture below. The first group of conditions covers the 32-bit version of the ransomware, while the second group covers the 64-bit version.
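A small sketch of the whitespace-heavy layout follows, with one block per group and one pattern per line. The pattern names are hypothetical placeholders rather than the names used in the published rule, and the use of the ELF magic check here is an assumption made for the example.

```python
def format_condition(magic_check, groups, indent="    "):
    """Render a condition with groups split by logical operators."""
    lines = ["condition:", f"{indent}{magic_check} and", f"{indent}("]
    for gi, group in enumerate(groups):
        lines.append(f"{indent * 2}(")
        for pi, term in enumerate(group):
            # Every term except the last in a group is followed by "and".
            joiner = " and" if pi < len(group) - 1 else ""
            lines.append(f"{indent * 3}{term}{joiner}")
        # Groups are alternatives, so they are joined with "or".
        closer = ") or" if gi < len(groups) - 1 else ")"
        lines.append(f"{indent * 2}{closer}")
    lines.append(f"{indent})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_condition(
        "uint32(0) == 0x464C457F",
        [
            ["$find_files_32", "$encrypt_files_32"],  # 32-bit group (hypothetical names)
            ["$find_files_64", "$encrypt_files_64"],  # 64-bit group (hypothetical names)
        ],
    ))
```

Keeping each group in its own parenthesized block means a new version can usually be covered by adding one more block, without touching the existing ones.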

YARA rules: A continuous process

Threat actors keep developing the malware in their arsenal, and the ReversingLabs research team continuously monitors the threat landscape for new versions that our existing YARA rules do not cover. When a new version is discovered, the process outlined in this post is repeated. The YARA rule is then updated to keep pace with the new threats in the never-ending cat-and-mouse game known as malware analysis.