March 2, 2021

Malware in images: When you can’t see 'the whole picture'

Blog Author

Karlo Zanki, Reverse Engineer at ReversingLabs

Introduction

Malicious actors often want to extract information of interest from targeted computer environments. To achieve this goal, they usually plant some kind of software that provides that information continuously. Historically, the most common way of doing that was to plant an executable file and make it run. Over time, defensive systems improved and became more successful at detecting such executable implants. In this cat-and-mouse game, both sides try to improve their tools and, as defensive tools get better, malware actors try to find new ways of smuggling malicious software into a system. There are several popular ways of doing this, such as embedding malicious code into various document formats or executing malicious code in memory without saving anything to disk. As time passes, security solutions are becoming increasingly aware of such threats.

ReversingLabs continuously improves its malware-detection capabilities. One of the more novel methods that caught our eye is hiding malware inside image formats like PNG, BMP, GIF or JPEG. Recently, we enhanced our platform support for unpacking these image formats, and listed some of those improvements in one of our previous blog posts.

In this blog, we will demonstrate how these new enhancements can be used to discover novel malware threats and showcase several examples of images with hidden PHP executable content. Most of them try to fetch additional resources from a remote server and use different kinds of obfuscation to hide their malicious intent. As an example of threat hunting via this new functionality, we will also show a hidden web shell which led to the discovery of a vulnerable website.

Malware hiding in images

Image formats are interesting to malware authors because they are generally considered far less harmful than executable files. Images can be used to deploy malware in combination with a dropper, where the dropper acts as a benign executable which parses malicious content hidden inside of an image.

One area where this technique can be used is web uploads. Many websites allow image uploads but fail to properly filter out executables and scripts. In such cases, malicious code can be packed into an image and uploaded to a web server containing a potential vulnerability which enables execution of its contents. Probably the most familiar type of such payloads are PHP web shells.

Threat actors discover and exploit vulnerabilities in applications used to parse image formats. To remain undetected and avoid attracting the attention of security tools, they typically try to create files which adhere to the image format specification whenever possible.

The simplest way to embed malicious content into an image is to append it to the image end, or, as it’s commonly referred to, the overlay. Malicious actors typically just take a benign image file and append some content. This makes it a well-known method that is quite easy to detect.

For example, in the case of a GIF file, all bytes after the GIF’s trailer byte (0x3B) can be considered an overlay. In the case of a PNG file, everything after the end of the IEND chunk can be considered an overlay. This is conceptually the same as appending content to any other regular file format, so we won’t go into more details about overlays in this blog post.
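The PNG case described above can be sketched in a few lines of Python. This is a minimal illustration rather than the parser used by any product: it walks the chunk list until IEND and returns whatever follows. The function name `png_overlay` and the helper `_chunk` are hypothetical, and the demo file contains only the chunk layout, with no pixel data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_overlay(data: bytes) -> bytes:
    """Return the bytes that follow the IEND chunk, or b"" if there are none."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        offset += 8 + length + 4
        if chunk_type == b"IEND":
            return data[offset:]
    raise ValueError("IEND chunk not found")


def _chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk (hypothetical helper, used only for the demo)."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


# Demo: IHDR + IEND only, then some text appended after the end of the image.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
clean_png = PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IEND", b"")
sample = clean_png + b"<?php /* appended */ ?>"
print(png_overlay(sample))
```

A GIF needs more care than searching for the first 0x3B byte, because that value can also occur inside image data; the blocks have to be walked in the same way the chunks are walked here.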

EXIF tags are another interesting place to look for malware when analyzing image samples. These tags are metadata fields used to store additional descriptive data about the image, like the model of the camera used to take the picture, the date and time when the picture was taken, or even the geolocation of the place where the image was taken. This data is part of the image format, but it isn’t required for the image's visual interpretation, and some image viewers opt not to present all of these tags to users, which makes them a great hiding place.

Malware in PHP EXIF tags

Titanium Platform uses a proprietary parser to extract these metadata fields and makes it possible to search for images based on their EXIF metadata content. Therefore, an interesting starting point is searching for images containing PHP code in their EXIF tags. Even such a simple search query provides a few worthy results.

Using A1000’s advanced search to find samples with PHP code in EXIF tags
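For readers without access to such a search, a rough local approximation is possible. The sketch below is not the Titanium Platform parser: it walks the JPEG header segments, picks the APP1 segment that carries EXIF data, and does a raw byte search for PHP opening tags. It does not parse the individual tags, so it cannot say which tag holds the code. All function names and the marker list are assumptions made for illustration.

```python
import struct

PHP_MARKERS = (b"<?php", b"<?=")


def jpeg_segments(data: bytes):
    """Yield (marker, payload) for each header segment up to start-of-scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    offset = 2
    while offset + 4 <= len(data):
        if data[offset] != 0xFF:
            break  # lost sync; stop rather than guess
        marker = data[offset + 1]
        if marker in (0xDA, 0xD9):  # SOS (compressed data follows) or EOI
            break
        # The 2-byte length field counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", data[offset + 2:offset + 4])
        yield marker, data[offset + 4:offset + 2 + length]
        offset += 2 + length


def exif_contains_php(data: bytes) -> bool:
    """True if an EXIF (APP1) segment contains a PHP opening tag."""
    for marker, payload in jpeg_segments(data):
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            if any(tag in payload for tag in PHP_MARKERS):
                return True
    return False


def _segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG segment (hypothetical helper, used only for the demo)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Demo: two header-only files, one with PHP text inside the EXIF segment.
clean = b"\xff\xd8" + _segment(0xE1, b"Exif\x00\x00Canon") + b"\xff\xd9"
infected = b"\xff\xd8" + _segment(0xE1, b"Exif\x00\x00<?php echo 1; ?>") + b"\xff\xd9"
print(exif_contains_php(clean), exif_contains_php(infected))
```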

The first one in the list is a file with the b497e231d19934c5d96853985bdbc147589a9a77 SHA1 hash. Analyzing its Artist and ImageDescription EXIF tags reveals PHP code getting content from a URL. Even though this isn’t necessarily malicious behaviour as it can have some legitimate uses, it is also quite possible that such PHP code could be used to fetch malicious content from a C2 server.

EXIF data from b497e231d19934c5d96853985bdbc147589a9a77 sample

The second one in the list is a file with the 518178bdd959ca17eca15777d38499bc9f3d95ad SHA1 hash. It has quite a large code snippet in its Copyright tag. At the beginning of the snippet are some manipulations of PHP comments. It seems that the code is trying to conceal the fact that it is a PHP script by commenting out the PHP opening tag. Regular PHP parsers would ignore the comment opening before the tag. However, some tools might have a different implementation of the PHP parsing algorithm.

EXIF data from 518178bdd959ca17eca15777d38499bc9f3d95ad sample

A detailed look at the code reveals that it tries to open a socket to a specific IP address and port, then uses that socket to fetch a stream of data and execute it with the eval() function afterwards. Even though the IP address is from the private IP range, it is hard to imagine a legitimate reason for embedding this kind of code into an EXIF tag of an image. Such a sample could be used for lateral movement by a malicious actor within the private network after getting the initial foothold.

The third example is a file with the 1c308589a493469416df53acaa75a7fd4aed7e65 SHA1 hash. The only EXIF metadata it has is a Copyright tag. It is obvious that this is a specifically chosen sequence of bytes. A bit of googling provides a quick answer: this PHP code was used in the past to check if a server is vulnerable to file inclusion attacks, mainly on sites using Content Management Systems like Joomla or WordPress.

EXIF data from 1c308589a493469416df53acaa75a7fd4aed7e65 sample

While this PHP code on its own is detected by the majority of security tools, hiding it inside an image drastically reduces the detection rate. It is understandable why the detection rate was low 10 years ago when this code was first spotted in the wild, but the problem is that the detection rate for this type of code smuggling hasn’t significantly improved over the years.

How a packed web shell led to a vulnerable website

Previous samples were found using the Titanium Platform’s advanced search engine, but another way of finding interesting files is by using the ReversingLabs YARA Retrohunt feature.

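rule image_eval_hunt
{
   strings:
      // Magic bytes found at the start of common image formats
      $png = {89 50 4E 47}
      $jpeg = {FF D8 FF}
      $gif = "GIF"
      // PHP eval call, which is not expected in multimedia files
      $eval = "eval("
   condition:
      // File starts with an image magic and contains "eval(" anywhere
      (($png at 0) or ($jpeg at 0) or ($gif at 0)) and $eval
}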

YARA rule for hunting samples with eval call

The provided YARA rule matches samples that start with one of the magic byte sequences characteristic of image formats and also contain the string “eval(”, meaning they potentially have a call to an eval function somewhere in the image content, which isn’t expected in multimedia files. TitaniumCloud YARA Retrohunt returns quite a few samples, and after analyzing the results, two interesting ones emerge. Both of these samples have PHP code in a regular image segment.
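For quick triage of a local folder where YARA is not at hand, the rule's condition can be mirrored in plain Python. This is only an equivalent of the logic shown in the rule, not a replacement for Retrohunt; the function name is hypothetical.

```python
# The same three magic sequences the rule anchors at offset 0.
IMAGE_MAGICS = (b"\x89PNG", b"\xff\xd8\xff", b"GIF")


def matches_image_eval_hunt(data: bytes) -> bool:
    """True if data starts with an image magic and contains "eval(" anywhere."""
    return data.startswith(IMAGE_MAGICS) and b"eval(" in data


print(matches_image_eval_hunt(b"GIF89a...<?php eval($_POST['x']); ?>"))
print(matches_image_eval_hunt(b"GIF89a plain image bytes"))
```

Like the rule itself, this is a broad hunting filter: the five bytes of "eval(" can appear by chance in compressed image data, so hits still need manual review.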

The first one is a file with the e3a64475e1272f34fe8a9043b486d60595460aa2 SHA1 hash.

ReversingLabs A1000 - Sample summary

The summary shows that this is a JPEG image and that Titanium Platform detected and extracted an additional file from it. A quick examination of the extracted file shows that it is recognized as a Text/PHP file, and its content can be viewed with the A1000’s Preview Sample feature. This simple PHP script first decodes a base64-encoded string and then calls the eval function on the decoded content.

Content preview of the segment extracted from the image sample
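The decode-then-eval pattern described above can be unwrapped statically, without ever running the script. The following sketch assumes the encoded data appears as a quoted string literal passed directly to base64_decode(); the regular expression and function name are illustrative and would miss variants that build the string in pieces.

```python
import base64
import re

# Matches base64_decode('...') or base64_decode("...") and captures the literal.
B64_CALL = re.compile(rb"""base64_decode\(\s*(['"])([A-Za-z0-9+/=\s]+)\1\s*\)""")


def decode_base64_layers(php_source: bytes) -> list:
    """Decode every base64_decode() string literal without executing anything."""
    return [base64.b64decode(m.group(2)) for m in B64_CALL.finditer(php_source)]


# Demo with a harmless stand-in payload, not the code from the sample.
inner = b"echo 'hello';"
layer = b'<?php eval(base64_decode("' + base64.b64encode(inner) + b'")); ?>'
print(decode_base64_layers(layer))
```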

The base64 encoded string can be decoded with a handy tool called CyberChef. This operation leads to more obfuscated PHP code which can be seen in the following image.

Result of the first layer base64 string decoding

The code above performs self-deobfuscation and results in yet another layer of obfuscated code. This obfuscation method inserts the ‘I’ character at random places in some of the string literals, and inserts the ‘an’ character sequence to hide the create_function string.
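Undoing this kind of filler insertion amounts to a substring removal. The sketch below assumes the script restores its strings by simply deleting every occurrence of the filler, which only works when the filler never occurs in the original string. The obfuscated strings are made-up examples, not literals taken from the sample.

```python
def strip_filler(obfuscated: str, filler: str) -> str:
    """Remove every occurrence of the filler sequence from the string."""
    return obfuscated.replace(filler, "")


# Hypothetical inputs built for illustration.
print(strip_filler("craneate_fanunctanion", "an"))  # create_function
print(strip_filler("baIse6I4_deIcode", "I"))        # base64_decode
```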

The simplest way to deobfuscate such PHP code is to copy/paste it to a PHP sandbox and replace the last line of code with an echo on the $H variable. This will print out the deobfuscated code.

Result of deobfuscating the second layer code

This is the last layer. It takes the raw data that follows the HTTP headers of the HTTP request and tries to find content delimited by the values specified in the $kh and $kf variables. When the regular expression matches, it takes the content between the delimiters, decodes it via base64, and passes it as an argument to function x, which performs simple XOR decryption on it. The output of all these operations is a compressed stream which is decompressed and then executed by the eval function.
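An analyst who has captured a request body and recovered the delimiters and key can replay these steps offline. The sketch below follows the order described above, with two assumptions that the text does not confirm: the XOR uses a repeating key, and the compressed stream is in zlib format. The delimiter and key values in the demo are invented, and the final eval step is deliberately left out.

```python
import base64
import re
import zlib


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same call encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decode_request_body(body: bytes, kh: bytes, kf: bytes, key: bytes):
    """Return the decoded payload carried between kh and kf, or None if absent."""
    pattern = re.escape(kh) + rb"(.+?)" + re.escape(kf)
    match = re.search(pattern, body, re.DOTALL)
    if match is None:
        return None
    encrypted = base64.b64decode(match.group(1))
    return zlib.decompress(xor_bytes(encrypted, key))


# Demo round trip with hypothetical delimiters, key, and a harmless payload.
kh, kf, key = b"#BEGIN#", b"#END#", b"k3y"
payload = b"phpinfo();"
wrapped = base64.b64encode(xor_bytes(zlib.compress(payload), key))
body = b"a=1&junk" + kh + wrapped + kf + b"more junk"
print(decode_request_body(body, kh, kf, key))
```

Returning the bytes instead of executing them keeps the analysis static; the decoded commands can then be reviewed as text.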

Besides information on the functionality of the code embedded within the image, Titanium Platform also provides a way to find the origin of a sample. Looking at the sources from which this sample was acquired, an interesting URL reveals itself.

ReversingLabs A1000 - Source of the sample

This sample can be found in the wild at a live web location, behinburg.com. This address hosts a legitimate-looking website of an Iranian travel agency. The URL path contains an interesting directory structure with an “uploads” directory that allows unrestricted access to content uploaded by users. The website also doesn’t try to restrict unauthorized users from exploring the directory structure. The directory where this image was located contained 35 other files uploaded between the 11th and 12th of December 2020. They all contained some kind of a web shell and were obviously used in an attempt to compromise this server.

Directory listing

Using our telemetry, we weren’t able to conclude whether the attacker’s attempt was successful. However, in this case the attacker didn’t need any additional privileges to get sensitive data. In the same upload directory, besides the files uploaded by the attacker, a lot of images containing passport scans could be found. This travel agency enables its users to apply for Iranian visas through its web page. In order to apply, users need to upload a passport scan through the web form.

Part of the visa application form

The uploaded images appear to be kept on the server for an indefinite time. It is a very poor security practice to keep unprotected and unencrypted files in a publicly accessible web directory. Users are advised to consider other options before uploading scans of their personal documents to any web page. There are many similar websites that fail to follow security best practices when handling personal information.

When small PHP code brings in its big friend

The last sample we will look into is a JPEG image with an embedded PHP script in one of its regular segments. This keeps it in line with the JPEG format specification. Titanium Platform can easily detect and extract such embedded malicious content.

Sample summary

Looking at the preview of the extracted image segment shows that this is yet another obfuscated PHP script. This time the obfuscation method is creating a URL string by calling the chr function on integer values representing ASCII codes. The output characters are then concatenated to form the resulting URL.

ReversingLabs A1000 - Preview of the segment content
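Strings built this way can be recovered with a small amount of pattern matching. The sketch below looks for runs of chr() calls joined by PHP's "." concatenation operator and converts each run back to text. It handles only decimal literals as arguments; hexadecimal arguments or computed values would need a real PHP parser. The names and the demo snippet are illustrative and not taken from the sample.

```python
import re

# One or more chr(N) calls joined with PHP's "." concatenation operator.
CHR_CHAIN = re.compile(r"chr\(\s*\d+\s*\)(?:\s*\.\s*chr\(\s*\d+\s*\))*", re.IGNORECASE)
CHR_ARG = re.compile(r"chr\(\s*(\d+)\s*\)", re.IGNORECASE)


def resolve_chr_chains(php_source: str) -> list:
    """Return the string spelled by each chr(..).chr(..) chain in the source."""
    results = []
    for chain in CHR_CHAIN.finditer(php_source):
        codes = [int(n) for n in CHR_ARG.findall(chain.group(0))]
        # PHP's chr() wraps its argument modulo 256.
        results.append("".join(chr(c % 256) for c in codes))
    return results


snippet = "$u = chr(104).chr(116).chr(116).chr(112).chr(58).chr(47).chr(47);"
print(resolve_chr_chains(snippet))  # ['http://']
```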

The PHP comments between the $password and $c variable assignment are encoded in ISO 2022 Simplified Chinese, giving us a clue about the possible origins of this malicious script.

The entire PHP code, with deobfuscated constants in comments, can be seen in the following image.

Script contents

The deobfuscated URL “http://i.niupic.com/images/2017/05/21/v1QR1M.gif” was accessible at the time of writing. It hosted a file with the 370788d26150bba413082979e26da4cd6828a752 SHA1 hash.

This is a compressed Gzip stream containing a PHP webshell. It is 145KB in size with almost 3,000 lines of PHP code, comprising functionalities that include privilege escalation, operation on the SQL database, file download, port scanning and a few others.
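The mismatch between the .gif extension in the URL and the actual content is easy to check for, since a gzip stream starts with the bytes 1F 8B. The sketch below tests for that magic and decompresses the stream for inspection; the function name and demo content are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def unpack_if_gzip(data: bytes):
    """Return the decompressed bytes if data is a gzip stream, else None."""
    if data.startswith(GZIP_MAGIC):
        return gzip.decompress(data)
    return None


# Demo with stand-in content in place of the downloaded file.
blob = gzip.compress(b"<?php /* stage two */ ?>")
print(unpack_if_gzip(blob))
print(unpack_if_gzip(b"GIF89a"))
```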

Most of the string literals in the messages displayed by the webshell are encoded in the already mentioned Chinese character set.

Googling for intelligence on the specific strings shows that some Chinese sources call this type of webshell the PHP Malaysia backdoor, with one similar sample found in this GitHub repository.

Conclusion

Image formats can be as dangerous as executables, and Titanium Platform is a reliable partner that can quickly detect such embedded threats. Even though in most cases images are used as a non-executable container for the malware, there are instances where images can trigger execution if placed in an unexpected, misconfigured place, such as the described PHP web shells placed on a vulnerable server.

This is why every piece of content entering a business network must be analyzed and checked for malicious content, regardless of the file format. Malware authors and threat actors will always look for blind spots where they can bypass defenses. Having detection gaps can lead to severe business operation interruption and cause brand damage.

ReversingLabs makes continuous improvements to its products in order to keep pace with never-sleeping malware authors. While some security solutions might help you detect if a non-executable file contains something that might be considered malicious, Titanium Platform provides you with additional information which helps you to understand how, where and why content is characterized as malicious. Its inspection capabilities give you the ability to analyze and collect metadata from over 400 file formats and to detect malware before it becomes a problem.

IOC list

The following list contains SHA1 hashes of the samples mentioned in this blog post.

b497e231d19934c5d96853985bdbc147589a9a77
518178bdd959ca17eca15777d38499bc9f3d95ad
1c308589a493469416df53acaa75a7fd4aed7e65
e3a64475e1272f34fe8a9043b486d60595460aa2
9b7284f89af7174a1d3ba91330f67c08a0054c60
370788d26150bba413082979e26da4cd6828a752
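
One simple way to put this list to use is to hash files of interest and compare the result against it. The sketch below is a minimal example using Python's standard library; the function names are hypothetical, and the `iocs` parameter exists only so that the check can be exercised without one of the real samples.

```python
import hashlib

# SHA1 hashes copied from the IOC list above.
IOC_SHA1 = frozenset({
    "b497e231d19934c5d96853985bdbc147589a9a77",
    "518178bdd959ca17eca15777d38499bc9f3d95ad",
    "1c308589a493469416df53acaa75a7fd4aed7e65",
    "e3a64475e1272f34fe8a9043b486d60595460aa2",
    "9b7284f89af7174a1d3ba91330f67c08a0054c60",
    "370788d26150bba413082979e26da4cd6828a752",
})


def sha1_hex(data: bytes) -> str:
    """Lowercase hexadecimal SHA1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


def is_known_sample(data: bytes, iocs=IOC_SHA1) -> bool:
    """True if the SHA1 of data appears in the given IOC set."""
    return sha1_hex(data) in iocs


print(is_known_sample(b"not one of the samples"))
```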
