2010
04.29

File analysis and unpacking in the age of 40M new samples per year

With daily unique malware counts exceeding 100,000, pressure is mounting on sample analysis and automated unpacking systems. The 400+ known packer families and custom packers can be mixed together in layers and in parallel. Today's systems have to handle all known format schemas statically and dynamically while anticipating increases in complexity.

We will discuss the creation of a complex file identity model which lays out the entire binary object in layers. This in turn makes it possible to apply the correct unpacking and analysis model to each of the identified segments. Object segmenting is done to cover all aspects of the binary object, including the multiple packing layers, resources, sections and overlay. Identification methods will cover traditional file identification, with special attention to methods used to fool detection tools, as well as generic detection methods. We will describe the creation and performance of a complex system that handles identification and unpacking of large quantities of files, and contrast it with methods in use today. Static, dynamic and generic file unpacking models will be described, showing their benefits and flaws in all viable blacklisting and whitelisting scenarios. The use of those binary content processors for each identified segment will be evaluated for performance and scalability.

See you in Helsinki...

2010
04.28

As you may remember, a few weeks ago ReversingLabs presented its NyxEngine to the world at the BlackHat Europe security conference. Today the conference has published the presentation videos, which can be found here, and here is a direct link to the video recording of our talk. Enjoy...

2010
04.26

Two weeks ago we introduced our NyxEngine to the world and we got nothing but positive comments and responses about it. That is why for today's blog we have decided to make it do something it's not primarily designed to do. With that in mind we created a simple program based on the NyxEngine which converts archives from one file type to another. For the purpose of this blog we designed a program called gzip2zip which, as its name implies, converts GZIP archives to ZIP ones without any decompression or recompression involved. This is possible only due to the fact that both ZIP and GZIP use the DEFLATE compression algorithm, which is why no data manipulation other than moving is necessary. In order to do a quick conversion we need to perform the following steps:

  • Read GZIP data (file name, packed content & size and unpacked content size & CRC)
  • Recreate ZIP header data in memory (recreate local and central directories)
  • Write data to disk
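The checklist above can be sketched in plain C. This is a minimal illustration and not the gzip2zip source: the function name `gzip_to_zip`, the in-memory interface and the fallback entry name `data` are assumptions, it handles a single-member GZIP file only, and it writes a fixed 1980-01-01 timestamp instead of converting the GZIP modification time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put16(uint8_t *p, uint32_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put32(uint8_t *p, uint32_t v) { put16(p, v); put16(p + 2, v >> 16); }
static uint32_t get32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Skips a zero-terminated field; returns the position after it, or 0 on overrun. */
static size_t skip_zstr(const uint8_t *gz, size_t pos, size_t end) {
    while (pos < end && gz[pos] != 0) pos++;
    return pos < end ? pos + 1 : 0;
}

/* Hypothetical helper: rewraps one GZIP member as a one-entry ZIP archive.
 * Returns the number of bytes written to out, or 0 on error. */
size_t gzip_to_zip(const uint8_t *gz, size_t gz_len, uint8_t *out, size_t out_cap) {
    if (gz_len < 18 || gz[0] != 0x1f || gz[1] != 0x8b || gz[2] != 8) return 0;
    size_t end = gz_len - 8;                 /* trailer: CRC32 + ISIZE */
    size_t pos = 10;                         /* fixed GZIP header size */
    uint8_t flg = gz[3];
    const uint8_t *name = (const uint8_t *)"data";   /* used when FNAME is absent */
    size_t name_len = 4;

    /* Step 1: read GZIP data (name, packed content, sizes, CRC). */
    if (flg & 0x04) {                        /* FEXTRA */
        if (pos + 2 > end) return 0;
        pos += 2 + ((size_t)gz[pos] | (size_t)gz[pos + 1] << 8);
        if (pos > end) return 0;
    }
    if (flg & 0x08) {                        /* FNAME */
        size_t next = skip_zstr(gz, pos, end);
        if (next == 0) return 0;
        if (next - pos > 1) { name = gz + pos; name_len = next - pos - 1; }
        pos = next;
    }
    if (flg & 0x10) { pos = skip_zstr(gz, pos, end); if (pos == 0) return 0; } /* FCOMMENT */
    if (flg & 0x02) { pos += 2; if (pos > end) return 0; }                     /* FHCRC */

    size_t csize = end - pos;                /* raw DEFLATE stream, moved as-is */
    uint32_t crc = get32(gz + end);
    uint32_t usize = get32(gz + end + 4);    /* ISIZE is the size modulo 2^32 */
    size_t total = 30 + name_len + csize + 46 + name_len + 22;
    if (name_len > 0xffff || csize > 0xffffffffu || total > out_cap) return 0;

    /* Step 2: recreate the ZIP local header, followed by the untouched data. */
    uint8_t *p = out;
    memset(p, 0, 30);
    put32(p, 0x04034b50); put16(p + 4, 20); put16(p + 8, 8); put16(p + 12, 0x21);
    put32(p + 14, crc); put32(p + 18, (uint32_t)csize); put32(p + 22, usize);
    put16(p + 26, (uint32_t)name_len);
    memcpy(p + 30, name, name_len);
    memcpy(p + 30 + name_len, gz + pos, csize);
    p += 30 + name_len + csize;
    size_t cd_off = (size_t)(p - out);

    /* Central directory entry; the local header offset at +42 stays 0. */
    memset(p, 0, 46);
    put32(p, 0x02014b50); put16(p + 4, 20); put16(p + 6, 20); put16(p + 10, 8);
    put16(p + 14, 0x21);
    put32(p + 16, crc); put32(p + 20, (uint32_t)csize); put32(p + 24, usize);
    put16(p + 28, (uint32_t)name_len);
    memcpy(p + 46, name, name_len);
    p += 46 + name_len;

    /* End of central directory record. */
    memset(p, 0, 22);
    put32(p, 0x06054b50); put16(p + 8, 1); put16(p + 10, 1);
    put32(p + 12, (uint32_t)(46 + name_len)); put32(p + 16, (uint32_t)cd_off);
    return total;                            /* Step 3: caller writes out to disk */
}
```

The compressed bytes are copied with a single `memcpy`, which is the whole point of the conversion: only the container around the DEFLATE stream changes.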

This is quite a short and simple checklist, which is why creating such a program is a relatively simple task. The reverse process is also possible and just as easy to create, but since the ZIP file format is more popular we decided to stop at one-way conversion. Until next week...

NyxEngine

ReversingLabs Corporation
GZip2Zip
(package contains the tool with source and the samples used)

2010
04.19

Photo by Costin Raiu

We had a great time at this year's BlackHat Europe conference last week. Now it is time to sort out our impressions. First of all, thanks to everyone who made it to our talk and asked us in the hallways about the new engine we have been working on. In a packed room we discussed archive steganography and the impact that such files, and other malformed files, have on security products. These two aspects of file tampering overlap, and we showed how steganography implementations can break archive processors, thus causing vulnerabilities in file processing. However, it is the vulnerability aspect of our presentation that got the most press. It has been covered here by CNET and here by PCWorld, and it has also been blogged about by ESET here and by one of the conference attendees here. In addition to all the media and web mentions, we have published the presentation, the white paper and NyxEngine, all of which can be found here. But to give you the whole picture, we will briefly summarize the presentation and our findings.

Our research has focused on the impact that file malformations have on archive processors. We wanted to see if data could be hidden inside an archive in such a fashion that the data itself is invisible to the user no matter which archive processing program is used. Starting from the most basic methods of string obfuscation and moving on to the most complex file malformations, our research led us to the conclusion that there are multiple ways of achieving our goal. Steganography was not only possible, it had been present all along in the "wild". We found two existing solutions that successfully implemented file hiding in ZIP archives. To gauge how prevalent this is in the wild we turned to AccessData, the pioneer in digital investigation software, and its COO Brian Karney. His answer was that, to the best of his knowledge, no one is really looking for this kind of steganographically hidden content, and thus no one is finding any evidence of it. This answer didn't surprise us since it is quite a novel technique. Historically, the most common use of steganography was hiding messages in multimedia files. However, the amount of data which can be hidden in such a fashion is commonly limited by the size and type of the file in which the data is hidden. That's not a limitation when hiding data in archives. In that case there are no limitations on the size and type of steganographic content.

Steganography in archive formats, while interesting in itself, has some serious implications. During the course of our research we found that the modifications we make to an archive in order to hide data interfere with some security applications. They made those applications skip the archive content or stop scanning it entirely. Each security product was impacted differently.

If we look at this kind of detection evasion from the standpoint of a gateway scanner, the impact on scanning of email and incoming traffic is high, as it interferes with basic software functionality. But from a desktop or endpoint scanner perspective the impact is low, since the potentially malicious payload is detected right after extraction. One could argue that even a desktop solution must be able to scan the packed content it supports. But not everyone agrees on this point, as the protection ability of an endpoint scanning product would not be lessened.

Regardless of how you look at it, there is a very low threat to protected endpoints. But the general rule for security software is that it does not want to have any potentially harmful files resident anywhere on the protected endpoints.  Such detection evasions must be handled.

During our presentation we showed that steganographic data hiding is possible and that it can interfere with anti-malware software. All vulnerabilities that we found were disclosed to the affected vendors in cooperation with CERT-FI, and all issues were patched before the public disclosure. Thanks to CERT-FI for their hard work and support. Further research in this area by ReversingLabs will follow the same disclosure process. ReversingLabs is proud to contribute to overall endpoint security, and we will continue our research in the same direction.

2010
04.12

Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity. When it comes to digital steganography no stone should be left unturned in the search for viable hidden data. Although digital steganography is commonly used to hide data inside multimedia files, a similar approach can be used to hide data in archives as well. Steganography imposes the following data hiding rule: Data must be hidden in such a fashion that the user has no clue about the hidden message or file's existence. This can be achieved by either hiding existing packed content from all programs designed to unpack the selected file format, or adding new data to existing compressed files, so that the file's usability is unchanged. To discover this hidden information we must go into deep analysis of systems that have developed their own archive processors and see the implications of format specifications being interpreted differently across such solutions.

We have designed NyxEngine to ensure that no byte is left unchecked in the search for interesting archive data. Furthermore Nyx performs detailed data inspection by which it identifies possible vulnerabilities and corruptions in the binary content of archives. By integrating the NyxEngine as the top layer in archive processing, we can successfully detect and prevent all known and future vulnerability attack vectors against archive processors, thus effectively eliminating the possibility of archive bombs and other exploits. In addition to shielding against exploits, Nyx also searches for viable hidden data that was intentionally cloaked from sight using steganographic principles. And since the engine does detailed data inspection, it can correct vulnerabilities and recover files, making it a perfect archive preprocessor.

NyxEngine's exploit shield functionality checks the following archive areas: stored file name length and content, compression ratio, extraction algorithm requirements, checksum tampering, multi-disk tampering, file entry duplication and other miscellaneous header data. Serving as a common denominator among all known archive processing solutions, Nyx classifies each instance of tampering into a functional group, as a vulnerability that affects that group.
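To make a few of the checks listed above concrete, here is a minimal sketch of a per-entry sanity check covering file name length, file name content and compression ratio. It is not NyxEngine code: the `ArchiveEntry` structure, the flag names and both thresholds are assumptions picked for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one archive entry, as read from its header. */
typedef struct {
    const char *name;          /* stored file name, not necessarily terminated */
    size_t      name_len;      /* name length reported by the header */
    uint64_t    packed_size;   /* compressed size reported by the header */
    uint64_t    unpacked_size; /* uncompressed size reported by the header */
} ArchiveEntry;

enum {
    CHECK_OK           = 0,
    CHECK_NAME_LENGTH  = 1,    /* empty or suspiciously long file name */
    CHECK_NAME_CONTENT = 2,    /* control characters or ".." in the name */
    CHECK_RATIO        = 4     /* compression ratio typical of archive bombs */
};

#define MAX_NAME_LEN 260u      /* assumed limit */
#define MAX_RATIO    1000u     /* assumed limit */

/* Returns a bit mask of the CHECK_* flags raised by the entry. */
unsigned check_entry(const ArchiveEntry *e) {
    unsigned flags = CHECK_OK;

    if (e->name_len == 0 || e->name_len > MAX_NAME_LEN)
        flags |= CHECK_NAME_LENGTH;

    for (size_t i = 0; i < e->name_len; i++) {
        unsigned char c = (unsigned char)e->name[i];
        int dotdot = c == '.' && i + 1 < e->name_len && e->name[i + 1] == '.';
        if (c < 0x20 || dotdot) {
            flags |= CHECK_NAME_CONTENT;
            break;
        }
    }

    /* Non-empty output from zero input bytes, or an extreme ratio, is suspicious. */
    if (e->unpacked_size > 0 &&
        (e->packed_size == 0 || e->unpacked_size / e->packed_size > MAX_RATIO))
        flags |= CHECK_RATIO;

    return flags;
}
```

Returning a bit mask rather than a single verdict lets a caller decide per product group which kinds of tampering matter.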

By performing detailed checks and on-the-fly corrections, the maximum possible archive data is recovered and identified. This is the best way to find files that are present in the archive but unreported in the archive header, and to extract every possible bit from the archive. This method works not only with unreported files, but with any kind of binary data present in the archive which isn't assigned to any of the file content.

The detailed file analysis provided by Nyx makes it possible to recover the maximum amount of damaged, corrupt and invalid data.

http://www.youtube.com/watch?v=Zzf88TljU3I

Introducing NyxEngine

http://www.youtube.com/watch?v=v-3UgPZc-UE

Recovering steganography content with NyxEngine
2010
04.05

TitanEngine is primarily envisioned as a portable executable file format unpacker and handling framework. However, thanks to its static unpacking functions it can be used to unpack other file format types such as installers and archives. That is why today we are showing the use of the new static unpacking functions that will be available with the next update. Discussing upcoming features is something we generally like to avoid, but we have a good reason this time. We are touching on the subject of archive unpacking only because of the upcoming unveiling of the new SDK we have secretly been working on during these last few months. What it is and what it does will be revealed at BlackHat Europe next week. Until then we will tickle your imagination with an unusual blog about unpacking archives with TitanEngine.

The format we have selected is a simple one: the Debian archive file format called Deb. Debian packages (DEB files) are standard Unix ar archives that include two gzipped, bzipped or lzmaed tar archives: one that holds the control information and another that contains the data. These two files are not compressed any further by the archive; they are just stored inside the binary package. Each stored item has its own header, which is defined like this:

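typedef struct DEB_HEADER{
    char FileName[16];      // item name, padded with spaces
    char FileTime[12];      // modification time, decimal seconds since the Unix epoch
    char Reserved0[6];      // owner ID field of the ar format (decimal)
    char Reserved1[6];      // group ID field of the ar format (decimal)
    char Mode[8];           // file mode (octal)
    char ItemSize[10];      // size of the stored content in bytes (decimal)
    char TerminateQuote;    // always '`' (0x60)
    char TerminateNewLine;  // always '\n' (0x0A)
}DEB_HEADER, *PDEB_HEADER;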

The first header is preceded by the magic string "!<arch>\n", which is used to identify the binary package type. Unpacking the DEB archive format therefore comes down to reading each item header and copying the binary content that follows it to the selected folder. The header for each item contains the file name and time information, which can be used during the unpacking process to restore the packed item to its pre-packing state. Because this file format doesn't employ any compression by itself, unpacking the DEB format only refers to extraction of the stored binary content. That content is additionally packed, but with a different file format which commonly uses compression to reduce the size of the packed file on disk.
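The extraction loop described above can be sketched without TitanEngine at all. The following is a minimal, hedged illustration and not the attached DEB unpacker: the iterator name `deb_next` and its in-memory interface are assumptions, and it relies on the standard ar convention that item data is padded to an even offset.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct DEB_HEADER {
    char FileName[16];
    char FileTime[12];
    char Reserved0[6];
    char Reserved1[6];
    char Mode[8];
    char ItemSize[10];
    char TerminateQuote;
    char TerminateNewLine;
} DEB_HEADER;

/* Hypothetical iterator over the items stored in an in-memory DEB/ar archive.
 * Start with *pos == 0. Returns 1 and fills name/data/size when an item is
 * found, 0 at the end of the archive, -1 if the archive is malformed. */
int deb_next(const unsigned char *buf, size_t len, size_t *pos,
             char name[17], const unsigned char **data, size_t *size) {
    DEB_HEADER h;
    char num[11];
    size_t n = 16;

    if (*pos == 0) {                       /* check the magic string once */
        if (len < 8 || memcmp(buf, "!<arch>\n", 8) != 0) return -1;
        *pos = 8;
    }
    if (*pos > len || len - *pos < sizeof h) return 0;   /* no more headers */

    memcpy(&h, buf + *pos, sizeof h);
    if (h.TerminateQuote != '`' || h.TerminateNewLine != '\n') return -1;

    memcpy(num, h.ItemSize, 10);           /* decimal size, space padded */
    num[10] = '\0';
    *size = (size_t)strtoul(num, NULL, 10);

    memcpy(name, h.FileName, 16);          /* strip padding and GNU-style '/' */
    name[16] = '\0';
    while (n > 0 && (name[n - 1] == ' ' || name[n - 1] == '/')) name[--n] = '\0';

    *pos += sizeof h;
    if (*size > len - *pos) return -1;     /* content runs past the buffer */
    *data = buf + *pos;
    *pos += *size + (*size & 1);           /* items start on even offsets */
    return 1;
}
```

A caller would loop while `deb_next` returns 1 and write each `data`/`size` pair to a file called `name`; restoring the time stamp from `FileTime` is left out of the sketch.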

This is just one of the many uses for TitanEngine outside the area of unpacking and processing the portable executable file format. As we have seen, unpacking archives with TitanEngine is quite possible as long as either there is no compression or the content decompression is supported by the engine. Keep an eye out for our blog next week when we unveil our super secret project. Until then...

TitanEngine

ReversingLabs Corporation

DEB unpacker
(package contains source code, binary unpacker and a sample archive)

2010
03.29

In the last couple of years we have seen a drastic increase in the number of malicious samples we see each day. These numbers are quickly closing in on the 40M-samples-a-year mark, which we expect to hit this year. The sheer volume of data we are bombarded with each day raises an important question: where is the relevant data in this sea of information? And is all the data we have even relevant?

Prioritization is the main way of extracting relevant data, with the techniques and methods used to highlight interesting information varying from one antivirus company to another. However, we can think differently in order to sort this information. We can think in reverse and ask ourselves which of this data isn't interesting. With that question in mind we developed a system to exclude damaged, invalid and broken files from our sample bases. In-depth file analysis tells us exactly which files have zero chance of execution on any system, flagging them as crapware. But is everything broken to that extent?

If you remember, we recently gave you a good idea of what to do with broken files and how to use TitanEngine's static validity analysis to identify and fix them. For this purpose we will update the TitanEngine Nexus plugin to automatically identify and fix broken files. This will extend the plugin's functionality from creating missing dynamic link library dependencies to fixing every aspect of the broken input file. And since the plugin will work automatically, it needs to be compatible with all existing unpackers. To achieve this we must recognize the basic dynamic unpacker model which looks like this:

As we can see from this flow chart, all dynamic unpackers share a certain logic model. The perfect place for Nexus to handle the input file is at the start of the unpacking process, which is achieved by hooking TitanEngine's function IsPE32FileValidExW. This function is called by all unpackers before the unpacking process starts, and if it judges the file to be invalid or broken, unpacking is aborted. So what does our hook need to do? The list of steps would be:

  • Perform static validity analysis by calling IsPE32FileValid
  • Determine whether the file is valid, and if it isn't, do the following
    • Create a backup of the input file
    • Perform static file fixing by calling FixBrokenPE32FileEx
    • Validate that the file was fixed successfully
    • Return TRUE

But this is just the first step, because in order to fix the file TitanEngine can temporarily disable certain fields by removing them from the PE header. To revert these changes we must add another hook. Since we are improving Nexus to automatically correct broken files for dynamic unpackers, the function to hook is easily recognized as DumpProcessW. This function is called at the start of the unpacking process finalization, just before the necessary data is exported to a file on disk. That makes this function a perfect place to revert the changes to the temporarily disabled PE fields. To do this we just need to call FixBrokenPE32FileEx again with the saved FILE_FIX_INFO structure.
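The control flow of the two hooks described above can be sketched as follows. This is only a model of the logic: `SampleFile`, `FixInfo` and the three `engine_*` functions are hypothetical stand-ins for the input file, the FILE_FIX_INFO structure and the engine calls (IsPE32FileValid, the backup step and FixBrokenPE32FileEx), and they do not reflect the real TitanEngine signatures.

```c
#include <stdbool.h>

/* Hypothetical model of the input file and of the saved fix information. */
typedef struct { bool valid; bool has_backup; bool fields_disabled; } SampleFile;
typedef struct { bool needs_revert; } FixInfo;

/* Stand-in for the static validity analysis. */
static bool engine_is_valid(const SampleFile *f) { return f->valid; }

/* Stand-in for creating a backup of the input file. */
static bool engine_backup(SampleFile *f) { f->has_backup = true; return true; }

/* Stand-in for the fixing call: the first call fixes the file and may disable
 * PE fields; a second call with the saved info restores those fields. */
static bool engine_fix(SampleFile *f, FixInfo *info) {
    if (info->needs_revert) {
        f->fields_disabled = false;
        info->needs_revert = false;
    } else {
        f->valid = true;
        f->fields_disabled = true;
        info->needs_revert = true;
    }
    return true;
}

/* Logic of the hook placed on the validity check that every unpacker calls. */
bool on_validity_check(SampleFile *f, FixInfo *info) {
    if (engine_is_valid(f)) return true;      /* nothing to fix */
    if (!engine_backup(f)) return false;      /* never touch a file without a backup */
    if (!engine_fix(f, info)) return false;
    return engine_is_valid(f);                /* confirm that the fix worked */
}

/* Logic of the hook placed on the dump call, just before data is written out. */
void on_dump_process(SampleFile *f, FixInfo *info) {
    if (info->needs_revert) engine_fix(f, info);
}
```

Keeping the fix information between the two hooks is what allows the second call to undo only what the first call changed.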

By implementing these changes to TitanEngine's Nexus plugin we turn it into an all-purpose dynamic unpacker helper module, because with its help we can unpack broken files and files that are missing their dependencies. And all this is done with no modification to the source code of any unpacker we made in the past. As a demonstration of the plugin's capabilities we have attached it and a broken UPX sample file to this blog. Until next week...

TitanEngine

ReversingLabs Corporation

Nexus plugin
(package contains Nexus plugin, UPX unpacker and a broken sample file)

2010
03.29

In addition to the TitanEngine course in Montreal at REcon, there is another course that will teach you how to use TitanEngine. So, if you are in Vegas for BlackHat you might want to check out the Advanced Malware Deobfuscation training by Jason Geffner & Scott Lambert. Here is the course description:

Advanced Malware Deobfuscation by Jason Geffner & Scott Lambert

Overview:

Security researchers are facing a growing problem in the complexity of malicious executables. With an ever-increasing number of tools that malware authors use to compress and obfuscate executables, and the pressing urgency that analysts often face, it is vital for analysts to know the best methods to remove protections that they have never seen before.

Unpacking is the process of removing the compression and obfuscation applied by a “packer” (or “protector”) to a compiled and linked binary. This class will focus on teaching attendees the steps required to effectively deal with both known and previously unknown packing techniques.

This is a hands-on course. Attendees will work on real-world malware through a series of lab exercises designed to build their expertise in thwarting anti-debugging and anti-disassembling techniques.

Day One:

The first day will focus on understanding the problems presented by obfuscated malware and the steps required to effectively return the malware to an analyzable state. You will begin the day by learning the fundamentals of the Portable Executable (PE) file format. Then, through a series of lab exercises you will learn reliable methods for finding the Original Entry Point. With this knowledge in-hand, you will write software to construct a valid PE file on disk from the memory of a running process. You will complete this exercise by reconstructing the Import Table, effectively returning the executable to its pre-obfuscated state. With this virgin executable, you will apply static analysis techniques to determine the malware’s malicious capabilities.

The day will include a series of lab exercises focused on defeating anti-debugging tricks such as hardware/software breakpoint detection, generic/specific debugger detection, unpacker stub detection, Thread Local Storage callback functions, and more.

  • PE File Format Essentials
  • Fundamentals of Win32 Debugging
  • Methods for Finding the Original Entry Point
  • Manual and Assisted Import Table Reconstruction
  • Overcoming Anti-Debugging Tricks
  • User-Mode and Kernel-Mode Hooking and Code-Splicing

Day Two:

The second day will focus on how to unpack a heavily armored malware sample. You will learn about the concept of protected processes and how to decouple parent/child processes. Next, you will learn how API redirection utilizes stolen bytes. Then, you will master everything there is to know about Structured Exception Handling injection and redirection. Lastly, you will learn how chunked packing works, how to recognize it, and how to defeat it.

The day will end in a contest in which attendees will pit their wits against one another to analyze a heavily armored executable.

  • Protected Processes
  • Exception Injection and Redirection
  • API Redirection
  • Chunked Packing
  • Utilizing TitanEngine from ReversingLabs as an Unpacking Framework

Who Should Attend:

This class is for skilled security analysts who wish to learn how to remove binary obfuscation from malware for analysis purposes. It is expected that attendees have a firm understanding of x86 assembly language and the Microsoft Windows API. Reverse engineering experience is desired, though not required.

What do I get:

  • Hard copies of lecture slides and lab exercises.
  • A CD containing links to all tools and reference materials used throughout the course.
  • Solutions and written walkthroughs for all lab exercises.

Course Length:

Two days. All course materials, lunch and two coffee breaks will be provided. A Certificate of Completion will be offered. You must provide your own laptop.

Software Requirements:

Attendees must bring their own laptop with a 32-bit version of Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008 or Windows 7 installed inside of a virtual machine (such as Microsoft® Virtual PC 2007 or VMware Workstation). Prior to the first day of the course, attendees are expected to have the following software installed in a virtual machine:

Trainers:

Jason Geffner joined Next Generation Security Software Ltd. in June of 2007 as a Principal Security Consultant. Jason focuses on performing security reviews of source code and designs, reverse engineering software protection methods and DRM protection methods, penetration testing web applications and network infrastructures, and developing automated security analysis tools.

Prior to joining NGS, Jason spent three years as a Reverse Engineer on Microsoft Corporation's Anti-Malware Team, where his work involved analyzing malware samples, deobfuscating binaries, and writing tools for analysis and automation. Jason was the Security Research & Response owner of the Windows Malicious Software Removal Tool (MSRT). He chose which new malware families for the MSRT to detect and clean each month based on his analysis of the telemetry and trends of the underground malware community. Jason authored tens of thousands of malware signatures and dozens of malware analyses based on static and dynamic analyses of obfuscated binaries. His work on the MSRT helped hundreds of millions of Windows users each month keep their computers safe and secure. While at Microsoft, Jason was recognized for his reverse engineering skills and for his efforts to drive awareness of reverse engineering practices throughout the company by being given the formal job title "Reverse Engineer"; Jason was the only Microsoft employee with this title.

Jason holds several patents in the fields of reverse engineering and network security. He is a Program Committee member of the Reverse Engineering Conference (REcon) and of the International Conference on Malicious and Unwanted Software, is a regular trainer at Black Hat and other industry conferences, is often credited in industry talks and publications, and has been actively reverse engineering and analyzing software protection methods since 1995.

Scott Lambert is a senior Security Researcher on the Microsoft Malware Protection Center (MMPC) team. Much of Scott's current research centers around binary reverse engineering frameworks that leverage a combination of both static and dynamic binary instrumentation, taint analysis and SMT solvers to aid in vulnerability analysis and signature development. In his spare time he supports the Microsoft Vulnerability Research (MSVR) program by developing proof of concept code execution exploits and serving as a technical expert on 3rd party vendor engagements.

Prior to joining Microsoft, Lambert developed, maintained and supported numerous computer security applications ranging from Vulnerability Assessment and Risk Management software to Network and Host-Based Intrusion Detection/Prevention Systems for companies such as L-3 Network Security, Veridian Information Solutions, Symantec Corporation and TippingPoint, a division of 3Com.

2010
03.04

Coding Unpackers for Fun and Profit: TitanEngine Training by
Tomislav Pericin and Nicolas Brulez

Learn how to analyze, unpack and code unpackers for software packers and protectors. Attendees will receive hands-on experience working with the ReversingLabs TitanEngine framework, designed for unpacker creation.

Instructors: Tomislav Pericin and Nicolas Brulez
Dates: 6-8 July 2010
Availability: 10 Seats

Day 1: Static file analysis and static unpacker coding

The focus of the first day is manual file unpacking and static file analysis. We go into deep format analysis to create both simple and more complex static unpackers.

We will focus on real-world protections you are likely to encounter on a day-to-day basis.

Day 2: Dynamic file analysis and dynamic unpacker coding

The second day will cover manual file unpacking and dynamic file analysis. We go into deep format analysis for creating simple and more complex dynamic unpackers. Special attention will be given to dynamic unpacker coding layout and the benefits of using TitanEngine to minimize the time it takes to create an unpacker.

Our focus will be on real world packers you are likely to encounter on a day-to-day basis. These packers top the charts in legitimate software compression, but are often used as malware envelopes.

Day 3: Advanced file analysis and coding complex unpackers

On day 3, we will cover the manual unpacking of complex file packing and protection systems. Special attention will be given to methods used to harden against format reverse engineering and to prevent unpacking. We will describe common protection techniques utilized by both legitimate software protectors and those specifically designed for use in malware. We will then use this information to show the coding techniques needed for such complex dynamic unpackers and ways to counter all the tricks used to make detection, analysis and unpacking harder.

Our focus will be on the real-world protections you are likely to encounter on a day-to-day basis.

More info here...

2010
03.01

This is the second "Ask a developer Monday," in which we answer the most common question we've received recently. The current No. 1 question is: "Why is the entry point after unpacking located in the section named UPX0?"

This is a more complex question than you might think, because it requires understanding the memory models used by software packers, which makes it a perfect question for "Ask a developer Monday"! To answer it we must explore the memory models used, and the possible results that an unpacker could produce.

The first memory model is the one typically used by software packers. Its main characteristic is a greatly reduced number of portable executable sections, lowered to only two or three depending on the packer solution itself. With this model, all sections in the file before packing are merged into a single section that is always the first section in the packed file. Most commonly this first section in the packed file holds no data and is used just as a slot to reserve memory which will be filled once the packer stub finishes the decompression. The virtual size of this first packed file section is equal to the SizeOfImage of the file before it was packed. This gives packers a powerful option for compressing all code and data in one pass as a single compressed data stream. It also reduces the time needed to decompress the entire packed content and lowers the compressed file's size because it needs only a single decompression dictionary. The compressed data is commonly stored in the packer section, which is either the second or the third section of the packed file. Since resources must be aligned to SectionAlignment, they commonly get their own section, usually the last section of a file packed using this memory model.

Although the first model brings faster decompression and smaller files to the table, it has the disadvantage of slightly increasing the memory usage for the packed file. Since memory usage is only increased by the size of the compressed content if that content is displaced from its original location and moved to the packer section, the problem can be avoided by using a packer that uses a memory model in which the compressed data is stored at its original location. This kind of packer individually compresses the portable executable sections and stores the compressed data inside the same section. With this model, the packer preserves the section layout the file had prior to packing. Commonly, only one section is added to the original file layout, and that section only contains the packer stub. Compression here is achieved by reducing the physical size of the individual sections. There is a hybrid approach which combines these two memory models, but there isn't a software compression solution that uses it.

Now how does this apply to dynamic unpackers? Since a dynamic unpacker executes the file until it reaches its original entry point and performs a memory dump once that point is reached, it has no impact whatsoever on the memory model used by the packer. That means that the file section layout before and after unpacking will remain the same, with the exception of the sections added to the file by the unpacker. These new sections contain the import and relocation data, while the old sections hold the decompressed code, resources and data. There is no way to restore the memory model to its original layout if that kind of data isn't preserved in the packer stub. Since that data isn't commonly preserved by any software packer, a dynamic unpacker can't restore the original memory model layout. And since UPX uses the first memory model, its entry point after the file is unpacked will have moved from the section UPX1 to UPX0. The thing to remember is that section names are not important. What is important is that the data and the code itself are decompressed, and this is achieved by the dynamic unpackers.
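To see why the entry point ends up in UPX0, it helps to look up which section contains a given relative virtual address. The sketch below does exactly that; the `Section` structure is a simplified stand-in for a PE section header, and the layout and addresses in the usage note are made-up values, not taken from a real sample.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PE section header (assumed field names). */
typedef struct {
    const char *name;
    uint32_t    virtual_address;   /* RVA where the section starts */
    uint32_t    virtual_size;      /* size of the section in memory */
} Section;

/* Returns the index of the section that contains rva, or -1 if none does. */
int section_of_rva(const Section *sections, size_t count, uint32_t rva) {
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva >= start && rva - start < sections[i].virtual_size)
            return (int)i;
    }
    return -1;
}
```

With a hypothetical layout of UPX0 at RVA 0x1000 (virtual size 0x8000) and UPX1 at RVA 0x9000 (virtual size 0x3000), the packed entry point, which belongs to the packer stub, resolves to UPX1, while an original entry point such as 0x1234 resolves to UPX0 because the original code is decompressed into the reserved first section.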

  • Examples for the first memory model are: UPX, FSG, RLPack, etc.
  • Examples for the second memory model are: PackMan, ASPack, AlexProtector, etc.

That is it for this week's Q&A, until next time...
