2010
06.03

We had a great time at this year's CARO Workshop Conference, held in Helsinki last week. Now it is time to sort out our impressions. First of all, thanks to everyone who made it to our talk and asked us so many intriguing questions. Slides for our talk are available here. The picture you see above is from the brilliant keynote given by Dr. Alan Solomon. We thoroughly enjoyed the keynote and Dr. Solomon's remark about the perfect antivirus, represented by his three batch files.

Our talk focused on improving file analysis metrics and on performance testing of unpacking technology, comparing different solutions. During the talk we presented a new idea for unpacking optimization. We proposed unpacking through "binary layering", which enables the reuse of unpacking technology as much as possible. Put simply, binary layering means scanning the various parts of a binary object and attributing them to known packing formats. Since multiple segments of the same file can have different formats attached to them, we recognize that files commonly don't have simple identities; instead, a file's complex layout is viewed as its complex identity. These complex identities give a much more detailed picture of the file itself and enable easy file categorization and further analysis.
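To make the idea a bit more concrete, here is a minimal sketch of what attributing segments to known formats could look like. It is not the implementation from our talk: the function names `find_layers` and `complex_identity` are made up for illustration, the scan is a naive search for three well-known magic values (PE, ZIP and CAB), and a real tool would validate each header instead of trusting a byte match.

```python
from typing import Dict, List, Tuple

# Well-known magic bytes; picking these three formats is an illustrative assumption.
SIGNATURES: Dict[bytes, str] = {
    b"MZ": "PE",
    b"PK\x03\x04": "ZIP",
    b"MSCF": "CAB",
}


def find_layers(blob: bytes,
                signatures: Dict[bytes, str] = SIGNATURES) -> List[Tuple[int, int, str]]:
    """Return (start, end, format) for every segment that begins with a known magic."""
    hits: List[Tuple[int, str]] = []
    for magic, name in signatures.items():
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    hits.sort()

    layers: List[Tuple[int, int, str]] = []
    for i, (start, name) in enumerate(hits):
        # A segment runs until the next recognized segment, or to the end of the blob.
        end = hits[i + 1][0] if i + 1 < len(hits) else len(blob)
        layers.append((start, end, name))
    return layers


def complex_identity(blob: bytes) -> Tuple[str, ...]:
    """The ordered formats found in the blob, used here as its 'complex identity'."""
    return tuple(name for _, _, name in find_layers(blob))


# Hypothetical sample: a PE stub followed by an appended ZIP and a CAB.
sample = b"MZ" + b"\x00" * 6 + b"PK\x03\x04" + b"\x01" * 4 + b"MSCF" + b"\x02" * 3
print(find_layers(sample))
print(complex_identity(sample))
```

Each recognized segment can then be handed to whichever unpacker already supports that format, which is where the reuse comes from.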

We also talked about optimizations that can improve a file analysis system's metrics. In this regard, we have shown that binary layering can improve unpacking speed when the identified segments are processed in parallel. Most objects can take advantage of this kind of optimization, with some exceptions. The first exception is a binary object that requires other objects to be present in a predefined way, which prevents unpacking one file at a time. The second is a file with multiple one-way unpacking layers, where the output of the previous layer serves as input for the next one.
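The difference between the parallel case and the chained case can be sketched in a few lines. This is only an illustration under stated assumptions: `zlib` stands in for a real unpacker, and `unpack_segment`, `unpack_parallel` and `unpack_chained` are hypothetical names, not part of our lab tools.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def unpack_segment(segment: bytes) -> bytes:
    """Stand-in unpacker: zlib is used here only as a placeholder format."""
    return zlib.decompress(segment)


def unpack_parallel(segments: List[bytes], workers: int = 4) -> List[bytes]:
    """Independent segments can be unpacked concurrently; order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(unpack_segment, segments))


def unpack_chained(data: bytes, depth: int) -> bytes:
    """Nested layers must run one after another: each output feeds the next step."""
    for _ in range(depth):
        data = unpack_segment(data)
    return data


# Two independent segments versus one payload wrapped in two layers.
independent = [zlib.compress(b"a" * 100), zlib.compress(b"b" * 50)]
nested = zlib.compress(zlib.compress(b"payload"))

print([len(out) for out in unpack_parallel(independent)])
print(unpack_chained(nested, depth=2))
```

The chained case gains nothing from the thread pool, because there is never more than one layer ready to be processed at a time.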

To test our hypothesis, we ran a comparative test using our lab tools and Kaspersky Anti-Virus (KAV), which incorporates both file unpacking and malicious payload detection. To keep the test relevant and to keep malware scanning out of the unpacking metrics, we performed the following:

  • Inspecting metrics for our internal lab unpacking tools
  • Inspecting metrics for KAV on the predefined set of packed files
  • Inspecting metrics for KAV on the set of unpacked files produced by our internal lab tools
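The three steps above can be sketched as a small timing harness. This is a rough outline only, not the tooling we used: `unpack` and `scan` are hypothetical callables standing in for the lab unpacker and the scanner, and timing is done in-process with `time.perf_counter`.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def timed(fn: Callable, items: Sequence) -> Tuple[list, float]:
    """Run fn over every item and return the outputs plus elapsed seconds."""
    start = time.perf_counter()
    outputs = [fn(item) for item in items]
    return outputs, time.perf_counter() - start


def three_step_comparison(packed: Sequence[bytes],
                          unpack: Callable[[bytes], List[bytes]],
                          scan: Callable[[bytes], object]) -> Dict[str, float]:
    # Step 1: metrics for the unpacking tool on the packed set.
    groups, unpack_seconds = timed(unpack, packed)
    unpacked = [f for group in groups for f in group]

    # Step 2: metrics for the scanner on the same packed set.
    _, scan_packed_seconds = timed(scan, packed)

    # Step 3: metrics for the scanner on the files produced in step 1.
    _, scan_unpacked_seconds = timed(scan, unpacked)

    return {
        "unpack_seconds": unpack_seconds,
        "scan_packed_seconds": scan_packed_seconds,
        "scan_unpacked_seconds": scan_unpacked_seconds,
        # Unpack time is added back so step 3 stays comparable with step 2.
        "unpack_plus_scan_seconds": unpack_seconds + scan_unpacked_seconds,
        "unpacked_count": len(unpacked),
    }


# Toy stand-ins: "unpacking" splits a blob into single bytes, "scanning" measures it.
result = three_step_comparison(
    packed=[b"ab", b"cde"],
    unpack=lambda blob: [bytes([b]) for b in blob],
    scan=len,
)
print(result)
```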

It is necessary to perform these three steps together in order to obtain relevant results. The third step excludes unpacking from the scanning results and therefore gives a relatively good comparison of unpacking metrics. For the purpose of our presentation we performed two distinct tests, one on packed portable executable files and one on installer packages. The first test employed one-way unpacking, while the second test used non-parallel "binary layering" to detect and unpack files. Here are the results for the first test:

This first test was performed on 1627 portable executable files packed with 140 different packer families. It demonstrated that our internal tool (referred to here as the "BlackBox") successfully unpacked 95% of the files in 530 seconds. We declared the remaining 74 files invalid for either static or dynamic analysis, indicating that file recovery cannot be applied to salvage corrupt data. This means that the reported 1568 objects is the number of output files that were processed by this unpacking library. KAV processed the same set of files in 534 seconds, reporting 4533 objects and 249 events. To clarify, KAV counts all files it finds inside the packed content (every packing level is counted) and then reports the actual number of files detected by its signatures. The number of events refers to all additional operations KAV performs on scanned files, such as malware detection, quarantine or deletion. Finally, in the last step KAV scanned the 1568 unpacked files that were produced by BlackBox. This third step eliminates the need for unpacking, since all files are already unpacked. This part completed in 300 seconds, and KAV reported 2042 objects and 35 events. To account for the unpacking that was initially performed with BlackBox, we have added its execution time to the scan time.

Results: KAV performs its scan faster with fewer objects that need scanning. Additionally, there are fewer events, which points to false positive detections on the packer formats themselves. Granted, a small number of the packers used in our test base should be blacklisted, as their main use historically has been to hide malicious payloads.
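For a quick feel of how much the object and event counts drop between the scan of the packed set and the scan of the BlackBox output, the reported figures can be put through a small calculation. The helper `relative_reduction` is just an illustrative name, and only the counts quoted above are used.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fraction by which a count dropped between two runs."""
    return (before - after) / before


# Counts reported by KAV on the packed set and on the BlackBox output.
objects_packed, events_packed = 4533, 249
objects_unpacked, events_unpacked = 2042, 35

object_reduction = relative_reduction(objects_packed, objects_unpacked)
event_reduction = relative_reduction(events_packed, events_unpacked)

print(f"objects: {object_reduction:.1%} fewer")
print(f"events:  {event_reduction:.1%} fewer")
```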

It's important to note that the unpacking methods used by BlackBox and KAV are completely different. While KAV mostly uses static unpacking to decompress data to memory, our BlackBox uses both dynamic and static unpacking methods to decompress data to disk, with multiple drive accesses. It is slowed down even further when unpacking dynamic link libraries, due to the snapshot comparison needed to repair the relocation table. Optimizations can be performed to improve these unpacking results, but none were used. Hence we feel confident that if all of these unpackers were written using TitanEngine, a significant unpacking speed increase would be gained.

Now, let's move to our second, more interesting test. Here are the results:

Our second test was performed on 20 selected non-malicious installer packages. We used another internally developed tool, referred to here as "Core", to produce 4275 files in 95 seconds. In comparison, KAV scanned these same input packages in 300 seconds, reporting 9174 found files. In our last step, we ran the scan on the unpacked files produced by Core. In that case KAV reported 12175 files and finished in 360 seconds (this includes the added time for file unpacking done by Core). Two events were reported, and they refer to scan start and scan finish. No malicious objects were detected. Results: This test shows that when scanning files that have already been unpacked by Core, KAV is able to scan 3000 more files in a time that is very close to the time needed to scan the packed content. Further optimizations could certainly be applied to reduce this time even further.
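The figures from this second test can be turned into rough throughput numbers. The sketch below uses only the counts and times reported above; `files_per_second` is an illustrative helper name, and the 360 seconds already include Core's unpacking time, as noted.

```python
def files_per_second(files: int, seconds: float) -> float:
    """Average number of files handled per second."""
    return files / seconds


core_rate = files_per_second(4275, 95)           # Core unpacking the 20 installers
kav_packed_rate = files_per_second(9174, 300)    # KAV on the packed installers
kav_unpacked_rate = files_per_second(12175, 360) # KAV on Core output, Core time included

extra_files = 12175 - 9174  # additional files KAV reports on the unpacked set

print(f"Core:           {core_rate:.1f} files/s")
print(f"KAV (packed):   {kav_packed_rate:.1f} files/s")
print(f"KAV (unpacked): {kav_unpacked_rate:.1f} files/s")
print(f"extra files reported: {extra_files}")
```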

In conclusion, our initial "binary layering" experiment performed very well in comparison to existing solutions, while our first test demonstrated the value of diligent support for various packing formats. As these were only lab experiments, much room is left for further optimization and implementation improvements. Until next week...
