May 6, 2020

Introducing Explainable Threat Intelligence

Next-generation threat detection and hunting algorithms built for humans

Blog Author

Tomislav Peričin, Chief Software Architect & Co-Founder at ReversingLabs.

When it comes to discussing what threat intelligence is, we believe the definition below is certainly the most encompassing one. While there are many types of threat intelligence, its use in protecting digital environments always represents a security force multiplier. Deploying protections powered by threat intelligence yields immediate value, and as a consequence, drastically improves security posture.

According to the October 2019 Gartner report How to Implement a Computer Security Incident Response Program, “Gartner defines ‘threat intelligence’ as ‘evidence-based knowledge — including context, mechanisms, indicators, implications and action-oriented advice — about an existing or emerging menace or hazard to assets. This intelligence can be used to inform decisions regarding the subject’s response to that menace or hazard.’”

Organizations protect themselves from emerging threats using external threat intelligence. This is the type of data people usually have in mind when they discuss actionable threat intelligence. However, internally collected threat intelligence data is just as important. Organizations that collect it use this form of intelligence to better position their defenses against threats that target them specifically.

While any actionable threat intelligence is good, the best threat intelligence combines both external and internal data. It comes with context, transparency, and recommended next steps. This is what Explainable Threat Intelligence is: a comprehensive system for object analysis that provides actionable intelligence and human-interpretable data.

Objects are made of matryoshka-like layers of structured code and data. In terms of information security, almost everything can be considered an object. When objects are stored permanently, they are called files. When they are being transmitted, they are called messages. Regardless of how they are formed, their use is a necessity for all organizations that want to exchange or store information. That is why all threat detection is based on a deep understanding of complex objects.

Explainable threat intelligence is a system designed with this complexity in mind. Combining static, dynamic, and machine learning analysis, this system pulls apart matryoshka dolls. Its analytical capabilities provide threat detection and intelligence for each object layer. The information this system collects is then correlated with external threat intelligence for threat discovery, and with internal threat intelligence to estimate possible organizational impact.
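The layer-by-layer idea can be pictured with a minimal sketch. It assumes ZIP archives are the only container format, and the helper names (`walk_layers`, `make_zip`) are hypothetical. It is not how the ReversingLabs engine works internally, only an illustration of recursive unpacking with a digest computed for every layer.

```python
import hashlib
import io
import zipfile


def walk_layers(data: bytes, name: str = "root", depth: int = 0):
    """Yield (depth, name, sha256) for an object and everything nested in it."""
    yield depth, name, hashlib.sha256(data).hexdigest()
    buffer = io.BytesIO(data)
    # Assumption: only ZIP containers are unpacked in this sketch.
    # A real decomposition engine would recognize many more formats.
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                yield from walk_layers(archive.read(info), info.filename, depth + 1)


def make_zip(members: dict) -> bytes:
    """Build an in-memory ZIP from a {name: bytes} mapping (demo helper)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        for member_name, payload in members.items():
            archive.writestr(member_name, payload)
    return out.getvalue()


# A two-level "matryoshka": a ZIP that contains another ZIP.
inner = make_zip({"payload.txt": b"hello"})
outer = make_zip({"inner.zip": inner, "readme.txt": b"docs"})

layers = list(walk_layers(outer))
for depth, name, digest in layers:
    print("  " * depth + f"{name} {digest[:12]}")
```

Each yielded digest is a candidate for the reputation and correlation steps described below, so a threat hidden in an inner layer is checked just like the outermost object.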

Since its ultimate goal is to automate the vast majority of repeatable, time-consuming, and error-prone analysis tasks, this approach employs a breadth of modern techniques. Object reputation, similarity analysis, and metadata correlation are the key explainable components for the system that discerns good from bad.

Object reputation is the essential first step towards keeping modern organizations secure. By pulling intelligence from a knowledge base of more than 10 billion already classified objects, any information exchange or storage point can quickly decide to allow or block content. In their most basic form, reputation checks are implemented as simple object hash lookups. These unique identifiers are computed as object content digests. Since they are unchangeable object properties, hashes are perfect for reputation lookups.
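As a rough illustration, a hash-based reputation check can be sketched in a few lines. The in-memory `KNOWN_OBJECTS` table, the verdict labels, and the `decide` policy are assumptions made for this example. A production deployment would query a reputation service rather than a local dictionary.

```python
import hashlib

# Hypothetical local reputation store keyed by SHA-256 content digest.
KNOWN_OBJECTS = {
    hashlib.sha256(b"trusted installer bytes").hexdigest(): "goodware",
    hashlib.sha256(b"known ransomware bytes").hexdigest(): "malicious",
}


def reputation(content: bytes) -> str:
    """Return the stored classification for an object, or 'unknown'."""
    digest = hashlib.sha256(content).hexdigest()
    return KNOWN_OBJECTS.get(digest, "unknown")


def decide(content: bytes) -> str:
    """Map a reputation verdict to an allow/block/analyze decision."""
    verdict = reputation(content)
    if verdict == "malicious":
        return "block"
    if verdict == "goodware":
        return "allow"
    return "analyze"  # unclassified objects continue through the workflow


print(decide(b"known ransomware bytes"))
print(decide(b"never seen before"))
```

Because the digest depends on every byte of the content, changing a single byte produces a different key and the lookup falls through to "analyze". That gap is what similarity analysis addresses.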

Object similarity is a significant reputation lookup amplifier. Similarity algorithms compute fuzzy object hashes that help determine the reputation of previously unclassified objects. They are especially effective since the vast majority of today's threats are either polymorphic variants of already known malware strains or their newer versions. Both can be easily caught by using object reputation combined with similarity analysis.
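To make the idea concrete, here is a toy similarity check. It swaps in a very simple technique, Jaccard similarity over overlapping 4-byte shingles, in place of a real fuzzy hashing algorithm. The sample byte strings, the family label, and the 0.8 threshold are all assumptions chosen for the demonstration.

```python
def shingles(data: bytes, size: int = 4) -> set:
    """Return the set of overlapping byte n-grams in the data."""
    return {data[i:i + size] for i in range(len(data) - size + 1)}


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def closest_known(sample: bytes, known: dict, threshold: float = 0.8):
    """Return (label, score) of the best match, or (None, score) below threshold."""
    best_label, best_score = None, 0.0
    for label, content in known.items():
        score = similarity(sample, content)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


# Hypothetical classified sample and two unclassified objects.
KNOWN = {
    "ransomware-family-a": b"connect to c2.example.net then encrypt all user "
                           b"documents and drop ransom note",
}
variant = (b"connect to c9.example.net then encrypt all user "
           b"documents and drop ransom note")
unrelated = b"render the quarterly sales chart and export it as a printable pdf"

print(closest_known(variant, KNOWN))
print(closest_known(unrelated, KNOWN))
```

The one-byte variant would miss an exact hash lookup but still lands close to its known relative, while the unrelated object does not.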

Object structures are described with format-specific metadata properties. Those properties are a part of a digital fingerprint that distinguishes one object from another. But unlike human fingerprints, object metadata properties can be used to track down an object's ancestors and descendants. Objects are made out of thousands of such fingerprints. They get collected statically by inspecting the object structure, and dynamically by observing the object's behavior once it executes.

Metadata correlation automates detection by using digital fingerprints to discover new threats and identify variants of existing threats. Many metadata properties collected during analysis are unique enough to be used this way. For the most common ones, such as URLs and digital certificates, quick reputation lookups are sufficient to convict an object. For more obscure ones, such as product and function names, a more complex lookup is required to achieve the same effect. However, in both cases the goal is the same: convert an object to a set of unique reputation lookup queries that try to correlate its metadata to previously discovered threats.
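The conversion from object to lookup queries might look something like the sketch below. The field names, the split between "exact" and "search" lookups, and the `threat_index` set are hypothetical and only mirror the distinction drawn above between common and more obscure properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    field: str
    value: str
    kind: str  # "exact" for quick reputation lookups, "search" for complex ones


# Assumption: these property types are common enough for direct lookups.
EXACT_FIELDS = {"url", "certificate_thumbprint"}


def build_queries(metadata: dict) -> list:
    """Turn extracted metadata into a de-duplicated, ordered list of queries."""
    queries = []
    for field, values in sorted(metadata.items()):
        kind = "exact" if field in EXACT_FIELDS else "search"
        for value in sorted(set(values)):
            queries.append(Query(field, value, kind))
    return queries


def correlate(queries: list, threat_index: set) -> list:
    """Return the queries whose (field, value) pair was seen in earlier threats."""
    return [q for q in queries if (q.field, q.value) in threat_index]


# Hypothetical metadata extracted from one object.
metadata = {
    "url": ["http://c2.example.net/gate", "http://c2.example.net/gate"],
    "function_name": ["decrypt_payload", "main"],
}
# Hypothetical index of properties observed in previously discovered threats.
threat_index = {("url", "http://c2.example.net/gate")}

queries = build_queries(metadata)
hits = correlate(queries, threat_index)
print(len(queries), [h.value for h in hits])
```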

Explainable Threat Intelligence brings all these analysis techniques and reputation lookups together in the ultimate object analysis system. While this system is platform-independent, the logic it uses can be visualized through a SOAR solution like Splunk Phantom.

The workflow above is designed to take any object through a set of analysis and correlation processes. Each workflow stage collects detection and threat intelligence information that enriches the final analysis report. Since the end goal is detection, each workflow stage can also make the analysis finish early if enough convicting information is found. However, these early exits can be skipped for use cases where complete analysis reports are required to estimate the impact a threat might have on the organization.
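The early-exit behavior can be sketched as a small pipeline. The stage functions below are placeholders with made-up logic, and the `full_report` flag is a hypothetical name for the option that skips early exits. Only the stage order follows the workflow described in this post.

```python
from typing import NamedTuple, Optional


class StageResult(NamedTuple):
    findings: dict          # intelligence that enriches the final report
    verdict: Optional[str]  # "malicious", "goodware", or None if undecided


def reputation_stage(obj: bytes, findings: dict) -> StageResult:
    return StageResult({"reputation": "unknown"}, None)


def static_stage(obj: bytes, findings: dict) -> StageResult:
    # Placeholder detection logic, for illustration only.
    suspicious = b"decrypt_payload" in obj
    return StageResult({"static_hit": suspicious},
                       "malicious" if suspicious else None)


def hunting_stage(obj: bytes, findings: dict) -> StageResult:
    return StageResult({"hunting_matches": 0}, None)


def dynamic_stage(obj: bytes, findings: dict) -> StageResult:
    return StageResult({"network_connections": []}, None)


STAGES = [
    ("reputation", reputation_stage),
    ("static", static_stage),
    ("hunting", hunting_stage),
    ("dynamic", dynamic_stage),
]


def run_workflow(obj: bytes, stages: list, full_report: bool = False) -> dict:
    """Run stages in order; stop at the first verdict unless full_report is set."""
    report = {"stages": [], "findings": {}, "verdict": None}
    for name, stage in stages:
        result = stage(obj, report["findings"])
        report["stages"].append(name)
        report["findings"].update(result.findings)
        if result.verdict and report["verdict"] is None:
            report["verdict"] = result.verdict
        if report["verdict"] and not full_report:
            break  # enough convicting information, finish early
    return report


sample = b"...decrypt_payload..."
print(run_workflow(sample, STAGES)["stages"])
print(run_workflow(sample, STAGES, full_report=True)["stages"])
```

Ordering the stages from cheapest to most expensive means the early exit also saves the most costly work.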

Starting at the top left, object reputation is checked first. This step prevents re-analysis of objects that have already been determined to be highly trusted goodware, or the ones that have already been detected as threats. Running some parts of this analysis system costs more than the others, which is why a good analysis workflow must take economies of scale into its design considerations. In terms of actionable threat intelligence data, the report at this point provides threat detection, its global prevalence, and the malware family description.

Assuming the object is not part of our ever-growing 10B+ collection, the analysis continues. The analysis system collects as much static information about the object as possible through ReversingLabs A1000. This malware analysis and threat hunting workstation uses an advanced static decomposition engine that extracts embedded objects and related metadata.

On the threat intelligence collection side, static analysis collects a plethora of information that the rest of the workflow uses for automated threat hunting. This includes hashes of extracted objects, various object format-specific metadata properties (such as PDB paths, section, resource and function names, etc.), embedded URLs and digital certificates, to name just a few.

From the threat detection standpoint, TitaniumCore - the ReversingLabs static analysis engine - employs a dozen classification technologies. By covering signatures, heuristics, metadata artifacts, functional similarity, YARA rules, and ultimately Explainable Machine Learning, this engine is more than capable of detecting new and previously known threats.

However, even with predictive threat detection technologies, some new threats could still slip by undetected. This is where automated threat hunting comes in. The workflow takes into consideration every metadata point collected by previous analysis steps, and employs object reputation, similarity and metadata correlation in an attempt to close the new threat detection gap.

That last step is the most advanced one, as it replicates the common actions threat defenders take while investigating suspicious files. All object properties are put under scrutiny by querying the threat intelligence contained within global ReversingLabs TitaniumCloud and local A1000 repositories. More often than not, metadata properties are unique enough to automatically convict an object as a threat. Combining properties together to form complex queries yields even better results. The Explainable Threat Intelligence system does more than just perform these queries - it explains the logic for their selection with human-readable descriptions. The explainability and transparency in the system's decision-making process are the key to building confidence in the analysis outcomes.
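A small sketch shows what pairing a combined query with its explanation could look like. The query string format, the record fields, and the threat names are invented for this example. They do not represent TitaniumCloud or A1000 query syntax.

```python
from dataclasses import dataclass


@dataclass
class HuntResult:
    query: str        # the combined lookup that was run
    explanation: str  # human-readable reason for the outcome
    matches: list     # names of known threats sharing the properties


# Hypothetical records of previously discovered threats.
THREAT_RECORDS = [
    {"name": "Trojan.ExampleA", "pdb_path": "c:/build/loader.pdb", "section": ".xyz0"},
    {"name": "Ransomware.ExampleB", "pdb_path": "d:/work/crypt.pdb", "section": ".xyz0"},
]


def hunt(properties: dict, records: list) -> HuntResult:
    """Look for known threats that share every given property, and explain why."""
    ordered = sorted(properties.items())
    query = " AND ".join(f'{key}:"{value}"' for key, value in ordered)
    matches = [
        record["name"]
        for record in records
        if all(record.get(key) == value for key, value in ordered)
    ]
    described = " and ".join(f"{key} '{value}'" for key, value in ordered)
    if matches:
        explanation = (
            f"Convicted: {described} also found in "
            f"{len(matches)} known threat(s): {', '.join(matches)}."
        )
    else:
        explanation = f"No conviction: no known threat shares {described}."
    return HuntResult(query, explanation, matches)


single = hunt({"section": ".xyz0"}, THREAT_RECORDS)
combined = hunt({"section": ".xyz0", "pdb_path": "c:/build/loader.pdb"}, THREAT_RECORDS)
print(combined.query)
print(combined.explanation)
```

In this toy data the single-property query matches two families, while the combined query narrows the result to one. The explanation text records which properties drove the decision.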

Finally, dynamic analysis is performed. Being a time-intensive operation, it is invoked only when absolutely necessary. Like its static counterpart, dynamic analysis provides threat detection and intelligence information. The artifacts it creates, such as network connections and behaviors, are also utilized for automated threat hunting. The final object analysis report is created by merging all these parts together.

While threat discovery is achieved through external threat intelligence, local threat intelligence must be consulted to understand its impact. Plenty of options are available at this point. Collected data can be cross-referenced through a SIEM solution, which is certainly a traditional way of doing impact analysis. A more modern approach would be augmenting SIEM with a real-time query through a deployed EDR solution. Both options are equally valid; choosing one or the other (or both) depends only on what the organization has already deployed.
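As a simple illustration of impact analysis, indicators from the analysis report can be matched against locally collected events. The event structure and host names below are hypothetical stand-ins for whatever a SIEM or EDR query would actually return.

```python
def affected_hosts(indicators: set, events: list) -> dict:
    """Map each host to the report indicators that were observed on it."""
    impact = {}
    for event in events:
        if event["indicator"] in indicators:
            impact.setdefault(event["host"], set()).add(event["indicator"])
    return impact


# Indicators taken from a (hypothetical) final analysis report.
report_indicators = {"c2.example.net", "deadbeef-sample-hash"}

# Hypothetical events exported from local log or endpoint telemetry.
local_events = [
    {"host": "ws-014", "indicator": "c2.example.net"},
    {"host": "ws-014", "indicator": "deadbeef-sample-hash"},
    {"host": "srv-02", "indicator": "updates.example.org"},
    {"host": "ws-207", "indicator": "c2.example.net"},
]

impact = affected_hosts(report_indicators, local_events)
for host in sorted(impact):
    print(host, sorted(impact[host]))
```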

This workflow is more than familiar to anyone who has already tried to roll out an in-house threat detection and analysis system. Connecting the moving parts, collecting the necessary data, and interpreting results provided by different solutions are battle scars of those who ventured down this route before. Explainable Threat Intelligence is a uniform system designed to reduce operational friction and ensure that analysts spend less time doing repetitive tasks, allowing them to focus on making impactful decisions.

Our team is excited to finally be at the stage where we can present our automated threat analysis and hunting efforts. We hope to hear your impressions, fellow security practitioners and defenders, as you start using this workflow in your daily battles. Please contact us for ways to give this exciting new technology a try.
