Binary code similarity detection (BCSD) plays an important role in binary application security testing. It can be applied in several fields, such as software plagiarism detection, malware analysis, and vulnerability detection. Most research is based on recurrent neural networks, which have difficulty capturing the overall or long-distance semantic information of functions. Besides, existing works simply extract high-level semantic features and lack in-depth investigation of potential mechanisms for fusing low-level and high-level semantic features. In this paper we propose a multi-semantic feature fusion attention network (MFFA-Net) for BCSD. MFFA-Net contains two critical modules: semantic feature fusion (SFF) and attention feature fusion (AFF). The SFF module concatenates multiple semantic features to represent the semantics of the function, which helps to obtain the overall semantic information of the function. The AFF module is designed to find useful information among the various features; it assigns an attention matrix to model the relationships between features. To evaluate the proposed method, we conducted extensive experiments on two datasets. MFFA-Net achieves AUC values of 99.6% and 98.3% on the two datasets, respectively. The experimental results show that MFFA-Net has better performance for BCSD.

With the rise of the mobile internet, the Internet of Things (IoT) and 5G, complex software finds applications in all kinds of new devices: the number of architectures running the same program has multiplied, and COTS software components are increasingly integrated into closed-source products. This brings more security risks and challenges to terminals. For example, the extensive use of open source software for resource sharing leads to increased security risks, and a large number of terminals are in an insecure physical environment, which is more likely to cause leakage of sensitive user and business information. SUMAP pointed out in the 2021 CVE Vulnerability Trend Security Analysis Report that the number of CVEs in 2020 ranked first, and that the number of CVEs in Q1 2021 reached an astonishing 13,000. Therefore, the automatic analysis of software artifacts in compiled form (binary code) is of great significance. At the same time, finding similar functions in compiled code gets the most attention.

The BCSD technique is used to measure the similarity relationship between two or more binary program components [1]. Depending on the detection granularity, program components can be basic blocks, functions, or entire programs. Binary code similarity can be applied to scenarios such as software plagiarism detection [2, 3], malware detection and analysis [4, 5, 6], and vulnerability detection [7]. One of the main challenges of BCSD is that the same source code is compiled into different binary code when different compiler versions, compilation options, and so on are used. As shown in Fig. 1, modifying any gray box in the figure causes the same source code to be compiled into different but semantically equivalent binary programs. For example, to improve the efficiency of the program, the version of the compiler may be changed, or the compiler may be replaced completely. To suit different architectures, the target platform can be changed. What's more, obfuscating transformations can be applied deliberately to generate polymorphic variants of the same source code. An ideal BCSD method can identify the similarity of binary code that corresponds to the same source code after it has undergone these different transformations.

Traditional binary code similarity detection

Considering its applications and challenges, many binary code similarity methods have been proposed. Traditional methods mainly include BinHunt [8], iBinHunt [9], BinClone [10], Multi_MH [11], and discovRE [12]. BinHunt employed a new graph isomorphism technique, symbolic execution, and theorem proving to identify semantic differences. iBinHunt utilized deep taint to identify semantic differences in control flow between programs, but still had low accuracy and high overheads. BinClone used hashing to obtain a fixed-length value from a variable-length instruction sequence and represented a piece of binary code as a bit-vector to compute similarity.
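The BinClone idea mentioned in this text (hash a variable-length instruction sequence to a fixed-length value, then compare bit-vectors) can be illustrated with a small sketch. This is not BinClone's actual algorithm: the normalization rules, the sliding window of two instructions, the 64-bit vector length, the use of SHA-256, and the Jaccard score are all assumptions chosen only to make the idea concrete.

```python
import hashlib
import re

NUM_BITS = 64  # assumed bit-vector length, for illustration only


def normalize(instruction):
    """Replace concrete constants and registers with placeholders (assumed rules)."""
    text = instruction.lower().strip()
    text = re.sub(r"\b0x[0-9a-f]+\b|\b\d+\b", "IMM", text)
    text = re.sub(r"\b([er]?[abcd]x|[er]?[sd]i|[er]?[sb]p|r\d+)\b", "REG", text)
    return text


def window_hash(instructions, bits=NUM_BITS):
    """Map a variable-length instruction window to a fixed-range bit index."""
    digest = hashlib.sha256(";".join(instructions).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % bits


def bit_vector(instructions, window=2, bits=NUM_BITS):
    """Set one bit per sliding window of normalized instructions."""
    normalized = [normalize(i) for i in instructions]
    vector = 0
    for start in range(max(1, len(normalized) - window + 1)):
        vector |= 1 << window_hash(normalized[start:start + window], bits)
    return vector


def similarity(a, b):
    """Jaccard similarity of two bit-vectors stored as Python integers."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


# Two hypothetical functions that differ only in register and constant choice.
func_a = ["mov eax, 1", "add eax, ebx", "ret"]
func_b = ["mov ecx, 7", "add ecx, edx", "ret"]
print(similarity(bit_vector(func_a), bit_vector(func_b)))
```

Because normalization erases register and constant choices, the two sample functions map to the same bit-vector. A purely syntactic scheme like this remains sensitive to instruction reordering and substitution, which is one reason later work moved toward semantic features.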
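The two fusion styles named for MFFA-Net, concatenating several semantic features (SFF) and weighting features through an attention matrix (AFF), can be pictured with a generic sketch. The text gives no architectural detail, so everything below is an assumption: the feature vectors are made up, the attention is plain scaled dot-product self-attention over equal-length feature vectors, and nothing is learned. It should not be read as the MFFA-Net design.

```python
import math


def concat_features(features):
    """Concatenation-style fusion: join several feature vectors into one."""
    return [x for vec in features for x in vec]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(features):
    """Row i holds how strongly feature i attends to each feature j.

    Assumes every feature vector has the same length.
    """
    scale = math.sqrt(len(features[0]))
    scores = [
        [sum(a * b for a, b in zip(fi, fj)) / scale for fj in features]
        for fi in features
    ]
    return [softmax(row) for row in scores]


def attend(features):
    """Attention-style fusion: each output is a weighted mix of all features."""
    weights = attention_matrix(features)
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        for row in weights
    ]


# Three hypothetical 2-dimensional semantic features of one function.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(concat_features(feats))   # one 6-dimensional vector
print(attention_matrix(feats))  # 3 x 3 matrix, each row sums to 1
print(attend(feats))            # three re-weighted 2-dimensional features
```

Concatenation keeps every feature unchanged, while the attention matrix lets each feature be re-expressed in terms of the others. That contrast is the only point the sketch is meant to convey.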
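The results for MFFA-Net are reported as AUC. As a reminder of what that number measures for a similarity detector, the sketch below computes AUC directly from its ranking definition: the probability that a randomly chosen similar pair receives a higher score than a randomly chosen dissimilar pair, with ties counted as half. The scores and labels are invented for illustration and are not data from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a similar function pair, 0 for a dissimilar one.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one similar and one dissimilar pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for four function pairs.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0: every similar pair outranks every dissimilar pair
print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75: one of the four comparisons is lost
```

The pairwise loop is quadratic in the number of pairs, which is fine for a sketch; evaluation at scale would normally use a sort-based implementation from an established library.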