MADLIRA - Malware detection using learning and information retrieval for Android.

MADLIRA is a tool for Android malware detection. It consists of two components: a TFIDF component and an SVM learning component. In general, it takes as input a set of malicious and benign applications and then either extracts the malicious behaviors (TFIDF component) or computes a training model (SVM classifier). It then uses this knowledge to detect malicious behaviors in new Android applications.

Functionality
This tool has two main components: a TFIDF component and an SVM component.
Each component provides two functions: a training function and a test function. For the TFIDF component, these are malicious behavior extraction (training) and malicious behavior detection (test).

TFIDF component
* Malicious behavior extraction
– Collect benign and malicious applications and put them in folders named benignApkFolder and maliciousApkFolder, respectively.
– Prepare the training data and pack it into two files named benignPack and maliciousPack by using the command:

– Extract malicious behaviors from the two packed files (benignPack and maliciousPack) by using the command:

* Malicious behavior detection
– Collect new applications and put them in a folder named checkApk.
– Detect malicious behaviors in the applications in the folder checkApk by using the command:

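The extraction and detection steps above can be illustrated with a generic TF-IDF ranking. This is a minimal sketch, not MADLIRA's implementation: it assumes each application has already been reduced to a list of behavior strings (for example API-call patterns), scores each behavior by the difference between its average TF-IDF weight in the malicious set and in the benign set, and reports which top-ranked behaviors occur in a new application. The function names and sample behavior strings are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(apps, all_apps):
    """Average TF-IDF weight of every behavior over the applications in `apps`.

    Each application is a list of behavior strings (hypothetical encoding).
    IDF is computed over `all_apps`, i.e. benign and malicious together.
    """
    n = len(all_apps)
    # Document frequency: number of applications containing each behavior.
    df = Counter(b for app in all_apps for b in set(app))
    weights = Counter()
    for app in apps:
        for b, count in Counter(app).items():
            tf = count / len(app)          # term frequency within this app
            idf = math.log(n / df[b])      # rarer behaviors weigh more
            weights[b] += tf * idf / len(apps)
    return weights


def extract_malicious_behaviors(benign, malicious, top_k=10):
    """Rank behaviors that weigh more in malware than in benign apps (assumed scoring)."""
    all_apps = benign + malicious
    w_mal = tfidf_weights(malicious, all_apps)
    w_ben = tfidf_weights(benign, all_apps)
    scores = {b: w_mal[b] - w_ben[b] for b in w_mal}
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return [b for b in ranked if scores[b] > 0][:top_k]


def detect(app, malicious_behaviors):
    """Return the extracted malicious behaviors that occur in `app`."""
    present = set(app)
    return [b for b in malicious_behaviors if b in present]


if __name__ == "__main__":
    # Toy data with hypothetical behavior names.
    benign = [["openUI", "readContacts", "network"], ["openUI", "network"]]
    malicious = [["sendSMS", "readContacts", "network"],
                 ["sendSMS", "network", "getDeviceId", "network"]]
    behaviors = extract_malicious_behaviors(benign, malicious)
    print(behaviors)                              # ['sendSMS', 'getDeviceId']
    print(detect(["openUI", "sendSMS"], behaviors))  # ['sendSMS']
```

Behaviors that appear equally often in both sets (here `readContacts`) or in every application (here `network`, whose IDF is zero) get a score of zero and are not reported.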
SVM component
For this component, there are two functions: the training function and the test function.
* Training phase
– Collect benign applications in a folder named benignApkFolder and malicious applications in a folder named maliciousApkFolder.
– Prepare training data by using the commands:

– Compute the training model with the command:

* Malicious behavior detection
– Collect new applications and put them in a folder named checkApk.
– Detect malicious behaviors in the applications in the folder checkApk by using the command:

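The SVM training and detection steps can likewise be illustrated with a minimal linear SVM trained by Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss). MADLIRA's own classifier and feature encoding may differ (the installed TempData folder holds data for kernel computation, which suggests a kernel-based method); this sketch assumes each application is already encoded as a numeric feature vector, with label +1 for malicious and -1 for benign. The functions and the toy feature values are hypothetical.

```python
import random


def _augment(x):
    # Append a constant 1.0 so the last weight acts as a (regularized) bias term.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos: minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).

    X: list of feature vectors; y: labels in {-1, +1}. Returns the weight vector.
    """
    rng = random.Random(seed)
    data = [(_augment(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)                              # visit samples in random order
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)                      # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regularizer gradient
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (malicious) or -1 (benign) for feature vector x."""
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy, already-standardized feature vectors (hypothetical).
    X = [[2.0, 1.0], [3.0, 0.5], [2.5, 1.5],
         [-2.0, -1.0], [-3.0, -0.5], [-2.5, -1.5]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(predict(w, [4.0, 2.0]))    # 1
    print(predict(w, [-4.0, -2.0]))  # -1
```

With the step size 1/(λt), the weight vector after T steps equals the sum of the margin-violating y·x terms scaled by 1/(λT), so no separate bias update or learning-rate schedule is needed.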
Installed Data:
+ MADLIRA.jar is the main application.
+ noAPI.txt declares the API prefixes.
+ family.txt lists malware samples by family.
+ The TrainData folder contains the training configuration and the training model.
+ The Samples folder contains sample data.
+ The TempData folder contains data for kernel computation.

Use and Download:

Source: https://github.com/dkhuuthe