Available on ORBi since
17 October 2017
Poster presentation (Scientific conferences and congresses)
BIGMOMAL - Big Data Analytics for Mobile Malware Detection
Wassermann, Sarah; Casas, Pedro
2017 • ACM Internet Measurement Conference 2017


Full text
Author preprint (7.98 MB)

Keywords:
Mobile Malware Detection; Big Data Analytics; Machine Learning
Abstract:
Mobile malware is on the rise. Because of their popularity, smartphones are an attractive target for cybercriminals, especially for unauthorized access to private user data: smartphones hold a great deal of sensitive information about their users, even more than a personal computer. Besides personal information such as documents, accounts, passwords, and contacts, smartphone sensors centralize other sensitive data such as user location and physical activity. In this paper, we study the problem of malware detection on smartphones using supervised machine learning models and big data analytics frameworks. Using a publicly available dataset for smartphone data analysis (the SherLock data collection, see http://bigdata.ise.bgu.ac.il/sherlock/), we train and benchmark different supervised machine learning models to detect the activity of malware apps. The SherLock data collection is a crowdsourcing-based smartphone dataset in which hundreds of features from many different "sensors" or vantage points within the device are monitored by a tailored smartphone agent. The data were collected during a long-term field trial (2 years, 2015/16) on 50 smartphones, each used as the primary device of one of 50 participants. The monitoring agent collects a wide variety of network, software, and sensor data at a high sampling rate (sampling intervals as short as 5 seconds). In addition, participant devices include a sandbox-like smartphone agent that runs controlled malware apps, which perpetrate attacks on the user's device (such as contacts theft, spyware, and phishing) while creating labels for the SherLock dataset. The complete labeled dataset contains more than 10 billion data records, for a total of about 4 TB of data. We additionally complement the labels for malicious apps that participants might have installed themselves by checking the hashes of the installed apps against VirusTotal (https://www.virustotal.com), a well-known online multi-antivirus scanning system.
From the complete dataset, we keep two specific feature categories: all features related to the network traffic generated by the apps, and all features corresponding to the footprint of the app on the CPU and internal running processes (e.g., statistics on CPUs, memory usage, and Linux-level process information). The rationale is that some malware activity is more visible at the network traffic level, whereas other activity is better identified at the local process level. Using this dataset, we train different machine learning models (e.g., decision trees, neural networks, and SVMs) and verify how accurately they spot malicious apps running on the users' devices. We also apply feature selection strategies to improve results and reduce computation time. Given the size of the dataset, we rely on big data platforms (such as Spark) to perform the analysis, complementing the machine-learning-based analysis with scikit-learn-like pipelines. We evaluate three aspects: (i) overall model performance (using multi-fold cross-validation on the complete dataset), (ii) generalization of the learned models across users (train on N-1 users and test on the remaining user), and (iii) drift of detection accuracy over time (train on the first month/week and test the resulting model on the subsequent months/weeks). Initial results are very promising, especially regarding overall model performance for decision-tree-based models.
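The three evaluation protocols named in the abstract differ only in how records are split into training and test sets. The following is a minimal sketch of those splits in plain Python, not the authors' implementation: the function names, the `users` and `weeks` lists, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kfold_splits(n_records, k):
    """(i) Multi-fold cross-validation: every record is tested exactly once."""
    indices = list(range(n_records))
    for fold in range(k):
        test = indices[fold::k]  # every k-th record, offset by the fold number
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def leave_one_user_out_splits(user_ids):
    """(ii) Train on N-1 users, test on the single held-out user."""
    by_user = defaultdict(list)
    for idx, user in enumerate(user_ids):
        by_user[user].append(idx)
    for held_out in sorted(by_user):
        test = by_user[held_out]
        train = [i for u, idxs in by_user.items() if u != held_out for i in idxs]
        yield held_out, sorted(train), test


def temporal_splits(periods, train_periods=1):
    """(iii) Train on the first period(s), test on each later period separately."""
    ordered = sorted(set(periods))
    train_set = set(ordered[:train_periods])
    train = [i for i, p in enumerate(periods) if p in train_set]
    for later in ordered[train_periods:]:
        test = [i for i, p in enumerate(periods) if p == later]
        yield later, train, test


# Toy example (hypothetical): 6 records from 3 users over 3 weeks.
users = ["u1", "u1", "u2", "u2", "u3", "u3"]
weeks = [1, 2, 1, 3, 2, 3]

for week, train_idx, test_idx in temporal_splits(weeks):
    print(f"train on week 1 records {train_idx}, test on week {week} records {test_idx}")
```

Scoring a fixed model on each later period in turn is what would expose accuracy drift over time, while the per-user split shows whether a model trained on some participants transfers to an unseen one. In practice, library utilities such as scikit-learn's `KFold` and `LeaveOneGroupOut` cover the first two splits.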
Disciplines:
Computer science
Author, co-author:
Wassermann, Sarah; Université de Liège - ULiège > Master sc. informatiques, à fin.
Casas, Pedro
Document language:
Title:
BIGMOMAL - Big Data Analytics for Mobile Malware Detection
Publication/release date:
November 2017
Event name:
ACM Internet Measurement Conference 2017
Event location:
London, United Kingdom
Event date:
1 November 2017 to 3 November 2017
Event scope:
Research project title:

