A MACHINE LEARNING CLASSIFICATION APPROACH TO DETECT TLS-BASED MALWARE USING ENTROPY-BASED FLOW SET FEATURES

Author:

Keshkeh Kinan1,Jantan Aman1,Alieyan Kamal2

Affiliation:

1. School of Computer Sciences, Universiti Sains Malaysia, Gelugor, Pulau Pinang, Malaysia

2. Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, Jordan

Abstract

Transport Layer Security (TLS) based malware is one of the most hazardous malware types, as it relies on encryption to conceal connections. Due to the complexity of TLS traffic decryption, several anomaly-based detection studies have been conducted to detect TLS-based malware using different features and machine learning (ML) algorithms. However, most of these studies utilized flow features with no feature transformation or relied on inefficient flow feature transformations like frequency-based periodicity analysis and outliers percentage. This paper introduces TLSMalDetect, a TLS-based malware detection approach that integrates periodicity-independent entropy-based flow set (EFS) features generated by a flow feature transformation technique to solve flow feature utilization issues in related research. EFS features effectiveness was evaluated in two ways: (1) by comparing them to the corresponding outliers percentage and flow features using four feature importance methods, and (2) by analyzing classification performance with and without EFS features. Moreover, new Transmission Control Protocol features not explored in literature were incorporated into TLSMalDetect, and their contribution was assessed. This study’s results proved EFS features of the number of packets sent and received were superior to related outliers percentage and flow features and could remarkably increase the performance up to ~42% in the case of Support Vector Machine accuracy. Furthermore, using the basic features, TLSMalDetect achieved the highest accuracy of 93.69% by Naïve Bayes (NB) among the ML algorithms applied. Also, from a comparison view, TLSMalDetect’s Random Forest precision of 98.99% and NB recall of 92.91% exceeded the best relevant findings of previous studies. These comparative results demonstrated the TLSMalDetect’s ability to detect more malware flows out of total malicious flows than existing works. It could also generate more actual alerts from overall alerts than earlier research.Transport Layer Security (TLS) based malware is one of the most hazardous malware types, as it relies on encryption to conceal connections. Due to the complexity of TLS traffic decryption, several anomaly-based detection studies have been conducted to detect TLS-based malware using different features and machine learning (ML) algorithms. However, most of these studies utilized flow features with no feature transformation or relied on inefficient flow feature transformations like frequency-based periodicity analysis and outliers percentage. This paper introduces TLSMalDetect, a TLS-based malware detection approach that integrates periodicity-independent entropy-based flow set (EFS) features generated by a flow feature transformation technique to solve flow feature utilization issues in related research. EFS features effectiveness was evaluated in two ways: (1) by comparing them to the corresponding outliers percentage and flow features using four feature importance methods, and (2) by analyzing classification performance with and without EFS features. Moreover, new Transmission Control Protocol features not explored in literature were incorporated into TLSMalDetect, and their contribution was assessed. This study’s results proved EFS features of the number of packets sent and received were superior to related outliers percentage and flow features and could remarkably increase the performance up to ~42% in the case of Support Vector Machine accuracy. Furthermore, using the basic features, TLSMalDetect achieved the highest accuracy of 93.69% by Naïve Bayes (NB) among the ML algorithms applied. Also, from a comparison view, TLSMalDetect’s Random Forest precision of 98.99% and NB recall of 92.91% exceeded the best relevant findings of previous studies. These comparative results demonstrated the TLSMalDetect’s ability to detect more malware flows out of total malicious flows than existing works. It could also generate more actual alerts from overall alerts than earlier research.

Publisher

UUM Press, Universiti Utara Malaysia

Subject

General Mathematics,General Computer Science

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3