Framework Based on Simulation of Real-World Message Streams to Evaluate Classification Solutions

Author:

Hojas-Mazo Wenny 1, Maciá-Pérez Francisco 2, Berná Martínez José Vicente 2, Moreno-Espino Mailyn 3, Lorenzo Fonseca Iren 2, Pavón Juan 4

Affiliation:

1. Departamento de Inteligencia Artificial e Infraestructura de Sistemas Informáticos, Facultad de Ingeniería Informática, Universidad Tecnológica de La Habana, José Antonio Echeverría, Calle 114 #11901, entre 119 y 127, CUJAE, Marianao, La Habana 19390, Cuba

2. Department of Computer Science and Technology, University of Alicante, 03690 Alicante, Spain

3. Centro de Investigación en Computación, Instituto Politécnico Nacional, Ciudad de México 07738, Mexico

4. Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid, 28040 Madrid, Spain

Abstract

Analysing message streams in a dynamic environment is challenging. Various methods and metrics are used to evaluate message classification solutions, but they often fail to simulate the actual environment realistically. As a result, the evaluation can produce overly optimistic results, rendering current solution evaluations inadequate for real-world environments. This paper proposes a framework based on the simulation of real-world message streams to evaluate classification solutions. The framework consists of four modules: message stream simulation, processing, classification and evaluation. The simulation module uses simulation techniques and queueing theory to replicate a real-world message stream. The processing module refines the input messages for optimal classification. The classification module categorises the generated message stream using existing solutions. The evaluation module assesses the performance of the classification solutions by measuring accuracy, precision and recall. The framework can model distinct behaviours from a range of sources, such as spammers with different attack strategies, press media or social network sources. Each source profile generates its own message stream, and these streams are combined into the main stream for greater realism. A spam detection case study demonstrates the implementation of the proposed framework and identifies latency and message body obfuscation as critical parameters for classification quality.
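
The abstract does not specify how the simulation and evaluation modules are implemented. As an illustration only, the following minimal Python sketch shows one way the idea could look: each source profile emits messages with exponential inter-arrival times (a Poisson arrival process, a common queueing-theory assumption), the per-profile streams are merged by timestamp into one main stream, and accuracy, precision and recall are computed from the labels. The profile names, arrival rates, the `Message` type, and the noisy placeholder classifier are all hypothetical and are not taken from the paper.

```python
import heapq
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    timestamp: float  # arrival time in seconds since the simulation start
    source: str       # name of the (hypothetical) profile that sent it
    is_spam: bool     # ground-truth label


def simulate_profile(name, rate, is_spam, horizon, rng):
    """Yield messages whose inter-arrival times are exponential (Poisson process)."""
    t = rng.expovariate(rate)
    while t < horizon:
        yield Message(t, name, is_spam)
        t += rng.expovariate(rate)


def merge_streams(*streams):
    """Combine time-ordered per-profile streams into one time-ordered main stream."""
    return list(heapq.merge(*streams, key=lambda m: m.timestamp))


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (True = spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


rng = random.Random(42)
horizon = 3600.0  # simulate one hour; rates below are messages per second (assumed)
main_stream = merge_streams(
    simulate_profile("spammer_a", 0.05, True, horizon, rng),
    simulate_profile("press_media", 0.01, False, horizon, rng),
    simulate_profile("social_network", 0.02, False, horizon, rng),
)

# Placeholder classifier: returns the true label but errs 10% of the time.
# It only stands in for an existing classification solution under test.
y_true = [m.is_spam for m in main_stream]
y_pred = [label if rng.random() > 0.1 else not label for label in y_true]
metrics = evaluate(y_true, y_pred)
print(len(main_stream), metrics)
```

In this sketch, adding another behaviour (for example a second spammer with a different attack strategy) amounts to passing one more `simulate_profile` generator to `merge_streams`; a processing step and a real classifier would sit between the merged stream and `evaluate`.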

Publisher

MDPI AG
