ELINAC: Autoencoder Approach for Electronic Invoices Data Clustering

Published: 2022-03-16
Volume: 12
Issue: 6
Page: 3008
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Schulte Johannes P., Giuntini Felipe T., Nobre Renato A., Nascimento Khalil C. do, Meneguette Rodolfo I., Li Weigang, Gonçalves Vinícius P., Rocha Filho Geraldo P.

Abstract

Electronic invoices (NF-e) are the most common way to document monetary transactions in Brazil. Auditing these invoices is essential, and data mining techniques such as clustering and anomaly detection can improve it. Applying these techniques is not simple, however, because NF-e data comprise millions of records with noisy fields and nonstandard documents, especially short text descriptions. Extracting information from short texts to identify traces of mismanagement, embezzlement, commercial fraud, or tax evasion is also costly. Such data can be analyzed more effectively once divided into well-defined groups, yet efficient solutions for clustering data with characteristics similar to NF-e have not been proposed in the literature. We developed ELINAC, a service that uses an autoencoder to cluster short-text data in NF-e. ELINAC aids the auditing of transactions documented in NF-e by clustering similar records according to their short-text descriptions, which makes anomaly detection in numeric fields easier. To this end, ELINAC explores how to model the autoencoder without increasing the computational cost of processing a large volume of short-text data. The results show that, even in the worst case, ELINAC groups data efficiently while running three times faster than solutions previously adopted in the literature.
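
The general idea in the abstract (encode short descriptions with an autoencoder, then cluster the encoded vectors) can be illustrated with a minimal sketch. This is not ELINAC's architecture: the abstract does not specify the text features, the network shape, or the clustering algorithm. The character n-gram hashing, the single tanh hidden layer, the k-means step, every function name, and the sample descriptions below are assumptions made purely for illustration.

```python
import math
import random


def hash_ngrams(text, dim=32, n=3):
    """Map a short description to a normalised bag of character n-grams."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        # Deterministic hash (Python's built-in hash() is salted per process).
        idx = sum(ord(c) * 31 ** k for k, c in enumerate(gram)) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def encode(enc, x):
    """Hidden-layer code of the autoencoder for one input vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc]


def train_autoencoder(data, hidden=4, epochs=200, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with plain SGD on squared error."""
    rng = random.Random(seed)
    d = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(d)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(enc, x)
            y = [sum(w * hj for w, hj in zip(row, h)) for row in dec]
            err = [yi - xi for yi, xi in zip(y, x)]
            total += sum(e * e for e in err)
            # Backpropagate through the decoder, then the tanh encoder.
            grad_h = [sum(err[i] * dec[i][j] for i in range(d)) * (1 - h[j] ** 2)
                      for j in range(hidden)]
            for i in range(d):
                for j in range(hidden):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden):
                for k in range(d):
                    enc[j][k] -= lr * grad_h[j] * x[k]
        losses.append(total / len(data))
    return enc, losses


def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialisation (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical product descriptions; not taken from the paper's data set.
descriptions = [
    "paracetamol 500mg tablets", "paracetamol 750mg tablet",
    "diesel fuel s10", "diesel s10 fuel litre",
    "a4 paper ream 500 sheets", "paper a4 ream",
]
vectors = [hash_ngrams(t) for t in descriptions]
enc, losses = train_autoencoder(vectors)
codes = [encode(enc, v) for v in vectors]
labels = kmeans(codes, k=3)
for text, label in zip(descriptions, labels):
    print(label, text)
```

Clustering in the low-dimensional code space rather than on the raw n-gram vectors is the design point being illustrated here. Once records are grouped, a numeric field such as unit price could be compared within each group to look for outliers, which is the auditing use the abstract describes.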

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/6/3008/pdf

References: 53 articles.

1. Continuous Auditing: Building Automated Auditing Capability

2. Managing the Public Service in Developing Countries; Ozgediz, 1983

3. Big Data in Public Affairs

4. Big data in the public sector: Uncertainties and readiness

5. Big data in the public sector; Munné, 2016

Cited by 5 articles.

1. Systematic Literature Review and Bibliometric Analysis on Addressing the Vanishing Gradient Issue in Deep Neural Networks for Text Data; Communications in Computer and Information Science; 2024

2. Generic Multimodal Gradient-based Meta Learner Framework; 2023 26th International Conference on Information Fusion (FUSION); 2023-06-28

3. Towards Intelligent Processing of Electronic Invoices: The General Framework and Case Study of Short Text Deep Learning in Brazil; Lecture Notes in Business Information Processing; 2023

4. Topic Model with Contextual Outlier Handling: a Study on Electronic Invoice Product Descriptions; Progress in Artificial Intelligence; 2023

5. AMANDA: A Middleware for Automatic Migration between Different Database Paradigms; Applied Sciences; 2022-06-16