Dataset for file fragment classification of textual file formats

Published:2019-12 Issue:1 Volume:12 Page:
ISSN:1756-0500
Container-title:BMC Research Notes
language:en
Short-container-title:BMC Res Notes

Author:

Mansouri Hanis Fatemeh, Teimouri Mehdi

Abstract

Objectives: Classification of textual file formats is a topic of interest in network forensics. A few datasets of files with textual formats are publicly available, but there is no public dataset of file fragments of textual file formats. As a result, a major research challenge in file fragment classification of textual file formats is comparing the performance of the developed methods over the same datasets.

Data description: In this study, we present a dataset that contains file fragments of five textual file formats: Binary file format for Word 97–Word 2003, Microsoft Word open XML format, portable document format, rich text file, and standard text document. The dataset contains file fragments in three languages: English, Persian, and Chinese. For each pair of file format and language, 1500 file fragments are provided, so the dataset contains 22,500 file fragments in total.
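The fragment count stated in the abstract can be checked with a minimal sketch. The short format labels below are illustrative assumptions (the abstract names the formats in full), and the code only reproduces the arithmetic; it does not reflect how the dataset itself is organized.

```python
from itertools import product

# Illustrative short labels for the five formats named in the abstract.
FORMATS = ["doc", "docx", "pdf", "rtf", "txt"]
LANGUAGES = ["English", "Persian", "Chinese"]
FRAGMENTS_PER_PAIR = 1500  # stated in the abstract

# Every (format, language) pair contributes the same number of fragments.
pairs = list(product(FORMATS, LANGUAGES))
total_fragments = len(pairs) * FRAGMENTS_PER_PAIR

print(len(pairs))       # 15
print(total_fragments)  # 22500
```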

Publisher

Springer Science and Business Media LLC

Subject

General Biochemistry, Genetics and Molecular Biology, General Medicine

Link

http://link.springer.com/content/pdf/10.1186/s13104-019-4837-4.pdf

Reference 9 articles.

1. McDaniel M, Heydari MH, eds. Content based file type detection algorithms. In: 36th annual Hawaii international conference of system sciences. IEEE; 2003.

2. Calhoun WC, Coles D. Predicting the types of file fragments. Digit Investig. 2008;5:S14–20.

3. Fitzgerald S, Mathews G, Morris C, Zhulyn O. Using NLP techniques for file fragment classification. Digit Investig. 2012;9:S44–9.

4. Beebe NL, Maddox LA, Liu L, Sun M. Sceadan: using concatenated N-gram vectors for improved file and data type classification. IEEE Trans Inf Forensics Secur. 2013;8(9):1519–30.

5. Chen Q, Liao Q, Jiang ZL, Fang J, Yiu S, Xi G, et al., eds. File fragment classification using grayscale image conversion and deep learning in digital forensics. In: 2018 IEEE security and privacy workshops (SPW). IEEE; 2018.

Cited by 6 articles.

1. Intra- and inter-sector contextual information fusion with joint self-attention for file fragment classification;Knowledge-Based Systems;2024-05

2. Classification of Low- and High-Entropy File Fragments Using Randomness Measures and Discrete Fourier Transform Coefficients;Vietnam Journal of Computer Science;2023-07-28

3. ByteRCNN: Enhancing File Fragment Type Identification With Recurrent and Convolutional Neural Networks;IEEE Access;2023

4. Anomaly Detection in File Fragment Classification of Image File Formats;2021 11th International Conference on Computer Engineering and Knowledge (ICCKE);2021-10-28

5. Dataset for file fragment classification of video file formats;BMC Research Notes;2020-04-15