Affiliation:
1. Department of ECE, AMCEC (affiliated to VTU), Bangalore, India.
Abstract
Data preprocessing is the first step in machine learning; it ensures data quality and extracts useful information from datasets.
The data derived from preprocessing is used for model training and has a direct impact on model efficiency. Irrelevant or
dispensable information is removed from the dataset to ensure data quality. Data preprocessing includes data description,
null-value handling, categorical-value encoding, normalization, transformation, and feature extraction and selection.
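As an illustration of three of the steps listed in the abstract (null-value handling, categorical-value encoding, and normalization), the following is a minimal sketch in plain Python. It is not the authors' implementation: the helper names impute_mean, one_hot, and min_max, the choice of mean imputation and min-max scaling, and the toy columns ages and ports are all assumptions made for this example.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values.

    Assumes at least one observed (non-None) value is present.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 list per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def min_max(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy columns, invented for illustration only.
ages = [20, None, 40, 30]
ports = ["S", "C", "S", "Q"]

ages_filled = impute_mean(ages)     # null-value handling
ages_scaled = min_max(ages_filled)  # normalization
ports_encoded = one_hot(ports)      # categorical-value encoding
```

In practice these steps are usually done with a library such as pandas or scikit-learn. The hand-written versions here only make the arithmetic of each step visible.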
Subject
General Earth and Planetary Sciences; Earth-Surface Processes; General Engineering; Soil Science; General Environmental Science; Marketing; Management Science and Operations Research; Strategy and Management; Management Information Systems; General Decision Sciences; Atomic and Molecular Physics, and Optics; Law; Religious studies; Anthropology; History; Cultural Studies; History and Philosophy of Science; General Physics and Astronomy; Linguistics and Language; Education