Abstract
Many data mining techniques are available today, and they yield different results with varying precision; selecting an appropriate methodology therefore leads to a more complete and accurate data analysis. There are several ways to evaluate the effectiveness of data mining techniques, and the appropriate choice of technique depends on the type of data to which it will be applied. Data is significant in every field, but it plays an especially important role in certain fields, such as healthcare and cancer data collection. Using data mining techniques to analyze sensitive data such as cancer records can be challenging when the available information is incomplete, which can significantly affect the results. When working with data on lymphoma patients, the large number of factors that may cause the disease and the lack of information are significant challenges. Lymphomas, which are common cancers, are classified as either Hodgkin's or non-Hodgkin's lymphoma. In this research, tumor markers were selected on the criterion that they are common to both types of lymphoma. Five tumor markers (CD3, CD15, CD20, CD30, and LCA), along with the type of lymphoma and the patient's gender, were selected as the variables of this research. To evaluate two data mining techniques, Bayesian networks (Naive Bayes) and the decision tree, we apply the criteria of accuracy, sensitivity, F-score, and error ratio. To determine whether a factor has a positive impact on lymphoma diagnosis, a 90% confidence interval and a 65% support value were selected, so that the highest level of accuracy is taken into account when determining which factors are effective in diagnosing lymphoma.
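As an illustration only, and assuming the support and confidence values above refer to the usual association-rule measures (an interpretation, not something the abstract states), the two quantities for a rule such as "CD30 positive implies Hodgkin's lymphoma" could be computed as in the sketch below. The function name, the record layout, and the four sample records are hypothetical and are not the study's data.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def support_and_confidence(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    consequent: Callable[[Record], bool],
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all records where both sides hold
    confidence = share of antecedent records where the consequent also holds
    """
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Made-up records for illustration; not taken from the study.
sample = [
    {"CD30": True, "type": "Hodgkin"},
    {"CD30": True, "type": "Hodgkin"},
    {"CD30": True, "type": "non-Hodgkin"},
    {"CD30": False, "type": "non-Hodgkin"},
]

support, confidence = support_and_confidence(
    sample,
    antecedent=lambda r: r["CD30"] is True,
    consequent=lambda r: r["type"] == "Hodgkin",
)
```

Under this reading, a rule would be retained only if its support and confidence both meet the chosen thresholds.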
Based on the implementation and evaluation of these techniques, the decision tree outperformed Bayesian networks (Naive Bayes), with an accuracy of 82.66%, a sensitivity of 94.98%, an F-score (harmonic mean) of 85.36%, and an error ratio of 17.33%. Our research also concluded that positive CD3 and CD15 tumor markers, as well as the patient's gender, do not play a role in the diagnosis of lymphoma. However, the CD20 and LCA tumor markers can be effective in diagnosing non-Hodgkin's lymphoma, while the CD30 tumor marker can be effective in diagnosing Hodgkin's lymphoma.
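For readers unfamiliar with the four evaluation criteria, the following minimal sketch shows how accuracy, sensitivity, F-score, and error ratio are commonly derived from a binary confusion matrix. It assumes the standard textbook definitions (the abstract does not spell out its formulas), and the function name and the counts in the example are hypothetical, not the study's results.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-score: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "f_score": f_score,
        "error_ratio": 1.0 - accuracy,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```

With these definitions the error ratio is the complement of accuracy, which matches the reported pair of 82.66% and 17.33% up to rounding.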
Publisher
Research Square Platform LLC