A Comparison Study of Data Mining Algorithms for Blood Cancer Prediction

Document Type : Original Article

Authors

1 Department of Information Technology, Kurdistan Technical Institute, Sulaimani 46001, Kurdistan Region, Iraq

2 Department of Information Technology, College of Informatics, Sulaimani Polytechnic University, Sulaimani 46001, Kurdistan Region, Iraq

doi:10.24271/psr.29

Abstract

Cancer is a common disease that threatens the life of one in every three people. This dangerous disease urgently requires early detection and diagnosis. Recent progress in data mining methods, such as classification, has demonstrated the need to apply machine learning algorithms to large datasets. This paper mainly aims to utilise data mining techniques to classify cancer datasets into blood cancer and non-blood cancer based on pre-defined information and post-defined information obtained after blood tests and CT scan tests. This research was conducted using the WEKA data mining tool with 10-fold cross-validation to evaluate and compare different classification algorithms, extract meaningful information from the dataset, and accurately identify the most suitable and predictive model. The results show that the most suitable classifier, with the best ability to predict on the cancer dataset, is the multilayer perceptron, with an accuracy of 99.3967%.
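The evaluation protocol named in the abstract, 10-fold cross-validation used to compare several classifiers and pick the most accurate one, can be illustrated with a minimal Python sketch. This is not the WEKA workflow used in the study: the data are synthetic, the two classifiers (a majority-class baseline and a nearest-centroid rule) are stand-ins chosen for brevity, and all function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, average the accuracy."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)


# Stand-in classifier 1 (assumption): always predict the most frequent label.
def fit_majority(X, y):
    return max(set(y), key=y.count)


def predict_majority(model, x):
    return model


# Stand-in classifier 2 (assumption): assign the label of the closest centroid.
def fit_centroid(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroid(model, x):
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def make_toy_data(n=200, seed=1):
    """Synthetic two-class data; NOT the blood cancer dataset from the study."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
results = {
    "majority": cross_val_accuracy(fit_majority, predict_majority, X, y),
    "nearest_centroid": cross_val_accuracy(fit_centroid, predict_centroid, X, y),
}
best = max(results, key=results.get)
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.4f}")
print("best:", best)
```

Every classifier is scored on the same folds (same seed), so the comparison is not skewed by a lucky split. In practice a stratified split, which preserves class proportions in each fold, is often preferred for imbalanced medical data.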
