@MASTERSTHESIS{pgi2016010, author = "S. O. Olafuyi", supervisor = "D. Roussinov", title = "Applying Business Analytics in Practice using Datasets from Thoracic Surgery Cases", school = "Department of Computer and Information Sciences, University of Strathclyde", year = "2015", abstract = "Data mining in the healthcare sector is a growing field that merges information technology and clinical practice. However, it has suffered some setbacks in terms of acceptance by medical professionals, despite its high potential for extracting knowledge from medical records. This research sought to apply machine learning tools to predict patient health status from an anonymised thoracic surgery dataset comprising 470 instances and 17 attributes, which describes the post-operative life expectancy of patients within one year of surgery. The research demonstrated the advantages of the data mining process by comparing the performance of several algorithms and proposing the random forest algorithm for an imbalanced health dataset, based on its ease of dealing with noise and overfitting, its transparency, its ease of description, and its performance. The Weka and R software packages were used to analyse the data. They were first integrated to combine the data visualization strengths of R with the classification algorithms of Weka. Different pre-processing activities were then employed to prepare the data for mining and knowledge discovery, followed by algorithm selection and further post-processing activities. Visualization was applied iteratively throughout the process. Finally, the random forest method was used to predict the post-operative life expectancy of lung cancer patients, and the results were explained.
The random forest outperformed the other algorithms, correctly classifying 84.1% of the thoracic surgery patients who died and 81.3% of those who survived, with an overall accuracy of 82.7%.", }