Here, we use simpler datasets that are structured and manually curated for machine learning application development; unsurprisingly, classifiers reach good accuracy on many of them. The Wisconsin Breast Cancer Dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)) contains data donated by researchers at the University of Wisconsin and includes measurements from digitized images of fine-needle aspirates of breast masses. The values represent characteristics of the cell nuclei present in the digital images. Each record has the following attributes, listed with their value domains:
0. Sample code number: id number
1. Clump Thickness: 1-10
2. Uniformity of Cell Size: 1-10
3. Uniformity of Cell Shape: 1-10
4. Marginal Adhesion: 1-10
5. Single Epithelial Cell Size: 1-10
6. Bare Nuclei: 1-10
7. Bland Chromatin: 1-10
8. Normal Nucleoli: 1-10
9. Mitoses: 1-10
10. Class: 2 for benign, 4 for malignant
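The attribute layout above can be turned into a small parser. The following is a minimal Python sketch, not the code used in this chapter. It assumes that each record is one comma-separated line with the 11 fields in the order listed, and that a missing value is written as "?". The names COLUMNS, CLASS_LABELS, and parse_records are hypothetical, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io

# Column names in the order listed above (index 0 to 10); the names are our own.
COLUMNS = [
    "sample_code_number",
    "clump_thickness",
    "uniformity_of_cell_size",
    "uniformity_of_cell_shape",
    "marginal_adhesion",
    "single_epithelial_cell_size",
    "bare_nuclei",
    "bland_chromatin",
    "normal_nucleoli",
    "mitoses",
    "class",
]

# Class encoding as given in the attribute list.
CLASS_LABELS = {2: "benign", 4: "malignant"}


def parse_records(text):
    """Parse comma-separated rows into dicts keyed by COLUMNS.

    Assumption: a missing value is written as "?" and is mapped to None.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        values = [None if v.strip() == "?" else int(v) for v in row]
        record = dict(zip(COLUMNS, values))
        # Add a readable label next to the numeric class code.
        record["label"] = CLASS_LABELS[record["class"]]
        records.append(record)
    return records


# Two made-up rows in the assumed format, for illustration only.
SAMPLE = "1,5,1,1,1,2,1,3,1,1,2\n2,8,10,10,8,7,?,9,7,1,4\n"
records = parse_records(SAMPLE)
for r in records:
    print(r["sample_code_number"], r["label"], r["bare_nuclei"])
```

Mapping the class codes 2 and 4 to readable labels early keeps later steps, such as counting benign versus malignant samples, easy to read. Rows with a None value can then be dropped or imputed before training, depending on the model.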
To read more about the Wisconsin Breast Cancer Dataset, refer to the authors' publication: W.N. Street, W.H. Wolberg, and O.L. Mangasarian, Nuclear feature extraction for breast tumor diagnosis, IS&T/SPIE 1993 International Symposium on Electronic Imaging: Science and Technology, volume 1905, pp. 861-870, 1993.