Document Abstract
Diabetes is a serious health problem that afflicts a large share of India's population and adversely
affects the entire world. In the modern world it can affect any individual regardless of age; contributing
factors include obesity, lifestyle, poor diet, high blood pressure, and low physical activity. People with
diabetes are at greater risk of developing other conditions such as stroke, eye problems, heart disease,
kidney disease, and nerve damage. Data analysis can help detect the complications of diabetes at an early
stage and protect patients from the harmful effects of the disease.
The healthcare industry generates a huge amount of data that can be used for such analysis. Diabetes must
be prevented and cured in order to improve the lives of all those affected by it. This paper
studies diabetes data from various states of India. According to the data obtained, the prevalence of
diabetes (in percent) in rural areas is almost half that in urban areas, and the prevalence of pre-diabetes
is approximately 10% to 20% lower in rural areas than in urban areas. The experimental results show that the
performance of random forest and SMO surpasses that of logistic regression, naive Bayes, and decision tree.
Random forest achieves higher accuracy than the other methods.
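
As an illustration of how the rural versus urban comparison above can be expressed, the following is a minimal sketch, not the procedure used in the paper. The function names and the survey counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the data studied here.

```python
def prevalence_pct(cases: int, surveyed: int) -> float:
    """Prevalence as a percentage of the surveyed population."""
    if surveyed <= 0:
        raise ValueError("surveyed must be positive")
    return 100.0 * cases / surveyed


def relative_reduction_pct(rural: float, urban: float) -> float:
    """How much lower the rural figure is, as a percentage of the urban figure."""
    if urban <= 0:
        raise ValueError("urban prevalence must be positive")
    return 100.0 * (urban - rural) / urban


# Hypothetical counts for illustration only (not from the paper's data).
urban_rate = prevalence_pct(cases=120, surveyed=1000)
rural_rate = prevalence_pct(cases=60, surveyed=1000)

# Ratio of rural to urban prevalence; a value near 0.5 corresponds to
# "almost half" in the sense used above.
ratio = rural_rate / urban_rate
reduction = relative_reduction_pct(rural_rate, urban_rate)

print(f"urban: {urban_rate:.1f}%  rural: {rural_rate:.1f}%")
print(f"rural/urban ratio: {ratio:.2f}  relative reduction: {reduction:.1f}%")
```

The same relative-reduction calculation applies to the pre-diabetes figures, where a result between 10 and 20 would match the range reported above.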