T2DM PREDICTION USING ML

Nov 13, 2020

INTRODUCTION:

Supervised machine learning covers two common kinds of task: regression and classification. ML techniques such as support vector machines, Random Forest, AdaBoost, and neural networks often give higher predictive accuracy than classical algorithms. The trouble starts when we try to explain the effect of the individual variables. In this blog, I outline my project, which applies a range of ML techniques to predicting type 2 diabetes mellitus (T2DM).

ACKNOWLEDGMENT:

I would like to sincerely thank Dr. Indrajeet Gupta and Dr. Deepak Garg for helping me throughout the project and for giving me crucial pointers on how to improve it. I would also like to thank Bennett University for allowing me to do this capstone project and for bringing out the best in me.

DATASET DESCRIPTION:

The data comes from a Kaggle dataset with 10 columns and 15k records. The file records whether or not each person is diabetic in the Diabetic output variable, whose value is 1 if the person is diabetic and 0 if not.

Some features in my dataset are:

Patient ID: unique patient identification number

Pregnancy: the number of pregnancies the patient has had

Glucose plasma: plasma glucose level in the patient's body, measured at a two-hour interval (in mM, millimolar)

Diastolic Blood Pressure: diastolic blood pressure (in mm Hg)

Triceps: triceps skinfold thickness (in mm)

Insulin: insulin level, measured at a two-hour interval (in mu U/ml)

Body Mass Index (BMI): weight divided by the square of height (in kg/m²)

Diabetes Pedigree: a value that depends on the genetic conditions in the person's family

Age: age of the person (in years)

Diabetic (Output): whether the person is diabetic (1) or not (0)
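To make the layout concrete, here is a minimal sketch of how a table with these columns could be split into model inputs and the output label using pandas. The column names and all values below are made up for illustration; the actual Kaggle file may name its columns differently.

```python
import pandas as pd

# Hypothetical column names and invented values, for illustration only.
records = pd.DataFrame({
    "PatientID": [1, 2, 3, 4],
    "Pregnancies": [0, 2, 5, 1],
    "PlasmaGlucose": [95, 140, 160, 88],
    "DiastolicBloodPressure": [70, 85, 90, 65],
    "TricepsThickness": [20, 35, 40, 18],
    "SerumInsulin": [30, 180, 220, 25],
    "BMI": [22.5, 33.1, 38.4, 20.9],
    "DiabetesPedigree": [0.2, 0.9, 1.4, 0.1],
    "Age": [25, 44, 53, 22],
    "Diabetic": [0, 1, 1, 0],
})

# The patient ID only identifies a row, so it is left out of the inputs.
X = records.drop(columns=["PatientID", "Diabetic"])
# The Diabetic column is the binary label the models try to predict.
y = records["Diabetic"]

print(X.shape)
print(y.value_counts())
```

Dropping the ID column is a design choice of this sketch: an arbitrary identifier carries no medical signal, and a model could otherwise memorise it.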

EXPERIMENTAL SETUP:

The experiment compares different models for binary classification. The code is written in Python, with the help of several Python libraries. Python makes it quick to build on a wide range of packages and libraries. This work imports libraries such as NumPy, pandas, scikit-learn, and Keras. NumPy is used for numerical computation, and pandas is used to handle the dataset. scikit-learn is used for the general machine learning code, while Keras is used to implement the neural networks. These libraries provide numerous methods, which this experiment relied on throughout.

PROPOSED SOLUTION:

The design follows a top-down process, and each branch from the same node represents a different execution path. The approaches involved are support vector machines, decision trees, neural networks, and linear discriminant analysis. The Bagging and AdaBoost ensemble methods were used to increase the precision of the models. To prepare the data extracted from the dataset for the classification algorithms, several pre-processing tools were used, including MinMaxScaler and StandardScaler.
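As a rough illustration of the pre-processing step, the sketch below chains a scaler with a classifier using scikit-learn pipelines. It runs on synthetic data from make_classification as a stand-in for the diabetes dataset, and the pairing of each scaler with each model is an assumption of this sketch, not necessarily the combination used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 8 numeric features, binary label.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Putting the scaler inside the pipeline means it is fitted on the
# training split only, so no test-set statistics leak into training.
svm_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
lda_model = make_pipeline(MinMaxScaler(), LinearDiscriminantAnalysis())

svm_model.fit(X_train, y_train)
lda_model.fit(X_train, y_train)

svm_accuracy = svm_model.score(X_test, y_test)
lda_accuracy = lda_model.score(X_test, y_test)
print(f"SVM accuracy: {svm_accuracy:.3f}")
print(f"LDA accuracy: {lda_accuracy:.3f}")
```

StandardScaler rescales each feature to zero mean and unit variance, while MinMaxScaler maps each feature to the range 0 to 1. Distance-based models such as SVMs tend to be sensitive to feature scale, which is why scaling matters for them in particular.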

EXPERIMENTAL RESULTS:

The experiment is conducted using several data mining and machine learning techniques: decision tree, support vector machine (SVM), linear discriminant analysis, AdaBoost with a decision tree, AdaBoost with an SVM, bagging with a decision tree, and neural networks. Ensemble methods are often used to improve the precision and recall achieved by the base methods. The study aims to identify the most reliable model for the given diabetes dataset, and the purpose of the analysis is to predict T2DM in an individual using different machine learning classification algorithms.
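A comparison of this kind could be set up roughly as follows. This is a sketch on synthetic data, not the project's actual code: the tree depths, the number of estimators, and the 5-fold cross-validation are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the diabetes data.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=42
)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "AdaBoost (decision tree)": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42
    ),
    "bagging (decision tree)": BaggingClassifier(
        DecisionTreeClassifier(max_depth=3), n_estimators=50, random_state=42
    ),
}

# Score every model with the same 5-fold cross-validation.
mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {mean_accuracy[name]:.3f}")
```

Scoring all models on the same folds keeps the comparison fair, since every model sees the same train and validation splits. The numbers printed for synthetic data say nothing about the ranking on the real dataset.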

The experiment demonstrated that ensemble methods such as boosting give the best accuracy, and hence the best model for predicting diabetes on similar data. AdaBoost with a decision tree classifier is the best model for predicting T2DM in a patient. Bagging with a decision tree is the second-best model recorded. Overall, the decision-tree-based models yield the best results, followed by the neural network model.
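For reference, the accuracy, precision, and recall used to rank models like these can be computed directly from the true and predicted labels. The helper name classification_metrics and the labels below are invented for this sketch.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision and recall for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Invented labels: 1 = diabetic, 0 = not diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, recall is often watched closely alongside accuracy, because a false negative means a diabetic patient is missed.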

SVM: