I apologise for how vague this question may appear but I am not finding any resources online to help with this issue.
I have a data frame loaded into R and split into two separate data frames: training and testing.
My data is about diabetes and has 8 variables, including "Glucose", which is the response variable I am building the regression model for.
I have fitted an lm of Glucose against all 7 other variables, but I am now struggling to decide which one should be removed.
This is the current output of my model:
```
Call:
lm(formula = Glucose ~ Pregnancies + BloodPressure + SkinThickness +
    Insulin + BMI + DiabetesPedigreeFunction + Age, data = training)

Residuals:
    Min      1Q  Median      3Q     Max
-68.652 -16.047  -3.082  13.346  75.723

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              61.14240    9.67267   6.321 1.08e-09 ***
Pregnancies               0.04819    0.63083   0.076  0.93917
BloodPressure             0.14300    0.12764   1.120  0.26356
SkinThickness             0.10747    0.18138   0.592  0.55403
Insulin                   0.12793    0.01291   9.911  < 2e-16 ***
BMI                       0.11406    0.28488   0.400  0.68921
DiabetesPedigreeFunction  6.95952    4.16151   1.672  0.09562 .
Age                       0.63202    0.20269   3.118  0.00202 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.78 on 268 degrees of freedom
Multiple R-squared:  0.4036, Adjusted R-squared:  0.3881
F-statistic: 25.91 on 7 and 268 DF,  p-value: < 2.2e-16
```
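For anyone who wants to reproduce the setup, here is a minimal sketch of the same kind of fit, followed by one way to compare the predictors as candidates for removal. It runs on simulated data: the values, distributions and coefficients below are made up and only the column names match my `training` data frame, so the numbers it prints will not match the output above. Ranking terms by p-value (via `drop1()` or the coefficient table) is only one possible criterion, and I am not sure it is the right one for this problem.

```r
# Simulated stand-in for `training`; all values below are invented for illustration.
set.seed(1)
n <- 276
training <- data.frame(
  Pregnancies              = rpois(n, 3),
  BloodPressure            = rnorm(n, 70, 12),
  SkinThickness            = rnorm(n, 29, 10),
  Insulin                  = rnorm(n, 150, 110),
  BMI                      = rnorm(n, 33, 7),
  DiabetesPedigreeFunction = runif(n, 0.1, 1.5),
  Age                      = round(runif(n, 21, 60))
)
# Assumed relationship, chosen arbitrarily so the model has something to find.
training$Glucose <- 60 + 0.13 * training$Insulin + 0.6 * training$Age +
  rnorm(n, sd = 24)

# Full model: Glucose against all other columns.
fit <- lm(Glucose ~ ., data = training)

# Single-term deletions: one row per predictor, with an F test for dropping it.
drop1(fit, test = "F")

# Coefficient table from summary(), sorted so the weakest terms come first.
coefs <- summary(fit)$coefficients
coefs[order(coefs[, "Pr(>|t|)"], decreasing = TRUE), ]
```

With the real data, only the `lm()` call onwards would be needed, since `training` already exists.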