Learning regularized linear regression modeling. Used the
glmnet package. Just slapping a few of the most important things down here. Remember: need reshape for melt().

Handout: http://www.umiacs.umd.edu/~jbg/teaching/DATA_DIGGING/handout_04.pdf
6 Regularized Regression
Create a dataset with two nuisance variables:
> data(mtcars)
> mtcars <- cbind(runif(nrow(mtcars)), runif(nrow(mtcars)), mtcars)
> colnames(mtcars)[1:2] <- c("dummy1", "dummy2")

Fit regularized L1 and L2 regression:
> library(glmnet)
> target <- as.matrix(mtcars$mpg)
> features <- as.matrix(subset(mtcars, select=-c(mpg)))
> reg.l2 <- glmnet(features, target, alpha=0)
> reg.l1 <- glmnet(features, target, alpha=1)

Plot L1 coefficients vs. lambda (do the same thing for L2):
> library(ggplot2)
> library(reshape)
> models <- data.frame(t(rbind(matrix(reg.l1$lambda, nrow=1), as.matrix(reg.l1$beta))))
> colnames(models)[1] <- "lambda"
> models <- melt(models, c("lambda"))
> ggplot(models) + aes(x=log(lambda), y=value, color=variable) + geom_line()

-- Importantly, one notices that as lambda gets big, all coefficients are driven to zero because the penalty dominates. The last coefficient to hit zero is, thus, the most important.
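The "(do the same thing for L2)" step follows the same reshape-and-plot pattern on the ridge fit. A sketch that rebuilds the data so it runs standalone; the names models.l2, target, and features mirror the session above:

```r
library(glmnet)
library(ggplot2)
library(reshape)

# Same setup as above: mtcars plus two uniform-noise nuisance columns.
data(mtcars)
mtcars <- cbind(runif(nrow(mtcars)), runif(nrow(mtcars)), mtcars)
colnames(mtcars)[1:2] <- c("dummy1", "dummy2")
target <- as.matrix(mtcars$mpg)
features <- as.matrix(subset(mtcars, select=-c(mpg)))

# alpha=0 gives the ridge (L2) penalty.
reg.l2 <- glmnet(features, target, alpha=0)

# One row per lambda: first column lambda, remaining columns the coefficients.
models.l2 <- data.frame(t(rbind(matrix(reg.l2$lambda, nrow=1),
                                as.matrix(reg.l2$beta))))
colnames(models.l2)[1] <- "lambda"
models.l2 <- melt(models.l2, c("lambda"))
ggplot(models.l2) + aes(x=log(lambda), y=value, color=variable) + geom_line()
```

The visual contrast with the L1 plot: under L2 the coefficients shrink smoothly toward zero but never hit it exactly, so no variable is dropped outright.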
Use cross validation to determine the best lambda for L1 (do the same thing for L2):
> cv.l1 <- cv.glmnet(features, target, alpha=1)
> plot(cv.l1, main = "L1 n-fold cross validation error")

To get the minimal lambda:
> l <- cv.l1$lambda.min
> l
[1] 0.9644503

And then one can use predict() to get predictions at that lambda (or coef() to grab the coefficients and the intercept), e.g., with testing_set standing in for a held-out feature matrix:

> my.predictions <- predict(reg.l1, newx=testing_set, s=cv.l1$lambda.min)
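The cross-validation step for L2 is analogous. A standalone sketch (rebuilding the data as above; cv.l2 is a new name, and coef() at lambda.min is one way to pull out the fitted coefficients):

```r
library(glmnet)

# Same setup as above: mtcars plus two uniform-noise nuisance columns.
data(mtcars)
mtcars <- cbind(runif(nrow(mtcars)), runif(nrow(mtcars)), mtcars)
colnames(mtcars)[1:2] <- c("dummy1", "dummy2")
target <- as.matrix(mtcars$mpg)
features <- as.matrix(subset(mtcars, select=-c(mpg)))

# n-fold cross validation over the ridge (alpha=0) regularization path.
cv.l2 <- cv.glmnet(features, target, alpha=0)
plot(cv.l2, main = "L2 n-fold cross validation error")

cv.l2$lambda.min                # lambda with the lowest CV error
coef(cv.l2, s = "lambda.min")   # coefficients and intercept at that lambda
```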