Learning regularized linear regression modeling. Used the
glmnet package. Just slapping a few of the most important things down here. Remember: need reshape for melt().

Handout: http://www.umiacs.umd.edu/~jbg/teaching/DATA_DIGGING/handout_04.pdf
6 Regularized Regression
Create a dataset with two nuisance variables:
> data(mtcars)
> mtcars <- cbind(runif(nrow(mtcars)), runif(nrow(mtcars)), mtcars)
> colnames(mtcars)[1:2] <- c("dummy1", "dummy2")

Fit regularized L1 and L2 regression:
> library(glmnet)
> target <- as.matrix(mtcars$mpg)
> features <- as.matrix(subset(mtcars, select=-c(mpg)))
> reg.l2 <- glmnet(features, target, alpha=0)
> reg.l1 <- glmnet(features, target, alpha=1)

Plot L1 coefficients vs. lambda (do the same thing for L2):
> library(ggplot2)
> library(reshape)
> models <- data.frame(t(rbind(matrix(reg.l1$lambda, nrow=1), as.matrix(reg.l1$beta))))
> colnames(models)[1] <- "lambda"
> models <- melt(models, c("lambda"))
> ggplot(models) + aes(x=log(lambda), y=value, color=variable) + geom_line()

-- Importantly, one notices that as lambda gets big, all coefficients are driven to zero because the penalty dominates. The last coefficient to hit zero is, thus, the most important.
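The "(do the same thing for L2)" step follows the same reshape-and-plot pattern on the ridge fit. A sketch that rebuilds the data so it runs standalone; the names models.l2, target, and features mirror the session above:

```r
library(glmnet)
library(ggplot2)
library(reshape)

# Same setup as above: mtcars plus two uniform-noise nuisance columns.
data(mtcars)
mtcars <- cbind(runif(nrow(mtcars)), runif(nrow(mtcars)), mtcars)
colnames(mtcars)[1:2] <- c("dummy1", "dummy2")
target <- as.matrix(mtcars$mpg)
features <- as.matrix(subset(mtcars, select=-c(mpg)))

# alpha=0 gives the ridge (L2) penalty.
reg.l2 <- glmnet(features, target, alpha=0)

# One row per lambda: first column lambda, remaining columns the coefficients.
models.l2 <- data.frame(t(rbind(matrix(reg.l2$lambda, nrow=1),
                                as.matrix(reg.l2$beta))))
colnames(models.l2)[1] <- "lambda"
models.l2 <- melt(models.l2, c("lambda"))
ggplot(models.l2) + aes(x=log(lambda), y=value, color=variable) + geom_line()
```

The visual contrast with the L1 plot: under L2 the coefficients shrink smoothly toward zero but never hit it exactly, so no variable is dropped outright.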
Use cross validation to determine the best lambda for L1 (do the same thing for L2):
> cv.l1 <- cv.glmnet(features, target, alpha=1)
> plot(cv.l1, main = "L1 n-fold cross validation error")

To get the minimal lambda:
> l <- cv.l1$lambda.min
> l
[1] 0.9644503

And then one can use predict() to get predictions at that lambda (or coef() to grab the coefficients and the intercept), e.g., with testing_set standing in for a held-out feature matrix:

> my.predictions <- predict(reg.l1, newx=testing_set, s=cv.l1$lambda.min)
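The cross-validation step for L2 is analogous. A standalone sketch (rebuilding the data as above; cv.l2 is a new name, and coef() at lambda.min is one way to pull out the fitted coefficients):

```r
library(glmnet)

# Same setup as above: mtcars plus two uniform-noise nuisance columns.
data(mtcars)
mtcars <- cbind(runif(nrow(mtcars)), runif(nrow(mtcars)), mtcars)
colnames(mtcars)[1:2] <- c("dummy1", "dummy2")
target <- as.matrix(mtcars$mpg)
features <- as.matrix(subset(mtcars, select=-c(mpg)))

# n-fold cross validation over the ridge (alpha=0) regularization path.
cv.l2 <- cv.glmnet(features, target, alpha=0)
plot(cv.l2, main = "L2 n-fold cross validation error")

cv.l2$lambda.min                # lambda with the lowest CV error
coef(cv.l2, s = "lambda.min")   # coefficients and intercept at that lambda
```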