DID Handout #4 Problem #5
Mar. 3rd, 2014 10:31 am
Practice problem for linear regression and predictions.
http://www.umiacs.umd.edu/~jbg/teaching/DATA_DIGGING/handout_04.pdf
#5 on the handout:
> newsdata <- read.csv(url("http://terpconnect.umd.edu/~ying/did/hw3/newspaper.csv"))
> plot(newsdata$daily, newsdata$Sunday)
Yes, it does make sense to use a linear regression.
> newsdata.lm = lm(Sunday ~ daily, data = newsdata)
> coeffs = coefficients(newsdata.lm); coeffs
(Intercept)       daily
  76.009807    1.276673
Now do the residuals:
> newsdata.res = resid(newsdata.lm)
> plot(newsdata$daily, newsdata.res, ylab="Residuals", xlab="Daily Readers", main = "Sunday Readership")
> abline(h = 0)
(ouch, huge residuals)
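For a single predictor, the least-squares line that lm() fits has a closed form: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x), and each residual is the observed value minus the fitted one. The sketch below computes these by hand as a cross-check. `fit_by_hand` is a hypothetical helper, not something from the handout, and the commented usage assumes `newsdata` was loaded as above.

```r
# Hypothetical helper (not from the handout): simple least squares by hand
# for one predictor, returning the same quantities lm() gives.
fit_by_hand <- function(x, y) {
  slope <- cov(x, y) / var(x)             # b1 = Sxy / Sxx (the n - 1 factors cancel)
  intercept <- mean(y) - slope * mean(x)  # the line passes through (mean x, mean y)
  fitted <- intercept + slope * x
  list(coefficients = c(intercept, slope),
       residuals = y - fitted)            # observed minus fitted
}

# Usage on the newspaper data (assumes newsdata was loaded as above):
# by_hand <- fit_by_hand(newsdata$daily, newsdata$Sunday)
# by_hand$coefficients   # should match coefficients(newsdata.lm)
```

With an intercept in the model the residuals sum to zero up to rounding, which is a quick sanity check on any hand computation.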
Now the variance on the residuals:
> var(newsdata.res)
[1] 23308.4
> sd(newsdata.res)
[1] 152.6709
Then: "Predict the Sunday circulation of this hypothetical newspaper and give the range that would encompass one standard deviation of the normal distribution induced by this prediction (use the variance estimated from the previous question)."
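The question builds a normal distribution around the prediction from the residual variance. The sketch below, using a hypothetical helper `residual_variances`, contrasts the n - 1 denominator that `var()` uses (as in the transcript) with the n - 2 denominator behind `sigma()` and the residual standard error in `summary(lm)`. It also shows that plus or minus one standard deviation of a normal covers about 68% of the probability. The question asks for the previously estimated variance, so the n - 1 value is the one used here; the n - 2 version is only for comparison.

```r
# Two common estimates of the residual variance for a simple regression:
#  - var(res) divides by n - 1 (what the transcript used)
#  - the residual standard error from summary(lm) divides by n - 2,
#    because two parameters (intercept and slope) were estimated.
# residual_variances() is a hypothetical helper for illustration.
residual_variances <- function(res) {
  n <- length(res)
  c(var_n_minus_1 = sum((res - mean(res))^2) / (n - 1),
    var_n_minus_2 = sum(res^2) / (n - 2))
}

# Usage (assumes newsdata.res from the transcript):
# residual_variances(newsdata.res)

# Probability within one standard deviation of a normal mean:
one_sd_coverage <- pnorm(1) - pnorm(-1)
one_sd_coverage  # about 0.683
```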
Do this by hand: 76.009807 + 1.276673 × 800 = 1097.348. The same number from predict():
> newdata <- data.frame(daily=800)
> predict(newsdata.lm, newdata)
1
1097.348
The range is 1097.35 +/- 152.67, i.e. roughly 945 to 1250.
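The by-hand prediction and the one-standard-deviation range can be reproduced from the numbers printed in the transcript. The coefficients and sd below are copied, rounded as printed, from that output, and daily = 800 is the hypothetical paper's daily circulation passed to predict().

```r
# By-hand prediction using the coefficients and residual sd printed above
b0 <- 76.009807      # (Intercept) from coefficients(newsdata.lm)
b1 <- 1.276673       # slope on daily
res_sd <- 152.6709   # sd(newsdata.res)

daily_new <- 800
sunday_hat <- b0 + b1 * daily_new   # point prediction, about 1097.35
lower <- sunday_hat - res_sd        # one sd below
upper <- sunday_hat + res_sd        # one sd above
round(c(prediction = sunday_hat, lower = lower, upper = upper), 1)
```

This treats the coefficients as fixed, as the question does. A full prediction interval from predict(..., interval = "prediction") would also account for uncertainty in the fitted line and would come out somewhat wider.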