COMPLETED Code Archive for Columbia HW #1
Mar. 1st, 2013 05:00 pm
An archive of the code I used to complete part 1 of the HW (http://stat.columbia.edu/~rachel/homework/homework1.pdf), with some mods for the actual assignment I was doing.
First download all data directly:
> data1 <- read.csv(url("http://stat.columbia.edu/~rachel/datasets/nyt1.csv"))
For each day, add a column denoting date:
> data1["Day"] <- 1
Then create master file of all 31 days (rbind() stacks the days as rows; cbind() would glue them side by side as extra columns):
> final <- rbind(data1, ... data31)
To delete all entries with age 0, first extract the ages, replace all 0's with NAs, then put them back in the original.
> newages <- final$Age
> newages[newages == 0] <- NA
> final$Age <- newages
Then, delete all rows with NAs.
> final2 <- na.omit(final)
Next, create the new variable based on age:
> labs <- c("<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
Group by ages, making a new column to indicate the group. (cut() intervals include their upper bound by default, so the first break is 17 to keep 18-year-olds out of "<18".)
> final3 <- cbind(final2, Age_Group = cut(final2$Age, breaks = c(-1, 17, 24, 34, 44, 54, 64, 120), labels = labs))
Calculate/create click through rate (brackets also work in place of $; any rows with 0 impressions come out as NaN):
> final3$Click_Thru_Rate <- final3$Clicks / final3$Impressions
For a single day, plot distributions across various facets:
> library(ggplot2)
> sp <- ggplot(subset(final3, Day == 1), aes(x = Click_Thru_Rate)) + geom_histogram(binwidth = .05)
> sp + facet_wrap( ~ Age_Group, ncol = 2)
For all the days, use aggregate() to create data frames containing what I want. Make sure to only deal with the variables that are relevant!
> means <- aggregate(cbind(Clicks, Click_Thru_Rate, Impressions) ~ Day + Age_Group, final3, mean)
> totals <- aggregate(cbind(Clicks, Impressions) ~ Day + Age_Group, final3, sum)
Then make my graphs, for instance, total clicks as a function of day, for each age group:
> ggplot(data = totals, aes(x = Day, y = Clicks, group = Age_Group, colour = Age_Group)) + geom_line() + geom_point()
If I wanted to fix "gender" from 0 and 1 to female and male:
> gendervector <- final3$Gender
> gv <- replace(gendervector, gendervector == 0, "Female")
> gv2 <- replace(gv, gv == 1, "Male")
> final3$Gender <- gv2
If I need to go back to any one day or age group or whatever:
> output <- subset(dataframe, Signed_In==1)
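
Looking back at the first three steps (download, add the Day column, stack everything), they could probably be rolled into one loop instead of 31 separate objects. This is only a sketch and not what I actually ran: it assumes days 2 through 31 follow the same URL pattern as nyt1.csv (nyt2.csv, nyt3.csv, and so on), and the names base_url, load_all_days, and fake_reader are made up for illustration.

```r
# ASSUMPTION: every day's file lives at nyt<day>.csv, like nyt1.csv does.
base_url <- "http://stat.columbia.edu/~rachel/datasets/nyt%d.csv"

# Read each day, tag it with its day number, and stack the rows.
# `reader` can be swapped out so the function can be tried without a download.
load_all_days <- function(days = 1:31,
                          reader = function(day) read.csv(url(sprintf(base_url, day)))) {
  per_day <- lapply(days, function(day) {
    d <- reader(day)
    d$Day <- day              # same idea as data1["Day"] <- 1, once per day
    d
  })
  do.call(rbind, per_day)     # stack as rows; all days must share the same columns
}

# Offline check with a made-up reader (two rows per "day"):
fake_reader <- function(day) data.frame(Age = c(0, 30), Clicks = c(0, day))
demo <- load_all_days(days = 1:3, reader = fake_reader)
```

With the real files, final <- load_all_days() would stand in for the data1 ... data31 steps, and everything after that stays the same.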