helloworld: (Default)
[personal profile] helloworld
An archive of the code I used to complete part 1 of HW: http://stat.columbia.edu/~rachel/homework/homework1.pdf (with some mods for the actual assignment I was doing)



First download all data directly:

> data1 <- read.csv(url("http://stat.columbia.edu/~rachel/datasets/nyt1.csv"))

For each day, add a column denoting date:

> data1["Day"] <- 1

Then create master file of all 31 days:

> final <- rbind(data1, ... data31)
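Typing out 31 separate steps gets tedious, so here is a minimal sketch of tagging and stacking in one pass. It uses toy data frames in place of the downloads; `days`, `add_day`, `tagged` and `stacked` are hypothetical names, not part of my original code. With the real data, each list element would come from read.csv() on that day's URL (that days 2 to 31 follow the nyt1.csv naming pattern is an assumption).

```r
# Toy stand-ins for the daily downloads (the real files have more columns).
days <- list(
  data.frame(Age = c(0, 34), Clicks = c(0, 1)),
  data.frame(Age = c(21, 0), Clicks = c(1, 0)),
  data.frame(Age = c(58, 17), Clicks = c(0, 0))
)

# Hypothetical helper: tag one day's data frame with its day number.
add_day <- function(df, day) {
  df["Day"] <- day
  df
}

# Tag every day, then stack the rows. rbind() appends rows;
# cbind() would try to glue the days side by side as extra columns.
tagged <- Map(add_day, days, seq_along(days))
stacked <- do.call(rbind, tagged)

nrow(stacked)        # total number of rows across all days
table(stacked$Day)   # row count per day
```

do.call(rbind, ...) is used because rbind() takes the data frames as separate arguments, while here they sit in a single list.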

To delete all entries with age 0, first extract ages, replace all 0's with NAs, then place back in original.

> newages <- final$Age

> newages[newages == 0] <- NA

> final$Age <- newages


Then, delete all rows with NAs.

> final2 <- na.omit(final)
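A shorter route to the same result, sketched on a toy data frame (`toy` and its values are made up for illustration): filter the zero ages out directly. One difference to keep in mind is that na.omit() drops rows with an NA in any column, while the filter only looks at Age.

```r
toy <- data.frame(Age = c(0, 25, 0, 61), Clicks = c(0, 1, 0, 2))

# Route used above: zero -> NA, then drop rows containing any NA.
via_na <- toy
via_na$Age[via_na$Age == 0] <- NA
via_na <- na.omit(via_na)

# Direct route: keep only rows whose Age is non-zero.
via_filter <- toy[toy$Age != 0, ]

# Both routes keep the same rows.
via_na$Age
via_filter$Age
```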

Next, create the labels for the new age-group variable:

> labs <- c("<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")

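Group by age, making a new column to hold the group. cut() closes each interval on the right, so the first cut point is 17 (not 18) to keep 18-year-olds out of "<18":

> final3 <- cbind(final2, age_range = cut(final2$Age, breaks=c(-1, 17, 24, 34, 44, 54, 64, 120), labels = labs))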
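To see how cut() treats the boundaries, here is a small check on hand-picked ages (the `ages` vector is made up for illustration). By default each interval is open on the left and closed on the right, so cut points at 17, 24, 34 and so on are what put 17 in "<18" and 18 in "18-24".

```r
labs <- c("<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
ages <- c(17, 18, 24, 25, 64, 65)

# Intervals are (-1,17], (17,24], (24,34], ..., (64,120].
groups <- cut(ages, breaks = c(-1, 17, 24, 34, 44, 54, 64, 120), labels = labs)

# Show each age next to the group it landed in.
data.frame(ages, groups)
```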

Calculate the click-through rate as a new column:

> final3$Click_Thru_Rate <- final3$Clicks / final3$Impressions  # final3["Click_Thru_Rate"] also works
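One thing to watch for: if any row has zero impressions, the division is 0/0, which R evaluates to NaN. A minimal sketch on made-up numbers (the `toy` data frame is hypothetical) that stores NA for those rows instead:

```r
toy <- data.frame(Clicks = c(0, 1, 0), Impressions = c(5, 4, 0))

# Plain division: the last row is 0/0, which is NaN.
toy$Clicks / toy$Impressions

# Guarded version: NA where there were no impressions.
toy$Click_Thru_Rate <- ifelse(toy$Impressions > 0,
                              toy$Clicks / toy$Impressions,
                              NA)
toy
```

Whether to keep, drop, or zero out those rows is a judgment call; the point is only to make the choice explicit.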

For a single day, plot distributions across various facets:

> library(ggplot2)
> sp <- ggplot(subset(final3, Day == 1), aes(x=Click_Thru_Rate)) + geom_histogram(binwidth=.05)
> sp + facet_wrap( ~ age_range, ncol = 2)


For all the days, use aggregate() to create data frames containing what I want. Make sure to only deal with the variables that are relevant!

> means <- aggregate(cbind(Clicks, Click_Thru_Rate, Impressions) ~ Day+age_range, final3, mean)

> totals <- aggregate(cbind(Clicks, Impressions) ~ Day+age_range, final3, sum)


Then make my graphs, for instance, total clicks as a function of day, for each age group:

> ggplot(data=totals, aes(x=Day, y=Clicks, group=age_range, colour=age_range)) + geom_line() + geom_point()
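To check what aggregate() with a formula actually returns, here is a sketch on a toy data frame (all values are made up). The left side of the formula lists the columns to summarise, the right side lists the grouping columns, and the result has one row per group combination.

```r
toy <- data.frame(
  Day = c(1, 1, 1, 2, 2),
  age_range = c("18-24", "18-24", "65+", "18-24", "65+"),
  Clicks = c(1, 0, 2, 3, 1),
  Impressions = c(4, 6, 5, 10, 2)
)

# One row per Day/age_range combination, summing both columns.
totals <- aggregate(cbind(Clicks, Impressions) ~ Day + age_range, toy, sum)
totals
```

Swapping sum for mean gives the per-group averages in the same shape.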

If I wanted to fix "gender" from 0 and 1 to female and male:

> gendervector <- final3$Gender
> gv <- replace(gendervector, gendervector==0, "Female")
> gv2 <- replace(gv, gv==1, "Male")
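An alternative sketch using factor(), which does the relabelling in one step and keeps the result as a categorical variable rather than a character vector. The `gender_codes` vector is made up for illustration; the 0 = Female, 1 = Male coding is the one used above.

```r
gender_codes <- c(0, 1, 1, 0)

# Map each code to its label in a single call.
gender <- factor(gender_codes, levels = c(0, 1), labels = c("Female", "Male"))

gender
table(gender)
```

To apply it to the real data, the result would be assigned back to the Gender column of the data frame.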


If I need to go back to any one day or age group or whatever:

> output <- subset(dataframe, Signed_In==1)
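The same pattern handles days and age groups. A small sketch on a toy data frame (the values, and the `day1` and `day2_young` names, are made up for illustration):

```r
toy <- data.frame(
  Day = c(1, 1, 2, 2),
  age_range = c("18-24", "65+", "18-24", "65+"),
  Clicks = c(1, 0, 2, 3)
)

# One day only.
day1 <- subset(toy, Day == 1)

# One age group on one day; conditions combine with &.
day2_young <- subset(toy, Day == 2 & age_range == "18-24")

day1
day2_young
```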