dplyr added scoped versions of group_by: group_by_at(), group_by_if(), and group_by_all().
These let you pick the grouping columns with the same helper functions you would use with select().
For example:
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
library(dplyr)
data1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
# Now compare with plyr for better understanding
data2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(data1 == data2, useNA = 'ifany')
## TRUE
## 27
The output (data1) is as expected:
# A tibble: 9 x 3
  zzz11def zbc123qws1       Value
    <fctr>     <fctr>       <dbl>
1        A          A  0.04095002
2        A          B  0.24943935
3        A          C -0.25783892
4        B          A  0.15161805
5        B          B  0.27189974
6        B          C  0.20858897
7        C          A  0.19502221
8        C          B  0.56837548
9        C          C -0.22682998
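In dplyr 1.0.0 and later the scoped verbs are marked as superseded in favor of across(), so the same grouping could probably be written as below. This is only a sketch, assuming a reasonably recent dplyr; the set.seed() call and the names via_at and via_across are my own additions for illustration and are not part of the example above.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

data <- data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)

# Character vector holding the names of the grouping columns
columns <- names(data)[-3]

# Scoped verb, as used in this answer
via_at <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))

# across() + all_of() form (hypothetical rewrite for newer dplyr versions)
via_across <- data %>%
  group_by(across(all_of(columns))) %>%
  summarize(Value = mean(value))

# Both approaches should produce the same group means
same_result <- isTRUE(all.equal(as.data.frame(via_at), as.data.frame(via_across)))
print(same_result)
```

all_of() raises an error if a name in columns is missing from the data, which is usually what you want when the names come from a variable.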
Note that dplyr::summarize strips off only one layer of grouping at a time, so the resultant tibble is still grouped by the first column (zzz11def). This can catch people by surprise further down the line.
If you want to avoid this unexpected behavior, add %>% ungroup() to your pipeline after you summarize.
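To see the leftover grouping for yourself, a quick check along these lines should work. It is a minimal sketch that rebuilds the example data; the names grouped, groups_before, and groups_after are hypothetical and only used here.

```r
library(dplyr)

data <- data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

grouped <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))

# summarize() dropped only the last grouping column, so one is still active
groups_before <- group_vars(grouped)
print(groups_before)

# After ungroup() no grouping variables remain
groups_after <- group_vars(ungroup(grouped))
print(groups_after)
```

group_vars() returns the names of the active grouping columns as a character vector, so an empty result means the tibble is fully ungrouped.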