见:
plyr函数前两字表示输入及输出形式
a = array, d= data frame, l = list, _ = 不输出
[adl]*ply 是对列表的每一个元素执行指定操作
m*ply 多种输入,接收 矩阵 / list-array / 数据框,按行切分,执行fun,每行内容做为fun的参数传入
r*ply 重复执行某项操作,适合画随机数分布
参数
.margins 数据切片的方式
以二维变量为例,.margins = 1 为按行切,=2 为按列切, = c(1,2) 为按每个cell切分
切片key可以组合,例如 .( product = a * b, round_a = round(a) )
.fun
指定作用于每个切片的函数
.progress
输出进度条,例如.progress=”text”输出文本模式进度条
splat
作用于整个dataframe
ddply(mtcars, .(round(wt)), function(df) mean_hp_cyl(df$hp, df$cyl))
ddply(mtcars, .(round(wt)), splat(mean_hp_cyl))each:函数列表
each(min,max) 相当于 function (x) c(min = min(x), max = max(x))
例子
排序 arrange(myCars, cyl, desc(disp))
列转换(mutate功能与transform相同,但mutate中新增的列col_b可以引用刚刚新增的列col_a)
baseball <- ddply(baseball, .(id), transform, cyear = year - min(year) + 1)
base2 <- ddply(baseball, .(id), mutate,career_year = year - min(year) + 1)
mutate(df, cyear = year - min(year),cpercent = cyear / (max(year) - min(year)))子集
subset(somedata, somecol > 0.999)$id
ddply(coefs_df, .(lat, long), subset, value == min(value))聚合统计
ddply(coefs_df, .(lat, long), summarise,
ozone_min = min(value), ozone_max = max(value))
ddply(mtcars, .(logcyl = log(cyl)), each(nrow, ncol))对各year数据中的各列元素求平均
ddply(baseball, .(year), colwise(median))
ddply(baseball, .(year), colwise(nmissing, c("sb", "cs", "so")))按指定条件分块统计
count(baseball[1:100,], c("id", "year"))
函数f执行失败时,以NULL替代,不错误退出
safef <- failwith(NULL, f)对传入mutate时,还不存在于df的列做转换
df <- data.frame(a = rep(c("a","b"), each = 10), b = 1:20)
label <- "xxx"
ddply(df, "a", here(mutate), label = paste(label, b))合并两个data.frame
join(x, y, by = NULL, type = "left", match = "all")其中,
- by 指定合并的条件
- type 指定合并的方式 left/right/inner/full
- match 指定取出的数据集合 all/first,与sql类似
合并多个data.frame
dfs <- list(
a = data.frame(x = 1:10, a = runif(10)),
b = data.frame(x = 1:10, b = runif(10)),
c = data.frame(x = 1:10, c = runif(10))
)
join_all(dfs)
join_all(dfs, "x")迭代版本的llply
liply(baseball_id, summarise, mean_rbi = mean(rbi, na.rm = TRUE))替换值
mapvalues(z, from = c(1, 5, 9), to = c(10, 50, 90))match_df 根据指定的longterm,筛出baseball中符合条件的原始数据
longterm <- subset(count(baseball, "id"), freq > 25)
bb_longterm <- match_df(baseball, longterm, on="id")
bb_longterm[1:5,]更新列名
x <- rename(x, replace=c("d" = "c"))
更新取值
y <- factor(c("a", "b", "c", "a"))
revalue(y, c(a = "A", c = "C"))数据精度调整
round_any(135, 25, floor)
round_any(135, 10, ceiling)根据指定维度取出子集
x <- array(seq_len(3 * 4 * 5), c(3, 4, 5))
take(x, 3, 1)
take(x, 2, 1)
take(x, 3, 1, drop = TRUE)
没有评论:
发表评论