
dplyr::group_by

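dplyr::group_by is a function in the dplyr package, part of the tidyverse collection, that groups a data frame by one or more specified columns. Verbs applied afterwards, such as summarise(), mutate() and the slice_*() functions, then operate on each group separately.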

Quick reference

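library(tidyverse)
# Assumes a penguins data frame (the Palmer Penguins data) is already loaded

# General form
df %>%
  group_by(col1, col2, ...)

# Example: group the penguins data by island and species
penguins %>%
  group_by(island, species)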

# # A tibble: 344 × 7
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Torgersen 39.1 18.7 181 3750 male
# 2 Adelie Torgersen 39.5 17.4 186 3800 female
# 3 Adelie Torgersen 40.3 18 195 3250 female
# 4 Adelie Torgersen NA NA NA NA NA
# 5 Adelie Torgersen 36.7 19.3 193 3450 female
# 6 Adelie Torgersen 39.3 20.6 190 3650 male
# 7 Adelie Torgersen 38.9 17.8 181 3625 female
# 8 Adelie Torgersen 39.2 19.6 195 4675 male
# 9 Adelie Torgersen 34.1 18.1 193 3475 NA
# 10 Adelie Torgersen 42 20.2 190 4250 NA
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
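group_by() does not reorder or drop any rows. It only attaches grouping information, which is why the output above still has all 344 rows plus a "Groups: island, species [5]" header. The sketch below uses a small made-up tibble (the names df, g and v are hypothetical) to inspect that grouping with group_vars(), n_groups() and group_keys(), and to remove it again with ungroup().

```r
library(dplyr)

# Hypothetical toy data for illustration
df <- tibble(
  g = c("b", "a", "b", "a", "a"),
  v = 1:5
)

grouped <- df %>% group_by(g)

nrow(grouped)        # grouping does not change the number of rows
# [1] 5
group_vars(grouped)  # names of the grouping columns
# [1] "g"
n_groups(grouped)    # number of distinct groups
# [1] 2
group_keys(grouped)  # a tibble with one row per group

# ungroup() removes the grouping information
is_grouped_df(ungroup(grouped))
# [1] FALSE
```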

Basic syntax

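group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))

Argument  Description
.data     A data frame (or tibble).
...       The columns to group by.
.add      If FALSE (the default), replaces any existing grouping; if TRUE, adds the new columns to the existing grouping.
.drop     Whether to drop groups formed by factor levels that do not appear in the data (TRUE by default for ungrouped data).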
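The .add argument in the syntax above decides whether a second group_by() call replaces or extends an existing grouping. A minimal sketch on a hypothetical toy tibble (df, g1, g2, v are illustrative names):

```r
library(dplyr)

# Hypothetical toy data for illustration
df <- tibble(
  g1 = c("a", "a", "b", "b"),
  g2 = c("x", "y", "x", "x"),
  v  = c(1, 2, 3, 4)
)

# .add = FALSE (default): the second call replaces the grouping by g1
by_g2 <- df %>% group_by(g1) %>% group_by(g2)
group_vars(by_g2)
# [1] "g2"

# .add = TRUE: g2 is appended to the existing grouping by g1
by_both <- df %>% group_by(g1) %>% group_by(g2, .add = TRUE)
group_vars(by_both)
# [1] "g1" "g2"
n_groups(by_both)  # combinations (a, x), (a, y), (b, x)
# [1] 3
```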

使用例

1. Group aggregation

Use dplyr::summarise to aggregate each group into a single row (group aggregation).

penguins %>%
  group_by(island, species) %>%
  summarise(
    avg_body_mass_g = mean(body_mass_g, na.rm = TRUE),
    n = n()
  )

# # A tibble: 5 × 4
# # Groups: island [3]
# island species avg_body_mass_g n
# <fct> <fct> <dbl> <int>
# 1 Biscoe Adelie 3710. 44
# 2 Biscoe Gentoo 5076. 124
# 3 Dream Adelie 3688. 56
# 4 Dream Chinstrap 3733. 68
# 5 Torgersen Adelie 3706. 52
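The result above is still grouped by island ("Groups: island [3]") because, by default, summarise() removes only the last grouping variable (species) and prints a message saying so. The .groups argument makes this explicit. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data with two grouping columns
df <- tibble(
  g1 = c("a", "a", "b", "b"),
  g2 = c("x", "y", "x", "x"),
  v  = c(1, 2, 3, 5)
)

# "drop_last" (the default here): the result stays grouped by g1 only
res_last <- df %>%
  group_by(g1, g2) %>%
  summarise(total = sum(v), .groups = "drop_last")
group_vars(res_last)
# [1] "g1"

# "drop": the result is an ungrouped tibble
res_drop <- df %>%
  group_by(g1, g2) %>%
  summarise(total = sum(v), .groups = "drop")
group_vars(res_drop)
# character(0)
res_drop$total  # sums for (a, x), (a, y), (b, x)
# [1] 1 2 8
```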

2. Grouped calculation

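Use dplyr::mutate for a grouped calculation: the value computed for each group is added as a new column to every row of that group, so the number of rows stays the same.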

penguins %>%
  group_by(island, species) %>%
  mutate(
    avg_body_mass_g = mean(body_mass_g, na.rm = TRUE),
    n = n()
  )

# # A tibble: 344 × 9
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex avg_body_mass_g n
# <fct> <fct> <dbl> <dbl> <int> <int> <fct> <dbl> <int>
# 1 Adelie Torgersen 39.1 18.7 181 3750 male 3706. 52
# 2 Adelie Torgersen 39.5 17.4 186 3800 female 3706. 52
# 3 Adelie Torgersen 40.3 18 195 3250 female 3706. 52
# 4 Adelie Torgersen NA NA NA NA NA 3706. 52
# 5 Adelie Torgersen 36.7 19.3 193 3450 female 3706. 52
# 6 Adelie Torgersen 39.3 20.6 190 3650 male 3706. 52
# 7 Adelie Torgersen 38.9 17.8 181 3625 female 3706. 52
# 8 Adelie Torgersen 39.2 19.6 195 4675 male 3706. 52
# 9 Adelie Torgersen 34.1 18.1 193 3475 NA 3706. 52
# 10 Adelie Torgersen 42 20.2 190 4250 NA 3706. 52
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
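Because mutate() evaluates its expressions per group, it suits row-level calculations that need a group statistic, such as each row's deviation from its group mean. A sketch on hypothetical toy data; the result of a grouped mutate() stays grouped, so ungroup() is called once the per-group work is done.

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(
  g = c("a", "a", "b", "b", "b"),
  v = c(1, 3, 2, 4, 6)
)

res <- df %>%
  group_by(g) %>%
  mutate(
    group_mean = mean(v),     # same value on every row of a group
    dev = v - group_mean,     # deviation of each row from its group mean
    rank_in_group = rank(-v)  # 1 = largest v within the group
  ) %>%
  ungroup()                   # remove grouping for later whole-table steps

res$group_mean
# [1] 2 2 4 4 4
res$dev
# [1] -1  1 -2  0  2
```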

3. Extracting specific rows from each group

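Helper function      What it does
dplyr::slice_head    Keeps the first row(s) of each group (n = 1 by default).
dplyr::slice_tail    Keeps the last row(s) of each group.
dplyr::slice_min     Keeps the row(s) with the smallest value of the given column in each group.
dplyr::slice_max     Keeps the row(s) with the largest value of the given column in each group.
dplyr::slice_sample  Randomly samples row(s) from each group.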

penguins %>%
  group_by(island, species) %>%
  slice_head()

# # A tibble: 5 × 7
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Biscoe 37.8 18.3 174 3400 female
# 2 Gentoo Biscoe 46.1 13.2 211 4500 female
# 3 Adelie Dream 39.5 16.7 178 3250 female
# 4 Chinstrap Dream 46.5 17.9 192 3500 female
# 5 Adelie Torgersen 39.1 18.7 181 3750 male
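slice_head() accepts n to keep more than one row per group. If a group has fewer rows than n, it simply returns all of that group's rows. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: group "a" has 3 rows, group "b" has 1 row
df <- tibble(
  g = c("a", "a", "a", "b"),
  v = c(10, 20, 30, 40)
)

# First two rows of each group; "b" only has one row to give
res <- df %>%
  group_by(g) %>%
  slice_head(n = 2) %>%
  ungroup()

res$v
# [1] 10 20 40
```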

penguins %>%
  group_by(island, species) %>%
  slice_tail()

# # A tibble: 5 × 7
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Biscoe 42.7 18.3 196 4075 male
# 2 Gentoo Biscoe 49.9 16.1 213 5400 male
# 3 Adelie Dream 41.5 18.5 201 4000 male
# 4 Chinstrap Dream 50.2 18.7 198 3775 female
# 5 Adelie Torgersen 43.1 19.2 197 3500 male
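slice_head() and slice_tail() work on the current row order, so "first" and "last" are only meaningful once the data are sorted as intended. In this sketch with hypothetical toy data (the day column is illustrative), sorting with arrange() before grouping makes slice_tail() return the latest row of each group.

```r
library(dplyr)

# Hypothetical toy data, deliberately not sorted by day
df <- tibble(
  g   = c("a", "b", "a", "b", "a"),
  day = c(3, 1, 1, 2, 2),
  v   = c("a3", "b1", "a1", "b2", "a2")
)

latest <- df %>%
  arrange(g, day) %>%  # sort so the last row of each group has the latest day
  group_by(g) %>%
  slice_tail(n = 1) %>%
  ungroup()

latest$v
# [1] "a3" "b2"
```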

penguins %>%
  group_by(island, species) %>%
  slice_min(body_mass_g)

# # A tibble: 6 × 7
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Biscoe 36.5 16.6 181 2850 female
# 2 Adelie Biscoe 36.4 17.1 184 2850 female
# 3 Gentoo Biscoe 42.7 13.7 208 3950 female
# 4 Adelie Dream 33.1 16.1 178 2900 female
# 5 Chinstrap Dream 46.9 16.6 192 2700 female
# 6 Adelie Torgersen 38.6 17 188 2900 female
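The slice_min() result above has 6 rows for 5 groups because two Adelie penguins on Biscoe share the minimum body mass (2850 g): slice_min() and slice_max() keep ties by default (with_ties = TRUE). Setting with_ties = FALSE returns at most n rows per group. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: group "a" has a tie for the minimum value
df <- tibble(
  g = c("a", "a", "a", "b", "b"),
  v = c(2, 2, 5, 7, 3)
)

# Default: both tied rows of group "a" are kept
with_ties <- df %>% group_by(g) %>% slice_min(v) %>% ungroup()
nrow(with_ties)
# [1] 3

# with_ties = FALSE: exactly one row per group
no_ties <- df %>% group_by(g) %>% slice_min(v, with_ties = FALSE) %>% ungroup()
nrow(no_ties)
# [1] 2
```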

penguins %>%
  group_by(island, species) %>%
  slice_max(body_mass_g)

# # A tibble: 5 × 7
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Biscoe 43.2 19 197 4775 male
# 2 Gentoo Biscoe 49.2 15.2 221 6300 male
# 3 Adelie Dream 39.8 19.1 184 4650 male
# 4 Chinstrap Dream 52 20.7 210 4800 male
# 5 Adelie Torgersen 42.9 17.6 196 4700 male
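With its defaults, slice_max() keeps the same rows as a grouped filter() that compares each value with its group's max(). The sketch below uses hypothetical toy data. Two differences worth knowing: filter() keeps the original row order, while slice_max() sorts rows by the ordering column within each group; and on data with missing values, such as body_mass_g in penguins, the filter() version needs max(..., na.rm = TRUE), otherwise groups containing an NA are dropped entirely.

```r
library(dplyr)

# Hypothetical toy data: group "a" has a tie for the maximum value
df <- tibble(
  g = c("a", "a", "a", "b", "b"),
  v = c(5, 9, 9, 2, 7)
)

# slice_max(): per-group maximum, ties kept by default
by_slice <- df %>% group_by(g) %>% slice_max(v) %>% ungroup()

# filter(): the same rows, selected by comparing each row with its group max
by_filter <- df %>% group_by(g) %>% filter(v == max(v)) %>% ungroup()

by_slice$v
# [1] 9 9 7
by_filter$v
# [1] 9 9 7
```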

penguins %>%
  group_by(island, species) %>%
  slice_sample()

# # A tibble: 5 × 7
# # Groups: island, species [5]
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Biscoe 40.6 18.8 193 3800 male
# 2 Gentoo Biscoe 46.5 14.8 217 5200 female
# 3 Adelie Dream 41.1 18.1 205 4300 male
# 4 Chinstrap Dream 58 17.8 181 3700 female
# 5 Adelie Torgersen 44.1 18 210 4000 male
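slice_sample() draws one random row per group by default, so the rows shown above will differ between runs. A sketch, assuming reproducible draws are wanted: set.seed() fixes the random draw, and n or prop set how many rows are sampled from each group (hypothetical toy data).

```r
library(dplyr)

# Hypothetical toy data: group "a" has 4 rows, group "b" has 2 rows
df <- tibble(
  g = c("a", "a", "a", "a", "b", "b"),
  v = 1:6
)

set.seed(42)  # make the random draw reproducible
two_each <- df %>% group_by(g) %>% slice_sample(n = 2) %>% ungroup()
table(two_each$g)   # 2 rows from each group

half_each <- df %>% group_by(g) %>% slice_sample(prop = 0.5) %>% ungroup()
table(half_each$g)  # half of each group: 2 rows from "a", 1 from "b"

# The same seed reproduces the same sample
set.seed(42)
again <- df %>% group_by(g) %>% slice_sample(n = 2) %>% ungroup()
identical(two_each, again)
# [1] TRUE
```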