列の選択
準備
1. パッケージのインストール
- R
- Python
- SQL
install.packages("tidyverse")
pip install pandas
なし。
2. パッケージの読み込み
- R
- Python
- SQL
library(tidyverse)
import pandas as pd
なし。
プログラム
R | Python | SQL | |
---|---|---|---|
1. 列名を指定して選択 | Awesome | Awesome | Awesome |
2. 列番号を指定して選択 | Awesome | Awesome | - |
3. 列名を変更して選択 | Awesome | Not Awesome | Awesome |
4. 全ての列を選択 | Awesome | Awesome | Awesome |
5. 指定した列以外を選択 | Awesome | Awesome | Awesome |
6. データ型を指定して選択 | Awesome | Awesome | - |
7. 欠損値が含まれる列を選択 | Awesome | Not Awesome | - |
8. パターンに一致する列を選択 | Awesome | Awesome | - |
9. 最後の列を選択 | Awesome | Awesome | - |
1. 列名を指定して選択
- R
- Python
- SQL
構文
df %>%
select(col1, col2)
サンプルコード
penguins %>%
select(species, island)
# # A tibble: 344 × 2
# species island
# <fct> <fct>
# 1 Adelie Torgersen
# 2 Adelie Torgersen
# 3 Adelie Torgersen
# 4 Adelie Torgersen
# 5 Adelie Torgersen
# 6 Adelie Torgersen
# 7 Adelie Torgersen
# 8 Adelie Torgersen
# 9 Adelie Torgersen
# 10 Adelie Torgersen
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df[["col1", "col2"]]
サンプルコード
penguins[["species", "island"]]
# species island
# 0 Adelie Torgersen
# 1 Adelie Torgersen
# 2 Adelie Torgersen
# 3 Adelie Torgersen
# 4 Adelie Torgersen
# .. ... ...
# 339 Chinstrap Dream
# 340 Chinstrap Dream
# 341 Chinstrap Dream
# 342 Chinstrap Dream
# 343 Chinstrap Dream
# [344 rows x 2 columns]
構文
select
col1,
col2,
from tbl;
サンプルコード
select
species,
island
from penguins;
# | species | island |
# | ------- | --------- |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | ... | ... |
2. 列番号を指定して選択
- R
- Python
- SQL
構文
df %>%
select(1, 2)
サンプルコード
penguins %>%
select(1, 2)
# # A tibble: 344 × 2
# species island
# <fct> <fct>
# 1 Adelie Torgersen
# 2 Adelie Torgersen
# 3 Adelie Torgersen
# 4 Adelie Torgersen
# 5 Adelie Torgersen
# 6 Adelie Torgersen
# 7 Adelie Torgersen
# 8 Adelie Torgersen
# 9 Adelie Torgersen
# 10 Adelie Torgersen
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df.iloc[:, [0, 1]]
サンプルコード
penguins.iloc[:, [0, 1]]
# species island
# 0 Adelie Torgersen
# 1 Adelie Torgersen
# 2 Adelie Torgersen
# 3 Adelie Torgersen
# 4 Adelie Torgersen
# .. ... ...
# 339 Chinstrap Dream
# 340 Chinstrap Dream
# 341 Chinstrap Dream
# 342 Chinstrap Dream
# 343 Chinstrap Dream
# [344 rows x 2 columns]
SQL では実装できない。
3. 列名を変更して選択
- R
- Python
- SQL
構文
df %>%
select(new_col1 = col1, new_col2 = col2)
サンプルコード
penguins %>%
select(SPECIES = species, ISLAND = island)
# # A tibble: 344 × 2
# SPECIES ISLAND
# <fct> <fct>
# 1 Adelie Torgersen
# 2 Adelie Torgersen
# 3 Adelie Torgersen
# 4 Adelie Torgersen
# 5 Adelie Torgersen
# 6 Adelie Torgersen
# 7 Adelie Torgersen
# 8 Adelie Torgersen
# 9 Adelie Torgersen
# 10 Adelie Torgersen
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df[["col1", "col2"]].rename(columns={"col1": "new_col1", "col2": "new_col2"})
サンプルコード
penguins[['species', 'island']].rename(columns={"species": "SPECIES", "island": "ISLAND"})
# SPECIES ISLAND
# 0 Adelie Torgersen
# 1 Adelie Torgersen
# 2 Adelie Torgersen
# 3 Adelie Torgersen
# 4 Adelie Torgersen
# .. ... ...
# 339 Chinstrap Dream
# 340 Chinstrap Dream
# 341 Chinstrap Dream
# 342 Chinstrap Dream
# 343 Chinstrap Dream
# [344 rows x 2 columns]
構文
select
col1 as new_col1,
col2 as new_col2,
from tbl;
サンプルコード
select
species as SPECIES,
island as ISLAND
from penguins;
# | SPECIES | ISLAND |
# | ------- | --------- |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | Adelie | Torgersen |
# | ... | ... |
4. 全ての列を選択
- R
- Python
- SQL
構文
df %>%
select(everything())
サンプルコード
penguins %>%
select(everything())
# # A tibble: 344 × 7
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <fct> <fct> <dbl> <dbl> <int> <int> <fct>
# 1 Adelie Torgersen 39.1 18.7 181 3750 male
# 2 Adelie Torgersen 39.5 17.4 186 3800 female
# 3 Adelie Torgersen 40.3 18 195 3250 female
# 4 Adelie Torgersen NA NA NA NA NA
# 5 Adelie Torgersen 36.7 19.3 193 3450 female
# 6 Adelie Torgersen 39.3 20.6 190 3650 male
# 7 Adelie Torgersen 38.9 17.8 181 3625 female
# 8 Adelie Torgersen 39.2 19.6 195 4675 male
# 9 Adelie Torgersen 34.1 18.1 193 3475 NA
# 10 Adelie Torgersen 42 20.2 190 4250 NA
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df
サンプルコード
penguins
# species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# 0 Adelie Torgersen 39.1 18.7 181 3750 male
# 1 Adelie Torgersen 39.5 17.4 186 3800 female
# 2 Adelie Torgersen 40.3 18.0 195 3250 female
# 3 Adelie Torgersen NaN NaN NaN NaN NaN
# 4 Adelie Torgersen 36.7 19.3 193 3450 female
# .. ... ... ... ... ... ... ...
# 339 Chinstrap Dream 55.8 19.8 207 4000 male
# 340 Chinstrap Dream 43.5 18.1 202 3400 female
# 341 Chinstrap Dream 49.6 18.2 193 3775 male
# 342 Chinstrap Dream 50.8 19.0 210 4100 male
# 343 Chinstrap Dream 50.2 18.7 198 3775 female
構文
select * from tbl;
サンプルコード
select * from penguins;
# | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
# | ------- | --------- | -------------- | ------------- | ----------------- | ----------- | ------ |
# | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male |
# | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female |
# | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female |
# | Adelie | Torgersen | null | null | null | null | null |
# | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female |
# | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male |
# | Adelie | Torgersen | 38.9 | 17.8 | 181 | 3625 | female |
# | Adelie | Torgersen | 39.2 | 19.6 | 195 | 4675 | male |
# | Adelie | Torgersen | 34.1 | 18.1 | 193 | 3475 | null |
# | Adelie | Torgersen | 42.0 | 20.2 | 190 | 4250 | null |
5. 指定した列以外を選択
- R
- Python
- SQL
構文
df %>%
select(!col1, !col2)
サンプルコード
penguins %>%
select(!species, !island)
# # A tibble: 344 × 5
# bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <dbl> <dbl> <int> <int> <fct>
# 1 39.1 18.7 181 3750 male
# 2 39.5 17.4 186 3800 female
# 3 40.3 18 195 3250 female
# 4 NA NA NA NA NA
# 5 36.7 19.3 193 3450 female
# 6 39.3 20.6 190 3650 male
# 7 38.9 17.8 181 3625 female
# 8 39.2 19.6 195 4675 male
# 9 34.1 18.1 193 3475 NA
# 10 42 20.2 190 4250 NA
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df.drop(columns=["col1", "col2"])
サンプルコード
penguins.drop(columns=["species", "island"])
# bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# 0 39.1 18.7 181 3750 male
# 1 39.5 17.4 186 3800 female
# 2 40.3 18.0 195 3250 female
# 3 NaN NaN NaN NaN NaN
# 4 36.7 19.3 193 3450 female
# .. ... ... ... ... ...
# 339 55.8 19.8 207 4000 male
# 340 43.5 18.1 202 3400 female
# 341 49.6 18.2 193 3775 male
# 342 50.8 19.0 210 4100 male
# 343 50.2 18.7 198 3775 female
構文
select * except(col1, col2) from tbl;
サンプルコード
select * except(species, island) from penguins;
# | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
# | -------------- | ------------- | ----------------- | ----------- | ------ |
# | 39.1 | 18.7 | 181 | 3750 | male |
# | 39.5 | 17.4 | 186 | 3800 | female |
# | 40.3 | 18.0 | 195 | 3250 | female |
# | null | null | null | null | null |
# | 36.7 | 19.3 | 193 | 3450 | female |
# | 39.3 | 20.6 | 190 | 3650 | male |
# | 38.9 | 17.8 | 181 | 3625 | female |
# | 39.2 | 19.6 | 195 | 4675 | male |
# | 34.1 | 18.1 | 193 | 3475 | null |
# | 42.0 | 20.2 | 190 | 4250 | null |
6. データ型を指定して選択
- R
- Python
- SQL
構文
df %>%
select(where(is.dtype))
サンプルコード
penguins %>%
select(where(is.numeric))
# # A tibble: 344 × 4
# bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
# <dbl> <dbl> <int> <int>
# 1 39.1 18.7 181 3750
# 2 39.5 17.4 186 3800
# 3 40.3 18 195 3250
# 4 NA NA NA NA
# 5 36.7 19.3 193 3450
# 6 39.3 20.6 190 3650
# 7 38.9 17.8 181 3625
# 8 39.2 19.6 195 4675
# 9 34.1 18.1 193 3475
# 10 42 20.2 190 4250
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df.select_dtypes(include=["dtype"])
サンプルコード
penguins.select_dtypes(include=["number"])
# bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
# 0 39.1 18.7 181 3750
# 1 39.5 17.4 186 3800
# 2 40.3 18.0 195 3250
# 3 NaN NaN NaN NaN
# 4 36.7 19.3 193 3450
# .. ... ... ... ...
# 339 55.8 19.8 207 4000
# 340 43.5 18.1 202 3400
# 341 49.6 18.2 193 3775
# 342 50.8 19.0 210 4100
# 343 50.2 18.7 198 3775
# [344 rows x 4 columns]
SQL では実装できない。
7. 欠損値が含まれる列を選択
- R
- Python
- SQL
構文
df %>%
select(where(~ any(is.na(.x))))
サンプルコード
penguins %>%
select(where(~ any(is.na(.x))))
# # A tibble: 344 × 5
# bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# <dbl> <dbl> <int> <int> <fct>
# 1 39.1 18.7 181 3750 male
# 2 39.5 17.4 186 3800 female
# 3 40.3 18 195 3250 female
# 4 NA NA NA NA NA
# 5 36.7 19.3 193 3450 female
# 6 39.3 20.6 190 3650 male
# 7 38.9 17.8 181 3625 female
# 8 39.2 19.6 195 4675 male
# 9 34.1 18.1 193 3475 NA
# 10 42 20.2 190 4250 NA
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df.loc[:, df.isna().any()]
サンプルコード
penguins.loc[:, penguins.isna().any()]
# bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
# 0 39.1 18.7 181 3750 male
# 1 39.5 17.4 186 3800 female
# 2 40.3 18.0 195 3250 female
# 3 NaN NaN NaN NaN NaN
# 4 36.7 19.3 193 3450 female
# .. ... ... ... ... ...
# 339 55.8 19.8 207 4000 male
# 340 43.5 18.1 202 3400 female
# 341 49.6 18.2 193 3775 male
# 342 50.8 19.0 210 4100 male
# 343 50.2 18.7 198 3775 female
# [344 rows x 5 columns]
SQL では実装できない。
8. パターンに一致する列を選択
- R
- Python
- SQL
構文
df %>%
select(matches(r"(pattern)"))
サンプルコード
penguins %>%
select(matches(r"(_mm$)"))
# # A tibble: 344 × 3
# bill_length_mm bill_depth_mm flipper_length_mm
# <dbl> <dbl> <int>
# 1 39.1 18.7 181
# 2 39.5 17.4 186
# 3 40.3 18 195
# 4 NA NA NA
# 5 36.7 19.3 193
# 6 39.3 20.6 190
# 7 38.9 17.8 181
# 8 39.2 19.6 195
# 9 34.1 18.1 193
# 10 42 20.2 190
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
penguins.filter(regex=r"pattern")
サンプルコード
penguins.filter(regex=r"_mm$")
# bill_length_mm bill_depth_mm flipper_length_mm
# 0 39.1 18.7 181
# 1 39.5 17.4 186
# 2 40.3 18.0 195
# 3 NaN NaN NaN
# 4 36.7 19.3 193
# .. ... ... ...
# 339 55.8 19.8 207
# 340 43.5 18.1 202
# 341 49.6 18.2 193
# 342 50.8 19.0 210
# 343 50.2 18.7 198
# [344 rows x 3 columns]
SQL では実装できない。
9. 最後の列を選択
- R
- Python
- SQL
構文
df %>%
select(last_col())
サンプルコード
penguins %>%
select(last_col())
# # A tibble: 344 × 1
# sex
# <fct>
# 1 male
# 2 female
# 3 female
# 4 NA
# 5 female
# 6 male
# 7 female
# 8 male
# 9 NA
# 10 NA
# # ℹ 334 more rows
# # ℹ Use `print(n = ...)` to see more rows
構文
df.iloc[:, [-1]]
サンプルコード
penguins.iloc[:, [-1]]
# sex
# 0 male
# 1 female
# 2 female
# 3 NaN
# 4 female
# .. ...
# 339 male
# 340 female
# 341 male
# 342 male
# 343 female
# [344 rows x 1 columns]
SQL では実装できない。