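I want to split the prod_code_date column (strings of varying length, holding anywhere from one to many pairs) into multiple columns based on the delimiters ":" and ",".
The ":" delimiter separates each piece of related information into a product code and a purchase date, which can be thought of as a pair, while the "," delimiter separates the different pairs that belong to the same product number (prod_no).
Expected intermediate result from separate_wider_delim: the number of columns created should depend on the number of delimiters in the column. The column names should be code_1, date_1, code_2, date_2, ..., code_x, date_x.
Expected final result (sample data below): a long table with the columns prod_no, code, and date, where each prod_no row is repeated as many times as there are pairs in its prod_code_date value.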
library(tidyverse)
# Data
df <- tibble(
  prod_no = 1:4,
  prod_code_date = c(
    "' ZB10.90 : 2013-04-29'",
    "' XJ11.90 : 2016-10-20, ZB25.22 : 2013-10-16, ZB25.29 : 2011-12-06, XJ14.20 : 2022-03-23, ZB10.90 : 2022-12-16, ZB10.90 : 2011-12-06, QP50.19 : 2016-03-11, QP12.90 : 2012-01-20, MS44.9 : 2022-03-23'",
    "' MS34.3 : 2022-10-04, QP13.20 : 1998-05-26, QP50.13 : 2008-10-10, MS44.9 : 2017-05-16'",
    "' QP10.90 : 2008-08-11, QP11.90 : 2019-04-15'"
  )
)
# Attempt (failed) using separate_wider_delim() function. After which I would have pivoted the data to a long format.
intermediate_result <- df %>% separate_wider_delim(prod_code_date, delim = c(":", ","), names = c("code_1", "date_1", "code_2", "date_2", "code_x", "date_x"))
# Expected output: A long table with the following columns: prod_no, code, date, with repeated prod_no rows for as many pairs as there are in the prod_code_date column.
# Codes and row counts below are aligned with the sample data in df (16 pairs in total).
final_result <- tibble(
  prod_no = c(1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4),
  code = c("ZB10.90", "XJ11.90", "ZB25.22", "ZB25.29", "XJ14.20", "ZB10.90", "ZB10.90", "QP50.19", "QP12.90", "MS44.9", "MS34.3", "QP13.20", "QP50.13", "MS44.9", "QP10.90", "QP11.90"),
  date = c("2013-04-29", "2016-10-20", "2013-10-16", "2011-12-06", "2022-03-23", "2022-12-16", "2011-12-06", "2016-03-11", "2012-01-20", "2022-03-23", "2022-10-04", "1998-05-26", "2008-10-10", "2017-05-16", "2008-08-11", "2019-04-15")
)
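For reference, the reshaping described above might be sketched as follows. This is only one possible route, not a confirmed solution: it assumes tidyr 1.3.0 or later (for separate_longer_delim() and separate_wider_delim()), and it assumes the literal single quotes and surrounding spaces in the strings should be dropped. Because the delim argument appears to take a single string, the sketch splits on "," into rows first and then on ":" into columns. The names df_sample, long_result and wide_result are made up for illustration, and df_sample holds only the first and last rows of the sample data.

```r
library(tidyverse)

# Illustrative subset of the sample data (rows 1 and 4 only)
df_sample <- tibble(
  prod_no = c(1L, 4L),
  prod_code_date = c(
    "' ZB10.90 : 2013-04-29'",
    "' QP10.90 : 2008-08-11, QP11.90 : 2019-04-15'"
  )
)

long_result <- df_sample %>%
  # Assumption: the embedded single quotes are not wanted in the output
  mutate(prod_code_date = str_remove_all(prod_code_date, "'")) %>%
  # One row per "code : date" pair
  separate_longer_delim(prod_code_date, delim = ",") %>%
  # Split each pair into its two parts
  separate_wider_delim(prod_code_date, delim = ":", names = c("code", "date")) %>%
  # Remove the padding spaces left around each part
  mutate(across(c(code, date), str_trim))

# Optional wide layout with code_1, date_1, code_2, date_2, ... columns
wide_result <- long_result %>%
  group_by(prod_no) %>%
  mutate(pair = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    names_from = pair,
    values_from = c(code, date),
    names_vary = "slowest"
  )
```

Going long first avoids having to know the maximum number of pairs in advance; the wide table, if it is needed at all, can then be derived from the long one.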