
Question

Chris Ruehlemann

Asked: 2024-05-04 01:45:47 +0800 CST

Insert a new row indicating the time gap between rows


I am working with speech transcriptions:

  Utterance                       Starttime_ms Endtime_ms
  <chr>                                  <dbl>      <dbl>
1 on this                                  210        780
2 okay                                    3403       3728
3 cool thanks everyone um                 4221       5880
4 so yes in terms of our projects         5910      11960
5 let's have a look so the               11980      13740
6 LGBTQ plus                             13813      16110

and would like to insert, after each Utterance, a new row indicating the time gap between that Utterance and the next one. The desired output would look something like this:

  Utterance                       Starttime_ms Endtime_ms
  <chr>                                  <dbl>      <dbl>
1 on this                                  210        780
  NA                                       780       3403
2 okay                                    3403       3728
  NA                                      3728       4221
3 cool thanks everyone um                 4221       5880
  NA                                      5880       5910
4 so yes in terms of our projects         5910      11960
  NA                                     11960      11980
5 let's have a look so the               11980      13740
  NA                                     13740      13813
6 LGBTQ plus                             13813      16110

I know how to do this in data.table:

library(data.table)
unq <- c(0, sort(unique(setDT(df)[, c(Starttime_ms, Endtime_ms)])))
df <- df[.(unq[-length(unq)], unq[-1]), on=c("Starttime_ms", "Endtime_ms")]

But I am looking for a dplyr solution.

Data:

df <- structure(
  list(
    Utterance = c("on this", "okay", "cool thanks everyone um",
                  "so yes in terms of our projects",
                  "let's have a look so the", "LGBTQ plus"),
    Starttime_ms = c(210, 3403, 4221, 5910, 11980, 13813),
    Endtime_ms = c(780, 3728, 5880, 11960, 13740, 16110)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

5 Answers


LMc · Answer 1 · 2024-05-04T02:28:17+08:00

library(dplyr)

df |>
  mutate(Utterance = NA, 
         local(data.frame(Starttime_ms = lag(Endtime_ms), Endtime_ms = Starttime_ms))) |>
  filter(!is.na(Starttime_ms)) |>
  bind_rows(df) |>
  arrange(Starttime_ms)

I use local() here to create a local evaluation environment, because mutate() evaluates its arguments in order, so Starttime_ms and Endtime_ms would overwrite each other if you did this:

  mutate(Utterance = NA, 
         Starttime_ms = lag(Endtime_ms), 
         Endtime_ms = Starttime_ms)

Instead of returning a single value, I return a data frame, which takes advantage of the fact that the ... argument of mutate() accepts a data frame or tibble to create multiple columns in the output.

Output

   Utterance                       Starttime_ms Endtime_ms
   <chr>                                  <dbl>      <dbl>
 1 on this                                  210        780
 2 NA                                       780       3403
 3 okay                                    3403       3728
 4 NA                                      3728       4221
 5 cool thanks everyone um                 4221       5880
 6 NA                                      5880       5910
 7 so yes in terms of our projects         5910      11960
 8 NA                                     11960      11980
 9 let's have a look so the               11980      13740
10 NA                                     13740      13813
11 LGBTQ plus                             13813      16110
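
The ordering issue described in this answer can be seen on a tiny example. This is only a sketch: toy and its columns start and end are hypothetical names, not the question's data, and the data frame is returned here without local(), which appears to be enough for this toy case.

```r
library(dplyr)

# Hypothetical two-row example, not the question's data
toy <- tibble(start = c(10, 40), end = c(30, 60))

# Sequential evaluation: `end` is computed from the already-replaced `start`
sequential <- toy |>
  mutate(start = lag(end), end = start)

# Returning a data frame: both columns are computed from the original values,
# and the unnamed data frame is spliced into the output as two columns
together <- toy |>
  mutate(data.frame(start = lag(end), end = start))

sequential$end  # NA 30 (the new start, not the original one)
together$end    # 10 40 (the original start)
```

In the sequential version the original start times are lost before they can be copied into end, which is why the answer builds both columns in a single expression.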

Adriano Mello · Answer 2 · 2024-05-04T02:33:36+08:00

Not particularly elegant, but it is a dplyr one:

library(dplyr)
library(tidyr)  # pivot_longer()

df %>% 
  pivot_longer(ends_with("ms"), values_to = "Starttime_ms") %>% 
  mutate(
    Endtime_ms = dplyr::lead(Starttime_ms, default = NA),
    Utterance = if_else(row_number() %% 2 == 0, NA_character_, Utterance)) %>% 
  slice_head(n = -1)

Output:

# A tibble: 11 × 4
   Utterance                       name         Starttime_ms Endtime_ms
   <chr>                           <chr>               <dbl>      <dbl>
 1 on this                         Starttime_ms          210        780
 2 NA                              Endtime_ms            780       3403
 3 okay                            Starttime_ms         3403       3728
 4 NA                              Endtime_ms           3728       4221
 5 cool thanks everyone um         Starttime_ms         4221       5880
 6 NA                              Endtime_ms           5880       5910
 7 so yes in terms of our projects Starttime_ms         5910      11960
 8 NA                              Endtime_ms          11960      11980
 9 let's have a look so the        Starttime_ms        11980      13740
10 NA                              Endtime_ms          13740      13813
11 LGBTQ plus                      Starttime_ms        13813      16110
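
The output above still carries the helper column name created by pivot_longer(), which the desired output in the question does not have. A minimal sketch of one way to drop it, assuming a final select(-name) is acceptable; toy is a hypothetical two-row stand-in for df:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data
toy <- tibble(
  Utterance    = c("a", "b"),
  Starttime_ms = c(0, 50),
  Endtime_ms   = c(20, 90)
)

gaps <- toy |>
  pivot_longer(ends_with("ms"), values_to = "Starttime_ms") |>
  mutate(
    Endtime_ms = lead(Starttime_ms),
    Utterance  = if_else(row_number() %% 2 == 0, NA_character_, Utterance)
  ) |>
  slice_head(n = -1) |>   # drop the last row, which has no following start time
  select(-name)           # drop the helper column left over from the pivot

gaps
```

The result has the three columns Utterance, Starttime_ms and Endtime_ms, with one gap row between the two utterances.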

Andre Wildberg · Answer 3 · 2024-05-04T02:44:00+08:00

An approach using uncount

library(dplyr)
library(tidyr)

df %>% 
  mutate(count = 2) %>% 
  uncount(count) %>% 
  mutate(Starttime_ms = if_else(n() == row_number(), lag(Endtime_ms), Starttime_ms), 
         Utterance = if_else(n() == row_number(), NA, Utterance), .by = Utterance) %>% 
  mutate(Endtime_ms = lead(Starttime_ms)) %>% 
  filter(!is.na(Endtime_ms))
# A tibble: 11 × 3
   Utterance                       Starttime_ms Endtime_ms
   <chr>                                  <dbl>      <dbl>
 1 on this                                  210        780
 2 NA                                       780       3403
 3 okay                                    3403       3728
 4 NA                                      3728       4221
 5 cool thanks everyone um                 4221       5880
 6 NA                                      5880       5910
 7 so yes in terms of our projects         5910      11960
 8 NA                                     11960      11980
 9 let's have a look so the               11980      13740
10 NA                                     13740      13813
11 LGBTQ plus                             13813      16110
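
For readers unfamiliar with uncount(), here is a small sketch of what it does on its own. The tibble toy and the column name copy are hypothetical. The .id argument adds a per-row copy index, which might be a useful alternative marker for the inserted rows, since grouping by Utterance as above assumes that no utterance text occurs twice in the transcript.

```r
library(tidyr)

# Hypothetical stand-in: each row should appear `count` times
toy <- tibble(Utterance = c("a", "b"), count = 2)

# uncount() repeats each row `count` times and drops the count column;
# `.id = "copy"` numbers the copies within each original row
out <- uncount(toy, count, .id = "copy")

out$Utterance  # "a" "a" "b" "b"
```

Rows where copy is 2 are the duplicates that the answer turns into gap rows.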

ThomasIsCoding · Answer 4 · 2024-05-04T04:27:33+08:00

You can try the code below

library(dplyr)
library(tidyr)  # pivot_longer(), drop_na()

df %>%
   pivot_longer(-Utterance, values_to = "Starttime_ms") %>%
   mutate(Endtime_ms = lead(Starttime_ms)) %>%
   drop_na() %>%
   select(-name) %>%
   mutate(Utterance = replace(Utterance, !row_number() %% 2, NA_character_))

which gives

# A tibble: 11 × 3
   Utterance                       Starttime_ms Endtime_ms
   <chr>                                  <dbl>      <dbl>
 1 on this                                  210        780
 2 NA                                       780       3403
 3 okay                                    3403       3728
 4 NA                                      3728       4221
 5 cool thanks everyone um                 4221       5880
 6 NA                                      5880       5910
 7 so yes in terms of our projects         5910      11960
 8 NA                                     11960      11980
 9 let's have a look so the               11980      13740
10 NA                                     13740      13813
11 LGBTQ plus                             13813      16110
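
The expression !row_number() %% 2 in this answer relies on R's operator precedence: %% binds more tightly than the unary !, so it selects the even-numbered rows. A short sketch on a plain vector, where x stands in for row_number() and utt is a hypothetical Utterance column:

```r
x <- 1:4

# Parsed as !(x %% 2): 1 is coerced to TRUE and 0 to FALSE before negation
even <- !x %% 2
even                              # FALSE TRUE FALSE TRUE
identical(even, x %% 2 == 0)      # TRUE: the explicit form is equivalent

# replace() then blanks the elements where the mask is TRUE
utt <- c("a", "a", "b", "b")
replace(utt, even, NA_character_) # "a" NA "b" NA
```

Writing the condition as row_number() %% 2 == 0 is longer but avoids depending on the reader knowing the precedence rule.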

TarJae · Answer 5 · 2024-05-04T06:58:27+08:00

Here is a simple dplyr solution. Note that most of the elements I used were already mentioned in the earlier answers!

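library(dplyr)

df %>% 
    # duplicate every row; the even-numbered copies become the gap rows
    slice(rep(1:n(), each = 2)) %>% 
    mutate(Starttime_ms = ifelse(row_number() %% 2 == 0, Endtime_ms, Starttime_ms),
           Endtime_ms = lead(Starttime_ms)) %>%
    # the last copy has no following utterance, so drop it
    slice(-n()) %>% 
    # blank the Utterance on the gap rows (even rows), not on the originals
    mutate(Utterance = ifelse(row_number() %% 2 == 0, NA_character_, Utterance))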

Some benchmarks:

library(dplyr)
library(tidyr)
library(microbenchmark)
library(ggplot2)  # provides the autoplot() generic used below

thomas <- function() {
  df |>
    pivot_longer(-Utterance, values_to = "Starttime_ms") %>%
    mutate(Endtime_ms = lead(Starttime_ms)) %>%
    drop_na() %>%
    select(-name) %>%
    mutate(Utterance = replace(Utterance, !row_number() %% 2, NA_character_))
}

adriano <- function(){
  df %>% 
    pivot_longer(ends_with("ms"), values_to = "Starttime_ms") %>% 
    mutate(
      Endtime_ms = dplyr::lead(Starttime_ms, default = NA),
      Utterance = if_else(row_number() %% 2 == 0, NA_character_, Utterance)) %>% 
    slice_head(n = -1)
}

lmc <- function(){
  df |>
    mutate(Utterance = NA, 
           local(data.frame(Starttime_ms = lag(Endtime_ms), Endtime_ms = Starttime_ms))) |>
    filter(!is.na(Starttime_ms)) |>
    bind_rows(df) |>
    arrange(Starttime_ms)
}

andre <- function(){
  df %>% 
    mutate(count = 2) %>% 
    uncount(count) %>% 
    mutate(Starttime_ms = if_else(n() == row_number(), lag(Endtime_ms), Starttime_ms), 
           Utterance = if_else(n() == row_number(), NA, Utterance), .by = Utterance) %>% 
    mutate(Endtime_ms = lead(Starttime_ms)) %>% 
    filter(!is.na(Endtime_ms))
}

tarjae <- function() {
  df %>% 
    slice(rep(1:n(), each=2)) %>% 
    mutate(Starttime_ms = ifelse(!row_number() %% 2, Endtime_ms, Starttime_ms),
           Endtime_ms = lead(Starttime_ms), 
    ) %>%
    slice(-n()) %>% 
    mutate(Utterance = ifelse(row_number() %% 2 == 0, NA_character_, Utterance))
}

# benchmark

mbm <- microbenchmark(
  thomas = thomas(),
  adriano = adriano(),
  lmc = lmc(),
  andre = andre(),
  tarjae = tarjae(),
  times = 100
)
autoplot(mbm)
print(mbm)
