
Question

sjri

Asked: 2023-09-02 00:43:03 +0800 CST

How to clone lines with a specific condition?


I have some hyperlinks in a text file. I want to compare the link on each line with the link on the next adjacent line and create the missing intermediate links according to the trailing number. For example,

Consider the adjacent links below

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/4/

Here the output file will be:

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/2/
https://gp.to/ab/394/las69-02-09-2020/3/
https://gp.to/ab/394/las69-02-09-2020/4/

Similarly, I need to do this for the other lines....

Example input:

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/4/
https://gp.to/ab/563/dimp-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/3/
https://gp.to/ab/39443/lis-22-04-2018/
https://gp.to/ab/39443/lis-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/
https://gp.to/ab/39443/madi-22-04-2018/5/

Example output:

https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/2/
https://gp.to/ab/394/las69-02-09-2020/3/
https://gp.to/ab/394/las69-02-09-2020/4/
https://gp.to/ab/563/dimp-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/2/
https://gp.to/ab/39443/omegs-02-07-2023/3/
https://gp.to/ab/39443/lis-22-04-2018/
https://gp.to/ab/39443/lis-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/
https://gp.to/ab/39443/madi-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/3/
https://gp.to/ab/39443/madi-22-04-2018/4/
https://gp.to/ab/39443/madi-22-04-2018/5/

I tried..

# Function to extract the number from a URL
def extract_number(url):
    parts = url.split('/')
    for part in parts[::-1]:
        if part.isdigit():
            return int(part)
    return None

# Read the input file
with open('input.txt', 'r') as input_file:
    lines = input_file.readlines()

output_lines = []

# Iterate through the input lines and generate output lines
for i in range(len(lines)):
    current_url = lines[i].strip()
    output_lines.append(current_url)

    if i + 1 < len(lines):
        next_url = lines[i + 1].strip()
        current_number = extract_number(current_url)
        next_number = extract_number(next_url)

        if current_number is not None and next_number is not None:
            for num in range(current_number + 1, next_number):
                new_url = current_url.rsplit('/', 1)[0] + '/' + str(num) + '/'
                output_lines.append(new_url)

# Write the output to a file
with open('output.txt', 'w') as output_file:
    output_file.writelines(output_lines)

But I did not get the desired result.
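For reference, the attempt above seems to fail for two reasons. First, extract_number scans every path segment from the end, so for a base URL such as https://gp.to/ab/394/las69-02-09-2020/ it returns 394 (the ID segment) instead of None, and range(395, 4) is empty. Second, writelines does not add newlines, so all URLs end up on one line. A minimal sketch of one possible fix follows, assuming a numbered page always looks like the base URL plus a final all-digit segment. The names TRAILING_NUM, split_url, and expand_links are hypothetical and not from the original post.

```python
import re

# Assumption: a numbered page is "<base>/<digits>/", where <digits> is the last path segment.
TRAILING_NUM = re.compile(r"^(?P<base>.*/)(?P<num>\d+)/?$")

def split_url(url):
    """Return (base, page number); a URL without a trailing number counts as page 1."""
    m = TRAILING_NUM.match(url)
    if m:
        return m.group("base"), int(m.group("num"))
    return url, 1

def expand_links(urls):
    """Insert the missing numbered pages between adjacent lines that share a base URL."""
    out = []
    prev_base, prev_num = None, None
    for url in urls:
        base, num = split_url(url)
        if base == prev_base:
            # Fill the gap strictly between the previous page and this one.
            out.extend(f"{base}{k}/" for k in range(prev_num + 1, num))
        out.append(url)
        prev_base, prev_num = base, num
    return out

urls = [
    "https://gp.to/ab/394/las69-02-09-2020/",
    "https://gp.to/ab/394/las69-02-09-2020/4/",
    "https://gp.to/ab/563/dimp-02-07-2023/",
    "https://gp.to/ab/39443/madi-22-04-2018/",
    "https://gp.to/ab/39443/madi-22-04-2018/5/",
]
# Join with newlines explicitly; writelines() would not add them.
print("\n".join(expand_links(urls)))
```

To work on files as in the original attempt, read the lines with line.strip() and write "\n".join(result) + "\n" to the output file.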

2 Answers

Voted

TimTeaFan · Answer 1 · 2023-09-02T01:11:59+08:00

Here is one way we could approach the problem in R. We create a function extrapolate_link and use it in purrr::accumulate. Then we unlist the results.

library(purrr)
library(readr)
library(stringr)

z <- "https://gp.to/ab/394/las69-02-09-2020/
https://gp.to/ab/394/las69-02-09-2020/4/
https://gp.to/ab/563/dimp-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/
https://gp.to/ab/39443/omegs-02-07-2023/3/
https://gp.to/ab/39443/lis-22-04-2018/
https://gp.to/ab/39443/lis-22-04-2018/2/
https://gp.to/ab/39443/madi-22-04-2018/
https://gp.to/ab/39443/madi-22-04-2018/5/"


extrapolate_link <- function(x, y) {
  x_ln <- length(x)
  if (x_ln > 1) {
    x <- x[x_ln]
  }
  res <- sub(pattern = x, replacement = "", x = y) |> 
    readr::parse_number()
  
  if (!is.null(attr(res, "problems"))){
    return(y)
  }
  if (is.numeric(res)) {
    # Fill in pages 2..res; the upper bound comes from the parsed number, not a fixed 4
    return(paste0(x, seq_len(res)[-1], "/"))
  }
  stop("something went wrong.")
}


str_split(z, pattern = "\n") |>
  unlist() |> 
  accumulate(.f = extrapolate_link) |> 
  unlist()

#>  [1] "https://gp.to/ab/394/las69-02-09-2020/"
#>  [2] "https://gp.to/ab/394/las69-02-09-2020/2/"
#>  [3] "https://gp.to/ab/394/las69-02-09-2020/3/"
#>  [4] "https://gp.to/ab/394/las69-02-09-2020/4/"
#>  [5] "https://gp.to/ab/563/dimp-02-07-2023/"
#>  [6] "https://gp.to/ab/39443/omegs-02-07-2023/"
#>  [7] "https://gp.to/ab/39443/omegs-02-07-2023/2/"
#>  [8] "https://gp.to/ab/39443/omegs-02-07-2023/3/"
#>  [9] "https://gp.to/ab/39443/lis-22-04-2018/"
#> [10] "https://gp.to/ab/39443/lis-22-04-2018/2/"
#> [11] "https://gp.to/ab/39443/madi-22-04-2018/"
#> [12] "https://gp.to/ab/39443/madi-22-04-2018/2/"
#> [13] "https://gp.to/ab/39443/madi-22-04-2018/3/"
#> [14] "https://gp.to/ab/39443/madi-22-04-2018/4/"
#> [15] "https://gp.to/ab/39443/madi-22-04-2018/5/"

Created on 2023-09-01 with reprex v2.0.2

score 1 · Answer 2 · 2023-09-02T01:33:21+08:00

Here is another alternative. It assumes that order is relevant, so if the same URL (or its base, without the trailing number) is found again with different URLs in between, it is treated as "new".

library(dplyr)
tibble(url=vec) %>%
  mutate(
    urlbase = sub("/\\d+/?$", "/", url),
    num = as.integer(sub("/$", "", stringr::str_extract(url, "(?<=/)(\\d+)/?$"))),
    grp = consecutive_id(urlbase)
  ) %>%
  group_by(grp, urlbase) %>%
  mutate(
    num = if (n() > 1) coalesce(num, row_number()) else num
  ) %>%
  reframe(
    num = seq.int(max(coalesce(num, 1L))),
    url = paste0(urlbase, if_else(num == 1L, "", paste0(as.character(num), "/")))
  )
# # A tibble: 15 × 4
#      grp urlbase                                    num url                                       
#    <int> <chr>                                    <int> <chr>                                     
#  1     1 https://gp.to/ab/394/las69-02-09-2020/       1 https://gp.to/ab/394/las69-02-09-2020/    
#  2     1 https://gp.to/ab/394/las69-02-09-2020/       2 https://gp.to/ab/394/las69-02-09-2020/2/  
#  3     1 https://gp.to/ab/394/las69-02-09-2020/       3 https://gp.to/ab/394/las69-02-09-2020/3/  
#  4     1 https://gp.to/ab/394/las69-02-09-2020/       4 https://gp.to/ab/394/las69-02-09-2020/4/  
#  5     2 https://gp.to/ab/563/dimp-02-07-2023/        1 https://gp.to/ab/563/dimp-02-07-2023/     
#  6     3 https://gp.to/ab/39443/omegs-02-07-2023/     1 https://gp.to/ab/39443/omegs-02-07-2023/  
#  7     3 https://gp.to/ab/39443/omegs-02-07-2023/     2 https://gp.to/ab/39443/omegs-02-07-2023/2/
#  8     3 https://gp.to/ab/39443/omegs-02-07-2023/     3 https://gp.to/ab/39443/omegs-02-07-2023/3/
#  9     4 https://gp.to/ab/39443/lis-22-04-2018/       1 https://gp.to/ab/39443/lis-22-04-2018/    
# 10     4 https://gp.to/ab/39443/lis-22-04-2018/       2 https://gp.to/ab/39443/lis-22-04-2018/2/  
# 11     5 https://gp.to/ab/39443/madi-22-04-2018/      1 https://gp.to/ab/39443/madi-22-04-2018/   
# 12     5 https://gp.to/ab/39443/madi-22-04-2018/      2 https://gp.to/ab/39443/madi-22-04-2018/2/ 
# 13     5 https://gp.to/ab/39443/madi-22-04-2018/      3 https://gp.to/ab/39443/madi-22-04-2018/3/ 
# 14     5 https://gp.to/ab/39443/madi-22-04-2018/      4 https://gp.to/ab/39443/madi-22-04-2018/4/ 
# 15     5 https://gp.to/ab/39443/madi-22-04-2018/      5 https://gp.to/ab/39443/madi-22-04-2018/5/

Data

vec <- c("https://gp.to/ab/394/las69-02-09-2020/", "https://gp.to/ab/394/las69-02-09-2020/4/", "https://gp.to/ab/563/dimp-02-07-2023/", "https://gp.to/ab/39443/omegs-02-07-2023/", "https://gp.to/ab/39443/omegs-02-07-2023/3/", "https://gp.to/ab/39443/lis-22-04-2018/", "https://gp.to/ab/39443/lis-22-04-2018/2/", "https://gp.to/ab/39443/madi-22-04-2018/", "https://gp.to/ab/39443/madi-22-04-2018/5/")
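Since the question was asked in Python, the consecutive-grouping idea from this answer can be sketched there as well. This is only an illustration under the same assumptions (order matters, and a base URL that reappears after other URLs starts a new run). It uses itertools.groupby, which groups adjacent equal keys much like consecutive_id. The helper names url_base, trailing_num, and expand_by_runs are hypothetical.

```python
import re
from itertools import groupby

def url_base(url):
    # Drop a trailing all-digit segment, mirroring sub("/\\d+/?$", "/", url) in the R code.
    return re.sub(r"/\d+/?$", "/", url)

def trailing_num(url):
    # Page number from the trailing all-digit segment; URLs without one count as page 1.
    m = re.search(r"(?<=/)(\d+)/?$", url)
    return int(m.group(1)) if m else 1

def expand_by_runs(urls):
    """For each run of adjacent URLs sharing a base, emit pages 1..max of that run."""
    out = []
    for base, run in groupby(urls, key=url_base):
        top = max(trailing_num(u) for u in run)
        out.append(base)  # page 1 is the bare base URL
        out.extend(f"{base}{k}/" for k in range(2, top + 1))
    return out

vec = [
    "https://gp.to/ab/39443/lis-22-04-2018/",
    "https://gp.to/ab/39443/lis-22-04-2018/2/",
    "https://gp.to/ab/39443/madi-22-04-2018/",
    "https://gp.to/ab/39443/madi-22-04-2018/5/",
]
for line in expand_by_runs(vec):
    print(line)
```

Unlike a plain dictionary keyed by base URL, groupby only merges neighbours, so a base that shows up again later in the file is expanded as a separate run.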
