Questions asked by Pubg Mobile - coding

Pubg Mobile

Asked: 2025-03-22 15:34:01 +0800 CST

Quickly converting multi-page PDF files to PNG

6

I have a folder containing 600 PDF files, each 20 pages long. I need to convert every page to a high-quality PNG as quickly as possible.

I wrote the following script for this task:

import os
import multiprocessing
import fitz  # PyMuPDF
from PIL import Image

def process_pdf(pdf_path, output_folder):
    try:
        pdf_name = os.path.splitext(os.path.basename(pdf_path))[0]
        pdf_output_folder = os.path.join(output_folder, pdf_name)
        os.makedirs(pdf_output_folder, exist_ok=True)

        doc = fitz.open(pdf_path)

        for i, page in enumerate(doc):
            pix = page.get_pixmap(dpi=850)  # Render page at high DPI
            img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
            
            img_path = os.path.join(pdf_output_folder, f"page_{i+1}.png")
            img.save(img_path, "PNG")

        print(f"Processed: {pdf_path}")
    except Exception as e:
        print(f"Error processing {pdf_path}: {e}")

def main():
    input_folder = r"E:\Desktop\New folder (5)\New folder (4)"
    output_folder = r"E:\Desktop\New folder (5)\New folder (5)"

    pdf_files = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.lower().endswith(".pdf")]

    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        pool.starmap(process_pdf, [(pdf, output_folder) for pdf in pdf_files])

    print("All PDFs processed successfully!")

if __name__ == "__main__":
    main()

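Problem:

This script is too slow, especially when processing a large number of PDFs. I tried the following optimizations, but none of them improved the speed significantly:

Lowering the DPI slightly – from 1200 DPI to 850 DPI. (I also tested 600-800 DPI.)
Enabling alpha=False in get_pixmap() – to reduce memory usage.
Using ThreadPoolExecutor instead of multiprocessing.Pool – no significant improvement.
Reducing PNG compression – setting optimize=False when saving the image.
Converting the images to grayscale – this helped a little, but my task requires color images.

Possible solutions I have considered:

Processing pages in parallel instead of files – rather than handling one file at a time, process each page in parallel to make full use of the CPU cores.
Using ProcessPoolExecutor instead of ThreadPoolExecutor – rendering is CPU-bound, so multiprocessing should perform better.
Using JPEG instead of PNG – JPEG saves faster and takes less storage, but I need high-quality images.
Lowering the DPI to 500-600 – to balance speed and quality.
Writing files in batches instead of saving them one by one – to reduce I/O overhead.

What I need help with:

How can I significantly speed up the PDF-to-PNG conversion while keeping high image quality?
Is there a better library or technique I should be using?
Is there a way to make full use of the CPU cores?

Any suggestions would be greatly appreciated!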
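
The per-page parallelism idea considered above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (chunk_pages, build_tasks, render_range, convert_all), the split into 4 page ranges per file, and the dpi=300 default are hypothetical choices, and the page count of 20 is taken from the question rather than read from each file. The sketch also writes each pixmap with PyMuPDF's own Pixmap.save, which avoids copying the pixels into a PIL image first.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_pages(page_count, n_chunks):
    """Split range(page_count) into contiguous (start, stop) page ranges."""
    n_chunks = max(1, min(n_chunks, page_count))
    base, extra = divmod(page_count, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def build_tasks(pdf_paths, output_folder, pages_per_pdf=20, chunks_per_pdf=4, dpi=300):
    """One task per (PDF, page range); pages_per_pdf=20 is assumed from the question."""
    tasks = []
    for pdf_path in pdf_paths:
        name = os.path.splitext(os.path.basename(pdf_path))[0]
        out_dir = os.path.join(output_folder, name)
        for start, stop in chunk_pages(pages_per_pdf, chunks_per_pdf):
            tasks.append((pdf_path, out_dir, start, stop, dpi))
    return tasks


def render_range(task):
    """Worker: render pages [start, stop) of one PDF directly to PNG files."""
    import fitz  # PyMuPDF, imported lazily inside each worker process

    pdf_path, out_dir, start, stop, dpi = task
    os.makedirs(out_dir, exist_ok=True)
    rendered = 0
    doc = fitz.open(pdf_path)
    try:
        # Clamp to the real page count in case a file is shorter than assumed.
        for page_no in range(start, min(stop, doc.page_count)):
            pix = doc[page_no].get_pixmap(dpi=dpi)
            pix.save(os.path.join(out_dir, f"page_{page_no + 1}.png"))
            rendered += 1
    finally:
        doc.close()
    return rendered


def convert_all(pdf_paths, output_folder, workers=None):
    """Fan the page-range tasks out over a process pool; returns pages rendered."""
    tasks = build_tasks(pdf_paths, output_folder)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(render_range, tasks))
```

On Windows, convert_all should be called from inside an `if __name__ == "__main__":` guard, as the original script already does. Since the pixel count grows with the square of the DPI, the DPI setting is likely to matter more for run time than the way the work is split.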

Pubg Mobile

Asked: 2024-09-10 13:57:21 +0800 CST

Detecting only the left-most box in an image

5

I have a JPG image containing mobile phone brand names:

Now I want to detect the first character of each word with a Python script.
I wrote the following Python script for this:

import cv2
import numpy as np
from tkinter import Tk, Canvas, Frame, Scrollbar, BOTH, VERTICAL, HORIZONTAL
from PIL import Image, ImageTk

# Function to draw rectangles around shapes and display using Tkinter
def draw_rectangles(image_path):
    # Create a Tkinter window to display the image
    root = Tk()
    root.title("Image with Left-Most Rectangles Only")

    # Load the image
    image = cv2.imread(image_path)
    # Convert the image to grayscale
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Apply adaptive thresholding to get better separation of text
    thresh = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2
    )

    # Find contours in the binary image
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Dictionary to store contours grouped by Y-coordinate ranges
    contours_by_y = {}

    # Sort contours by X-coordinate to ensure we pick the left-most character first
    sorted_contours = sorted(contours, key=lambda c: cv2.boundingRect(c)[0])

    # Group contours by their Y coordinate to keep only the left-most rectangle per Y range
    for contour in sorted_contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w > 15 and h > 15:  # Adjust the size filter to remove small artifacts
            aspect_ratio = w / float(h)
            # Ensure the aspect ratio is within the typical range of letters
            if 0.2 < aspect_ratio < 5:
                y_range = y // 20  # Group by a smaller Y coordinate range for better separation

                # Check if the current rectangle is more left-most in X within its Y range
                if y_range not in contours_by_y:
                    contours_by_y[y_range] = (x, y, w, h)  # Store the first contour found in this range
                else:
                    # Compare and keep the left-most (smallest X) rectangle
                    current_x, _, _, _ = contours_by_y[y_range]
                    # Check distance between new contour and the existing one to avoid close detection
                    if x < current_x and (x - current_x) > 20:  # Distance threshold to filter out close contours
                        contours_by_y[y_range] = (x, y, w, h)

    # Draw only the left-most rectangles
    for (x, y, w, h) in contours_by_y.values():
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)  # Red color in BGR

    # Convert the image to RGB (OpenCV uses BGR by default)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Convert the image to a format Tkinter can use
    image_pil = Image.fromarray(image_rgb)
    image_tk = ImageTk.PhotoImage(image_pil)

    # Create a frame for the Canvas and scrollbars
    frame = Frame(root)
    frame.pack(fill=BOTH, expand=True)

    # Create a Canvas widget to display the image
    canvas = Canvas(frame, width=image_tk.width(), height=image_tk.height())
    canvas.pack(side="left", fill="both", expand=True)

    # Add scrollbars to the Canvas
    v_scrollbar = Scrollbar(frame, orient=VERTICAL, command=canvas.yview)
    v_scrollbar.pack(side="right", fill="y")

    h_scrollbar = Scrollbar(frame, orient=HORIZONTAL, command=canvas.xview)
    h_scrollbar.pack(side="bottom", fill="x")

    canvas.configure(yscrollcommand=v_scrollbar.set, xscrollcommand=h_scrollbar.set)
    canvas.create_image(0, 0, anchor="nw", image=image_tk)
    canvas.config(scrollregion=canvas.bbox("all"))

    # Keep a reference to the image to prevent garbage collection
    canvas.image = image_tk

    root.mainloop()

# Path to your image
image_path = r"E:\Desktop\mobile_brands\ORG_027081-Recovered.jpg"

# Call the function
draw_rectangles(image_path)

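But I don't know why it doesn't work well. The script's accuracy is about 90%. For example, in the image above it detects the "a" character in "Samsung".

Where is the problem in my script?
How can I fix it?
Perhaps the left-most box in an image cannot be detected from the Y and X coordinates alone.
Note that I do not want to use OCR.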
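
One likely cause is the `y // 20` bucketing: a capital "S" and a lowercase "a" on the same text line start at different Y values, so they can land in different buckets and both be kept. In addition, the condition `x < current_x and (x - current_x) > 20` can never be true, because `x - current_x` is negative whenever `x < current_x`. A minimal sketch of an alternative grouping, assuming the bounding boxes have already been extracted as (x, y, w, h) tuples, groups boxes into rows by vertical overlap instead. The function name and the min_overlap threshold are hypothetical.

```python
def leftmost_per_row(boxes, min_overlap=0.5):
    """Return the left-most (x, y, w, h) box of each text row, top to bottom.

    Two boxes share a row when their vertical overlap is at least
    min_overlap times the height of the shorter of the two extents.
    """
    rows = []  # each row tracks its vertical extent and member boxes
    for box in sorted(boxes, key=lambda b: b[1]):
        x, y, w, h = box
        for row in rows:
            overlap = min(y + h, row["bottom"]) - max(y, row["top"])
            if overlap >= min_overlap * min(h, row["bottom"] - row["top"]):
                row["boxes"].append(box)
                row["top"] = min(row["top"], y)
                row["bottom"] = max(row["bottom"], y + h)
                break
        else:
            # No existing row overlaps enough, so this box starts a new row.
            rows.append({"top": y, "bottom": y + h, "boxes": [box]})
    return [min(row["boxes"], key=lambda b: b[0]) for row in rows]
```

The boxes returned by cv2.boundingRect in the script above could be passed in directly after the existing size and aspect-ratio filters.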

Pubg Mobile

Asked: 2024-07-26 20:49:28 +0800 CST

Bookmarking lines between two regex patterns in Notepad++, excluding the patterns themselves [duplicate]

4

I have a list; here is a sample snippet:

Newii
27,807,147
Supd
26,518,465
Ns.
26,175,538
Mai
24,930,812
Gas
0623,901,055
TEim
20,213,631
Tes
GrV
18,968,412
Mytyttyst
y
htththt
hyhyh
October 2013
/////////////////////////

I want to bookmark the lines between 18,968,412 and October 2013, but not those two lines themselves. The following regex matches these lines well:

^\d+(?:,\d+)*$(?=(?:\R(?!\d+(?:,\d+)*$).*)*\R/{3,}$)[\s\S]+?^\h*\S.*(?=\R+/{24})

This regex places [\s\S]+? between ^\d+(?:,\d+)*$(?=(?:\R(?!\d+(?:,\d+)*$).*)*\R/{3,}$) and ^\h*\S.*(?=\R+/{24}). The problem, however, is that it also bookmarks the pattern lines themselves.

The output after applying "Bookmark" is as follows:

18,968,412
Mytyttyst
y
htththt
hyhyh
October 2013

I only want to bookmark the lines between the two patterns. For example, in the list above, the lines that should be bookmarked are:

Mytyttyst
y
htththt
hyhyh

Can anyone help me modify the regex so that it bookmarks only the lines between the patterns, without the pattern lines themselves?

Note that I tried the following regexes, but they did not work either!

(?<=^\d+(?:,\d+)*$\R)[^\R]*(\R(?!^\d+(?:,\d+)*$|\h*\S.*(?=\R/{24}))[^R]*)*(?=\R^\h*\S.*(?=\R/{24}))
(?<=^\d+(?:,\d+)*$(?=(?:\R(?!\d+(?:,\d+)*$).*)*\R/{3,}$)\R)([\s\S]*?)(?=\R^\h*\S.*(?=\R+/{24}))
(?<=^\d+(?:,\d+)*$(?=(?:\R(?!\d+(?:,\d+)*$).*)*\R/{3,}$)\R)[\s\S]*?(?=\R^\h*\S.*(?=\R+/{24}))
(?<=^\d+(?:,\d+)*$(?=(?:\R(?!\d+(?:,\d+)*$).*)*\R/{3,}$))[\s\S]*?(?=^\h*\S.*(?=\R+/{24}))

Pubg Mobile

Asked: 2024-06-28 14:43:53 +0800 CST

How to bookmark whole and decimal percentage numbers in Notepad++?

5

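I am trying to bookmark lines that contain percentage numbers in Notepad++. Specifically, I want to bookmark both whole-number percentages (such as 9%) and decimal percentages (such as 4.5%).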

For example, I have the following list:

VitrtertWW
44.98%
Liertertde
32.52%
Ltettth
Ltertrth9%
Mhrhrththw
4.5%
1992Q2
/////////////////////////

I want to move every percentage number to the next line.
The following regex works well:

Find: \d+\.\d+%
Replace: \n$0

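However, my regex has a problem: it only moves the decimal percentage numbers to the next line, while the whole-number percentages are not moved.
How can I solve this?

I also tried the following regexes, without success: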

(?<!\d)\d%(?!\d)
(|\s)\d%(\s|$)
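
The pattern \d+\.\d+% requires a decimal point, so 9% can never match it. Making the fractional part optional, as in \d+(?:\.\d+)?%, covers both cases. Below is a minimal Python sketch of that idea; the function name is hypothetical, and the same pattern is expected to work in the Notepad++ Find field, though that is not verified here.

```python
import re

# Digits, then an optional ".digits" part, then a percent sign.
PERCENT = re.compile(r"\d+(?:\.\d+)?%")


def break_before_percent(text):
    """Insert a newline before every whole or decimal percentage."""
    return PERCENT.sub(lambda m: "\n" + m.group(0), text)
```

For instance, break_before_percent("Ltertrth9%") puts 9% on its own line after Ltertrth.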

Pubg Mobile

Asked: 2024-01-02 18:49:29 +0800 CST

Bookmarking consecutive lines with the same specific value

5

I have a list like the following:

ABC
GFGFG
/////////////////////////
ggtrgrh
htrhrth
nbtnyumjyumu
myuuykukyyuk
/////////////////////////
/////////////////////////
AAAAA
AAAAAAAAAAA
RET5t4yy
HTH^565y56y
/////////////////////////
tertet
/////////////////////////
/////////////////////////
/////////////////////////

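Now I want to bookmark only the consecutive lines consisting of ///////////////////////// in the list above. That is, the lines between myuuykukyyuk and AAAAA and the three last lines must be selected by the regex. I tried the following regexes, but they do not work for me: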

(?:.*/{25,}\r?\n?){3}
^/*/{25,}$

Where is the problem with my regexes?

I only want to match runs of two or more lines that contain nothing but 25 or more slashes.
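
The attempt ^/*/{25,}$ matches every slash-only line on its own, so isolated separator lines are matched too. The requirement "two or more consecutive lines" can be expressed by repeating the whole line pattern at least twice. The following Python sketch illustrates this with the re module; it is not Notepad++ syntax, it assumes \n line endings, and the function name is hypothetical.

```python
import re

# A slash-only line (25+ slashes) plus its line break, repeated 2+ times.
RUN = re.compile(r"(?:^/{25,}$\n?){2,}", re.MULTILINE)


def slash_run_lengths(text):
    """Return the number of lines in each run of 2+ slash-only lines."""
    return [len(m.group(0).splitlines()) for m in RUN.finditer(text)]
```

A single separator line between two text lines is skipped, because the group has to match at least twice in a row.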

Pubg Mobile

Asked: 2024-01-02 13:44:46 +0800 CST

How to bookmark the lines before and after a regex match, up to a specific line?

6

I apologize for the title of this post, but I couldn't find a better one for this problem.
I have a list like the following:

/////////////////////////
Mitnhnhnksmuion
2,687,064
Etyjyjes
1,897,331
Pihjloyd
1,466,137
Edddlnnnnney
1,297,624
Thjtyjkujkes
1,241,307
Fnnhhnac
1,159,710
AfdBhhhghghBA
1,113,062
Elnhhyhjkukjhn
1,023,500
Bggggggel
1,009,075
Letjyjnhhtrh
991,284
Bahtyjtjyjd
849,265
1980Q4
/////////////////////////
Eayes
4,228,223
Elhyjtyjey
1,456,729
1,412,750
Lein
243
184
AA
1,129
672
Elejntyj345hn
002,570
Neerthty34ond
916
78
Biwertetoel
910,353
Qen
874,812
Bs
877,293
Pyd
850,146
1978Q1
/////////////////////////
Mteichrtertson
2,747,969
Eatertglertees
1,885,332
Pirtertd
1,490,156
Elverts
1,295,789
TtrrheBerteaerttles
1,239,194
Fleterteter
1,156,907
ABB
1,117,183
E
1,027,583
Bi
1,010,372
LedZ
987,821
Barb
850,687
1980Q4
/////////////////////////

The following regex bookmarks some of the lines in the list above:

(?:^|\R)\K\d+(?:,\d+)*\R\d+(?:,\d+)*(?=\R)

The lines bookmarked by this regex:

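But that is not what I want! I want to bookmark the whole section that contains my regex match.
For example, I want to bookmark the following section of my list: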

/////////////////////////
Eayes
4,228,223
Elhyjtyjey
1,456,729
1,412,750
Lein
243
184
AA
1,129
672
Elejntyj345hn
002,570
Neerthty34ond
916
78
Biwertetoel
910,353
Qen
874,812
Bs
877,293
Pyd
850,146
1978Q1
/////////////////////////

I tried the following regexes, but they do not work for me:

^/////////////////////////\R((?:(?!^/////////////////////////).)*)\R\d+(?:,\d+)*\R\d+(?:,\d+)*(?=\R)
^(?:(?!^/////////////////////////).)*\R\d+(?:,\d+)*\R\d+(?:,\d+)*(?=\R/////////////////////////)

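How can I do this with a regex in Notepad++?
In other words, the regex must bookmark all lines before my regex match, back to the nearest preceding line containing /////////////////////////, and all lines after my regex match, up to the first following line containing /////////////////////////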

Pubg Mobile

Asked: 2023-11-10 23:48:30 +0800 CST

How to invert a regex match region in Notepad++?

5

I have the following list:

  <th class="News">14</th>
  <td class="News"><a href="pclinuxos">PCLinuxOS</a></td>
  <td class="News" style="text-align: right" title="Yesterday: 341">341<img src="/web/20050131094820im_/http://distrowatch.com/images/other/alevel.png" alt="=" title="Yesterday: 341"></td>
</tr>
<tr>
  <th class="News">15</th>
  <td class="News"><a href="redhat">Red Hat</a></td>
  <td class="News" style="text-align: right" title="Yesterday: 290">289<img src="/web/20050131094820im_/http://distrowatch.com/images/other/adown.png" alt=">" title="Yesterday: 290"></td>
</tr>
<tr>
  <th class="News">16</th>
  <td class="News"><a href="slax">SLAX</a></td>
  <td class="News" style="text-align: right" title="Yesterday: 274">275<img src="/web/20050131094820im_/http://distrowatch.com/images/other/aup.png" alt="<" title="Yesterday: 274"></td>
</tr>
<tr>
  <th class="News">17</th>
  <td class="News"><a href="vine">Vine</a></td>
  <td class="News" style="text-align: right" title="Yesterday: 269">261<img src="/web/20050131094820im_/http://distrowatch.com/images/other/adown.png" alt=">" title="Yesterday: 269"></td>
</tr>
<tr>

I can select my target lines with the following regex:

(.*)\R.+\.png" alt\b

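Now I want to invert my target lines using a regex.
I used the regex ^(?!.*(.*)\R.+\.png" alt\b).+\R to invert the selection, but it failed and I got the following result!

Why does my regex invert only one of the lines? Where is the problem?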

Pubg Mobile

Asked: 2023-09-07 00:26:24 +0800 CST

Keeping only the number in lines matched by a regex

6

I have a list; here is a sample of it:

Bolt

®1421918Users

Classmates

666138Users

SixDegrees

470621Users$$

PlanetAll

AT308079Users

theGlobe

214442Users

1997

Now I want to keep only the number followed by Users on the Users lines, for example:

Bolt

1421918Users

Classmates

666138Users

SixDegrees

470621Users

PlanetAll

308079Users

theGlobe

214442Users

1997

I tried the following regex in Notepad++, but it does not work:

Find What = ^.*?(\d+Users)
Replace = \1

I also tried the following regex:

Find What = ^(?!\d+Users).*\r?\n?
Replace = [Empty]

How can I solve this regex problem?
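
The first attempt, ^.*?(\d+Users), stops matching right after Users, so trailing characters such as $$ are left in place. Extending the pattern to consume the rest of the line is one way to address that. Here is a minimal Python sketch of the idea using the re module; the function name is hypothetical and the snippet is an illustration rather than Notepad++ syntax.

```python
import re

# Leading junk (lazy), the number+Users part (captured), then trailing junk.
USERS_LINE = re.compile(r"^.*?(\d+Users).*$", re.MULTILINE)


def keep_number_users(text):
    """On every line containing <digits>Users, keep only that part."""
    return USERS_LINE.sub(r"\1", text)
```

Lines without a number followed by Users, such as Bolt or 1997, do not match and are left unchanged.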

Pubg Mobile

Asked: 2023-09-06 00:44:35 +0800 CST

Moving the last line to the first line

1

I have a large number of txt files in the directory E:\Desktop\Social_media\edit8\New folder (2), and each file is laid out as follows:

Bolt;539,110
Classmates;263,454
PlanetAll;126,907
theGlobe;73,063
SixDegrees;64,065
JANUARY 1997

Now I want to move the last line to the first line, like this:

JANUARY 1997
Bolt;539,110
Classmates;263,454
PlanetAll;126,907
theGlobe;73,063
SixDegrees;64,065

I wrote the following Python script for this:

import os

directory = r'E:\Desktop\Social_media\edit8\New folder (2)'  # Replace with the directory path containing your text files

# Get a list of all text files in the directory
files = [file for file in os.listdir(directory) if file.endswith('.txt')]

# Process each file
for file in files:
    file_path = os.path.join(directory, file)
    
    # Read the file content
    with open(file_path, 'r') as f:
        lines = f.readlines()
    
    # Extract the last line and strip the newline character
    last_line = lines.pop().strip()
    
    # Insert the last line at the beginning
    lines.insert(0, last_line)
    
    # Write the modified content back to the file
    with open(file_path, 'w') as f:
        f.writelines(lines)

My script runs, but I don't know why it puts the last line at the start of the first line instead of on a line of its own, as shown below:

JANUARY 1997Bolt;539,110
Classmates;263,454
PlanetAll;126,907
theGlobe;73,063
SixDegrees;64,065

Where is the problem in my script, and how can I fix it?
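
The merged first line appears because strip() removes the newline from the last line, and the stripped text is then inserted without a line ending, so writelines() writes it directly in front of the old first line. A minimal sketch of one fix, operating on the list returned by readlines(); the function name is hypothetical.

```python
def move_last_line_first(lines):
    """Move the last line to the front, keeping one newline per line."""
    if len(lines) < 2:
        return list(lines)
    # The old second-to-last line may lack a newline if the file had none at EOF.
    body = [ln if ln.endswith("\n") else ln + "\n" for ln in lines[:-1]]
    last = lines[-1].rstrip("\r\n")
    return [last + "\n"] + body
```

In the script above, the result of this function would be passed to f.writelines() in place of the modified lines list.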

Pubg Mobile

Asked: 2023-09-06 00:09:57 +0800 CST

Merging the last 3 lines of each text file in a specific order

6

I have a large number of txt files in the directory E:\Desktop\Social_media\edit8\New folder, and each file is laid out similarly to the following:

Bolt
2,739,393
Classmates
1,267,092
SixDegrees
1,077,353
PlanetAll
552,488
theGlobe
437,847
OpenDiary
9,251
1998
MARCH
034+

Now I want to merge the last 3 lines of each txt file, like this:

Bolt
2,739,393
Classmates
1,267,092
SixDegrees
1,077,353
PlanetAll
552,488
theGlobe
437,847
OpenDiary
9,251
034+ MARCH 1998

That is, the last 3 lines must be arranged as number+ month year.

I wrote the following Python script for this, but I don't know why it doesn't work:

import os

# Define the directory where your text files are located
directory_path = r'E:\Desktop\Social_media\edit8\New folder'

# Function to rearrange the lines and write to a new file
def rearrange_lines(file_path):
    with open(file_path, 'r') as file:
        lines = [line.strip() for line in file.readlines() if line.strip()]  # Read non-empty lines

    # Check if there are at least 3 non-empty lines
    if len(lines) >= 3:
        lines[-1], lines[-2], lines[-3] = lines[-3], lines[-2], lines[-1]  # Rearrange the last 3 lines

        # Create a new file with the rearranged lines
        with open(file_path, 'w') as file:
            file.write('\n'.join(lines))

# Iterate through each file in the directory
for root, dirs, files in os.walk(directory_path):
    for file_name in files:
        if file_name.endswith('.txt'):
            file_path = os.path.join(root, file_name)
            rearrange_lines(file_path)
            print(f'Rearranged lines in {file_name}')

print('Done!')

Where is the problem in my script, and how can I fix it?
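
The script swaps the first and third of the last three lines, but it never joins them, so they are still written as three separate lines. A minimal sketch of one fix replaces the last three entries with a single joined line; the function name is hypothetical.

```python
def merge_last_three(lines):
    """Join the last three non-empty lines as 'number+ month year'."""
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    if len(cleaned) < 3:
        return cleaned
    # In the files described above the order is: year, month, number+.
    year, month, number = cleaned[-3:]
    return cleaned[:-3] + [f"{number} {month} {year}"]
```

The returned list can be written back with '\n'.join(...), as the original script already does.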

Pubg Mobile

Asked: 2023-09-04 01:27:14 +0800 CST

Bookmarking consecutive non-empty lines that start and end with a regex match

7

In the list below, I want to select the consecutive non-empty lines that start and end with a match of the following regex: ^(?!.*\+\s*$).*?(?<!\d)(?<!\d,)(\d{1,3}(?:,\d{3})*)(?!,?\d).*

1,754,085

Bolt

817,653

classmates

cm

623,592

SixDegrees

PlanetAll

361,908

274,553

274,493

1997

SEPTEMBER

021+

In the list above, I only want to select the following lines:

How can I do this with a regex in Notepad++?

Pubg Mobile

Asked: 2023-08-18 03:28:20 +0800 CST

Line break 3 characters after a regex match

5

I have the following list:

Intel(USA)
Pfizer(USA)6
GeneralElectric(USA)43
Alphabet(Google)(USA)

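I can select the last ( in each line with the regex ^.*\K\((?=[^(]*$).
Now I want to insert a line break 3 characters after the regex match.
For example, I want to get the following result: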

Intel(USA
)  
Pfizer(USA
)6  
GeneralElectric(USA
)43  
Alphabet(Google)(USA
)

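How can I do this with a regex, and what changes do I have to make to my regex?
Note that, for certain reasons, I have to build the regex around the last ( and cannot use one based on the last ), and that the regex must work in Notepad++.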
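
The intended transformation can be illustrated in Python: a greedy .* runs up to the last ( on the line, the next three characters are captured along with it, and a newline is appended after them. This is a sketch with the re module and a hypothetical function name, not a tested Notepad++ expression.

```python
import re

# Greedy ".*" backtracks to the last "(" that is followed by 3 characters.
AFTER_LAST_PAREN = re.compile(r"^(.*\(.{3})", re.MULTILINE)


def break_after_last_paren(text):
    """Insert a newline 3 characters after the last '(' of each line."""
    return AFTER_LAST_PAREN.sub(r"\1\n", text)
```

For Alphabet(Google)(USA) the split happens after the second (USA, since that is the last opening parenthesis on the line.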
