Questions tagged [selenium-webdriver] - Page 1

Moran Hanane

Asked: 2025-02-07 17:15:56 +0800 CST

How can I scrape the data in the "Detailed information" section of this URL: https://gallica.bnf.fr/ark:/12148/cb42768809f/date?

6

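Hi everyone, I'm new to web scraping.

I am trying to scrape the data in the "Detailed information" section of this web page (https://gallica.bnf.fr/ark:/12148/cb42768809f/date) so that I can populate an SQL database with each of its fields.

This is a test URL. I requested a list of 500 similar URLs through the site's API, and I plan to apply my Python function to every URL in that list once it works.

Do you have any suggestions to help me extract the information I need from this page? Thanks a lot!

First, I tried BeautifulSoup, but the problem is that the "Detailed information" section only appears after clicking a dropdown button.

I tried several BeautifulSoup snippets, such as the one below, but none of them worked: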

import requests
from bs4 import BeautifulSoup

def get_metadata_bs4(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    try:
        # Page title, with a fallback when there is no <h1>
        title = soup.find("h1").text.strip() if soup.find("h1") else "Titre inconnu"

        # First <dd> of the definition list, read here as the author
        author = soup.select_one("dl dd:nth-of-type(1)").text.strip() if soup.select_one("dl dd:nth-of-type(1)") else "Auteur inconnu"

        # Second <dd> of the definition list, read here as the publication date
        date_of_publication = soup.select_one("dl dd:nth-of-type(2)").text.strip() if soup.select_one("dl dd:nth-of-type(2)") else "Date inconnue"

        return {"title": title, "author": author, "Date of publication": date_of_publication}

    except Exception as e:
        print(f"Erreur pour {url}: {e}")
        return None

# Tester avec un seul lien
url_test = "https://gallica.bnf.fr/ark:/12148/cb42768809f/date"
print(get_metadata_bs4(url_test))

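So I tried Selenium, but this is my first time using this Python library... I tried to find the correct XPath in the page source and to replace "metadata-class" with that XPath in the following code block: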

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Configuration Selenium
chrome_options = Options()
chrome_options.add_argument("--headless")  # Mode sans interface graphique
driver = webdriver.Chrome(options=chrome_options)

def get_metadata_from_notice(url):
    driver.get(url)
    time.sleep(2)  # Laisser le temps de charger
    
    try:
        # Cliquer sur le dropdown "Informations détaillées"
        dropdown = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.XPATH, "//div[contains(text(), 'Informations détaillées')]"))
        )
        dropdown.click()
        time.sleep(2)  # Attendre le chargement après le clic
    except Exception as e:
        print(f"⚠️ Erreur lors du clic sur {url} : {e}")
        return None

    try:
        # Extraction des métadonnées après ouverture du dropdown
        metadata_section = driver.find_element(By.XPATH, "//div[@class='metadata-class']")  # À remplacer par la bonne classe
        metadata_text = metadata_section.text
        return {"url": url, "metadata": metadata_text}
    except Exception as e:
        print(f"⚠️ Impossible de récupérer les métadonnées pour {url} : {e}")
        return None

# Test sur une URL
test_url = "https://gallica.bnf.fr/ark:/12148/cb42768809f/date"
print(get_metadata_from_notice(test_url))

# Fermer Selenium
driver.quit()

But it keeps giving me results like this:

⚠️ Impossible de récupérer les métadonnées pour https://gallica.bnf.fr/ark:/12148/cb42768809f/date
⚠️ Erreur sur https://gallica.bnf.fr/ark:/12148/cb452698066/date : Message: 
Stacktrace:
    GetHandleVerifier [0x00007FF7940A02F5+28725]
    (No symbol) [0x00007FF794002AE0]
    (No symbol) [0x00007FF793E9510A]
    (No symbol) [0x00007FF793EE93D2]
    (No symbol) [0x00007FF793EE95FC]
    (No symbol) [0x00007FF793F33407]
    (No symbol) [0x00007FF793F0FFEF]
    (No symbol) [0x00007FF793F30181]
    (No symbol) [0x00007FF793F0FD53]
    (No symbol) [0x00007FF793EDA0E3]
    (No symbol) [0x00007FF793EDB471]
    GetHandleVerifier [0x00007FF7943CF30D+3366989]
    GetHandleVerifier [0x00007FF7943E12F0+3440688]
    GetHandleVerifier [0x00007FF7943D78FD+3401277]
    GetHandleVerifier [0x00007FF79416AAAB+858091]
    (No symbol) [0x00007FF79400E74F]
    (No symbol) [0x00007FF79400A304]
    (No symbol) [0x00007FF79400A49D]
    (No symbol) [0x00007FF793FF8B69]
    BaseThreadInitThunk [0x00007FFC0A7D259D+29]
    RtlUserThreadStart [0x00007FFC0BA0AF38+40]
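
For the extraction step, once the "Informations détaillées" panel has been opened, the rendered markup (for example `driver.page_source`) can be handed to a parser and turned into a dictionary of fields, which maps naturally onto SQL columns. The sketch below assumes the panel is a definition list of `<dt>`/`<dd>` pairs. The `dl dd` selectors in the BeautifulSoup attempt suggest this, but it has not been checked against the live page. The class `DefinitionListParser`, the function `parse_definition_list`, and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Collect <dt>/<dd> pairs into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None     # "dt" or "dd" while inside one, otherwise None
        self._label = None   # text of the most recent <dt>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing tag of a nested element such as <a>
        text = " ".join("".join(self._buffer).split())
        if tag == "dt":
            self._label = text.rstrip(" :")
        elif self._label is not None:
            # If a label has several <dd> values, the last one wins.
            self.fields[self._label] = text
        self._tag = None


def parse_definition_list(html):
    parser = DefinitionListParser()
    parser.feed(html)
    return parser.fields


# Made-up markup, not copied from the Gallica page.
sample = """
<dl>
  <dt>Title :</dt><dd>Sample title</dd>
  <dt>Publisher :</dt><dd><a href="#">Sample publisher</a> (Paris)</dd>
</dl>
"""
print(parse_definition_list(sample))
```

With Selenium, the same function could be called as `parse_definition_list(driver.page_source)` after the dropdown click, provided the assumption about the markup holds.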

sudoExclamationExclamation

Asked: 2025-02-04 01:26:14 +0800 CST

How do I set the window size in SeleniumBase in CDP Mode?

6

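I am using SeleniumBase in CDP Mode.

I would like to know how to set the window size in CDP Mode after the website has loaded.

If I use the non-CDP function sb.set_window_size(x,y), the page at https://bot-detector.rebrowser.net flags it as a bot with devtools open (runtimeEnableLeak):

I tried sb.cdp.set_window_size(x,y), but that function does not seem to exist, because it crashes: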

sb.cdp.set_window_size(x,y)
AttributeError: 'types.SimpleNamespace' object has no attribute 'set_window_size'

Edit: I was able to find a workaround using sb.cdp.set_window_rect:

[screenwidth,screenheight,innerwidth,innerheight,scrollwidth,scrollheight] = sb.cdp.evaluate("return [window.screen.width, window.screen.height, window.innerWidth, window.innerHeight, document.documentElement.scrollWidth, document.documentElement.scrollHeight];")

print(f"Size: {screenwidth}, {screenheight}, {innerwidth}, {innerheight}, {scrollwidth}, {scrollheight}")

sb.cdp.set_window_rect(0,0,scrollwidth + screenwidth - innerwidth, scrollheight + screenheight - innerheight + 100)

print(f"Sleeping for some time...")
sb.sleep(random.randint(5, 8))

Is there a better way?
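
The arithmetic in the workaround can be isolated into a small helper. The sketch below is an alternative formulation, not the asker's formula: it measures the browser chrome as the difference between `window.outerWidth`/`window.outerHeight` and `window.innerWidth`/`window.innerHeight` (standard DOM properties) and adds that to the viewport size that is wanted. The function name `outer_size_for_viewport` is hypothetical, and the outer dimensions may not be meaningful in every browser mode, so treat this as an assumption to verify.

```python
def outer_size_for_viewport(viewport_w, viewport_h, outer_w, outer_h, inner_w, inner_h):
    """Return the outer window size that should yield the requested viewport size."""
    chrome_w = outer_w - inner_w  # window borders
    chrome_h = outer_h - inner_h  # tab strip, address bar, borders
    return viewport_w + chrome_w, viewport_h + chrome_h


# Example with made-up measurements: a 1200x900 window showing a 1184x761 viewport.
width, height = outer_size_for_viewport(1280, 720, 1200, 900, 1184, 761)
print(width, height)
```

The resulting pair could then be passed as the last two arguments of the `sb.cdp.set_window_rect` call used in the workaround above.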

RobM

Asked: 2024-12-09 01:51:40 +0800 CST

Selenium XPath expression to select the first occurrence of a button in a specific table

6

I have a web page with multiple tables, all of which have TRs with similar TDs.

This is the button in the ObservationStations table:

    <input type="button" class="gwf-round-button" value="^" onclick="InsertRowBefore('ObservationStations', this, 'TextBox')" title="Add row before">

This is the one in the MeteorologicalVariables table:

    <input type="button" class="gwf-round-button" value="^" onclick="InsertRowBefore('MeteorologicalVariables', this, 'TextBox')" title="Add row before">

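What I want to do is select the first occurrence of the button in the MeteorologicalVariables table and click it. The only thing that distinguishes the buttons of one table from those of another is the onclick attribute, so how can I access the first button in the MeteorologicalVariables table?

This is what I have so far: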

edit_button = driver.find_element("xpath", '//button[text()="^" and @class="gwf-round-button"][1]')
edit_button.click()

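This is what the MeteorologicalVariables table looks like (the highlighted button is the one I want to click):

Any help with this would be appreciated.

This is the HTML of the table in question, where I want to click the first occurrence of ^: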

        <table id="MeteorologicalVariables" class="gwf_variable_table">
   <colgroup>
      <col>
      <col style="min-width:150px">
      <col style="min-width:200px">
      <col style="min-width:200px">
      <col style="min-width:200px">
      <col style="min-width:220px">
      <col style="min-width:220px">
      <col style="min-width:275px">
   </colgroup>
   <tbody>
      <tr style="display: table-row;">
         <td class="gwf_variable_table_control_column"></td>
         <td style="max-width:150px" class="fmc-table-label-font fmc-tb-height">Variable</td>
         <td style="max-width:200px" class="fmc-table-label-font fmc-tb-height">Station Name</td>
         <td style="max-width:200px" class="fmc-table-label-font fmc-tb-height">Sensor(s)</td>
         <td style="max-width:200px" class="fmc-table-label-font fmc-tb-height">Height / Depth (m)</td>
         <td style="max-width:220px" class="fmc-table-label-font fmc-tb-height">Record Period</td>
         <td style="max-width:220px" class="fmc-table-label-font fmc-tb-height">Measurement Frequency</td>
         <td style="max-width:275px" class="fmc-table-label-font fmc-tb-height">Notes / Details</td>
      </tr>
      <tr>
         <td class="gwf_variable_table_control_column">
            <div class="gwf_tooltip"><input type="button" class="gwf-round-button" value="^" onclick="InsertRowBefore('MeteorologicalVariables', this, 'TextBox')" title="Add row before"></div> 
            <div class="gwf_tooltip"><input type="button" class="gwf-round-button" value="X" onclick="DeleteRow('MeteorologicalVariables', this)" title="Delete current row"></div>
            <div class="gwf_tooltip"><input type="button" class="gwf-round-button" value="v" onclick="InsertRowAfter('MeteorologicalVariables', this, 'TextBox')" title="Add row after"></div>
         </td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><textarea class="fmc-tb-appearance fmc-tb-font" style="height:60px;min-width:200px;width:200px;max-width:200px;"></textarea></td>
         <td><textarea class="fmc-tb-appearance fmc-tb-font" style="height:60px;min-width:200px;width:200px;max-width:200px;"></textarea></td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><textarea class="fmc-tb-appearance fmc-tb-font" style="height:60px;min-width:275px;width:275px;max-width:275px;"></textarea></td>
      </tr>
      <tr>
         <td class="gwf_variable_table_control_column">
            <div class="gwf_tooltip"><input type="button" class="gwf-round-button" value="^" onclick="InsertRowBefore('MeteorologicalVariables', this, 'TextBox')" title="Add row before"></div>
            <div class="gwf_tooltip"><input type="button" class="gwf-round-button" value="X" onclick="DeleteRow('MeteorologicalVariables', this)" title="Delete current row"></div>
            <div class="gwf_tooltip"><input type="button" class="gwf-round-button" value="v" onclick="InsertRowAfter('MeteorologicalVariables', this, 'TextBox')" title="Add row after"></div>
         </td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><textarea class="fmc-tb-appearance fmc-tb-font" style="height:60px;min-width:200px;width:200px;max-width:200px;"></textarea></td>
         <td><textarea class="fmc-tb-appearance fmc-tb-font" style="height:60px;min-width:200px;width:200px;max-width:200px;"></textarea></td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><input type="text" class="fmc-tb-appearance fmc-tb-font fmc-tb-height" style="width:100%"></td>
         <td><textarea class="fmc-tb-appearance fmc-tb-font" style="height:60px;min-width:275px;width:275px;max-width:275px;"></textarea></td>
      </tr>
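
Going by the HTML above, the buttons are `<input type="button">` elements whose caret is stored in the `value` attribute, so a `//button[text()="^"]` expression has nothing to match. Since the table carries `id="MeteorologicalVariables"`, one option is to scope the search to that table and take the first match in document order. The sketch below is one possible formulation; the helper names `first_button_xpath` and `click_first_button` are hypothetical. The parentheses matter: without them, `[1]` would select the first matching input within each parent rather than the first one overall.

```python
def first_button_xpath(table_id, value="^"):
    """Build an XPath for the first button with the given value inside one table."""
    return (
        f'(//table[@id="{table_id}"]'
        f'//input[@type="button" and @value="{value}"])[1]'
    )


def click_first_button(driver, table_id, value="^"):
    # "xpath" is the locator strategy string, as in the attempt above.
    driver.find_element("xpath", first_button_xpath(table_id, value)).click()


print(first_button_xpath("MeteorologicalVariables"))
```

With a live session this would be called as `click_first_button(driver, "MeteorologicalVariables")`.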

antimus è.é

Asked: 2024-11-21 18:50:51 +0800 CST

Text extraction problem in Selenium

5

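I am having trouble extracting text inside a loop. On the first iteration the text is extracted, but from the second iteration on, the field of interest comes back empty.
The problem is with the answer field; the question field is extracted every time.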

driver = web_driver()
driver.get('https://www.medicitalia.it/consulti/?tag=cefalea')

data = []
# 2. Loop per navigare tra le pagine
for i in range(20):
    for urls in url:

        url = urls.get_attribute("href")
        print("URL QA: {}".format(url))
        print()

        curr_driver = web_driver()
        curr_driver.get(url)

        # 4. Apertura URL domande e estrazione dati:
        WebDriverWait(curr_driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="question"]'))
        )
        

        #5. Extracting and Storing Data:
        try:
            question = curr_driver.find_element(By.XPATH, "//h1[contains(@class, 'consulto-h1') and contains(@class, 'px-2')]").text
        except:
            question = None


        try:
            answer = curr_driver.find_element(By.XPATH, "//div[contains(@class, 'col cons px-4 pt-4 pb-0')]").text
        except:
            answer = None

        # 6. Closing WebDriver Instances:
        curr_driver.quit()



# 6. Closing WebDriver Instances:
driver.quit()

<div class="col cons px-4 pt-4 pb-0">
                                    "Cosa posso fare?"<br>
<br>
Una visita gnatologica!<br>
<br>
E' assolutamente verosimile che i sintomi che lei accusa derivano dalla posizione e dal movimento della mandibola.<br>
I condili della mandibola si attaccano al cranio proprio vicino all'orecchio, ed ecco perchè vi è confusione dei sintomi attribuendoli falsamente all'apparato uditivo.<br>
<br>
La tensione muscolare spiega invece mal di testa e cervicale, contribuendo inoltre ad aumentare gli acufeni.<p class="my-2 text-right small"><i><a href="https://www.medicitalia.it/specialita/gnatologia-clinica/?qurl=http%3A%2F%2Fwww.studioformentelli.it">www.studioformentelli.it</a>  <br>
Attività prevalente: Gnatologia e<br>
Implantologia (scuola italiana)</i></p>
                                    </div>

I would like to understand why the answer is still empty from the second iteration of the for loop onward.
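
One possible cause worth ruling out, though it is not confirmed for this site: Selenium's `.text` returns only text that is currently rendered, so an element that exists in the DOM but is hidden or not yet displayed yields an empty string. Reading the `textContent` property through `get_attribute` returns the text regardless of visibility. A minimal sketch of such a fallback follows; the helper name `read_text` is hypothetical.

```python
def read_text(element):
    """Return the element's visible text, falling back to its textContent."""
    text = (element.text or "").strip()
    if not text:
        # textContent also includes text that is not currently rendered.
        text = (element.get_attribute("textContent") or "").strip()
    return text
```

In the loop above it would replace the direct `.text` access, for example `answer = read_text(curr_driver.find_element(By.XPATH, ...))`.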

Raha

Asked: 2024-09-04 21:58:05 +0800 CST

Web scraping with Selenium: unable to paginate

5

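I am trying to scrape this web page, https://mst.dk/publikationer, which has pagination. Looking at the source, the pagination seems to happen in the part I have added below.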

<div class="Container_Container__G5vVd Container_Container___width_std__y2_Pn">
    <div class="Pagination_Pagination_wrapper__kp62j">
        <ul class="Pagination_Pagination__UOZ60" role="navigation" aria-label="Pagination">
            <li class="Pagination_Pagination_prev__zIUqn Pagination_Pagination_item___disabled__g5CaR">
                <a class="Pagination_Pagination_link__Z2LW0 Pagination_Pagination_prevLink__HDKS4" tabindex="-1" role="button" aria-disabled="true" aria-label="Previous page" rel="prev"></a>
            </li>
            <li class="Pagination_Pagination_item__suqyV selected">
                <a rel="canonical" role="button" class="Pagination_Pagination_link__Z2LW0 Pagination_Pagination_link___active__to_Os" tabindex="-1" aria-label="Side 1" aria-current="page">1</a>
            </li>
            <li class="Pagination_Pagination_item__suqyV">
                <a role="button" class="Pagination_Pagination_link__Z2LW0" tabindex="0" aria-label="Side 2" rel="next">2</a>
            </li>
            <li class="Pagination_Pagination_break__dKVzB">
                <a class="Pagination_Pagination_breakLink__jB8Rd" role="button" tabindex="0">...</a>
            </li>
            <li class="Pagination_Pagination_item__suqyV">
                <a role="button" class="Pagination_Pagination_link__Z2LW0" tabindex="0" aria-label="Side 321">321</a>
            </li>
            <li class="Pagination_Pagination_next__N6tkt">
                <a class="Pagination_Pagination_link__Z2LW0 Pagination_Pagination_nextLink__mytrA" tabindex="0" role="button" aria-disabled="false" aria-label="Next page" rel="next"></a>
            </li>
        </ul>
    </div>

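I have tried several approaches, including adding page=x to the URL, using different Selenium locators and selectors, increasing the wait time, using the next button, and simulating a click on the list items. Nothing seems to work for me. Can someone help me understand how this page works and how to paginate through it? What I want to do is open every link on every page, find the PDF, and download it. For the first page, this works fine with the following code: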

def parse_epa_filtered_keywords():
    # Get number of search results
    page_no = int(int(get_number_of_results(link_filtered)) / 10) + 1
    driver = webdriver.Chrome(options=options)
    search_query = '+'.join(keywords.split())
    
    for i in tqdm(range(1, page_no + 1)):
        try:
            search_url = f"{link_filtered}?search={search_query}&page={i}"
            print(f"Fetching URL: {search_url}")
            
            # Load the search URL
            driver.get(search_url)
            
            # Wait for the page to load completely
            time.sleep(5)  # Adjust the sleep time as needed
            
            # Wait for the main page to load again
            publications = driver.find_elements(By.CSS_SELECTOR, 'a[class^="Link_Link__lzynb SearchResultItem_SearchResult"]')
            ....
driver.quit()

Obviously, this was an attempt at using the page parameter; it just opens the first page over and over. I then tried the following:

next_button = driver.find_element(By.XPATH, "//li[contains(@class, 'Pagination_Pagination_next')]/a[@rel='next']")

or

next_button = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "li.Pagination_Pagination_next_N6tkt a")))

I made more attempts with different elements, which led either to generic ChromeDriver errors or to errors like this one:

An error occurred: Message: element click intercepted: Element is not clickable at point (732, 2911)
  (Session info: chrome=128.0.6613.114)
Stacktrace:
0   chromedriver                        0x0000000104f83998 cxxbridge1$str$ptr + 1887096
1   chromedriver                        0x0000000104f7be00 cxxbridge1$str$ptr + 1855456
2   chromedriver                        0x0000000104b80be0 cxxbridge1$string$len + 89508
3   chromedriver                        0x0000000104bca6fc cxxbridge1$string$len + 391360
4   chromedriver                        0x0000000104bc8d28 cxxbridge1$string$len + 384748
5   chromedriver
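
An "element click intercepted" error generally means another element covers the target at the click coordinates. A common way around it is to scroll the element into the middle of the viewport and dispatch the click from JavaScript. The sketch below shows that technique with hypothetical helper names (`click_via_script`, `is_disabled`). It also assumes that the "Next page" link gets `aria-disabled="true"` on the last page, as the "Previous page" link does in the markup above. Note also that the class in the markup is `Pagination_Pagination_next__N6tkt` (double underscore), whereas the CSS selector in the second attempt has a single underscore before `N6tkt`.

```python
def click_via_script(driver, element):
    """Scroll the element to the middle of the viewport, then click it from JavaScript."""
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    driver.execute_script("arguments[0].click();", element)


def is_disabled(link):
    """True when the pagination link reports aria-disabled="true"."""
    return link.get_attribute("aria-disabled") == "true"
```

A loop could then locate the link with the first XPath shown above, stop when `is_disabled(next_button)` is true, and otherwise call `click_via_script(driver, next_button)` before waiting for the new results.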

Jason Fan

Asked: 2024-08-30 11:39:01 +0800 CST

Selenium WebDriver fails to navigate to a URL in the body of a new tab

5

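I am having a problem with a Selenium WebDriver project in VB.NET. I am trying to open a new tab in Edge and navigate to a specific URL, but the tab stays blank. Here is the relevant code snippet: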

Imports OpenQA.Selenium
Imports OpenQA.Selenium.Edge
Imports OpenQA.Selenium.Support.UI
Imports System.Threading

Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        ' Setup EdgeOptions
        Dim options As New EdgeOptions()

        options.AddArgument("--user-data-dir=C:\Users\jasonfan\AppData\Local\Microsoft\Edge\User Data")
        options.AddArgument("--remote-debugging-pipe")
        options.AddArgument("--allow-running-insecure-content")
        options.AddArgument("log-level=3")

        Dim driver As IWebDriver = New EdgeDriver(options)

        Try
            driver.Navigate().GoToUrl("http://www.bing.com")

            ' Open a new tab
            Dim jsExecutor As IJavaScriptExecutor = CType(driver, IJavaScriptExecutor)
            jsExecutor.ExecuteScript("window.open('about:blank', '_blank');")
            Dim newTabHandle As String = driver.WindowHandles.Last
            driver.SwitchTo().Window(newTabHandle)

            ' Navigate to the new URL
            driver.Navigate().GoToUrl("https://www.google.com/")

            MessageBox.Show("Process completed.")

        Catch ex As WebDriverException
            MessageBox.Show("Error: " & ex.Message)
        Catch ex As Exception
            MessageBox.Show("Error: " & ex.Message)
        Finally
            ' Clean up
            driver.Quit()
        End Try
    End Sub

End Class

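I have made sure that both WebDriver and the Edge browser are up to date. Is there some other configuration or setting I might be missing?

Any help or suggestions would be greatly appreciated!

Thank you!

(By the way, I am currently on Windows 11. My Microsoft Edge and WebDriver versions are both 128.0.2739.42.)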

anukalps

Asked: 2024-08-07 18:04:12 +0800 CST

Python script to launch the Chrome browser

4

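I am writing a Python script to launch the Chrome browser and bypass Cloudflare bot detection (CAPTCHA, JavaScript rendering, etc.). I am using the following script, but Chrome does not launch.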

pip install undetected-chromedriver

import undetected_chromedriver as uc

def open_webpage(url):
    # Set up Chrome options
    options = uc.ChromeOptions()

    # Switch Undetected ChromeDriver to Headless Mode
    options.add_argument('--headless')  # Correct argument for headless mode
    options.add_argument('--user-agent=Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36')

    # Proxy settings: Specify your proxy address and port
    proxy_address = "socks://username:password@proxyserver:port"
    options.add_argument(f'--proxy-server={proxy_address}')

    # Create a Chrome browser instance with undetected-chromedriver
    driver = uc.Chrome(options=options)

    # Fetch the current user agent to verify
    current_user_agent = driver.execute_script("return navigator.userAgent;")
    print("Current User Agent:", current_user_agent)

    # Open the specified URL
    driver.get(url)

open_webpage('https://www.indeed.com')

Kamilia Bouras

Asked: 2024-06-27 22:47:49 +0800 CST

Finding an element by XPath with Selenium in Java

6

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;

public class SeleniumBasics {
    
public static void main(String[] args) {  
        
        WebDriver driver= new ChromeDriver();  
        driver.get("https://formy-project.herokuapp.com/");
        driver.manage().window().maximize(); 
        driver.findElement(By.xpath("//body/div[1]/div[1]/li[1]")).click();
        
    }  

}

I am trying to click the "Autocomplete" element on this page, https://formy-project.herokuapp.com/, but it does not work. Any help, please?

Absolute XPath, relative XPath, findElement

Chris Li

Asked: 2024-05-19 20:28:59 +0800 CST

Selenium does not reliably return the located element

6

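I am stuck... after so many hours... and after looking through hundreds of questions and answers here...

I want to scrape data from a bank's product website, for example "Delta" from the following page:

https://wertpapiere.ing.de/Investieren/Derivat/DE000HS2JL06

(The link will stop working on 17 September 2024, because the product expires then.)

delta.text should be -0,0193

First attempt: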

delta = driver.find_element(By.XPATH, '//*[text()=\'Delta\']/following-sibling::td')

This works sometimes, but most of the time it does not. Why? It may be that "Delta" appears 10 times on the site, but in that case:

delta = driver.find_element(By.XPATH, '//*[text()=\'Delta\']/[5]following-sibling::td')

should do the trick, but it does not.

Another attempt:

delta = driver.find_element(By.XPATH, '//td[contains(text(), "Delta")]/following-sibling::td')

should work, but it does not either.

Using the full path should solve it:

delta = driver.find_element(By.XPATH, '/html/body/main/div[2]/div/div[2]/div[1]/sh-derivative-greeks/div/div[1]/div/table/tbody/tr[2]/td[2]')

But the element is not found; I think this is because of dynamic IDs generated by the site.

Does anyone have a decisive hint?

Thanks a lot! Chris

greenbananas2

Asked: 2024-03-27 23:37:20 +0800 CST

Selenium does not find a specific element by class name, CSS selector, or XPath

5

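I asked this question last month but have not found a solution yet, so I decided to switch to Selenium to see whether it helps.

I am trying to scrape the site, but because of a problem with the "Next" button I can only scrape the first page: no matter what I use as a locator, the button cannot be found. I am trying to check whether the class name of the "Next" button contains the string "disabled", so that my script knows it has not reached the last page yet and can keep scraping. The button cannot be found with find_elements or as a single element. This happens with every element in the pagination area, so I cannot even work around it. I get an empty string, an empty list, or a "no such element" exception.

Here is a minimal reproducible example: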

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.implicitly_wait(0.5)

driver.get("https://ksa.carswitch.com/en/saudi/used-cars/search?page=1")

next_page_buttons = driver.find_elements(By.CSS_SELECTOR, ".page-number--next")
next_button = next_page_buttons[0]
if next_button.text == "":
    print("The element is empty.")

Some examples of what I have tried:

#next_page_button = self.driver.find_element(By.CSS_SELECTOR, ".page-number--next")

#next_page_button = self.driver.find_element(By.CLASS_NAME, "page-number page-number--next page-number--icon") 

#page_links = self.driver.find_elements(By.CSS_SELECTOR, ".page-number")
#lastpagelink = page_links[-1]
#last_page = int(lastpagelink.text) #trying to see if it returns last page number to loop through instead. Returns empty string.

#wait = WebDriverWait(self.driver, 100)
#pagination_links = [link.text for link in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '.page-number--next')))] #Raised Timeout Exception
#pagination_links = [link.text for link in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@class='page-number page-number--next page-number--icon']")))] #Timeout Exception again

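...and every other combination. Could this be related to a style tag? I do not see anything that would affect the element's visibility.

I would greatly appreciate any help. Thank you.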
