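I have been following this tutorial to scrape data from a URL, because it matches my needs very closely (3 divs deep). Unfortunately, StackOverflow no longer supports IE, so I can't run the tutorial's code to check whether it works as-is. In my case I can't use Chrome plugins, and I have to authenticate to the website before I can navigate to the URL. I also tried the solutions to question 15191847, specifically gembird's, and it gave me the same error.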
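When I run the program below, I get "Run-time error '91'" (Object variable or With block variable not set). I printed ie.Document to a text file and verified that the div ids I'm searching for are correct and that they were captured. The error keeps occurring on the line Set Questions = QuestionList.Children. Any ideas why it's giving me this error?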
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim QuestionList As IHTMLElement, QuestionField As IHTMLElement
Dim Questions As IHTMLElementCollection, QuestionFieldLinks As IHTMLElementCollection, QuestionFields As IHTMLElementCollection
Dim Question As IHTMLElement
Dim RowNumber As Long
Dim votes As String, url As String, views As String, QuestionId As String
url = "<<my url>>"
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate url
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
    Application.StatusBar = "Trying to go to " & url
    DoEvents
Loop
Cells.Clear
'get the HTML document that was loaded
Set html = ie.Document
'release the IE object reference (this does not close the window) and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'put heading across the top of row 3
Range("A3").Value = "Field"
Range("B3").Value = "Values"
Set QuestionList = html.getElementByID("fieldgroup ")
Set Questions = QuestionList.Children
RowNumber = 4
For Each Question In Questions
    If Question.className = "fieldrow _text-field" Then
        'get a list of all of the parts of this row, and loop over them
        Set QuestionFields = Question.all
        For Each QuestionField In QuestionFields
            'if this is the field label, store it in column A (trimming surrounding spaces)
            If QuestionField.className = "fieldlabel" Then
                Cells(RowNumber, 1).Value = Trim(QuestionField.innerText)
            End If
            'likewise for the field value, stored in column B
            If QuestionField.className = "fieldvalue" Then
                Cells(RowNumber, 2).Value = Trim(QuestionField.innerText)
            End If
        Next QuestionField
        'go on to next row of worksheet
        RowNumber = RowNumber + 1
    End If
Next
Set html = Nothing
The HTML output looks like this:
<div class="fieldgroup " style="" group-title="">
<div class="fieldrow _text-field">
<div class="fieldlabel">Reporting</div>
<div class="fieldvalue">Yes</div>
</div>
<div class="fieldrow _text-field">
<div class="fieldlabel">Annotate ''Yes''</div>
<div class="fieldvalue">Yes</div>
</div>
...
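You are confusing id and class, which are different things; see, for example, What is the difference between id and class?. Your elements have a class attribute, not an id. Because no element has the id "fieldgroup ", getElementByID returns Nothing, and reading .Children from Nothing is what raises error 91. To search for elements with a specific class attribute, use the function getElementsByClassName. Note that this is a "plural" function: it returns all elements that have that class attribute. Even if it finds only one element, it returns a data structure that can hold any number of elements, and you need an index to access one of them. If I remember correctly, in VBA this collection is 0-based. If you are sure that an element will always be found, you can apply the index (0) directly to the result of the call.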
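A minimal sketch of indexing the result directly, assuming a reference to the Microsoft HTML Object Library and that html is the document obtained from ie.Document as in the question. The Sub name ListFieldGroupChildren is hypothetical; it only reports how many child rows were found.

```vba
' Requires Tools > References > "Microsoft HTML Object Library".
' Hypothetical helper: html is the document taken from ie.Document.
Sub ListFieldGroupChildren(ByVal html As HTMLDocument)
    Dim QuestionList As IHTMLElement
    Dim Questions As IHTMLElementCollection

    ' getElementsByClassName returns a collection; (0) takes its first item.
    ' Class names are matched as whitespace-separated tokens, so the trailing
    ' space in class="fieldgroup " does not matter.
    ' If no element has that class, a run-time error still occurs.
    Set QuestionList = html.getElementsByClassName("fieldgroup")(0)
    Set Questions = QuestionList.Children

    Debug.Print Questions.Length & " child rows found"
End Sub
```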
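Alternatively (but in that case QuestionList needs a different declaration), you can keep the whole collection in QuestionList, declared as IHTMLElementCollection, and apply the index only when you read the children.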
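A sketch of that alternative under the same assumptions (Microsoft HTML Object Library reference, html taken from ie.Document, hypothetical Sub name). The only change is the type of QuestionList and where the (0) index is applied.

```vba
' Requires Tools > References > "Microsoft HTML Object Library".
Sub ListFieldGroupChildrenFromCollection(ByVal html As HTMLDocument)
    ' QuestionList now holds the whole collection, so it is declared
    ' as IHTMLElementCollection instead of IHTMLElement.
    Dim QuestionList As IHTMLElementCollection
    Dim Questions As IHTMLElementCollection

    Set QuestionList = html.getElementsByClassName("fieldgroup")
    ' Index the 0-based collection only when reading the children.
    Set Questions = QuestionList(0).Children

    Debug.Print Questions.Length & " child rows found"
End Sub
```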
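I left out error checking, so if the HTML does not contain any element with that class name, you will still get a run-time error. To write robust code, you should add that check.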
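One way to add that check, sketched under the same assumptions (Microsoft HTML Object Library reference, html taken from ie.Document, hypothetical Sub name): test the collection's Length before indexing it, then run the question's row loop, writing to the active sheet starting at row 4.

```vba
' Requires Tools > References > "Microsoft HTML Object Library".
Sub WriteFieldRowsSafely(ByVal html As HTMLDocument)
    Dim Groups As IHTMLElementCollection
    Dim Questions As IHTMLElementCollection
    Dim Question As IHTMLElement, QuestionField As IHTMLElement
    Dim RowNumber As Long

    Set Groups = html.getElementsByClassName("fieldgroup")
    ' Guard against a page with no matching element before indexing.
    If Groups.Length = 0 Then
        MsgBox "No element with class ""fieldgroup"" was found."
        Exit Sub
    End If

    Set Questions = Groups(0).Children
    RowNumber = 4
    For Each Question In Questions
        If Question.className = "fieldrow _text-field" Then
            ' Label goes to column A, value to column B.
            For Each QuestionField In Question.all
                Select Case QuestionField.className
                    Case "fieldlabel"
                        Cells(RowNumber, 1).Value = Trim(QuestionField.innerText)
                    Case "fieldvalue"
                        Cells(RowNumber, 2).Value = Trim(QuestionField.innerText)
                End Select
            Next QuestionField
            RowNumber = RowNumber + 1
        End If
    Next Question
End Sub
```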