html lists - Strip multiple li elements from google with VBA -


my goal strip google search results page 1 vba excel. until far managed strip first result. head, link , date stored in cells 4, 5, 6. have make loop other li's, can't straight. function stores date isn't optimal coded think. knows answer?

sub xmlhttp()  dim url string, lastrow long dim xmlhttp object, html object, objresultdiv object, objh3 object, link object, objdatum object, ddatum object dim start_time date dim end_time date  lastrow = range("a" & rows.count).end(xlup).row  dim cookie string dim result_cookie string  start_time = time debug.print "start_time:" & start_time  = 2 lastrow      url = "https://www.google.co.in/search?q=" & cells(i, 3) & "skipr" & "&rnd=" & worksheetfunction.randbetween(1, 10000)      set xmlhttp = createobject("msxml2.serverxmlhttp")     xmlhttp.open "get", url, false     xmlhttp.setrequestheader "content-type", "text/xml"     xmlhttp.setrequestheader "user-agent", "mozilla/5.0 (windows nt 6.1; rv:25.0) gecko/20100101 firefox/25.0"     xmlhttp.send          set html = createobject("htmlfile")     html.body.innerhtml = xmlhttp.responsetext     set objresultdiv = html.getelementbyid("rso")      set objh3 = objresultdiv.getelementsbytagname("h3")(0)     set link = objh3.getelementsbytagname("a")(0)     set objdatum = objresultdiv.getelementsbytagname("span")(2)      str_text = replace(link.innerhtml, "<em>", "")     str_text = replace(str_text, "</em>", "")      dat_text = objdatum.innerhtml      cells(i, 4) = str_text     cells(i, 5) = link.href     cells(i, 6) = dat_text      doevents next  end_time = time debug.print "end_time:" & end_time  debug.print "done" & "time taken : " & datediff("n", start_time, end_time) msgbox "done" & "time taken : " & datediff("n", start_time, end_time)  end sub 

you need iterate collections returned getelementsbytagname call instead of returning first element array index (0)

i had similar project, below tips & approach reference, might assist in working & maintaining code in future:

first, instead of using createobject prefer reference object libraries expose com objects, gives me ability browse functions , properties of each object f2 , gives me code completion (speed & less bugs) within vba editor (f7 takes code view).

add reference dialog

giving me documentation , code completion: code completion

also, use these const clarity

'see ready state : https://msdn.microsoft.com/en-us/library/ie/ms534361(v=vs.85).aspx const readystate_uninitialized = 0 const readystate_loading = 1 const readystate_loaded = 2 const readystate_interactive = 3 const readystate_complete = 4 

finally, using domdocument60 parse xml document object model in memory.

and mshtml.htmldocument parse html document , iterate table rows.

below code iterate returned rows table within html document embedded in initial xml document returned webserver:

dim xmldoc domdocument60 set xmldoc = getxmldocument("http://www.nbg.ge/rss.php")  'extract publication date debug.print xmldoc.getelementsbytagname("pubdate")(0).text  'unwrap html document cdata in "//item/description" element dim htmldoc new mshtml.htmldocument htmldoc.body.innerhtml = xmldoc.selectnodes("//item/description")(0).text  'extract table data html document dim tr ihtmlelement, td ihtmlelement each tr in htmldoc.getelementsbytagname("tr")     each td in tr.children         'each cell in current row         debug.print "  " & td.innerhtml     next td     'next row     debug.print "-----" next tr 

sample data returned webservice calling:

<rss version="2.0"> <channel> <title>rss nbg currency rates</title> <link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link> <description>currency rates</description> <language>geo</language> <copyright>copyright 2015, nbg</copyright> <pubdate>wed, 29 apr 2015 12:39:50 +0400</pubdate> <lastbuilddate>wed, 29 apr 2015 12:39:50 +0400</lastbuilddate> <managingeditor>alex@proservice.ge</managingeditor> <webmaster>alex@proservice.ge</webmaster> <item> <title>currency rates 2015-04-29</title> <link>https://www.nbg.gov.ge/index.php?m=236&lang=geo</link> <description> <![cdata[ <table border="0">     <tr>          <td>aed</td>          <td>10 არაბეთის გაერთიანებული საამიროების დირჰამი</td>          <td>6.2858</td>          <td><img src="https://www.nbg.gov.ge/images/green.gif"></td>          <td>0.0640</td> </tr><tr> <td>amd</td> <td>1000 სომხური დრამი</td>          <td>4.8676</td>          <td><img src="https://www.nbg.gov.ge/images/green.gif"></td>          <td>0.0414</td>      </tr>    </table> ]]> </description> <pubdate>wed, 29 apr 2015 12:39:50 +0400</pubdate> <guid> https://www.nbg.gov.ge/index.php?m=236&lang=geo&date=2015-04-29 </guid> </item> </channel> </rss> 

and function gets document webserver (only works if added references shown in above pictures)

function getxmldocument(url string) msxml2.domdocument60     dim xhr new xmlhttp60     dim doc new domdocument60     dim msg string      xhr         .open bstrmethod:="get", bstrurl:=url, varasync:=false          on error goto senderror         .send         on error goto 0          'http status codes - http://en.wikipedia.org/wiki/list_of_http_status_codes         '200 = success - ok         if .readystate = readystate_complete , .status = 200             'debug.print .responsetext             doc.loadxml (.responsetext)         else             msg = "error" & vbnewline & "ready state: " & .readystate & _                 vbnewline & "http request status: " & .status             goto error         end if      end      set getxmldocument = doc exit function  senderror:     'by default access data source accross internet dissabled     'go internet options & under security>custom level>misc>access data sources accross domains> enable     'see: http://stackoverflow.com/a/17402920     msgbox "make sure access data sources accross domains enabled under internet options>security>custom", vbokonly, "could not send request server"  error:     msgbox msg, vbokonly, "unexpected error"  end function 

Comments

Popular posts from this blog

php - failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request -

java - How to filter a backspace keyboard input -

java - Show Soft Keyboard when EditText Appears -