Python crawler is not crawling all sites

June 15, 2012

I'm trying to get the links from websites through this code:

    import requests
    from bs4 import BeautifulSoup


    def get_links(max_pages):
        page = 1
        while page <= max_pages:
            # Build the URL for the current listing page.
            address = 'http://hamariweb.com/mobiles/nokia_mobile-phones1.aspx?page=' + str(page)
            source_code = requests.get(address)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text)
            # Print the href of every <a> tag that carries the given class.
            for link in soup.findAll('a', {'class': 'textclass8pt'}):
                href = link.get("href")
                print(href)

            page += 1


    get_links(3)

and it gives the expected output. But when I tried this:

    address = 'http://propakistani.pk/category/cellular/page/' + str(page)

    for link in soup.findAll('a', {'class': 'aa_art_hdng'}):

it shows this error:

    TypeError: getresponse() got an unexpected keyword argument 'buffering'

I tried another site, and this time it showed neither output nor an error. Why doesn't it show proper output for different sites? Is there a problem with my code? Please help me. Thanks.

There is no a tag that matches the condition soup.findAll('a', {'class': 'textclass8pt'}).

Try the following.

Demo:

    import requests
    from bs4 import BeautifulSoup


    def get_links(max_pages):
        page = 1

        while page <= max_pages:
            address = 'http://propakistani.pk/category/cellular/page/' + str(page)
            source_code = requests.get(address)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text)
            # No class filter here: print the href of every <a> tag on the page.
            for link in soup.findAll('a'):
                href = link.get("href")
                print(href)

            page += 1


    get_links(3)
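The demo prints every link, which helps confirm that the page downloads, but it does not say which class to filter on. One way to find out is to count the class values that actually appear on the a tags. The following is only a sketch: the helper name `count_link_classes` and the sample markup are made up for illustration, `html.parser` is assumed as the parser, and `find_all` is the current bs4 spelling of `findAll`.

```python
from collections import Counter

from bs4 import BeautifulSoup


def count_link_classes(html_text):
    """Count how often each class value appears on <a> tags (hypothetical helper)."""
    soup = BeautifulSoup(html_text, 'html.parser')
    counts = Counter()
    for link in soup.find_all('a'):
        # bs4 returns the class attribute as a list of values;
        # links without a class fall back to an empty list.
        for cls in link.get('class', []):
            counts[cls] += 1
    return counts


# Made-up sample markup; in practice pass in source_code.text from requests.
sample = (
    '<a class="aa_loop_h2a" href="/x">x</a>'
    '<a class="aa_loop_h2a nav" href="/y">y</a>'
    '<a href="/z">z</a>'
)
counts = count_link_classes(sample)
print(counts.most_common())
```

The most frequent class values are usually good candidates for the filter passed as the second argument.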

There are a tags with the class value aa_loop_h2a, e.g. <a class="aa_loop_h2a" href="http://propakistani.pk/2015/04/20/mobile-data-usage-in-pakistan-grows-600-during-2014/" title="mobile data usage in pakistan grows 600% during 2014">mobile data usage in pakistan grows 600% during 2014</a>

So try the condition soup.findAll('a', {'class': 'aa_loop_h2a'}).
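A class filter can be checked offline by parsing the anchor quoted above from a string. This is a minimal sketch under a few assumptions: the second link in the sample is invented for contrast, `html.parser` is assumed as the parser, and `find_all` is the current bs4 spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# The first anchor is copied from the answer; the second is a made-up
# link without a class, added only for contrast.
html = '''
<a class="aa_loop_h2a" href="http://propakistani.pk/2015/04/20/mobile-data-usage-in-pakistan-grows-600-during-2014/" title="mobile data usage in pakistan grows 600% during 2014">mobile data usage in pakistan grows 600% during 2014</a>
<a href="http://example.com/other">other</a>
'''

soup = BeautifulSoup(html, 'html.parser')

# Only anchors whose class attribute contains 'aa_loop_h2a' are returned.
matching = [link.get('href') for link in soup.find_all('a', {'class': 'aa_loop_h2a'})]

# A class that appears nowhere in the markup gives an empty result,
# so a for loop over it prints nothing and raises no error.
missing = soup.find_all('a', {'class': 'aa_art_hdng'})

print(matching)
print(len(missing))
```

An empty result like `missing` is consistent with the "neither output nor error" behaviour described in the question: the loop body simply never runs.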
