Python crawler is not crawling all sites
I'm trying to get links from websites with this code:
    import requests
    from bs4 import BeautifulSoup

    def get_links(max_pages):
        page = 1
        while page <= max_pages:
            address = 'http://hamariweb.com/mobiles/nokia_mobile-phones1.aspx?page=' + str(page)
            source_code = requests.get(address)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text)
            for link in soup.findAll('a', {'class': 'textclass8pt'}):
                href = link.get("href")
                print(href)
            page += 1

    get_links(3)
and it gives the expected output. But when I tried this:
    address = 'http://propakistani.pk/category/cellular/page/' + str(page)
    for link in soup.findAll('a', {'class': 'aa_art_hdng'}):
it shows this error:
    TypeError: getresponse() got an unexpected keyword argument 'buffering'
I tried another site, and this time it showed neither output nor an error. Why does it show proper output only for some sites? Is there a problem with my code? Please help me. Thanks.
There is no tag on that page that matches the condition soup.findAll('a', {'class': 'textclass8pt'}).
Try the following.
Demo:
    import requests
    from bs4 import BeautifulSoup

    def get_links(max_pages):
        page = 1
        while page <= max_pages:
            address = 'http://propakistani.pk/category/cellular/page/' + str(page)
            source_code = requests.get(address)
            plain_text = source_code.text
            soup = BeautifulSoup(plain_text)
            # No class filter here: this prints the href of every a tag on the page.
            for link in soup.findAll('a'):
                href = link.get("href")
                print(href)
            page += 1

    get_links(3)
Or, there is an a tag with the class value aa_loop_h2a,
e.g. <a class="aa_loop_h2a" href="http://propakistani.pk/2015/04/20/mobile-data-usage-in-pakistan-grows-600-during-2014/" title="mobile data usage in pakistan grows 600% during 2014">mobile data usage in pakistan grows 600% during 2014</a>
So try the condition soup.findAll('a', {'class': 'aa_loop_h2a'}).
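To see why a class filter can produce neither output nor an error, here is a minimal offline sketch. It assumes the sample markup below (the aa_loop_h2a anchor quoted above, with its title shortened, plus one made-up link with a hypothetical class "nav"), and uses find_all, the current bs4 spelling of findAll. The helper name hrefs_with_class is only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup: the anchor quoted in the answer plus one hypothetical,
# unrelated link (the "nav" class and example.com URL are made up).
SAMPLE_HTML = """
<a class="aa_loop_h2a" href="http://propakistani.pk/2015/04/20/mobile-data-usage-in-pakistan-grows-600-during-2014/">mobile data usage in pakistan grows 600% during 2014</a>
<a class="nav" href="http://example.com/other">other</a>
"""

def hrefs_with_class(html, css_class):
    """Return the href of every a tag whose class attribute includes css_class."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a", {"class": css_class})]

# A class that exists in the markup yields the matching link.
matching = hrefs_with_class(SAMPLE_HTML, "aa_loop_h2a")
# A class that does not exist yields an empty list, so a loop over it
# prints nothing and raises no error.
missing = hrefs_with_class(SAMPLE_HTML, "aa_art_hdng")

print(matching)
print(missing)
```

If the second list is empty for a real page, inspecting the page source for the actual class names is usually the next step.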