python - Web scrape with bad Wi-Fi: can I make my scrape 'go online' again?
I'm doing a big Python scrape of 10,000+ web pages, and it takes several hours to run. If the internet disconnects during the process, the script stalls and doesn't reconnect when the Wi-Fi comes back.
Is there a way to insert something like 'if the internet stops, pick up where I left off'?
There is a framework for building scrapers: Scrapy. It has this capability: it can save its execution state and resume crawling from that point (a year later, for example).
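As a rough illustration of what that looks like in Scrapy, persistence is switched on by pointing the `JOBDIR` setting at a directory. The fragment below is only a sketch of a project's `settings.py`; the directory name and the retry and timeout values are assumptions chosen for the example, not recommendations.

```python
# settings.py fragment for a hypothetical Scrapy project.

# Directory where Scrapy keeps the pending request queue and the set of
# already-seen requests. Use the same directory again to resume the crawl.
JOBDIR = "crawls/pages-run-1"  # hypothetical path

# Let the built-in retry middleware re-issue requests that fail with
# connection errors or timeouts. The values below are illustrative.
RETRY_ENABLED = True
RETRY_TIMES = 5

# Give up on a single download after this many seconds instead of hanging.
DOWNLOAD_TIMEOUT = 30
```

The same setting can be passed on the command line, for example `scrapy crawl myspider -s JOBDIR=crawls/pages-run-1`, where `myspider` is a placeholder spider name. Stopping the crawl cleanly and running the same command again should continue from the saved queue.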
Or, if you want to build it from scratch, you need to implement saving the crawler's state yourself. I think it is a bad idea to try to save the interpreter state; instead, design the crawler so that its state can be serialized. Scrapy, for example, is designed this way: a crawler has methods, and one of them generates the initial requests. Each request has a callback, and each callback can generate additional requests, and so on. Scrapy calls the callbacks, enqueues the requests they produce, and then calls the callbacks for those. This design lets Scrapy save the request queue to disk and resume execution from the last request(s).
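The from-scratch design described above could be sketched with only the standard library as follows. This is a minimal illustration, not how Scrapy itself is implemented: the state file name, the `handle` callback signature, and the 30-second wait are all assumptions made for the example. The pending queue and the set of finished URLs are the only state, so they can be written to a JSON file after every page. A network error leaves the current URL at the front of the queue, and the loop simply waits and tries again.

```python
import json
import os
import time
from collections import deque
from urllib.error import HTTPError
from urllib.request import urlopen

STATE_FILE = "crawl_state.json"  # hypothetical file name


def load_state(path, start_urls):
    """Return (pending, done) from disk, or a fresh state on the first run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        return deque(data["pending"]), set(data["done"])
    return deque(start_urls), set()


def save_state(path, pending, done):
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"pending": list(pending), "done": sorted(done)}, f)
    os.replace(tmp, path)


def fetch(url, timeout=10):
    """Download one page; the timeout stops a dead connection from hanging."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def crawl(start_urls, handle, path=STATE_FILE, fetch=fetch, wait=30, sleep=time.sleep):
    """Process URLs until the queue is empty, saving progress after each page.

    handle(url, body) is a hypothetical callback that processes the page and
    returns an iterable of further URLs to visit.
    """
    pending, done = load_state(path, start_urls)
    seen = done | set(pending)
    while pending:
        url = pending[0]  # peek only; the URL is removed once it is handled
        try:
            body = fetch(url)
        except HTTPError:
            body = None  # the server answered with an error status: skip the page
        except OSError:
            # No connection, DNS failure, or timeout: wait, then retry the same URL.
            sleep(wait)
            continue
        new_urls = handle(url, body) if body is not None else []
        pending.popleft()
        done.add(url)
        for link in new_urls:
            if link not in seen:
                seen.add(link)
                pending.append(link)
        save_state(path, pending, done)
```

Calling `crawl(my_urls, my_handler)` again after a crash or a manual stop picks up from the saved file, because `load_state` prefers the file over `start_urls`. Rewriting the whole JSON file after every page is the simplest option; for very large queues, saving every N pages or using SQLite may be a better trade-off.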