python - How can I assign a value to a null node in XPath?
I need to crawl this part of a page using Scrapy and XPath:
<tr class="o"><td>alabama</td><td><code>us.al</code></td><td><code>us01</code></td><td>ala.</td><td>-6~</td><td class="n">4,779,736</td><td class="n">133,916</td><td class="n">51,705</td><td>2</td><td>montgomery</td><td>alabamian</td><td>350-369</td></tr>
<tr class="e"><td>alaska</td><td><code>us.ak</code></td><td><code>us02</code></td><td></td><td>-9~</td><td class="n">710,231</td><td class="n">1,530,700</td><td class="n">591,007</td><td>6</td><td>juneau</td><td>alaskan</td><td>995-999</td></tr>

My XPath expression is:
response.xpath('//tr[@class="o" or @class="e"][2]/descendant::*').extract()

But there is an empty <td> node in the "alaska" row: the <td> right after the <code> containing "us02" has no content. This does not happen in the "alabama" row.
When I use this expression to extract the text:

response.xpath('//tr[@class="o" or @class="e"][2]/descendant::*/text()').extract()

the empty node is ignored.
But the output has to comply with a fixed format, so every column must be present. How can I set the empty node to a space?
By the way, is there a better solution for crawling this page in Scrapy?
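To make the problem concrete, here is a minimal sketch that reproduces it outside Scrapy. It assumes only Python's standard-library xml.etree.ElementTree (not used in the original post) and the "alaska" row copied from the markup above, which happens to be well-formed XML. It is an illustration of why selecting text nodes loses a column, not the code from the question.

```python
import xml.etree.ElementTree as ET

# The "alaska" row from the question, copied as-is.
ROW = (
    '<tr class="e"><td>alaska</td><td><code>us.ak</code></td>'
    '<td><code>us02</code></td><td></td><td>-9~</td>'
    '<td class="n">710,231</td><td class="n">1,530,700</td>'
    '<td class="n">591,007</td><td>6</td><td>juneau</td>'
    '<td>alaskan</td><td>995-999</td></tr>'
)

row = ET.fromstring(ROW)

# One entry per <td> cell, whether or not the cell has content.
cells = row.findall("td")

# One entry per text node that actually exists; an empty <td></td>
# contains no text node, so it contributes nothing here.
text_nodes = list(row.itertext())

print(len(cells))       # 12
print(len(text_nodes))  # 11
```

The empty cell has no text node at all, so any expression ending in /text() has nothing to return for it. The column only survives if the cells themselves are selected first.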
I would be as explicit as possible here and extract the data in a "by column" fashion:
    for state in response.xpath('//tr[@class="o" or @class="e"]'):
        item = State()
        item["hasc"] = state.xpath(".//td[2]/code/text()").extract()
        ...
        yield item

where State is an Item class. Note that extract() returns a list. Consider using an Item Loader with the TakeFirst or Join processor to have string values inside the item fields.
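As a rough sketch of the "by column" idea applied to the empty cell, the example below walks each row cell by cell and substitutes a space when a cell has no text. To stay runnable without Scrapy it uses the standard-library xml.etree.ElementTree, and everything in it apart from "hasc" is an assumption made for illustration: the rows are shortened to their first five columns, the other field names are hypothetical, and take_first is a hypothetical helper that imitates the "first non-empty value" behaviour of a TakeFirst-style processor with a default added.

```python
import xml.etree.ElementTree as ET

# First five columns of the two rows from the question (shortened for brevity).
ROWS = """<table>
<tr class="o"><td>alabama</td><td><code>us.al</code></td><td><code>us01</code></td><td>ala.</td><td>-6~</td></tr>
<tr class="e"><td>alaska</td><td><code>us.ak</code></td><td><code>us02</code></td><td></td><td>-9~</td></tr>
</table>"""

# Hypothetical field names; only "hasc" appears in the original answer.
FIELDS = ["name", "hasc", "code", "abbrev", "offset"]


def take_first(values, default=" "):
    """Return the first non-empty value, or `default` if there is none."""
    for value in values:
        if value is not None and value != "":
            return value
    return default


def parse_row(tr):
    """Build one dict per row, one key per <td>, so columns never shift."""
    cells = tr.findall("td")
    return {
        name: take_first(list(cell.itertext()))
        for name, cell in zip(FIELDS, cells)
    }


root = ET.fromstring(ROWS)
items = [
    parse_row(tr)
    for tr in root.iter("tr")
    if tr.get("class") in ("o", "e")
]

for item in items:
    print(item)
```

In a spider the same shape should carry over: loop over the rows, then address each td by position relative to the row. Scrapy's extract_first() accepts a default argument, so something like state.xpath("./td[4]/text()").extract_first(default=" ") is one way to get the space for the empty cell.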