Parsing HTML using XPath in Python
I have the HTML below and am trying to parse it using XPath, but I get an empty string in return. Can someone please tell me where I am going wrong? I have tried, but couldn't succeed.
XPath code for the label:
divlbl=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']")
XPath code for the value of the corresponding label:
divval=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']/strong")
HTML:
<div> <h2 class="rowbreak"><strong>information of car</strong></h2> <ul class=" list-unstyled row"> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">make year:</span> <strong>aug 2009</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">kilometers:</span> <strong>127,553</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">city:</span> <strong class="carcity_795606"> <a href="javascript:void(0);" onclick="javascript: $( "#maplinkbtn" ).trigger( "click" ); "> sambalpur </a> </strong> </li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">listing date:</span> <strong>27 apr 2015</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">no. of owners:</span> <strong> first owner</strong> </li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">fuel type:</span> <strong> petrol</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">posted by:</span> <strong> dealer</strong> </li> </ul> </div>
Edited HTML:
<div> <h2 class="rowbreak"><strong>information of car</strong></h2> <ul class=" list-unstyled row"> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">make year:</span> <strong>aug 2009</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">kilometers:</span> <strong>127,553</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">city:</span> <strong class="carcity_795606"> <a href="javascript:void(0);" onclick="javascript: $( "#maplinkbtn" ).trigger( "click" ); "> sambalpur </a> </strong> </li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">listing date:</span> <strong>27 apr 2015</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">no. of owners:</span> <strong> first owner</strong> </li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">fuel type:</span> <strong> petrol</strong></li> <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">posted by:</span> <strong> dealer</strong> </li> </ul> </div> <h2 class="rowbreak"></h2> <ul class=" list-unstyled row"> <li class="col-sm-6 mrg-bottom"><span class=" text-light">one time tax :</span> <strong>individual</strong></li> <li class="col-sm-6 mrg-bottom"><span class=" text-light">registration no. :</span> <strong>or03f3141</strong></li> <li class="col-sm-6 mrg-bottom"><span class=" text-light"> insurance & expiry :</span> <strong>no insurance </strong></li> <li class="col-sm-6 mrg-bottom"><span class=" text-light">registration place: </span> <strong> sambalpur</strong></li> <li class="col-sm-6 mrg-bottom"><span class=" text-light">transmission :</span> <strong>manual</strong></li> <li class="col-sm-6 mrg-bottom"><span class=" text-light">color :</span> <strong>silver</strong></li> </ul>
The XPath you are using is quite fragile: it checks every single element in the chain and relies on "layout-oriented" classes.
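To make that fragility concrete, here is a minimal sketch, assuming lxml (which the ch.xpath(...) calls suggest) and a trimmed, hypothetical copy of the markup. It shows two reasons expressions like the ones in the question come back empty: the class attributes in the HTML carry a leading space (" list-unstyled row", " text-light"), so an exact @class='...' comparison never matches, and strong is a sibling of span rather than its child, so span/strong selects nothing. The contains/concat/normalize-space pattern is one common way to test for a single class token, not the only one.

```python
from lxml import html

# Hypothetical, trimmed copy of the markup from the question (one list item).
SNIPPET = """
<div>
  <ul class=" list-unstyled row">
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">color :</span> <strong>silver</strong></li>
  </ul>
</div>
"""

doc = html.fromstring(SNIPPET)

# Exact comparison fails: the attribute value is " text-light" (leading space).
exact = doc.xpath("//span[@class='text-light']")

# Token match: pad the normalized class list with spaces and look for " text-light ".
token = doc.xpath(
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' text-light ')]"
)

# strong is a sibling of span, not a child, so span/strong selects nothing...
as_child = doc.xpath("//span/strong")
# ...while the following-sibling axis finds the value next to each label.
as_sibling = doc.xpath("//span/following-sibling::strong[1]/text()")

print(len(exact), len(token), len(as_child), as_sibling)
```

With this snippet the script should print 0 1 0 ['silver'].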
I would start from the h2 element containing a strong element with the "information of car" text, and take the following ul element. E.g., for the labels:
//h2[strong = 'information of car']/following-sibling::ul/li/span/text()
Demo (assuming fromstring is lxml.html.fromstring and data holds the HTML above):
In [3]: ch = fromstring(data)

In [4]: ch.xpath("//h2[strong = 'information of car']/following-sibling::ul/li/span/text()")
['make year:', 'kilometers:', 'city:', 'no. of owners:', 'fuel type:', 'posted by:']
A sample that gets both the names and the values:
In [25]: for field in ch.xpath("//h2/following-sibling::ul/li"):
   ....:     name = ''.join(field.xpath(".//span/text()")).strip()
   ....:     value = ''.join(field.xpath(".//strong//text()")).strip()
   ....:     print name, value
   ....:
make year: aug 2009
kilometers: 127,553
city: sambalpur
listing date: 27 apr 2015
no. of owners: first owner
fuel type: petrol
posted by: dealer
one time tax : individual
registration no. : or03f3141
insurance & expiry : no insurance
registration place: sambalpur
transmission : manual
color : silver
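The session above is Python 2 (print name, value). For Python 3, the same loop could be wrapped in a small function that returns a dict instead of printing. This is a sketch under a few assumptions: extract_fields is a hypothetical name, it uses lxml.html, it reuses the //h2/following-sibling::ul/li expression from the answer, and, unlike the original .strip(), it collapses internal whitespace as well, so a label split across lines such as "registration no. / :" comes out on one line.

```python
from lxml import html


def extract_fields(markup):
    """Return {label: value} for every li in a ul that follows an h2.

    Hypothetical helper built on lxml.html and the XPath from the answer.
    """
    root = html.fromstring(markup)
    fields = {}
    for li in root.xpath("//h2/following-sibling::ul/li"):
        # Join all span text (the glyphicon spans are empty) and squeeze whitespace.
        name = " ".join("".join(li.xpath(".//span/text()")).split())
        # strong may wrap an <a>, so collect every descendant text node.
        value = " ".join("".join(li.xpath(".//strong//text()")).split())
        if name:
            fields[name] = value
    return fields


# Hypothetical, trimmed sample based on the question's markup.
SAMPLE = """
<div>
  <h2 class="rowbreak"><strong>information of car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon text-light"></span>
      <span class=" text-light">make year:</span> <strong>aug 2009</strong></li>
    <li class="col-sm-4 mrg-bottom"><span class=" text-light">city:</span>
      <strong class="carcity_795606"> <a href="#"> sambalpur </a> </strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">registration no.
      :</span> <strong>or03f3141</strong></li>
  </ul>
</div>
"""

for label, value in extract_fields(SAMPLE).items():
    print(label, value)
```

Returning a dict keeps the label/value pairing explicit, which is usually more convenient than printing when the data feeds into further processing.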