Parsing HTML using XPath in Python


I have the HTML below and am trying to parse it using XPath, but I get an empty string in return. Can anyone please tell me where I am mistaken? I have tried but couldn't succeed.

XPath code for the label:

divlbl=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']") 

XPath code for the value of the corresponding label:

divval=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']/strong") 

HTML value:

<div>
    <h2 class="rowbreak"><strong>information of car</strong></h2>
    <ul class=" list-unstyled row">
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">make year:</span> <strong>aug 2009</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">kilometers:</span> <strong>127,553</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">city:</span>
            <strong class="carcity_795606">
                <a href="javascript:void(0);" onclick="javascript: $( &quot;#maplinkbtn&quot; ).trigger( &quot;click&quot; ); ">
                    sambalpur
                </a>
            </strong>
        </li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">listing date:</span> <strong>27 apr 2015</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">no. of owners:</span> <strong> first owner</strong>
        </li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">fuel type:</span> <strong> petrol</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">posted by:</span> <strong> dealer</strong>
        </li>
    </ul>
</div>

Edited HTML:

<div>
    <h2 class="rowbreak"><strong>information of car</strong></h2>
    <ul class=" list-unstyled row">
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">make year:</span> <strong>aug 2009</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">kilometers:</span> <strong>127,553</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">city:</span>
            <strong class="carcity_795606">
                <a href="javascript:void(0);" onclick="javascript: $( &quot;#maplinkbtn&quot; ).trigger( &quot;click&quot; ); ">
                    sambalpur
                </a>
            </strong>
        </li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">listing date:</span> <strong>27 apr 2015</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">no. of owners:</span> <strong> first owner</strong>
        </li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">fuel type:</span> <strong> petrol</strong></li>
        <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">posted by:</span> <strong> dealer</strong>
        </li>
    </ul>
</div>
<h2 class="rowbreak"></h2>
<ul class=" list-unstyled row">
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">one time tax :</span> <strong>individual</strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">registration no. :</span> <strong>or03f3141</strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light"> insurance &amp; expiry :</span> <strong>no insurance&nbsp;</strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">registration place: </span> <strong> sambalpur</strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">transmission :</span> <strong>manual</strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">color :</span> <strong>silver</strong></li>
</ul>

The XPath you are using is quite fragile: it checks every single element in the chain and relies on "layout-oriented" classes.
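Going by the HTML posted in the question, there are a few likely reasons for the empty result. A predicate such as @class='text-light' compares the whole attribute string, and the class values in this markup start with a space (" text-light", " list-unstyled row"). The strong element is also a sibling of the span, not its child, and the posted fragment shows no left-container div or article element. Below is a minimal sketch of the first two points, assuming lxml is the library behind ch.xpath; the snippet variable is a cut-down copy of one list item, made up for illustration.

```python
from lxml.html import fromstring

# Cut-down copy of one list item from the question's HTML
# (note the leading space inside the class attributes).
snippet = (
    '<ul class=" list-unstyled row">'
    '<li class="col-sm-6 mrg-bottom">'
    '<span class=" text-light">color :</span> <strong>silver</strong>'
    '</li></ul>'
)
ch = fromstring(snippet)

# Exact string comparison: " text-light" != "text-light", so nothing matches.
exact = ch.xpath("//span[@class='text-light']")

# normalize-space() trims the attribute value before comparing.
trimmed = ch.xpath("//span[normalize-space(@class)='text-light']/text()")

# <strong> is a sibling of the span, so 'span/strong' finds nothing,
# while following-sibling::strong reaches it.
child = ch.xpath("//span/strong")
sibling = ch.xpath("//span/following-sibling::strong/text()")

print(exact, trimmed, child, sibling)
```

Any single failing step in a long chain of predicates makes the whole expression return an empty list, which is why shorter, content-based expressions tend to be easier to debug.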

I would start with the h2 element containing a strong element with the text "information of car", and then get the following ul element. E.g., to get the labels:

//h2[strong = 'information of car']/following-sibling::ul/li/span/text() 

Demo:

In [3]: ch = fromstring(data)

In [4]: ch.xpath("//h2[strong = 'information of car']/following-sibling::ul/li/span/text()")
Out[4]: ['make year:', 'kilometers:', 'city:', 'listing date:', 'no. of owners:', 'fuel type:', 'posted by:']
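For anyone who wants to run this outside an interactive session, here is a self-contained sketch. It assumes fromstring comes from lxml.html (the session above does not show its import), and data is a shortened stand-in for the question's HTML, not the full page. It also shows that the h2[strong = ...] predicate limits the match to the list that follows that particular heading.

```python
from lxml.html import fromstring

# Shortened stand-in for the question's HTML: two headings, two lists.
data = """
<div>
  <h2 class="rowbreak"><strong>information of car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">kilometers:</span> <strong>127,553</strong></li>
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">fuel type:</span> <strong> petrol</strong></li>
  </ul>
</div>
<h2 class="rowbreak"></h2>
<ul class=" list-unstyled row">
  <li class="col-sm-6 mrg-bottom"><span class=" text-light">color :</span> <strong>silver</strong></li>
</ul>
"""
ch = fromstring(data)

# Only the list that follows the heading whose <strong> text matches.
labels = ch.xpath(
    "//h2[strong = 'information of car']/following-sibling::ul/li/span/text()"
)

# Dropping the predicate picks up the list after every h2.
all_labels = ch.xpath("//h2/following-sibling::ul/li/span/text()")

print(labels)
print(all_labels)
```

The empty glyphicon spans contribute nothing because they have no text nodes.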

Sample (getting both names and values):

In [25]: for field in ch.xpath("//h2/following-sibling::ul/li"):
   ....:     name = ''.join(field.xpath(".//span/text()")).strip()
   ....:     value = ''.join(field.xpath(".//strong//text()")).strip()
   ....:     print(name, value)
   ....:
make year: aug 2009
kilometers: 127,553
city: sambalpur
listing date: 27 apr 2015
no. of owners: first owner
fuel type: petrol
posted by: dealer
one time tax : individual
registration no. : or03f3141
insurance & expiry : no insurance
registration place: sambalpur
transmission : manual
color : silver
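If the pairs are needed for further processing rather than printing, the same loop can fill a dictionary. This is only a sketch under a few assumptions: lxml.html.fromstring is the parser, data is a shortened stand-in for the question's HTML, and the whitespace collapsing and trailing-colon stripping are additions for illustration that are not part of the loop above.

```python
from lxml.html import fromstring

# Shortened stand-in for the question's HTML, keeping the awkward cases:
# a value nested inside <a>, a trailing &nbsp;, and stray spaces.
data = """
<h2 class="rowbreak"></h2>
<ul class=" list-unstyled row">
  <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">city:</span>
    <strong class="carcity_795606">
      <a href="javascript:void(0);">
        sambalpur
      </a>
    </strong>
  </li>
  <li class="col-sm-6 mrg-bottom"><span class=" text-light"> insurance &amp; expiry :</span> <strong>no insurance&nbsp;</strong></li>
  <li class="col-sm-6 mrg-bottom"><span class=" text-light">registration no. :</span> <strong>or03f3141</strong></li>
</ul>
"""
ch = fromstring(data)

details = {}
for field in ch.xpath("//h2/following-sibling::ul/li"):
    # Join all text nodes, then collapse runs of whitespace
    # (str.split() with no arguments also splits on non-breaking spaces).
    raw_name = "".join(field.xpath(".//span/text()"))
    raw_value = "".join(field.xpath(".//strong//text()"))
    name = " ".join(raw_name.split()).rstrip(" :")
    value = " ".join(raw_value.split())
    details[name] = value

print(details)
```

Using .//strong//text() instead of strong/text() is what picks up the city name, since that text sits inside an a element within the strong element.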
