Extract part of an XML file with Python etree


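I have a big XML file that looks like the one below. I have included only part of it, since the full file is more than 2 GB, so that you can see the structure. Basically, the subnetwork parents all have the same structure as the one shown below. I want to extract the part of the XML file that contains <managedelementid string="xxxx" /> (where xxxx is an input variable). Here are my code and the XML: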

<create>
  <subnetwork networktype="gsm" userlabel="bsc">
    .
    .
  </subnetwork>
  <subnetwork networktype="wcdma" userlabel="rnc01">
    .
    .
  </subnetwork>
  <subnetwork networktype="ipran" userlabel="ipran">
    .
    .
  </subnetwork>
  <subnetwork networktype="wcdma" userlabel="rnc02">
    <managedelement sourcetype="cello">
      <managedelementid string="3galpas" />
      <primarytype type="rbs" />
      .
      .
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3gtuti" />
      <primarytype type="rbs" />
      .
      .
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3ghhh" />
      <primarytype type="rbs" />
      .
      .
    </managedelement>
  </subnetwork>
</create>

And the code:

from xml.etree import ElementTree
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import XML, fromstring, tostring
from xml.etree.ElementTree import Element
from xml.etree.ElementTree import SubElement
from xml.etree.ElementTree import Element, SubElement, Comment

with open(r"c:\\users\\etihkru\\desktop\\h4.xml", 'rt') as f:
    root = ET.parse(f)
    tree = root.getroot()
    with open(r"c:\\users\\etihkru\\desktop\\list_of_xxx", 'r') as f2:
        for line in f2:
            line = line.rstrip()
            line1 = '"' + line + '"'
            xp_str1 = str(('.//managedelementid[@string='))
            xp_str2 = str("]/../../")
            str_elem = xp_str1 + line1 + xp_str2
            for item in tree.findall(str_elem):
                print ET.tostring(item)

And the file list_of_xxx looks like this:

3galpas
3gtuti

As I said, there are numerous <managedelementid string=/> elements, and I want to extract only the ones listed in list_of_xxx.

So I want output like this:

<subnetwork networktype="wcdma" userlabel="rnc02">
  <managedelement sourcetype="cello">
    <managedelementid string="3galpas" />
    <primarytype type="rbs" />
    .
    .
  </managedelement>
</subnetwork>
<subnetwork networktype="wcdma" userlabel="rnc02">
  <managedelement sourcetype="cello">
    <managedelementid string="3gtuti" />
    <primarytype type="rbs" />
    .
    .
  </managedelement>
</subnetwork>

So, I want to find the managedelementids given in list_of_xxx, together with their managedelement and subnetwork parents, and write them out as shown above. Every managedelementid should be enclosed by the parents mentioned. I'm using Python 2.6 without lxml, and I don't have the rights to install it.

Extracting part of an XML document, in the sense that the part exists as-is in the source XML, should be trivial. For example, getting the managedelement nodes that contain the managedelementid you're interested in is easy. Here, though, it seems you want them wrapped within their subnetwork parent node.
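As a minimal sketch of that simpler case, the matching managedelement nodes can be collected with plain tag lookups. The helper name find_managed_elements and the inline SAMPLE string are made up for illustration; the tag names follow the sample in the question.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the real file (assumption: same tag names as the question).
SAMPLE = """<create>
  <subnetwork networktype="wcdma" userlabel="rnc02">
    <managedelement sourcetype="cello">
      <managedelementid string="3galpas"/>
      <primarytype type="rbs"/>
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3gtuti"/>
      <primarytype type="rbs"/>
    </managedelement>
  </subnetwork>
</create>"""


def find_managed_elements(root, wanted_id):
    """Return every managedelement whose managedelementid matches wanted_id."""
    matches = []
    for element in root.findall(".//managedelement"):
        id_node = element.find("managedelementid")
        if id_node is not None and id_node.get("string") == wanted_id:
            matches.append(element)
    return matches


root = ET.fromstring(SAMPLE)
found = find_managed_elements(root, "3gtuti")
print(len(found))
```

The nodes come back exactly as they appear in the source tree, which is why this case needs no reconstruction.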

In the source XML, each subnetwork contains a mix of elements you want and other elements you want to strip from the result, so there is no subnetwork that contains only the managedelement nodes you want.

We can approach this by extracting the managedelement nodes from the source XML and adding them to a reconstructed parent subnetwork node:

.....
.....
for line in f2:
    line = line.rstrip()
    # get the subnet nodes containing the managedelementid
    subnet_path = ".//managedelementid[@string='{0}']/../.."
    subnet_path = subnet_path.format(line)
    for subnet in tree.findall(subnet_path):
        # reconstruct the subnet node:
        parent = ET.Element(subnet.tag, attrib=subnet.attrib)
        # path to find the managedelement containing the managedelementid
        content_path = ".//managedelementid[@string='{0}']/..".format(line)
        # append the managedelement found to the new subnet:
        for content in subnet.findall(content_path):
            parent.append(content)
        # print the new subnet:
        print ET.tostring(parent)
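If I recall correctly, attribute predicates and ".." steps in findall paths only arrived with ElementTree 1.3 (Python 2.7), so the path-based version above may not run on Python 2.6. Here is a sketch of the same reconstruction idea using only plain tag lookups. It assumes, as in the sample, that subnetwork nodes sit directly under the root and managedelement nodes sit directly under a subnetwork. The name extract_wrapped and the inline SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the real file (assumption: same layout as the question's sample).
SAMPLE = """<create>
  <subnetwork networktype="gsm" userlabel="bsc"/>
  <subnetwork networktype="wcdma" userlabel="rnc02">
    <managedelement sourcetype="cello">
      <managedelementid string="3galpas"/>
      <primarytype type="rbs"/>
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3gtuti"/>
      <primarytype type="rbs"/>
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3ghhh"/>
      <primarytype type="rbs"/>
    </managedelement>
  </subnetwork>
</create>"""


def extract_wrapped(root, wanted_ids):
    """Return one rebuilt subnetwork node per wanted managedelement."""
    wanted = set(wanted_ids)
    result = []
    for subnet in root.findall("subnetwork"):
        for element in subnet.findall("managedelement"):
            id_node = element.find("managedelementid")
            if id_node is None or id_node.get("string") not in wanted:
                continue
            # Copy only the parent's tag and attributes, not its children.
            wrapper = ET.Element(subnet.tag, dict(subnet.attrib))
            wrapper.append(element)
            result.append(wrapper)
    return result


root = ET.fromstring(SAMPLE)
wrapped = extract_wrapped(root, ["3galpas", "3gtuti"])
for node in wrapped:
    print(ET.tostring(node))
```

Results come out in document order rather than in the order of list_of_xxx, and the whole ID list is checked in a single pass over the tree instead of one findall per line.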
