Extract part of an XML file with Python etree
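I have a big XML file that looks like the one below. I have put only part of it here, since the file is larger than 2 GB, so you can see the structure. Basically, all subnetwork parents have the same structure as the one shown below. I want to extract the part of the XML file that matches <managedelementid string="xxxx" /> (where xxxx is an input variable). Here is the XML, followed by my code: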

    <create>
      <subnetwork networktype="gsm" userlabel="bsc">
        .
        .
      </subnetwork>
      <subnetwork networktype="wcdma" userlabel="rnc01">
        .
        .
      </subnetwork>
      <subnetwork networktype="ipran" userlabel="ipran">
        .
        .
      </subnetwork>
      <subnetwork networktype="wcdma" userlabel="rnc02">
        <managedelement sourcetype="cello">
          <managedelementid string="3galpas" />
          <primarytype type="rbs" />
          .
          .
        </managedelement>
        <managedelement sourcetype="cello">
          <managedelementid string="3gtuti" />
          <primarytype type="rbs" />
          .
          .
        </managedelement>
        <managedelement sourcetype="cello">
          <managedelementid string="3ghhh" />
          <primarytype type="rbs" />
          .
          .
        </managedelement>
      </subnetwork>
    </create>

And my code:

    from xml.etree import ElementTree
    import xml.etree.ElementTree as ET
    from xml.etree.ElementTree import XML, fromstring, tostring
    from xml.etree.ElementTree import Element
    from xml.etree.ElementTree import SubElement
    from xml.etree.ElementTree import Element, SubElement, Comment

    with open(r"c:\\users\\etihkru\\desktop\\h4.xml", 'rt') as f:
        root = ET.parse(f)
        tree = root.getroot()

    with open(r"c:\\users\\etihkru\\desktop\\list_of_xxx", 'r') as f2:
        for line in f2:
            line = line.rstrip()
            line1 = '"' + line + '"'
            xp_str1 = str(('.//managedelementid[@string='))
            xp_str2 = str("]/../../")
            str_elem = xp_str1 + line1 + xp_str2
            for item in tree.findall(str_elem):
                print ET.tostring(item)

The file list_of_xxx looks like this:

    3galpas
    3gtuti

As I said, there are numerous <managedelementid string=/> elements, and I want to extract only the ones listed in list_of_xxx.

So I want output like this:

    <subnetwork networktype="wcdma" userlabel="rnc02">
      <managedelement sourcetype="cello">
        <managedelementid string="3galpas" />
        <primarytype type="rbs" />
        .
        .
      </managedelement>
    </subnetwork>
    <subnetwork networktype="wcdma" userlabel="rnc02">
      <managedelement sourcetype="cello">
        <managedelementid string="3gtuti" />
        <primarytype type="rbs" />
        .
        .
      </managedelement>
    </subnetwork>

So, I want to find the managedelementid elements given in list_of_xxx, together with their parents managedelement and subnetwork, and write them out as shown above. Every managedelementid should be enclosed by the parents mentioned. I'm using Python 2.6 without lxml, and I don't have the rights to install it.

Extracting part of an XML document, in the sense that the part exists as-is in the source XML, should be trivial. For example, getting the managedelement elements that contain the managedelementid you're interested in is easy. But here it seems you want them wrapped within their subnetwork parent node.
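
To illustrate the "easy" case, here is a minimal sketch that returns the managedelement nodes for a given id. The helper name find_managed_elements and the SAMPLE string (a trimmed copy of the XML from the question) are made up for illustration. It assumes an ElementTree version whose findall supports attribute predicates and ".." steps, which the one bundled with Python 2.7 and 3.x does; I have not checked it against Python 2.6.

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the question's XML (illustrative only).
SAMPLE = """<create>
  <subnetwork networktype="wcdma" userlabel="rnc02">
    <managedelement sourcetype="cello">
      <managedelementid string="3galpas" />
      <primarytype type="rbs" />
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3ghhh" />
      <primarytype type="rbs" />
    </managedelement>
  </subnetwork>
</create>"""


def find_managed_elements(root, wanted_id):
    """Return the managedelement nodes whose managedelementid matches wanted_id."""
    # Select the matching managedelementid, then step up one level to its parent.
    path = ".//managedelementid[@string='{0}']/..".format(wanted_id)
    return root.findall(path)


root = ET.fromstring(SAMPLE)
matches = find_managed_elements(root, "3galpas")
for elem in matches:
    print(ET.tostring(elem).decode())
```

The result contains the managedelement nodes only, without the enclosing subnetwork, which is why the wrapping has to be handled separately.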

In the source XML, a subnetwork contains a mix of elements you want and other elements you want to strip from the result, so there is no subnetwork in the source that contains only the managedelement nodes you want.

We can approach this by extracting the managedelement nodes from the source XML and adding them to a reconstructed parent subnetwork node:

    .....
    .....
    for line in f2:
        line = line.rstrip()
        # get the subnetwork nodes containing the managedelementid
        subnet_path = ".//managedelementid[@string='{0}']/../.."
        subnet_path = subnet_path.format(line)
        for subnet in tree.findall(subnet_path):
            # reconstruct the subnetwork node:
            parent = ET.Element(subnet.tag, attrib=subnet.attrib)
            # path to find the managedelement containing the managedelementid
            content_path = ".//managedelementid[@string='{0}']/..".format(line)
            # append each managedelement found to the new subnetwork:
            for content in subnet.findall(content_path):
                parent.append(content)
            # print the new subnetwork:
            print ET.tostring(parent)
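
For anyone who wants to try the approach above without the 2 GB file, here is a self-contained sketch of the same idea. The function name rebuild_subnetworks, the SAMPLE string and the "2gother" id are assumptions made for this example, not part of the original data. It writes print as a function so it runs on Python 3 as well, and it relies on the same path syntax (attribute predicates and ".." steps) as the snippet above.

```python
import xml.etree.ElementTree as ET

# Small stand-in for the real file; "2gother" is a made-up id.
SAMPLE = """<create>
  <subnetwork networktype="gsm" userlabel="bsc">
    <managedelement sourcetype="cello">
      <managedelementid string="2gother" />
    </managedelement>
  </subnetwork>
  <subnetwork networktype="wcdma" userlabel="rnc02">
    <managedelement sourcetype="cello">
      <managedelementid string="3galpas" />
      <primarytype type="rbs" />
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3gtuti" />
      <primarytype type="rbs" />
    </managedelement>
    <managedelement sourcetype="cello">
      <managedelementid string="3ghhh" />
      <primarytype type="rbs" />
    </managedelement>
  </subnetwork>
</create>"""


def rebuild_subnetworks(root, wanted_ids):
    """Return one new subnetwork element per match, holding only that managedelement."""
    rebuilt = []
    for wanted_id in wanted_ids:
        id_path = ".//managedelementid[@string='{0}']".format(wanted_id)
        # Two levels up from the id element is the enclosing subnetwork.
        for subnet in root.findall(id_path + "/../.."):
            # New empty element with the same tag and a copy of the attributes.
            parent = ET.Element(subnet.tag, attrib=dict(subnet.attrib))
            # One level up from the id element is the managedelement to keep.
            for content in subnet.findall(id_path + "/.."):
                # append() does not detach content from the source tree.
                parent.append(content)
            rebuilt.append(parent)
    return rebuilt


root = ET.fromstring(SAMPLE)
results = rebuild_subnetworks(root, ["3galpas", "3gtuti"])
for subnet in results:
    print(ET.tostring(subnet).decode())
```

In the real script the id list would come from list_of_xxx, read line by line as in the question, instead of the hard-coded list used here.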