python - Adding each item in a list to the end of specific lines in a FASTA file
Edit: I solved this; see the comments below.
I'm trying to add each element of a list of strings to the end of specific lines in a different file.
It's hard to explain, but I want to parse a FASTA file, and every time it reaches a header (line.startswith('>')) I want to replace part of that header with an element from a list I've already made.
for example:
file1:
">seq1 unwanted here
aatattata
atatatata
>seq2 unwanted stuff here
gtgtgtgtg
gtgtgtgtg
>seq3 more stuff don't want
acacacacac
acacacacac"
I want to keep the ">seq#" part but replace everything after it with the next item in the list below:
list: mylist = ['things1', '', 'things3', 'things4', '', 'things6', 'things7']
result (modified file1):
">seq1 things1
aatattata
atatatata
>seq2 # adds nothing here due mylist[1] = ''
gtgtgtgtg
gtgtgtgtg
>seq3 things3
acacacacac
acacacacac"
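As you can see, I want the blank items in the list to be used as well, so a header that lines up with '' ends up with nothing after its ID.
So once again: I want to parse a FASTA file, and every time it gets to a header (there are thousands), I want to replace everything after the first word with the next item in a separate list that I have made.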
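A minimal in-memory sketch of the behaviour described above, using the example data from the question. The function name `relabel_headers` is made up for illustration, and the sketch assumes the list holds at least one item per header, in the same order as the headers appear in the file.

```python
def relabel_headers(lines, labels):
    """Return FASTA lines with everything after the first word of each
    header replaced by the next item from labels.

    Hypothetical helper; assumes labels are in the same order as the headers.
    """
    label_iter = iter(labels)
    result = []
    for line in lines:
        line = line.rstrip('\n')
        if line.startswith('>'):
            seq_id = line.split()[0]   # first word, e.g. '>seq1'
            label = next(label_iter)   # raises StopIteration if labels run out
            # rstrip() drops the trailing space left behind by an empty label
            line = f"{seq_id} {label}".rstrip()
        result.append(line)
    return result


fasta = [
    ">seq1 unwanted here",
    "aatattata",
    "atatatata",
    ">seq2 unwanted stuff here",
    "gtgtgtgtg",
    "gtgtgtgtg",
    ">seq3 more stuff don't want",
    "acacacacac",
    "acacacacac",
]
mylist = ['things1', '', 'things3', 'things4', '', 'things6', 'things7']

relabeled = relabel_headers(fasta, mylist)
print('\n'.join(relabeled))
```

Sequence lines pass through unchanged, and a blank list item simply leaves the header as its bare ID.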
What you have will work, but there are a few unnecessary lines, so I've edited it down to use fewer. Also, an important note: you don't close your file handles. This can result in errors, particularly when writing to a file, and either way it's bad practice. Code:
#!/usr/bin/python
import sys

# gets list of annotations
def get_annos(infile):
    with open(infile, 'r') as fh:  # makes sure the file is closed
        annos = []
        for line in fh:
            # added tab separator; strip the newline in case column 5 is the last one
            annos.append(line.rstrip('\n').split('\t')[5])
    return annos

# replaces info on each header with the correct annotation
def add_annos(infile1, infile2, outfile):
    annos = get_annos(infile1)  # contains list of annos
    with open(infile2, 'r') as f2, open(outfile, 'w') as output:
        for line in f2:
            if line.startswith('>'):
                # split line on whitespace and store the first element in a list
                line_split = [line.split()[0]]
                # append data of interest to the current id line
                line_split.append(annos.pop(0))
                # join and write to file with a newline character
                output.write(' '.join(line_split) + '\n')
            else:
                output.write(line)

anno = sys.argv[1]
seq = sys.argv[2]
out = sys.argv[3]
add_annos(anno, seq, out)

This is not perfect, but it cleans things up a bit. I'd probably veer away from using pop() to associate the annotation data with the sequence IDs unless you are certain the files are in the same order every time.
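One way to avoid relying on file order, as the caution about pop() suggests, is to look annotations up by sequence ID instead. The sketch below is only an illustration under assumptions not stated in the original post: it assumes the annotation file is tab-separated with the sequence ID in column 0 (a guess) and the annotation in column 5 (as in the code above). The names `load_annos_by_id` and `annotate_fasta` are hypothetical.

```python
def load_annos_by_id(anno_lines, id_col=0, anno_col=5):
    """Build a dict mapping sequence ID -> annotation.

    Assumes tab-separated lines; id_col=0 is an assumption, anno_col=5
    follows the column used in the answer's code.
    """
    annos = {}
    for line in anno_lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) > max(id_col, anno_col):  # skip short or blank lines
            annos[fields[id_col]] = fields[anno_col]
    return annos


def annotate_fasta(fasta_lines, annos):
    """Yield FASTA lines, rewriting each header as '>id annotation'."""
    for line in fasta_lines:
        if line.startswith('>'):
            header = line.split()[0]           # e.g. '>seq1'
            anno = annos.get(header[1:], '')   # look up the ID without '>'
            # rstrip() avoids a trailing space when no annotation is found
            yield f"{header} {anno}".rstrip() + '\n'
        else:
            yield line


# Annotation rows deliberately in a different order from the FASTA records
anno_lines = [
    "seq2\ta\tb\tc\td\tthings2\n",
    "seq1\ta\tb\tc\td\tthings1\n",
]
fasta_lines = [
    ">seq1 unwanted here\n",
    "aatattata\n",
    ">seq2 unwanted stuff here\n",
    "gtgtgtgtg\n",
    ">seq3 more stuff\n",
    "acacacacac\n",
]

out_lines = list(annotate_fasta(fasta_lines, load_annos_by_id(anno_lines)))
print(''.join(out_lines), end='')
```

With real files, the same functions can be given open file handles inside a `with` block, since file objects iterate line by line. Headers with no matching ID are written with the ID alone.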