python - Adding each item in list to end of specific lines in FASTA file

June 15, 2011

I solved it in the comments below.

So I'm trying to add each element of a list of strings to the end of specific lines in a different file.

It's hard to explain, but I want to parse a FASTA file, and every time it reaches a header (line.startswith('>')) I want to replace part of that header with an element from a list I've made.

For example:

file1:

    >seq1 unwanted here
    aatattata
    atatatata
    >seq2 unwanted stuff here
    gtgtgtgtg
    >seq3 more stuff I don't want
    acacacacac
    acacacacac

I want to keep ">seq#" and replace everything after it with the next item in the list below:

List:

    mylist = ['things1', '', 'things3', 'things4', '', 'things6', 'things7']

Result (modified file1):

    >seq1 things1
    aatattata
    atatatata
    >seq2            # nothing added here because mylist[1] == ''
    gtgtgtgtg
    >seq3 things3
    acacacacac
    acacacacac

As you can see, I want the blank items in the list to be used as well: a blank item still takes up one header, it just adds nothing to it.
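A minimal sketch of the header rewrite on the example above, assuming the FASTA text is already in memory as a string and the annotations are in a Python list in the same order as the headers. The function name `rewrite_headers` is hypothetical. A blank annotation leaves just the `>seq#` ID, with no trailing space.

```python
def rewrite_headers(fasta_text, annotations):
    """Replace everything after the first word of each '>' header line
    with the next annotation; sequence lines pass through unchanged.
    Assumes there are at least as many annotations as headers."""
    annos = iter(annotations)  # consume annotations in order, one per header
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith('>'):
            seq_id = line.split()[0]   # e.g. '>seq1'
            anno = next(annos)         # next item, may be ''
            # join only the non-empty parts so a blank annotation adds nothing
            out_lines.append(' '.join(p for p in (seq_id, anno) if p))
        else:
            out_lines.append(line)
    return '\n'.join(out_lines)


fasta = """>seq1 unwanted here
aatattata
atatatata
>seq2 unwanted stuff here
gtgtgtgtg
>seq3 more stuff I don't want
acacacacac
acacacacac"""

mylist = ['things1', '', 'things3', 'things4', '', 'things6', 'things7']
result = rewrite_headers(fasta, mylist)
print(result)
```

Extra items left over in the list (here 'things4' onward) are simply not consumed.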

So once again: I want to parse a FASTA file, and every time it reaches a header (there are thousands), I want to replace everything after the first word with the next item in a separate list I have made.
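With thousands of headers it can help to stream the files line by line rather than holding everything in lists. A sketch under the assumption that the annotations come one per line, in header order, from a separate text stream; `io.StringIO` stands in for real file handles, and `stream_rewrite` is a hypothetical name. Calling `next()` on an iterator also avoids `list.pop(0)`, which has to shift every remaining element on each call.

```python
import io


def stream_rewrite(fasta_in, anno_in, out):
    """Copy fasta_in to out line by line, replacing the text after each
    header's first word with the next line read from anno_in."""
    for line in fasta_in:
        if line.startswith('>'):
            seq_id = line.split()[0]
            # Next annotation line without its newline ('' if the line is blank).
            # Raises StopIteration if anno_in runs out before the headers do.
            anno = next(anno_in).rstrip('\n')
            out.write(seq_id + (' ' + anno if anno else '') + '\n')
        else:
            out.write(line)  # sequence lines already end with '\n'


fasta_in = io.StringIO(">seq1 junk\naatattata\n>seq2 junk\ngtgtgtgtg\n")
anno_in = io.StringIO("things1\n\n")  # second annotation is blank
out = io.StringIO()
stream_rewrite(fasta_in, anno_in, out)
print(out.getvalue())
```

With real files, the same function can be called with handles opened in a `with` statement.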

What you have should work, but there are a few unnecessary lines, so I've edited it down to use a few fewer lines. Also, an important note: you don't close your file handles. This can result in errors when writing to a file, and either way it's bad practice. Code:
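A small demonstration of the point about closing file handles: a `with` block closes the file automatically when it exits, even if an exception is raised inside it. The temporary file exists only for this demonstration.

```python
import os
import tempfile

# Create a throwaway file path just for this demonstration
fd, path = tempfile.mkstemp(suffix='.fasta')
os.close(fd)

with open(path, 'w') as output:
    output.write('>seq1 things1\n')
    print(output.closed)  # False: the handle is open inside the block

# Leaving the block flushed and closed the handle automatically
print(output.closed)  # True

with open(path) as fh:
    content = fh.read()
print(repr(content))  # '>seq1 things1\n'

os.remove(path)  # clean up the temporary file
```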

    #!/usr/bin/python
    import sys

    # gets list of annotations (6th tab-separated column of each line)
    def get_annos(infile):
        with open(infile, 'r') as fh:  # makes sure file is closed
            annos = []
            for line in fh:
                annos.append(line.rstrip('\n').split('\t')[5])  # split on tab separator
        return annos

    # replaces info on each header with the correct annotation
    def add_annos(infile1, infile2, outfile):
        annos = get_annos(infile1)  # contains list of annos
        with open(infile2, 'r') as f2, open(outfile, 'w') as output:
            for line in f2:
                if line.startswith('>'):
                    # split line on whitespace and keep only the first field (the ID) in a one-element list
                    line_split = [line.split()[0]]
                    line_split.append(annos.pop(0))  # append data of interest to current ID line
                    # join, drop the trailing space left by a blank annotation, write with newline
                    output.write(' '.join(line_split).rstrip() + '\n')
                else:
                    output.write(line)

    anno = sys.argv[1]
    seq = sys.argv[2]
    out = sys.argv[3]

    add_annos(anno, seq, out)

This isn't perfect, but it cleans things up a bit. I'd suggest veering away from using pop() to associate the annotation data with the sequence IDs unless the files are in the same order every time.
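One way to stop relying on file order is to key the annotations by sequence ID. This sketch assumes, hypothetically, that the annotation file's first tab-separated column holds the ID without the `>` (e.g. `seq1`) and that the sixth column (index 5, as in `get_annos`) holds the annotation. Headers with no matching ID keep just the ID. The function names are illustrative only.

```python
import io


def load_annos_by_id(anno_lines):
    """Map sequence ID (column 0) -> annotation (column 5).
    This column layout is an assumption about the annotation file."""
    annos = {}
    for line in anno_lines:
        fields = line.rstrip('\n').split('\t')
        if len(fields) > 5:
            annos[fields[0]] = fields[5]
    return annos


def add_annos_by_id(fasta_lines, annos, out):
    """Rewrite each header using the annotation looked up by its ID."""
    for line in fasta_lines:
        if line.startswith('>'):
            seq_id = line.split()[0][1:]   # drop the leading '>'
            anno = annos.get(seq_id, '')   # '' if the ID has no annotation
            out.write('>' + seq_id + (' ' + anno if anno else '') + '\n')
        else:
            out.write(line)


# Annotation rows deliberately out of order relative to the FASTA file
anno_file = io.StringIO("seq2\ta\tb\tc\td\tthings2\nseq1\ta\tb\tc\td\tthings1\n")
fasta_file = io.StringIO(">seq1 junk\naatattata\n>seq2 junk\ngtgtgtgtg\n>seq3 junk\nacac\n")
out = io.StringIO()
add_annos_by_id(fasta_file, load_annos_by_id(anno_file), out)
print(out.getvalue())
```

A dictionary lookup also means each header costs the same regardless of how many annotations remain.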
