Complex regex matches in Python
I have a txt file that contains the following data:
chri
atgccttgggcaacggt...(multiple lines)
chrii
aggttggccaaggtt...(multiple lines)
I want to first find 'chri', then iterate through the multiple lines of atgc until I find the xth char. Then I want to print from the xth char until the yth char. I have been using regex, but once I have located the line containing chri, I don't know how to continue iterating to find the xth char.
Here is my code:
    for i, line in enumerate(sacc_gff):
        for match in re.finditer(chromo_val, line):
            print(line)
            for match in re.finditer(r"[atgc]{%d},{%d}\z" % (int(amino_start), int(amino_end)), line):
                print(match.group())
What the variables mean:
chromo_val = chri
amino_start = (some start point my program found)
amino_end = (some end point my program found)
Note: amino_start and amino_end need to be in variable form.
Please let me know if I can clarify anything for you. Thank you.
It looks like you are working with FASTA data, so I will provide an answer with that in mind. If it isn't FASTA, you can still use the sub-sequence selection part.
    fasta_data = {}  # creates an empty dictionary
    with open(fasta_file, 'r') as fh:
        for line in fh:
            if line[0] == '>':
                # strip the newline character and remove the leading '>' character
                seq_id = line.rstrip()[1:]
                fasta_data[seq_id] = ''
            else:
                # append this line to the current sequence, so it becomes one long string
                fasta_data[seq_id] += line.rstrip()

    # return the substring of chromosome 'chri' from the character at amino_start up to, but not including, amino_end
    sequence_string1 = fasta_data['chri'][amino_start:amino_end]
    # return the substring of chromosome 'chrii' from the character at amino_start up to and including amino_end
    sequence_string2 = fasta_data['chrii'][amino_start:amino_end+1]
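One thing to watch for with the slices above: Python indexes from 0 and a slice excludes its end index. If your amino_start and amino_end are 1-based, inclusive positions (which is how GFF files count, and I am only guessing from the name sacc_gff that yours come from one), you need to shift the start by one. A minimal sketch, where subsequence is a helper name I made up for illustration:

```python
def subsequence(seq, start, end):
    """Return the characters from position start to position end of seq.

    Assumes start and end are 1-based and inclusive (an assumption,
    adjust if your coordinates are already 0-based).
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # 1-based inclusive [start, end] maps to the 0-based half-open slice [start-1, end)
    return seq[start - 1:end]


seq = "atgccttggg"
print(subsequence(seq, 2, 5))  # tgcc
```

If your positions are already 0-based, the plain seq[amino_start:amino_end] slice is what you want.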
FASTA format:
    >chr1
    atttatatatat
    atggcgcgatcg
    >chr2
    aatcgctgctgc
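To try the approach on the small example above without creating a file, here is a rough, self-contained sketch. The function name read_fasta and the use of io.StringIO as a stand-in for an open file are my own choices for illustration, not anything your program needs to use. It collects the lines of each record in a list and joins them once at the end, so a range that spans a line break comes out as one piece.

```python
import io


def read_fasta(handle):
    """Map each sequence ID to its sequence joined into a single string."""
    records = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            seq_id = line[1:]  # drop the leading '>'
            records[seq_id] = []
        elif seq_id is not None:
            records[seq_id].append(line)  # sequence lines before any header are ignored
    return {name: "".join(parts) for name, parts in records.items()}


# The example records from this answer, held in memory instead of a file.
sample = ">chr1\natttatatatat\natggcgcgatcg\n>chr2\naatcgctgctgc\n"
fasta_data = read_fasta(io.StringIO(sample))

# 0-based slice that crosses the line break inside chr1
print(fasta_data["chr1"][10:14])  # atat
```

With a real file you would pass the open file handle to read_fasta instead of the StringIO object.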