Find matches in two files using Python
I am analyzing sequencing data and have a few candidate genes whose functions I need to find.
After editing the available human database, I want to compare my candidate genes with the database and output the function of each candidate gene.
I only have basic Python skills, so I thought this might help me speed up the work of finding the functions of my candidate genes.
So file1 contains the candidate genes, like this:
    gene
    aqp7
    rlim
    smco3
    coasy
    hspa6
And the database, file2.csv, looks like this:
    gene function
    pdcd6 programmed cell death protein 6
    cdc2 cell division cycle 2, g1 s , g2 m, isoform cra_a
    cdc2 cell division cycle 2, g1 s , g2 m, isoform cra_a
    cdc2 cell division cycle 2, g1 s , g2 m, isoform cra_a
    cdc2 cell division cycle 2, g1 s , g2 m, isoform cra_a
Desired output:
    gene (from file1), function (matching file2)
I tried to use this code:
    file1 = 'file1.csv'
    file2 = 'file2.csv'
    output = 'file3.txt'

    with open(file1) as inf:
        match = set(line.strip() for line in inf)

    with open(file2) as inf, open(output, 'w') as outf:
        for line in inf:
            if line.split(' ', 1)[0] in match:
                outf.write(line)
But I only get a blank page.
I also tried using the intersection function:
    with open('file1.csv', 'r') as ref:
        with open('file2.csv', 'r') as com:
            with open('common_genes_function', 'w') as output:
                same = set(ref).intersection(com)
                print same
That is not working either.
Please help, otherwise I will need to do this manually.
I recommend using the pandas merge function. However, it requires a clear separator between the 'gene' and 'function' columns. In my example, I assume it is a tab:
    import pandas as pd

    # Open the files as pandas datasets.
    # filepath1, filepath2 and filepath3 are placeholders for your own paths.
    file1 = pd.read_csv(filepath1, sep='\t')
    file2 = pd.read_csv(filepath2, sep='\t')

    # Merge the files on the column 'gene' using 'inner', which comes up
    # with the intersection of both datasets.
    file3 = pd.merge(file1, file2, how='inner', on=['gene'], suffixes=['1', '2'])
    file3.to_csv(filepath3, sep=',')
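If you want to try the merge without touching your real files first, here is a minimal self-contained sketch. The gene lists in it are made-up stand-ins (I put pdcd6 into the candidate list only so that something matches), the tab separator is still an assumption, and the variable names are just for illustration. It also drops the repeated database rows before merging, since a duplicated gene would otherwise be duplicated in the output.

```python
import io

import pandas as pd

# Made-up stand-ins for file1 and file2 (tab-separated is an assumption).
candidates_text = "gene\naqp7\nrlim\npdcd6\n"
database_text = (
    "gene\tfunction\n"
    "pdcd6\tprogrammed cell death protein 6\n"
    "cdc2\tcell division cycle 2\n"
    "cdc2\tcell division cycle 2\n"
)

candidates = pd.read_csv(io.StringIO(candidates_text), sep="\t")
database = pd.read_csv(io.StringIO(database_text), sep="\t")

# Remove repeated database rows so each gene is reported once.
database = database.drop_duplicates()

# how="inner": only candidate genes that are present in the database.
matched = pd.merge(candidates, database, how="inner", on="gene")

# how="left": every candidate gene; missing functions show up as NaN.
annotated = pd.merge(candidates, database, how="left", on="gene")

print(matched)
print(annotated)
```

The left merge can be handy for spotting which candidates the database does not cover at all, which is one possible reason for an empty result.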