[Biopython-dev] Python script

Naiane Negri naiannegri at gmail.com
Thu Sep 10 15:48:40 UTC 2015


I'm new to Python, so I'm really struggling to write a script.

So, what I need is to compare two files. One file contains all the proteins
in a database; the other contains only the subset of those proteins that are
present in my organism. I need to know which proteins from the database are
present in my organism. For that, I want to build a matrix-like output with a
0 or 1 for every protein in the database, indicating whether it is absent from
or present in my organism.
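A minimal Python sketch of the comparison described above. It assumes each file lists one protein identifier per line and that identifiers must match exactly (after stripping surrounding whitespace). The function names (`read_ids`, `presence_vector`, `write_matrix`) and the file names in the usage message are hypothetical, chosen only for illustration.

```python
import csv
import sys


def read_ids(path):
    """Read one identifier per line, skipping blank lines and stripping whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def presence_vector(database_ids, organism_ids):
    """Return (id, 0/1) pairs in database order; 1 means the id is in the organism."""
    organism_set = set(organism_ids)  # set lookup does not depend on line order
    return [(pid, 1 if pid in organism_set else 0) for pid in database_ids]


def write_matrix(rows, out=None):
    """Write the 0/1 table as tab-separated text with a header row."""
    out = sys.stdout if out is None else out
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["protein", "present"])
    writer.writerows(rows)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        # Hypothetical script name; adjust to whatever the file is saved as.
        print("usage: presence.py DATABASE_FILE ORGANISM_FILE", file=sys.stderr)
    else:
        database_ids = read_ids(sys.argv[1])
        organism_ids = read_ids(sys.argv[2])
        write_matrix(presence_vector(database_ids, organism_ids))
```

Run it as, for example, `python presence.py database.txt organism.txt > matrix.tsv`. Because the organism identifiers are kept in a set, each lookup ignores where a name appears in either file, so neither file needs to be sorted.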

Does anybody have any idea how I could do that? I'm trying to use something
like this:

    $ cat sorted.a
    A
    B
    C
    D
    $ cat sorted.b
    A
    D
    $ join sorted.a sorted.b | sed 's/^/1 /' && join -v 1 sorted.a sorted.b | sed 's/^/0 /'
    1 A
    1 D
    0 B
    0 C
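The shell pipeline above prints "1" for lines of `sorted.a` that also occur in `sorted.b`, then "0" for the rest. `join` only works if both files are sorted on the join field in the same collation order, so files sorted under different locale settings can silently miss matches; Windows line endings can have the same effect, since "A\r" is not equal to "A". These are possible causes, not a diagnosis. The sketch below reproduces the same output with a set lookup, which needs no sorting; the function name `join_style_report` is hypothetical.

```python
def join_style_report(a_items, b_items):
    """Mimic `join a b | sed 's/^/1 /'` followed by `join -v 1 a b | sed 's/^/0 /'`.

    Unlike join, neither list has to be sorted, because membership
    is tested against a set built from b_items.
    """
    in_b = set(b_items)
    present = ["1 " + item for item in a_items if item in in_b]
    absent = ["0 " + item for item in a_items if item not in in_b]
    return present + absent


for line in join_style_report(["A", "B", "C", "D"], ["A", "D"]):
    print(line)
```

Printing the result gives the same four lines as the shell version: `1 A`, `1 D`, `0 B`, `0 C`.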

But I can't get it to work, because sometimes a protein is present in both
files but not on the same line. Here is an example:

1-cysPrx_C
14-3-3
2-Hacid_dh
2-Hacid_dh_C
2-oxoacid_dh
2H-phosphodiest
2OG-FeII_Oxy
2OG-FeII_Oxy_3
2OG-FeII_Oxy_4
2OG-FeII_Oxy_5
2OG-Fe_Oxy_2
2TM
2_5_RNA_ligase2

compared with:

1-cysPrx_C
120_Rick_ant
14-03-2003
2-Hacid_dh
2-Hacid_dh_C
2-oxoacid_dh
2-ph_phosp
2CSK_N
2C_adapt
2Fe-2S_Ferredox
2H-phosphodiest
2HCT
2OG-FeII_Oxy
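Applying a set lookup to the two example lists shows why line position should not matter. This sketch assumes the second list is an excerpt of the database file and the first list comes from the organism file; that role assignment is a guess from the data. Note also that "14-03-2003" in the second list looks like "14-3-3" (from the first list) converted to a date, for example by a spreadsheet program; if so, no exact-match comparison will pair them and the source file would need correcting first.

```python
# Lists copied from the example in the message.
# Assumption: the second list is the database excerpt, the first is the organism.
organism = [
    "1-cysPrx_C", "14-3-3", "2-Hacid_dh", "2-Hacid_dh_C", "2-oxoacid_dh",
    "2H-phosphodiest", "2OG-FeII_Oxy", "2OG-FeII_Oxy_3", "2OG-FeII_Oxy_4",
    "2OG-FeII_Oxy_5", "2OG-Fe_Oxy_2", "2TM", "2_5_RNA_ligase2",
]
database = [
    "1-cysPrx_C", "120_Rick_ant", "14-03-2003", "2-Hacid_dh", "2-Hacid_dh_C",
    "2-oxoacid_dh", "2-ph_phosp", "2CSK_N", "2C_adapt", "2Fe-2S_Ferredox",
    "2H-phosphodiest", "2HCT", "2OG-FeII_Oxy",
]

# Comparing line by line only counts names that happen to share a position.
same_line_matches = sum(a == b for a, b in zip(database, organism))

# A set lookup finds a name wherever it appears in the organism list.
organism_set = set(organism)
presence = {name: int(name in organism_set) for name in database}

print("matches on the same line:", same_line_matches)  # 1
print("database entries found:", sum(presence.values()))  # 6
for name in database:
    print(presence[name], name)
```

Only "1-cysPrx_C" lines up by position, while the set lookup finds six shared names, including "2H-phosphodiest" and "2OG-FeII_Oxy", which sit on different lines in the two lists.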

Does anyone have an idea of how I could do that? Thanks in advance.

