[Biopython] problem to find an accession number in tab delimited file

Lenna Peterson arklenna at gmail.com
Tue Dec 11 03:14:38 UTC 2012


Hi Fernando,

A filehandle only reads through a file once unless you rewind it. The first
pass of `for l in fh` consumes every line, so the second time through the
loop, `fh` is, as you found, empty.
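
To illustrate, here is a minimal sketch of that behaviour. It is written for
Python 3 and uses `io.StringIO` as a stand-in for your open file; the two
data lines are copied from your example, and the variable names are my own.

```python
import io

# Stand-in for the open GO file (two lines copied from the example data).
fh = io.StringIO(
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0007154\t79\n"
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0005605\t99\n"
)

first_pass = [line for line in fh]   # consumes the handle
second_pass = [line for line in fh]  # nothing left to read

fh.seek(0)                           # rewind to the start of the file
third_pass = [line for line in fh]   # lines are available again

print(len(first_pass), len(second_pass), len(third_pass))
```

Calling `fh.seek(0)` before each inner loop would work, but it re-reads the
whole file for every HSP, which gets slow for a large annotation file.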

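I would suggest reading the entire file once, before the loop, into a
list-like object. Unlike the filehandle, that object is persistent and can
be iterated as many times as you like. You might also consider using the
accession number as a dictionary key and appending the GO numbers to a list
value, so that each hit becomes a single lookup.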
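
Something along these lines might work for the dictionary approach. It is
only a sketch, written for Python 3: `load_go_terms` and `go_terms` are names
I made up, and it assumes the accession is always the first
whitespace-separated column, the GO term the second, and that
`alignment.hit_def` matches the accession exactly.

```python
import io
from collections import defaultdict


def load_go_terms(handle):
    """Map each accession to the list of GO terms found in the file."""
    go_terms = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession, go_id = fields[0], fields[1]
        go_terms[accession].append(go_id)
    return go_terms


# Stand-in for open('.../Os_Bd_Ta_blat2go_fake'); data from the example.
example = io.StringIO(
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0007154\t79\n"
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0005605\t99\n"
)
go_terms = load_go_terms(example)

# Inside the BLAST loop, `hit` would be alignment.hit_def (assumption).
hit = "1f0ba1d119f52ff28e907d2b5ea450db"
found = go_terms.get(hit, [])      # empty list when there is no annotation
print("\t".join(found))
```

Build the dictionary once, before `for blast_record in blast_records:`, and
replace the inner `for l in fh:` loop with the `go_terms.get(...)` lookup.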

Cheers,

Lenna


On Mon, Dec 10, 2012 at 3:07 PM, Fernando <fpiston at gmail.com> wrote:

> Hello everybody,
>
> I'm trying to perform a GO annotation using the SIMAP database, which is
> Blast2GO annotated. Everything is fine, but I have problems when I try
> to find the accession number in the file where entry numbers are
> associated with their GO terms. The problem is that the script does not
> find the number in the input file even though it really is there. I tried
> several things without good results (re.match, inserting into a list and
> then extracting the element, etc.).
> The file where the GO terms are associated with entry numbers has this
> structure (accession number, GO term, Blast2GO score):
> 1f0ba1d119f52ff28e907d2b5ea450db        GO:0007154      79
> 1f0ba1d119f52ff28e907d2b5ea450db        GO:0005605      99
>
> The python code:
> #!/usr/bin/env python
> import re
> from Bio.Blast import NCBIXML
> from Bio import SeqIO
>
> input_file = open('/home/fpiston/Desktop/test_go/test2.fasta', 'rU')
> result_handle = open('/home/fpiston/Desktop/test_go/test2.xml', 'rU')
> save_file = open('/home/fpiston/Desktop/test_go/test2.out', 'w')
>
> fh = open('/home/fpiston/Desktop/test_go/Os_Bd_Ta_blat2go_fake', 'rU')
> q_dict =  SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
> blast_records = NCBIXML.parse(result_handle)
>
> hits = []
>
> for blast_record in blast_records:
>     if blast_record.alignments:
>         list = (blast_record.query).split()
>         if re.match('ENA|\w*|\w*', list[0]) != None:
>             list2 = list[0].split("|")
>             save_file.write('%s\t' % list2[1])
>         else:
>             save_file.write('%s\t' % list[0])
>         for alignment in blast_record.alignments:
>             for hsp in alignment.hsps:
>                 h = alignment.hit_def    #at this point all right
>                 for l in fh:             #here, 'l' is not found in 'fh'
>                     ls = l.split()
>                     if h in ls:
>                         print h
>                         print 'ok'
>                         save_file.write('%s\t' % ls[1])
>                 save_file.write('\n')
>         hits.append(blast_record.query.split()[0])
> misses =set(q_dict.keys()) - set(hits)
>
> for i in misses:
>     list = i.split("|")
>     if len(list) > 1:
>         save_file.write('%s\t' % list[1])
>     else:
>         save_file.write('%s\t' % list)
>     save_file.write('%s\n' % 'no_match')
>
> save_file.close()
>
>
>
>  Fernando