[Biopython] Problem finding an accession number in a tab-delimited file

Tue Dec 11 07:06:30 UTC 2012

Hi,
I'm sorry, the code has an error. The problem actually appears in the
following lines:
                         if h in ls:
                             print(h)
                             print('ok')
I cannot find the hit ('h') when reading the file 'fh' line by line.
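The test fails because looping over a file handle consumes it: the first `for l in fh` loop reads to the end of the file, so only the first HSP ever sees any lines and every later loop over `fh` gets nothing. A minimal sketch that reproduces this, assuming an in-memory `io.StringIO` object as a stand-in for the real GO file (the two lines are the example lines from the original message), and showing that reading the lines into a list first makes them reusable:

```python
import io

# In-memory stand-in for the real tab-delimited GO file (example lines from the thread).
data = (
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0007154\t79\n"
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0005605\t99\n"
)

def count_matches(lines, hit):
    """Count lines whose whitespace-separated fields contain `hit`."""
    return sum(1 for line in lines if hit in line.split())

hit = "1f0ba1d119f52ff28e907d2b5ea450db"

# Iterating the handle directly: the first pass reads to EOF, later passes see nothing.
fh = io.StringIO(data)
first_pass = count_matches(fh, hit)
second_pass = count_matches(fh, hit)

# Reading the lines into a list once: the list can be iterated as often as needed.
lines = io.StringIO(data).readlines()
list_first = count_matches(lines, hit)
list_second = count_matches(lines, hit)

print(first_pass, second_pass)   # 2 0
print(list_first, list_second)   # 2 2
```

With the real file, calling `fh.readlines()` once before the BLAST loop has the same effect; `fh.seek(0)` before each inner loop would also work, but it rescans the whole file for every HSP.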

I am going to try a dictionary, as Lenna suggested.
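Following Lenna's suggestion, the GO file can be read once into a dictionary keyed by accession number, with the list of GO terms as the value, so each BLAST hit becomes a single lookup instead of a scan of the file. This is a sketch under assumptions: the columns are whitespace-separated as in the example lines from the original message (accession, GO term, score), `alignment.hit_def` holds exactly the accession string (as the script's comment suggests), and `load_go_terms` is a hypothetical helper name; an in-memory file stands in for `Os_Bd_Ta_blat2go_fake`.

```python
import io
from collections import defaultdict

def load_go_terms(handle):
    """Map each accession number to the list of GO terms listed for it.

    Assumes whitespace-separated columns: accession, GO term, score.
    Blank or too-short lines are skipped.
    """
    go_terms = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue
        go_terms[fields[0]].append(fields[1])
    return go_terms

# In-memory stand-in for the real GO file (example lines from the thread).
go_file = io.StringIO(
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0007154\t79\n"
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0005605\t99\n"
)
go_terms = load_go_terms(go_file)

# Inside the BLAST loop, h = alignment.hit_def would be looked up like this:
h = "1f0ba1d119f52ff28e907d2b5ea450db"
print(go_terms.get(h, []))              # ['GO:0007154', 'GO:0005605']
print(go_terms.get("not_in_file", []))  # []
```

In the original script, the inner `for l in fh:` loop could then be replaced by a loop over `go_terms.get(h, [])` that writes each GO term to `save_file`. Using `.get` with a default, rather than indexing, avoids adding empty entries to the `defaultdict` for hits that have no GO terms.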

Thanks for your answers.

Lenna Peterson <arklenna at gmail.com> writes:

> Hi Fernando, 
>
>
> A filehandle only reads through a file once, so the second time
> through the loop, `fh` is, as you found, empty. 
>
> I would suggest reading the entirety of that file into a list-like
> object, which will be persistent. You might also consider using the
> accession number as a dictionary key and appending the GO numbers to a
> list value. 
>
> Cheers, 
>
> Lenna
>
>
> On Mon, Dec 10, 2012 at 3:07 PM, Fernando <fpiston at gmail.com> wrote:
>
>     Hello everybody,
>     
>     I'm trying to perform a GO annotation using the SIMAP database,
>     which is Blast2GO-annotated. Everything works, but I have problems
>     when I try to find the accession number in the file where entry
>     numbers are associated with their GO terms. The script does not
>     find the number in the input file even though it is really there.
>     I tried several things without good results (re.match, inserting
>     into a list and then extracting the element, etc.).
>     The file where the GO terms are associated with entry numbers has
>     this structure (accession number, GO term, Blast2GO score):
>     1f0ba1d119f52ff28e907d2b5ea450db        GO:0007154      79
>     1f0ba1d119f52ff28e907d2b5ea450db        GO:0005605      99
>     
>     The python code:
>     #!/usr/bin/env python
>     import re
>     from Bio.Blast import NCBIXML
>     from Bio import SeqIO
>     
>     input_file = open('/home/fpiston/Desktop/test_go/test2.fasta', 'r')
>     result_handle = open('/home/fpiston/Desktop/test_go/test2.xml', 'r')
>     save_file = open('/home/fpiston/Desktop/test_go/test2.out', 'w')
>     
>     fh = open('/home/fpiston/Desktop/test_go/Os_Bd_Ta_blat2go_fake', 'r')
>     q_dict = SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
>     blast_records = NCBIXML.parse(result_handle)
>     
>     hits = []
>     
>     for blast_record in blast_records:
>         if blast_record.alignments:
>             fields = blast_record.query.split()
>             # escape '|': unescaped it means 'or', so the match never fails
>             if re.match(r'ENA\|\w*\|\w*', fields[0]) is not None:
>                 ids = fields[0].split("|")
>                 save_file.write('%s\t' % ids[1])
>             else:
>                 save_file.write('%s\t' % fields[0])
>             for alignment in blast_record.alignments:
>                 for hsp in alignment.hsps:
>                     h = alignment.hit_def    # at this point all right
>                     for l in fh:             # here, 'h' is not found in 'fh'
>                         ls = l.split()
>                         if h in ls:
>                             print(h)
>                             print('ok')
>                             save_file.write('%s\t' % ls[1])
>                     save_file.write('\n')
>             hits.append(blast_record.query.split()[0])
>     misses = set(q_dict.keys()) - set(hits)
>     
>     for i in misses:
>         ids = i.split("|")
>         if len(ids) > 1:
>             save_file.write('%s\t' % ids[1])
>         else:
>             save_file.write('%s\t' % ids[0])
>         save_file.write('%s\n' % 'no_match')
>     
>     save_file.close()
>     
>     
>     
>      Fernando
>      --
>     _______________________________________________
>     Biopython mailing list  -  Biopython at lists.open-bio.org
>     http://lists.open-bio.org/mailman/listinfo/biopython
>     
>

 Fernando
 --