[Biopython] Problem finding an accession number in a tab-delimited file
Lenna Peterson
arklenna at gmail.com
Tue Dec 11 03:14:38 UTC 2012
Hi Fernando,
A file handle can only be iterated through once. After the first pass it is
exhausted, so the second time through the loop, `fh` is, as you found, empty.
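To make the behaviour concrete, here is a minimal sketch written for Python 3. The temporary file and its two lines are stand-ins I made up for illustration; they are not your real data. It shows that a second pass over the same handle yields nothing unless the handle is rewound with `seek(0)`.

```python
import os
import tempfile

# Write a small stand-in file (hypothetical contents mirroring the GO table).
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("acc1\tGO:0007154\t79\n")
    tmp.write("acc1\tGO:0005605\t99\n")
    path = tmp.name

with open(path) as fh:
    first_pass = [line for line in fh]   # consumes the handle
    second_pass = [line for line in fh]  # handle is already at end of file
    fh.seek(0)                           # rewind to the start of the file
    third_pass = [line for line in fh]   # lines are available again

os.remove(path)
print(len(first_pass), len(second_pass), len(third_pass))
```

Rewinding works, but it rereads the whole file from disk for every hit, which is why I would rather load the file once.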
I would suggest reading the entirety of that file into a list-like object,
which will be persistent. You might also consider using the accession
number as a dictionary key and appending the GO numbers to a list value.
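A rough sketch of the dictionary idea, written for Python 3. The function name `load_go_table` and the variable `go_terms` are hypothetical names of mine, and I am assuming the accession number is the first whitespace-separated field and the GO term the second, as in the two sample lines you posted.

```python
from collections import defaultdict
import io


def load_go_table(handle):
    """Map each accession number to the list of GO terms on its lines."""
    go_terms = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession, go_id = fields[0], fields[1]
        go_terms[accession].append(go_id)
    return go_terms


# Stand-in for the real file, using the two sample lines from the post.
sample = io.StringIO(
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0007154\t79\n"
    "1f0ba1d119f52ff28e907d2b5ea450db\tGO:0005605\t99\n"
)
go_terms = load_go_table(sample)

# .get() avoids creating empty entries for accessions that are absent.
hits = go_terms.get("1f0ba1d119f52ff28e907d2b5ea450db", [])
print(hits)
```

In your script, the table would be built once before the BLAST loop, and the inner `for l in fh:` loop would become a single lookup such as `go_terms.get(h, [])`. That assumes `alignment.hit_def` is exactly the accession string, which you would need to check against your data.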
Cheers,
Lenna
On Mon, Dec 10, 2012 at 3:07 PM, Fernando <fpiston at gmail.com> wrote:
> Hello everybody,
>
> I'm trying to perform GO annotation using the SIMAP database, which is
> Blast2GO annotated. Everything works, except that I have problems when I
> try to find the accession number in the file where entry numbers are
> associated with their GO terms. The script does not find the number in
> the input file even though it is there. I tried several things without
> success (re.match, inserting into a list and then extracting the
> element, etc.).
> The file that associates GO terms with entry numbers has this structure
> (accession number, GO term, Blast2GO score):
> 1f0ba1d119f52ff28e907d2b5ea450db GO:0007154 79
> 1f0ba1d119f52ff28e907d2b5ea450db GO:0005605 99
>
> The python code:
> #!/usr/bin/env python
> import re
> from Bio.Blast import NCBIXML
> from Bio import SeqIO
>
> input_file = open('/home/fpiston/Desktop/test_go/test2.fasta', 'rU')
> result_handle = open('/home/fpiston/Desktop/test_go/test2.xml', 'rU')
> save_file = open('/home/fpiston/Desktop/test_go/test2.out', 'w')
>
> fh = open('/home/fpiston/Desktop/test_go/Os_Bd_Ta_blat2go_fake', 'rU')
> q_dict = SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
> blast_records = NCBIXML.parse(result_handle)
>
> hits = []
>
> for blast_record in blast_records:
>     if blast_record.alignments:
>         list = (blast_record.query).split()
>         if re.match('ENA|\w*|\w*', list[0]) != None:
>             list2 = list[0].split("|")
>             save_file.write('%s\t' % list2[1])
>         else:
>             save_file.write('%s\t' % list[0])
>         for alignment in blast_record.alignments:
>             for hsp in alignment.hsps:
>                 h = alignment.hit_def  # at this point all is right
>                 for l in fh:  # here, 'l' is not found in 'fh'
>                     ls = l.split()
>                     if h in ls:
>                         print h
>                         print 'ok'
>                         save_file.write('%s\t' % ls[1])
>         save_file.write('\n')
>         hits.append(blast_record.query.split()[0])
> misses = set(q_dict.keys()) - set(hits)
>
> for i in misses:
>     list = i.split("|")
>     if len(list) > 1:
>         save_file.write('%s\t' % list[1])
>     else:
>         save_file.write('%s\t' % list)
>     save_file.write('%s\n' % 'no_match')
>
> save_file.close()
>
>
>
> Fernando