[Biopython] Problem finding an accession number in a tab-delimited file
Fernando
fpiston at gmail.com
Tue Dec 11 07:06:30 UTC 2012
Hi,
I'm sorry, the code has an error. The real problem appears in the
following lines:
    if h in ls:
        print h
        print 'ok'
I cannot find the hit ('h') in the lines of the file 'fh'.
I am going to try with a dictionary as indicated by Lenna.
Thanks for your answers.
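
A minimal sketch of the dictionary approach, under a few assumptions: the
function name load_go_terms and the in-memory sample are only illustrative
(io.StringIO stands in for the real annotation file), and the hit definition
is assumed to be exactly the accession string. It reads the file once,
keeps every GO term under its accession, and then looks hits up without
touching the handle again.

```python
from collections import defaultdict
import io


def load_go_terms(handle):
    """Map each accession to the list of GO terms found in `handle`.

    Hypothetical helper, not part of Biopython.
    """
    go_terms = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession, go_id = fields[0], fields[1]
        go_terms[accession].append(go_id)
    return go_terms


# Two sample lines in the same layout as the annotation file
# (accession number, GO term, score).
sample = io.StringIO(
    u"1f0ba1d119f52ff28e907d2b5ea450db\tGO:0007154\t79\n"
    u"1f0ba1d119f52ff28e907d2b5ea450db\tGO:0005605\t99\n"
)
go_terms = load_go_terms(sample)

# Stands in for alignment.hit_def (assumed to hold only the accession).
h = "1f0ba1d119f52ff28e907d2b5ea450db"

# .get() returns an empty list for unknown accessions and does not
# add a new key to the dictionary.
found = go_terms.get(h, [])
print("\t".join(found))
```

Because the dictionary stays in memory, the lookup can be repeated for every
hit, whereas a second `for l in fh` loop over an already-read handle yields
nothing.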
Lenna Peterson <arklenna at gmail.com> writes:
> Hi Fernando,
>
>
> A filehandle only reads through a file once, so the second time
> through the loop, `fh` is, as you found, empty.
>
> I would suggest reading the entirety of that file into a list-like
> object, which will be persistent. You might also consider using the
> accession number as a dictionary key and appending the GO numbers to a
> list value.
>
> Cheers,
>
> Lenna
>
>
> On Mon, Dec 10, 2012 at 3:07 PM, Fernando <fpiston at gmail.com> wrote:
>
> Hello everybody,
>
> I'm trying to perform a GO annotation using the SIMAP database, which is
> Blast2GO annotated. Everything is fine, but I have problems when I try
> to find the accession number in the file where entry numbers are
> associated with their GOs. The problem is that the script does not find
> the number in the input file even though it is really there. I tried
> several things without good results (re.match, inserting into a list and
> then extracting the element, etc.).
>
> The file where the GOs are associated with entry numbers has this
> structure (accession number, GO term, Blast2GO score):
> 1f0ba1d119f52ff28e907d2b5ea450db GO:0007154 79
> 1f0ba1d119f52ff28e907d2b5ea450db GO:0005605 99
>
> The Python code:
> #!/usr/bin/env python
> import re
> from Bio.Blast import NCBIXML
> from Bio import SeqIO
>
> input_file = open('/home/fpiston/Desktop/test_go/test2.fasta', 'rU')
> result_handle = open('/home/fpiston/Desktop/test_go/test2.xml', 'rU')
> save_file = open('/home/fpiston/Desktop/test_go/test2.out', 'w')
>
> fh = open('/home/fpiston/Desktop/test_go/Os_Bd_Ta_blat2go_fake', 'rU')
> q_dict = SeqIO.to_dict(SeqIO.parse(input_file, "fasta"))
> blast_records = NCBIXML.parse(result_handle)
>
> hits = []
>
> for blast_record in blast_records:
>     if blast_record.alignments:
>         list = (blast_record.query).split()
>         if re.match('ENA|\w*|\w*', list[0]) != None:
>             list2 = list[0].split("|")
>             save_file.write('%s\t' % list2[1])
>         else:
>             save_file.write('%s\t' % list[0])
>         for alignment in blast_record.alignments:
>             for hsp in alignment.hsps:
>                 h = alignment.hit_def  # at this point all right
>                 for l in fh:  # here, 'l' is not found in 'fh'
>                     ls = l.split()
>                     if h in ls:
>                         print h
>                         print 'ok'
>                         save_file.write('%s\t' % ls[1])
>         save_file.write('\n')
>         hits.append(blast_record.query.split()[0])
> misses = set(q_dict.keys()) - set(hits)
>
> for i in misses:
>     list = i.split("|")
>     if len(list) > 1:
>         save_file.write('%s\t' % list[1])
>     else:
>         save_file.write('%s\t' % list)
>     save_file.write('%s\n' % 'no_match')
>
> save_file.close()
>
>
>
> Fernando
> --
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
>
Fernando