[Biopython] Parsing FASTA records based on headers
Fabio Gori
gori at cs.ru.nl
Mon Jul 11 16:07:59 UTC 2011
Hi all,
I tried to parse a FASTA file to select the sequences whose headers satisfy a
condition. The condition is that the first word of the header belongs to a list
named SelectedSequencesId.
On the page http://biopython.org/wiki/SeqIO, I found this example, where the
condition is that the sequence length is less than 300:
1 from Bio import SeqIO
2
3 input_seq_iterator = SeqIO.parse(open("cor6_6.gb", "rU"), "genbank")
4 short_seq_iterator = (record for record in input_seq_iterator \
5                       if len(record.seq) < 300)
6
7 output_handle = open("short_seqs.fasta", "w")
8 SeqIO.write(short_seq_iterator, output_handle, "fasta")
9 output_handle.close()
so I tried replacing line 5 with
5 record.id.split()[0] in SelectedSequencesId)
But it did not work.
I was able to get what I wanted by building a list of all the records and
then filtering it, but I would like to find a solution that uses a generator
expression.
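For reference, the kind of generator-expression filter I have in mind could be
sketched roughly as below. This is only a minimal sketch under some
assumptions: the names select_by_first_word, write_selected and selected_ids
are made up for illustration (selected_ids plays the role of
SelectedSequencesId), and it relies on the fact that for FASTA input Biopython
stores the first word of the header in record.id, so no split() should be
needed. The stand-in records at the end only exist so the filter can be tried
without a FASTA file.

```python
from collections import namedtuple


def select_by_first_word(records, selected_ids):
    """Lazily yield records whose header's first word is in selected_ids."""
    wanted = set(selected_ids)  # a set makes each membership test cheap
    # For FASTA input, record.id is already the first word of the header.
    # The condition must be introduced by "if" inside the generator expression.
    return (record for record in records if record.id in wanted)


def write_selected(in_path, out_path, selected_ids):
    """Hypothetical helper: filter a FASTA file and return the number written."""
    # Imported here so the filter above can be tried without Biopython.
    from Bio import SeqIO

    records = SeqIO.parse(in_path, "fasta")
    selected = select_by_first_word(records, selected_ids)
    return SeqIO.write(selected, out_path, "fasta")


# Stand-in records (not real SeqRecord objects) for a quick check.
FakeRecord = namedtuple("FakeRecord", ["id", "description"])
demo_records = [
    FakeRecord("seq1", "seq1 first example header"),
    FakeRecord("seq2", "seq2 second example header"),
    FakeRecord("seq3", "seq3 third example header"),
]
kept_ids = [r.id for r in select_by_first_word(demo_records, ["seq1", "seq3"])]
```

With a real file, the call would be something like
write_selected("input.fasta", "selected.fasta", SelectedSequencesId), where
both file names are placeholders.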
Thanks in advance,
Fabio
--
F. Gori, PhD student
Intelligent Systems
ICIS (Institute for Computing and Information Sciences)
Radboud University Nijmegen
Home Page: http://www.cs.ru.nl/~gori/