[Biopython] Parsing FASTA records based on headers

Dorota Matelska surykartka at gmail.com
Mon Jul 11 17:02:51 UTC 2011


Hi Fabio,

You also need to change the format name passed to SeqIO.parse(). Your input file is in FASTA format, so replace "genbank" with "fasta" and it should work.
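
For reference, here is a minimal sketch of how the corrected filter might look. The helper names select_records and filter_fasta are made up for illustration, and the sketch assumes a Biopython version whose SeqIO.parse() and SeqIO.write() accept file names as well as handles. For FASTA input, record.id should already be the first word of the header, so no split() is needed.

```python
def select_records(records, selected_ids):
    """Lazily yield the records whose ID is in selected_ids."""
    wanted = set(selected_ids)  # set lookup avoids rescanning a list per record
    # Generator expression: records are filtered one at a time, on demand.
    return (record for record in records if record.id in wanted)


def filter_fasta(in_path, out_path, selected_ids):
    """Copy the selected FASTA records from in_path to out_path."""
    # Imported here so select_records() stays usable without Biopython.
    from Bio import SeqIO

    records = SeqIO.parse(in_path, "fasta")  # "fasta", not "genbank"
    # SeqIO.write() returns the number of records written.
    return SeqIO.write(select_records(records, selected_ids), out_path, "fasta")
```

Calling filter_fasta("input.fasta", "selected.fasta", SelectedSequencesId), with placeholder file names, would then write only the matching records. Because the generator expression is lazy, the whole file never has to be held in memory.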

Hope this helps :-)

Dorota

On Jul 11, 2011, at 6:07 PM, Fabio Gori wrote:

> Hi all,
> 
> I tried to parse a FASTA file to select the sequences whose headers satisfy a 
> condition. The condition is that the first word of the header belongs to a list 
> named SelectedSequencesId.
> On the page http://biopython.org/wiki/SeqIO, I found this example, where the 
> condition is that the sequence length is < 300:
> 
> 1 from Bio import SeqIO
> 2 
> 3 input_seq_iterator = SeqIO.parse(open("cor6_6.gb", "rU"), "genbank")
> 4 short_seq_iterator = (record for record in input_seq_iterator \
> 5                      if len(record.seq) < 300)
> 6 
> 7 output_handle = open("short_seqs.fasta", "w")
> 8 SeqIO.write(short_seq_iterator, output_handle, "fasta")
> 9 output_handle.close()
> 
> so I tried to substitute line 5 with
> 5 record.id.split()[0] in SelectedSequencesId)
> 
> But it did not work.
> I was able to get what I wanted by building a list of all the records and 
> then filtering it, but I'd like to find a solution that uses a generator 
> expression.
> 
> Thanks in advance,
> 
> Fabio
> 
> -- 
> 
> F. Gori, PhD student
> Intelligent Systems
> ICIS (Institute for Computing and Information Sciences)
> Radboud University Nijmegen
> 
> Home Page: http://www.cs.ru.nl/~gori/
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython




