[Biopython] Parsing FASTA records based on headers

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 11 19:51:09 UTC 2011


On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori <gori at cs.ru.nl> wrote:
> Hi all,
>
> I tried to parse a FASTA file to select the sequences whose headers
> satisfy a condition.
>
> The condition is that the first word of the header belongs to a list
> named SelectedSequencesId.
>
> In the page http://biopython.org/wiki/SeqIO, I found this example, where the
> condition is that sequence length <300:
>
> ...
>
> so I tried to substitute line 5 with
> 5 record.id.split()[0] in SelectedSequencesId)

SeqIO.parse uses the first word of the ">" line as the record's id
(so calling .split()[0] on it is redundant, since the id contains no
whitespace). All you need is this: record.id in SelectedSequencesId
rather than: len(record.seq) < 300
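
For illustration, here is a minimal sketch of that filter. The FASTA
text and the contents of SelectedSequencesId are made up for this
example, and it reads from an in-memory string rather than your file;
with a real file you would pass the filename or handle to SeqIO.parse
instead.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA data, standing in for the real input file.
fasta_text = """>seq1 first example record
ACGTACGT
>seq2 second example record
GGGTTTAA
>seq3 third example record
TTTTCCCC
"""

# Hypothetical list of wanted identifiers (first word of each header).
SelectedSequencesId = ["seq1", "seq3"]

# A set makes the membership test fast when the list is long.
wanted_ids = set(SelectedSequencesId)

# Note the format is "fasta"; record.id is already the first word
# of the ">" line, so no extra splitting is needed.
selected = [
    record
    for record in SeqIO.parse(StringIO(fasta_text), "fasta")
    if record.id in wanted_ids
]

for record in selected:
    print(record.id, len(record.seq))
```

The selected records could then be written out with SeqIO.write if
you need them as a new FASTA file.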

> But it did not work.

In what way did it fail? Did you also change the format argument
to "fasta", as Dorota pointed out?

Peter


