[Biopython] Parsing FASTA records based on headers

Thu Jul 14 11:21:31 UTC 2011

The condition "if record.id in SelectedSequencesId" works fine now, thank you.
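For anyone finding this thread later, here is a minimal sketch of the working filter. It combines Peter's two suggestions: test record.id directly and parse with the "fasta" format. It assumes Biopython is installed. The FASTA text and the ids in SelectedSequencesId are made up for illustration. With a real file you would pass its path to SeqIO.parse instead of a StringIO.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical input; with a real file, pass its path to SeqIO.parse instead.
fasta_text = """>seq1 first example sequence
ACGTACGT
>seq2 second example sequence
GGGGCCCC
>seq3 third example sequence
TTTTAAAA
"""

# Hypothetical list of wanted ids (the first word of each header).
SelectedSequencesId = ["seq1", "seq3"]
wanted = set(SelectedSequencesId)  # a set makes each membership test fast

# Generator expression: keep only records whose id (the first word of the
# ">" line) is in the wanted set. Note the format name "fasta".
selected = (record for record in SeqIO.parse(StringIO(fasta_text), "fasta")
            if record.id in wanted)

output_handle = StringIO()
count = SeqIO.write(selected, output_handle, "fasta")  # returns records written
print(count)  # 2
```

Converting the list to a set is optional. It only matters when the id list is long, because each `in` test on a list scans the whole list.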

Fabio

On Monday, July 11, 2011 09:51:09 pm Peter Cock wrote:
> On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori <gori at cs.ru.nl> wrote:
> > Hi all,
> > 
> > I tried to parse a FASTA file to select the sequences whose headers
> > satisfy a condition.
> > 
> > The condition is that the first word of the header belongs to a list
> > named SelectedSequencesId.
> > 
> > In the page http://biopython.org/wiki/SeqIO, I found this example, where
> > the condition is that sequence length <300:
> > 
> > ...
> > 
> > so I tried replacing line 5 with
> > 5 record.id.split()[0] in SelectedSequencesId)
> 
> The SeqIO FASTA parser uses the first word of the ">" line as the record id,
> so all you need is this: record.id in SelectedSequencesId
> rather than: len(record.seq) < 300
> 
> > But it did not work.
> 
> In what way? Did you also change the format to "fasta"
> as Dorota pointed out?
> 
> Peter
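Peter's point can be checked with a small example. When parsing with the "fasta" format, SeqIO already stores the first word of the ">" line as record.id, so calling .split()[0] on it is redundant. This is a minimal sketch that assumes Biopython is installed. The header text is hypothetical.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical single-record FASTA input.
handle = StringIO(">seq1 some free-text description\nACGT\n")
record = next(SeqIO.parse(handle, "fasta"))

print(record.id)           # seq1
print(record.description)  # seq1 some free-text description
# Splitting the id again changes nothing: it contains no whitespace.
print(record.id.split()[0] == record.id)  # True
```

The full header line remains available as record.description. A condition on later words of the header would therefore need that attribute rather than record.id.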

-- 

F. Gori, PhD student
Intelligent Systems
ICIS (Institute for Computing and Information Sciences)
Radboud University Nijmegen

Post Address:
    Intelligent Systems
    Postbus 9010 
    6500 GL  Nijmegen 
    The Netherlands 
Visiting Address:
    Room HG02.517 
    Faculty of Science 
    Heyendaalseweg 135
    6525 AJ  Nijmegen 
Tel.:   +31 (0)24 36 52703
E-mail: gori at cs.ru.nl
Home Page: http://www.cs.ru.nl/~gori/