[Biopython] Parsing FASTA records based on headers
Fabio Gori
gori at cs.ru.nl
Thu Jul 14 11:21:31 UTC 2011
The condition "if record.id in SelectedSequencesId" works fine now, thank you.
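
For the archive, here is a minimal sketch of the kind of filter discussed in this thread. It is not necessarily the exact script I ran. It relies on the point Peter makes below: SeqIO already uses the first word of the ">" line as record.id, so no split() is needed. The function names filter_by_id and filter_fasta and the file names in the usage note are placeholders made up for illustration; only SeqIO.parse, SeqIO.write and record.id come from Biopython.

```python
def filter_by_id(records, selected_ids):
    """Yield only the records whose .id is in selected_ids.

    `records` can be any iterable of objects with an `id` attribute,
    for example the SeqRecord objects produced by SeqIO.parse.
    """
    # A set makes each membership test fast, which matters for long ID lists.
    wanted = set(selected_ids)
    return (record for record in records if record.id in wanted)


def filter_fasta(in_path, out_path, selected_ids):
    """Copy the selected records from one FASTA file to another.

    Hypothetical helper; returns the number of records written.
    """
    # Imported here so that filter_by_id can be used without Biopython.
    from Bio import SeqIO

    records = SeqIO.parse(in_path, "fasta")  # note the format is "fasta"
    return SeqIO.write(filter_by_id(records, selected_ids), out_path, "fasta")
```

With placeholder file names, a call would look like filter_fasta("input.fasta", "selected.fasta", SelectedSequencesId). Because the filter is a generator, the records are streamed rather than loaded into memory all at once.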
Fabio
On Monday, July 11, 2011 09:51:09 pm Peter Cock wrote:
> On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori <gori at cs.ru.nl> wrote:
> > Hi all,
> >
> > I tried to parse a FASTA file to select the sequences whose headers
> > satisfy a condition.
> >
> > The condition is that the first word of the header belongs to a list
> > named SelectedSequencesId.
> >
> > In the page http://biopython.org/wiki/SeqIO, I found this example, where
> > the condition is that sequence length <300:
> >
> > ...
> >
> > so I tried to substitute line 5 with
> > 5 record.id.split()[0] in SelectedSequencesId)
>
> The SeqIO parser uses the first word of the ">" line as the id,
> so all you need is this: record.id in SelectedSequencesId
> rather than: len(record.seq) < 300
>
> > But it did not work.
>
> In what way? Did you also change the format to "fasta"
> as Dorota pointed out?
>
> Peter
--
F. Gori, PhD student
Intelligent Systems
ICIS (Institute for Computing and Information Sciences)
Radboud University Nijmegen
Post Address:
Intelligent Systems
Postbus 9010
6500 GL Nijmegen
The Netherlands
Visiting Address:
Room HG02.517
Faculty of Science
Heyendaalseweg 135
6525 AJ Nijmegen
Tel.: +31 (0)24 36 52703
E-mail: gori at cs.ru.nl
Home Page: http://www.cs.ru.nl/~gori/