[Biopython] Retrieving fasta seqs

Peter biopython at maubp.freeserve.co.uk
Tue Feb 2 13:49:29 UTC 2010


On Tue, Feb 2, 2010 at 1:09 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>
> Finally, iterate through the large FASTA file, and write records of
> interest:
>
> sec = open(sys.argv[1], 'r')
> for rec in SeqIO.parse(sec, "fasta"):
>    if rec.id in listita:
>        SeqIO.write([rec], out_handle, "fasta")
>

Or, once you have read about generator expressions,
this version might seem nicer - but perhaps a bit too
complicated for a beginner:

records = SeqIO.parse(open(sys.argv[1], 'r'), "fasta")
wanted = (rec for rec in records if rec.id in listita)
SeqIO.write(wanted, out_handle, "fasta")
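
For anyone new to generator expressions, here is a minimal sketch of the same filtering pattern in plain Python, with no Biopython needed. The (id, sequence) tuples and the names fake_records and wanted_ids are made up for illustration. The point is only that nothing is filtered until something consumes the generator, so the whole file never has to sit in memory at once:

```python
# Hypothetical stand-ins for parsed records: (id, sequence) pairs.
fake_records = iter([("seq1", "ACGT"), ("seq2", "GGCC"), ("seq3", "TTAA")])

# A set gives fast membership tests (a list works too, just slower).
wanted_ids = {"seq1", "seq3"}

# Nothing is read from fake_records yet; this only describes the filter.
wanted = (rec for rec in fake_records if rec[0] in wanted_ids)

# Records are pulled through the filter one at a time as the loop runs.
selected = [rec for rec in wanted]
print(selected)
```

In the real script, SeqIO.write() plays the role of the consuming loop.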

Another alternative, which could be quicker to run
depending on the size of the file and the proportion
of records wanted, would be to use the
Bio.SeqIO.index() function to look up just the desired
records in the FASTA input file by their identifiers.
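
A rough sketch of that index-based approach, for illustration only. The helper names pick_records and extract_with_index are made up here, wanted_ids plays the role of listita from the quoted code, and it assumes a Biopython release that includes Bio.SeqIO.index(). The index keeps track of where each record sits in the file, so only the wanted records get parsed:

```python
def pick_records(index, wanted_ids):
    """Yield index[key] for each wanted id that is present, in wanted order."""
    for key in wanted_ids:
        if key in index:
            yield index[key]


def extract_with_index(fasta_path, wanted_ids, out_path):
    """Write the wanted records from fasta_path to out_path; return the count."""
    # Imported here so pick_records can also be tried out on a plain dict.
    from Bio import SeqIO

    # SeqIO.index scans the file once and gives a dictionary-like object
    # keyed by record id; a record is only parsed when it is looked up.
    index = SeqIO.index(fasta_path, "fasta")
    with open(out_path, "w") as out_handle:
        # SeqIO.write returns the number of records written.
        return SeqIO.write(pick_records(index, wanted_ids), out_handle, "fasta")
```

Note that the output comes out in the order of wanted_ids rather than file order, and ids missing from the file are silently skipped.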

Peter



