[Biopython] Retrieving fasta seqs
Kevin
aboulia at gmail.com
Tue Feb 2 14:13:44 UTC 2010
My version uses a set to store the IDs. It fails with too many records
(60 million) on a 64-bit CentOS machine with 31 GB of RAM running
Python 2.4, and I can't figure out why. It works well with 1 million IDs.
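
For readers following along, a set-based filter of this kind could look roughly like the sketch below. It is not the actual script discussed here: the function names are hypothetical, it targets Python 3 rather than 2.4, and it parses FASTA by hand so that it runs without Biopython (with Biopython, SeqIO.parse would take the place of the hand-written loop). It assumes the record ID is the first whitespace-delimited token after ">".

```python
import io


def read_wanted_ids(handle):
    """Collect one ID per non-empty line into a set (fast membership tests)."""
    return {line.strip() for line in handle if line.strip()}


def filter_fasta(in_handle, out_handle, wanted):
    """Stream FASTA text, copying only records whose ID is in `wanted`.

    Only one line is held in memory at a time; the set of IDs is the
    main memory cost. Returns the number of records written.
    """
    keep = False
    written = 0
    for line in in_handle:
        if line.startswith(">"):
            # Assumption: the ID is the first token on the header line.
            fields = line[1:].split(None, 1)
            rec_id = fields[0] if fields else ""
            keep = rec_id in wanted
            if keep:
                written += 1
        if keep:
            out_handle.write(line)
    return written


# Small in-memory demonstration; real use would pass open file handles.
ids = read_wanted_ids(io.StringIO("seq2\nseq3\n"))
fasta = io.StringIO(">seq1 first\nACGT\n>seq2 second\nGGCC\nTTAA\n>seq3\nAAAA\n")
out = io.StringIO()
count = filter_fasta(fasta, out, ids)
print(count, "records written")
print(out.getvalue(), end="")
```

The set only speeds up the "is this ID wanted?" check; it does not reduce the memory needed to hold the IDs themselves.
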
Can I propose that this be part of the tutorial? It seems to be quite a
popular request. I was going to post it on my blog, but I think more
people will benefit if it's on the wiki.
I don't mind contributing the code and the lessons.
Kevin
On 02-Feb-2010, at 9:49 PM, Peter <biopython at maubp.freeserve.co.uk>
wrote:
> On Tue, Feb 2, 2010 at 1:09 PM, Brad Chapman <chapmanb at 50mail.com>
> wrote:
>>
>> Finally, iterate through the large FASTA file, and write records of
>> interest:
>>
>> sec = open(sys.argv[1], 'r')
>> for rec in SeqIO.parse(sec, "fasta"):
>>     if rec.id in listita:
>>         SeqIO.write([rec], out_handle, "fasta")
>>
>
> Or, once you have read about generator expressions,
> this version might seem nicer - but perhaps a bit too
> complicated for a beginner:
>
> records = SeqIO.parse(open(sys.argv[1], 'r'), "fasta")
> wanted = (rec for rec in records if rec.id in listita)
> SeqIO.write(wanted, out_handle, "fasta")
>
> Another alternative, which could be quicker to run
> depending on the size of the files and the relative
> number of records wanted would be to use the
> Bio.SeqIO.index() function to pull out the desired
> records from the FASTA input file.
>
> Peter
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
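
On the Bio.SeqIO.index() suggestion quoted above: SeqIO.index(filename, "fasta") returns a dictionary-like object keyed by record ID that loads each record from disk only when it is requested. The sketch below illustrates the general idea behind that kind of lookup (remember where each record starts, then seek to it) in plain Python 3. It is not Biopython's implementation, and the function names are hypothetical.

```python
import io


def build_offset_index(handle):
    """Map each record ID to the byte offset of its '>' header line.

    `handle` must be opened in binary mode so that offsets are exact.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # Assumption: the ID is the first token on the header line.
            fields = line[1:].split(None, 1)
            if fields:
                index[fields[0].decode("ascii")] = offset
    return index


def fetch_record(handle, index, rec_id):
    """Seek to one record and return its raw text (header plus sequence)."""
    handle.seek(index[rec_id])
    lines = [handle.readline()]
    while True:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break
        lines.append(line)
    return b"".join(lines).decode("ascii")


# Small in-memory demonstration; real use would open the file with "rb".
data = io.BytesIO(b">seq1 first\nACGT\n>seq2 second\nGGCC\nTTAA\n>seq3\nAAAA\n")
index = build_offset_index(data)
for rec_id in ["seq3", "seq2"]:
    print(fetch_record(data, index, rec_id), end="")
```

Whether this beats a single pass over the file depends, as noted above, on the file size and on how many records are wanted: building the index still reads the whole file once, but each later lookup reads only the record requested.
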