[Biopython] entire sequence file is unintentionally being loaded

Peter Cock p.j.a.cock at googlemail.com
Wed Nov 9 20:45:24 UTC 2016


Great.

If you are still using the index approach, you might be able
to use the get_raw method to output the records without
ever needing to parse them into SeqRecord objects?

e.g.

from Bio import SeqIO

index1 = SeqIO.index(file1, "fastq")
index2 = SeqIO.index(file2, "fastq")
# get_raw returns the record's raw bytes, so open the output in binary mode
output_file = open("wanted.fastq", "wb")
for key in my_list_of_keys:
    # Use key+"/1" and key+"/2" if you have old-style paired-read names
    output_file.write(index1.get_raw(key))
    output_file.write(index2.get_raw(key))
output_file.close()

Peter

On Wed, Nov 9, 2016 at 8:29 PM, Liam Thompson <dejmail at gmail.com> wrote:
> Hi Peter
>
> Apologies for the inadequate description, but you understood the gist of it.
>
> Thank you for the suggestions. You were right about zip(); I was unaware
> that it would defeat the memory-cautious iterators. The itertools.izip
> approach seems to have sorted things out as suggested, although now I need
> to spend some time speeding the whole script up.
>
> I will try .itervalues() as well. I did try it before, but it complained
> too, perhaps for different reasons. I will investigate and report back.
>
>
> Liam
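
(For reference, a minimal sketch of the lazy itertools.izip pairing mentioned
above, assuming Python 2, two FASTQ files whose reads are in matching order,
and a set of wanted read names; the file and variable names are illustrative
only:)

from itertools import izip  # lazy pairing on Python 2; zip() is already lazy on Python 3
from Bio import SeqIO

wanted = set(my_list_of_keys)  # illustrative: names of the read pairs to keep
out1 = open("wanted_1.fastq", "w")
out2 = open("wanted_2.fastq", "w")
# SeqIO.parse returns a generator, so izip streams both files record by
# record without loading either file into memory.
for rec1, rec2 in izip(SeqIO.parse(file1, "fastq"), SeqIO.parse(file2, "fastq")):
    if rec1.id in wanted:
        SeqIO.write(rec1, out1, "fastq")
        SeqIO.write(rec2, out2, "fastq")
out1.close()
out2.close()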

