[Biopython] Retrieving fasta seqs

Kevin Lam aboulia at gmail.com
Tue Feb 2 15:30:54 UTC 2010


Traceback (most recent call last):
  File "test.py", line 22, in ?
    ids.add(recordf3)  # Then add each line to 'ids'.
MemoryError

The last id it processed was
1199_621_394_F3
which is probably the 44739243rd record of the 52465836 in the file.
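
That record number may be a clue in itself. As a rough sketch, assume the set is backed by a hash table whose slot count is a power of two and which grows once it is two-thirds full. That is how CPython hash tables of that era are generally described, but nothing in this thread confirms it for this build. Under that assumption, the first insertion to trigger growth of a 2**26-slot table would be number 44739243. The helper name `resize_trigger`, the table-doubling step, and the 12-bytes-per-slot figure are illustrative assumptions, and the code is written for a current Python 3 rather than the 2.4 interpreter used here.

```python
def resize_trigger(table_slots):
    """Smallest fill count n with n * 3 >= table_slots * 2 (two-thirds full)."""
    return -(-2 * table_slots // 3)  # ceiling of 2 * table_slots / 3

OBSERVED_RECORD = 44739243   # record number quoted above
ASSUMED_BYTES_PER_SLOT = 12  # assumption: hash + key + value on a 32-bit build

# Walk the power-of-two table sizes until one trips at or after the observed record.
slots = 8
while resize_trigger(slots) < OBSERVED_RECORD:
    slots *= 2

trigger = resize_trigger(slots)
new_table_bytes = 2 * slots * ASSUMED_BYTES_PER_SLOT  # assumes the table doubles

print("table slots:", slots)
print("growth triggered at insertion:", trigger)
print("new table size in GiB:", new_table_bytes / 2.0**30)
```

If those assumptions hold, the failure would coincide with a single request for a new contiguous table of about 1.5 GiB while the old table and all the id strings are still allocated, which a 32-bit process may not be able to satisfy.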

The code is:
#!/usr/bin/python
##takes input file of single line ids and extracts the fasta from fasta file

import sys
sys.path.append("/home/g/lib/usr/lib64/python2.4/site-packages/")
import Bio
from Bio import SeqIO

inputhandle = open(sys.argv[1])
##
handle = open("Sample.csfasta")               # Reference File
outfilename=sys.argv[1] + ".out"
outputhandle = open(outfilename,"w")

ids = set([])                                 # Set command to assign ids
##ids = set(['853_15_296','853_15_330','853_15_372'])  #debug

for line in inputhandle:
##    ids.add(line[:-1]) ##debug
    recordf3 = line[:-1] + '_F3'              # Append '_F3' to each line of the input file.
    print recordf3                            #debug
    ids.add(recordf3)                         # Then add each line to 'ids'. (MemoryError raised here)
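
To get a feel for how much larger the ids become once they are Python strings inside a set, a small sample can be measured and extrapolated. The sketch below is only an estimate under stated assumptions: it runs on a current Python 3 (whose object sizes differ from a 32-bit Python 2.4), the sample ids are made up in the same pattern as the one shown above, and `estimate_set_bytes` is a hypothetical helper name. `sys.getsizeof` reports the size of each string and of the set's own table, not allocator overhead.

```python
import sys

def estimate_set_bytes(sample_ids, total_ids):
    """Extrapolate the memory for a set of total_ids strings from a small sample."""
    sample = set(sample_ids)
    per_string = sum(sys.getsizeof(s) for s in sample) / len(sample)
    per_slot = sys.getsizeof(sample) / len(sample)  # set table cost per element
    return int((per_string + per_slot) * total_ids)

TOTAL_IDS = 52465836  # record count quoted in this post

# Made-up ids in the same pattern as the real ones (illustrative only).
sample_ids = ["%d_%d_%d_F3" % (1199, 621, n) for n in range(1000)]

# Raw text size: characters plus one newline per id.
raw_bytes = sum(len(s) + 1 for s in sample_ids) / len(sample_ids) * TOTAL_IDS
estimate = estimate_set_bytes(sample_ids, TOTAL_IDS)

print("raw text: about %.2f GB" % (raw_bytes / 1e9))
print("as a set: about %.2f GB" % (estimate / 1e9))
```

The exact figures depend on the interpreter, but the per-object overhead is typically several times the length of a short id, so the size of the ids file on disk is a poor guide to the memory the set needs.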



On Tue, Feb 2, 2010 at 10:50 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Tue, Feb 2, 2010 at 2:29 PM, Kevin Lam <aboulia at gmail.com> wrote:
> > Yes, I got a "memory error" when the job died.
> > The uncompressed ids file is about 680 MB. Perhaps storing it in a set will
> > increase the space needed, but
> > I assumed that it would still fit comfortably in 4 GB of RAM even with a
> > 32-bit limit.
> > It's a mystery I am dying to solve if I have more time.
> >
> > I do not have the code right now, will post it soon, but it is almost the
> > same as the list method
>
> Kevin - If you can show us the script and the traceback it would be
> very helpful. This would tell us where the memory failure is (e.g.
> loading the list of IDs).
>
> Alvaro - Don't worry for your example, Kevin is trying to work on
> some very very big files (this is a continuation of an earlier
> discussion on the mailing list).
>
> Peter
>


