[Biopython] Retrieving fasta seqs
Kevin Lam
aboulia at gmail.com
Tue Feb 2 15:30:54 UTC 2010
Traceback (most recent call last):
  File "test.py", line 22, in ?
    ids.add(recordf3)  # Then add each line to 'ids'.
MemoryError
The last id it processed is
1199_621_394_F3
which is probably the 44739243rd of the 52465836 records in the file.
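A rough way to see why a 680 MB ids file may not fit in memory once it is loaded into a set is to measure the per-object overhead with sys.getsizeof. The sketch below is only an estimate under stated assumptions: estimate_set_bytes is a hypothetical helper, the sample ids are made up to resemble the one in the traceback, 52465836 is read as the number of ids, and the results depend on the Python version and on whether the build is 32-bit or 64-bit.

```python
import sys


def estimate_set_bytes(sample_ids, total_count):
    """Roughly estimate the bytes needed to hold total_count ids in a set.

    Hypothetical helper: it extrapolates from a small sample, so treat the
    result as an order-of-magnitude figure only.
    """
    sample = set(sample_ids)
    # Average size of one string object (object header plus characters).
    per_string = sum(sys.getsizeof(s) for s in sample) / float(len(sample))
    # Size of the set's own hash table, averaged per stored element.
    per_slot = sys.getsizeof(sample) / float(len(sample))
    return int((per_string + per_slot) * total_count)


# Made-up ids shaped like 1199_621_394_F3 (assumption, not real data).
sample_ids = ["%d_%d_%d_F3" % (1199, 621, n) for n in range(10000)]
total = 52465836  # figure quoted in the post, read here as an id count

estimate = estimate_set_bytes(sample_ids, total)
# Raw text size: characters plus one newline per id.
raw = sum(len(s) + 1 for s in sample_ids) / float(len(sample_ids)) * total

print("raw text: %.1f GB" % (raw / 1e9))
print("as a set: %.1f GB" % (estimate / 1e9))
```

If the second figure comes out well above what a 32-bit process can address, the MemoryError partway through loading is what one would expect.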
The code is:

#!/usr/bin/python
## Takes an input file of single-line ids and extracts the matching fasta from a fasta file
import sys
sys.path.append("/home/g/lib/usr/lib64/python2.4/site-packages/")
import Bio
from Bio import SeqIO
inputhandle = open(sys.argv[1])
##
handle = open("Sample.csfasta")  # Reference file
outfilename = sys.argv[1] + ".out"
outputhandle = open(outfilename, "w")
ids = set([])  # Set used to hold the ids
##ids = set(['853_15_296','853_15_330','853_15_372'])  #debug
for line in inputhandle:
##    ids.add(line[:-1])  ##debug
    recordf3 = line[:-1] + '_F3'  # Append '_F3' to each line of the input file
    print recordf3  #debug
    ids.add(recordf3)  # Then add each line to 'ids'
On Tue, Feb 2, 2010 at 10:50 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Tue, Feb 2, 2010 at 2:29 PM, Kevin Lam <aboulia at gmail.com> wrote:
> > Yes I got a "memory error" when the job died.
> > The uncompressed ids file is about 680 MB. Perhaps storing the ids in a set
> > increases the space they need, but I assumed they would still fit
> > comfortably in 4 GB of RAM, even with a 32-bit limit.
> > It's a mystery I am dying to solve if I have more time.
> >
> > I do not have the code right now; I will post it soon, but it is almost the
> > same as the list method.
>
> Kevin - If you can show us the script and the traceback it would be
> very helpful. This would tell us where the memory failure is (e.g.
> loading the list of IDs).
>
> Alvaro - Don't worry for your example, Kevin is trying to work on
> some very very big files (this is a continuation of an earlier
> discussion on the mailing list).
>
> Peter
>