[BioPython] reading large sequence files

Karin Lagesen karin.lagesen at labmed.uio.no
Tue Sep 23 10:19:21 EDT 2003


Hi!

I am working on whole (procaryote) genomes, so I need to load an
entire genome at a time. For instance, I am reading in the E. coli
genome like this:

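import os
import fnmatch
from Bio import Fasta

parser = Fasta.RecordParser()
# find the single .fna file in this genome's directory
ecoliDir = os.path.join(genomePath, ecoli)
ecoliFiles = os.listdir(ecoliDir)
ecoliFile = fnmatch.filter(ecoliFiles, '*.fna')
ecoliFile = open(os.path.join(ecoliDir, ecoliFile[0]), 'r')
# parse the first record and keep its sequence string
iterator = Fasta.Iterator(ecoliFile, parser)
fileContents = iterator.next()
ecoliSeq = fileContents.sequence
ecoliFile.close()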

where genomePath is the directory that holds the genome files, and
ecoli is the name of the genome's subdirectory. All genome files end
in .fna.

However, when I do it this way it takes a very long time to read in
the genome: currently almost 10 minutes. Is there some way I can make
this go faster? I need to work with 13 genomes altogether, and it
would be nice if this part of it wasn't the bottleneck.
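
One way to check whether the parser or the disk is the slow part would
be to time a plain-Python read of the same file. The following is only
a minimal sketch, not Biopython API: find_fna and read_single_fasta are
hypothetical helper names, it assumes each .fna file holds one record
of interest (the first), and I have not measured whether it is faster.

```python
import fnmatch
import os


def find_fna(genome_dir):
    """Return the path of the first '*.fna' file in genome_dir (by name)."""
    matches = sorted(fnmatch.filter(os.listdir(genome_dir), '*.fna'))
    if not matches:
        raise FileNotFoundError('no .fna file in ' + genome_dir)
    return os.path.join(genome_dir, matches[0])


def read_single_fasta(path):
    """Read the first record of a FASTA file; return (title, sequence)."""
    title = None
    chunks = []
    with open(path, 'r') as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                if title is not None:
                    break  # a second record starts here; stop reading
                title = line[1:]
            elif line and title is not None:
                chunks.append(line)
    if title is None:
        raise ValueError('no FASTA header found in ' + path)
    # Join once at the end rather than concatenating line by line.
    return title, ''.join(chunks)
```

With the variables above, this would be called as
title, ecoliSeq = read_single_fasta(find_fna(os.path.join(genomePath, ecoli))).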


Karin
-- 
Karin Lagesen, PhD student
karin.lagesen at labmed.uio.no

