[BioPython] reading large sequence files

Karin Lagesen karin.lagesen at labmed.uio.no
Tue Sep 23 10:19:21 EDT 2003


I am working on whole (prokaryote) genomes, so I need to work with an
entire genome at a time. For instance, I read in the E. coli genome
like this:

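import os
import fnmatch
from Bio import Fasta  # old Bio.Fasta module (later removed from Biopython)

ecoliDir = os.path.join(genomePath, ecoli)
fnaFiles = fnmatch.filter(os.listdir(ecoliDir), '*.fna')
parser = Fasta.RecordParser()  # assumed: the post does not show how parser was created
ecoliFile = open(os.path.join(ecoliDir, fnaFiles[0]), 'r')
iterator = Fasta.Iterator(ecoliFile, parser)
fileContents = iterator.next()  # the first record in the file
ecoliSeq = fileContents.sequence
ecoliFile.close()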

where genomePath is the directory that holds the genome directories, and
ecoli is the name of the genome. All genome files end in .fna.
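The lookup described above (one subdirectory per genome under genomePath, with the sequence in a .fna file) can be wrapped in a small helper so it is easy to repeat for each genome. This is a minimal sketch; the function name find_fna is hypothetical, and like the snippet above it simply takes the first matching file (here in sorted order, so the choice is deterministic).

```python
import glob
import os


def find_fna(genome_path, genome_name):
    """Return the path of the first .fna file in genome_path/genome_name.

    Hypothetical helper: same lookup as os.listdir + fnmatch, done with glob.
    Matches are sorted so the same file is picked on every run.
    """
    pattern = os.path.join(genome_path, genome_name, '*.fna')
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError('no .fna file matching ' + pattern)
    return matches[0]
```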

However, reading the genome this way takes a very long time: currently
almost 10 minutes. Is there some way I can make this go faster? I need
to work with 13 genomes altogether, and it would be nice if this part
weren't the bottleneck.
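One way to check whether the parser itself is the bottleneck is to time a plain-Python reader on the same file. The sketch below assumes each .fna file holds exactly one FASTA record (it raises ValueError otherwise); the names read_single_fasta and timed_read are hypothetical. It collects sequence lines in a list and joins them once, so the work grows linearly with file size instead of rebuilding the sequence string line by line.

```python
import time


def read_single_fasta(path):
    """Read a FASTA file holding one record; return (header, sequence)."""
    header = None
    chunks = []  # sequence lines, joined once at the end
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()  # drop newline and any trailing whitespace
            if not line:
                continue
            if line.startswith('>'):
                if header is not None:
                    raise ValueError('more than one record in ' + path)
                header = line[1:]
            else:
                chunks.append(line)
    if header is None:
        raise ValueError('no FASTA header found in ' + path)
    return header, ''.join(chunks)


def timed_read(path):
    """Return (header, sequence, seconds) for one file."""
    start = time.perf_counter()
    header, seq = read_single_fasta(path)
    return header, seq, time.perf_counter() - start
```

If this reader loads the same file much faster, that would suggest the time is spent in the parsing step rather than in reading the file from disk.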

Karin Lagesen, PhD student
karin.lagesen at labmed.uio.no
