[BioPython] reading large sequence files

Karin Lagesen karin.lagesen at labmed.uio.no
Wed Sep 24 03:28:28 EDT 2003


On Tue, Sep 23, 2003 at 03:57:07PM +0100, Leighton Pritchard wrote:
> Hi Karin,
> 
> Guessing that you have one .fna sequence file containing the whole sequence 
> (or each chromosome/plasmid), then you can use quick_FASTA_reader from 
> SeqUtils in a manner similar to:
> 
> from Bio.SeqUtils import quick_FASTA_reader
> 
> name, seq = quick_FASTA_reader(genome_file)[0]
> 
> 
> The quick_FASTA_reader reads in (name, sequence) tuples without doing 
> anything too clever or time-consuming like parsing sequences as 
> SeqRecords.  It's *much* faster than using the Fasta.Iterator class.
> 
> Hope this helps,

So do I...:)

However, I have run into something odd:

My sequence file looks like this:

>gi|16127994|ref|NC_000913.1| Escherichia coli K12, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT
GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGC

and so on.

When I try to load this genome, it crashes:

  File "gene.py", line 11, in __readFastaFile
    print quick_FASTA_reader(file)[0]
  File "/site/python_packages//lib/python/Bio/SeqUtils/__init__.py", line 281, in quick_FASTA_reader
    name,seq= entry.split('\n',1)
ValueError: unpack list of wrong size
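
For reference, the failing line unpacks the result of entry.split('\n', 1)
into two names, so it raises whenever the split yields fewer than two parts.
Below is a minimal sketch of how that can happen, assuming (this is not
confirmed for the file above) that an entry contains no '\n' at all, for
example because the file uses '\r'-only line endings. The entry strings and
the split_entry helper are made up for illustration:

```python
# Hypothetical entries; neither string is taken from the real genome file.
good_entry = "seq1 description\nACGT\nTTGA\n"
bad_entry = "seq1 description\rACGT\rTTGA\r"   # '\r'-only line endings (assumed)


def split_entry(entry):
    """Mimic the line from the traceback: split an entry into (name, seq)."""
    name, seq = entry.split('\n', 1)
    return name, seq


# With a '\n' after the title, split() returns two parts and unpacking works.
good_result = split_entry(good_entry)

try:
    split_entry(bad_entry)
    failed = False
except ValueError:
    # split() returned a one-element list, so unpacking into two names fails.
    failed = True
```

If this is the cause, checking the raw bytes of the file for '\r' would
confirm it; other causes are possible.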

The way I call it is as follows:

    def __readFastaFile(self, file):
        title, seq = quick_FASTA_reader(file)[0]
        return title, seq

Here, file is a string containing the absolute path of the file.
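
As a cross-check that does not depend on quick_FASTA_reader, the first record
can also be read with plain Python. This is only a sketch under stated
assumptions: read_first_fasta_record is a hypothetical helper, not part of
Biopython, and the demo file is made up. It is written for a current Python 3
interpreter, where text mode translates '\r\n' and '\r' line endings to '\n'.

```python
import os
import tempfile


def read_first_fasta_record(path):
    """Return (title, sequence) for the first record in a FASTA file.

    Hypothetical helper for illustration; not a Biopython function.
    """
    title = None
    chunks = []
    # Default text mode uses universal newlines, so '\r' and '\r\n' also work.
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if title is not None:
                    break  # a second record starts here; stop reading
                title = line[1:]
            elif title is not None:
                chunks.append(line)
    if title is None:
        raise ValueError("no FASTA header line found in %r" % path)
    return title, "".join(chunks)


# Usage with a tiny made-up file (not the E. coli genome).
with tempfile.NamedTemporaryFile("w", suffix=".fna", delete=False) as tmp:
    tmp.write(">demo made-up record\nACGT\nTTGA\n>second\nGGGG\n")
title, seq = read_first_fasta_record(tmp.name)
os.remove(tmp.name)
```

Reading line by line keeps only the first record in memory and stops as soon
as a second '>' line appears.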

I am reasonably new to Python, so please excuse me if I am doing
something obviously wrong/idiotic...:)

Karin
-- 
Karin Lagesen, PhD student
karin.lagesen at labmed.uio.no

