[Biopython] problem parsing embl file

Sameet Mehta msameet at gmail.com
Mon Jun 28 15:20:49 EDT 2010


Hi,

I am trying to parse an EMBL file created in 2004.  The file contains a
single record for the entire chromosome.  I have tried the following
two approaches:

from Bio import SeqIO

r = SeqIO.parse( file( "chromosome1.contig.embl" ), "embl" ).next()
r = SeqIO.read( file( "chromosome1.contig.embl" ), "embl" )

I get the following error:
ValueError                                Traceback (most recent call last)

/home/sameet/NIH-work/downloads/2004_release/<ipython console> in <module>()

/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.pyc in read(handle, format, alphabet)
    516     iterator = parse(handle, format, alphabet)
    517     try:
--> 518         first = iterator.next()
    519     except StopIteration:
    520         first = None

/usr/lib64/python2.6/site-packages/Bio/GenBank/Scanner.pyc in parse_records(self, handle, do_features)
    418         #This is a generator function
    419         while True:
--> 420             record = self.parse(handle, do_features)
    421             if record is None : break
    422             assert record.id is not None

/usr/lib64/python2.6/site-packages/Bio/GenBank/Scanner.pyc in parse(self, handle, do_features)
    401                     feature_cleaner = FeatureValueCleaner())
    402
--> 403         if self.feed(handle, consumer, do_features):
    404             return consumer.data
    405         else:

/usr/lib64/python2.6/site-packages/Bio/GenBank/Scanner.pyc in feed(self, handle, consumer, do_features)
    383         consumer.sequence(sequence_string)
    384         #Calls to consumer.base_number() do nothing anyway
--> 385         consumer.record_end("//")
    386
    387         assert self.line == "//"

/usr/lib64/python2.6/site-packages/Bio/GenBank/__init__.pyc in record_end(self, content)
   1047         and self._expected_size != len(sequence):
   1048             raise ValueError("Expected sequence length %i, found %i." \
-> 1049                              % (self._expected_size, len(sequence)))
   1050
   1051         if self._seq_type:

ValueError: Expected sequence length 666, found 5580032.
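The check in record_end() compares an expected size with the length of the sequence actually read. A rough way to inspect both numbers by hand is sketched below. It is only a sketch: the function name declared_and_actual_length is made up for illustration, and it assumes that the expected size is the "... BP." figure at the end of the ID line and that the sequence sits between the SQ line and the closing "//". It looks at the first record only and does not use Biopython.

```python
import re


def declared_and_actual_length(lines):
    """Return (length declared on the ID line, residues counted after SQ).

    Hypothetical helper for illustration; inspects only the first record.
    """
    declared = None
    actual = 0
    in_sequence = False
    for line in lines:
        if line.startswith("ID"):
            # Assumption: the ID line ends with something like "1859 BP."
            match = re.search(r"(\d+)\s+BP\.?\s*$", line)
            if match:
                declared = int(match.group(1))
        elif line.startswith("SQ"):
            # The sequence block starts on the line after the SQ header
            in_sequence = True
        elif line.startswith("//"):
            # End of the first record
            break
        elif in_sequence:
            # Count letters only; this skips spaces and the running count
            actual += sum(1 for ch in line if ch.isalpha())
    return declared, actual
```

It could be called as declared_and_actual_length(open("chromosome1.contig.embl")). If the two numbers differ, the mismatch is in the file's header rather than in the SeqIO call.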

Can you tell me if I am doing anything wrong?  I am following the
instructions given on the Bio.SeqIO wiki page.

Thanks for the help.
Sameet
-- 
Sameet Mehta, Ph.D.,
Phone:  (301) 842-4791

