[Biopython] problem parsing embl file
Sameet Mehta
msameet at gmail.com
Mon Jun 28 16:02:33 EDT 2010
Hi Peter,
The sequence length is 5580032; it is the first chromosome of yeast.
The following are the first 10 lines of the file:
ID c212 standard; DNA; FUN; 666 BP.
AC c212;
FH Key Location/Qualifiers
FH
FT CDS complement(1..5662)
FT /gene="SPAC212.11"
FT /partial
FT /product="DNA helicase; no apparent orthologs"
FT /note="possibly pseudo as has strange promoter region"
FT misc_feature complement(1115..1339)
Also, I believe I am using the latest Biopython on my laptop. I
think I found the problem! The first line (the ID line, which says
666 BP) is indeed the problem. So how can I circumvent this?
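
One possible workaround, sketched here as an assumption rather than anything suggested in this thread, is to correct the length in the ID line before parsing. The helper below, fix_embl_id_length, is a hypothetical name and not part of Biopython. It assumes a single-record file whose sequence sits between the SQ line and the closing // line, and an old-style ID line ending in "<number> BP.".

```python
import re


def fix_embl_id_length(embl_text):
    """Return embl_text with the ID line's 'N BP.' set to the real residue count.

    Hypothetical helper: assumes a single-record EMBL file with an
    old-style ID line ending in '<number> BP.'.
    """
    lines = embl_text.splitlines()

    # Count residues on the lines between 'SQ' and the terminating '//'.
    in_sequence = False
    residue_count = 0
    for line in lines:
        if line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            in_sequence = False
        elif in_sequence:
            # Sequence lines hold letters, spaces and a trailing position
            # number, so only the letters are counted.
            residue_count += sum(ch.isalpha() for ch in line)

    # Rewrite the declared length at the end of the first ID line.
    for i, line in enumerate(lines):
        if line.startswith("ID"):
            lines[i] = re.sub(r"\d+ BP\.\s*$", "%d BP." % residue_count, line)
            break

    return "\n".join(lines) + "\n"
```

The patched text could then be written to a new file and passed to SeqIO.read(..., "embl") as before. Editing the ID line by hand in a text editor would achieve the same thing for a one-off file.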
On Mon, Jun 28, 2010 at 3:56 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi Sameet,
>
> On Mon, Jun 28, 2010 at 8:20 PM, Sameet Mehta <msameet at gmail.com> wrote:
>> Hi,
>>
>> I am trying to parse an EMBL file created in 2004. The file contains a
>> single record for the entire chromosome. I have tried the following
>> two approaches:
>>
>> r = SeqIO.parse( file( "chromosome1.contig.embl" ), "embl" ).next()
>> r = SeqIO.read( file( "chromosome1.contig.embl" ), "embl" )
>
> Those look fine - if you are using Biopython 1.54 you can just
> use the filename rather than opening it explicitly.
>
>> I get the following error:
>> ValueError Traceback (most recent call last)
>> ...
>> ValueError: Expected sequence length 666, found 5580032.
>>
>> Can you tell me if I am doing anything wrong? I am following the
>> instructions as given in the Bio.SeqIO wiki page.
>
> No, your code is fine. It looks like you have a broken EMBL file.
> Could you show me the first few lines of the EMBL file, and also
> have a look at it in a text editor to see if the sequence length
> really is 666bp, or 5580032 as Biopython thinks?
>
> (Or send the whole EMBL file to me off list?)
>
> In any case, that check seemed a bit strict (I've seen several
> examples of unofficial GenBank or EMBL files where the
> sequence length didn't match the header) so I relaxed this
> check to a warning for Biopython 1.54. You could try updating
> your copy of Biopython and see if it will accept the file then?
>
> Regards,
>
> Peter
>
--
Sameet Mehta, Ph.D.,
Phone: (301) 842-4791