[BioPython] Cannot parse/convert embl formatted files

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Thu Aug 17 11:19:29 UTC 2006


Hi Chris,
  thanks for your comments. I have filed a bug report at http://bugzilla.open-bio.org/show_bug.cgi?id=2077
Martin

Chris Fields wrote:
> Martin,
> 
> I think the Bioperl EMBL and GenBank parsers run all features through a
> loop using a regex to specifically look for the '/' tags and the quotes.
> So if there isn't a closing quote the parser chokes (spits back
> something about lack of closed or paired quotes). That may not be too
> easy to work around. It shouldn't die, though, so if there isn't a
> balanced quote it could be added back in Bioperl SeqIO.
> 
> I have been thinking about rewriting this as there is some redundancy
> in the way the features are handled. I just have my hands tied a bit now
> (can't get to it yet).
> 
> Anyway, I think checking for balanced quotes is done from a validation
> point of view.
> 
> Chris
> 
> On Aug 12, 2006, at 7:16 PM, Martin MOKREJŠ wrote:
> 
>> Hi Chris,
>>
>> Chris Fields wrote:
>>
>>> Just so everybody knows, EMBL recently made a few major revisions to
>>> their sequence format. These are now corrected in Bioperl CVS and
>>> will be available for the next dev release (hopefully out within a
>>> few months).
>>
>>
>> I will test that later. Thanks.
>>
>>>
>>> Odd about the unbalanced quotes; is that on the Bioperl end?  I
>>> missed that bit...
>>
>>
>> No, the input EMBL files are broken:
>>
>> And the relevant EMBL file was:
>>
>> ID   5OSAR003520 standard; RNA; PLN; 213 BP.
>> ...
>> FT   5'UTR           1..213
>> FT                   /source="REFSEQ::XM_479174:1..213"
>> FT                   /gene="B1056G08.147"
>> FT                   /product="putative dihydropterin  pyrophosphokinase
>> FT   repeat_region   61..87
>> ...
>> //
>>
>> Still, I believe the parser could ignore this minor error and terminate
>> the string (or treat it as terminated) when it is actually terminated
>> by a following feature line.


