[Bioperl-l] Memory requirements for conversion from embl to genbank

Sendu Bala bix at sendu.me.uk
Thu Aug 31 17:34:44 UTC 2006


Chris Fields wrote:
> Sendu, Martin,
> 
> This has been the problem with these particular example sequences.  The
> issue is that they do NOT conform to the EMBL standard or any sane sequence
> format standard.  Not that we stick to a standard vehemently ourselves, but
> we expect some sane formatting.  IMHO, (as I have repeatedly stated) we
> should not be responsible for trying to 'fix' broken sequence formats unless
> it is sanely possible and doesn't degrade performance/quality.  
> 
> Saying that, I do believe we should at the least have a warning or throw the
> appropriate error.  So if duplicate species are present, shouldn't there be
> a thrown error?

Bio::DB::Taxonomy::list should have been throwing an error in this case 
all along; it does now. It would be nicer if embl.pm stopped adding to 
the classification array once it reaches the end of one species 
classification, but then it would just be guessing at how one 
particular file is broken.

I think the throw is good enough; let the user correct the file.
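
To illustrate the "throw rather than guess" approach, here is a rough
sketch. It is not the actual Bio::DB::Taxonomy::list code (which is
Perl); the function name check_classification and the rule "any name
that appears twice is an error" are assumptions made up for this
example.

```python
def check_classification(names):
    """Return the lineage unchanged, or raise if a name is repeated.

    Hypothetical helper: it does not try to repair the lineage, it only
    refuses input in which the same name occurs more than once.
    """
    seen = set()
    for name in names:
        if name in seen:
            # Fail loudly instead of guessing where the real lineage ends.
            raise ValueError(
                f"duplicate name {name!r} in classification; "
                "please correct the input file"
            )
        seen.add(name)
    return list(names)
```

The parser stays simple this way: a clean lineage passes through
untouched, and a file that repeats a species classification is
rejected with a message pointing the user at the input.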


