[Biopython] LOCUS name length on GenBank output: option to adjust?

Peter Cock p.j.a.cock at googlemail.com
Thu Feb 2 15:38:58 UTC 2017


I've not checked lately, but has there been any progress on NCBI
moving the GenBank format away from a strict positional interpretation
of the LOCUS line towards a separator-based one? The issue and
discussion you referenced were back in 2015...
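
Here is a minimal Python sketch (not Biopython code) that reads the locus name and length in both ways, to show the difference between positional and separator-based parsing. It assumes the fixed column layout from the NCBI GenBank release notes: the name is in columns 13-28 and the length is right-justified in columns 30-40. The contig name `CONTIG0000123456_123` and both function names are hypothetical.

```python
# Hypothetical sketch: two ways to read the locus name and length from the
# start of a GenBank LOCUS line. Assumed layout (1-based columns): name in
# 13-28, a space in 29, length right-justified in 30-40.


def parse_positional(line):
    """Read name and length from fixed column positions."""
    name = line[12:28].strip()     # columns 13-28
    length = line[29:40].strip()   # columns 30-40
    return name, length


def parse_separated(line):
    """Read name and length as whitespace-separated tokens."""
    tokens = line.split()
    # tokens[0] is "LOCUS", followed by the name and then the length
    return tokens[1], tokens[2]


prefix = "LOCUS".ljust(12)  # "LOCUS" padded so the name starts in column 13

# Start of a conforming line with a 16-character name...
ok = prefix + "CONTIG0000123456" + " " + "352".rjust(11) + " aa"
# ...and one where a 20-character name pushes the later fields to the right.
shifted = prefix + "CONTIG0000123456_123" + " " + "352".rjust(11) + " aa"
```

With the 20-character name, the positional reader silently returns a truncated name and the wrong length (`'123'` instead of `'352'`). Splitting on whitespace still works, because locus names contain no spaces.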

Practically speaking, perhaps we could write a spec-breaking minimal
LOCUS line (leaving out fields like the date and division, which make
parsing a problem once the column positions are lost) with a warning?
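
Below is a plain-Python sketch of how the minimal LOCUS line idea might look. `minimal_locus_line` is a hypothetical helper and not a Biopython function. Names of up to 16 characters keep the standard name and length columns. Longer names trigger a warning and produce a whitespace-separated line with no date or division. The sketch writes only the name, length and unit and leaves out the molecule type, topology and the other remaining fields.

```python
import warnings

# Hypothetical helper, not part of Biopython. Only the name, length and unit
# are written; the remaining LOCUS fields are left out of this sketch.
MAX_STRICT_NAME = 16  # width of the locus name field (columns 13-28)


def minimal_locus_line(name, length, unit="aa"):
    prefix = "LOCUS".ljust(12)  # name starts in column 13
    if len(name) > MAX_STRICT_NAME:
        # Too long for the fixed layout: warn, then fall back to a
        # separator-based line without date or division.
        warnings.warn(
            f"Locus name {name!r} is longer than {MAX_STRICT_NAME} "
            "characters; writing a minimal LOCUS line that does not "
            "follow the GenBank column layout.",
            UserWarning,
        )
        return f"{prefix}{name} {length} {unit}"
    # Short names keep the usual positions: name padded to 16 columns,
    # a space, then the length right-justified in 11 columns.
    return f"{prefix}{name:<16} {length:>11} {unit}"
```

For example, `minimal_locus_line("CONTIG0000123456_123", 352)` returns `"LOCUS       CONTIG0000123456_123 352 aa"` and emits a `UserWarning`. A version inside Biopython would presumably use `Bio.BiopythonWarning` instead. `UserWarning` simply keeps the sketch free of dependencies.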

Peter


On Thu, Feb 2, 2017 at 3:16 PM, Chevreux, Bastien
<bastien.chevreux at dsm.com> wrote:
> Dear list,
>
> could Biopython implement an option to adjust how strictly the sequence
> name length is checked when writing GenBank output?
>
> I am aware of the short discussion in
>
>   https://github.com/biopython/biopython/issues/526
>
> and that Biopython wants to be strict when writing GenBank. However, I beg
> you to reconsider this decision and allow a user override.
>
> Background: annotation of metagenomic / metatranscriptomic datasets, where
> one can easily have a million contigs or more. Projects were given
> correspondingly short names so that the names of the DNA sequences fit into
> 16 characters. What was not considered, however, was GenBank output of
> peptides, where the locus names are
> <name_of_contig>+<underscore>+<CDScounter_in_contig> … and those may be up
> to 20 characters long.
>
> As we are talking about protein sequences here, and there is no known
> protein longer than 99999 amino acids, there would be plenty of wiggle room
> to let the user set a 20-character limit (or more) for GenBank output.
>
> Best,
>
>   Bastien
>
> --
> DSM Nutritional Products Microbia Inc | Bioinformatics
> 60 Westview Street | Lexington, MA 02421 | United States
> Phone +1 781 259 7613 | Fax +1 781 259 0615
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
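
The wiggle-room argument in the quoted message above can be checked with some column arithmetic. Assume the same NCBI release-notes layout. The name, one separating space and the right-justified length then share columns 13-40, which is 28 characters. A longer name can therefore borrow the unused leading spaces of the length field. The helper below is hypothetical and only illustrates that check.

```python
# Hypothetical helper illustrating the column arithmetic; not Biopython code.
# Assumed layout (1-based): name in columns 13-28, a space in column 29 and
# the length right-justified in columns 30-40, i.e. 28 columns in total.
NAME_AND_LENGTH_WIDTH = 28


def squeeze_name_and_length(name, length):
    """Return the 28-character name+length segment, or raise ValueError."""
    digits = str(length)
    if len(name) + 1 + len(digits) > NAME_AND_LENGTH_WIDTH:
        raise ValueError(
            f"{name!r} plus length {length} does not fit in "
            f"{NAME_AND_LENGTH_WIDTH} columns"
        )
    # Right-justify the length in whatever width the name leaves, so the
    # last digit still lands in column 40.
    return name + " " + digits.rjust(NAME_AND_LENGTH_WIDTH - len(name) - 1)


# Start of a LOCUS line: "LOCUS" padded to 12 columns, the squeezed segment,
# then the unit in columns 42-43.
segment = squeeze_name_and_length("CONTIG0000123456_123", 99999)
line = "LOCUS".ljust(12) + segment + " aa"
```

With a five-digit protein length, names of up to 28 - 1 - 5 = 22 characters still fit, and the length still ends at column 40. For example, `squeeze_name_and_length("CONTIG0000123456_123", 99999)` gives `'CONTIG0000123456_123   99999'`. A 23-character name with the same length raises `ValueError`. A strictly positional reader would still see only the first 16 characters of the name.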
