[Bioperl-l] Bio::SeqIO::genbank

Chris Fields cjfields at illinois.edu
Thu Apr 8 16:09:09 EDT 2010


On Thu, 2010-04-08 at 21:39 +0200, Dave Messina wrote: 
> Hi Wayne,
> 
> > If $mol is not in the fixed list of GenBank molecule types, it should
> > be set to the default value of 'DNA'. Alternatively, some smarter way of
> > forcing the molecule type into the fixed vocabulary would help.
> 
> Sounds good to me. Did you modify your local copy of Bio::SeqIO::genbank and try it out?
> 
> I will say, though, that GenBank is a tricky format, both to read and to write. Even if BioPerl wrote GenBank records that were fully compliant with the spec, I'm pretty sure they would not be round-trippable*. That is, if you read a GenBank record into BioPerl and then wrote it back out, the output wouldn't exactly match the input.

This is true.  Jason and I talked about this recently and arrived at
pretty much the same conclusion.  We're mainly interested in parsing data
into a usable framework for manipulation.  Recreating the original record
exactly isn't our top priority.
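Back to Wayne's original point: a minimal sketch of the fallback he
describes might look like the following. normalize_mol is a hypothetical
helper, not part of Bio::SeqIO::genbank, and the allowed list (NA, DNA,
RNA, tRNA, rRNA, mRNA, uRNA) is an assumption based on the LOCUS-line
molecule types in the GenBank release notes; check it against the current
spec before relying on it.  The helper keeps a valid type, fixes case
mismatches, and otherwise falls back to 'DNA'.

```perl
use strict;
use warnings;

# ASSUMED LOCUS-line molecule-type vocabulary; verify against the current
# GenBank release notes before use.
my %GENBANK_MOL = map { $_ => 1 } qw(NA DNA RNA tRNA rRNA mRNA uRNA);

# Hypothetical helper (not part of Bio::SeqIO::genbank): return $mol if it
# is already a valid GenBank molecule type, otherwise fall back to 'DNA'.
sub normalize_mol {
    my ($mol) = @_;
    return 'DNA' unless defined $mol;       # nothing set: use the default
    return $mol if $GENBANK_MOL{$mol};      # exact match: keep as-is

    # Case-insensitive match, e.g. 'dna' -> 'DNA', 'mrna' -> 'mRNA'
    for my $valid (keys %GENBANK_MOL) {
        return $valid if lc($valid) eq lc($mol);
    }
    return 'DNA';                           # unknown type: use the default
}

print normalize_mol($_), "\n" for ('mRNA', 'dna', 'protein', undef);
```

Run as-is, it prints mRNA followed by DNA three times.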

> I think that NCBI is trying to nudge people toward their XML format. I know it won't help this particular situation, but it might be an option to consider for the future.

The only problem I've had with the XML spit out from eutils is that it
was an on-the-fly conversion of the ASN.1.  I'm not sure what its status
is now.

What's going on with the INSDC XML format?  That was supposed to be an
international standard and appeared more lightweight (if such a thing
can be said about XML).

> Speaking of which, what is the current status of the BioPerl GenBank XML parser? Jay, did you ever release that?
> 
> 
> Dave
> 
> * not that they were designed to be: http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats

I think it was in a branch, but I can't recall.

chris