[BioPython] SeqRecord to Genbank: use SeqIO?

Peter biopython at maubp.freeserve.co.uk
Wed Aug 6 09:27:17 UTC 2008


On Tue, Aug 5, 2008 at 11:47 PM, Cedar McKay wrote:
>
> Aha! I see. The following is on the SeqIO wiki page
> (http://www.biopython.org/wiki/SeqIO):
>
> "If you supply the sequences as a SeqRecord iterator, then for sequential
> file formats like Fasta or GenBank, the records can be written one by one"
>
> I think I wrongly thought this implied that Genbank Records can be written.
> But I see now that isn't the case, and the "Fasta or GenBank" files it
> references must be the input files that are parsed, not the format of the
> output.  I'm looking forward to this functionality.

I agree with you - in hindsight that bit of the wiki is misleading.
Sorry about that.  I was using GenBank as an example of a sequential
file format, where the records can be written one by one (unlike, for
example, Clustal and most other multiple sequence alignment formats,
where the records are interleaved).  That much is true, and GenBank is
a valid example of what I meant by a "sequential file format", as are
SwissProt and EMBL.  However, the wording wrongly gave the impression
that Bio.SeqIO could write GenBank files (which Biopython 1.47 can't do).
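
To illustrate the sequential versus interleaved distinction, here is a
minimal sketch in plain Python.  It does not use Bio.SeqIO at all: the
function names, the (name, sequence) tuples and the toy interleaved
layout are assumptions made up for this example, not Biopython code.

```python
import io


def write_fasta_sequential(records, handle):
    """Write (name, sequence) pairs one at a time; return the count written.

    Only the current record is held in memory, so this works on an iterator.
    """
    count = 0
    for name, seq in records:
        handle.write(">%s\n%s\n" % (name, seq))
        count += 1
    return count


def write_interleaved(records, handle, width=10):
    """Toy interleaved layout (hypothetical, loosely Clustal-like).

    Every block contains a slice of every record, so all records must be
    collected before the first line can be written.
    """
    records = list(records)  # forces the whole iterator into memory
    longest = max((len(seq) for _, seq in records), default=0)
    for start in range(0, longest, width):
        for name, seq in records:
            handle.write("%-10s %s\n" % (name, seq[start:start + width]))
        handle.write("\n")


def make_records():
    """Generator standing in for a SeqRecord iterator."""
    yield "alpha", "ACGTACGTACGTACGT"
    yield "beta", "TTGACCGATTGACCGA"


sequential_out = io.StringIO()
written = write_fasta_sequential(make_records(), sequential_out)

interleaved_out = io.StringIO()
write_interleaved(make_records(), interleaved_out)
```

The sequential writer can emit each record as soon as it arrives, while
the interleaved writer has to call list() on its input first.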

>> Would it help if the error message for this situation was a little
>> more precise?  e.g. Rather than "Unknown format 'xxx'", perhaps
>> "Writing 'xxx' format is not supported yet, only reading it".
>>
> I think your new suggested message is more clear, but the existing one is
> clear enough. I simply thought there was a problem because I had it in my
> mind that genbank writing was now supported.

I've updated Bio.SeqIO and Bio.AlignIO, so that they will say:
ValueError: Reading format 'xxx' is supported, but not writing

rather than:
ValueError: Unknown format 'xxx'

when the format is known, but only as an input format.  I think this
is more helpful and more accurate.
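
As a rough sketch of the idea (not the actual Bio.SeqIO source), the
check could look something like the following.  The two sets and the
function name check_writable are hypothetical names invented for this
example, and their contents are only illustrative.

```python
# Hypothetical registries, for illustration only.
READABLE_FORMATS = {"fasta", "genbank", "embl", "swiss"}
WRITABLE_FORMATS = {"fasta"}


def check_writable(format_name):
    """Raise ValueError unless format_name can be written.

    The message distinguishes a format that is known only as an input
    format from one that is not known at all.
    """
    if format_name in WRITABLE_FORMATS:
        return
    if format_name in READABLE_FORMATS:
        raise ValueError(
            "Reading format '%s' is supported, but not writing" % format_name
        )
    raise ValueError("Unknown format '%s'" % format_name)


def describe(format_name):
    """Return 'ok' or the error message, for easy inspection."""
    try:
        check_writable(format_name)
    except ValueError as err:
        return str(err)
    return "ok"
```

With these example sets, describe("genbank") gives the new message and
describe("xxx") still gives the old "Unknown format" one.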

Peter


