[Biopython-dev] Bug 2533 - Support for simple "tab" format in Bio.SeqIO

Wed Jul 2 13:03:36 UTC 2008

Hi all,

Do any of you have any comments or feedback on this suggested new
"simple tab separated" format for Bio.SeqIO?  To match BioPerl I plan
on calling it the "tab" format - see below.

Any real-world example files would be good for the test suite.

One nice thing is it adds another output format, something we're a bit
short of in Bio.SeqIO with only fasta and some alignment formats (now
handled via Bio.AlignIO, i.e. pfam/stockholm, clustal and phylip).

Peter

---------- Forwarded message ----------
From: Peter <biopython at maubp.freeserve.co.uk>
Date: Tue, Jul 1, 2008 at 5:06 PM
Subject: Re: [BioPython] Sequence from Fasta
To: dalloliogm at gmail.com
Cc: biopython at biopython.org

Giovanni wrote:
> yes, I think it will be useful to implement.
> I know of people who have written a customized fasta2tab script and
> use it quite frequently, so it would be good to support such a task.
> As you said before this format is commonly used in combination with
> grep/gawk scripts.

I've gone for the simple option for parsing the first field: it's used
as the record identifier (.id) and name only (nothing clever).  Here is my
suggested code, which you are welcome to download and try out.
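
To make the parsing rule concrete, here is a rough standalone sketch of
the idea. It is not the TabIO.py attached to the bug; TabRecord,
parse_tab and write_tab are made-up names for illustration, and it
assumes each line holds exactly one identifier, one tab and one sequence.

```python
from collections import namedtuple

# Hypothetical stand-in for a sequence record (not Biopython's SeqRecord).
TabRecord = namedtuple("TabRecord", ["id", "name", "seq"])


def parse_tab(lines):
    """Yield one TabRecord per non-blank 'identifier<TAB>sequence' line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("Expected exactly one tab per line, got: %r" % line)
        title = fields[0].strip()
        # Nothing clever: the first field is used as both id and name.
        yield TabRecord(id=title, name=title, seq=fields[1].strip())


def write_tab(records):
    """Return the records as tab-separated text, one record per line."""
    return "".join("%s\t%s\n" % (rec.id, rec.seq) for rec in records)
```

One record per line is what makes the format convenient to combine
with grep/gawk.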

Bug 2533 - Support for simple "tab" format in Bio.SeqIO
http://bugzilla.open-bio.org/show_bug.cgi?id=2533

If you want to try this yourself you'll need to download the new file
TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to
tell it about the new format (two new lines, see patch).
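
As a rough illustration of what "telling it about the new format"
amounts to, here is a self-contained sketch of a format-name lookup.
The dictionary and function names below are hypothetical and are not
copied from Bio/SeqIO/__init__.py; the patch on the bug shows the real
two lines.

```python
import io

# Hypothetical reader and writer for the "tab" format (illustration only).
def tab_iterator(handle):
    for line in handle:
        if line.strip():
            ident, seq = line.rstrip("\r\n").split("\t")
            yield ident, seq


def tab_writer(records, handle):
    count = 0
    for ident, seq in records:
        handle.write("%s\t%s\n" % (ident, seq))
        count += 1
    return count  # number of records written


# Hypothetical registries mapping a format name to its reader and writer.
FORMAT_TO_ITERATOR = {}
FORMAT_TO_WRITER = {}

# The "two new lines": one entry per registry.
FORMAT_TO_ITERATOR["tab"] = tab_iterator
FORMAT_TO_WRITER["tab"] = tab_writer


def parse(handle, format):
    """Look up the reader for the named format and apply it to the handle."""
    if format not in FORMAT_TO_ITERATOR:
        raise ValueError("Unknown format %r" % format)
    return FORMAT_TO_ITERATOR[format](handle)


def write(records, handle, format):
    """Look up the writer for the named format and apply it to the records."""
    if format not in FORMAT_TO_WRITER:
        raise ValueError("Unknown format %r" % format)
    return FORMAT_TO_WRITER[format](records, handle)
```

With this kind of lookup, adding a format only touches the registries;
the parse and write entry points stay unchanged.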

Peter