[BioPython] Sequence from Fasta

Leighton Pritchard lpritc at scri.ac.uk
Wed Jul 2 09:10:05 UTC 2008


Hi,

Just to chip in my two-pennorth, there's a self-defining tab-separated
format called "Axon Text File Format" that might be useful here, and which
is frequently seen in microarray work.  There's a description of its use for
.gal files here:

http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html

It's fairly flexible and straightforward as a format, and allows for
human-readable headers - which can contain column definitions, content
notes, or anything else you like - so it can be adapted to a variety of output
conventions.  Of course, getting people to settle on a convention is the
tricky part, but at least any given file might be well-defined in its
headers while people argue over it ;)

It's hardly likely that going down that route in BioPython will result in a
file convention that's accepted industry-wide, but at least it might give
options for writing out FASTA file content in tab-separated form while
unambiguously retaining ID, name and definition strings along with the
sequence. 
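
As a rough sketch of what I mean (this is not the code attached to Bug 2533,
and not anything in Bio.SeqIO), the following uses only the standard library.
The layout it writes is an assumption based on my reading of the ATF
description linked above: an "ATF 1.0" line, a line giving the number of
optional header records and the number of data columns, the header records
themselves, a row of column titles, then the data rows. The function names
and the column titles (ID, Name, Description, Sequence) are made up for
illustration, so please check the details against the format description
before relying on it.

```python
import io


def split_title(title):
    """Split a FASTA title line into (identifier, description)."""
    parts = title.split(None, 1)
    ident = parts[0] if parts else ""
    description = parts[1].strip() if len(parts) > 1 else ""
    return ident, description


def read_fasta(handle):
    """Yield (identifier, description, sequence) tuples from a FASTA handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield split_title(title) + ("".join(chunks),)
            title, chunks = line[1:], []
        elif title is not None and line:
            chunks.append(line)
    if title is not None:
        yield split_title(title) + ("".join(chunks),)


def write_atf_like(records, handle, notes=()):
    """Write records in an ATF-style layout (assumed, see note above).

    notes is a sequence of (key, value) pairs written as header records.
    Returns the number of data rows written.
    """
    notes = list(notes)
    columns = ("ID", "Name", "Description", "Sequence")  # hypothetical titles
    handle.write("ATF\t1.0\n")
    handle.write("%d\t%d\n" % (len(notes), len(columns)))
    for key, value in notes:
        handle.write('"%s=%s"\n' % (key, value))
    handle.write("\t".join('"%s"' % name for name in columns) + "\n")
    count = 0
    for ident, description, sequence in records:
        # Plain FASTA has no separate name, so the identifier is used twice.
        fields = (ident, ident, description, sequence)
        # Tabs inside a field would break the columns, so swap them for spaces.
        handle.write("\t".join(f.replace("\t", " ") for f in fields) + "\n")
        count += 1
    return count


# Example with in-memory handles; real files opened in text mode work the same.
fasta = io.StringIO(">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n")
out = io.StringIO()
n_written = write_atf_like(read_fasta(fasta), out, [("Type", "FASTA export")])
```

Keeping the description in its own column means that grep/gawk can still
pick rows by ID without tripping over whitespace in the definition line.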

L.

On 01/07/2008 17:06, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> Giovanni wrote:
>> yes, I think it will be useful to implement.
>> I know of people who have written a customized fasta2tab script and
>> use it quite frequently, so it would be good to support such a task.
>> As you said before this format is commonly used in combination with
>> grep/gawk scripts.
> 
> I've gone for the simple option about how to parse the first field: it's used
> as the record identifier (.id) and name only (nothing clever).  Here is my
> suggested code, which you are welcome to download and try out.
> 
> Bug 2533 - Support for simple "tab" format in Bio.SeqIO
> http://bugzilla.open-bio.org/show_bug.cgi?id=2533
> 
> If you want to try this yourself you'll need to download the new file
> TabIO.py into the Bio/SeqIO folder and update Bio/SeqIO/__init__.py to
> tell it about the new format (two new lines, see patch).
> 
> Peter

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405
