[BioPython] Sequence from Fasta

Peter biopython at maubp.freeserve.co.uk
Tue Jul 1 08:04:24 UTC 2008


On Mon, Jun 30, 2008 at 10:40 AM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:

>> But I'm looking for something like this:
>>
>> Name Sequence without linebreak
>>
>> Example:
>>
>> MySequence atgcgcgctcggcgcgctcgfcgcgccccccatggctcgcgcactacagcg
>> MySequence2 atgcgctctgcgcgctcgatgtagaatatgagatctctatgagatcagcatca
>
> Bioperl's SeqIO has support for a 'tab sequence format' which is
> similar to this[1].
> Maybe it could be useful in the future to add support for such a
> format in biopython.
>
> [1] http://www.bioperl.org/wiki/Tab_sequence_format
>

That does look fairly straightforward.

Do you happen to know how BioPerl reacts when the first field has spaces?
I would suggest treating the first field like the ">" line in a FASTA file and
taking the first word as the id/name and the whole field as the description.
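
To make that suggestion concrete, here is a minimal sketch in plain Python
(standard library only). The function name parse_tab_line is hypothetical,
and it assumes exactly one tab separates the title field from the sequence.
It is not how BioPerl or Biopython handle this, just one way it could work.

```python
def parse_tab_line(line):
    """Split one 'title<TAB>sequence' line into (id, description, sequence).

    Hypothetical helper: assumes a single tab separates the two fields.
    """
    title, sequence = line.rstrip("\n").split("\t", 1)
    title = title.strip()
    # As with a FASTA ">" line: the first word is the id,
    # and the whole field is kept as the description.
    seq_id = title.split(None, 1)[0] if title else ""
    return seq_id, title, sequence.strip()


if __name__ == "__main__":
    print(parse_tab_line("MySequence some note\tatgcgcgctcgg\n"))
```

A title with spaces then gives an id of "MySequence" while the description
keeps the full field, so nothing is lost if the record is written back out.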

This format could be handy for people who work on the command line.  By
converting from FASTA to the tab format (which can be done with sed or
awk), each record ends up on a single line, so you can use tools like grep
to filter your records, then convert the result back to FASTA.  There's a
nice blog page I found some time ago where the author describes his
workflow for this.
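
For anyone who would rather do the round trip in Python than in sed or awk,
something along these lines should work. This is only a sketch: the names
fasta_to_tab and tab_to_fasta are made up for illustration, it uses nothing
from Biopython, and it assumes the titles themselves contain no tab
characters.

```python
def fasta_to_tab(lines):
    """Yield one 'title<TAB>sequence' string per FASTA record."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if title is not None:
                yield title + "\t" + "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)  # sequence line of the current record
    if title is not None:
        yield title + "\t" + "".join(chunks)


def tab_to_fasta(lines, width=60):
    """Yield FASTA lines from 'title<TAB>sequence' lines, wrapped at width."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        title, sequence = line.split("\t", 1)
        yield ">" + title
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]


if __name__ == "__main__":
    fasta = [">seq1 first\n", "atgc\n", "ggta\n", ">seq2\n", "ttaa\n"]
    tab = list(fasta_to_tab(fasta))
    # The grep-like step: keep records whose sequence contains a motif.
    kept = [rec for rec in tab if "ggta" in rec.split("\t", 1)[1]]
    print("\n".join(tab_to_fasta(kept)))
```

The filter in the middle stands in for the grep step. Because each record is
a single string, a motif that was split across two lines in the original
FASTA file is still found.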

Peter


