[BioPython] Sequence from Fasta

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Tue Jul 1 09:37:53 UTC 2008


On Tue, Jul 1, 2008 at 10:04 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Mon, Jun 30, 2008 at 10:40 AM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>
>>> But I'm looking for something like this:
>>>
>>> Name Sequence without linebreak
>>>
>>> Example:
>>>
>>> MySequence atgcgcgctcggcgcgctcgfcgcgccccccatggctcgcgcactacagcg
>>> MySequence2 atgcgctctgcgcgctcgatgtagaatatgagatctctatgagatcagcatca
>>
>> Bioperl's SeqIO has support for a 'tab sequence format' which is
>> similar to this[1].
>> Maybe it could be useful in the future to add support for such a
>> format in biopython.
>>
>> [1] http://www.bioperl.org/wiki/Tab_sequence_format
>>
>
> That does look fairly straightforward.
>
> Do you happen to know how BioPerl reacts when the first field has spaces?
> I would suggest treating the first field like the ">" line in a FASTA file and
> taking the first word as the id/name and the whole field as the description.
>

It keeps only the first word of the header as the name and ignores
everything after the first space.

For example:
$ cat >seq1.fasta
>seq1 field2 field3
acatcgatgcatgctagctactgtacgac

$ cat > fasta2tab.pl
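use strict;
use warnings;
use Bio::SeqIO;

# Read FASTA records from seq1.fasta and write them to STDOUT in tab format.
my $seqin = Bio::SeqIO->newFh("-file" => "seq1.fasta", "-format" => "fasta");
my $seqout = Bio::SeqIO->newFh("-fh" => \*STDOUT, "-format" => "tab");

while (<$seqin>)
{
        print $seqout $_;
}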

$ perl fasta2tab.pl
seq1    acatcgatgcatgctagctactgtacgac
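
For comparison, here is a rough pure-Python sketch of the same conversion.
It does not use Biopython at all; the function name fasta_to_tab is made up
for this example, and it assumes the behaviour shown above (only the first
word of the header is kept, and the sequence is joined onto one line).

```python
import io


def fasta_to_tab(handle):
    """Yield 'name<TAB>sequence' strings from a FASTA file handle.

    Hypothetical helper, not a Biopython function. Only the first word
    of each '>' header is kept, mirroring the BioPerl output above.
    """
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new record starts: emit the previous one, if any
            if name is not None:
                yield name + "\t" + "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    # emit the last record
    if name is not None:
        yield name + "\t" + "".join(chunks)


example = io.StringIO(">seq1 field2 field3\nacatcgatgcatgc\ntagctactgtacgac\n")
for row in fasta_to_tab(example):
    print(row)
```

With a real file you would pass open("seq1.fasta") instead of the StringIO
object.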


Do you need any help implementing this?


> This format could be handy for some people who use the command line.  By
> converting between FASTA and the tab format (which can be done with sed
> or awk), each sequence is on one line, so you can use tools like grep to filter
> your records.  Then convert back to FASTA.  There's a nice blog page I found some
> time ago where the author describes his workflow for this.
>
> Peter
>
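
The way back (tab to FASTA, after filtering) could look something like the
sketch below. Again this is plain Python rather than anything in Biopython:
tab_to_fasta, the 60-column line width and the "atgc" motif are all
assumptions made for the example. The whole first field is written out as
the header line.

```python
def tab_to_fasta(lines, width=60):
    """Yield one FASTA record (as text) per 'name<TAB>sequence' line.

    Hypothetical helper, not a Biopython function. The sequence is
    wrapped at `width` characters per line.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, seq = line.split("\t", 1)
        pieces = [seq[i:i + width] for i in range(0, len(seq), width)]
        yield ">" + name + "\n" + "\n".join(pieces) + "\n"


rows = [
    "seq1\tacatcgatgcatgctagctactgtacgac",
    "seq2\tggggcccc",
]
# grep-like filter: keep records whose sequence contains a motif
kept = [r for r in rows if "atgc" in r.split("\t", 1)[1]]
print("".join(tab_to_fasta(kept)), end="")
```

Because each record sits on one line, the filter step is a single test per
line, which is the same idea as piping the tab file through grep.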



-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://bioinfoblog.it


