[BioPython] Sequence from Fasta

Tue Jul 1 09:45:48 UTC 2008

> It ignores any field after the first space in the header.
>
> For example:
> $ cat >seq1.fasta
>>seq1 field2 field3
> acatcgatgcatgctagctactgtacgac
>
> $ cat > fasta2tab.pl
> use Bio::SeqIO;
> my $seqin = Bio::SeqIO->newFh("-file" => "seq1.fasta", "-format" => "fasta");
> my $seqout = Bio::SeqIO->newFh("-fh" => \*STDOUT, "-format" => "tab");
>
> while (<$seqin>)
> {
>        print $seqout $_;
> }
>
> $ perl fasta2tab.pl
> seq1    acatcgatgcatgctagctactgtacgac

Interesting - and not what I had expected.

Could you try getting BioPerl to read in a TSV file like this:

seq1 field2 field3(tab)acatcgatgcatgctagctactgtacgac
seq2 with a description here(tab)ctcgctgnnacatcctagctactgta

I personally would prefer this sort of output as it preserves all the
data from a FASTA input file.  On the other hand, if there are existing
tools which wouldn't read this then it would be a bad idea.
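
To make the proposed behaviour concrete, here is a rough sketch in plain
Python.  It is not Biopython's actual code, and the function names
fasta_to_tab and parse_tab_line are made up for illustration.  It assumes
the first column holds the whole FASTA title line and that the identifier
is simply the first word of that title:

```python
import io


def fasta_to_tab(handle):
    """Yield 'full title<TAB>sequence' strings from a FASTA handle.

    Hypothetical helper: keeps the whole title line, not just the
    first word, so no header fields are lost.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if title is not None:
                yield title + "\t" + "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line.strip())
    if title is not None:
        yield title + "\t" + "".join(chunks)


def parse_tab_line(line):
    """Split one tab line into (identifier, full title, sequence).

    Assumption: the identifier is the first whitespace-separated word
    of the first column, and only the first tab separates the columns.
    """
    title, seq = line.rstrip("\r\n").split("\t", 1)
    words = title.split(None, 1)
    identifier = words[0] if words else ""
    return identifier, title, seq


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 field2 field3\nacatcgatgcatgctagctactgtacgac\n")
    for tab_line in fasta_to_tab(fasta):
        print(tab_line)
        print(parse_tab_line(tab_line))
```

A reader that only looks at the first word of the first column would
still recover the same identifier that BioPerl writes, which is the
compatibility question raised above.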

>
> Do you need some help to implement this function?
>

Not really - but discussion of the behaviour details now is a good
idea.  And some real world example TSV files would be good to add to
Biopython as test cases.

Thanks,

Peter