[BioPython] Sequence from Fasta
Peter
biopython at maubp.freeserve.co.uk
Tue Jul 1 09:45:48 UTC 2008
> It ignores any field after the first space in the header.
>
> For example:
> $ cat >seq1.fasta
>>seq1 field2 field3
> acatcgatgcatgctagctactgtacgac
>
> $ cat > fasta2tab.pl
> use Bio::SeqIO;
> my $seqin = Bio::SeqIO->newFh("-file" => "seq1.fasta", "-format" => "fasta");
> my $seqout = Bio::SeqIO->newFh("-fh" => \*STDOUT, "-format" => "tab");
>
> while (<$seqin>)
> {
> print $seqout $_;
> }
>
> $ perl fasta2tab.pl
> seq1 acatcgatgcatgctagctactgtacgac
Interesting - and not what I had expected.
Could you try getting BioPerl to read in a TSV file like this:
seq1 field2 field3(tab)acatcgatgcatgctagctactgtacgac
seq2 with a description here(tab)ctcgctgnnacatcctagctactgta
I personally would prefer this sort of output as it preserves all the
data from a FASTA input file. On the other hand, if there are existing
tools which wouldn't read this then it would be a bad idea.
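
To make the behaviour under discussion concrete, here is a minimal sketch in
plain Python. It is not Biopython's or BioPerl's implementation; the function
names fasta_to_tab and tab_to_records are hypothetical, and it assumes the
first column holds the full FASTA header, with the identifier ending at the
first space:

```python
import io


def fasta_to_tab(handle):
    """Yield 'header<TAB>sequence' lines, keeping the whole FASTA header."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header + "\t" + "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header + "\t" + "".join(chunks)


def tab_to_records(handle):
    """Yield (identifier, description, sequence) from tab-separated lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        header, sequence = line.split("\t", 1)
        # Assumption: the identifier is everything before the first space.
        identifier, _, description = header.partition(" ")
        yield identifier, description, sequence


fasta = io.StringIO(">seq1 field2 field3\nacatcgatgcatgctagctactgtacgac\n")
tab_lines = list(fasta_to_tab(fasta))
records = list(tab_to_records(io.StringIO("\n".join(tab_lines) + "\n")))
print(tab_lines)
print(records)
```

A reader that only wants the identifier can still take the text before the
first space, so nothing is lost relative to the BioPerl output quoted above.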
>
> Do you need some help to implement this function?
>
Not really - but discussion of the behaviour details now is a good
idea. And some real world example TSV files would be good to add to
Biopython as test cases.
Thanks,
Peter