[Bioperl-l] How to read in FASTA formatted sequence without fastaheader?

Jason Stajich jason.stajich at duke.edu
Fri Sep 30 23:12:54 EDT 2005


On Sep 30, 2005, at 7:16 PM, Ryan Golhar wrote:

> True.  Ok, so I have a raw sequence instead of fasta...when I try to
> read in the sequence using raw format, it only reads in the first  
> line.
>
> I'm thinking of modifying the raw module and making a multilineraw
> module that will stop reading on a newline or EOF.
>
Well, technically it will have to detect multiple consecutive
newlines; it currently separates on single newlines, hence your
problem.

It seems easier to use a standard file format in the future
(and dare I say *standard* for anyone who might come along after you
on a project), but you could probably modify raw.pm locally to
separate records on consecutive newlines, i.e. on a blank line.
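
To illustrate the separator issue without BioPerl, here is a minimal
sketch in core Perl. The sample data and the read_records() helper are
made up for this example; the only real mechanism shown is Perl's input
record separator $/, which decides where one "record" ends when reading
from a filehandle.

```perl
use strict;
use warnings;

# Hypothetical sample data: two unlabeled sequences, each wrapped over
# two lines, separated by one blank line.
my $data = "ACGTAC\nGGTTAA\n\nTTGACA\nCCGGTT\n";

# Read every record from an in-memory filehandle using the given
# input record separator (illustrative helper, not part of BioPerl).
sub read_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = $separator;      # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    s/\s+//g for @records;      # drop the line breaks inside each record
    return @records;
}

my @by_line  = read_records($data, "\n");    # one record per line
my @by_block = read_records($data, "\n\n");  # one record per blank-line block

print scalar(@by_line),  " records when splitting on a single newline\n";
print scalar(@by_block), " records when splitting on a blank line\n";
print "$_\n" for @by_block;
```

Splitting on a single newline hands back each wrapped line as its own
record, which matches the "only reads in the first line" symptom;
splitting on "\n\n" returns each multi-line block as one sequence.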

Thinking about this, I'm not sure how much help SeqIO is.  You just
need a function that will give you back Bio::PrimarySeq objects; it
isn't much more complicated than the code below.
If you just add this to your Perl script you will be able to split
sequences on double newlines and still use the 'raw' format.

use strict;
use Bio::SeqIO;
use Bio::SeqIO::raw;


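# Override raw's next_seq so a record ends at a blank line, not at each newline.
sub Bio::SeqIO::raw::next_seq{
    my ($self, @args) = @_;
    local $/ = "\n\n";    # input record separator: one blank line
    my $nextline = $self->_readline();
    if( !defined $nextline ){ return undef; }

    my $sequence = uc($nextline);
    $sequence =~ s/\W//g;    # strip embedded newlines and other non-word characters

    return  $self->sequence_factory->create(-seq => $sequence);
}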

# Your Perl code goes here; it will eventually call
# Bio::SeqIO->new(-format => 'raw', .... );

> I don't want to modify the actual files because they might screw up all
> my other scripts.  I could write one to insert the fasta header in a tmp
> file then concatenate the sequence to the file, but it just doesn't seem
> like a clean solution to me.
>
>
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Richard
> Sucgang, PhD
> Sent: Friday, September 30, 2005 5:59 PM
> To: golharam at umdnj.edu
> Cc: 'Bioperl List'
> Subject: Re: [Bioperl-l] How to read in FASTA formatted sequence without
> fastaheader?
>
>
>
> Well, maybe I am mistaken, but isn't the header line the item that
> makes a FASTA file a FASTA file?
> As in, now you have a raw sequence.
>
>
> On Sep 30, 2005, at 3:43 PM, Ryan Golhar wrote:
>
>
>> I'm looking for the easier way to read in a fasta file that doesn't
>> contain the fasta header, ie the ">..." line.
>>
>> I tried just specifying fasta, but then the first line of the sequence
>> is taken as the name of the sequence.  I also tried specifying raw, but
>> then only the first line is read.
>>
>> Is there any (easy) way to do this without reformatting the fasta file
>> or creating a new one?  Thanks,
>>
>> Ryan
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/



