[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit

Aaron J. Mackey amackey at pcbi.upenn.edu
Fri Jul 16 07:16:54 EDT 2004


On second (or third) thought, perhaps it would be better to formulate 
it like this:

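   use Bio::SeqIO;
   use IO::String;
   # oops, just read the first line of a fasta record ...
   my $fasta = $line;
   # keep reading while lines are still fasta, not gff
   # (assumption: gff feature lines are tab-delimited, sequence lines are not)
   while (defined(my $next = $self->_readline)) {
     if ($next =~ /\t/) {          # back to gff: hand the line back and stop
       $self->_pushback($next);
       last;
     }
     $fasta .= $next;
   }
   # convert to seq:
   my $seq = Bio::SeqIO->new(-fh     => IO::String->new($fasta),
                             -format => 'fasta')->next_seq;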

Otherwise, no matter how we fiddle with _readline, it's going to be
ugly to "share" one filehandle and its pushback buffer
($self->{'_readbuffer'}) between two distinct objects.
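To make the "collect the fasta text, then hand it to a fresh parser" idea concrete without BioPerl installed, here is a minimal core-Perl sketch. It assumes the fasta part starts at the first line beginning with '>' and runs to the end of the input. parse_fasta_string and split_gff_and_fasta are hypothetical names standing in for Bio::SeqIO's fasta parser and the GFF loop, and a core in-memory filehandle (open my $fh, '<', \$text) plays the role of IO::String.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqIO's fasta parser. It reads from its own
# in-memory filehandle, so its local $/ never touches the GFF reader's handle.
sub parse_fasta_string {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\n>";                  # restored automatically when the sub returns
    my @seqs;
    while (my $rec = <$fh>) {
        chomp $rec;                    # removes a trailing "\n>" (next record's '>')
        $rec =~ s/^>//;                # only the first record keeps its own '>'
        my ($header, @lines) = split /\n/, $rec;
        my ($id) = split /\s+/, $header;
        push @seqs, { id => $id, seq => join('', @lines) };
    }
    close $fh;
    return \@seqs;
}

# Hypothetical GFF-side loop: ordinary line reads; from the first '>' line
# on, everything is "still fasta, not gff" and is collected as plain text.
sub split_gff_and_fasta {
    my ($fh) = @_;
    my (@features, $fasta);
    while (defined(my $line = <$fh>)) {
        if (defined $fasta || $line =~ /^>/) {
            $fasta .= $line;
        } elsif ($line =~ /\S/ && $line !~ /^#/) {
            push @features, $line;     # a tab-delimited feature line
        }
    }
    my $seqs = defined $fasta ? parse_fasta_string($fasta) : [];
    return (\@features, $seqs);
}
```

For example, reading "chr1\t.\tgene\t1\t9\t.\t+\t.\tID=g1\n>g1 test\nACGT\nTTAA\n>g2\nGG\n" through an in-memory handle gives one feature line and two sequences, g1 (ACGTTTAA) and g2 (GG), and $/ is still "\n" afterwards. The two parsers never share a buffer; they only share a string.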

-Aaron

On Jul 15, 2004, at 7:00 PM, Aaron J Mackey wrote:

>
> On Thu, 15 Jul 2004, Chris Mungall wrote:
>
>> However, the fasta parser sets the input record separator $/=">\n", so I
>> actually have to read in up to but NOT including the next ^\> (or end of
>> file), which means I actually have to switch $/ within the GFF parser!
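A small core-Perl sketch of the problem described here, assuming (as the reply suggests) that the fasta parser uses "\n>" as its separator: reading one record that way also consumes the '>' of the following record, so a line-oriented reader sharing the same handle next sees "s2" with its '>' already gone. read_fasta_record is a hypothetical helper, not the Bio::SeqIO::fasta code.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Bio::SeqIO::fasta code): read one FASTA record
# from a handle that a line-oriented GFF reader also uses, with "\n>" as the
# separator. `local` restores the caller's $/ when the sub returns.
sub read_fasta_record {
    my ($fh) = @_;
    local $/ = "\n>";
    my $rec = <$fh>;
    return unless defined $rec;
    chomp $rec;          # strips the trailing "\n>" from the returned text ...
    $rec =~ s/^>//;      # (only the very first record still has its own '>')
    return $rec;         # ... but that '>' is already gone from $fh
}

my $data = ">s1\nAC\n>s2\nGT\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $first = read_fasta_record($fh);   # "s1\nAC"
my $after = <$fh>;                    # back in line mode: "s2\n", no leading '>'
```

Note that `local` does take care of restoring $/; the real difficulty is the consumed '>', which is why the GFF side needs either pushback or to hand the fasta text off whole.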
>
> Hmm, don't you mean it switches to "\n>"?  Regardless, why should you
> have to worry about it?  You _pushback, you send off to the next
> parser; if it changes what $/ is, then Bio::Root::IO::_readline (or
> maybe just an overridden version in fasta.pm) could/should be savvy to
> it (comments from the gallery?):
>
> Index: IO.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v
> retrieving revision 1.51
> diff -r1.51 IO.pm
> 420,422c420,422
> <            Note also that the current implementation does not handle pushed
> <            back input correctly unless the pushed back input ends with the
> <            value of $/.
> ---
>>            Note also that the current implementation does handle
>>            pushed back input correctly when the pushed back input
>>            doesn't end with whatever is the local value of $/.
> 441a442,458
>>
>>     # If $/ has changed since the push back occurred, we may need to
>>     # adjust the buffering ...
>>     my $pos = (defined($line) && defined($/) && length($/))
>>             ? index($line, $/) : -1;   # literal search, so no regex quoting
>>     if ($pos >= 0) {
>>       # $/ is defined (not in file-slurp mode); does our current
>>       # line have too much stuff already?
>>       my $end = $pos + length($/);
>>       if ($end < length($line)) {
>>         unshift @{$self->{'_readbuffer'}}, substr($line, $end);
>>         $line = substr($line, 0, $end);
>>       }
>>     } elsif (!eof($fh)) {
>>       # need to read some more ...
>>       $line .= <$fh>;
>>     }
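The buffering adjustment in the patch can be exercised outside BioPerl with a toy reader. PushbackReader and its next_record/pushback methods are hypothetical stand-ins for Bio::Root::IO's _readline/_pushback, not its actual code. The separator test uses index, so a $/ containing regex metacharacters (such as "|") is matched literally, and an undefined or empty $/ skips the split.

```perl
use strict;
use warnings;

# Hypothetical toy reader (not Bio::Root::IO): a filehandle plus a pushback
# buffer, where next_record() re-splits buffered text against the *current* $/.
package PushbackReader;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buf => [] }, $class;
}

sub pushback {
    my ($self, $text) = @_;
    unshift @{ $self->{buf} }, $text;
    return;
}

sub next_record {
    my ($self) = @_;
    my $fh   = $self->{fh};
    my $line = @{ $self->{buf} } ? shift @{ $self->{buf} } : <$fh>;
    return unless defined $line;
    my $sep = $/;
    if (defined($sep) && length($sep)) {    # skip slurp (undef) and paragraph ("") modes
        my $pos = index($line, $sep);       # literal search: no regex metacharacter issues
        if ($pos >= 0) {
            my $end = $pos + length($sep);
            # too much text buffered: keep up to the separator, re-buffer the rest
            unshift @{ $self->{buf} }, substr($line, $end) if $end < length($line);
            return substr($line, 0, $end);
        }
    }
    # separator not seen yet: append the rest of the current record, if any
    $line .= <$fh> unless eof($fh);
    return $line;
}

package main;

my $data = "chr1\tx\n>s1\nAC\n>s2\nGT\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $r = PushbackReader->new($fh);

my $gff  = $r->next_record;        # "chr1\tx\n" (line mode)
my $peek = $r->next_record;        # ">s1\n" is not GFF, so give it back
$r->pushback($peek);

my ($rec1, $rec2);
{
    local $/ = "\n>";              # the separator the fasta parser switches to
    $rec1 = $r->next_record;       # ">s1\nAC\n>": buffered line plus rest of record
    $rec2 = $r->next_record;       # "s2\nGT\n": last record, read up to EOF
}
```

The same reader handles the opposite direction too: text pushed back as one multi-line chunk is handed out one line at a time once $/ is back to "\n".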
>
>
>> The simple solution is to force everyone to precede the fasta section with
>> a ##FASTA directive; however, the spec says this is optional.
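Because the directive is optional, a GFF reader has to find the start of the sequence section either way. A minimal sketch, assuming the sequences begin on the line after a ##FASTA directive or, failing that, at the first line starting with '>', whichever comes first. fasta_section_start is a hypothetical helper that returns the index of the first fasta line, or -1 if there is none.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first FASTA line in a list of GFF lines,
# or -1 if there is none. Whichever comes first wins: a ##FASTA directive
# (sequences start on the next line) or a line beginning with '>'.
sub fasta_section_start {
    my @lines = @_;
    for my $i (0 .. $#lines) {
        return $i + 1 if $lines[$i] =~ /^##FASTA\s*$/;   # explicit directive
        return $i     if $lines[$i] =~ /^>/;             # implicit start
    }
    return -1;
}
```

Everything from the returned index on can then be joined and handed to the fasta parser as one string, as in the reformulation at the top of this message.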
>
> Nah, the simple solution is to fix BioPerl ;)
>
>> Of course, I could just go back to my own 8-line fasta parsing code
>> within GFF.pm.....
>
> No, then you'd need to worry about it keeping in sync with 
> SeqIO/fasta.pm, which is what we're trying to avoid, if possible.
>
> I repeat: thanks for the hard work!
>
> -Aaron
>
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania       email:  amackey at pcbi.upenn.edu
415 S. University Avenue         office: 215-898-1205
Philadelphia, PA  19104-6017     fax:    215-746-6697


