[Bioperl-l] Sim4 output parsing using Bioperl

Jason Stajich jason at cgt.duhs.duke.edu
Tue Jul 1 17:53:44 EDT 2003


Now that you've described the problem: the current code can't distinguish
between an empty report and the end of the file, so a loop like

while( my $exonset = $parser->next_exonset ) {
  # process $exonset here
}
will end as soon as it hits an empty report, not only at the end of the file.

You can work around this if you know ahead of time how many reports
there are in your file:
for( my $i = 0; $i < $maxnum; $i++ ){
  my $exonset = $parser->next_exonset;
  # OR
  # my @exons = $parser->parse_next_alignment;
}
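
If you don't already know $maxnum, one way to get it is to count the report
headers first. This is only a rough sketch: it assumes every sim4 report,
aligned or not, starts with a "seq1 = ..." line, and both the helper name
count_sim4_reports and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Count sim4 reports by counting their "seq1 = ..." header lines.
# Assumption: every report, with or without an alignment, has one.
sub count_sim4_reports {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ if $line =~ /^seq1\s*=/;
    }
    return $count;
}

# Made-up sample: the second report has no alignment lines at all.
my $sample = <<'END';
seq1 = genomic.fa, 200000 bp
seq2 = est1.fa, 218 bp

138736-138759  (192-218)   75%

seq1 = genomic.fa, 200000 bp
seq2 = est2.fa, 300 bp

END

# Read the sample through an in-memory filehandle; use a real file in practice.
open( my $fh, '<', \$sample ) or die "Cannot open in-memory sample: $!";
my $maxnum = count_sim4_reports($fh);
close($fh);

print "reports: $maxnum\n";
```

With a real file you would open the sim4 output itself, count once, and then
hand the count to the for loop above.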

The better solution is for the Sim4 parser to return gene objects (which I
thought it did): a gene with 0 exons for a no-alignment report, and undef
only when it gets to the end of the file.
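
To make the distinction concrete, here is a toy reader in plain Perl. It is
not the Bioperl parser and not how Bio::Tools::Sim4::Results is implemented;
next_report, the %state hash and the sample text are all hypothetical, and it
assumes each report starts with a "seq1 = ..." line. It returns an empty
array reference for a report with no alignment and undef only at end of input.

```perl
use strict;
use warnings;

# Return the alignment lines of the next report as an array reference:
# [] for a report without an alignment, undef only at end of input.
# $state holds the filehandle (fh) and, if the previous call already read
# the next report's header, that header line (pending).
sub next_report {
    my ($state) = @_;
    my $fh      = $state->{fh};
    my $header  = delete $state->{pending};

    # Skip ahead to the next "seq1 = ..." header unless one is pending.
    while ( !defined $header ) {
        my $line = <$fh>;
        return unless defined $line;    # real end of file
        $header = $line if $line =~ /^seq1\s*=/;
    }

    my @exons;
    while ( my $line = <$fh> ) {
        if ( $line =~ /^seq1\s*=/ ) {   # the next report starts here
            $state->{pending} = $line;
            last;
        }
        # Keep lines shaped like "138736-138759  (192-218)   75%".
        push @exons, $1 if $line =~ /^(\d+-\d+\s+\(\d+-\d+\)\s+\d+%)/;
    }
    return \@exons;
}

# Made-up sample: three reports, the middle one without an alignment.
my $sample = <<'END';
seq1 = genomic.fa, 200000 bp
seq2 = est1.fa, 218 bp

138736-138759  (192-218)   75%

seq1 = genomic.fa, 200000 bp
seq2 = est2.fa, 300 bp

seq1 = genomic.fa, 200000 bp
seq2 = est3.fa, 250 bp

1200-1450  (1-250)   98%
END

open( my $fh, '<', \$sample ) or die "Cannot open in-memory sample: $!";
my %state = ( fh => $fh );
my @counts;

# Testing with defined() lets the loop continue past an empty report.
while ( defined( my $report = next_report( \%state ) ) ) {
    push @counts, scalar @$report;
}
close($fh);

print "exon lines per report: @counts\n";
```

The important part is the defined() test in the caller: an empty report is
still a defined value, so the loop only stops at the real end of the input.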

That's the best I've got right now.

I have written SearchIO::sim4, which doesn't have this problem, but it is
only on the main trunk (check out via CVS) and it returns Bio::Search
objects, not Exons.


-jason

On Tue, 1 Jul 2003, Arnaud Kerhornou wrote:

> Quoting Jason Stajich <jason at cgt.duhs.duke.edu>:
> >
> > > --->
> > > 138736-138759  (192-218)   75%
> > > <---
> > >
> > > If this line is missing, it stops.
> > >
> > > Am I misusing the parser ?
> >
> > If the line is missing then there is no alignment so it can't build a
> > gene/exon for you.
>
> The sim4 output reports the (potential) alignments against an EST database. If
> the first EST doesn't align with the genomic sequence, it can't build an exon,
> but the parsing should carry on and parse the information about the second EST
> and so on, until the whole set of EST data has been processed.
>
> Arnaud
>
> > -jason
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
>
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

