[Bioperl-l] Sim4 output parsing using Bioperl

Arnaud Kerhornou axk at sanger.ac.uk
Tue Jul 1 18:42:47 EDT 2003


Jason Stajich wrote:

>On Tue, 1 Jul 2003, Arnaud Kerhornou wrote:
>
>>Hi
>>
>>I'd like to use Bioperl to parse Sim4 output.
>>I run sim4 using this command, "sim4 genomicDNA.fa cDNAs.fa P=1 A=4"
>>
>>I can't get the parser to go through the whole sim4 output; it stops on
>>this type of result:
>>
>>------------------>
>>seq1 = DICTY6P4_0001.gfseq.masked, 145194 bp
>>seq2 = CONTIGS.fa (>AU275246), 258 bp
>>
>> >DICTY6P4_0001
>> >AU275246  FC-IC Dictyostelium discoideum cDNA clone FC-IC1786, mRNA
>>sequence
>>    
>>
>
>THERE SHOULD BE LOCATION INFORMATION HERE
>
>Are you sure you expect to see an alignment between these two
>sequences?
>  
>
I'd like the parser not to require any location or alignment 
information, because sim4 output may not contain any.
The parser seems to expect a line such as the following one, where a hit 
is reported:

--->
138736-138759  (192-218)   75%
<---

If this line is missing, it stops.

Am I misusing the parser?
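
To illustrate the behaviour described above, here is a minimal standalone sketch in plain Perl. It does not use Bio::Tools::Sim4::Results and is not how that module is implemented; the sub name split_sim4_reports, the hash keys, and the second report's cDNA name (EXAMPLE_CDNA, 300 bp) are made up for illustration. It assumes that each report opens with a "seq1 = ..." line and that hit lines look like the one shown above. A report with no hit line is simply kept with an empty exon list instead of stopping the parse.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split raw sim4 text into reports.
# Assumes every report starts with a "seq1 = ..." line.
sub split_sim4_reports {
    my ($text) = @_;
    my (@reports, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^seq1\s*=\s*(.+?),\s*(\d+)\s*bp/) {
            # A "seq1" line opens a new report, so close the previous one,
            # whether or not it contained any hit lines.
            push @reports, $current if $current;
            $current = { seq1 => $1, seq1_len => $2, exons => [] };
        }
        elsif ($current && $line =~ /^seq2\s*=\s*(.+?),\s*(\d+)\s*bp/) {
            @{$current}{qw(seq2 seq2_len)} = ($1, $2);
        }
        elsif ($current && $line =~ /^(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%/) {
            # Hit line: seq1 range, seq2 range in parentheses, percent identity.
            push @{ $current->{exons} }, {
                seq1_start => $1, seq1_end => $2,
                seq2_start => $3, seq2_end => $4,
                percent_id => $5,
            };
        }
        # Any other line (FASTA-style headers, blank lines) is ignored.
    }
    push @reports, $current if $current;
    return @reports;
}

# First report: taken from this thread, no hit line.
# Second report: hypothetical seq2 header, hit line from this thread.
my $sim4_output = <<'END';
seq1 = DICTY6P4_0001.gfseq.masked, 145194 bp
seq2 = CONTIGS.fa (>AU275246), 258 bp

>DICTY6P4_0001
>AU275246  FC-IC Dictyostelium discoideum cDNA clone FC-IC1786, mRNA sequence

seq1 = DICTY6P4_0001.gfseq.masked, 145194 bp
seq2 = CONTIGS.fa (>EXAMPLE_CDNA), 300 bp

138736-138759  (192-218)   75%
END

my @reports = split_sim4_reports($sim4_output);
for my $r (@reports) {
    printf "%s vs %s: %d exon(s)\n",
        $r->{seq1}, $r->{seq2}, scalar @{ $r->{exons} };
}
```

With this approach the caller decides what to do with a report that has no exons (skip it, log it, and so on), rather than the parser stopping at that point.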

>  
>
>>seq1 = DICTY6P4_0001.gfseq.masked, 145194 bp
>><------------------
>>
>>There is a "last" statement in the method "parse_next_alignment" in
>>Bio::Tools::Sim4::Results.pm
>>
>> /^seq1/ && do {
>>       if($started) {
>>           $self->_pushback($_);
>>           last;
>>       }
>>...
>>}
>>
>>Why is this last statement there?
>>
>>    
>>
>Because we are trying to detect the end of a report in case you've stored more
>than one report in a single file.  The next time you see seq1 it means a
>new report has started.
>  
>
>>Thanks
>>Arnaud
>>
>
>--
>Jason Stajich
>Duke University
>jason at cgt.mc.duke.edu
>
>  
>