[Bioperl-l] Parsing needle/water output

Jason Stajich jason at bioperl.org
Fri May 15 16:03:05 EDT 2009


That's a problem from water not bioperl - if the report doesn't  
include the header then it won't be parsed by Bioperl.

You could consider replacing water with SSEARCH and GGSEARCH for  
needle functionality both in the FASTA package.  Bio::SearchIO parses  
those formats and IDs aren't truncated there.

-jason
On May 15, 2009, at 11:13 AM, shalabh sharma wrote:

> Hi Mahmut,                   Thanks a lot, actually this is exactly  
> what i
> was looking for. Its really helpful.
> One thing more, is there any way i can get the full id of the  
> sequence from
> the water output.
> like ->
>         My sequece id is: JCVI_READ_1103769852490
>
> and the water output gives something like this :
>
> 4206170-42077    409 CCATGCCGCGTGTATGAAGAAGGCCTTCGG   458
>
>        |||  .|||     ||.|          ||||||| |||.||||| ||.|
>
> JCVI_READ_110      1 CCA--ACGC-----TGCA----------GGGTTGT-AAG   31
>
> So when i run the script i get output:
>                                                       
> JCVI_READ_110     1
> 1078
>
> Instead of getting JCVI_READ_1103769852490 i am getting   just  
> JCVI_READ_110
>
> Thanks
> Shalabh
>
> On Fri, May 15, 2009 at 1:45 PM, Mahmut Uludag <uludag at ebi.ac.uk>  
> wrote:
>
>>
>>>>> yes, i tried to read the documentation about
>>>>> Bio::AlignIO::emboss, but there
>>>>> is not much in it.So is like i can call the same functions
>>>>> which are used in
>>>>> searchIO. I need start and end position of the pairwise alignment.
>>
>> Hi Shalabh,
>>
>> I copied below an example script that prints start and end position  
>> of
>> sequences used for constructing pairwise alignment by EMBOSS 'water'
>> program. After your last email this became an unnecessary example  
>> but i
>> thought it would be useful for people with similar questions in the
>> future.
>>
>> Regards,
>> Mahmut
>>
>>
>> use Bio::Factory::EMBOSS;
>> use Bio::Seq;
>> use Bio::AlignIO;
>>
>> my $aseq = Bio::Seq->new( -id => 'seq1', -seq => 'AACATGTAGGGATAG' );
>> my $bseq = Bio::Seq->new( -id => 'seq2', -seq => 'GCATGTTTAGATAG' );
>> my $factory      = new Bio::Factory::EMBOSS;
>> my $water        = $factory->program("water");
>> my $wateroutfile = 'out.water';
>>
>> $water->run(
>>       {
>>               -asequence => $aseq,
>>               -bsequence => $bseq,
>>               -gapopen   => '10.0',
>>               -gapextend => '0.5',
>>               -outfile   => $wateroutfile
>>       }
>> );
>>
>> my $alnin = new Bio::AlignIO(
>>       -format => 'emboss',
>>       -file   => $wateroutfile
>> );
>>
>> while ( my $aln = $alnin->next_aln ) {
>>       # process the alignment -- Bio::SimpleAlign objects
>>       foreach $seq ( $aln->each_seq() ) {
>>               print "\n" . $seq->display_id;
>>               print " " . $seq->start . " " . $seq->end;
>>       }
>>       print "\n" . $aln->percentage_identity;
>>       print "\n" . $aln->consensus_string(50) . "\n";
>> }
>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org







More information about the Bioperl-l mailing list