[Bioperl-l] AlignIO problems

Chris Fields cjfields at uiuc.edu
Sun Feb 25 19:58:23 UTC 2007


Bio::AlignIO::clustalw doesn't work with identity-masked sequences. It
parses the alignment literally, so any '.' or '-' is treated as a gap.
If a sequence is 100% identical to the first one, it ends up as all
gaps with no residues, which triggers the warning you see. In your
file, dsim_GLEANR_9744 contains only '-' and '.' characters.
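To make the failure mode concrete, here is a minimal pure-Perl sketch (no BioPerl involved). It assumes, as in the file quoted below, that '.' in a row means "same residue as the first sequence in this block" and '-' is a real gap. The helpers `unmask_row` and `residue_count` are hypothetical names for illustration, not BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild one identity-masked row against the
# reference (first) row of the same ClustalW block.
# Assumption: '.' means "identical to the reference", '-' is a real gap.
sub unmask_row {
    my ($ref, $row) = @_;
    die "rows differ in length\n" unless length($ref) == length($row);
    my @r = split //, $ref;
    my @s = split //, $row;
    for my $i (0 .. $#s) {
        $s[$i] = $r[$i] if $s[$i] eq '.';
    }
    return join '', @s;
}

# Hypothetical helper: count residue letters, i.e. what is left when
# '.' and '-' are both read as gaps (tr with an empty replacement only counts).
sub residue_count {
    my ($row) = @_;
    return ($row =~ tr/A-Za-z//);
}

my $ref    = 'VTDRSDENWW';
my $masked = '..........';    # a fully identical row, like dsim_GLEANR_9744

print residue_count($masked), "\n";          # 0: no letters to guess an alphabet from
print unmask_row($ref, $masked), "\n";       # VTDRSDENWW
print unmask_row($ref, '----......'), "\n";  # ----SDENWW (gaps are kept)
```

Read literally, the masked row has zero residues; only after substituting the reference residues does it become a real sequence.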

The best way to get what you want is to avoid masking the alignment in
the first place when running clustalw, muscle, or whatever aligner you
use. How exactly are you generating these files? When I run clustalw,
no identity masking occurs by default.
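If re-running the aligner without masking is not an option, one possible workaround is to unmask the raw ClustalW text before Bio::AlignIO ever sees it. This is a sketch under stated assumptions: lines are already chomped, blocks are separated by blank lines, the first sequence line in each block is the reference, sequence lines look like `name<spaces>ROW` with no trailing position counts, and '.' means identity to the reference. The sub name `unmask_clustal_lines` and the demo sequence names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace identity dots in raw ClustalW lines with
# the residue from the first (reference) row of the same block.
sub unmask_clustal_lines {
    my @lines = @_;
    my $ref_row;
    my @out;
    for my $line (@lines) {
        if ($line =~ /^\s*$/) {
            undef $ref_row;            # a blank line ends the current block
            push @out, $line;
            next;
        }
        my ($name, $pad, $row) = $line =~ /^(\S+)(\s+)(\S+)\s*$/;
        if (!defined $row || $name =~ /^CLUSTAL/) {
            push @out, $line;          # header or consensus line: keep as-is
            next;
        }
        if (!defined $ref_row) {
            $ref_row = $row;           # first row of a block is the reference
        }
        elsif (length $row == length $ref_row) {
            my @r = split //, $ref_row;
            my @s = split //, $row;
            $s[$_] = $r[$_] for grep { $s[$_] eq '.' } 0 .. $#s;
            $row = join '', @s;
        }
        push @out, "$name$pad$row";
    }
    return @out;
}

my @demo = (
    'CLUSTAL W(1.81) multiple sequence alignment',
    '',
    'ref_seq    MEAIAK',
    'seq_b      ...V..',
    'seq_c      --....',
);
print "$_\n" for unmask_clustal_lines(@demo);
```

Printing the returned lines to a new file and opening it with `Bio::AlignIO->new(-file => ..., -format => 'clustalw')` should then give ordinary, unmasked sequences, so the `unmatch()` call should no longer be needed.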

chris

On Feb 25, 2007, at 7:28 AM, 江 文恺 wrote:

> Hi all,
> I am using the AlignIO module to convert an alignment file.
> My original file is:
> CLUSTAL W(1.81) multiple sequence alignment
>
>
> dana_GLEANR_11249      MEAIAKHDFSATADDELSFRKTQTLKILNMEDDSNWYRAELDGKEGLIPSNYIEMKNHDW
> dere_GLEANR_7213       ...V...................I.......................................................
> dgri_GLEANR_6962       .......................I.......................................................
> FBgn0004638            .......................I.......................................................
> dmoj_GLEANR_6118       ...........N...........I.......................................................
> dper_GLEANR_18885      ...V...................I.......................................................
> dpse_GLEANR_14384      ...V...................I.......................................................
> dsec_GLEANR_3096       .................N.....I.......................................................
> dsim_GLEANR_9744       -----------------------------...............................
> dvir_GLEANR_4811       .......................I.......................................................
> dwil_GLEANR_10869      .......................I.......................................................
> dyak_GLEANR_13576      .......................I.......................................................
>
>
>
> dana_GLEANR_11249      YYGRITRADAEKLLSNKHEGAFLIRISESSPGDFSLSVKCPDGVQHFKVLRDAQSKFFLW
> dere_GLEANR_7213       ............................................................
> dgri_GLEANR_6962       ............................................................
> FBgn0004638            ............................................................
> dmoj_GLEANR_6118       .................L..........................................
> dper_GLEANR_18885      ............................................................
> dpse_GLEANR_14384      ............................................................
> dsec_GLEANR_3096       ............................................................
> dsim_GLEANR_9744       ............................................................
> dvir_GLEANR_4811       ............................................................
> dwil_GLEANR_10869      ............................................................
> dyak_GLEANR_13576      ............................................................
>
>
>
> dana_GLEANR_11249      VVKFNSLNELVEYHRTASVSRSQDVKLRDMIPEEMLVQALYDFVPQESGELDFRRGDVIT
> dere_GLEANR_7213       ............................................................
> dgri_GLEANR_6962       ............................................................
> FBgn0004638            ............................................................
> dmoj_GLEANR_6118       ..............................V.D...........................
> dper_GLEANR_18885      .......................E....................................
> dpse_GLEANR_14384      .......................E....................................
> dsec_GLEANR_3096       ............................................................
> dsim_GLEANR_9744       ............................................................
> dvir_GLEANR_4811       ............................................................
> dwil_GLEANR_10869      ............................................................
> dyak_GLEANR_13576      ............................................................
>
>
>
> dana_GLEANR_11249      VTDRSDENWWNGEIGNRKGIFPATYVTPYHS
> dere_GLEANR_7213       ...............................
> dgri_GLEANR_6962       ...............................
> FBgn0004638            ...............................
> dmoj_GLEANR_6118       ............Q..................
> dper_GLEANR_18885      ...............................
> dpse_GLEANR_14384      ...............................
> dsec_GLEANR_3096       ...............................
> dsim_GLEANR_9744       ...............................
> dvir_GLEANR_4811       ...............................
> dwil_GLEANR_10869      ...............................
> dyak_GLEANR_13576      ...............................
>
>
> I want to change those "." characters back to the actual residue
> letters, so I wrote this code:
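> use strict;
> use warnings;
> use Bio::AlignIO;
>
> # Read the masked ClustalW file and write an unmasked ClustalW copy
> my $in  = Bio::AlignIO->new(-file   => "FBgn0000097.aln",
>                             -format => 'clustalw');
> my $out = Bio::AlignIO->new(-file   => ">../clustalw/0097.aln",
>                             -format => 'clustalw');
> while (my $aln = $in->next_aln()) {
>     $aln->unmatch();               # replace '.' with the residue of the first sequence
>     $aln->set_displayname_flat();  # show plain names instead of name/start-end
>     $out->write_aln($aln);
> }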
>
> But when I run the code, I get error messages like this:
>
> -------------------- WARNING ---------------------
> MSG: Got a sequence with no letters in it cannot guess alphabet []
> ---------------------------------------------------
>
> ------------- EXCEPTION  -------------
> MSG: No sequence with name [dsim_GLEANR_9744/1-182]
> STACK Bio::SimpleAlign::displayname /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2307
> STACK Bio::SimpleAlign::set_displayname_flat /home/src/bioperl/bioperl-live/Bio/SimpleAlign.pm:2374
> STACK toplevel aligntest.pl:11
>
> --------------------------------------
>
> I don't know where the problem is.
>
> Jiang
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
