[Bioperl-l] AlignIO problems

Chris Fields cjfields at uiuc.edu
Mon Feb 26 15:05:50 UTC 2007


Make sure to keep this on the list; others may have some input.

You should be able to test the various sequence objects you're
retrieving from Bio::DB::Fasta via Bio::SeqIO to see if they are what
you're expecting, then track down the problematic sequences.  My
guess is that the odd seqs come from the way you are using
Bio::DB::Fasta for each of the files.  I wonder whether the indices
are overwriting one another, so that you are getting back blank seq
objects.
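For example, a quick check of each retrieved object before calling align() could look like the sketch below. It assumes only that each object has BioPerl's display_id() and seq() methods; `check_seqs` is a hypothetical helper name, not part of BioPerl. It flags undefined elements (presumably what you get when get_Seq_by_id() finds no matching ID), empty sequences, and any character other than a letter or '*'.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): report suspicious
# sequence objects before they are handed to an aligner.
# Assumes each defined element responds to display_id() and seq(),
# as BioPerl sequence objects (e.g. from Bio::DB::Fasta) do.
sub check_seqs {
    my ($seqs) = @_;
    my @problems;
    for my $i (0 .. $#$seqs) {
        my $obj = $seqs->[$i];
        if (!defined $obj) {
            # An undefined entry usually means the lookup found nothing
            push @problems, "element $i: undefined (ID not found?)";
            next;
        }
        my $id  = $obj->display_id // "element $i";
        my $seq = $obj->seq // '';
        if ($seq eq '') {
            push @problems, "$id: empty sequence";
        }
        elsif ($seq =~ /([^A-Za-z*])/) {
            # Report the first character that is not a residue letter or stop
            push @problems, "$id: unexpected character '$1'";
        }
    }
    return @problems;
}
```

Running something like `warn "$_\n" for check_seqs(\@prots);` just before `$aln_factory->align(\@prots)` would print any entries worth a closer look.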

You should probably consider just indexing all of your files
together; according to the POD you can use a single Bio::DB::Fasta
object to index all of the files in one go (give it the directory
path and use '-glob') and retrieve what you need that way.
Alternatively, put them into separate directories so the indices are
also kept separate.
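A minimal sketch of the "separate directories" variant, assuming the *.translation.fasta files have been moved into one directory and the *.cds.fasta files into another (the directory names are up to you; `open_dbs` and `fetch_group` are hypothetical helper names, not BioPerl functions). Each directory gets one Bio::DB::Fasta object via `-glob`, and one ortholog group is then fetched in a loop instead of twelve hand-written calls.

```perl
use strict;
use warnings;

# Hypothetical helper: build one Bio::DB::Fasta index per directory.
# Bio::DB::Fasta is loaded with require at run time, so this file still
# compiles without BioPerl; only calling open_dbs() needs it installed.
sub open_dbs {
    my ($pep_dir, $cds_dir) = @_;
    require Bio::DB::Fasta;
    # -glob selects which files in each directory get indexed;
    # -reindex forces the index to be rebuilt instead of reusing an old one.
    my %opts = (-glob => '*.fasta', -reindex => 1);
    return (Bio::DB::Fasta->new($pep_dir, %opts),
            Bio::DB::Fasta->new($cds_dir, %opts));
}

# Hypothetical helper: look up each ID in any object that provides
# get_Seq_by_id(), keeping the input order and recording IDs that fail.
sub fetch_group {
    my ($db, $ids) = @_;
    my (@seqs, @missing);
    for my $id (@$ids) {
        my $seq = defined $id ? $db->get_Seq_by_id($id) : undef;
        if (defined $seq) {
            push @seqs, $seq;
        }
        else {
            push @missing, $id // '(undef)';
        }
    }
    return (\@seqs, \@missing);
}
```

Usage would be along the lines of `my ($prots, $missing) = fetch_group($pep_db, [$dana, $dere, $dmel, $dsec, $dsim, $ddyak]);`, passing `$prots` to align() only when `@$missing` is empty. Peptides and CDS are kept in separate indices here because the code below looks up the same ID (`$dana`, etc.) in both the translation and the CDS file, so a single combined index would presumably have colliding IDs.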

chris

On Feb 25, 2007, at 9:50 PM, 江 文恺 wrote:

> Thank you for your help!
> Maybe you are right. I use the following code to create my arrays of
> sequence objects:
>          my $outfilename=$dmel;
>          my $ana_pep_db=Bio::DB::Fasta->new("dana.translation.fasta");
>          my $ana_cdna_db=Bio::DB::Fasta->new("dana.cds.fasta");
>          my $ere_pep_db=Bio::DB::Fasta->new("dere.translation.fasta");
>          my $ere_cdna_db=Bio::DB::Fasta->new("dere.cds.fasta");
>          my $mel_pep_db=Bio::DB::Fasta->new("dmel.translation.fasta");
>          my $mel_cdna_db=Bio::DB::Fasta->new("dmel.cds.fasta");
>          my $sec_pep_db=Bio::DB::Fasta->new("dsec.translation.fasta");
>          my $sec_cdna_db=Bio::DB::Fasta->new("dsec.cds.fasta");
>          my $sim_pep_db=Bio::DB::Fasta->new("dsim.translation.fasta");
>          my $sim_cdna_db=Bio::DB::Fasta->new("dsim.cds.fasta");
>          my $yak_pep_db=Bio::DB::Fasta->new("dyak.translation.fasta");
>          my $yak_cdna_db=Bio::DB::Fasta->new("dyak.cds.fasta");
>          my $ana_pep_obj=$ana_pep_db->get_Seq_by_id($dana);
>          my $ana_nuc_obj=$ana_cdna_db->get_Seq_by_id($dana);
>          my $ere_pep_obj=$ere_pep_db->get_Seq_by_id($dere);
>          my $ere_nuc_obj=$ere_cdna_db->get_Seq_by_id($dere);
>          my $mel_pep_obj=$mel_pep_db->get_Seq_by_id($dmel);
>          my $mel_nuc_obj=$mel_cdna_db->get_Seq_by_id($dmel);
>          my $sec_pep_obj=$sec_pep_db->get_Seq_by_id($dsec);
>          my $sec_nuc_obj=$sec_cdna_db->get_Seq_by_id($dsec);
>          my $sim_pep_obj=$sim_pep_db->get_Seq_by_id($dsim);
>          my $sim_nuc_obj=$sim_cdna_db->get_Seq_by_id($dsim);
>          my $yak_pep_obj=$yak_pep_db->get_Seq_by_id($ddyak);
>          my $yak_nuc_obj=$yak_cdna_db->get_Seq_by_id($ddyak);
>          push @prots, $ana_pep_obj;
>          push @cdna, $ana_nuc_obj;
>          push @prots, $ere_pep_obj;
>          push @cdna, $ere_nuc_obj;
>          push @prots, $mel_pep_obj;
>          push @cdna, $mel_nuc_obj;
>          push @prots, $sec_pep_obj;
>          push @cdna, $sec_nuc_obj;
>          push @prots, $sim_pep_obj;
>          push @cdna, $sim_nuc_obj;
>          push @prots, $yak_pep_obj;
>          push @cdna, $yak_nuc_obj;
>
> Then I use @prots as the input for my $aln=$aln_factory->align(\@prots);
> This creates alignment files in which identical residues are masked.
>
> But if I use a FASTA file (not sequence objects) containing the protein
> sequences as input:
> $inputfile='FBgn0000097.pep';
> @params=('outorder'=>'INPUT');
> $factory=Bio::Tools::Run::Alignment::Clustalw->new(@params);
> $aln=$factory->align($inputfile);
> #$aln->gap_char('-');
> $aln->map_chars('\.','-');
> $aln_out=Bio::AlignIO->new(-file=>">0097.aln",-format=>'clustalw');
> $aln_out->write_aln($aln);
>
> This method creates files without masking.
> I think the sequence objects that get_Seq_by_id() returns directly from
> the sequence databases are not appropriate input.
>
> Thank you for your suggestion again!
>
> Jiang.
>
>> From: Chris Fields <cjfields at uiuc.edu>
>> To: 江 文恺 <biology0046 at hotmail.com>
>> Subject: Re: [Bioperl-l] AlignIO problems
>> Date: Sun, 25 Feb 2007 21:26:34 -0600
>>
>> I ran the same code using a local FASTA-formatted file on my system,
>> which works (no masking).
>>
>> Of note, the gaps were all marked as '.'.  Your gaps were both '.'
>> and '-', which may mean that something is wrong with the seq
>> objects themselves.  Maybe SeqIO is misreading them?
>>
>> chris
>>
>> On Feb 25, 2007, at 7:34 PM, 江 文恺 wrote:
>>
>>> I use the Bio::Tools::Run::Alignment::Clustalw module to carry
>>> out multiple alignment.
>>> My code is:
>>>         my @clustal_param=('outorder'=>'INPUT');
>>>         my $aln_factory=Bio::Tools::Run::Alignment::Clustalw->new(@clustal_param);
>>>         my $aln=$aln_factory->align(\@prots); ### @prots is an array of protein sequence objects
>>>         my $aln_out=Bio::AlignIO->new(-file=>">./dmel_group/clustal/${outfilename}.aln",-format=>'clustalw');
>>>
>>>         $aln_out->write_aln($aln);
>>> This code produces an alignment in which identical residues are masked.
>>> But if I run ClustalW directly, the output is normal.
>>> Thank you for your help~
>>>
>>> Jiang
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign