[Bioperl-l] bioperl capability

Chris Fields cjfields at illinois.edu
Fri Aug 21 21:03:07 UTC 2009


On Aug 18, 2009, at 11:39 PM, deequan wrote:

>
> Howdy there,
>
>         Yes, quite right.  I apologize for the double posting.   
> Moreover, I
> appreciate your assistance in trying to sort out what can and cannot  
> be done
> with bioperl.  To address the problem previously stated, I put  
> together a
> remarkably misbehaving script that has the following parts:
>
> #Some parsing:
>         $q_start = $hsp->query->start;
>         $q_end = $hsp->query->end;
>         $h_start = $hsp->hit->start;
>         $h_end = $hsp->hit->end;
>         $length = $hsp->query->seqlength();
>         $id = $hit->accession;		
> 		
>          print OUT "$id\t"; 	
>          my $seq;
>          if($h_start<$h_end){
>
> #the bit per your recommendation		
> 		my $begin = $h_start-$q_start+1;
> 		my $cease = ($length - $q_end) + $h_end;
> 		my $strand = 1;
> 		my $factory = Bio::DB::GenBank->new(-format=> 'genbank',
> 			-seq_start =>$begin,
> 			-seq_stop =>$cease,
> 			-strand => $strand, #1 = plus, 2 = minus
> 		);
> 		$seq = $factory->get_Seq_by_acc($id);
> 	}else{#else assume backward, code not shown}
>

[

> #and some stuff to retrieve the sequence
>
> 	my $len = $seq->length();
> 	my $string = $seq->subseq(1, $len);
> 	print OUT "length = $len\t";
> 	print OUT "seq = $string\n";

]

Not sure what you are doing with the above sequence.  The abve

> In your previous reply, you said the code accessing the seq object  
> created
> by get_Seq_by_acc would have to pass that obj (here $seq) to a seqIO  
> for
> basic IO purposes.

# create an output seq stream somewhere
my $out = Bio::SeqIO->new(-file   => '>sequences.gb', -format =>  
'genbank');

....

# take seq object ($seq), write to the stream
$out->write_seq($seq);

> Not seeing exactly how to go about that, I tried some
> other functions in combination that seemed as though they should work
> (length() and subseq()).  Unfortunately, the program does not even  
> run to
> that point, as the script throws an exception:
>
> ------------- EXCEPTION -------------
> MSG: acc CP000948 does not exist
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
> C:/Perl/site/lib/Bio/DB/WebDBSeqI.pm:18
> 2
> STACK toplevel test.pl:36
> -------------------------------------
>
>
> Oddly, the record corresponding to this accession number can be  
> found here:
> http://www.ncbi.nlm.nih.gov/nuccore/169887498

That's probably something to do with NCBI unfortunately; I'll have to  
look into it.  The best alternative is if you have BLAST reports that  
include the GI (or UID).  That's the most reliable number (using that  
in coordination with get_Seq_by_id), but it's not on by default, you  
have to indicate it's inclusion.  More recent versions of  
Bio::SearchIO::blast parse out the GI from the descriptor if it's  
present.

> Perhaps you'd be willing to offer another hint.  Thank you for your
> assistance thus far.  And on behalf of all posters, thank you for  
> sharing
> your knowledge.  'Preciate.
>
> David Q.

No problem.

chris




More information about the Bioperl-l mailing list