[Bioperl-l] Fetching Fasta seqs from GenBank - Help request

Sat Mar 27 08:12:00 EST 2004


Alberto,

 you have an error in your code:

 my $query_string = ('Bothrops[Organism] AND ribosomal',
                     'Bothrops[Organism] AND mitochondrial');

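 With this line you are assigning a list to a scalar.
 Try adding this line

 print $query_string;

 and you will see that only the last value ends up in $query_string!
 The first string is simply thrown away, which is also what the
 "Useless use of a constant in void context" warning at line 10 is
 telling you.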
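
 A minimal standalone sketch of the effect (plain Perl, no BioPerl needed;
 the @query_strings array is only there for comparison and is not part of
 the original script). In a scalar assignment the comma operator evaluates
 each item and keeps only the last one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parentheses on the right-hand side do not create an array: in a scalar
# assignment the comma operator discards everything except the last item.
# (perl -w reports the discarded string as "Useless use of a constant".)
my $query_string = ('Bothrops[Organism] AND ribosomal',
                    'Bothrops[Organism] AND mitochondrial');
print "scalar: $query_string\n";

# Assigned to an array instead, both values are kept.
my @query_strings = ('Bothrops[Organism] AND ribosomal',
                     'Bothrops[Organism] AND mitochondrial');
print "array:  $_\n" for @query_strings;
```

 Running it prints the mitochondrial query once for the scalar and both
 queries for the array.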

 If I understood your need correctly, you should use a query like this:

 my $query_string = 'Bothrops[Organism] AND (ribosomal OR mitochondrial)';
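
 If the list of terms is likely to grow, the same string can be assembled
 from a list. This is only a sketch: build_query is a hypothetical helper
 made up for illustration, not a BioPerl function, and the result is still
 an ordinary string to pass as -query to Bio::DB::Query::GenBank.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): combine one organism with
# several free-text terms into a single Entrez-style query string.
sub build_query {
    my ($organism, @terms) = @_;
    return $organism . '[Organism] AND (' . join(' OR ', @terms) . ')';
}

my $query_string = build_query('Bothrops', 'ribosomal', 'mitochondrial');
print "$query_string\n";

# The string is then used exactly as in the original script, e.g.
#   -query => $query_string
```

 With the two terms above it produces the same query string as the
 hand-written line.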

 Remo

Quoting Alberto Davila <davila at ioc.fiocruz.br>:

> Hi Sean,
> 
> Thanks for your valuable help !
> 
> I solved the problem using "Bio::DB::Query::GenBank", my goal was to
> retrieve 2 types of sequences (mitochondrial and ribosomal) from
> specific organism (eg Bothrops spp)... I am listing my script for those
> interested to do something similar.. the only warning I get is:
> 
> [davila at tryps script]$ perl fetch2contaminant.pl
> Useless use of a constant in void context at fetch2contaminant.pl line
> 10.
> 
> I was not sure in which field (eg keyword or feature) I should look for
> ribosomal and mitochondrial genes, but leaving blank gave some good
> results.
> 
> Indeed Bioperl is powerful... a bit confusing for beginners too.
> 
> Thanks and best regards,
> 
> Alberto
> 
> 
> #!/usr/local/bin/perl -w
> 
> use lib "/usr/local/bioperl14";
> use strict;
> use Bio::DB::Query::GenBank;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> 
> 
> my $query_string = ('Bothrops[Organism] AND
> ribosomal','Bothrops[Organism] AND mitochondrial');
> my $query = new Bio::DB::Query::GenBank(-db=>'nucleotide',
>                                         -query=>$query_string,
> 		                        -mindate => '1985',
> 		                        -maxdate => '2004');
> 
> my $seqio=new Bio::DB::GenBank->get_Stream_by_query($query);
> 
> #open a seqio handle for writing the outputfile in fasta
> my $outfile = new Bio::SeqIO(-format=>'fasta',
>                              -file=>'>contaminant.bothrops');
> 				  
>  while (my $s = $seqio->next_seq) {
> 
> #write the fasta  
>    $outfile->write_seq($s);
>    
> 	}			  
> 				  
>  	  
> 	  exit;
> 
> 
> 
> 
> 
> 
> 
> On Thu, 2004-03-25 at 16:37, Sean Davis wrote:
> > Alberto,
> > 
> > I would second that.  If you are doing more with this than retrieving raw
> > sequence (if you care at all), maybe you could let Barry and me know what
> > you are trying to do more generally.  Bioperl is quite powerful, but it
> > does take some direction to get started.
> > 
> > Sean
> > 
> > On 3/25/04 12:43 PM, "Barry Moore" <barry.moore at genetics.utah.edu> wrote:
> > 
> > > Alberto-
> > > 
> > > You said, "the 'get_Stream_by_id' is returning me more than the
> > > 'sequence per se'".  I'm not sure if this is what you're asking, but
> > > I'll take a shot.  Since you are retrieving your two sequences in EMBL
> > > format, you get all the associated information that you would see if
> > > you downloaded that same file from the web interface.  Your sequences
> > > are stored by BioPerl as RichSeq objects, which inherit from PrimarySeq
> > > objects.  So that EMBL file data is stored in the RichSeq object and
> > > the associated PrimarySeq object it inherited.  Of course when you save
> > > that locally as a fasta file, that extra information is lost.  If you
> > > decide you need to use that data, have a look at the documentation for
> > > Bio::Seq::RichSeq and Bio::PrimarySeq and the SeqIO and Feature
> > > Annotation HOWTOs to learn more.
> > > 
> > > Barry
> > > 
> > > Alberto Davila wrote:
> > > 
> > >> Thanks Jason,
> > >> 
> > >> I installed the IO::String, then it is working fine now. However I
> > >> have a doubt, the "get_Stream_by_id" is returning me more than the
> > >> "sequence per se", what is it ? My script and results are listed below.
> > >> Finally I would like to save (in my local disk) the retrieved sequences
> > >> as fasta files... is there any argument for that ?
> > >> 
> > >> Thanks again, Alberto
> > >> 
> > >> 
> > >> #!/usr/local/bin/perl -w
> > >> 
> > >> use lib "/usr/local/bioperl14";
> > >> use Bio::DB::BioFetch;
> > >> use strict;
> > >> use Bio::DB::WebDBSeqI;
> > >> use HTTP::Request::Common 'POST';
> > >> 
> > >> my $format_type='fasta';
> > >> my $stream;
> > >> 
> > >> 
> > >> my $bf = new Bio::DB::BioFetch(-format        =>$format_type,
> > >>                               -retrievaltype =>'tempfile',
> > >>       -db            =>'EMBL');
> > >>  
> > >> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> > >> while (my $s = $stream->next_seq) {
> > >>    print $s->seq,"\n\n\n";
> > >> }              
> > >>  
> > >>  
> > >>  exit;
> > >> 
> > >> 
> > >> 
> > >> 
> > >> [davila at tryps script]$ perl gb-fetch-1.pl
> > >> agtagtgtactaccaagtatagataacgtttaaatattaaagttttggatcaaagccaaagatgattcgcat
> > >> gctggtgctgattgtagttacagctgcaagcccagtgtatcagagatgtttccaagatggggctatagtgaa
> > >> gcaaaacccatccaaagaggcagtcacagaagtgtccctaaaagatgatgttagca
> > >> 
> > >> 
> > >> cctggacctcctgtgcaagaacatgaaacanctgtggttcttccttctcctggtggcagctcccagatgggt
> > >> cctgtcccaggtgcacctgcaggagtcgggcccaggactggggaagcctccagagctcaaaaccccacttgg
> > >> tgacacaactcacacatgcccacggtgcccagagcccaaatcttgtgacacacctcccccgtgcccacggtg
> > >> cccagagcccaaatcttgtgacacacctcccccatgcccacggtgcccagagcccaaatcttgtgacacacc
> > >> tcccccgtgcccnnngtgcccagcacctgaactcttgggaggaccgtcagtcttcctcttccccccaaaacc
> > >> caaggatacccttatgatttcccggacccctgaggtcacgtgcgtggtggtggacgtgagccacgaagaccc
> > >> nnnngtccagttcaagtggtacgtggacggcgtggaggtgcataatgccaagacaaagctgcgggaggagca
> > >> gtacaacagcacgttccgtgtggtcagcgtcctcaccgtcctgcaccaggactggctgaacggcaaggagta
> > >> caagtgcaaggtctccaacaaagccctcccagcccccatcgagaaaaccatctccaaagccaaaggacagcc
> > >> cnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngaggagatgaccaagaaccaagtcagcctgacctg
> > >> cctggtcaaaggcttctaccccagcgacatcgccgtggagtgggagagcaatgggcagccggagaacaacta
> > >> caacaccacgcctcccatgctggactccgacggctccttcttcctctacagcaagctcaccgtggacaagag
> > >> caggtggcagcaggggaacatcttctcatgctccgtgatgcatgaggctctgcacaaccgctacacgcagaa
> > >> gagcctctccctgtctccgggtaaatgagtgccatggccggcaagcccccgctccccgggctctcggggtcg
> > >> cgcgaggatgcttggcacgtaccccgtgtacatacttcccaggcacccagcatggaaataaagcacccagcg
> > >> ctgccctgg
> > >> 
> > >> 
> > >> 
> > >> 
> > >> On Tue, 2004-03-23 at 22:44, Jason Stajich wrote:
> > >>  
> > >> 
> > >>> You need an additional perl module.
> > >>> 
> > >>> 
> > >>> install IO::String from CPAN
> > >>> 
> > >>> There is a section on how to install additional perl modules in the
> > >>> INSTALL document.
> > >>> 
> > >>> -j
> > >>> 
> > >>> On Tue, 23 Mar 2004, Alberto Davila wrote:
> > >>> 
> > >>>    
> > >>> 
> > >>>> Hi,
> > >>>> 
> > >>>> May I ask for some help ?
> > >>>> 
> > >>>> I am trying to use the BioFetch module in order to download several
> > >>>> seqs (from specific Organisms) from GenBank in fasta format, but looks
> > >>>> like I am missing "IO/String.pm" and other things.. should I install
> > >>>> additional bioperl modules (I have the Bioperl Core 1.4 installed) ?
> > >>>> or use a different module for my purpose ?
> > >>>> 
> > >>>> My script and error msg are listed below.
> > >>>> 
> > >>>> Thanks and best regards,
> > >>>> 
> > >>>> Alberto
> > >>>> 
> > >>>> ****
> > >>>> 
> > >>>> #!/usr/local/bin/perl -w
> > >>>> 
> > >>>> use lib "/usr/local/bioperl14";
> > >>>> package Bio::DB::BioFetch;
> > >>>> use strict;
> > >>>> use Bio::DB::WebDBSeqI;
> > >>>> use HTTP::Request::Common 'POST';
> > >>>> 
> > >>>> my $format_type='fasta';
> > >>>> my $stream;
> > >>>> 
> > >>>> 
> > >>>> my $bf = new Bio::DB::BioFetch(-format        =>$format_type,
> > >>>>                               -retrievaltype =>'tempfile',
> > >>>>                               -db            =>'EMBL');
> > >>>> 
> > >>>> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> > >>>> while (my $s = $stream->next_seq) {
> > >>>>    print $s->seq,"\n";
> > >>>>        }
> > >>>> 
> > >>>> 
> > >>>>          exit;
> > >>>> 
> > >>>> 
> > >>>> [davila at tryps script]$ perl gb-fetch-1.pl
> > >>>> Can't locate IO/String.pm in @INC (@INC contains:
> > >>>> /usr/local/bioperl14/i386-linux-thread-multi /usr/local/bioperl14
> > >>>> /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3
> > >>>> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2
> > >>>> /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0
> > >>>> /usr/lib/perl5/site_perl
> > >>>> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> > >>>> /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2
> > >>>> /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0
> > >>>> /usr/lib/perl5/vendor_perl .) at
> > >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> > >>>> BEGIN failed--compilation aborted at
> > >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> > >>>> Compilation failed in require at gb-fetch-1.pl line 6.
> > >>>> BEGIN failed--compilation aborted at gb-fetch-1.pl line 6.
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>