[Bioperl-l] Fetching Fasta seqs from GenBank - Help request

Alberto Davila davila at ioc.fiocruz.br
Sat Mar 27 10:08:18 EST 2004


Hi Remo,

Thanks for catching this, so Sean was right as well... I was confused
because the first "$query_string" was returning sequences containing the
word "ribosomal". I just realized they were retrieved because they also
contain the word "mitochondrial":

>AY223674 Bothrops jararacussu specimen-voucher DPL 104 16S ribosomal RNA gene, partial sequence; mitochondrial gene for mitochondrial product.

>AY223673 Bothrops alternatus specimen-voucher DPL 2879 16S ribosomal RNA gene, partial sequence; mitochondrial gene for mitochondrial product.

I am now querying for "ribosomal" and "mitochondrial" in the "[title]"
field, and things are working OK now. Thanks!

Alberto
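One way to write the title-restricted query described above, as a minimal sketch assuming NCBI Entrez syntax, where a bracketed field tag such as `[title]` limits a term to that field and parentheses group an OR. The helper name `entrez_query` is hypothetical; it only builds the string that would be passed as `-query` to Bio::DB::Query::GenBank.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a single Entrez query string that limits the
# organism and requires at least one of the given words in one field.
sub entrez_query {
    my ($organism, $field, @words) = @_;
    # Tag each word with the field, e.g. "ribosomal" -> "ribosomal[title]".
    my @tagged = map { $_ . "[$field]" } @words;
    # OR the tagged words inside parentheses, then AND them with the organism.
    return $organism . '[Organism] AND (' . join(' OR ', @tagged) . ')';
}

my $query_string = entrez_query('Bothrops', 'title', 'ribosomal', 'mitochondrial');
print "$query_string\n";
```

This prints `Bothrops[Organism] AND (ribosomal[title] OR mitochondrial[title])`. Building the string from a list keeps it a single scalar, which is what the `-query` argument expects.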



On Sat, 2004-03-27 at 10:12, sanges at biogem.it wrote:
> 
> Alberto,
>  
>  you have an error in your code:
>  
>  my $query_string = ('Bothrops[Organism] AND ribosomal','Bothrops[Organism] AND mitochondrial');
>  
>  With this line you are assigning a list to a scalar (a string);
>  try adding this line
>  
>  print $query_string;
>  
>  and you will see that only the last value ends up in $query_string!
>  
>  If I understood your need correctly, you should use a query like this:
>  
>  my $query_string = 'Bothrops[Organism] AND (ribosomal OR mitochondrial)';
>  
>  Remo
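To see what Remo means: the parentheses on the right-hand side build a list, not an array. Assigning a list to a scalar applies the comma operator in scalar context, which evaluates and discards every value except the last. Perl flags the discarded string at compile time with a "Useless use of a constant ... in void context" warning (newer perls also quote the discarded value), which is exactly the warning Alberto reported for line 10 of his script. A real array in scalar context behaves differently and yields its element count. A small self-contained demonstration (variable names are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# List assigned to a scalar: the comma operator evaluates the first string
# in void context (hence the warning) and returns only the last one.
my $query_string = ('Bothrops[Organism] AND ribosomal',
                    'Bothrops[Organism] AND mitochondrial');
print "scalar from list:    $query_string\n";

# An array in scalar context gives the number of elements instead.
my @queries = ('Bothrops[Organism] AND ribosomal',
               'Bothrops[Organism] AND mitochondrial');
my $count = @queries;
print "array element count: $count\n";

# Remo's fix: one string that combines both words with OR.
my $fixed = 'Bothrops[Organism] AND (ribosomal OR mitochondrial)';
print "combined query:      $fixed\n";
```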
>  
> Quoting Alberto Davila <davila at ioc.fiocruz.br>:
> 
> > Hi Sean,
> > 
> > Thanks for your valuable help!
> > 
> > I solved the problem using "Bio::DB::Query::GenBank". My goal was to
> > retrieve two types of sequences (mitochondrial and ribosomal) from a
> > specific organism (e.g. Bothrops spp.). I am listing my script below for
> > anyone interested in doing something similar. The only warning I get is:
> > 
> > [davila at tryps script]$ perl fetch2contaminant.pl
> > Useless use of a constant in void context at fetch2contaminant.pl line 10.
> > 
> > I was not sure in which field (e.g. keyword or feature) I should look for
> > ribosomal and mitochondrial genes, but leaving it blank gave some good
> > results.
> > 
> > Indeed, Bioperl is powerful... but a bit confusing for beginners too.
> > 
> > Thanks and best regards,
> > 
> > Alberto
> > 
> > 
> > #!/usr/local/bin/perl -w
> > 
> > use lib "/usr/local/bioperl14";
> > use strict;
> > use Bio::DB::Query::GenBank;
> > use Bio::SeqIO;
> > use Bio::DB::GenBank;
> > 
> > 
> > my $query_string = ('Bothrops[Organism] AND ribosomal','Bothrops[Organism] AND mitochondrial');
> > my $query = new Bio::DB::Query::GenBank(-db=>'nucleotide',
> >                                         -query=>$query_string,
> > 		                        -mindate => '1985',
> > 		                        -maxdate => '2004');
> > 
> > my $seqio=new Bio::DB::GenBank->get_Stream_by_query($query);
> > 
> > # open a SeqIO handle for writing the output file in FASTA format
> > my $outfile = new Bio::SeqIO(-format=>'fasta',
> >                              -file=>'>contaminant.bothrops');
> > 
> > while (my $s = $seqio->next_seq) {
> >     # write each sequence in FASTA format
> >     $outfile->write_seq($s);
> > }
> > 
> > exit;
> > 
> > On Thu, 2004-03-25 at 16:37, Sean Davis wrote:
> > > Alberto,
> > > 
> > > I would second that.  If you are doing more with this than retrieving raw
> > > sequence (if you care at all), maybe you could let Barry and me know what
> > > you are trying to do more generally.  Bioperl is quite powerful, but it does
> > > take some direction to get started.
> > > 
> > > Sean
> > > 
> > > On 3/25/04 12:43 PM, "Barry Moore" <barry.moore at genetics.utah.edu> wrote:
> > > 
> > > > Alberto-
> > > > 
> > > > You said, "the 'get_Stream_by_id' is returning me more than the
> > > > 'sequence per se'".  I'm not sure if this is what you're asking, but I'll
> > > > take a shot.  Since you are retrieving your two sequences in EMBL
> > > > format, you get all the associated information that you would see if you
> > > > downloaded that same file from the web interface.  Your sequences are
> > > > stored by BioPerl as RichSeq objects, which wrap a PrimarySeq
> > > > object.  So the EMBL file data is stored in the RichSeq object and the
> > > > associated PrimarySeq object it holds.  Of course, when you save
> > > > that locally as a FASTA file, that extra information is lost.  If you
> > > > decide you need to use that data, have a look at the documentation for
> > > > Bio::Seq::RichSeq and Bio::PrimarySeq and the SeqIO and Feature-Annotation
> > > > HOWTOs to learn more.
> > > > 
> > > > Barry
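To make Barry's last point concrete: a FASTA record carries only an identifier, an optional description line and the residues, so everything else in an EMBL or GenBank entry (features, references, keywords) has nowhere to go. The hypothetical `to_fasta` below is a plain-Perl sketch of that layout, wrapping the sequence at 60 columns (an assumed, commonly used width). In BioPerl itself the same job is done by a Bio::SeqIO object opened with `-format => 'fasta'` and its `write_seq` method, as in Alberto's fetch2contaminant.pl script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one FASTA record. Only the ID, the optional
# description and the sequence survive; features and references are dropped.
sub to_fasta {
    my ($id, $desc, $seq, $width) = @_;
    $width ||= 60;    # residues per line (assumed default)
    my $header = defined $desc && length $desc ? ">$id $desc" : ">$id";
    # Split the sequence into chunks of at most $width characters.
    my @lines = ($seq =~ /(.{1,$width})/g);
    return join("\n", $header, @lines) . "\n";
}

# Illustrative record with a made-up ID and sequence.
print to_fasta('seq1', 'example record', 'acgt' x 20);
```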
> > > > 
> > > > Alberto Davila wrote:
> > > > 
> > > >> Thanks Jason,
> > > >> 
> > > >> I installed IO::String and it is working fine now. However, I have a
> > > >> question: the "get_Stream_by_id" is returning more than the "sequence
> > > >> per se", what is it? My script and results are listed below. Finally, I
> > > >> would like to save the retrieved sequences to my local disk as FASTA
> > > >> files... is there any argument for that?
> > > >> 
> > > >> Thanks again, Alberto
> > > >> 
> > > >> 
> > > >> #!/usr/local/bin/perl -w
> > > >> 
> > > >> use lib "/usr/local/bioperl14";
> > > >> use Bio::DB::BioFetch;
> > > >> use strict;
> > > >> use Bio::DB::WebDBSeqI;
> > > >> use HTTP::Request::Common 'POST';
> > > >> 
> > > >> my $format_type='fasta';
> > > >> my $stream;
> > > >> 
> > > >> 
> > > >> my $bf = new Bio::DB::BioFetch(-format        => $format_type,
> > > >>                                -retrievaltype => 'tempfile',
> > > >>                                -db            => 'EMBL');
> > > >> 
> > > >> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> > > >> while (my $s = $stream->next_seq) {
> > > >>    print $s->seq,"\n\n\n";
> > > >> }
> > > >> 
> > > >> exit;
> > > >> 
> > > >> 
> > > >> 
> > > >> 
> > > >> [davila at tryps script]$ perl gb-fetch-1.pl
> > > >> agtagtgtactaccaagtatagataacgtttaaatattaaagttttggatcaaagccaaagatgattcgcat
> > > >> gctggtgctgattgtagttacagctgcaagcccagtgtatcagagatgtttccaagatggggctatagtgaa
> > > >> gcaaaacccatccaaagaggcagtcacagaagtgtccctaaaagatgatgttagca
> > > >> 
> > > >> 
> > > >> cctggacctcctgtgcaagaacatgaaacanctgtggttcttccttctcctggtggcagctcccagatgggt
> > > >> cctgtcccaggtgcacctgcaggagtcgggcccaggactggggaagcctccagagctcaaaaccccacttgg
> > > >> tgacacaactcacacatgcccacggtgcccagagcccaaatcttgtgacacacctcccccgtgcccacggtg
> > > >> cccagagcccaaatcttgtgacacacctcccccatgcccacggtgcccagagcccaaatcttgtgacacacc
> > > >> tcccccgtgcccnnngtgcccagcacctgaactcttgggaggaccgtcagtcttcctcttccccccaaaacc
> > > >> caaggatacccttatgatttcccggacccctgaggtcacgtgcgtggtggtggacgtgagccacgaagaccc
> > > >> nnnngtccagttcaagtggtacgtggacggcgtggaggtgcataatgccaagacaaagctgcgggaggagca
> > > >> gtacaacagcacgttccgtgtggtcagcgtcctcaccgtcctgcaccaggactggctgaacggcaaggagta
> > > >> caagtgcaaggtctccaacaaagccctcccagcccccatcgagaaaaccatctccaaagccaaaggacagcc
> > > >> cnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngaggagatgaccaagaaccaagtcagcctgacctg
> > > >> cctggtcaaaggcttctaccccagcgacatcgccgtggagtgggagagcaatgggcagccggagaacaacta
> > > >> caacaccacgcctcccatgctggactccgacggctccttcttcctctacagcaagctcaccgtggacaagag
> > > >> caggtggcagcaggggaacatcttctcatgctccgtgatgcatgaggctctgcacaaccgctacacgcagaa
> > > >> gagcctctccctgtctccgggtaaatgagtgccatggccggcaagcccccgctccccgggctctcggggtcg
> > > >> cgcgaggatgcttggcacgtaccccgtgtacatacttcccaggcacccagcatggaaataaagcacccagcg
> > > >> ctgccctgg
> > > >> 
> > > >> 
> > > >> 
> > > >> 
> > > >> On Tue, 2004-03-23 at 22:44, Jason Stajich wrote:
> > > >>  
> > > >> 
> > > >>> You need an additional perl module.
> > > >>> 
> > > >>> 
> > > >>> install IO::String from CPAN
> > > >>> 
> > > >>> There is a section on how to install additional perl modules in the
> > > >>> INSTALL document.
> > > >>> 
> > > >>> -j
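The "Can't locate IO/String.pm in @INC" message means perl searched every directory listed in @INC (which `use lib` extends at compile time) and found no IO/String.pm file. As a minimal sketch, one way to check whether a module is visible before running a larger script is to try loading it with `require` inside `eval`. The helper name `have_module` is hypothetical, and the module names in the loop are only examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded from the
# current @INC, 0 otherwise (the eval keeps a missing module from dying).
sub have_module {
    my ($module) = @_;
    # Turn "IO::String" into the relative path "IO/String.pm" that perl searches for.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Show where perl will look; directories added with "use lib" come first.
print "searching: $_\n" for @INC;

for my $module (qw(IO::String Bio::DB::GenBank)) {
    printf "%-20s %s\n", $module, have_module($module) ? 'found' : 'MISSING';
}
```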
> > > >>> 
> > > >>> On Tue, 23 Mar 2004, Alberto Davila wrote:
> > > >>> 
> > > >>>    
> > > >>> 
> > > >>>> Hi,
> > > >>>> 
> > > >>>> May I ask for some help?
> > > >>>> 
> > > >>>> I am trying to use the BioFetch module in order to download several seqs
> > > >>>> (from specific organisms) from GenBank in FASTA format, but it looks like
> > > >>>> I am missing "IO/String.pm" and other things. Should I install additional
> > > >>>> bioperl modules (I have the Bioperl Core 1.4 installed), or use a
> > > >>>> different module for my purpose?
> > > >>>> 
> > > >>>> My script and error msg are listed below.
> > > >>>> 
> > > >>>> Thanks and best regards,
> > > >>>> 
> > > >>>> Alberto
> > > >>>> 
> > > >>>> ****
> > > >>>> 
> > > >>>> #!/usr/local/bin/perl -w
> > > >>>> 
> > > >>>> use lib "/usr/local/bioperl14";
> > > >>>> package Bio::DB::BioFetch;
> > > >>>> use strict;
> > > >>>> use Bio::DB::WebDBSeqI;
> > > >>>> use HTTP::Request::Common 'POST';
> > > >>>> 
> > > >>>> my $format_type='fasta';
> > > >>>> my $stream;
> > > >>>> 
> > > >>>> 
> > > >>>> my $bf = new Bio::DB::BioFetch(-format        => $format_type,
> > > >>>>                                -retrievaltype => 'tempfile',
> > > >>>>                                -db            => 'EMBL');
> > > >>>> 
> > > >>>> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> > > >>>> while (my $s = $stream->next_seq) {
> > > >>>>    print $s->seq,"\n";
> > > >>>> }
> > > >>>> 
> > > >>>> 
> > > >>>> exit;
> > > >>>> 
> > > >>>> 
> > > >>>> [davila at tryps script]$ perl gb-fetch-1.pl
> > > >>>> Can't locate IO/String.pm in @INC (@INC contains:
> > > >>>> /usr/local/bioperl14/i386-linux-thread-multi /usr/local/bioperl14
> > > >>>> /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3
> > > >>>> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2
> > > >>>> /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0
> > > >>>> /usr/lib/perl5/site_perl
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0
> > > >>>> /usr/lib/perl5/vendor_perl .) at
> > > >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> > > >>>> BEGIN failed--compilation aborted at
> > > >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> > > >>>> Compilation failed in require at gb-fetch-1.pl line 6.
> > > >>>> BEGIN failed--compilation aborted at gb-fetch-1.pl line 6.
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 



