[Bioperl-l] Re: RefSeq and NT_****** contig Id

Heikki Lehvaslaiho heikki at ebi.ac.uk
Tue Jul 29 08:02:58 EDT 2003


Jing,

I am sorry, but there are no methods that would allow you to do this
automatically. It needs to be written, but no one has done so yet.

One problem is the context. It is difficult to write a method that would
know where to fetch the sequence from (although the OBDA system should
partially address that). 
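
Until such a method exists, a rough sketch of how one could at least detect
these records before handing a file to SeqIO. It assumes the contig records
carry a CONTIG line with the assembly instructions and have no sequence lines
after ORIGIN. scan_genbank_records is a hypothetical helper written for this
example in plain Perl; it is not a BioPerl method and does not fetch or
assemble anything.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): split GenBank flat-file text
# into records and report what each one contains.
sub scan_genbank_records {
    my ($text) = @_;
    my @report;
    # Each record is terminated by a line holding only "//".
    for my $record ( split m{^//[ \t]*\r?\n}m, $text ) {
        next unless $record =~ /^LOCUS\s+(\S+)/m;
        my $id = $1;
        # CONTIG line: join(...) instructions for assembling the contig.
        my $has_contig = $record =~ /^CONTIG\s/m ? 1 : 0;
        # Sequence lines after ORIGIN look like "        1 acgtacgtac ...".
        my $has_sequence = $record =~ /^ORIGIN.*\n\s*\d+\s+[A-Za-z]/m ? 1 : 0;
        push @report,
            { id => $id, contig => $has_contig, sequence => $has_sequence };
    }
    return @report;
}

# Usage: perl scan.pl /path/to/file
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    for my $r ( scan_genbank_records($text) ) {
        printf "%s\tcontig=%d\tsequence=%d\n", @{$r}{qw(id contig sequence)};
    }
}
```

Records reported with contig=1 and sequence=0 are the ones that would still
need their pieces fetched from somewhere, which is the context problem
described above.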

	-Heikki

On Mon, 2003-07-28 at 20:19, jzhao wrote:
> Hi Heikki,
> 
> I've tried to use Bio::SeqIO(-format=>'genbank', -file=>'/path/to/file') to
> parse the file into an object. However, I run into problems with NT_*******
> GenBank files. This is what I have done: I downloaded a GenBank format
> file from NCBI which contains three contig records (NT_039167,
> NT_039168, and NT_039169). None of these records has actual sequence at
> the end, only instructions on how to assemble the contig. I believe
> that is why the SeqIO module breaks: it always tries to read the
> sequence at the end of each record.
> 
> In bioperl, is there any specific handler for contig files?
> 
> Thanks,
> Jing
> 
> At 04:29 PM 7/15/2003 +0100, you wrote:
> >Rather than retrieving the files over the net using bioperl Bio::DB
> >modules, you should use ftp to download the files and then use
> >Bio::SeqIO(-format=>'genbank', -file=>'/path/to/file') to read them into
> >memory.
> >
> >         -Heikki
> >
> >On Tue, 2003-07-15 at 14:46, jzhao wrote:
> > > Heikki,
> > >
> > > Thank you very much for the reply. What I really want is the mouse contig
> > > retrieved as a bioperl sequence object (not as a genbank file), so that
> > > I can use bioperl methods to obtain data and populate our own relational
> > > database without bothering to write a parser for the genbank flat file.
> > > This strategy works fine with smaller genomes like bacteria, viruses and
> > > plasmodium. Now I wonder if there is any way for me to do something
> > > similar for the mouse genome.
> > >
> > > Thanks a lot!
> > > Jing
> > >
> > >
> > > At 11:42 PM 7/14/2003 +0100, you wrote:
> > > >Jing,
> > > >
> > > >Bio::DB::RefSeq used to inherit from Bio::DB::NCBIHelper, but lately it
> > > >has been a subclass of Bio::DB::DBFetch. Looks like in the transition we
> > > >lost the warning:
> > > >
> > > >   $self->throw("NT_ contigs are whole chromosome files which are
> > > >     not part of regular database distributions. Go to
> > > >     ftp://ftp.ncbi.nih.gov/genomes/.")
> > > >         if $ids =~ /NT_/;
> > > >
> > > >It is also true that the NCBI Entrez web interface now allows retrieving
> > > >NT_ contigs, so it would be possible to hack the RefSeq class to retrieve
> > > >them. However, NCBI has asked us to help limit the load on their online
> > > >services, so I am hesitant to do that when their eutils server is excluding
> > > >them (Or is it? Do we just need different parameters?). Downloading a
> > > >28,477,090 base mouse chromosome 1 sequence with tons of annotation is
> > > >certainly heavy. The warning should definitely be put back in.
> > > >
> > > >Yours,
> > > >         -Heikki
> > > >
> > > >P.S. DBI is for accessing a local relational database and is not needed here.
> > > >         -H.
> > > >
> > > >On Mon, 2003-07-14 at 19:36, jzhao wrote:
> > > > > Dear Sir,
> > > > >
> > > > > I was trying to retrieve some mouse contig data from the RefSeq database
> > > > > with Bioperl. My test Perl script looks like this:
> > > > >
> > > > > use Bio::DB::RefSeq;
> > > > > use Bio::SeqIO;
> > > > > use DBI;
> > > > > use strict;
> > > > >
> > > > > my $gb = Bio::DB::RefSeq->new;
> > > > > my $seq = $gb->get_Seq_by_acc('NT_039167');
> > > > >
> > > > > # Double quotes so that \n is interpolated as a newline
> > > > > if ( defined $seq ) {
> > > > >       print "seq defined\n";
> > > > > }
> > > > > else {
> > > > >       print "seq undefined\n";
> > > > > }
> > > > >
> > > > > This script works with accession IDs like NC_000913 (a bacterial genome),
> > > > > but with an NT_****** contig ID, $seq comes back undefined. I checked, and
> > > > > these contig data are stored on the RefSeq ftp site, so why are they not
> > > > > available through the DBI interface? Is there anything I'm missing here?
> > > > >
> > > > > Thank you very much,
> > > > > Jing
> > > >--
> > > >______ _/      _/_____________________________________________________
> > > >       _/      _/                      http://www.ebi.ac.uk/mutations/
> > > >      _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
> > > >     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
> > > >    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
> > > >   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
> > > >      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
> > > >___ _/_/_/_/_/________________________________________________________


