[Bioperl-l] Problems getting NTs

Jason Stajich jason@cgt.mc.duke.edu
Fri, 31 May 2002 13:45:06 -0400 (EDT)


On Fri, 31 May 2002, Stefan Kirov wrote:

>     $seq = new Bio::DB::RefSeq(-retrievaltype => 'tempfile',
>                                -format        => 'fasta');
>     $tmp = $seq->get_Seq_by_acc('NT_005612');
> or
>     $refseq  = new Bio::DB::RefSeq;
>     $geneloc = $refseq->get_Seq_by_acc('NT_005612');
> I tried both RefSeq and GenBank to get NT contigs, both FASTA and GB
> format and I keep getting this error:

> MSG: NT_ contigs are whole chromosome files which are not part of
> regular database distributions. Go to ftp://ftp.ncbi.nih.gov/genomes/.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Doesn't that error message give you a hint? NCBI is not serving up NT
contigs because they are genome-sized sequences.

If you want to play with this data, go to the genomes FTP site, download
the files, and use Bio::Index::Fasta or Bio::DB::Fasta to index them
locally.  Be warned that standard bioperl doesn't play nicely with huge
sequence files, because it holds the sequences in memory. You can use
Bio::Seq::LargeSeq/LargePrimarySeq to manipulate these or, better yet, use
Lincoln's Bio::DB::Fasta, which provides a very nice layer for
indexing and interacting with large sequence files through the bioperl
SeqI and DB::RandomAccessI interfaces.
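
As a rough sketch of the local-index approach, something like the
following should work. The file name, the sequence ID NT_EXAMPLE, and the
toy sequence are made up for illustration; in practice you would point
Bio::DB::Fasta at the FASTA file you downloaded from the genomes site. It
only assumes the documented new(), length(), seq(), and get_Seq_by_id()
methods of Bio::DB::Fasta.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Bio::DB::Fasta;

# Hypothetical stand-in for a contig file downloaded from the genomes FTP
# site. Bio::DB::Fasta expects every sequence line (except the last one of
# a record) to have the same length.
my $dir   = tempdir(CLEANUP => 1);
my $fasta = "$dir/contigs.fa";
open(my $fh, '>', $fasta) or die "Cannot write $fasta: $!";
print $fh ">NT_EXAMPLE hypothetical contig\n";
print $fh "ACGTACGTAC\n";
print $fh "GGGGCCCCAA\n";
close $fh;

# Build the index (or reuse it if it already exists) next to the file.
my $db = Bio::DB::Fasta->new($fasta);

# Ask for the length and for a slice; only the requested region is read
# from disk, not the whole contig.
my $len = $db->length('NT_EXAMPLE');
my $sub = $db->seq('NT_EXAMPLE', 9, 14);

# The same record is also available as a sequence object.
my $obj  = $db->get_Seq_by_id('NT_EXAMPLE');
my $head = $obj->subseq(1, 4);

print "length: $len\n";
print "9..14:  $sub\n";
print "1..4:   $head\n";
```

The index is built on the first call to new() and reused afterwards, so
later lookups on a real multi-megabase contig file should not need to
reread the whole file.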

> Different methods (get_Seq_by_acc, get_Seq_by_gi, get_Seq_by_version,
> batch) sometimes give different error messages, but they all have one
> thing in common: they fail.
> Am I missing something fundamental, or is NCBI in the middle of a big
> shift, so that if you want to work on the NTs you should do this locally?
> I think there was a small discussion on this previously, so if anyone has
> had any success with using bioperl to directly work on the NTs, please
> share your experience.
> Thanks!!!
>
>
> --
> Stefan A. Kirov, Ph.D.
> Dept Biochemistry and Cellular and Molecular Biology
> F233 Walters Life Sciences Building
> 1414 Cumberland Avenue
> University of Tennessee
> Knoxville, TN  37996-0840
> Tel: 865-974-6710

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu