[Bioperl-l] Warning message terminates Stream_by_query?

Jason Stajich jason at cgt.duhs.duke.edu
Wed Jun 2 17:44:37 EDT 2004


That is annoying, isn't it...

I am worried that, for some reason, the entire sequence isn't being
downloaded and the record you get is truncated.

The message shows it stopping in the middle of the translation:
translation="MTLSKETEVIFDWRRGVEYHSANPPLYDSSTFHQTSLG
GDVKYDYARSGNPNRELLEEKLARLEQGKFAFAFASGIAAISAVLLTFK
SGDHVILPDDVYGGTFRLTEQILNRFNIEFTTVDTTKLEQIEGAIQSNTK
LIYIETPSNPCFKITDIKAVSKIAEKHELLVAVDNTFMTPLGQSPLLLGAD
IVIHSATKFLSGHSDLIN
whereas the translation has a few more lines in the original chromosome file.
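
As a rough way to check for that kind of truncation, here is a minimal
Python sketch (not BioPerl, and not what the parser itself does). It assumes
the downloaded record is already in a string; the name looks_truncated is
made up for illustration. It relies on two properties of the GenBank
flat-file format: a record ends with a line containing only "//", and quoted
qualifier values in the feature table come in pairs.

```python
def looks_truncated(record_text: str) -> bool:
    """Heuristic: return True if a GenBank flat-file record appears cut off."""
    lines = record_text.rstrip().splitlines()

    # A complete GenBank record ends with a line containing only "//".
    if not lines or lines[-1].strip() != "//":
        return True

    # Count double quotes inside the feature table only. Embedded quotes in
    # qualifier values are written doubled, so a complete table has an even
    # number of them.
    in_features = False
    quotes = 0
    for line in lines:
        if line.startswith("FEATURES"):
            in_features = True
        elif line[:1].strip():
            # Any other line starting in column 1 (ORIGIN, CONTIG, //)
            # ends the feature table.
            in_features = False
        elif in_features:
            quotes += line.count('"')
    return quotes % 2 == 1
```

A record that fails this check is probably worth fetching again before it
is handed to a parser.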

I wonder if there is a problem downloading a whole chromosome record from
GenBank - the web download is not the most reliable method and you'll find
life easier if you can download the .gbk files directly.

It depends on what you are working on, I guess, and whether you can predict
the set of accessions - if you are just working on finished/published genomes
you can grab files from ftp://ftp.ncbi.nih.gov/genbank/genomes, like this
S. aureus record, and I bet you won't have the same problem.
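
Once a .gbk file is on local disk, a multi-record file can be walked record
by record, and an incomplete trailing record can be skipped instead of
ending the run. The following is only a sketch in Python, not BioPerl; the
function name split_genbank_records and the sample data are invented for
illustration. The one format fact it assumes is that each record ends with a
line containing only "//".

```python
import io


def split_genbank_records(handle):
    """Yield (text, complete) for each record read from a GenBank flat file.

    `complete` is False for trailing text that never reached a "//" line.
    """
    buf = []
    for line in handle:
        buf.append(line)
        if line.rstrip() == "//":
            yield "".join(buf), True
            buf = []
    # Anything left over (other than blank lines) is an unterminated record.
    if any(l.strip() for l in buf):
        yield "".join(buf), False


# Tiny in-memory stand-in for a downloaded file (made-up content).
sample = io.StringIO("LOCUS A\n//\nLOCUS B\n")
for text, complete in split_genbank_records(sample):
    name = text.split()[1]
    print(name, "complete" if complete else "incomplete, skipped")
```

With a real file, the StringIO object would be replaced by an opened file
handle, and only the records flagged complete would be passed on to the
parser.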

-jason


On Wed, 2 Jun 2004, JAMES IBEN wrote:

> Hello list,
>
>   I have written a program (my first) which takes a Genbank
> query and retrieves sequences to pull out an intergenic region
> that I would like to work with.  However, when running the
> program I always at some point run into the following warning
> message:
>
> -------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /locus_tag="SAV0358"
> /codon_start=1
> /transl_table=11
> /product="putative cystathionine beta-lyase"
> /protein_id="BAB56520.1"
> /db_xref="GI:14246126"
> /
> translation="MTLSKETEVIFDWRRGVEYHSANPPLYDSSTFHQTSLG
> GDVKYDYARSGNPNRELLEEKLARLEQGKFAFAFASGIAAISAVLLTFK
> SGDHVILPDDVYGGTFRLTEQILNRFNIEFTTVDTTKLEQIEGAIQSNTK
> LIYIETPSNPCFKITDIKAVSKIAEKHELLVAVDNTFMTPLGQSPLLLGAD
> IVIHSATKFLSGHSDLINo further qualifiers will be added for this
> feature
> ---------------------------------------------------
>
>   With different queries, the message refers to some other
> Genbank sequence (i.e. not always this particular entry).  The
> problem is that once I have run into this message, the
> sequence stream terminates, ending the program.
>   I have checked these entries and see nothing apparently
> wrong with them (everything is bounded by quotes).  Can
> anyone tell me what this error arises from and perhaps what I
> can do to avoid it (or at least to skip any problematic
> sequences without interrupting the stream)?
>   The queries I have been submitting should only pull about 250
> sequences if they were not interrupted.  Is there some sort of
> stream size limitation that I am hitting?  If there is a problem
> with this approach is there a better solution for my particular
> task than using Stream_by_query?
>
> Thanks for your help,
> James
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

