[Bioperl-l] Problems downloading and parsing GenBank records

Moller, Abraham mollera2 at miamioh.edu
Wed Jun 21 03:14:12 UTC 2017


Hi Chris,

Thanks for explaining what is going on. The protein sequence (YP_008791527.1)
indeed comes from a GenBank record that has been removed (NC_022785). It
seems the FASTA file I am using, which lists a sequence accession in each
header, includes accessions of truncated or removed GenBank records.
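
One way to catch a truncated download before it reaches the parser might be a
check like the following. This is only a rough sketch in plain Python, not
BioPerl code: the function name looks_complete is made up for illustration,
and it assumes the whole record text has already been fetched into a string.
It relies on a GenBank flat-file record starting with a LOCUS line and ending
with a "//" line, and on the error text seen in the warning in this thread.

```python
def looks_complete(record_text: str) -> bool:
    """Return True if the text looks like a complete GenBank flat-file record.

    Hypothetical helper for illustration only; it does not validate the
    record, it just rejects obviously truncated or failed downloads.
    """
    # Ignore blank lines so trailing newlines do not hide the terminator.
    lines = [line.rstrip() for line in record_text.splitlines() if line.strip()]
    if not lines:
        return False
    # A complete flat-file record starts with LOCUS and ends with "//".
    if not lines[0].startswith("LOCUS") or lines[-1] != "//":
        return False
    # The failed download in this thread had this message embedded in it.
    return "External viewer error" not in record_text
```

Records that fail this check could be skipped or downloaded again rather than
handed to the parser.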

I wonder if I should simply curate my FASTA file by hand every time I come
upon such an error (replacing NC_022785 with CP006567, the newer
Streptomyces rapamycinicus NRRL 5491 genome). This error came up about a
quarter of the way through parsing the FASTA file.
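
If the hand curation becomes tedious, the substitutions could probably be
scripted. Below is a minimal sketch in plain Python, assuming the old
accession appears as a separate token in each header line (the real header
layout may differ). The function name update_headers and the mapping passed
to it are hypothetical; the only pair taken from this thread is NC_022785 to
CP006567. Any version suffix on the old accession (such as ".1") is dropped,
since the replacement record has its own versioning.

```python
import re


def update_headers(fasta_lines, replacements):
    """Swap removed accessions for their replacements in FASTA header lines.

    fasta_lines: iterable of lines from a FASTA file.
    replacements: dict mapping old accession -> new accession (hypothetical
    mapping maintained by hand).
    """
    if not replacements:
        return list(fasta_lines)
    # Match any old accession as a whole token, with an optional ".N" version.
    alternatives = "|".join(re.escape(acc) for acc in replacements)
    pattern = re.compile(r"\b(" + alternatives + r")(\.\d+)?\b")
    updated = []
    for line in fasta_lines:
        # Only header lines are edited; sequence lines pass through unchanged.
        if line.startswith(">"):
            line = pattern.sub(lambda m: replacements[m.group(1)], line)
        updated.append(line)
    return updated
```

For example, update_headers(lines, {"NC_022785": "CP006567"}) would rewrite
only the header lines that mention NC_022785.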

Regards,
Jon



On Tue, Jun 20, 2017 at 10:53 PM, Fields, Christopher J <
cjfields at illinois.edu> wrote:

> Hi Jon,
>
>
>
> It looks like the script is attempting to parse a bad GenBank record, one
> that was truncated by an external error from NCBI, and failing (which is
> probably a good thing if the record is faulty).
>
>
>
> I noticed the record for that protein is no longer valid (it’s
> discontinued); the genome was replaced with this one:
>
>
>
> https://www.ncbi.nlm.nih.gov/genome/?term=txid1343740[Organism:noexp]
>
>
>
> Was this an older cached record?
>
>
>
> chris
>
>
>
> *From: *Bioperl-l <bioperl-l-bounces+cjfields=illinois.edu at mailman.open-bio.org>
> on behalf of "Moller, Abraham" <mollera2 at miamioh.edu>
> *Date: *Tuesday, June 20, 2017 at 7:24 PM
> *To: *"bioperl-l at mailman.open-bio.org" <bioperl-l at mailman.open-bio.org>
> *Subject: *[Bioperl-l] Problems downloading and parsing GenBank records
>
>
>
> Hi all,
>
> I have been using a script to parse GenBank files to find taxonomic
> information corresponding to bacterial genomes. After several tries, my
> script has failed with the following error:
>
> ...
> Bacteria_Actinobacteria_Streptomycetales_Streptomycetaceae_Streptomyces_Streptomyces_sp._4F
> Bacteria_Actinobacteria_Streptomycetales_Streptomycetaceae_Streptomyces_Streptomyces_glaucescens
> --------------------- WARNING ---------------------
> MSG: Unbalanced quote in:
> /locus_tag="M271_25565"
> /inference="COORDINATES: ab initio prediction:GeneMarkS+"
> /note="Derived by automated computational analysis using
> gene prediction method: GeneMarkS+."
> /codon_start=1
> /transl_table=11
> /product="membrane protein"
> /protein_id="YP_008791527.1"
> /db_xref="GeneID:17596261"
> /translation="MPSPTSLAPAGPTATPTRTTATARRLMAICGTLLAALLCALSVG
> ANSASAHAALTSTDPADGSVVKTAPREVTLNFSEGVLLSGDSVRVLDPKGKRVDTGKT
> AHVDGKSSTAAAGLHSGLPDG Error: External viewer error: Empty Response. Bytes
> read: 0 Status: TimeoutNo further qualifiers will be added for this
> feature
> ---------------------------------------------------
>
> After this, the script seems to halt for hours at least, if not
> indefinitely...
> Is this a BioPerl or GenBank issue? Any help would be appreciated.
>
> Thanks,
>
> Jon Moller
>
>
> --
>
> Abraham (Jon) Moller
>
> Microbiology and Chemistry | 2016
>
> Cell, Molecular, and Structural Biology (CMSB) BS/MS | Liang Bioinfo Lab
>
> Microbiology Club President
>
>
>
>
>



-- 
Abraham (Jon) Moller
Microbiology and Chemistry | 2016
Cell, Molecular, and Structural Biology (CMSB) BS/MS | Liang Bioinfo Lab
Microbiology Club President