[Bioperl-l] Refseq Hits

Smithies, Russell Russell.Smithies at agresearch.co.nz
Tue Jun 2 16:56:26 EDT 2009


The identifiers are separated by a Ctrl-A char ("\001") in the original non-redundant fasta header so you should be able to split them up again - assuming BioPerl didn't munge them.
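
As a rough sketch of that split (in Python rather than Perl, and not tied to any BioPerl call), something like the following could work on a raw defline. The function name split_defline is made up for illustration, and the whitespace fallback is only my assumption about what a munged header might look like, based on the description quoted below, where the two entries appear separated by a space.

```python
import re

CTRL_A = "\x01"  # Ctrl-A, the separator between merged entries in the nr-style defline


def split_defline(defline):
    """Split a merged FASTA defline into (identifier, description) pairs.

    Hypothetical helper for illustration only.
    """
    line = defline.lstrip(">")
    if CTRL_A in line:
        # Original header: entries are still separated by Ctrl-A.
        parts = line.split(CTRL_A)
    else:
        # Assumed fallback: the separator was replaced by whitespace,
        # so split just before each later "gi|<digits>|" token.
        parts = re.split(r"\s+(?=gi\|\d+\|)", line)

    entries = []
    for part in parts:
        # The identifier is everything up to the first space.
        ident, _, desc = part.strip().partition(" ")
        entries.append((ident, desc.strip()))
    return entries


if __name__ == "__main__":
    header = (
        ">gi|71082715|ref|YP_265434.1| ubiquitin binding protein "
        "[Candidatus Pelagibacter ubique HTCC1062]"
        + CTRL_A
        + "gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein "
        "[Candidatus Pelagibacter ubique HTCC1002]"
    )
    for ident, desc in split_defline(header):
        print(ident, "->", desc)
```

The fallback will mis-split if a description itself happens to contain a "gi|...|" token, so the Ctrl-A route is the one to prefer when the original header is available.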

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of shalabh sharma
> Sent: Wednesday, 3 June 2009 3:16 a.m.
> To: Jonathan Crabtree
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] Refseq Hits
> 
> Hi Jonathan,
> Your information is really helpful. Thanks a lot.
> 
> -Shalabh
> 
> 
> On Tue, Jun 2, 2009 at 11:04 AM, Jonathan Crabtree <
> jonathancrabtree at gmail.com> wrote:
> 
> >
> > Hi Shalabh-
> >
> > I believe RefSeq is a non-redundant database, in which sequence entries
> > with identical sequences are merged and their descriptions are concatenated
> > in the FASTA defline.  If you look up the two accession numbers/gi numbers
> > from your search results I think you'll see that both are valid matches
> > because their polypeptide sequences are identical:
> >
> > http://www.ncbi.nlm.nih.gov/protein/71082715
> > http://www.ncbi.nlm.nih.gov/protein/91762865
> >
> > You're just getting a single match with two descriptions instead of two
> > matches with one description each, but the sequence is the same and so,
> > therefore, are the BLAST alignments.
> >
> > Jonathan
> >
> > On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma <shalabh.sharma7 at gmail.com
> > > wrote:
> >
> >> Hi All,
> >> This is not really a BioPerl query, but I am really confused and need
> >> some help.
> >> I blasted some sequences against the RefSeq database (locally). After
> >> parsing the BLAST results, I noticed that some description fields contain
> >> two hit names, like:
> >> hit_name ->    gi|71082715|ref|YP_265434.1|
> >> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> >> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding
> >> protein
> >> [Candidatus Pelagibacter ubique HTCC1002]
> >>
> >> So besides giving me the description for hit_name (HTCC1062), it's also
> >> giving me the one for HTCC1002.
> >> I would really appreciate it if someone could help me out.
> >>
> >> Thanks
> >> Shalabh
> >> _________________________________________________
> >> Shalabh Sharma
> >> Scientific Computing Professional Associate
> >> Department of Marine Sciences
> >> University of Georgia
> >> Athens, GA 30602-3636
> >>
> >> phone: 706-542-0341
> >> email: ssharmai at uga.edu
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >


