[Bioperl-l] Refseq Hits

Jonathan Crabtree jonathancrabtree at gmail.com
Tue Jun 2 15:04:33 UTC 2009


Hi Shalabh-

I believe RefSeq is a non-redundant database, in which sequence entries with
identical sequences are merged and their descriptions are concatenated in
the FASTA defline.  If you look up the two accession numbers/gi numbers from
your search results I think you'll see that both are valid matches because
their polypeptide sequences are identical:

http://www.ncbi.nlm.nih.gov/protein/71082715
http://www.ncbi.nlm.nih.gov/protein/91762865

You're just getting a single match with two descriptions instead of two
matches with one description each. The sequence is the same, and therefore
so are the BLAST alignments.
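If you want to keep the merged entries apart after parsing, you could split
the description back into one (identifier, description) pair per entry. Below
is a minimal sketch in Python. It assumes that every extra entry begins with a
gi|number|db|accession| token, as in the description you posted. The function
name split_merged_description and the regular expression are hypothetical and
for illustration only. They are not part of BioPerl or any NCBI tool.

```python
import re

# Assumption: each additional merged entry starts with a token such as
# "gi|91762865|ref|ZP_01264830.1|", as in the example description.
# This pattern is illustrative, not an official NCBI or BioPerl format spec.
GI_TOKEN = re.compile(r"\s*(gi\|\d+\|[a-z]+\|[^|\s]+\|)\s*")


def split_merged_description(hit_name, description):
    """Return a list of (identifier, description) pairs, one per merged entry."""
    # re.split with a capturing group keeps the matched identifiers, giving:
    # [desc_for_hit_name, id2, desc2, id3, desc3, ...]
    parts = GI_TOKEN.split(description)
    entries = [(hit_name, parts[0].strip())]
    for i in range(1, len(parts) - 1, 2):
        entries.append((parts[i], parts[i + 1].strip()))
    return entries


if __name__ == "__main__":
    hit = "gi|71082715|ref|YP_265434.1|"
    desc = ("ubiquitin binding protein [Candidatus Pelagibacter ubique HTCC1062] "
            "gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein "
            "[Candidatus Pelagibacter ubique HTCC1002]")
    # Print one line per merged entry: identifier, then its own description.
    for ident, text in split_merged_description(hit, desc):
        print(ident, "->", text)
```

Both resulting entries point at the same polypeptide sequence, so any
alignment statistics from the hit apply equally to each of them.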

Jonathan

On Tue, Jun 2, 2009 at 10:44 AM, shalabh sharma
<shalabh.sharma7 at gmail.com> wrote:

> Hi All,
> This is not really a BioPerl query, but I am really confused and
> need some help.
> I blasted some sequences against the RefSeq database (locally). After parsing
> the BLAST results, I noticed that some description fields contain two
> hit names, like:
> hit_name ->    gi|71082715|ref|YP_265434.1|
> Description -> ubiquitin binding protein [Candidatus Pelagibacter ubique
> HTCC1062] gi|91762865|ref|ZP_01264830.1| possible ubiquitin binding protein
> [Candidatus Pelagibacter ubique HTCC1002]
>
> So besides giving me the description for hit_name (HTCC1062), it's also
> giving me HTCC1002.
> I would really appreciate it if someone could help me out.
>
> Thanks
> Shalabh
> _________________________________________________
> Shalabh Sharma
> Scientific Computing Professional Associate
> Department of Marine Sciences
> University of Georgia
> Athens, GA 30602-3636
>
> phone: 706-542-0341
> email: ssharmai at uga.edu
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


