[Bioperl-l] From Blast hits to Taxonomy lineage for Short DNA Sequences (reads)

Abhishek Pratap abhishek.vit at gmail.com
Fri Apr 15 20:16:26 UTC 2011


Hi Guys

I have one more related question. This time I have a list of NCBI locus names
rather than GI numbers. What I need to do is obtain the lineage for each locus
name.

Is this functionality built in?

Eg:

I want to search NCBI for the locus name "CP000490" and get the organism
lineage:


 Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales;
            Rhodobacteraceae; Paracoccus.


This info is present in the GenBank record, but I am not sure what the best
way to fetch it specifically is.
http://www.ncbi.nlm.nih.gov/nuccore/CP000490
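
For illustration, here is a minimal sketch of pulling the lineage out of a
GenBank flat-file record once the text is in hand. It is plain Python rather
than BioPerl, and it makes some assumptions: the function name
lineage_from_genbank is hypothetical, the organism name is assumed to fit on
the ORGANISM line itself, and the sample record is a hand-typed excerpt in
which only the lineage lines come from the message above. The record text
could presumably be fetched with NCBI E-utilities efetch (db=nuccore,
rettype=gb, retmode=text), and in BioPerl something like Bio::DB::GenBank with
$seq->species->classification may be the more natural route.

```python
def lineage_from_genbank(record_text):
    """Return the lineage listed under ORGANISM as a list of taxon names."""
    lines = record_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("  ORGANISM"):
            break
    else:
        return []  # no ORGANISM line found

    parts = []
    for line in lines[i + 1:]:
        # Lineage lines are continuation lines indented by 12 spaces;
        # the next keyword (e.g. REFERENCE) ends the block.
        if not line.startswith(" " * 12):
            break
        parts.append(line.strip())

    joined = " ".join(parts).rstrip(".")
    return [name.strip() for name in joined.split(";") if name.strip()]


# Hand-typed, abbreviated excerpt (assumption: not a full record).
SAMPLE = """\
SOURCE      Paracoccus denitrificans PD1222
  ORGANISM  Paracoccus denitrificans PD1222
            Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales;
            Rhodobacteraceae; Paracoccus.
REFERENCE   1
"""

print("; ".join(lineage_from_genbank(SAMPLE)))
```

The parser stops at the first line that is not indented by 12 spaces, so it
does not depend on which keyword follows the ORGANISM block.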


Thanks for your help!
-Abhi

On Wed, Mar 9, 2011 at 7:25 PM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:

> Thanks guys. I could not try either method today but will get back to
> you if I run into problems.
>
> Best,
> -Abhi
>
> On Wed, Mar 9, 2011 at 9:34 AM, shalabh sharma
> <shalabh.sharma7 at gmail.com> wrote:
> > Hey Abhishek:
> > Another way to deal with this is to download the gi_taxid file from NCBI,
> > convert all your GIs to taxids, and use Bio::DB::Taxonomy.
> > http://www.bioperl.org/wiki/Module:Bio::DB::Taxonomy
> > http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Taxon.html
> > I think there are a lot of other options too; if you are interested, you can
> > search for the thread which I started a long time back.
> > Hope this helps.
> > -Shalabh Sharma
> > -----------------------------------------------
> > Shalabh Sharma
> > Scientific Computing Professional Associate (Bioinformatics Specialist)
> > Department of Marine Sciences
> > University of Georgia
> > Athens, GA 30602-3636
> >
> > On Wed, Mar 9, 2011 at 4:20 AM, Miguel Pignatelli <miguel.pignatelli at uv.es>
> > wrote:
> >>
> >> Hi Abhishek,
> >>
> >> For a non-BioPerl solution, take a look at Bio::LITE::Taxonomy.
> >> It has been designed to deal with a great number of sequences (it is fast
> >> and efficient).
> >>
> >> You may also find the Blast2lca tool interesting:
> >>
> >> https://github.com/emepyc/Blast2lca
> >>
> >> It currently works with the best hits for each query (calculates the lowest
> >> common ancestor), but if you want to use only the best hit, please drop me a
> >> line.
> >>
> >> Please, let me know if you need further help with any of these,
> >>
> >> Cheers,
> >>
> >> M;
> >>
> >>
> >>
> >> On 08/03/11 22:42, Abhishek Pratap wrote:
> >>>
> >>> Hi All
> >>>
> >>> I have results from several megablast runs of short reads (DNA sequences),
> >>> and after extracting the top hit for each read I want to bin them by
> >>> their lineage, creating a tree.
> >>>
> >>> For example, if a blast query hits the reference
> >>>
> >>> gi|196110604|gb|CP001103.1|__Alteromonas_macleodii_'Deep_ecotype',_complete_genome
> >>>
> >>> I want to get the lineage for this species:
> >>>
> >>> Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales;Alteromonadaceae;Alteromonas;Alteromona
> >>>
> >>> The final goal is to do the above mapping as efficiently as possible.
> >>> Any pointers will be appreciated.
> >>>
> >>>
> >>> Thanks!
> >>> -Abhi
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>
> >
> >
>
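
The GI-to-taxid route suggested in the quoted thread can be sketched roughly
as follows. This is plain Python, not BioPerl, and only an illustration under
stated assumptions: the helper names gi_from_hit and lineage are hypothetical,
and the three small tables are toy stand-ins with made-up taxids for what
would normally be loaded from the NCBI GI-to-taxid and taxonomy dump files.
Only the hit ID is taken from the thread.

```python
def gi_from_hit(hit_id):
    """Extract the GI number from a 'gi|<number>|...' BLAST subject ID."""
    fields = hit_id.split("|")
    if len(fields) >= 2 and fields[0] == "gi" and fields[1].isdigit():
        return int(fields[1])
    return None


def lineage(taxid, parent, name):
    """Walk parent links up to the root; return names ordered root to leaf."""
    path = []
    seen = set()  # guards against the root pointing to itself
    while taxid in parent and taxid not in seen:
        seen.add(taxid)
        path.append(name[taxid])
        taxid = parent[taxid]
    return list(reversed(path))


# Toy tables with placeholder taxids (assumption: NOT real NCBI identifiers).
GI_TO_TAXID = {196110604: 30}
PARENT = {10: 10, 20: 10, 30: 20}  # the root (10) is its own parent
NAME = {10: "Bacteria", 20: "Proteobacteria", 30: "Gammaproteobacteria"}

hit = "gi|196110604|gb|CP001103.1|__Alteromonas_macleodii_'Deep_ecotype',_complete_genome"
gi = gi_from_hit(hit)
print(";".join(lineage(GI_TO_TAXID[gi], PARENT, NAME)))
```

Keeping the tables as in-memory dictionaries makes each lookup cheap, which
matters when binning many reads; for the full NCBI tables an on-disk index
may be needed instead.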


