[Biopython] Cluster Blast's most frequent Alignments

Peter Cock p.j.a.cock at googlemail.com
Tue Sep 22 16:25:33 UTC 2015


Hi Stelios,

I guess you are using BLASTN because the ESTs are long compared to the
Illumina reads etc. typically handled by read mapping tools like BWA?

Most of the recent read mapping tools produce SAM/BAM output, so
I would look at converting the BLAST+ output into SAM/BAM, and then
trying existing downstream tools for quantifying expression.

The NCBI are working on adding SAM output to the BLAST+ suite,
but for now you must use an external script to do this. See:

http://blastedbio.blogspot.co.uk/2015/07/ncbi-working-on-sam-output-from-blast.html
https://www.biostars.org/p/53434/

Interestingly, I've read that BLASTN was also used with early long
read data from PacBio and/or Oxford Nanopore before more
specialised tools were written.
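
If you just want a quick look at which scaffold regions attract the most
ESTs before setting up a SAM/BAM pipeline, something along these lines
might do. This is only a rough sketch, not an established method: it
assumes BLASTN was run with -outfmt 6 (the standard 12 tab-separated
columns), the function name cluster_hits is made up for illustration, and
it merges hits purely on overlapping scaffold coordinates, ignoring
strand, score and e-value.

```python
from collections import defaultdict


def cluster_hits(lines):
    """Merge overlapping BLAST hits on each scaffold (illustrative helper).

    `lines` is an iterable of BLAST tabular (-outfmt 6) lines with the
    standard columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.

    Returns a list of (scaffold, start, end, n_queries) tuples, with the
    clusters supported by the most distinct queries first.
    """
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, scaffold = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send, so normalise.
        start, end = min(sstart, send), max(sstart, send)
        hits[scaffold].append((start, end, query))

    clusters = []
    for scaffold, intervals in hits.items():
        intervals.sort()
        cur_start = cur_end = None
        queries = set()
        for start, end, query in intervals:
            if cur_start is not None and start <= cur_end:
                # Overlaps the current cluster: extend it.
                cur_end = max(cur_end, end)
                queries.add(query)
            else:
                if cur_start is not None:
                    clusters.append((scaffold, cur_start, cur_end, len(queries)))
                cur_start, cur_end, queries = start, end, {query}
        if cur_start is not None:
            clusters.append((scaffold, cur_start, cur_end, len(queries)))

    # Most-supported clusters first; ties broken by scaffold name and start.
    clusters.sort(key=lambda c: (-c[3], c[0], c[1]))
    return clusters
```

You would call it on an open file handle, e.g. cluster_hits(open(path))
for whatever file your BLAST run wrote. Counting distinct queries per
cluster means an EST that gives several HSPs in the same region is only
counted once there.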

Peter


On Fri, Sep 18, 2015 at 8:50 PM, Stelios Barberakis
<chefarov at protonmail.com> wrote:
> Hello all,
>
> I am new to Bioinformatics, so excuse me if I have got this all wrong.
>
> I am aligning multiple sequences (ESTs) to a genome (scaffolds FASTA file)
> using Biopython's NcbiblastnCommandline wrapper, and for the purposes of my
> project I need to cluster the overlapping alignments in order to locate
> highly expressed genes. I was surprised not to find any articles online
> about a standard (formalised) methodology for this step.
>
> Well, one can easily locate the scaffolds that appear in multiple alignments
> using Biopython's parsers and just go on processing the data.
> The thing is, I was wondering whether this process would be worth adding
> to Biopython, for example as a method inside the BlastIO package.
>
> If so, then we should decide on the output format of the new file/info
> produced as the result of this process. For example, one idea would be (?) to
> gather all the alignments in one place, discarding the source sequences
> (queries), and just highlight in some way, e.g. by introducing a new score
> index, the most expressed scaffolds.
>
> Any thoughts on this?
>
> Thank you for your time,
> Stelios

