[Biopython] Blast sequences and SNPs detection

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 29 18:31:21 UTC 2012


I'm assuming Fonz meant to send this to the list, my reply is below.

On Thu, Mar 29, 2012 at 7:21 PM, fonz esposito
<alfonso.esposito1983 at hotmail.it> wrote:
> Dear Peter and dear all,
>
> first of all thanks for answering me so quickly, then I will try to explain
> better my problem: I have sequences from DGGE bands, they have some
> mistakes, mainly invalid basecall so I need to blast every single sequence
> (after trimming the first and last bases from the AB1) on NCBI, and then
> compare it to the best hit, checking out every mismatch. This could be
> automated, I did with biopython the blast and I can process the output but I
> did not manage to indicate the exact nucleotide number and what the mismatch
> is, and when there is a gap I don't exactly know how to tell the program to
> output the gap location in the original sequence I blasted.
>
> I hope that I was clearer now, let me know if you can help me
>
> Alfonso

So these are 'Sanger' capillary reads, and although you may have
quite a few, I'm guessing there are under 100 in all? In that case
using BLAST is probably going to be OK, although depending on how
many sequences you have you might want to run it locally rather
than at the NCBI. Which database are you intending to search
against? i.e. do you know which organism your bands should be
from (or even what kind of organism)?

What are you trying to do with any suspect bases where your
sequences differ from those in the database? Personally, if
the number of sequences was quite small, I might work
directly from the BLAST pairwise alignment and go back
to the chromatogram in Chromas (or an equivalent tool) to
see if the base call can be manually corrected, or if the
difference appears to be real.
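
To report the exact position of each mismatch or gap in the sequence
you BLASTed, one option is to walk the two aligned strings of an HSP
side by side and count only the non-gap query letters. The following
is a minimal sketch rather than a tested pipeline: the function name
find_differences and its return format are made up for illustration,
and it assumes the query is on the forward strand in the alignment.

```python
def find_differences(query_aln, sbjct_aln, query_start=1):
    """List mismatches and gaps between two aligned strings.

    Returns (kind, query_pos, query_base, sbjct_base) tuples, where
    query_pos is 1-based in the sequence that was BLASTed. For a gap
    in the query, query_pos is the last query base before the gap.
    """
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have equal length")

    differences = []
    pos = query_start - 1  # position of the last query base consumed
    for q, s in zip(query_aln.upper(), sbjct_aln.upper()):
        if q == "-":
            # Subject has a base the query lacks; query position does not advance.
            differences.append(("gap_in_query", pos, q, s))
            continue
        pos += 1
        if s == "-":
            # Query has a base the subject lacks.
            differences.append(("gap_in_subject", pos, q, s))
        elif q != s:
            # Covers real substitutions and ambiguous calls such as N.
            differences.append(("mismatch", pos, q, s))
    return differences
```

With Biopython's BLAST XML parser (Bio.Blast.NCBIXML), the three
arguments would be hsp.query, hsp.sbjct and hsp.query_start for the
top HSP of the best hit. If you trimmed bases from the start of the
read before BLASTing, add that count to the reported positions to
get back to coordinates in the untrimmed trace.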

Peter

P.S. You can read the (trimmed) sequences from ABI/AB1
files directly within Biopython 1.58 or later.
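
As a rough illustration of the P.S., the sketch below reads one
trace with Bio.SeqIO using the "abi-trim" format name found in
current Biopython releases (I have not checked which release first
offered it), then lists low quality positions that may be worth
checking in the chromatogram. The helper names and the cutoff of 20
are assumptions for the example, not Biopython features.

```python
def read_trimmed_ab1(path):
    """Return (sequence, phred_qualities) for one quality-trimmed AB1 trace."""
    # Imported here so the pure-Python helper below works without Biopython.
    from Bio import SeqIO

    # AB1 is a binary format, so open the file in binary mode.
    with open(path, "rb") as handle:
        record = SeqIO.read(handle, "abi-trim")
    return str(record.seq), record.letter_annotations["phred_quality"]


def low_quality_positions(qualities, cutoff=20):
    """Return 1-based positions whose PHRED score is below the cutoff."""
    return [i for i, q in enumerate(qualities, start=1) if q < cutoff]
```

For example, calling read_trimmed_ab1 on one of your AB1 files and
passing the returned qualities to low_quality_positions gives the
positions (in the trimmed read) to compare against any mismatches
BLAST reports.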


