[Biopython] Blast sequences and SNPs detection

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 29 14:27:24 UTC 2012


On Thu, Mar 29, 2012 at 3:12 PM, fonz esposito
<alfonso.esposito1983 at hotmail.it> wrote:
>
> Dear All,
>
> I am Alfonso Esposito, a PhD student in environmental microbiology, and I am
> quite new to the Python community. I am trying to work out how to write a script
> but I am going mad. I need a script that takes as input a FASTA file with N
> sequences, BLASTs them against the NCBI nucleotide collection, and writes an output
> file listing each SNP or gap with the corresponding nucleotide position (for
> example position 123 A->G or Gap between 145 and 146)... thanks everybody
> and I hope to receive your answer

Hello Alfonso,

I am confused about your aim here. Surely a dedicated SNP detection tool would
be more appropriate than BLAST?

BLAST finds similar sequences; it doesn't find SNPs. Are you hoping to take
the matched sequences and look up their annotation for SNPs? Or do you
want to treat the BLAST pairwise sequence alignments as if they were
alternative strains/alleles and interpret the differences as SNPs? Perhaps
you plan to restrict your BLAST search to a known accession/reference
genome?
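
If the second interpretation is what you have in mind, here is a minimal
sketch of the comparison step only. It assumes you already have the two
gapped strings of one pairwise alignment (for instance the `query` and
`sbjct` attributes of an HSP parsed with Bio.Blast.NCBIXML, plus its
`query_start`) and that the query is on the plus strand. The function name
`report_differences` and the exact output wording are hypothetical, and
whether a difference is really a SNP is a separate question.

```python
def report_differences(query_aln, sbjct_aln, query_start=1):
    """Return a list of differences between two aligned strings.

    query_aln and sbjct_aln are the gapped strings of one pairwise
    alignment (same length, '-' marks a gap). query_start is the 1-based
    query coordinate of the first aligned column. Positions are reported
    in query coordinates, and changes are written as query->subject.
    """
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have equal length")

    differences = []
    pos = query_start - 1  # last query position consumed so far
    for q, s in zip(query_aln.upper(), sbjct_aln.upper()):
        if q == "-":
            # Extra base in the subject: it falls between two query positions.
            differences.append(
                f"Gap between {pos} and {pos + 1} (subject has {s})"
            )
            continue
        pos += 1  # this column consumes one query base
        if s == "-":
            # Query base with no counterpart in the subject.
            differences.append(f"position {pos} {q}->- (gap in subject)")
        elif q != s:
            differences.append(f"position {pos} {q}->{s}")
    return differences


if __name__ == "__main__":
    # Toy alignment, not real BLAST output.
    for line in report_differences("ACGT-ACGT", "ACCTTAC-T"):
        print(line)
```

You would call this once per HSP and write the returned lines to your
output file. Ambiguity codes such as N are treated as ordinary mismatches
here, which you may not want.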

Also, if your FASTA file with N sequences in it is actually high-throughput
sequencing reads (e.g. Illumina reads), you probably want to start with a
mapping tool like BWA to do the alignment, not BLAST.

Peter


