[Bioperl-l] primer candidates validation by comparing the wgs blast results between fwd and rev.

Sean Davis sdavis2 at mail.nih.gov
Thu Feb 18 11:49:14 UTC 2010


For the problem that you describe below, you might consider primer3
(or eprimer3) for the design of the primer sets, if that part is still
in question.  As for the aligning back to the genome, you might
consider using a flavor of in silico pcr (isPcr).  There are bioperl
tools for working with isPcr, but you could also run it as a
standalone.

Sean

On Wed, Feb 17, 2010 at 9:10 PM, teetee <sclantw at hotmail.com> wrote:
>
> I am totally new to bioperl.
>
> I would like to see if anyone could give me a hint or clue for tackling this
> problem I am trying to solve:
> Use bioperl/perl script and CGI to create a primer quality control web
> interface
>
> the steps I would like to be automated:
> I design many primer pairs (~500+) flanking intron regions of silkworm wgs
> sequences close to cDNA/mRNA/EST/molecular anchor loci selected. After the
> primers are generated (I wish this step could be automated but it really
> can't), I have to blast each and every of them against the wgs database of
> the same organism to make sure there is no common hits in terms of the same
> contig number result between the forward and the reverse primers blastn hits
> to avoid the non-target amplification except for the target intron region.
> The steps I take to validate the primers are as follows:
> 1. At NCBI blastn webpage, put in the forward primer sequence in the search
> field, label it ("job title"), choose the wgs database and the organism, and
> click "submit" to start search.
> 2. Open another browser tab and go to NCBI blastn webpage, put in the
> reverse primer sequence in the search field, label it ("job title"), choose
> the wgs database and the organism, and click "submit" to start search.
> 3. On the forward primer blastn result page, write down the top 20 wgs
> sequence that was from build 2 genomic sequencing project (the title of each
> hit has a text string with certain format like
> "Bm_scaf<number>_contig<number>").
> 4. On the reverse primer blastn result page, write down the top 20 wgs
> sequence that was from build 2 genomic sequencing project (the title of each
> hit has a text string with certain format like
> "Bm_scaf<number>_contig<number>").
> 5. compare the recorded blast hits from step 3 and step 4 and list the
> common hit(s) between the two primer sequences (with the same scaffold and
> contig number)
> 6. show a warning if there is more than one common hit since there should be
> only one target hit.
>
> Example:
> I have these two primer sequences:
> GCATCGGTGAACGAGCTA
> CGCCTGCAAACGAGAATA
>
> First I blast each of the above primer sequences against wgs database bombyx
> mori (organismid:7091) on blast website
> http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome
> After I get the results in two different browser tabs, I record down the
> results. For example, from the forward primer result page(only the first
> several hits are listed):
> =================== first few hits from forward primer blast result
> ==================
> BABH01015134.1
> Bombyx mori DNA, contig: Bm_scaf21_contig15134,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
>
> BABH01038273.1
> Bombyx mori DNA, contig: Bm_scaf121_contig38273,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
>
> AADK01021213.1
> Bombyx mori strain Dazao Ctg021213, whole genome
> shotgun sequence
> 34.2 34.2 94% 0.12 100%
>
> BAAB01106839.1
> Bombyx mori DNA, contig477862, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
>
> BAAB01154920.1
> Bombyx mori DNA, contig585939, whole genome shotgun
> sequence
> 34.2 34.2 94% 0.12 100%
>
> BABH01007204.1
> Bombyx mori DNA, contig: Bm_scaf8_contig7204,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 32.2 32.2 88% 0.48 100%
>
> BABH01020379.1
> Bombyx mori DNA, contig: Bm_scaf33_contig20379,
> strain: p50T/Dazao, build 2, whole genome shotgun
> sequence
> 32.2 32.2 88% 0.48 100%
> ================================ end of the forward blast result
> ================================ first few hits from reverse primer blast
> result ==================
> BABH01015134.1
> Bombyx mori DNA, contig:
> Bm_scaf21_contig15134, strain: p50T/Dazao,
> build 2, whole genome shotgun sequence
> 36.2 36.2 100% 0.031 100%
>
> AADK01021213.1
> Bombyx mori strain Dazao Ctg021213, whole
> genome shotgun sequence
> 36.2 36.2 100% 0.031 100%
>
> AADK01032592.1
> Bombyx mori strain Dazao Ctg032592, whole
> genome shotgun sequence
> 36.2 36.2 100% 0.031 100%
>
> BAAB01106839.1
> Bombyx mori DNA, contig477862, whole genome
> shotgun sequence
> 36.2 36.2 100% 0.031 100%
>
> BABH01028024.1
> Bombyx mori DNA, contig:
> Bm_scaf56_contig28024, strain: p50T/Dazao,
> build 2, whole genome shotgun sequence
> 30.2 30.2 83% 1.9 100%
>
> AADK01039561.1
> Bombyx mori strain Dazao Ctg039561, whole
> genome shotgun sequence
> 30.2 30.2 83% 1.9 100%
>
> AADK01056852.1
> Bombyx mori strain Dazao Ctg063892, whole
> genome shotgun sequence
> 30.2 30.2 83% 1.9 100%
>
> BABH01001710.1
> Bombyx mori DNA, contig: Bm_scaf2_contig1710,
> strain: p50T/Dazao, build 2, whole genome
> shotgun sequence
> ===============================end of the reverse result
>
> >From the list, I would record down the ones with "Bm_scaf#_contig#"(ex.
> Bm_scaf21_contig15134) since that's the string pattern I would like to
> compare with the hits from reverse primer blast results.
> After I record down the first 20 qualified blast hits(hopefully with blast
> paser program I can use more than 20), I compare them with the ones from the
> reverse primer search result and see if there is any common result other
> than the target.
>
> I am OK to go through this validation process manually if there are only
> tens of primers I have to design. However with 500 and maybe more primers to
> come I believe there is an easier way.
>
>
> I imagine the code will have the following functions:
> input: user's primer pairs, multiple entry capatability
> output: compare the blastn results between both fwd and reverse primer and
> generate a list of common blastn hits (w/ same scaf# and contig# from build
> 2 wgs sequences - naming convention: Bm_scaf#_contig#) on the wgs (accept
> the blast search parameters through the web and pass to the blast command)
> background record-keeping mechanism: create records of the blast report for
> each primer vs wgs blastn results and properly name the files.
>
> I guess my question is:
> What would be the most stright-forward approach? (I know you probably think
> I already know the method since I post the question here, but more
> suggestions the better) and where should I start?
>
> My background:
> 1. I've written codes for retrieving PDB file upon user's PDB 4-letter
> protein ID entry and atom-atom distance measurement with the same setup(perl
> script+web CGI interface)
> 2. I've used command-line megablast to batch blast multiple sequences
> however my impression is that it's not intend to do short sequence
> blast(primers are usually around 20-24bp long).
> ref.:
> http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download
> 3. I can modify simple perl codes and do some text string menupilation in
> perl
> 4. I have my own linux box and have apache/perl/bioperl/cgi ready.
>
> --
> View this message in context: http://old.nabble.com/primer-candidates-validation-by-comparing-the-wgs-blast-results-between-fwd-and-rev.-tp27633496p27633496.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list