[Bioperl-l] Alignment from blast report

Paolo Pavan paolo.pavan at gmail.com
Tue Mar 2 14:37:59 UTC 2010


Hi Chris,
Thank you for your reply. So, if I understand correctly, since the
get_aln method returns only the alignment of a single HSP, there is no
way to retrieve the whole alignment as in the example I pasted, is there?
Basically, I'm trying to use megablast as a kind of multiple local
alignment engine. I'm not sure this is a good idea in general, but it
could be suitable in my particular case. What I mean is that the example
below reports only the portions of the sequences that align, losing
the portions that do not; I'm not sure I explained the idea clearly.
What do you think? Can you give me your opinion?
If no such module has been written yet, I can try to write a parser.
Would that be of any interest?
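
To make the parser idea concrete, here is a minimal sketch of the
approach. It is written in Python only because it needs nothing beyond
the standard library; it is not BioPerl code and not an existing module.
The names parse_query_anchored, ROW and SAMPLE are hypothetical, and the
sample rows are made up. It assumes that each alignment row sits on a
single line in the form "name [start] residues [end]" (so not wrapped by
a mail client, as the report quoted below is) and that only the alignment
section of the -m 5 report is passed in, without the header and
statistics. Coordinates are ignored; the residues of each sequence are
simply concatenated across blocks.

```python
import re

# One alignment row: name, optional start, residues or gaps, optional end.
ROW = re.compile(r"^(\S+)\s+(?:(\d+)\s+)?([A-Za-z-]+)(?:\s+(\d+))?\s*$")


def parse_query_anchored(text):
    """Concatenate the rows of each sequence across alignment blocks.

    Returns a dict mapping sequence name to its full gapped row.
    Lines that do not look like an alignment row are skipped.
    """
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # blank line or anything that is not a row
        name, _start, residues, _end = match.groups()
        rows.setdefault(name, []).append(residues)
    return {name: "".join(parts) for name, parts in rows.items()}


# Made-up two-block example in the assumed (unwrapped) layout.
SAMPLE = """\
q    1 acgt-a 5
s1  11 acgtta 16
s2     ------

q    6 ggcc 9
s1  16 ---- 17
s2   1 ggcc 4
"""

aln = parse_query_anchored(SAMPLE)
for name, row in aln.items():
    print(name, row)
```

Since every row ends up with the same length, the result could be written
out as gapped FASTA and read back with Bio::AlignIO to obtain a
Bio::SimpleAlign, if that is the object wanted in the end.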

Thank you,
Paolo

2010/3/2 Chris Fields <cjfields at illinois.edu>:
> Paolo,
>
> You can get a Bio::SimpleAlign from the HSP object.  The first code example in this section in the HOWTO demonstrates this:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>
> chris
>
> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>
>> Dear all,
>> Sorry for bumping my post, but does anyone have a hint for me?
>> Should I send the report to the mailing list as an attachment? I don't
>> know the list's attachment policy; if it is allowed and needed, I
>> can do that.
>>
>> Thank you,
>> Paolo
>>
>> 2010/2/26 Paolo Pavan <paolo.pavan at gmail.com>:
>>> Sorry,
>>> I forgot to add that this is the megablast -m 5 output.
>>>
>>> Thank you again,
>>> Paolo
>>>
>>> 2010/2/26 Paolo Pavan <paolo.pavan at gmail.com>:
>>>> Hi all,
>>>> I have just a brief question: I've got some megablast reports such as
>>>> the one I've pasted below.
>>>> I'm aware of the existence of Bio::SearchIO::megablast and
>>>> Bio::Search::HSP::BlastHSP::get_aln, but is there a way to get the
>>>> entire alignment represented as a Bio::SimpleAlign object, or as
>>>> another object implementing Bio::Align::AlignI?
>>>>
>>>> Thank you all,
>>>> Paolo
>>>>
>>>>
>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>
>>>>
>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000),
>>>> "A greedy algorithm for aligning DNA sequences",
>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>
>>>> Database: 00038-00053.fasta
>>>>            2 sequences; 2001 total letters
>>>>
>>>> Searching..................................................done
>>>>
>>>> Query= 00038-00053
>>>>          (802 letters)
>>>>
>>>>
>>>>
>>>>                                                                  Score    E
>>>> Sequences producing significant alignments:                      (bits) Value
>>>>
>>>> ______00038                                                      226   1e-62
>>>> ______00053                                                      115   3e-29
>>>>
>>>> 1_0         472  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>> ______00038 883  ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         532  aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>> ______00038 943  aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         591  attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         651  gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053      ------------------------------------------------------------
>>>>
>>>> 1_0         711  acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>> ______00038 1000 ------------------------------------------------------------ 1001
>>>> ______00053 1    -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>
>>>> 1_0         771  ccttaagtatttactttaacttgtcttgatca 802
>>>> ______00038 1000 -------------------------------- 1001
>>>> ______00053 35   ccttaagtatt-actttaacttgtcttgatca 65
>>>>   Database: 00038-00053.fasta
>>>>     Posted date:  Feb 25, 2010  4:47 PM
>>>>   Number of letters in database: 2001
>>>>   Number of sequences in database:  2
>>>>
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>> Gapped
>>>> Lambda     K      H
>>>>     1.37    0.711     1.31
>>>>
>>>>
>>>> Matrix: blastn matrix:1 -3
>>>> Gap Penalties: Existence: 0, Extension: 0
>>>> Number of Sequences: 2
>>>> Number of Hits to DB: 17
>>>> Number of extensions: 3
>>>> Number of successful extensions: 3
>>>> Number of sequences better than 10.0: 2
>>>> Number of HSP's gapped: 2
>>>> Number of HSP's successfully gapped: 2
>>>> Length of query: 802
>>>> Length of database: 2001
>>>> Length adjustment: 10
>>>> Effective length of query: 792
>>>> Effective length of database: 1981
>>>> Effective search space:  1568952
>>>> Effective search space used:  1568952
>>>> X1: 9 (17.8 bits)
>>>> X2: 20 (39.6 bits)
>>>> X3: 51 (101.1 bits)
>>>> S1: 9 (18.3 bits)
>>>> S2: 9 (18.3 bits)
>>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>



