[Bioperl-l] Alignment from blast report

Paolo Pavan paolo.pavan at gmail.com
Fri Mar 5 18:51:55 UTC 2010


Dear Mark,
Thank you again for your efforts spent on this theme, I have read and
tested carefully enough I hope, your new ads. I found they work
perfectly but either I miss some feature of the Tiling API (and this
is possible) or it could be that they don't entirely match what was
the initial problem; for sure my fault, I can explain better.
Let me start saying that what is needed is the merge of the alignments
returned by the get_tiled_alns method.

I have 2 seqs: h1, h2 (in the given example 00038 and 00053) and they
could be aligned against the same sequence q (named 1_0)
They cannot be aligned with common multiple sequence aligners like
clustalw since in this case is to be preferred a local alignment
algorithm instead of a global alignment. This specific case cannot be
handled by programs like cap3 either. I found that megablast -m 5 can
output a tiling of all the hits found versus the query, reporting this
entire.
I hope I gave the idea, if needed I can provide the input sequences of
the megablast.

Thank you again and have a nice week end,
Paolo

2010/3/4 Mark A. Jensen <maj at fortinbras.us>:
> Paolo -- Ok, there's now (r16900) an *experimental* method in
> Bio::Search::Tiling::MapTiling called get_tiled_alns().
> POD is below. Try it out and let me know--
> cheers,
> MAJ
>
>
> =head1 TILED ALIGNMENTS
>
> The experimental method L<ALIGNMENTS/get_tiled_alns> will use a tiling
> to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
> objects:
>
> @alns = $tiling->get_tiled_alns($type, $context);
>
> Each alignment contains two sequences with ids 'query' and 'subject',
> and consists of a concatenation of tiling HSPs which overlap or are
> directly adjacent. The alignment are returned in C<$type> sequence
> order. When HSPs overlap, the alignment sequence is taken from the HSP
> which comes first in the coverage map array.
>
> The sequences in each alignment contain features (even though they are
> L<Bio::LocatableSeq> objects) which map the original query/subject
> coordinates to the new alignment sequence coordinates. You can
> determine the original BLAST fragments this way:
>
> $aln = ($tiling->get_tiled_alns)[0];
> $qseq = $aln->get_seq_by_id('query');
> $hseq = $aln->get_seq_by_id('subject');
> foreach my $feat ($qseq->get_SeqFeatures) {
>   $org_start = ($feat->get_tag_values('query_start'))[0];
>   $org_end = ($feat->get_tag_values('query_end'))[0];
>   # original fragment as represented in the tiled alignment:
>   $org_fragment = $feat->seq;
> }
> foreach my $feat ($hseq->get_SeqFeatures) {
>   $org_start = ($feat->get_tag_values('subject_start'))[0];
>   $org_end = ($feat->get_tag_values('subject_end'))[0];
>   # original fragment as represented in the tiled alignment:
>   $org_fragment = $feat->seq;
> }
>
>
> ----- Original Message ----- From: "Paolo Pavan" <paolo.pavan at gmail.com>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "Chris Fields" <cjfields at illinois.edu>; <bioperl-l at lists.open-bio.org>
> Sent: Tuesday, March 02, 2010 10:17 AM
> Subject: Re: [Bioperl-l] Alignment from blast report
>
>
>> I think you got the sense, thank you. Of course hsps from different
>> hits will be reflected in different elements aligned. I've attached
>> the example pasted (unix text) because is more readable, hoping will
>> not be held by the mailing server :-)
>>
>> Thank you,
>> Paolo
>>
>> 2010/3/2 Mark A. Jensen <maj at fortinbras.us>:
>>>
>>> This might a good method to have for Bio::Search::Tiling--
>>> you want to stitch together all the hsps and have the
>>> concatenated alignment returned as a Bio::SimpleAlign,
>>> correct? Tiling would create the right set of hsps from
>>> which to generate the composite alignment. I can
>>> try to get something working, but it may take a while-
>>> MAJ
>>> ----- Original Message ----- From: "Paolo Pavan" <paolo.pavan at gmail.com>
>>> To: "Chris Fields" <cjfields at illinois.edu>
>>> Cc: <bioperl-l at lists.open-bio.org>
>>> Sent: Tuesday, March 02, 2010 9:37 AM
>>> Subject: Re: [Bioperl-l] Alignment from blast report
>>>
>>>
>>> Hi Chris,
>>> Thank you for your reply. So I have to understand that since the
>>> get_aln method returns the HSP alignment, there is no way to retrieve
>>> the whole alignment as in the example pasted, isn't it?
>>> Basically I'm trying to use megablast as kind of multiple local
>>> alignment engine and actually I'm not pretty sure this is a good idea
>>> but in my particular case could be suitable. I mean that the example
>>> below reports only the portions of the sequences that align loosing
>>> the portions that does not, I'm not sure I gave the idea. What do you
>>> think about? Can you give me your opinion?
>>> If there isn't any module written yet, I can try to write a parser, it
>>> could be of any interest?
>>>
>>> Thank you,
>>> Paolo
>>>
>>> 2010/3/2 Chris Fields <cjfields at illinois.edu>:
>>>>
>>>> Paolo,
>>>>
>>>> You can get a Bio::SimpleAlign from the HSP object. The first code
>>>> example
>>>> in this section in the HOWTO demonstrates this:
>>>>
>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>>>
>>>> chris
>>>>
>>>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>>>>
>>>>> Dear all,
>>>>> Sorry for pushing up my post but, please does anyone have an hint for
>>>>> me?
>>>>> Maybe have I to send attached the report to the mailing list? I don't
>>>>> know attachment policies of the list, if it is allowed and is needed I
>>>>> can do that.
>>>>>
>>>>> Thank you,
>>>>> Paolo
>>>>>
>>>>> 2010/2/26 Paolo Pavan <paolo.pavan at gmail.com>:
>>>>>>
>>>>>> Sorry,
>>>>>> Maybe I forgot to add this is the megablast -m 5 output.
>>>>>>
>>>>>> Thank you again,
>>>>>> Paolo
>>>>>>
>>>>>> 2010/2/26 Paolo Pavan <paolo.pavan at gmail.com>:
>>>>>>>
>>>>>>> Hi all,
>>>>>>> I have just a brief question: I've got some megablast reports such
>>>>>>> the
>>>>>>> one I've pasted below.
>>>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>>>>>> entire alignment represented as a Bio::SimpleAlign object or
>>>>>>> Bio::Align::AlignI implementing one?
>>>>>>>
>>>>>>> Thank you all,
>>>>>>> Paolo
>>>>>>>
>>>>>>>
>>>>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>>>>
>>>>>>>
>>>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller
>>>>>>> (2000),
>>>>>>> "A greedy algorithm for aligning DNA sequences",
>>>>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>>>>
>>>>>>> Database: 00038-00053.fasta
>>>>>>> 2 sequences; 2001 total letters
>>>>>>>
>>>>>>> Searching..................................................done
>>>>>>>
>>>>>>> Query= 00038-00053
>>>>>>> (802 letters)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Score E
>>>>>>> Sequences producing significant alignments: (bits) Value
>>>>>>>
>>>>>>> ______00038
>>>>>>> 226 1e-62
>>>>>>> ______00053
>>>>>>> 115 3e-29
>>>>>>>
>>>>>>> 1_0 472
>>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>>>>> ______00038 883
>>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>>>>> ______00053
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0 532
>>>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>>>>> ______00038 943
>>>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>>>>> ______00053
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0 591
>>>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>>>>> ______00038 1000
>>>>>>> ------------------------------------------------------------ 1001
>>>>>>> ______00053
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0 651
>>>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>>>>> ______00038 1000
>>>>>>> ------------------------------------------------------------ 1001
>>>>>>> ______00053
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> 1_0 711
>>>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>>>>> ______00038 1000
>>>>>>> ------------------------------------------------------------ 1001
>>>>>>> ______00053 1
>>>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>>>>
>>>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802
>>>>>>> ______00038 1000 -------------------------------- 1001
>>>>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65
>>>>>>> Database: 00038-00053.fasta
>>>>>>> Posted date: Feb 25, 2010 4:47 PM
>>>>>>> Number of letters in database: 2001
>>>>>>> Number of sequences in database: 2
>>>>>>>
>>>>>>> Lambda K H
>>>>>>> 1.37 0.711 1.31
>>>>>>>
>>>>>>> Gapped
>>>>>>> Lambda K H
>>>>>>> 1.37 0.711 1.31
>>>>>>>
>>>>>>>
>>>>>>> Matrix: blastn matrix:1 -3
>>>>>>> Gap Penalties: Existence: 0, Extension: 0
>>>>>>> Number of Sequences: 2
>>>>>>> Number of Hits to DB: 17
>>>>>>> Number of extensions: 3
>>>>>>> Number of successful extensions: 3
>>>>>>> Number of sequences better than 10.0: 2
>>>>>>> Number of HSP's gapped: 2
>>>>>>> Number of HSP's successfully gapped: 2
>>>>>>> Length of query: 802
>>>>>>> Length of database: 2001
>>>>>>> Length adjustment: 10
>>>>>>> Effective length of query: 792
>>>>>>> Effective length of database: 1981
>>>>>>> Effective search space: 1568952
>>>>>>> Effective search space used: 1568952
>>>>>>> X1: 9 (17.8 bits)
>>>>>>> X2: 20 (39.6 bits)
>>>>>>> X3: 51 (101.1 bits)
>>>>>>> S1: 9 (18.3 bits)
>>>>>>> S2: 9 (18.3 bits)
>>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>
>
>
> --------------------------------------------------------------------------------
>
>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>




More information about the Bioperl-l mailing list