[Bioperl-l] genbank (blast) alignments

Bernd Web bernd.web at gmail.com
Fri Jul 24 12:12:56 UTC 2009


Hi,

Although this does not address the original query about the new alignment
format in BLAST 2.2.21, the BLAST -m 6 format (flat query-anchored) that
Chris suggests is relatively easy to transform into a format that
Bio::AlignIO can read:

This alignment format can be parsed as a Clustal alignment by
prepending a CLUSTAL header and removing the start positions from the
sequences (or, alternatively, it can be parsed as a SELEX alignment).
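
A minimal sketch of that transformation, written in Python purely for
illustration (the same few lines translate directly to Perl). It assumes
each alignment row sits on a single line as "name start sequence end",
with blank lines between blocks. The function name flat_to_clustal and
the exact header text are hypothetical choices, not part of BLAST or
BioPerl. Besides the start position it also drops the trailing end
position, which goes slightly beyond the description above. Rows that do
not match the pattern (for example rows padded with leading spaces where
a hit does not cover the query) are skipped, so treat this as a starting
point only.

```python
import re

# One flat query-anchored row: name, start, aligned residues, end.
# Assumption: the whole row is on one line and the sequence has no spaces.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+([A-Za-z*-]+)\s+(\d+)\s*$")


def flat_to_clustal(text, header="CLUSTAL W (1.83) multiple sequence alignment"):
    """Convert flat query-anchored rows into Clustal-style text.

    The header string is an assumed, typical Clustal first line.
    """
    parsed = []  # (name, sequence) tuples, or None to mark a block break
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            # Keep the name and residues, drop both coordinates.
            parsed.append((match.group(1), match.group(3)))
        elif not line.strip() and parsed and parsed[-1] is not None:
            # A blank line after some rows ends the current block.
            parsed.append(None)

    names = [row[0] for row in parsed if row is not None]
    if not names:
        raise ValueError("no alignment rows found")

    # Pad every name to the same width so the sequences line up.
    width = max(len(name) for name in names) + 2
    out = [header, ""]
    for row in parsed:
        out.append("" if row is None else f"{row[0]:<{width}}{row[1]}")
    return "\n".join(out).rstrip("\n") + "\n"
```

The resulting text can be written to a file and then read with
Bio::AlignIO using the clustalw format, as described above.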


Bernd

On Fri, Jul 24, 2009 at 12:53 AM, Chris Fields<cjfields at illinois.edu> wrote:
> Lots of emails to answer, so little time.  Doesn't help when my VPN goes out
> either ;>
>
> What you want appears to be generating a multiple alignment from pairwise
> alignment.   The answer is 'very likely not'.  However, the local BLAST
> executable does have several options for generating alignments from HSP data
> (assuming that's what you mean):
>
> -m :     alignment view options:
>        0 = pairwise,
>        1 = query-anchored showing identities,
>        2 = query-anchored no identities,
>        3 = flat query-anchored, show identities,
>        4 = flat query-anchored, no identities,
>        5 = query-anchored no identities and blunt ends,
>        6 = flat query-anchored, no identities and blunt ends,
>        7 = XML Blast output,
>        8 = tabular,
>        9 = tabular with comment lines [Integer] default = 0
>
> You can set this by reformatting on the BLAST web site (here's a chunk of
> the output, note the query):
>
> Query         61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
> NP_389430     61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
> YP_001421124  61  PVTVGEIDITLYRDDLT-KKTSNE-E--PLVKGADIPADIT-------DQKVIVVDDVLY  109
> YP_078940     63  KVTVGELDITLYRDDLS-KKTSNK-E--PLVKGADIPADIT-------DQKVILVDDVLY  111
> ZP_03053294   61  PVIVGELDITLYRDDLT-KKTENQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
> YP_001486689  61  PVIVGELDITLYRDDLT-KKTDNQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
> YP_002949168  60  AVPVGELDITLYRDDLT-VKTIDH-E--PLVKGTDVPFDVT-------NKKVILVDDVLF  108
> ZP_01860800   61  KMPVGEIDITLYRDDLT-VKTANE-E--PEVKGSDLPVDVT-------DKKVILIDDVLF  109
> ZP_04121773   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_04218628   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> YP_002316154  66  SIPVGELDITLYRDDLT-VKTDDR-E--PLVKGTDVPFSVT-------NQKVILVDDVLF  114
> ZP_00240953   61  EMEVGELDITLYRDDLT-LQSKNE-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> YP_037953     61  EIEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_04193166   61  KMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> NP_833611     61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_03018932   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ...
>
> We do not have a parser for that format, BTW, but it wouldn't be too hard to
> get something working quickly based on one of the current parsers.  Probably
> could go AlignIO or SearchIO (or both).
>
> chris
>
> On Jul 23, 2009, at 2:38 PM, Robert Buels wrote:
>
>> Wow, that silence is deafening.  I can't believe somebody who knows what
>> they're talking about hasn't written you back yet.
>>
>> Perhaps you could do some kind of transformation where you read in the
>> BLAST report with Bio::SearchIO, and then write to MSF with
>> Bio::AlignIO::msf?  You would probably need to do some fiddling to create
>> the proper objects and relationships that Bio::AlignIO::msf would want.
>>
>> But this reply probably isn't helpful, because you probably already knew
>> that much.  I'm mostly just trying to add to this thread so that people who
>> actually know a lot about BioPerl's functions in this area will see it and
>> hopefully be of more help.
>>
>> Rob
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY  14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>>
>>
>> Thomas Keller wrote:
>>>
>>> Greetings,
>>> Blast 2.2.21 has a multi-sequence alignment feature that is really handy:
>>> put in the accession number of the refseq in one sequence field and a
>>> concatenated fasta file of the Sanger reads to align in the second box and
>>> it does the alignments. Unfortunately, the output is a series of alignments
>>> rather than the more useful msf format with all reads aligned with the
>>> reference.
>>> Is there a bioperl module that reads the blast alignments and converts it
>>> to an msf alignment?
>>> thanks,
>>> Tom
>>> kellert at ohsu.edu
>>> 503-494-2442
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
