[Bioperl-l] genbank (blast) alignments
Bernd Web
bernd.web at gmail.com
Fri Jul 24 12:12:56 UTC 2009
Hi,
Although this does not address the original question about the new
alignment format in BLAST 2.2.21, the BLAST -m 6 format (flat
query-anchored) is relatively easy to transform into a format that
Bio::AlignIO can read, as Chris suggests:
This alignment format can be parsed as a Clustal alignment by
prepending a CLUSTAL header and removing the start and end positions
from the sequence lines (or it can be parsed as a SELEX alignment).
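A minimal sketch of that transformation, written in plain Python rather than BioPerl so it has no dependencies. It assumes each alignment row sits on one line as "<id> <start> <sequence> <end>" and that blocks are separated by blank lines; real -m 6 output may contain rows that break this assumption (for example rows without positions), and those are simply skipped here. The function name to_clustal and the header text are illustrative choices, not something BLAST or BioPerl prescribes.

```python
import re

# Assumed row layout: <id> <start> <aligned sequence> <end>
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\S+)\s+(\d+)\s*$")


def to_clustal(text, header="CLUSTAL W (1.83) multiple sequence alignment"):
    """Turn flat query-anchored rows into Clustal-style text.

    Positions are dropped, ids are padded to a common width, and a
    header line starting with "CLUSTAL" is prepended.
    """
    blocks, current = [], []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            seq_id, _start, seq, _end = match.groups()
            current.append((seq_id, seq))
        elif not line.strip() and current:
            # A blank line closes the current alignment block.
            blocks.append(current)
            current = []
        # Any other non-matching line is ignored (report headers etc.).
    if current:
        blocks.append(current)

    width = max((len(i) for b in blocks for i, _ in b), default=0)
    out = [header, ""]
    for block in blocks:
        out.extend(f"{seq_id.ljust(width)}  {seq}" for seq_id, seq in block)
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "Query 61 PVTVGEIDITLYRDDLS-KKTSND 84\n"
        "NP_389430 61 PVTVGEIDITLYRDDLS-KKTSND 84\n"
    )
    print(to_clustal(sample))
```

The resulting text can be saved to a file and, in principle, read with Bio::AlignIO using the clustalw format. Ids that occur more than once in a block (several HSPs for one subject) are not handled and would need to be made unique first.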
Bernd
On Fri, Jul 24, 2009 at 12:53 AM, Chris Fields<cjfields at illinois.edu> wrote:
> Lots of emails to answer, so little time. Doesn't help when my VPN goes out
> either ;>
>
> What you want appears to be generating a multiple alignment from pairwise
> alignment. The answer is 'very likely not'. However, the local BLAST
> executable does have several options for generating alignments from HSP data
> (assuming that's what you mean):
>
> -m : alignment view options:
> 0 = pairwise,
> 1 = query-anchored showing identities,
> 2 = query-anchored no identities,
> 3 = flat query-anchored, show identities,
> 4 = flat query-anchored, no identities,
> 5 = query-anchored no identities and blunt ends,
> 6 = flat query-anchored, no identities and blunt ends,
> 7 = XML Blast output,
> 8 = tabular,
> 9 tabular with comment lines [Integer] default = 0
>
> You can set this by reformatting on the BLAST web site (here's a chunk of
> the output, note the query):
>
> Query         61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
> NP_389430     61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
> YP_001421124  61  PVTVGEIDITLYRDDLT-KKTSNE-E--PLVKGADIPADIT-------DQKVIVVDDVLY  109
> YP_078940     63  KVTVGELDITLYRDDLS-KKTSNK-E--PLVKGADIPADIT-------DQKVILVDDVLY  111
> ZP_03053294   61  PVIVGELDITLYRDDLT-KKTENQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
> YP_001486689  61  PVIVGELDITLYRDDLT-KKTDNQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
> YP_002949168  60  AVPVGELDITLYRDDLT-VKTIDH-E--PLVKGTDVPFDVT-------NKKVILVDDVLF  108
> ZP_01860800   61  KMPVGEIDITLYRDDLT-VKTANE-E--PEVKGSDLPVDVT-------DKKVILIDDVLF  109
> ZP_04121773   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_04218628   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> YP_002316154  66  SIPVGELDITLYRDDLT-VKTDDR-E--PLVKGTDVPFSVT-------NQKVILVDDVLF  114
> ZP_00240953   61  EMEVGELDITLYRDDLT-LQSKNE-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> YP_037953     61  EIEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_04193166   61  KMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> NP_833611     61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_03018932   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ...
>
> We do not have a parser for that format, BTW, but it wouldn't be too hard to
> get something working quickly based on one of the current parsers. Probably
> could go AlignIO or SearchIO (or both).
>
> chris
>
> On Jul 23, 2009, at 2:38 PM, Robert Buels wrote:
>
>> Wow, that silence is deafening. I can't believe somebody who knows what
>> they're talking about hasn't written you back yet.
>>
>> Perhaps you could do some kind of transformation where you read in the
>> BLAST report with Bio::SearchIO, and then write to MSF with
>> Bio::AlignIO::msf? You would probably need to do some fiddling to create
>> the proper objects and relationships that Bio::AlignIO::msf would want.
>>
>> But this reply probably isn't helpful, because you probably already knew
>> that much. I'm mostly just trying to add to this thread so that people who
>> actually know a lot about BioPerl's functions in this area will see it and
>> hopefully be of more help.
>>
>> Rob
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>>
>>
>> Thomas Keller wrote:
>>>
>>> Greetings,
>>> Blast 2.2.21 has a multi-sequence alignment feature that is really handy:
>>> put in the accession number of the refseq in one sequence field and a
>>> concatenated fasta file of the Sanger reads to align in the second box and
>>> it does the alignments. Unfortunately, the output is a series of alignments
>>> rather than the more useful msf format with all reads aligned with the
>>> reference.
>>> Is there a bioperl module that reads the blast alignments and converts it
>>> to an msf alignment?
>>> thanks,
>>> Tom
>>> kellert at ohsu.edu
>>> 503-494-2442
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l