[Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36

redmine at redmine.open-bio.org
Thu Nov 10 11:34:39 UTC 2011


Issue #3312 has been updated by Peter Cock.


Looking at this, I believe the problem is in lalign36 itself rather than in Biopython. At the end of the first batch of alignments (for query one, AT1G01040.1) we have this odd line:

<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
</pre>

At the end of the second (and final) batch of alignments (for query two, AT5G04140.2) we have these odd lines:

<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
</pre>

Curious. It seems LALIGN starts to write out another alignment (the ">>" header line), but then never writes the alignment itself.
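
As a rough way to spot these lines in an output file, here is a minimal sketch. It assumes that every genuine ">>" match header in the -m 10 output is followed by at least one "; tag: value" line, which is my reading of the files here rather than anything taken from the FASTA documentation. The function name is made up for illustration and is not part of Biopython.

```python
def find_orphan_match_headers(lines):
    """Return (line_number, text) for each '>>' header not followed by a ';' line.

    Assumption: in -m 10 output a real match header is always followed by
    '; tag: value' annotation lines. Query headers ('>>>') are ignored.
    """
    orphans = []
    pending = None  # most recent '>>' header still waiting for its first ';' line
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank lines do not settle anything
        if pending is not None:
            if not line.startswith(";"):
                orphans.append(pending)
            pending = None
        if line.startswith(">>") and not line.startswith(">>>"):
            pending = (number, line)
    if pending is not None:
        # A header at the very end of the file has no data either.
        orphans.append(pending)
    return orphans
```

Calling this on the lines of test.aln (for example, on an open file handle) should list the repeated LOC_Os07g46460.1 and LOC_Os03g02970.1 headers if the assumption above holds.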

Thank you for including the input files as well; that let me run this with the version of lalign36 I have installed (version 36.3.4 Apr, 2011). Here the output is a bit different but shows similar odd lines.

I have updated Biopython to give a more helpful error message in this case:
https://github.com/biopython/biopython/commit/1a99454d358fab41771551e8f3a475a90f240b25

<pre>
>>> from Bio import AlignIO
>>> for a in AlignIO.parse("test.aln", "fasta-m10"):
...     print a
...
SingleLetterAlphabet() alignment with 2 rows and 130 columns
AAAAAAAGAGAGAAATATTACTACAAAACAGAAGCAAGCAAGTG...ATC AT1G01040.1
AGAGAGAGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCA-GAG...ATC LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 81 columns
AAACAGAAGCAAGC--AAGTGGAA-AACAGACCAGAAGAGAGAG...CGA AT1G01040.1
AGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCAGAGGAAAGAG...TGA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 264 columns
AAGATTTCGATTTCG-ATATAAATACTTAAT---CTTT-ATAAA...TTA AT1G01040.1
AATATATCTATTTCTTAAACAAATCATTATTTTCCTTTCATAAA...CTA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 428 columns
ATTTTTATTTTTATTTT-TATGGGAAAGAAGTTGCACGAGTCGG...TTT AT1G01040.1
ATCATTATTTTCCTTTCATAAAAAAATGAATT---ATGAGGCGG...TTT LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 145 columns
AACTCACTCAAGAAAACCAAATCCCCAGAGA-AGAAA-ACAGAA...AAC AT1G01040.1
ATCTCAATCGAGAGAGCGAGCACACGAGAGAGAGAGAGAGGGAA...ATC LOC_Os03g02970.1
Traceback (most recent call last):
...
ValueError: No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'
</pre>
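
If you want to keep the alignments that were parsed before the error, something along these lines might do as a stopgap. It is only a sketch: the helper name is hypothetical, and it assumes the parser signals the problem with a ValueError as in the session above.

```python
def alignments_before_error(alignments):
    """Collect items from an iterator, stopping at the first ValueError.

    Returns (collected_items, error), where error is None if the iterator
    finished cleanly. This helper is illustrative, not a Biopython function.
    """
    collected = []
    error = None
    try:
        for alignment in alignments:
            collected.append(alignment)
    except ValueError as err:
        # Keep what was parsed so far and hand the error back to the caller.
        error = err
    return collected, error
```

For the file in this report, the call would be alignments_before_error(AlignIO.parse("test.aln", "fasta-m10")), which should return the alignments shown above together with the ValueError.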

Are you on Bill Pearson's FASTA mailing list? We should report this.

Peter

----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312

Author: gahoo lee
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


When I parse an alignment created by lalign, which is included in FASTA36, I get errors. Each FASTA file currently contains two sequences; with only one sequence in each file there is no error. Here are the code and the error.

@lalign36 -m 10 at.fasta os.fasta >test.aln@

@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@

@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@

