[Biopython-dev] [Biopython - Bug #3312] Failing to parse fasta-m10 format generated by lalign36
redmine at redmine.open-bio.org
Thu Nov 10 11:34:39 UTC 2011
Issue #3312 has been updated by Peter Cock.
Looking at this, I believe the problem is in lalign36 itself rather than in Biopython. At the end of the first batch of alignments (for query one, AT1G01040.1) we have this odd line:
<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
</pre>
At the end of the second (and final) batch of alignments (for query two, AT5G04140.2) we have these odd lines:
<pre>
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os07g46460.1 1500 bp_Up Chr 07:27738635..27737133 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
>>LOC_Os03g02970.1 1500 bp_Up Chr 03:1205337..1203835 (reverse complemented)
</pre>
Curious. It seems LALIGN is starting to write out another alignment, but then doesn't.
It was very helpful that you included the input files as well, so I could run this with the version of lalign36 I have installed (version 36.3.4 Apr, 2011). There the output is a bit different, but it shows similar odd lines.
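As a rough way to spot such stray lines before handing a file to the parser, something like the sketch below might help. It rests on an assumption about the -m 10 layout that is not stated in this report: that a genuine @>>@ match header is followed by at least one @;@ annotation line before the next header. The function name @find_orphan_match_headers@ is made up for illustration and is not part of Biopython or FASTA.

```python
def find_orphan_match_headers(lines):
    """Return (line_number, text) for each '>>' header with no ';' tag line after it.

    Assumption (not taken from the FASTA documentation): a real match header
    in -m 10 output is followed by at least one ';' annotation line.
    """
    orphans = []
    pending = None  # the most recent '>>' header still waiting for a tag line
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # '>>>' marks a query block (or the '>>><<<' end marker), not a match.
        is_match_header = line.startswith(">>") and not line.startswith(">>>")
        if pending is not None:
            if line.startswith(";"):
                pending = None  # header has annotation data, so it looks normal
            elif line.strip():
                orphans.append(pending)  # something else came first: stray header
                pending = None
            # blank lines are skipped and leave the header pending
        if is_match_header:
            pending = (number, line)
    if pending is not None:
        orphans.append(pending)  # header at end of file with nothing after it
    return orphans
```

Calling it as @find_orphan_match_headers(open("test.aln"))@ should list the repeated @>>LOC_Os07g46460.1@ style lines if the assumption above holds for your lalign36 version.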
I have updated Biopython to give a more helpful error message in this case:
https://github.com/biopython/biopython/commit/1a99454d358fab41771551e8f3a475a90f240b25
<pre>
>>> from Bio import AlignIO
>>> for a in AlignIO.parse("test.aln", "fasta-m10"):
...     print a
...
SingleLetterAlphabet() alignment with 2 rows and 130 columns
AAAAAAAGAGAGAAATATTACTACAAAACAGAAGCAAGCAAGTG...ATC AT1G01040.1
AGAGAGAGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCA-GAG...ATC LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 81 columns
AAACAGAAGCAAGC--AAGTGGAA-AACAGACCAGAAGAGAGAG...CGA AT1G01040.1
AGAGAGAGGGAAGCGGAGGAGGGAGAAGAGATCAGAGGAAAGAG...TGA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 264 columns
AAGATTTCGATTTCG-ATATAAATACTTAAT---CTTT-ATAAA...TTA AT1G01040.1
AATATATCTATTTCTTAAACAAATCATTATTTTCCTTTCATAAA...CTA LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 428 columns
ATTTTTATTTTTATTTT-TATGGGAAAGAAGTTGCACGAGTCGG...TTT AT1G01040.1
ATCATTATTTTCCTTTCATAAAAAAATGAATT---ATGAGGCGG...TTT LOC_Os03g02970.1
SingleLetterAlphabet() alignment with 2 rows and 145 columns
AACTCACTCAAGAAAACCAAATCCCCAGAGA-AGAAA-ACAGAA...AAC AT1G01040.1
ATCTCAATCGAGAGAGCGAGCACACGAGAGAGAGAGAGAGGGAA...ATC LOC_Os03g02970.1
Traceback (most recent call last):
...
ValueError: No data for query 'AT1G01040.1', match 'LOC_Os07g46460.1'
</pre>
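If you only need the alignments that come before the broken section, a small wrapper along these lines could collect them and hand back the error instead of aborting. This is a sketch, not something Biopython provides: @collect_until_error@ is a hypothetical helper, and it simply drains any iterator until the chosen exception type is raised.

```python
def collect_until_error(iterator, error_type=ValueError):
    """Drain an iterator, returning (items, error).

    items holds everything yielded before the first exception of error_type;
    error is that exception, or None if the iterator finished normally.
    Hypothetical helper for illustration, not part of Biopython.
    """
    items = []
    error = None
    try:
        for item in iterator:
            items.append(item)
    except error_type as exc:
        error = exc
    return items, error
```

With Biopython installed, usage would be @alignments, error = collect_until_error(AlignIO.parse("test.aln", "fasta-m10"))@. With the updated code the parser raises ValueError as shown above; on the older version in the original report it raises AssertionError, so you would pass @error_type=AssertionError@ there.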
Are you on Bill Pearson's FASTA mailing list? We should report this.
Peter
----------------------------------------
Bug #3312: Failing to parse fasta-m10 format generated by lalign36
https://redmine.open-bio.org/issues/3312
Author: gahoo lee
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category:
Target version:
URL:
When I parse an alignment created by lalign, which is included in FASTA36, I get errors. Each FASTA file currently contains two sequences; with one sequence per file there is no error. Here are the code and the error.
@lalign36 -m 10 at.fasta os.fasta >test.aln@
@from Bio import AlignIO
handle = open('test.aln')
for a in AlignIO.parse(handle, "fasta-m10"):
    assert len(a) == 2, "Should be pairwise!"
    print "Alignment length %i" % a.get_alignment_length()
    for record in a:
        print record.seq, record.name, record.id
@
@Traceback (most recent call last):
  File "R:\Untitled 4.py", line 5, in <module>
    for a in AlignIO.parse(handle, "fasta-m10"):
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\__init__.py", line 371, in parse
    for a in i:
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 242, in FastaM10Iterator
    yield build_hsp()
  File "D:\Program Files\Python\lib\site-packages\Bio\AlignIO\FastaIO.py", line 106, in build_hsp
    assert query_tags, query_tags
AssertionError: {}@