[Biopython-dev] Proposal of a patch for FastaIO

Wibowo Arindrarto w.arindrarto at gmail.com
Thu Jun 21 11:38:22 UTC 2012


Hi Peter,

Thanks for the mention! Fasta is indeed the next format that I will be
working to support. It should be usable sometime next week.

Hi Roberto,

As Peter mentioned, I am now working on adding Fasta support to
SearchIO. It is similar in some ways to BioPerl's SearchIO, but there
are also differences, some of which I have discussed on my
development blog (http://bow.web.id/blog/tag/gsoc/).

Unlike AlignIO, which focuses on the alignments themselves, SearchIO will
try to parse all the other useful information in the file (including,
of course, the coordinates) and make it easily accessible.

If you are interested, my development branch is here:
https://github.com/bow/biopython/tree/searchio. More complete Fasta
parsing support should be available next week, and I would really
appreciate it if you could try it out then and let me know your
thoughts :).

Hope that helps,
Bow


On Thu, Jun 21, 2012 at 11:48 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Jun 21, 2012 at 7:30 AM, Roberto Mosca
> <roberto.mosca at irbbarcelona.org> wrote:
>> Dear Biopython developers,
>>
>> I am new to this mailing list. I would like to propose a patch for
>> the parser of the fasta-m10 alignment format, but I do not really know
>> how to do it...
>>
>> The issue is that from a Python script I need to access the details
>> (start and end residues) of the alignment in every sequence, and also
>> the E-value (sw_expect). These are stored in private variables (_al_start,
>> _al_stop and _annotations["sw_expect"]) rather than being accessible from
>> the public interface of the SeqRecord and Alignment classes.
>>
>> For this reason I had to modify the Bio/AlignIO/FastaIO.py file of my
>> local copy of BioPython.
>>
>> Since I feel that other people could also benefit from these changes, I
>> would like to propose including them in the standard distribution, but
>> I do not know the right procedure to follow. Could you help me?
>>
>> I have attached the patch to FastaIO.py.
>>
>> The code introduces two keys in the public "annotations" member of
>> SeqRecord ("start" and "end") and one key ("sw_expect") in the public
>> "annotations" member of Alignment.
>>
>> Can someone give me any feedback on this?
>>
>> I also have a github account (rmosca)...
>>
>> Thank you in advance, and thanks for this wonderful library,
>> Biopython!
>>
>> Roberto
>
> Hi Roberto,
>
> Apologies if my quick reply to your pull request was too curt
> (I was checking emails over breakfast again):
> https://github.com/biopython/biopython/pull/51
>
> These attributes were deliberately stored under private variables
> (_al_start, _al_stop and _annotations["sw_expect"]) so that you
> can use them in the short term, but this was never intended as a
> long-term solution; see also:
> http://lists.open-bio.org/pipermail/biopython/2009-October/005760.html
>
> It has taken longer than I expected, but that work is now happening
> in Bow's Google Summer of Code project:
> http://biopython.org/wiki/SearchIO
>
> This will probably result in deprecating the current Bio.AlignIO.FastaIO
> module (but you'd still be able to use the "fasta-m10" format with the
> main Bio.AlignIO.parse() function).
>
> So, for the medium term, please just use the private variables,
> and help out with testing or other feedback on Bow's GSoC work.
> As it happens, Bill Pearson's FASTA -m10 output is (I think) next
> on the list...
>
> More generally, I'd like to do something more organised and consistent
> for start/end coordinates over other file formats - i.e. in the SeqRecord:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-May/009646.html
>
> Regards,
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
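For readers following this thread, here is a minimal sketch of the medium-term workaround Peter describes: reading the private fasta-m10 values and copying them into a plain dict that uses the "start", "end" and "sw_expect" keys from Roberto's proposal. It assumes the Biopython version in use stores coordinates on each record as _al_start/_al_stop and the alignment-level tags in alignment._annotations, as described in the thread; these are private attributes and may change. The helper name public_m10_annotations is hypothetical, and this is not the code of the actual patch.

```python
def public_m10_annotations(alignment):
    """Collect private fasta-m10 values into a plain dict.

    Hypothetical helper. Assumes each record may carry the private
    _al_start/_al_stop attributes and the alignment may carry a private
    _annotations dict with an "sw_expect" entry, as set by the
    fasta-m10 parser discussed in this thread.
    """
    # Alignment-level tags; fall back to an empty dict if absent.
    private = getattr(alignment, "_annotations", {})

    records = []
    for record in alignment:
        records.append({
            "id": record.id,
            # Private coordinates; None when the parser did not set them.
            "start": getattr(record, "_al_start", None),
            "end": getattr(record, "_al_stop", None),
        })

    # The E-value tag may be missing (e.g. a different search algorithm).
    sw_expect = private.get("sw_expect")
    if sw_expect is not None:
        # float() accepts both the raw text value and a number.
        sw_expect = float(sw_expect)

    return {"records": records, "sw_expect": sw_expect}
```

Under the same assumptions, you would call it once per alignment, for example with from Bio import AlignIO and then for alignment in AlignIO.parse(handle, "fasta-m10"): print(public_m10_annotations(alignment)). Using getattr() with defaults keeps the helper from crashing if a future release drops the private attributes.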


