[Biopython] fasta-m10 al_start and al_end?
Anne Pajon
ap12 at sanger.ac.uk
Mon Nov 9 15:29:20 UTC 2009
Hi Peter,
Thanks for adding these private variables. They are called _al_start
and _al_stop.
While testing the code today, I found a little bug. For the match
record:
    alignment.add_sequence(match_descr, match_align_seq)
    record = alignment.get_all_seqs()[-1]
    assert record.id == match_descr or record.description == match_descr
    #assert record.seq.tostring() == match_align_seq
    record.id = match_descr.split(None,1)[0].strip(",")
    record.name = "match"
    record.annotations["original_length"] = int(match_annotation["sq_len"])
    #TODO - handle start/end coordinates properly. Short term hack for now:
    record._al_start = int(query_annotation["al_start"])
    record._al_stop = int(query_annotation["al_stop"])
the al_start and al_stop should be taken from match_annotation instead
of query_annotation, I think.
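
A minimal sketch of the change I have in mind, in case it helps. It is not the actual FastaIO code: set_alignment_coords is a hypothetical helper, the SimpleNamespace objects are stand-ins for the SeqRecord objects, and the coordinate values are made up. The only point is that each record takes its coordinates from its own annotation dictionary.

```python
from types import SimpleNamespace


def set_alignment_coords(record, annotation):
    """Copy the al_start/al_stop fields of one annotation dict onto a record."""
    # Private attribute names as used in the FastaIO snippet above.
    record._al_start = int(annotation["al_start"])
    record._al_stop = int(annotation["al_stop"])
    return record


# Made-up values for illustration only; the real parser fills these
# dictionaries from the FASTA -m 10 output.
query_annotation = {"al_start": "1", "al_stop": "120"}
match_annotation = {"al_start": "35", "al_stop": "154"}

# Stand-ins for the SeqRecord objects (hypothetical, not Biopython classes).
query_record = set_alignment_coords(SimpleNamespace(), query_annotation)
match_record = set_alignment_coords(SimpleNamespace(), match_annotation)

print(match_record._al_start, match_record._al_stop)
```

With the current code the match record would end up with the query coordinates instead.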
Kind regards,
Anne.
On 26 Oct 2009, at 14:17, Peter wrote:
> On Mon, Oct 26, 2009 at 10:04 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Fri, Oct 23, 2009 at 11:00 PM, Anne Pajon <ap12 at sanger.ac.uk> wrote:
>>>
>>> Hi Peter,
>>>
>>> Thanks for your fast answer.
>>>
>>> I've already discovered the _annotations and I am prepared to update
>>> my code as soon as a better solution is provided.
>>
>> Good.
>>
>>> Concerning the al_start and al_end, I am looking for a solution very
>>> soon, as I am working on an annotation pipeline prototype in Python.
>>> What would be your recommendation? Writing a parser myself, using
>>> another tool (but which one?), or helping to store this information
>>> in SeqRecord in Biopython, as it is almost there? Please let me know.
>>
>> I would rather not add them directly to the SeqRecord annotations
>> dictionary because that will make doing something meaningful with
>> slicing (the SeqRecord, or in future the Alignment) much harder. I
>> think the best way to handle these is in the Alignment object, but
>> this isn't really supported at the moment.
>>
>> Are you happy to run a development version of Biopython, or at least
>> to update the file Bio/AlignIO/FastaIO.py? I'm thinking in the short
>> term we can record these bits of information as private properties of
>> the SeqRecord, i.e. _al_start and _al_end
>
> Make that _al_start and _al_stop (to match the field names used in
> the FASTA output). This change is in the repository now, which you
> can grab via github. See http://www.biopython.org/wiki/SourceCode
>
> As with any "private" variables (leading underscore), they are not
> really intended for public use, but should at least solve your
> immediate requirement for now.
>
> Peter
--
Dr Anne Pajon - Pathogen Genomics
Sanger Institute, Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SA, United Kingdom
+44 (0)1223 494 798 (office) | +44 (0)7958 511 353 (mobile)