[Biopython] fasta-m10 al_start and al_end?

Mon Nov 9 15:29:20 UTC 2009

Hi Peter,

Thanks for adding these private variables. They are called _al_start
and _al_stop.

While testing the code today, I found a small bug in how the match
record is handled:

         alignment.add_sequence(match_descr, match_align_seq)
         record = alignment.get_all_seqs()[-1]
         assert record.id == match_descr or record.description == match_descr
         #assert record.seq.tostring() == match_align_seq
         record.id = match_descr.split(None,1)[0].strip(",")
         record.name = "match"
         record.annotations["original_length"] = int(match_annotation["sq_len"])
         #TODO - handle start/end coordinates properly. Short term hack for now:
         record._al_start = int(query_annotation["al_start"])
         record._al_stop = int(query_annotation["al_stop"])

Since this is the match record, I think al_start and al_stop should be
taken from match_annotation instead of query_annotation.
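
To illustrate what I mean, here is a minimal sketch. The Record class, the
set_alignment_coords helper and the coordinate values are made up for the
example; they are not the actual FastaIO code, only the dictionary keys and
attribute names are taken from it:

```python
class Record:
    """Hypothetical placeholder standing in for a SeqRecord."""


# Made-up values; in FastaIO.py these dictionaries are filled in while
# parsing the query and match blocks of the FASTA -m 10 output.
query_annotation = {"al_start": "1", "al_stop": "50"}
match_annotation = {"al_start": "101", "al_stop": "150"}


def set_alignment_coords(record, annotation):
    """Hypothetical helper: copy al_start/al_stop from one annotation dict."""
    record._al_start = int(annotation["al_start"])
    record._al_stop = int(annotation["al_stop"])
    return record


# Each record takes its coordinates from its own annotation dictionary.
query = set_alignment_coords(Record(), query_annotation)
match = set_alignment_coords(Record(), match_annotation)

print(query._al_start, query._al_stop)
print(match._al_start, match._al_stop)
```

With the current code, the match record would end up with the query's
coordinates.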

Kind regards,
Anne.

On 26 Oct 2009, at 14:17, Peter wrote:

> On Mon, Oct 26, 2009 at 10:04 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> On Fri, Oct 23, 2009 at 11:00 PM, Anne Pajon <ap12 at sanger.ac.uk> wrote:
>>>
>>> Hi Peter,
>>>
>>> Thanks for your fast answer.
>>>
>>> I've already discovered the _annotations and I am prepared to update my
>>> code as soon as a better solution is provided.
>>
>> Good.
>>
>>> Concerning the al_start and al_end, I am looking for a solution very soon,
>>> as I am working on an annotation pipeline prototype in Python. What would be
>>> your recommendation? Writing a parser myself, using another tool (but which
>>> one?), or helping to store this information in the SeqRecord in Biopython,
>>> as it is almost there? Please let me know.
>>
>> I would rather not add them directly to the SeqRecord annotations
>> dictionary because that will make doing something meaningful with
>> slicing (the SeqRecord, or in future the Alignment) much harder. I
>> think the best way to handle these is in the Alignment object, but
>> this isn't really supported at the moment.
>>
>> Are you happy to run a development version of Biopython, or at least
>> to update the file Bio/AlignIO/FastaIO.py? I'm thinking in the short
>> term we can record these bits of information as private properties of
>> the SeqRecord, i.e. _al_start and _al_end
>
> Make that _al_start and _al_stop (to match the field names used in
> the FASTA output). This change is in the repository now, which you
> can grab via github.  See http://www.biopython.org/wiki/SourceCode
>
> As with any "private" variables (leading underscore), they are not
> really intended for public use, but should at least solve your
> immediate requirement for now.
>
> Peter

--
Dr Anne Pajon - Pathogen Genomics
Sanger Institute, Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SA, United Kingdom
+44 (0)1223 494 798 (office) | +44 (0)7958 511 353 (mobile)
