[Biopython] fasta-m10 al_start and al_end?

Mon Oct 26 15:44:23 UTC 2009

>
> On Fri, Oct 23, 2009 at 4:57 PM, Anne Pajon <ap12 at sanger.ac.uk> wrote:
> > Dear,
> >
> > I am using Biopython to parse a fasta alignment file:
> >
> ...
> >
> > I would like to print the start/end of each aligned sequence.
> >
> > I can see in Bio.AlignIO.FastaIO.next() that sq_len is stored in
> > annotations:
> >     record.annotations["original_length"] = int(query_annotation["sq_len"])
> > but I cannot find a way of accessing al_start and al_end.
> >
> > Thanks in advance for your help.
> > Kind regards,
> > Anne.
>
> Hi Anne,
>
> That's a good question, but the answer may be a little
> disappointing.
>
> That information isn't currently recorded in the SeqRecord,
> partly because at the time I didn't need it, but mainly I was
> undecided about whether the start location should be converted
> into Python counting or not (zero-based versus one-based).
> What would you prefer? My inclination is python counting.
>
> Peter
>
> P.S. Most of the alignment level annotation is recorded,
> but is currently hidden in a "private" property (leading
> underscore). You can access this, but be warned that it
> will change in the future; improving the alignment object is
> something I am working on for a future release.
>
>
Hi Peter,

Here's +1 for Python counting. That would match SeqFeature and the
ProteinDomain class in Bio.Tree.PhyloXML.
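
To make concrete what Python counting would mean here, a minimal sketch,
assuming al_start and al_end are one-based, inclusive positions on the
forward strand. to_python_coords is a hypothetical helper for illustration,
not something that exists in Biopython:

```python
def to_python_coords(al_start, al_end):
    """Convert one-based inclusive coordinates to zero-based half-open ones.

    Assumption: al_start <= al_end (forward strand only). The helper name
    is hypothetical and not part of Biopython.
    """
    if al_start < 1 or al_end < al_start:
        raise ValueError("expected 1 <= al_start <= al_end")
    # Only the start shifts by one; the one-based inclusive end is already
    # equal to the zero-based exclusive end used by Python slicing.
    return al_start - 1, al_end


# Made-up example sequence, not taken from any real FASTA output.
seq = "ACGTACGTAC"
start, end = to_python_coords(3, 6)
aligned_region = seq[start:end]  # bases 3..6 in one-based terms
```

With that convention the values can be used directly in a slice, and
end - start gives the length of the aligned region.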

While we're on this topic -- I have some unpublished code for rendering an
alignment object in HTML, with plans for colorization, conservation
profiles, etc. I rolled my own alignment class since the one in
Bio.Align.Generic didn't have the attributes (start, end, selected columns)
for a particular file format I was parsing. It's not urgent, but at some
point could you publish your plans for the Alignment classes so I (and
probably others) can stay/become compatible?

Thanks,
Eric