[Biopython] How to construct a SeqRecord with the info in the SeqFeatures type mRNA or CDS?

Peter biopython at maubp.freeserve.co.uk
Thu Oct 15 21:48:20 UTC 2009


On Thu, Oct 15, 2009 at 10:35 PM, Peter wrote:
> On Thu, Oct 15, 2009 at 10:18 PM, Carlos Javier Borroto wrote:
>> Hi,
>>
>> I want to construct a SeqRecord whose sequence is made from the sum of
>> the Locations of the SubFeatures I get from a SeqFeature of type mRNA or
>> CDS. Does Biopython already have something to do this? It looks like
>> something many people may want, but it is proving to be kind of difficult
>> to implement manually, so I'm wondering if it is already there.
>
> There isn't anything built in now, partly because doing it properly
> means coping with a lot of possible fuzzy locations and joins.
> I can go into more detail, but it would help to know what kind
> of organisms you are working with. For prokaryotes and viruses,
> CDS locations are (usually) trivial, so you just need the start, end
> and strand.

There is a partly tested function called get_feature_nuc in the
unit test file test_SeqIO_features.py, which takes a SeqFeature
and the parent Seq object. In fact, looking at it now, some of
the comments look out of date (I think I fixed the GenBank
parser to cope with mixed strand features ...). This might do
what you want, but as I said, it needs more testing.
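
To give a rough idea of what such a function has to do, here is a
minimal sketch in plain Python. It is not the get_feature_nuc code
from the test file; the names extract_joined and reverse_complement
are made up for illustration. It assumes exact (non-fuzzy) locations,
0-based half-open coordinates, and that the parts are already listed
in the order they should be joined.

```python
# Hypothetical helper for illustration only; this is NOT Biopython's
# get_feature_nuc. It works on plain strings and assumes exact
# (non-fuzzy) 0-based, half-open coordinates.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_joined(parent_seq, parts):
    """Join the sub-locations of a feature into one sequence.

    parts is a list of (start, end, strand) tuples, given in the
    order they should be concatenated (assumption: the caller has
    already sorted them into transcript order). strand is +1 or -1.
    """
    pieces = []
    for start, end, strand in parts:
        piece = parent_seq[start:end]
        if strand == -1:
            # Reverse complement each part individually, which also
            # copes with mixed strand features.
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


parent = "AAACCCGGGTTT"
forward = extract_joined(parent, [(0, 3, 1), (6, 9, 1)])
reverse = extract_joined(parent, [(6, 9, -1), (0, 3, -1)])
```

Note that for a reverse strand join the parts have to be passed in
reverse order, as in the second call. Fuzzy positions are ignored
entirely here, and those are exactly the cases that need the extra
testing.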

It had crossed my mind (as you can tell from the comments)
that this could be added to Biopython proper at some point.
One idea was as a method of the SeqRecord object, which
would take a SeqFeature (or just the integer index of the
desired feature in the SeqRecord's list of features).
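
As a sketch of how that method idea might look, here is a toy
example. Record, Feature and feature_sequence are stand-in names
invented for this illustration, not Biopython classes or an agreed
API, and only the trivial case (start, end and strand, no joins or
fuzzy positions) is handled.

```python
from dataclasses import dataclass, field

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class Feature:
    """Toy stand-in for a SeqFeature with a simple, exact location."""
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    strand: int = 1     # +1 or -1
    type: str = "CDS"


@dataclass
class Record:
    """Toy stand-in for a SeqRecord holding a plain string sequence."""
    seq: str
    features: list = field(default_factory=list)

    def feature_sequence(self, feature):
        """Return the nucleotides for a feature.

        feature may be a Feature object or the integer index of the
        desired feature in this record's list of features.
        """
        if isinstance(feature, int):
            feature = self.features[feature]
        sub = self.seq[feature.start:feature.end]
        if feature.strand == -1:
            # Reverse complement for features on the reverse strand.
            sub = sub.translate(_COMPLEMENT)[::-1]
        return sub


record = Record("ATGCCCTAA", [Feature(0, 3), Feature(3, 9, strand=-1)])
by_index = record.feature_sequence(0)
by_object = record.feature_sequence(record.features[1])
```

Accepting either the feature itself or its index keeps the call
short when you are looping over record.features with enumerate.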

Peter
