[Bioperl-l] How to get sequence of joins

20 Oct 2000 11:44:38 +0100

>>>>> "Henrik" == Henrik Seidel <Henrik.Seidel@schering.de> writes:

    Henrik> Hi all, I could not figure out a simple way to get the
    Henrik> sequence of a CDS which looks like

    Henrik> CDS complement(join(1..10,20..30))

    Henrik> The sequence member of the top level CDS_span object would
    Henrik> be 1..30 and not the join. Do I have to cycle through all
    Henrik> the subfeatures of the top level CDS_span and concatenate
    Henrik> the sequences of all subfeatures which are a CDS_span as
    Henrik> well, or is there an easier way? 

I think that you have to do as you suggest here, i.e. take the
PrimarySeq of each subfeature, get the sequence string from that and
then concatenate those.

    Henrik> Even if I concatenate them, how do I know that I have to
    Henrik> take the complement? In the top level CDS_span I did not
    Henrik> find any field containing or any member function returning
    Henrik> an indication that this is supposed to be the
    Henrik> complement. Is there a function returning the correct
    Henrik> sequence (i.e., after joining and taking the complement)
    Henrik> of a top level CDS_span? 

There is none at the moment. You'll need to check the strand of each
subfeature for consistency (all on one strand or all on the other,
unless you are expecting some of those unusual features which occur on
neither or both), and then reverse the subfeature order for
reverse-strand joins before concatenating.
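
To make the procedure concrete, here is a minimal sketch in plain
Python rather than BioPerl. The function names and the
(start, end, strand) tuple layout are made up for illustration and are
not part of any library. It assumes 1-based inclusive coordinates, as
in the feature table, and spans listed in the order they appear inside
join(...). It slices the forward strand, concatenates, and then
reverse-complements the whole string, which should give the same result
as reversing the order of already reverse-complemented subfeature
sequences.

```python
# Hypothetical helper names; this is an illustration, not BioPerl API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_sequence(parent, spans):
    """Join sub-spans of `parent` and honour the strand.

    `spans` is a list of (start, end, strand) tuples with 1-based
    inclusive coordinates and strand +1 or -1, given in the order
    they appear inside join(...).
    """
    strands = {strand for _, _, strand in spans}
    # Consistency check: every span must sit on the same, known strand.
    if strands not in ({1}, {-1}):
        raise ValueError("spans must all be on strand +1 or all on -1")

    # Slice the forward strand (1-based inclusive -> 0-based slice).
    joined = "".join(parent[start - 1:end] for start, end, _ in spans)

    # complement(join(...)): reverse-complement the joined string.
    return reverse_complement(joined) if strands == {-1} else joined


# 30 bases: 10 x A, 9 x C, 11 x G
parent = "A" * 10 + "C" * 9 + "G" * 11

# complement(join(1..10,20..30))
cds = spliced_sequence(parent, [(1, 10, -1), (20, 30, -1)])
# cds is 11 x C followed by 10 x T
```

Mixed-strand input raises an error here; if you do expect features on
both strands you would have to decide what the joined sequence should
mean before relaxing that check.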

When/if I get round to merging my modules I intend to move these
functions across (unless someone else gets there first).

    Henrik> Or is there a function returning the complete original
    Henrik> entry as it was in the source file (i.e.,
    Henrik> "complement(join(1..10,20..30))")?

This I don't know.

cheers,

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA