[Bioperl-l] Few questions on whole chromosome & GFF

Jason Stajich jason at cgt.duhs.duke.edu
Thu Jul 8 22:26:49 EDT 2004


On Thu, 8 Jul 2004 gtg974p at mail.gatech.edu wrote:

> Hi,
> I am trying to find the positions of a particular pattern in the whole
> chromosome and write the o/p in the GFF format. I have few questions (I am a
> newbie, sorry if the questions are silly)
>
> 1) How do I split the whole chromosome into smaller chunks and then join it
> again?
>
Read the Seq HOWTO and see scripts/seq/split_seq.PLS.
You will probably want to do a good job naming the pieces so they can be
put back together.
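
A minimal sketch of the naming idea in plain Perl (this is not what
split_seq.PLS does internally). The helper names split_chunks, join_chunks
and start_of are hypothetical, and the "<id>_<start>-<end>" convention with
1-based inclusive coordinates is an assumption:

```perl
use strict;
use warnings;

# Split a sequence string into fixed-size chunks. Each chunk is named
# "<id>_<start>-<end>" using 1-based inclusive coordinates (assumed convention).
sub split_chunks {
    my ($id, $seq, $size) = @_;
    my @chunks;
    for (my $pos = 0; $pos < length($seq); $pos += $size) {
        my $piece = substr($seq, $pos, $size);
        my $start = $pos + 1;
        my $end   = $pos + length($piece);
        push @chunks, { name => "${id}_${start}-${end}", seq => $piece };
    }
    return @chunks;
}

# Pull the start coordinate back out of a chunk name.
sub start_of {
    my ($chunk) = @_;
    my ($start) = $chunk->{name} =~ /_(\d+)-\d+$/
        or die "cannot parse offset from name '$chunk->{name}'";
    return $start;
}

# Rejoin chunks in chromosome order, whatever order they arrive in.
sub join_chunks {
    my @chunks  = @_;
    my @ordered = sort { start_of($a) <=> start_of($b) } @chunks;
    return join '', map { $_->{seq} } @ordered;
}
```

Because the start coordinate travels with the name, the pieces can be
processed in any order and still be reassembled.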
> 2) If I split them into say 20 different files
> o/p of file (GFF)
>
> seqID source Feature Start End Score Strand
>
> There will be 20 different GFF files with each start & end corresponding to its
> own (small chunks) file, how do I make it correspond to the whole chromosome
> coordinates?

math + perl code I hope...

Encode the offset in the name of the sequence, e.g. seq_1000-2000, then use a
regexp to remap your numbers back into the original coordinate space.
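
Something along these lines, as a rough sketch only. It assumes
tab-delimited GFF, a seqID of the form "<id>_<start>-<end>", and that
<start> is the chunk's 1-based position on the chromosome; remap_gff_line
is a hypothetical helper name:

```perl
use strict;
use warnings;

# Shift one tab-delimited GFF line from chunk coordinates to chromosome
# coordinates, using the offset encoded in the seqID (assumed naming scheme).
sub remap_gff_line {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line;
    my ($id, $chunk_start) = $col[0] =~ /^(.+)_(\d+)-\d+$/
        or die "no offset in seqID '$col[0]'";
    my $offset = $chunk_start - 1;   # a chunk starting at 1 needs no shift
    $col[0]  = $id;                  # restore the chromosome name
    $col[3] += $offset;              # start column
    $col[4] += $offset;              # end column
    return join "\t", @col;
}
```

With that convention, a feature at 5..10 on seq_1000-2000 comes back as
1004..1009 on seq. If your chunk names store a 0-based offset instead, drop
the "- 1".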

>
> 3) While writing the GFF file how can I specify that I want to sort it by
> Score, by default its sorting by start?
>
By default it sorts nothing.  I don't know where you are getting your GFF
from though.  Read in all the features and sort by score:

my @sorted_features = sort { $a->score <=> $b->score } @features;
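
That sort puts the lowest score first; swap $a and $b to get the highest
first. If you are working from the GFF text itself rather than feature
objects, a plain-Perl sketch might look like this. It assumes tab-delimited
lines with the score in the 6th column and "." for a missing score;
sort_gff_by_score is a hypothetical helper name:

```perl
use strict;
use warnings;

# Sort tab-delimited GFF lines by the score column (6th field), highest first.
# Lines whose score is "." (no score) are placed at the end.
sub sort_gff_by_score {
    my @lines = @_;
    # Pair each line with its numeric score, or undef when there is none.
    my @keyed = map {
        my $score = (split /\t/, $_)[5];
        [ $_, (defined($score) && $score ne '.') ? $score : undef ];
    } @lines;
    my @sorted = sort {
          (defined($a->[1]) && defined($b->[1])) ? $b->[1] <=> $a->[1]
        : defined($a->[1])                       ? -1
        : defined($b->[1])                       ?  1
        :                                           0
    } @keyed;
    return map { $_->[0] } @sorted;
}
```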


-jason



> Please help me,
> Thanks,
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

