[Bioperl-l] Intron and exon information

David Block dblock@gnf.org
Fri, 14 Jun 2002 08:29:41 -0700


A GCG->GeneStructure parser would be great - I'm sure it would find a use somewhere :)

If you could take your input files as a model and extend the parser:

 - read the case of the sequence (regex)
 - deduce exon/intron from the case
 - create a parent GeneStructure::Gene object, and then add the exons
   by their start/stop coordinates to the Gene
 - after the exons are in, you can just ask for the introns and get them as a list,
   IIRC.
 - Once this is done, submit the code to the list; we'll take a look at it and patch
   it into the head of CVS, wherever it belongs (right, Jason?)
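
The case-to-exon/intron steps above can be sketched roughly as follows. This is a minimal, language-neutral sketch in plain Python, not BioPerl code and not the GeneStructure API. It makes several assumptions. First, uppercase letters mark exon bases and lowercase letters mark intron bases; check this against your own GCG output. Second, the sequence string has already been extracted (for example from a Seq object) and contains only letters. Third, coordinates are 1-based and inclusive. All function names are hypothetical. The introns are derived as the gaps between consecutive exons, which mirrors the "add the exons, then ask for the introns" approach. The phase helper addresses the earlier question about deducing phase, and it further assumes that the first exon begins at the start codon and that every exon base is coding.

```python
import re

# Hypothetical convention (verify against your GCG files):
# uppercase = exon base, lowercase = intron base.
# All coordinates are 1-based and inclusive.

def case_runs(seq):
    """Split seq into maximal same-case runs: [(kind, start, end), ...]."""
    runs = []
    for m in re.finditer(r"[A-Z]+|[a-z]+", seq):
        kind = "exon" if m.group().isupper() else "intron"
        # m.start() is 0-based; m.end() is exclusive, which equals the 1-based inclusive end
        runs.append((kind, m.start() + 1, m.end()))
    return runs

def exons_from_case(seq):
    """Exon (start, end) pairs deduced from uppercase runs."""
    return [(start, end) for kind, start, end in case_runs(seq) if kind == "exon"]

def introns_from_exons(exons):
    """Introns as the gaps between consecutive exons (exons sorted by start)."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if next_start > prev_end + 1]

def exon_phases(exons):
    """GFF3-style phase per exon: bases to skip to reach the next codon start.
    Assumes the first exon starts at the start codon and all exon bases are coding."""
    phases, coding_so_far = [], 0
    for start, end in exons:
        phases.append((3 - coding_so_far % 3) % 3)
        coding_so_far += end - start + 1
    return phases

if __name__ == "__main__":
    seq = "ATGCCgtgagGTTAcagATGTAA"
    exons = exons_from_case(seq)
    print(exons)                      # [(1, 5), (11, 14), (18, 23)]
    print(introns_from_exons(exons))  # [(6, 10), (15, 17)]
    print(exon_phases(exons))         # [0, 1, 0]
```

In a real patch to the GCG parser, the exon coordinates would be handed to the gene structure object rather than kept in plain tuples, and that object would report the introns itself.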

Good luck - let us know how it's going...

--
David Block                    dblock@gnf.org
GNF - San Diego, CA        http://www.gnf.org     
Genome Informatics  /  Enterprise Programming

> -----Original Message-----
> From: Lars G. T. Jorgensen [mailto:larsj@diku.dk]
> Sent: Friday, June 14, 2002 8:14 AM
> To: bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Intron and exon information
> 
> 
> "David Block" <dblock@gnf.org> writes:
> 
> > This is kind of in my area, although I've been stuck in Java-land for a while.
> > 
> > These 'files' of yours - what format are they in - just FASTA?
> 
> The data files are output from the GCG suite. The Seq object accepts
> the sequences, so that's fine. But they use case to represent
> exons/introns, and SeqIO::gcg throws this information away. 
> 
> So I was thinking about patching the parser, but I don't know whether it
> goes against the design to let SeqIO add features to a Seq object. 
> 
> But I think GCG can produce FASTA output. Does that contain any information
> about introns?
> 
> BTW, is there a printable class diagram somewhere? We don't have an A0
> printer here...
> 
> > 
> > The alignments are done, right? So what you need to do is figure out
> > where the introns are, and then deduce the phase?
> > 
> > You're going to have to create a Bio::SeqFeature::GeneStructure::Gene
> > object and use its intron capabilities. Take a look at the perldoc for
> > that module, see if you can shoehorn your data into there, and then I
> > think Hilmar's excellent work will give you the intron data.
> > 
> > Let us know if this helps.
> 
> -- 
> Mvh|Regards, Lars
> System administrator     | Student 
> Bioinformatics Centre    | Department of Computer Science  
> University of Copenhagen | University of Copenhagen
> http://www.binf.ku.dk    | http://www.diku.dk
> When's the last time you used duct tape on a duct? -- Larry Wall
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>