[Bioperl-l] Intron and exon information

Jason Stajich jason@cgt.mc.duke.edu
Fri, 14 Jun 2002 11:54:00 -0400 (EDT)


We did something similar for the Bio::Tools::Sim4 and Genscan parsers, so
see those as examples.

But the sim4 tools don't connect to the
Bio::SeqFeature::Gene:: objects just yet.  I am actually hoping to work on
those objects some more this summer to deal with mapping and to make them a
little richer in functionality.

That said, a GCG gene structure parser would be helpful.  Methinks a
reasonable namespace needs to be established here at some point, but let's
see what is really on the table first.

On Fri, 14 Jun 2002, David Block wrote:

> A GCG->GeneStructure parser would be great - I'm sure it would find a use somewhere :)
>
> If you could take your input files as a model and extend the parser:
>
>  - read the case of the sequence (regex)
>  - deduce exon/intron from the case
>  - create a parent GeneStructure::Gene object, and then add the exons
>    by their start/stop coordinates to the Gene
>  - after the exons are in, you can just ask for the introns and get them as a list,
>    IIRC.
>  - Once this is done, submit the code to the list; we'll take a look at it and patch
>    it into the head of CVS, wherever it belongs (right, Jason?)
>
> Good luck - let us know how it's going...
>
> --
> David Block                    dblock@gnf.org
> GNF - San Diego, CA        http://www.gnf.org
> Genome Informatics  /  Enterprise Programming
>
> > -----Original Message-----
> > From: Lars G. T. Jorgensen [mailto:larsj@diku.dk]
> > Sent: Friday, June 14, 2002 8:14 AM
> > To: bioperl-l@bioperl.org
> > Subject: Re: [Bioperl-l] Intron and exon information
> >
> >
> > "David Block" <dblock@gnf.org> writes:
> >
> > > This is kind of in my area - although I've been stuck in Java-land
> > > for a while.
> > >
> > > These 'files' of yours - what format are they in - just FASTA?
> >
> > The data files are outputs from the GCG suite. The Seq object accepts
> > the sequences, so that's fine. But they use casing to represent
> > exons/introns, and SeqIO::gcg throws this information away.
> >
> > So I was thinking about patching the parser, but I don't know if that
> > is against the Design to let the SeqIO add features to a Seq object.
> >
> > But I think GCG can do FASTA output. Does this contain information
> > about introns?
> >
> > BTW, is there a printable class diagram somewhere? We don't have an A0
> > printer here...
> >
> > >
> > > The alignments are done, right?  So what you need to do is figure out
> > > where the introns are, and then deduce the phase?
> > >
> > > You're going to have to create a Bio::SeqFeature::GeneStructure::Gene
> > > object, and use its intron capabilities.  Take a look at the perldoc for
> > > that module, see if you can shoehorn your data into there, and then I
> > > think Hilmar's excellent work will give you the intron data.
> > >
> > > Let us know if this helps.
> >
> > --
> > Mvh|Regards, Lars
> > System administrator     | Student
> > Bioinformatics Centre    | Department of Computer Science
> > University of Copenhagen | University of Copenhagen
> > http://www.binf.ku.dk    | http://www.diku.dk
> > When's the last time you used duct tape on a duct? -- Larry Wall
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
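
The steps David lists above (read the case of the sequence, deduce the
exons from it, then derive the introns from the gaps between exons) can be
sketched roughly as follows. This is plain Python rather than BioPerl, so no
BioPerl API is assumed. It also assumes that uppercase marks exons and
lowercase marks introns, which the thread does not actually state, and that
the sequence has already been stripped of GCG line numbers and whitespace.
The function names are hypothetical.

```python
import re


def exons_from_case(seq, exon_pattern=r"[A-Z]+"):
    """Return 1-based inclusive (start, end) pairs for each exon run.

    Assumption: uppercase runs are exons. Pass exon_pattern=r"[a-z]+"
    if the files use the opposite convention.
    """
    return [(m.start() + 1, m.end()) for m in re.finditer(exon_pattern, seq)]


def introns_between(exons):
    """Return the gaps between consecutive exons as 1-based inclusive pairs."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1  # skip exons that directly abut
    ]


if __name__ == "__main__":
    seq = "ACGTacgtacGGCCTTaaGG"  # made-up example sequence
    exons = exons_from_case(seq)
    print("exons:  ", exons)
    print("introns:", introns_between(exons))
```

The resulting start/end pairs are what would then be handed to exon
features on a gene structure object; how those objects are built is left to
the perldoc for the Bio::SeqFeature::Gene:: modules.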

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu