[Bioperl-l] 0.7.2

Jason Stajich jason@chg.mc.duke.edu
Fri, 6 Jul 2001 12:59:40 -0400 (EDT)


On Fri, 6 Jul 2001, Vladimir Babenko wrote:

>    Thanks, but it didn't work. I wonder how could I make parser to use
> Bio::SeqFeature::Transcript instead of Bio::SeqFeature::Generic? While using
> Bio::Seq it always goes to Generic SeqFeature module.

Yep, that's intentional.  You are asking the parser to interpret the
feature tags, see that one is mRNA, and turn those features into
transcripts.  Some other people, Dave Block among them, are grappling with
the best way to do this.  It really means either adding a filter to the
parser modules so they know how to instantiate Transcripts, or building an
object that will take a Sequence object, look at all the features contained
within, and aggregate some of them into a Gene/transcript object.

Or you could write some code that loops through all of the features in a
seq after parsing a file and, whenever a feature's primary tag is mRNA,
builds a transcript from it and adds that new feature to the sequence
object.

I'm concerned about putting this type of magic directly in the parser
objects - would rather it be a post-processing procedure, but I can be
convinced otherwise.

The Gene/Transcript models are in Hilmar's court, I believe?

-jason
>       Bob
> 
> Jason Stajich wrote:
> 
> > would the following work - testing for a pattern rather than an exact
> > string?
> >
> > On Thu, 5 Jul 2001, Vladimir Babenko wrote:
> >
> > >    Hi, Jason,
> > > just changed the versions and in the course of parsing the EMBL - formatted
> > > file I found that there is no mRNA_span primary tag any more.
> > >
> > >             Here is a fragment of my code:
> > >   foreach my $feature ($seq->top_SeqFeatures()) {
> > >     if ($feature->primary_tag eq 'mRNA_span') {
> > if( $feature->primary_tag =~ /mrna/i ) {
> > >       push(@transcript_spans, $feature);
> > >       push(@transcript_types, 'M');
> > >     } elsif ($feature->primary_tag eq 'CDS_span') {
> > >       push(@transcript_spans, $feature);
> > >       push(@transcript_types, 'C');
> > >     }
> > >
> > >   to parse the following string in FT:
> > > FT   mRNA          complement(join(500..620,4625..4706,5556..5644,6463..6510,
> > > FT                         6729..6794,7947..8015,15078..15150,17749..17879,
> > > FT                         21563..21684,24944..24976,25346..25478,28959..29032))
> > >
> > > and though there is a feature with primary tag = 'mRNA' and boundaries
> > > (500,29032), there is no primary tag='mRNA_span', so the script presented
> > > above has no get-in.
> > >    Is it a novel approach, and what structure should I use for identifying
> > > exons?
> > >           Thanks,
> > >                  Bob
> > > --
> > >
> > > Vladimir Babenko, PhD
> > > Center for Bioinformatics
> > > University of Pennsylvania
> > > 1313 Blockley Hall
> > > Philadelphia, PA 19104-6021
> > > (V) 215-573-2280
> > > (F) 215-573-3111
> > > http://www.pcbi.upenn.edu
> > >
> > >
> > >
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> 
> 
> 
> 

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/