[Bioperl-l] Re: Fixing bioperl [was Re: Analysis features]

Mon Aug 1 15:45:00 EDT 2005

On Fri, 2005-07-29 at 17:20 -0700, Hilmar Lapp wrote:
> On Jul 29, 2005, at 8:17 AM, Scott Cain wrote:
> 
> >
> > The main section of affected code in gmod is the GFF bulk loader, but
> > after we make the changes to the bioperl API, it shouldn't be too hard
> > to fix the loader.  In fact, some of those changes may have already
> > started.  I remember a few weeks before I released the gmod/chado
> > package, Hilmar sent out an announcement that he made some changes.
> 
> You mean around the time of ISMB? I fixed the ontology modules ... they  
> should actually work better now not worse unless you assumed the  
> presence of some bugs ;)

I guess I must have been assuming bugs :-)  I didn't look at the diffs, or
in much detail at what the exact problem was.  Since this is the last
release that will be using Bio::Ontology, and it is an alpha release, I
was not too concerned.
> 
> > While I should have paid attention then, I was busy getting my release
> > together, and everything seemed to work, so I ignored it.
> > Unfortunately, the reason things continued to work was that I forgot to
> > update my bioperl-live, and as a result, the gmod release doesn't work
> > with bioperl-live.
> 
> Scott, what would really help sometimes is if in such a situation you  
> run the bioperl test suite and report the result if there are any  
> failures, especially those that appear potentially connected to your  
> problem. Last time the gmod ontology loader ceased to work the problem  
> would have been readily exposed by the ontology tests in bioperl. It  
> just helps in zooming in on the problem.

I run make test frequently; what I do less often is pay close attention
to the result.  When working with bioperl-live, one gets a little numb
to test failures :-/
> 
> I'd be eager to help make bioperl work with gmod and vice versa and I'm  
> sure many others are too, but it'll be difficult if we don't work  
> towards this collaboratively. For this I really liked the spirit of  
> Chris' proposal - that's the way to make this work.
> 
> > [...]
> > The other section of code that could have been affected but won't be is
> > the ontology loader.  The current ontology loader depends on
> > Bio::Ontology, but I was already planning on migrating to go-perl for
> > loading ontologies anyway, so that won't be a problem.
> 
> I'm closing in on the last bugs in the go-perl integration. It remains  
> to be seen how fast the result is as Chris made me aware in Detroit,  
> but if it works this will give you both worlds at your choosing.
> 
> 	-hilmar
> 
> >
> > So, who wants to take the lead on this?
> >
> > Thanks,
> > Scott
> >
> >
> > On Thu, 2005-07-28 at 12:42 -0700, Chris Mungall wrote:
> >> I think the answer may be even more complicated than this.
> >>
> >> Lurkers and contributors to the bioperl mailing list may have noticed
> >> that there have been some major obstacles to progress lately,
> >> particularly in getting a stable release of the code out. bp1.4 is
> >> fairly old, and 1.5 is a developer release, though this is the one
> >> required by GMOD.
> >>
> >> My understanding is that this bottleneck can be traced back to changes
> >> in the SeqFeature and Annotation model. These changes appear to be
> >> required by Bio::SeqFeature::Annotated which is produced by
> >> Bio::FeatureIO::gff (which in turn is used by the GMOD bulk loader,
> >> which is the main reason GMOD requires 1.5, I believe?). Unfortunately,
> >> these changes also break existing code and have a severe negative
> >> impact on memory usage.
> >>
> >> Before advising Cyril and others to switch to BFIO::gff I think it's
> >> important to make sure there is a clear path forward with bioperl. My
> >> impression is that there is something of a stalemate here. The bioperl
> >> developers would like to retract the aforementioned changes, but they
> >> believe they cannot do this without breaking GMOD code.  They are also
> >> extremely uncomfortable about leaving these changes in. Everyone gives
> >> up and starts coding around bioperl.
> >>
> >> Here is why the changes were introduced:
> >>
> >> BioPerl has a 'scruffy' typing model, whereby feature types
> >> (primary_tag in bioperl) and featureprop types (tags in bioperl) are
> >> labels or strings. In contrast, Chado forces all types to be some class
> >> or relation in an ontology.
> >>
> >> Now obviously I'm rather partial to the Chado model, but that doesn't
> >> mean I think it should be forced upon bioperl. I often use bioperl in
> >> scruffy mode (on scruffy data); or in some combination whereby I map
> >> the scruffy types to ontologies in some non-bioperl code. When using
> >> bioperl as a middleware component over a nicely organised database,
> >> ontology-typed mode is definitely best. However, the majority of
> >> bioperl users (including myself) spend a large proportion of their time
> >> working with scruffy data, in which case lightweight scruffy types are
> >> more appropriate.
> >>
> >> It seems that there is a perfectly simple way of reconciling both
> >> approaches. We revert bioperl back to the simpler scruffy model. The
> >> majority of users and developers breathe a sigh of relief. We then
> >> extend SeqFeatureI with something like SeqFeatureAnnotatedI. This
> >> forces types to be stored as OntologyTerms (and I haven't even touched
> >> on some of the problems here, but at least we are insulating the
> >> standard bioperl layer that 99% of users use from these issues). All
> >> classes implementing SFAI will necessarily implement SFI, and the
> >> primary_tag and tag_values methods will be supported (not deprecated)
> >> as simple delegations to the OntologyTerm objects.
> >>
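A minimal sketch of the delegation idea in the paragraph above, in plain Perl with no BioPerl dependency. The package names My::OntologyTerm and My::AnnotatedFeature are hypothetical stand-ins, not actual BioPerl classes; only the primary_tag method name comes from the text.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology term object.
package My::OntologyTerm;
sub new  { my ($class, %args) = @_; return bless { name => $args{name} }, $class }
sub name { return $_[0]{name} }

# Hypothetical ontology-typed feature: it stores its type as a term object,
# but still answers the plain-string primary_tag call by delegating to it.
package My::AnnotatedFeature;
sub new {
    my ($class, %args) = @_;
    return bless { type => $args{type} }, $class;
}
sub type        { return $_[0]{type} }
sub primary_tag { return $_[0]->type->name }

package main;

my $feat = My::AnnotatedFeature->new(
    type => My::OntologyTerm->new(name => 'mRNA'),
);
print $feat->primary_tag, "\n";
```

Code written against the simpler interface keeps calling primary_tag and never sees the term object, which is the insulation the proposal is after.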
> >> We can then modify BFIO::gff (which is an incredibly useful piece of
> >> code) and get rid of all the dependencies on SO and Bio::Ontology* and
> >> instead allow the user of this module to plug in their own
> >> resolver/validator - so they can choose whether they just want fast
> >> scruffy lightweight SFI features, or whether they want ontology-typed
> >> SFAI features. If the latter, then they can choose their own resolver
> >> strategy - by a user supplied hash, by a copy of SO auto-downloaded
> >> from sourceforge, by a local chado db, by the genbank->SO mapping
> >> table, during parsing vs post-parsing, whatever. In fact there is
> >> already Bio::SeqFeature::Tools::TypeMapper, but currently this is
> >> mostly concerned with helping Bio::SeqFeature::Tools::Unflattener
> >> convert scruffy genbank to something sensible.
> >>
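One way the pluggable resolver could look, as a rough sketch only. The function resolve_type, the hash %user_map and the TERM:... identifiers are all made up for illustration (they are not real ontology accessions and not part of BFIO::gff); the "user supplied hash" strategy is the one named above.

```perl
use strict;
use warnings;

# Hypothetical parser-side helper: the caller may supply a resolver code ref.
# With no resolver, the scruffy type string is passed through unchanged.
sub resolve_type {
    my ($type, $resolver) = @_;
    return $type unless $resolver;
    my $term = $resolver->($type);
    die "cannot resolve feature type '$type'\n" unless defined $term;
    return $term;
}

# One possible strategy: a user-supplied hash from labels to term IDs.
# The identifiers below are placeholders, not real ontology accessions.
my %user_map      = (gene => 'TERM:0001', mRNA => 'TERM:0002');
my $hash_resolver = sub { return $user_map{ $_[0] } };

print resolve_type('gene'), "\n";                  # scruffy mode
print resolve_type('gene', $hash_resolver), "\n";  # ontology-typed mode
```

A resolver backed by a local chado db or a downloaded copy of SO would have the same call shape, so the parser would not need to know which one it was handed.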
> >> GMOD (and perhaps biosql) would use SFAI, everyone else would use the
> >> simpler SFI. Someone can even get a stable 1.6 release out before all
> >> the SFAI details such as how the resolver would work are finalised. I'd
> >> really like to see 1.6 include a simpler BFIO::gff that can optionally
> >> produce features that aren't SeqFeature::Annotateds, but that's
> >> negotiable.
> >>
> >> There are vast swathes of both GMOD and BioPerl code I'm not familiar
> >> with, so it's possible my analysis above is flawed in some way. If it
> >> is, then it's up to someone from either camp to speak up! If not, then
> >> there's no excuse for the relevant people not to start sorting out
> >> this mess by commencing with the solution outlined above.
> >>
> >> Cheers
> >> Chris
> >>
> >>>
> >>> Scott
> >>>
> >>>
> >>> On Thu, 2005-07-28 at 18:37 +0200, Cyril Pommier wrote:
> >>>> Hello,
> >>>> We are going to store analysis results in chado, and we are of course
> >>>> very interested in these future evolutions of GFF3/chado.
> >>>> So we would like to make sure that the parsers and conversion programs
> >>>> we are writing now will be compatible with the future GFF3.
> >>>>
> >>>> We are using Bio::SeqFeature::Generic objects that we write with
> >>>> Bio::Tools::GFF.
> >>>>
> >>>> Do you think that Bio::Tools::GFF will be able to handle the new
> >>>> 'type' column, or is it better to switch to Bio::FeatureIO::gff?
> >>>>
> >>>> Thanks in advance for any advice.
> >>>>
> >>>> Cyril
> >>>>
> >>>> Don Gilbert wrote:
> >>>>
> >>>>>
> >>>>> Scott,
> >>>>>
> >>>>> Your notes in gmod_bulk_load_gff3.pl suggest it is headed in
> >>>>> same direction I suggest below. More about these todo points
> >>>>>
> >>>>>> - address flybase's use of analysisfeature combined with feature to
> >>>>>> give source-type information (in GFF terms). This will need to
> >>>>>> be addressed in the GBrowse adaptor.
> >>>>>> - modify the bulk loader to allow "mixed" GFF3 files (that is,
> >>>>>> containing both analysis results and annotations). See perldoc
> >>>>>> gmod_bulk_load_gff3.pl for more info
> >>>>>
> >>>>>
> >>>>> Use of chado's analysisfeature table is something others who know
> >>>>> it better can comment on. But after working with it for a while
> >>>>> it makes sense to me to use in this way:
> >>>>>
> >>>>> For a future GFF -> Chado loader, treat analysis features such as
> >>>>> gene finding results, BLAST, sim4 as 'analysisfeature type' rather
> >>>>> than feature CV term type (the ones that now end up with a generic
> >>>>> 'match' cvterm). In these cases the Analysis table is populated with
> >>>>> program:database_sourcename
> >>>>> as the basis of this 'analysisfeature type', such as
> >>>>> match:blastx:na_pe.dros
> >>>>> match:sim4:DGC
> >>>>> match:genie:dummy (or maybe exon:genie)
> >>>>>
> >>>>> The program:database fits neatly in GFF source field, as
> >>>>> #ref source type start stop ...
> >>>>> chr1 blastx:na_pe.dros match 1 100 ...
> >>>>> chr1 sim4:DGC match 1 100 ...
> >>>>>
> >>>>> These can be treated in database adaptor analogously to the CVterm
> >>>>> table feature types. See at end a list of current GFF feature
> >>>>> type:source from worm, rice, yeast, fly MODs. Fly and rice use a
> >>>>> syntax like above and worm gff uses BLAT_EMBL_BEST, instead of
> >>>>> BLAT:EMBL_BEST.
> >>>>>
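A small sketch of the program:sourcename convention described above, assuming the source column is split on its first colon. The function names split_source and analysis_type are invented for this example and do not come from the loader or the adaptor.

```perl
use strict;
use warnings;

# Split a GFF source value of the form program:sourcename.
# Everything after the first colon is kept as the sourcename;
# a source without a colon is treated as a bare program name.
sub split_source {
    my ($source) = @_;
    my ($program, $sourcename) = split /:/, $source, 2;
    return ($program, $sourcename);
}

# Build the type:program:sourcename label used in the examples above,
# leaving out the sourcename part when there is none.
sub analysis_type {
    my ($type, $source) = @_;
    my ($program, $sourcename) = split_source($source);
    return join ':', grep { defined $_ } $type, $program, $sourcename;
}

print analysis_type('match', 'blastx:na_pe.dros'), "\n";
print analysis_type('match', 'genscan'), "\n";
```

Worm-style sources such as BLAT_EMBL_BEST contain no colon, so under this assumption they would come through as a single program name rather than being split.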
> >>>>> From POD of your bulk_load_gff3.pl
> >>>>>> Analysis
> >>>>>> If you are loading analysis results (ie, BLAT results, gene
> >>>>>> predictions), you should specify the -a flag. If no arguments are
> >>>>>> supplied with the -a, then the loader will assume that the results
> >>>>>> belong to an analysis set with a name that is the concatenation of
> >>>>>> the source (column 2) and the method (column 3) with an underscore
> >>>>>> in between.
> >>>>>
> >>>>> "... then the loader will assume that the results belong to an
> >>>>> analysis table row with a program name and database source name
> >>>>> taken from Source (column 2, colon separated program:sourcename),
> >>>>> with a SOFA feature type taken from Method (column 3). If sourcename
> >>>>> doesn't apply, e.g. genefinder, either omit it or use 'dummy'.
> >>>>> Use the generic 'match' SOFA type if others don't apply."
> >>>>> [see also http://song.sourceforge.net/gff3-jan04.shtml#ALIGNMENTS]
> >>>>>
> >>>>> Note that sourcename of database is a common attribute (all those
> >>>>> blasts, blats, sim4, ... are run on several different databases).
> >>>>>
> >>>>> For that underscore between method and source, where does that go in
> >>>>> the database? Underscores are already used within program and database
> >>>>> sourcename values, so it may be problematic to add one if not needed.
> >>>>>
> >>>>> Oh, I see now from bulk_load_gff3.PLS, you are creating a 'Name'
> >>>>> entry for analysis table. This probably is less useful than using
> >>>>> Program and Sourcename fields as flybase does, which comes from the
> >>>>> common usage where people run various programs, with various database
> >>>>> sources and want to plop the results into a database easily. These go
> >>>>> into those two fields directly, no need to create or parse a Name
> >>>>> entry (which can be and is null in flybase data).
> >>>>>
> >>>>>> my $search_analysis
> >>>>>> = $db->prepare("SELECT analysis_id FROM analysis WHERE name=?");
> >>>>>
> >>>>> I think it would be better as
> >>>>> my $search_analysis = $db->prepare(
> >>>>> "SELECT analysis_id FROM analysis WHERE program=? AND sourcename=?");
> >>>>>
> >>>>>> Otherwise, the argument provided with -a will be taken
> >>>>>> as the name of the analysis set. Either way, the analysis set must
> >>>>>> already be in the analysis table. The easiest way to do this is to
> >>>>>> insert it directly in the psql shell:
> >>>>>>
> >>>>>> INSERT INTO analysis (name, program, programversion)
> >>>>>> VALUES ('genscan 2005-2-28','genscan','5.4');
> >>>>>
> >>>>> My choice would be to populate the analysis table from GFF data,
> >>>>> rather than expect preparation by the user (or to offer this as
> >>>>> another option).
> >>>>>
> >>>>> INSERT INTO analysis (program, sourcename)
> >>>>> VALUES ('tblastx','na_baylorf1_scfchunk.dpse');
> >>>>> INSERT INTO analysis (program, sourcename)
> >>>>> VALUES ('sim4','na_gb.dmel');
> >>>>> INSERT INTO analysis (program, sourcename, programversion)
> >>>>> VALUES ('genie_masked','dummy', '1.0');
> >>>>>
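To illustrate populating the analysis table from the GFF data itself, here is a sketch that only gathers the distinct (program, sourcename) pairs from column 2. The function analysis_rows is hypothetical, the sample lines are shortened to five columns, and the database step is left as a comment rather than guessed at.

```perl
use strict;
use warnings;

# Collect the distinct (program, sourcename) pairs seen in column 2 of
# GFF lines whose source has the program:sourcename form.
sub analysis_rows {
    my (@lines) = @_;
    my (%seen, @rows);
    for my $line (@lines) {
        next if $line =~ /^#/;                     # skip comment lines
        my (undef, $source) = split /\t/, $line;
        next unless defined $source && $source =~ /^([^:]+):(.+)$/;
        push @rows, [$1, $2] unless $seen{$source}++;
    }
    return @rows;
}

my @rows = analysis_rows(
    "chr1\tsim4:na_gb.dmel\tmatch\t1\t100",
    "chr1\tsim4:na_gb.dmel\tmatch\t200\t300",
    "chr1\ttblastx:na_dpse\tmatch\t1\t100",
);

# Each pair would be bound to the placeholders of a statement such as
#   INSERT INTO analysis (program, sourcename) VALUES (?, ?)
print join(':', @$_), "\n" for @rows;
```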
> >>>>>> There are other columns in the analysis table that are optional;
> >>>>>> see the schema documentation and '\d analysis' in psql for more
> >>>>>> information.
> >>>>>>
> >>>>> ....
> >>>>>> A planned addition to the functionality of handling analysis
> >>>>>> results is to allow "mixed" GFF files, where some lines are analysis
> >>>>>> results and some are not.
> >>>>>
> >>>>> This is the case for drosophila GFF now (see others also below). If
> >>>>> you make the default assumption that a line is an analysis result when
> >>>>> ($method =~ /.*match/) and ($source =~ m/([^:]+):(.+)/), you should
> >>>>> get all/most of the analysisfeature types, and probably not anything
> >>>>> else.
> >>>>>
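The default assumption above could be sketched like this, using the two regular expressions exactly as given. The function classify_line and the shape of the hash it returns are invented for the example, and the sample lines are shortened to five tab-separated columns.

```perl
use strict;
use warnings;

# Heuristic from the text: a method (type) containing "match" together with
# a source of the form program:sourcename marks an analysis result.
sub classify_line {
    my ($line) = @_;
    my (undef, $source, $method) = split /\t/, $line;
    if ($method =~ /.*match/ && $source =~ m/([^:]+):(.+)/) {
        # $1 and $2 come from the source match, the last successful one.
        return { analysis => 1, program => $1, sourcename => $2 };
    }
    return { analysis => 0 };
}

for my $line ("chr1\tblastx:na_pe.dros\tmatch\t1\t100",
              "chr1\t.\tgene\t1\t100") {
    my $r = classify_line($line);
    print $r->{analysis}
        ? "analysis: $r->{program} / $r->{sourcename}\n"
        : "annotation\n";
}
```

Lines whose source has no colon, for example a bare genscan, would not be caught by this test and would need the explicit --analysis list instead.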
> >>>>>> Additionally, one will be able to supply lists of
> >>>>>> types (optionally with sources) and their associated entry in the
> >>>>>> analysis table. The format will probably be tag value pairs:
> >>>>>>
> >>>>>> --analysis match:Rice_est=rice_est_blast, \
> >>>>>> match:Maize_cDNA=maize_cdna_blast, \
> >>>>>> mRNA=genscan_prediction,exon=genscan_prediction
> >>>>>
> >>>>> My suggestion for this (as per GFF source,type columns) would be
> >>>>> --analysis match:program:sourcename ...
> >>>>> --analysis match:blast:Rice_est,match:blast:Maize_cDNA,\
> >>>>> mRNA:genscan:dummy, exon:genscan:dummy
> >>>>>
> >>>>> I guess the 'dummy' data sourcename need not be added; flybase uses it
> >>>>> to keep that field not-null, but it isn't required by the schema.
> >>>>>
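For the suggested --analysis syntax, a parsing sketch might look like the following. The function parse_analysis_option is hypothetical (the loader had no such option when this was written), and the sourcename is treated as optional, in line with the remark about 'dummy'.

```perl
use strict;
use warnings;

# Parse a comma-separated list of type:program[:sourcename] entries.
sub parse_analysis_option {
    my ($value) = @_;
    my @entries;
    for my $item (split /\s*,\s*/, $value) {
        my ($type, $program, $sourcename) = split /:/, $item, 3;
        push @entries, {
            type       => $type,
            program    => $program,
            sourcename => $sourcename,   # undef when omitted
        };
    }
    return \@entries;
}

my $entries = parse_analysis_option(
    'match:blast:Rice_est,match:blast:Maize_cDNA,mRNA:genscan, exon:genscan');

printf "%s -> program=%s sourcename=%s\n",
    $_->{type}, $_->{program}, $_->{sourcename} // '(none)'
    for @$entries;
```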
> >>>>> Here are some snippets from the ChadoFC adaptor I modified
> >>>>> from yours (will get into cvs.sf.net 'real soon'), showing that
> >>>>> it isn't much work to add this as an analog to how cvterm types
> >>>>> are used.
> >>>>>
> >>>>> -- Don
> >>>>>
> >>>>> ## Bio::DB::Das::ChadoFC.pm, part of new() - load analysis types
> >>>>> ## treat similar to CV table types
> >>>>>
> >>>>> sub getAnalysisFeatureHash
> >>>>> {
> >>>>> my $self= shift;
> >>>>>
> >>>>> my $dbh= $self->dbh();
> >>>>> my $sth = $dbh->prepare(
> >>>>> "select analysis_id,program,sourcename from analysis")
> >>>>> or $self->throw("unable to prepare select analysis");
> >>>>> $sth->execute or $self->throw("unable to select analysis");
> >>>>>
> >>>>> ## maps between analysis_id and the match:program:sourcename label
> >>>>> my(%term2name,%name2term);
> >>>>>
> >>>>> while (my $hashref = $sth->fetchrow_hashref) {
> >>>>>
> >>>>> ## this is dgg syntax of analysis feature names for GFF
> >>>>> ## all have generic 'match' method and program:source as 'source'
> >>>>> ## a problem, want other main types: EST_match:xxx, mRNA:genie .. etc.
> >>>>> my $anfeat= "match:".$hashref->{program}.":".$hashref->{sourcename};
> >>>>>
> >>>>> $term2name{ $hashref->{analysis_id} } = $anfeat;
> >>>>> $name2term{ $anfeat } = $hashref->{analysis_id};
> >>>>> }
> >>>>> $self->an_term2name(\%term2name);
> >>>>> $self->an_name2term(\%name2term);
> >>>>> }
> >>>>>
> >>>>> ## Das::ChadoFC::Segment snippets
> >>>>> sub features {
> >>>>> $self->{has_anatype}=0;
> >>>>> my $sql_range = '';
> >>>>> my ($interbase_start,$rend,$srcfeature_id,$sql_types);
> >>>>> unless ($feature_id) {
> >>>>> $sql_range = $self->sql_range($rangetype);
> >>>>>
> >>>>> $sql_types = $self->sql_types($types, -1); # dgg
> >>>>>
> >>>>> $srcfeature_id = $self->{srcfeature_id};
> >>>>> }
> >>>>> ...
> >>>>> elsif($self->{has_anatype}) {
> >>>>> $from_part .= "left join analysisfeature af using (feature_id) ";
> >>>>> }
> >>>>>
> >>>>>
> >>>>> sub sql_types
> >>>>> ..
> >>>>> $valid_type = $factory->name2term($temp_type);
> >>>>> $is_anatype= 0;
> >>>>> unless ($valid_type) {
> >>>>> $valid_type = $factory->an_name2term($temp_type);
> >>>>> $self->{has_anatype}= $is_anatype= 1 if ($valid_type);
> >>>>> }
> >>>>> ..
> >>>>> ## leave out extra invalid types
> >>>>> if (!$valid_type) {
> >>>>> ### skip
> >>>>> } elsif ($temp_dbxref) {
> >>>>> $sql_types .= $orsql."(f.type_id = $valid_type and fd.dbxref_id =
> >>>>> $temp_dbxref)";
> >>>>> } elsif($is_anatype) {
> >>>>> $sql_types .= $orsql."(af.analysis_id = $valid_type)"; #<<<
> >>>>> } else {
> >>>>> $sql_types .= $orsql."(f.type_id = $valid_type)";
> >>>>> }
> >>>>>
> >>>>>
> >>>>> Lists of GFF feature type:source from some current MOD data
> >>>>> where * are probably analysisfeature types (program:database)
> >>>>>
> >>>>> rice gff type:source
> >>>>> ftp://ftp.gramene.org/pub/gramene/release17/data/sequence_annotation/gff3/
> >>>>> --------------------
> >>>>> CDS:known
> >>>>> CDS:tigr
> >>>>> EST:cmap
> >>>>> EST_match:Barley (? might be EST_match:someprogram:Barley)
> >>>>> EST_match:Maize
> >>>>> EST_match:Millet
> >>>>> EST_match:Rice
> >>>>> EST_match:Sorghum
> >>>>> EST_match:Wheat
> >>>>> cDNA_match:Rice
> >>>>> cross_genome_match:Maize
> >>>>> cross_genome_match:Rice
> >>>>> cross_genome_match:Sorghum
> >>>>> * exon:FgenesH:Monocot
> >>>>> exon:known
> >>>>> exon:tigr
> >>>>> five_prime_UTR:tigr
> >>>>> gene:known
> >>>>> gene:tigr
> >>>>> * mRNA:FgenesH:Monocot
> >>>>> mRNA:known
> >>>>> mRNA:tigr
> >>>>> microsatellite:cmap
> >>>>> three_prime_UTR:known
> >>>>> three_prime_UTR:tigr
> >>>>> transposable_element_insertion_site:cmap
> >>>>>
> >>>>> worm gff type:source
> >>>>> ftp://ftp.wormbase.org/pub/wormbase/species/elegans/genome_feature_tables/GFF3/
> >>>>> ----------------------
> >>>>> CDS:Coding_transcript
> >>>>> * CDS:Genefinder
> >>>>> CDS:Transposon_CDS
> >>>>> CDS:history
> >>>>> * CDS:twinscan
> >>>>> * EST_match:BLAT_EST_BEST (~ EST_match:BLAT:EST_BEST)
> >>>>> * EST_match:BLAT_EST_OTHER
> >>>>> PCR_product:GenePair_STS
> >>>>> PCR_product:Orfeome
> >>>>> RNAi_reagent:RNAi_primary
> >>>>> RNAi_reagent:RNAi_secondary
> >>>>> SNP:Allele
> >>>>> binding_site:binding_site
> >>>>> * cDNA_match:BLAT_mRNA_BEST (~ cDNA_match:BLAT:mRNA_BEST )
> >>>>> * cDNA_match:BLAT_mRNA_OTHER
> >>>>> clone_end:.
> >>>>> clone_start:.
> >>>>> complex_substitution:Allele
> >>>>> deletion:Allele
> >>>>> exon:Coding_transcript
> >>>>> * exon:Genefinder
> >>>>> exon:Non_coding_transcript
> >>>>> exon:Pseudogene
> >>>>> exon:Transposon_CDS
> >>>>> exon:history
> >>>>> exon:miRNA
> >>>>> exon:rRNA
> >>>>> exon:scRNA
> >>>>> exon:snRNA
> >>>>> exon:snoRNA
> >>>>> exon:tRNA
> >>>>> * exon:tRNAscan-SE-1.23
> >>>>> * exon:twinscan
> >>>>> experimental_result_region:Expr_profile
> >>>>> experimental_result_region:cDNA_for_RNAi
> >>>>> * expressed_sequence_match:BLAT_OST_BEST (~ expressed_sequence_match:BLAT:OST_BEST )
> >>>>> * expressed_sequence_match:BLAT_OST_OTHER
> >>>>> five_prime_UTR:Coding_transcript
> >>>>> gene:Coding_transcript
> >>>>> gene:gene
> >>>>> gene:history
> >>>>> gene:landmark
> >>>>> insertion:Allele
> >>>>> inverted_repeat:inverted
> >>>>> mRNA:Coding_transcript
> >>>>> * mRNA:Genefinder
> >>>>> mRNA:Transposon_CDS
> >>>>> mRNA:history
> >>>>> * mRNA:twinscan
> >>>>> miRNA:miRNA
> >>>>> nc_primary_transcript:Non_coding_transcript
> >>>>> * nucleotide_match:BLAT_EMBL_BEST (~ nucleotide_match:BLAT:EMBL_BEST )
> >>>>> * nucleotide_match:BLAT_EMBL_OTHER
> >>>>> * nucleotide_match:BLAT_TC1_BEST
> >>>>> * nucleotide_match:BLAT_TC1_OTHER
> >>>>> * nucleotide_match:BLAT_ncRNA_BEST
> >>>>> * nucleotide_match:BLAT_ncRNA_OTHER
> >>>>> * nucleotide_match:TEC_RED
> >>>>> * nucleotide_match:waba_coding
> >>>>> * nucleotide_match:waba_strong
> >>>>> * nucleotide_match:waba_weak
> >>>>> oligo:.
> >>>>> operon:operon
> >>>>> polyA_signal_sequence:polyA_signal_sequence
> >>>>> polyA_site:polyA_site
> >>>>> processed_transcript:gene
> >>>>> protein_coding_primary_transcript:Coding_transcript
> >>>>> * protein_match:wublastx
> >>>>> pseudogene:Pseudogene
> >>>>> pseudogene:history
> >>>>> rRNA:rRNA
> >>>>> reagent:Oligo_set
> >>>>> region:.
> >>>>> region:Genbank
> >>>>> region:Genomic_canonical
> >>>>> region:Link
> >>>>> * repeat_region:RepeatMasker
> >>>>> scRNA:scRNA
> >>>>> sequence_variant:.
> >>>>> sequence_variant:Allele
> >>>>> snRNA:snRNA
> >>>>> snoRNA:snoRNA
> >>>>> substitution:Allele
> >>>>> tRNA:tRNA
> >>>>> * tRNA:tRNAscan-SE-1.23
> >>>>> tandem_repeat:tandem
> >>>>> three_prime_UTR:Coding_transcript
> >>>>> trans_splice_acceptor_site:SL1
> >>>>> trans_splice_acceptor_site:SL2
> >>>>> transcript:SAGE_transcript
> >>>>> * translated_nucleotide_match:BLAT_NEMATODE (~ translated_nucleotide_match:BLAT:NEMATODE )
> >>>>> transposable_element:Transposon
> >>>>> transposable_element:Transposon_CDS
> >>>>> transposable_element_insertion_site:Allele
> >>>>> transposable_element_insertion_site:Mos_insertion_allele
> >>>>>
> >>>>>
> >>>>> fly gff type:source
> >>>>> ftp://ftp.flybase.net/genomes/dmel/current/gff/
> >>>>> -----------------------
> >>>>> BAC:.
> >>>>> CDS:.
> >>>>> aberration_junction:.
> >>>>> chromosome:.
> >>>>> chromosome_arm:.
> >>>>> chromosome_band:.
> >>>>> enhancer:.
> >>>>> exon:.
> >>>>> five_prime_UTR:.
> >>>>> gene:.
> >>>>> insertion_site:.
> >>>>> intron:.
> >>>>> mRNA:.
> >>>>> * match:RNAiHDP
> >>>>> * match:assembly:path
> >>>>> * match:blastx:aa_SPTR.dmel
> >>>>> * match:blastx:aa_SPTR.insect
> >>>>> * match:blastx:aa_SPTR.othinv
> >>>>> * match:blastx:aa_SPTR.othvert
> >>>>> * match:blastx:aa_SPTR.plant
> >>>>> * match:blastx:aa_SPTR.primate
> >>>>> * match:blastx:aa_SPTR.rodent
> >>>>> * match:blastx:aa_SPTR.worm
> >>>>> * match:blastx:aa_SPTR.yeast
> >>>>> * match:genscan
> >>>>> * match:repeatmasker
> >>>>> * match:sim4:na_ARGs.dros
> >>>>> * match:sim4:na_ARGsCDS.dros
> >>>>> * match:sim4:na_DGC_dros
> >>>>> * match:sim4:na_dbEST.diff.dmel
> >>>>> * match:sim4:na_dbEST.same.dmel
> >>>>> * match:sim4:na_gadfly_dmel_r2
> >>>>> * match:sim4:na_gb.dmel
> >>>>> * match:sim4:na_gb.tpa.dmel
> >>>>> * match:sim4:na_smallRNA.dros
> >>>>> * match:sim4:na_transcript_dmel_r31
> >>>>> * match:sim4:na_transcript_dmel_r32
> >>>>> * match:tRNAscan-SE:.
> >>>>> * match:tblastx:na_agambiae
> >>>>> * match:tblastx:na_dbEST.insect
> >>>>> * match:tblastx:na_dpse
> >>>>> * match_part:RNAiHDP
> >>>>> * match_part:assembly:path
> >>>>> * match_part:blastx:aa_SPTR.dmel
> >>>>> * match_part:blastx:aa_SPTR.insect
> >>>>> * match_part:blastx:aa_SPTR.othinv
> >>>>> * match_part:blastx:aa_SPTR.othvert
> >>>>> * match_part:blastx:aa_SPTR.plant
> >>>>> * match_part:blastx:aa_SPTR.primate
> >>>>> * match_part:blastx:aa_SPTR.rodent
> >>>>> * match_part:blastx:aa_SPTR.worm
> >>>>> * match_part:blastx:aa_SPTR.yeast
> >>>>> * match_part:genscan
> >>>>> * match_part:repeatmasker
> >>>>> * match_part:sim4:na_ARGs.dros
> >>>>> * match_part:sim4:na_ARGsCDS.dros
> >>>>> * match_part:sim4:na_DGC_dros
> >>>>> * match_part:sim4:na_dbEST.diff.dmel
> >>>>> * match_part:sim4:na_dbEST.same.dmel
> >>>>> * match_part:sim4:na_gadfly_dmel_r2
> >>>>> * match_part:sim4:na_gb.dmel
> >>>>> * match_part:sim4:na_gb.tpa.dmel
> >>>>> * match_part:sim4:na_smallRNA.dros
> >>>>> * match_part:sim4:na_transcript_dmel_r31
> >>>>> * match_part:sim4:na_transcript_dmel_r32
> >>>>> * match_part:tRNAscan-SE:.
> >>>>> * match_part:tblastx:na_agambiae
> >>>>> * match_part:tblastx:na_dbEST.insect
> >>>>> * match_part:tblastx:na_dpse
> >>>>> mature_peptide:.
> >>>>> ncRNA:.
> >>>>> oligo:.
> >>>>> point_mutation:.
> >>>>> polyA_site:.
> >>>>> protein_binding_site:.
> >>>>> pseudogene:.
> >>>>> region:.
> >>>>> regulatory_region:.
> >>>>> rescue_fragment:.
> >>>>> scaffold:.
> >>>>> sequence_variant:.
> >>>>> snRNA:.
> >>>>> snoRNA:.
> >>>>> tRNA:.
> >>>>> three_prime_UTR:.
> >>>>> transcription_start_site:.
> >>>>> transposable_element:.
> >>>>> transposable_element_insertion_site:.
> >>>>>
> >>>>>
> >>>>> yeast gff type:source
> >>>>> ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
> >>>>> -------------------------
> >>>>> ARS:SGD
> >>>>> CDS:SGD
> >>>>> binding_site:SGD
> >>>>> centromere:SGD
> >>>>> chromosome:SGD
> >>>>> gene:SGD
> >>>>> insertion:SGD
> >>>>> intron:SGD
> >>>>> ncRNA:SGD
> >>>>> nc_primary_transcript:SGD
> >>>>> nucleotide_match:SGD
> >>>>> pseudogene:SGD
> >>>>> rRNA:SGD
> >>>>> region:SGD
> >>>>> region:landmark
> >>>>> repeat_family:SGD
> >>>>> repeat_region:SGD
> >>>>> snRNA:SGD
> >>>>> snoRNA:SGD
> >>>>> tRNA:SGD
> >>>>> telomere:SGD
> >>>>> transposable_element:SGD
> >>>>> transposable_element_gene:SGD
> >>>>>
> >>>>> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> >>>>> -- gilbertd at indiana.edu -- http://marmot.bio.indiana.edu/
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Gmod-gbrowse mailing list
> >>>>> Gmod-gbrowse at lists.sourceforge.net
> >>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >>>>>
> >>>>
> >>>>
> >>> --
> >>> ------------------------------------------------------------------------
> >>> Scott Cain, Ph. D.                                         cain at cshl.edu
> >>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> >>> Cold Spring Harbor Laboratory
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Gmod-devel mailing list
> >>> Gmod-devel at lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/gmod-devel
> >>>
> >>
> >>
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory