[Bioperl-l] Bio::*Taxonomy* changes

Chris Fields cjfields at uiuc.edu
Mon Jul 24 22:06:16 EDT 2006


>
>> I'll tell you what. This will be easier if I just write the code for
>> my proposals, including whatever changes would be needed in
>> Bio::SeqIO::genbank et al.
>
> Never get in the way of somebody who threatens to code :-) so I
> certainly won't. I think you're on the right track.

Fine by me.  My only request: I don't want every sequence passing  
through SeqIO having an automatic DB lookup performed on it.  SeqIO  
parsing of GenBank files is slow enough as it is w/o enforcing  
lookups, even if they are cached.

If you want lookups, have it as an option and not as default  
behavior.  We could have the option for a lookup added pretty easily  
in genbank.pm _initialize or the main SeqIO constructor as a simple  
Boolean flag.  That might be pretty nice.
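
To make the opt-in idea concrete, here is a rough sketch.  It is not
Bio::SeqIO::genbank code: the My::SpeciesReader class, the -taxon_lookup
and -taxon_db argument names, and the lineage_for() method on the database
object are all made up for illustration.  The point is only that the
in-file lineage is kept as is unless the caller asks for a lookup, and
that lookups are cached per TaxID.

```perl
use strict;
use warnings;

# Hypothetical reader class, NOT Bio::SeqIO::genbank. The argument names
# '-taxon_lookup' and '-taxon_db' are assumptions for this sketch.
package My::SpeciesReader;

sub new {
    my ($class, %args) = @_;
    return bless {
        lookup => $args{-taxon_lookup} ? 1 : 0,   # off unless requested
        db     => $args{-taxon_db},               # any object with lineage_for()
        cache  => {},
    }, $class;
}

# Build a species record for one entry. The lineage parsed from the file
# is stored as is; the database is consulted only when the flag is set.
sub species_for {
    my ($self, $taxid, $file_lineage) = @_;
    my %species = (taxid => $taxid, lineage => [@$file_lineage]);
    return \%species unless $self->{lookup} && $self->{db};

    # Cache per TaxID so repeated entries cost a single lookup.
    $self->{cache}{$taxid} //= $self->{db}->lineage_for($taxid);
    my $found = $self->{cache}{$taxid};
    $species{lineage} = [@$found] if $found;
    return \%species;
}

package main;
```

With the flag left out, a stream of a few thousand entries never touches
the database at all, which is the default I'm asking for.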

...

> (). But what happens behind the scene?)
> 	- genbank.pm writing the SOURCE information for a sequence

You know, the only really divisive point here is the lineage data and
how to store it in _read_GenBank_Species or reproduce it in
write_seq().  Again, I don't think we should have a forced lookup for
this; it should just be stored as is, either in Node or SimpleValue.
I'd go with the latter, since everyone seems averse to keeping this in
Node.

> Then maybe some advanced uses:
>
> 	- from a sequence stream, retain only those of primates
> 	- like above, but only mitochondrial sequences
> 	- for an organism, query entrez for all sequences of strains,
> varieties, or subspecies sequences for that organism

For the primate example, would you screen those out via the in-file  
lineage or using lookups?

Something like

    $seqout->write_seq($seq) if $seq->species->organelle eq 'mitochondrion';

for the mitochondria example, which would mean leaving organelle() in
Species/Node or whatever is used.
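
Spelled out a little more, the two filters could look like the sketch
below, assuming the in-file lineage is used rather than a lookup.  The
helper names is_mitochondrial() and in_lineage() are made up; they only
rely on the sequence object having species(), and the species object
having organelle() and classification().  The exact organelle string
depends on what the parser stores, so treat 'mitochondrion' as an
assumption.

```perl
use strict;
use warnings;

# True if the sequence's species object reports a mitochondrial organelle.
# Sequences without species or organelle data are simply not matched.
sub is_mitochondrial {
    my ($seq) = @_;
    my $species = $seq->species or return 0;
    my $organelle = $species->organelle;
    return (defined $organelle && $organelle eq 'mitochondrion') ? 1 : 0;
}

# True if $taxon appears anywhere in the lineage parsed from the file
# (case-insensitive exact match on one classification entry).
sub in_lineage {
    my ($seq, $taxon) = @_;
    my $species = $seq->species or return 0;
    my $hits = grep { lc($_) eq lc($taxon) } $species->classification;
    return $hits ? 1 : 0;
}

# Typical use with real streams (needs BioPerl, so left as a comment):
#   while (my $seq = $seqin->next_seq) {
#       $seqout->write_seq($seq)
#           if in_lineage($seq, 'Primates') && is_mitochondrial($seq);
#   }
```

Both helpers guard against missing species data, which the one-liner
above does not.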

The last one, I think, can be done without touching the sequences
directly, by using NCBI's ELink with the TaxID to cross-reference the
nucleotide database.  You would probably have to walk through all child
nodes, but it's feasible that way.
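
As a rough illustration of the ELink request, the sketch below only
builds the URL and does no network I/O.  The helper name
elink_url_for_taxids() is made up, the database name 'nuccore' is an
assumption about what the nucleotide database is called on NCBI's side,
and the caller is assumed to have already collected the child TaxIDs.

```perl
use strict;
use warnings;

# Base URL of NCBI's ELink service (part of the E-utilities).
use constant ELINK => 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi';

# Build an ELink request URL that links taxonomy IDs to nucleotide records.
# Dies on an empty list or on anything that is not a plain integer.
sub elink_url_for_taxids {
    my (@taxids) = @_;
    die "no TaxIDs given\n" unless @taxids;
    for my $id (@taxids) {
        die "not a numeric TaxID: $id\n" unless $id =~ /\A\d+\z/;
    }
    return sprintf '%s?dbfrom=taxonomy&db=nuccore&id=%s',
        ELINK, join(',', @taxids);
}
```

Fetching and parsing the response is left out on purpose; any HTTP client
would do for that part.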

> Add your own if these sound stupid ...
>
> Just an idea.
>
> 	-hilmar
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




