[Bioperl-l] Bio::*Taxonomy* changes
Chris Fields
cjfields at uiuc.edu
Mon Jul 24 22:06:16 EDT 2006
>
>> I'll tell you what. This will be easier if I just write the code
>> for my proposals, including whatever changes would be needed in
>> Bio::SeqIO::genbank et al.
>
> Never get in the way of somebody who threatens to code :-) so I
> certainly won't. I think you're on the right track.
Fine by me. My only request: I don't want every sequence passing
through SeqIO having an automatic DB lookup performed on it. SeqIO
parsing of GenBank files is slow enough as it is w/o enforcing
lookups, even if they are cached.
If you want lookups, have it as an option and not as default
behavior. We could have the option for a lookup added pretty easily
in genbank.pm _initialize or the main SeqIO constructor as a simple
Boolean flag. That might be pretty nice.
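
To make the opt-in idea concrete, here is a minimal sketch of a constructor flag that defaults to off. Everything in it is an assumption for illustration: Sketch::GenBankReader, the -taxon_lookup argument, parse_species() and the toy lookup table are hypothetical and are not part of Bio::SeqIO::genbank.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SeqIO-style parser; not the real Bio::SeqIO::genbank.
package Sketch::GenBankReader;

sub new {
    my ($class, %args) = @_;
    my $self = {
        # Off by default: a lookup happens only when the caller asks for it.
        taxon_lookup => $args{-taxon_lookup} ? 1 : 0,
        lookups_done => 0,
    };
    return bless $self, $class;
}

# Build a species record from the in-file data, consulting the "database"
# only when the flag was set in the constructor.
sub parse_species {
    my ($self, $organism, $lineage) = @_;
    my %species = (organism => $organism, lineage => $lineage);
    $species{taxid} = $self->_lookup_taxid($organism) if $self->{taxon_lookup};
    return \%species;
}

# Placeholder for a real (and slow) taxonomy database query.
sub _lookup_taxid {
    my ($self, $organism) = @_;
    $self->{lookups_done}++;
    my %toy_db = ('Homo sapiens' => 9606);
    return $toy_db{$organism};
}

package main;

my $plain  = Sketch::GenBankReader->new();
my $lookup = Sketch::GenBankReader->new(-taxon_lookup => 1);

my $sp1 = $plain->parse_species('Homo sapiens', 'Eukaryota; Metazoa');
my $sp2 = $lookup->parse_species('Homo sapiens', 'Eukaryota; Metazoa');

print "default: ", (exists $sp1->{taxid} ? "lookup" : "no lookup"), "\n";
print "opt-in:  taxid $sp2->{taxid}\n";
```

With this shape, existing callers that never pass the flag keep the current parsing speed, and only callers that ask for it pay for the lookup.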
...
> (). But what happens behind the scene?)
> - genbank.pm writing the SOURCE information for a sequence
You know, the only really divisive point here is the lineage data and
how to store it in _read_GenBank_Species or reproduce it in
write_seq(). Again, I don't think we should have a forced lookup for
this; it should just be stored as is, either in Node or SimpleValue.
I lean toward the latter, since everyone seems averse to keeping this
in Node.
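
As a rough sketch of "stored as is", the lineage text from the file can be split into names on read and joined back on write, with no lookup in between. The function names parse_lineage() and format_lineage() are hypothetical, and the sketch does not re-wrap long lines the way a real writer would have to.

```perl
use strict;
use warnings;

# Split the lineage text of a GenBank ORGANISM block into its class names.
sub parse_lineage {
    my ($text) = @_;
    $text =~ s/\s+/ /g;       # collapse line wraps and runs of spaces
    $text =~ s/^ //;          # drop a leading space, if any
    $text =~ s/\.\s*$//;      # drop the trailing period
    return [ split /;\s*/, $text ];
}

# Rebuild the lineage text from the stored names, without any lookup.
sub format_lineage {
    my ($names) = @_;
    return join('; ', @$names) . '.';
}

my $in_file = "Eukaryota; Metazoa; Chordata;\n            Mammalia; Primates.";
my $stored  = parse_lineage($in_file);
print format_lineage($stored), "\n";
```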
> Then maybe some advanced uses:
>
> - from a sequence stream, retain only those of primates
> - like above, but only mitochondrial sequences
> - for an organism, query entrez for all sequences of strains,
> varieties, or subspecies sequences for that organism
For the primate example, would you screen those out via the in-file
lineage or using lookups?
For the mitochondria example, something like
  $seqout->write_seq($seq) if $seq->species->organelle eq 'mitochondrion';
which would mean leaving organelle() in Species/Node or whatever is used.
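
Both filters can be sketched against the in-file data alone. The records below are plain hashes standing in for parsed sequences; the keys (id, lineage, organelle) and the helper names are assumptions for illustration, not the Species/Node interface.

```perl
use strict;
use warnings;

# Toy records standing in for parsed sequences; the hash keys are hypothetical.
my @seqs = (
    { id => 'seq1', lineage => [qw(Eukaryota Metazoa Mammalia Primates)],
      organelle => 'mitochondrion' },
    { id => 'seq2', lineage => [qw(Eukaryota Metazoa Mammalia Primates)],
      organelle => undef },
    { id => 'seq3', lineage => [qw(Eukaryota Metazoa Mammalia Rodentia)],
      organelle => 'mitochondrion' },
);

# True if the in-file lineage names the given taxon; no database lookup.
sub in_taxon {
    my ($seq, $taxon) = @_;
    my $hits = grep { $_ eq $taxon } @{ $seq->{lineage} };
    return $hits > 0;
}

# True if the record carries a mitochondrial organelle tag.
sub is_mitochondrial {
    my ($seq) = @_;
    my $organelle = $seq->{organelle};
    return defined $organelle && $organelle eq 'mitochondrion';
}

my @primates   = grep { in_taxon($_, 'Primates') } @seqs;
my @primate_mt = grep { is_mitochondrial($_) } @primates;

print "primates: ", join(', ', map { $_->{id} } @primates), "\n";
print "primate mitochondrial: ", join(', ', map { $_->{id} } @primate_mt), "\n";
```

Screening on the stored lineage only works if the file's lineage actually contains the name being tested, which is the trade-off against a lookup.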
The last one, I think, can be done without touching the sequence
directly, by using NCBI's ELink and the TaxID to cross-reference the
nucleotide database. You would probably have to walk through all child
nodes, but it's feasible that way.
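
The child-node walk could look roughly like this. The tree is a toy hash of parent ID to child IDs, the IDs are made up and are not real NCBI TaxIDs, and collect_taxids() is a hypothetical helper. The ELink query itself is left out because it needs network access.

```perl
use strict;
use warnings;

# Toy taxonomy: parent ID => list of child IDs. The IDs are made up for
# illustration and are not real NCBI TaxIDs.
my %children = (
    100 => [101, 102],
    101 => [103],
);

# Return the starting ID plus every descendant, depth first.
sub collect_taxids {
    my ($tree, $taxid) = @_;
    my @ids = ($taxid);
    for my $child (@{ $tree->{$taxid} || [] }) {
        push @ids, collect_taxids($tree, $child);
    }
    return @ids;
}

my @ids = collect_taxids(\%children, 100);
print join(',', @ids), "\n";    # prints 100,101,103,102
```

Each collected ID would then be cross-referenced against the nucleotide database, one query per node or batched.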
> Add your own if these sound stupid ...
>
> Just an idea.
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign