[Bioperl-l] Bio::DB::SeqFeature sequences with no identifier?

Fields, Christopher J cjfields at illinois.edu
Fri May 23 17:20:20 UTC 2014


On May 23, 2014, at 1:21 AM, Mark Wilkinson <markw at illuminae.com> wrote:

> On 22/05/2014 7:05 PM, Fields, Christopher J wrote:
>> The specification *allows* for FASTA to be stored with the features at the end, but it does not *require* it.  I personally never store them together, but my preferences differ from others, which is probably why this is made to be flexible.
> 
> We're actually talking about different things.  The loader script allows you to pass filenames for both/either GFF and FASTA files. If I pass *just* FASTA, I get a bunch of un-named sequences in my database.... which IMO is "just plain wrong"!  LOL!
> 
> M

S’what happens when I start reading the thread a little late. :P

Re: loading raw FASTA alone, I’ll have to try that out, but I agree that is a definite bug (as you mentioned, it should at least do an INSERT OR IGNORE at the sequence indexing step and create a minimal region in cases like that).
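
To make the INSERT OR IGNORE idea concrete, here is a rough sketch using Python's sqlite3 module. The table and column names (sequence, region, seqid, start, stop) and the function name are hypothetical and are not the actual Bio::DB::SeqFeature::Store schema or loader code; the point is only that a FASTA-only load could register a minimal named region per sequence without clobbering a region that a GFF load already created.

```python
import sqlite3


def load_fasta_records(conn, records):
    """Store (seqid, residues) pairs and ensure each has a minimal named region.

    Hypothetical schema for illustration only; not the real BioPerl tables.
    """
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sequence (
            seqid    TEXT PRIMARY KEY,
            residues TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS region (
            seqid TEXT PRIMARY KEY,
            start INTEGER NOT NULL,
            stop  INTEGER NOT NULL
        );
        """
    )
    for seqid, residues in records:
        # The sequence itself is always (re)written.
        conn.execute(
            "INSERT OR REPLACE INTO sequence (seqid, residues) VALUES (?, ?)",
            (seqid, residues),
        )
        # INSERT OR IGNORE: add a whole-sequence region only if no row with
        # this seqid exists yet, so an existing region is left untouched.
        conn.execute(
            "INSERT OR IGNORE INTO region (seqid, start, stop) VALUES (?, 1, ?)",
            (seqid, len(residues)),
        )
    conn.commit()
```

With something along these lines, a sequence loaded from FASTA alone would still end up with a name attached, and a region that already exists for the same identifier would be kept as-is.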

I think if you use the ‘in-memory’ database along with a FASTA file, it will index the FASTA (using Bio::DB::Fasta) and the seqs are available.  Of course, this also indicates the intent is to make seq data somehow available even in the absence of features.
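
As a conceptual illustration of why an indexed FASTA file makes sequences retrievable by name with no features at all, here is a small Python sketch. It is not how Bio::DB::Fasta is implemented; the function names index_fasta and fetch are made up for this example, and it assumes every header line carries a non-empty identifier.

```python
import io


def index_fasta(handle):
    """Map each sequence ID to the byte offset just after its header line."""
    index = {}
    handle.seek(0)
    while True:
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # The ID is the first whitespace-delimited token after '>'.
            seqid = line[1:].split()[0].decode()
            index[seqid] = handle.tell()
    return index


def fetch(handle, index, seqid):
    """Read the residues for seqid, stopping at the next header or EOF."""
    handle.seek(index[seqid])
    chunks = []
    while True:
        line = handle.readline()
        if not line or line.startswith(b">"):
            break
        chunks.append(line.strip())
    return b"".join(chunks).decode()


# Usage with an in-memory file; a file opened in binary mode works the same way.
fasta = io.BytesIO(b">chr1 test sequence\nACGT\nTT\n>chr2\nGG\n")
idx = index_fasta(fasta)
chr1 = fetch(fasta, idx, "chr1")
```

Nothing here depends on features existing, which is the behaviour I would expect from the FASTA-backed case.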

chris


