[Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files

Brian Osborne osborne1 at optonline.net
Thu Mar 9 14:26:40 EST 2006


Chris,

> Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI using
> return type of 'gbwithparts' so the work is done on their end and just

I think it's reasonable to use eutils this way, yes. It's no longer "pure
Bioperl", but all of this stuff depends on eutils anyway. The downside is
that their API may change, but it looked like you wrote some tests for
this, yes? Just my opinion.
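
For anyone following along, the kind of eutils request being discussed can
be sketched roughly as below. This is only an illustration in Python, not
the NCBIHelper code: the function name efetch_url is made up, the accession
in the usage note is a placeholder, and db=nuccore with retmode=text is an
assumption about how one would ask efetch for a 'gbwithparts' record.

```python
from urllib.parse import urlencode

# Public efetch endpoint of NCBI's E-utilities.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gbwithparts"):
    """Build (but do not send) an efetch URL for one nucleotide record.

    'gbwithparts' asks NCBI to return the GenBank record with the
    sequence already assembled, instead of only the CONTIG line.
    """
    params = {
        "db": "nuccore",       # assumption: nucleotide database name
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)
```

Calling efetch_url("AAAA01000001.1") (a placeholder accession) only builds
the URL string; fetching it is left to whatever HTTP client is in use.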

I believe the missing Ns are a bug on Bioperl's part, caused by the
inability of the Bio::Location code to understand NCBI's gaps(). If there
are Ns in the sequence we shouldn't just be deleting them; that's not good.
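
To make the expected behaviour concrete, here is a minimal sketch of
building a contig from a CONTIG join() while keeping the gaps as runs of N.
It is written in Python as a standalone illustration and is not the
Bio::Location or NCBIHelper code. The accessions, the PARTS dictionary and
the function names are all hypothetical, and it only handles the simple
forms acc:start..end, complement(acc:start..end) and gap(N) / gap(unkN).

```python
import re

# Hypothetical stand-in for the part sequences that would come from NCBI.
PARTS = {
    "AAAA01000001.1": "ACGTACGTAC",
    "AAAA01000002.1": "TTGGCCAA",
}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# One join() element: a gap, or a (possibly complemented) accession range.
_ELEMENT = re.compile(
    r"gap\((?:unk)?(?P<gap>\d+)\)"
    r"|(?P<comp>complement\()?"
    r"(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)\)?"
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_contig(contig_line, parts):
    """Assemble a contig sequence, filling each gap(N) with N copies of 'N'."""
    body = contig_line.strip()
    if body.startswith("join(") and body.endswith(")"):
        body = body[5:-1]
    pieces = []
    for m in _ELEMENT.finditer(body):
        if m.group("gap") is not None:
            # Keep the gap: emit filler instead of silently dropping it.
            pieces.append("N" * int(m.group("gap")))
            continue
        start, end = int(m.group("start")), int(m.group("end"))
        sub = parts[m.group("acc")][start - 1:end]  # 1-based, inclusive
        pieces.append(reverse_complement(sub) if m.group("comp") else sub)
    return "".join(pieces)
```

With the toy data above,
build_contig("join(AAAA01000001.1:3..6,gap(5),complement(AAAA01000002.1:1..4))", PARTS)
gives GTACNNNNNCCAA; a build that skips the gap element would give
GTACCCAA instead, which is the kind of difference Chris saw.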

Brian O


On 3/9/06 1:08 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> Added WGS and WGS_SCAFLD support to Bio::SeqIO::genbank as well as tests and
> WGS sample file; the previous fix missed the WGS_SCAFLD line.  I will also
> soon add support to Bio::DB::GenBank for downloading WGS and WGS_SCAFLD
> subfiles.
> 
> Brian, I found a pretty decent speed improvement for contig building in
> Bio::DB::NCBIHelper; it basically fetches the contig whole from NCBI using
> return type of 'gbwithparts' so the work is done on their end and just
> switches the CONTIG line with the sequence; it took about 10 seconds vs. ~50
> seconds using an unmodified NCBIHelper on my PC.  I haven't committed it yet
> because I noticed the resulting contig files differ; the bioperl contig build
> lacks any N's from the 'gaps()' in the CONTIG line while NCBI's version has
> the N filler.  I didn't know if the difference was a bug or not.  Should I
> go ahead and commit?
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 



