[Biopython] [Biopython-dev] Upcoming NCBI BLAST XML2 format

Jan Kim jttkim at googlemail.com
Fri May 8 10:58:01 UTC 2015


Dear All,

for what it's worth, I agree that spewing out multiple files is a really
bad idea in the context of scripted / automated processing using pipes.

As the XInclude file is designed to be "used to generate a single XML document
that contains results from all the queries in a search", I specifically
would argue that there should be an option (command line switch etc.) for
BLAST to emit that single document, rather than a collection of files.
The coding overhead to provide that must be marginal, and it will save many
of us a substantially larger overhead resulting from generating temporary
directories and cleaning them up.
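
In the meantime, here is a minimal sketch of the stitching workaround in
Python, using only the standard library's xml.etree.ElementInclude. It
assumes that the manifest's xi:include elements are direct children of the
root and that their href attributes are relative paths to per-query files
sitting next to the manifest; I have not checked this against real BLAST
output. The function name stitch_xinclude is made up for illustration.

```python
import os
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def stitch_xinclude(manifest_path):
    """Resolve the xi:include elements of a manifest into one in-memory tree.

    Assumption: every href is a path relative to the manifest's directory.
    """
    base_dir = os.path.dirname(os.path.abspath(manifest_path))

    def loader(href, parse, encoding=None):
        # ElementInclude calls this once per xi:include element.
        path = os.path.join(base_dir, href)
        if parse == "xml":
            return ET.parse(path).getroot()
        # parse="text": return the file content as a string.
        with open(path, encoding=encoding or "utf-8") as handle:
            return handle.read()

    tree = ET.parse(manifest_path)
    # Replaces each xi:include child of the root, in place, with the
    # root element of the file it points to.
    ElementInclude.include(tree.getroot(), loader=loader)
    return tree
```

Calling stitch_xinclude on the manifest and then tree.write(...) on the
result would give the single document, but it still requires all the
per-query files to exist on disk first, so it does not help with pipes.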

Best regards, Jan

On Thu, May 07, 2015 at 12:14:13AM +0000, Fields, Christopher J wrote:
> I agree, it's worth asking NCBI about this.  Now, whether we get an answer or not is another issue...
> 
> chris
> 
> > On May 6, 2015, at 3:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > 
> > Hi Travis,
> > 
> > I've no idea what the rationale is for this bit of the change
> > (other than the existing blast XML abuses the <iteration>
> > tag for multiple queries), but haven't yet tried looking at
> > the example output so I'm not panicking yet.
> > 
> > However, we may want to lobby the NCBI about this...
> > 
> > Peter
> > 
> > On Wed, May 6, 2015 at 5:50 PM, Travis Wrightsman <twrig002 at ucr.edu> wrote:
> >> Peter,
> >> 
> >> It seems that if support for the original single XML output for multiple
> >> queries is dropped then BioPython will need to either stitch together all
> >> the XML files using the base Xinclude file or iterate through all the files
> >> and concatenate them in an object.
> >> 
> >> Does anyone know why NCBI is changing to a multi-file output instead of a
> >> single-file output that is easier to work with programmatically? There must
> >> be someone or some software suite benefiting from this change and it's not
> >> BioPython.
> >> 
> >> Travis
> >> 
> >> On Wed, May 6, 2015 at 7:49 AM, Peter Cock <p.j.a.cock at googlemail.com>
> >> wrote:
> >>> 
> >>> On Wed, May 6, 2015 at 3:22 PM, Martin Mokrejs
> >>> <mmokrejs at fold.natur.cuni.cz> wrote:
> >>>> Hi,
> >>>>  are you aware of new changes in BLAST's XML format? Time for feedback
> >>>> before it emerges. ;-)
> >>>> 
> >>>> ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/xml2.pdf
> >>>> 
> >>>> Martin
> >>> 
> >>> Yes, but thanks for double checking:
> >>> 
> >>> http://lists.open-bio.org/pipermail/biopython-dev/2015-May/020923.html
> >>> 
> >>> I'm a little nervous about the idea that BLAST+ will not provide single
> >>> (large) XML files for multiple-query searches, and instead appears to
> >>> be going to produce one file per query and a manifest xinclude file.
> >>> 
> >>> This sounds problematic for things like parsing via stdout.
> >>> 
> >>> What have you noticed?
> >>> 
> >>> Peter
> >>> _______________________________________________
> >>> Biopython-dev mailing list
> >>> Biopython-dev at mailman.open-bio.org
> >>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
> >> 
> >> 
> > _______________________________________________
> > Biopython mailing list  -  Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
> 
> 

-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

