[Bioperl-l] XML BLAST parsing & accessions

T.D. Houfek tdhoufek@unity.ncsu.edu
Thu, 20 Jun 2002 16:12:55 -0400 (EDT)


From the %MAPPING hash in Bio::SearchIO::blastxml.pm it appears that the
ID taken from the <BlastOutput_query-ID> tag is mapped to 'runid'.  I'm
not sure how to go about accessing that... and maybe <BlastOutput_query-ID>
should map to something else besides 'queryname' if the name/ID is no longer
stuck in that tag.
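
To make the idea concrete, here is a minimal sketch of how the two tags could be recombined outside of BioPerl. It is written in Python purely for illustration and is not how Bio::SearchIO::blastxml works internally. The function name query_fields, the trimmed-down SAMPLE report, and the fallback rule (split <BlastOutput_query-def> on the first whitespace when the ID tag is absent or empty) are all assumptions made for this example; the accession guess simply takes whatever follows the last '|'.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down report holding only the two tags quoted in this
# thread; a real BLAST XML report contains many more elements.
SAMPLE = """<BlastOutput>
  <BlastOutput_query-ID>gnl|NCSU_FGL.blast|03E20.Contig1</BlastOutput_query-ID>
  <BlastOutput_query-def>M. grisea project xsal BAC03E20 Contig 1</BlastOutput_query-def>
</BlastOutput>"""


def query_fields(xml_text):
    """Return name, description and a guessed accession for the query.

    Assumption: when <BlastOutput_query-ID> is present and non-empty it
    carries the name, and <BlastOutput_query-def> is the description.
    Otherwise the first whitespace-delimited token of query-def is taken
    as the name and the remainder as the description.
    """
    root = ET.fromstring(xml_text)
    query_id = (root.findtext("BlastOutput_query-ID") or "").strip()
    query_def = (root.findtext("BlastOutput_query-def") or "").strip()

    if query_id:
        name, description = query_id, query_def
    else:
        parts = query_def.split(None, 1)
        name = parts[0] if parts else ""
        description = parts[1] if len(parts) > 1 else ""

    # Guess the accession as the text after the last '|' (ignoring a
    # trailing '|'); this is a simplification, not a full ID parser.
    accession = name.rstrip("|").rsplit("|", 1)[-1]
    return {"name": name, "description": description, "accession": accession}


if __name__ == "__main__":
    print(query_fields(SAMPLE))
```

With the header line from my output, this yields the full gnl|... string as the name and the free-text part as the description, which is what I would want query_name and the description to hold.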

I'll look at this some more, but my brain is now underscoring the
tremendous importance of taking a nap on the laboratory couch.

TD

On Thu, 20 Jun 2002, T.D. Houfek wrote:

> Hi Jason,
>
> > TD - good to see you on list -
>
> Thanks!  It's good to be here.  :-)
>
> > this is entirely dependent on what BLAST
> > does, i.e. I implemented it so it just pulls what is in
> > <BlastOutput_query-def> </> into query_name and then it takes the first
> > white space delimited section (i.e.) /(\S+)\s+(\S+)/ -- and makes that the
> > name, and the second one is the description.  It tries to guess the
> > accession as well based on the last '|'
>
> Aha... that's gotta be the problem then.  In my output,
> <BlastOutput_query-def> has apparently already performed some operation
> like (\S+)\s+(\S+), and taken only $2.  So with a header line like:
>
> >gnl|NCSU_FGL.blast|03E20.Contig1  M. grisea project xsal BAC03E20 Contig 1
>
> I get something like this:
>
> <BlastOutput_query-def>M. grisea project xsal BAC03E20 Contig 1</BlastOutput_query-def>
>
> And the other needed information is currently put in a
> <BlastOutput_query-ID> tag:
>
> <BlastOutput_query-ID>gnl|NCSU_FGL.blast|03E20.Contig1</BlastOutput_query-ID>
>
> I went to check what version I have and can't for the life of me figure
> out where the distribution hides the information (no -v or -V stuff seems
> to work... they tell you the info is in a file that isn't there, etc).
> But it is a very recent version; a few months ago they made changes to the
> format of their databases, and this version postdates that change.

T.D. Houfek

system administrator
Fungal Genomics Laboratory
Center for Integrated Fungal Research (CIFR)
North Carolina State University
ph: (919)513-0025  e: tdhoufek@unity.ncsu.edu