[Bioperl-l] BLAST to FeaturePair

hilmar.lapp@pharma.Novartis.com
Mon, 31 Jul 2000 18:19:20 +0100

>
> This could be done client side: Keep BPLite just "representing BLAST"
> without too much magic. But stripping '>' seems sane.

i can do that. what should be stored in $feature->seqname?
(but as there is no full sequence object, i can't store ids and accs,
right?)

     Generally, as I understand it, the name of a sequence should be at
     least somewhat unique, like an identifier. In a BLAST report, each
     alignment section starts with '>', followed by the hit's name in the
     database (no spaces), whitespace, the accession (usually), more
     whitespace, and the description. So I'd store as seqname only what
     matches />(\S+)/. Of course, the caller can do this as well.
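A minimal sketch of that extraction, assuming the header layout described above. The helper name `seqname_from_hit_line` and the sample header are made up for illustration; unlike the bare `/>(\S+)/`, the regex here is anchored at the start of the line so a '>' inside a description cannot match.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BPLite): return the seqname of a hit,
# i.e. the first run of non-whitespace after the leading '>' of an
# alignment-section header line, or undef if the line is not a header.
sub seqname_from_hit_line {
    my ($line) = @_;
    return $line =~ /^>(\S+)/ ? $1 : undef;
}

# Made-up header line: name, accession, description.
my $hit = '>sp|P00001|EXAMPLE ACC00001 made-up description';
print seqname_from_hit_line($hit), "\n";    # prints sp|P00001|EXAMPLE
```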

> > b) the lengths of the sequences are not stored (would require additional
> > parsing code),

ok, i can parse the length, but how should i store it?

     I'm working on this.
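Parsing the length itself is the easy part. The sketch below assumes the NCBI-style layout in which a `Length = N` line (or `Length=N`) follows the '>' header of each alignment section; the helper name and sample lines are hypothetical, and how the value should be stored is left open, as in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the lines of one alignment section and return
# the subject length from its "Length = N" line (assumed NCBI layout).
# Thousands separators, if a report uses them, are stripped.
sub hit_length {
    my (@lines) = @_;
    for my $line (@lines) {
        if ($line =~ /^\s*Length\s*=\s*([\d,]+)/) {
            (my $len = $1) =~ tr/,//d;
            return $len;
        }
    }
    return;    # no length line found
}

my @section = (
    '>sp|P00001|EXAMPLE ACC00001 made-up description',
    '            Length = 350',
    '',
    ' Score = 50.4 bits (119), Expect = 2e-06',
);
print hit_length(@section), "\n";    # prints 350
```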

> > c) properties of the alignment are stored as 'new' tags, instead of
> > through the tag system. This keeps them from being easily de/serialized
> > through the gff_string()/_from_gff_string() methods. (BTW, does the
> > string returned by $bplite_hsp->homologySeq() make sense to anyone?)
>
> Talking to Lorenz - I'm not sure about this.

what properties do you mean? score, bits, P value, matching, positives
and such things? if so then hilmar is right, they are not stored through
the tag system.

     Yes, that's what I mean. I'm working on a class that offers better
     support for this, so you could inherit from it instead of
     SeqFeature::Generic/FeaturePair.
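To illustrate what "through the tag system" could look like on the parsing side, here is a sketch that turns the HSP summary lines into a flat tag => value hash instead of ad hoc object fields. It assumes NCBI BLAST 2.0-style lines (`Score = ... bits (...)`, `Expect = ...`, `Identities = a/b`, `Positives = c/d`); the helper and tag names are hypothetical and are not the class Hilmar mentions.

```perl
use strict;
use warnings;

# Hypothetical helper: collect HSP properties as tag => value pairs,
# assuming NCBI BLAST 2.0-style summary lines.
sub hsp_tags {
    my (@lines) = @_;
    my %tag;
    for my $line (@lines) {
        if ($line =~ /Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\)/) {
            $tag{bits}  = $1;    # bit score
            $tag{score} = $2;    # raw score
        }
        if ($line =~ /Expect(?:\(\d+\))?\s*=\s*([^,\s]+)/) {
            $tag{expect} = $1;
        }
        if ($line =~ /Identities\s*=\s*(\d+)\/(\d+)/) {
            $tag{identical}  = $1;
            $tag{hsp_length} = $2;    # denominator of the Identities fraction
        }
        if ($line =~ /Positives\s*=\s*(\d+)\//) {
            $tag{positive} = $1;
        }
    }
    return \%tag;
}

my $tags = hsp_tags(
    ' Score = 50.4 bits (119), Expect = 2e-06',
    ' Identities = 30/100 (30%), Positives = 45/100 (45%)',
);
print join(', ', map { "$_=$tags->{$_}" } sort keys %$tags), "\n";
# prints bits=50.4, expect=2e-06, hsp_length=100, identical=30, positive=45, score=119
```

Because everything ends up in one hash, a generic serializer only has to walk that hash, which is the point of using the tag system.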

why does the homologySeq make no sense?
(i just adopted it from the original BPlite...)

     It contains only bars and spaces, and I wouldn't know what to do
     with that. The gaps are lost, so you cannot use it to locate
     mismatched bases.
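For comparison, the gapped query and subject strings of an HSP do carry enough information: a sketch that walks both strings column by column can tell a gap from a mismatch and report query coordinates. The helper name and the toy alignment are hypothetical, and a plus-strand query is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: classify alignment columns from the gapped query
# and subject strings. $qstart is the query coordinate of the first
# aligned residue (plus strand assumed).
sub mismatches {
    my ($qseq, $sseq, $qstart) = @_;
    my @out;
    my $qpos = $qstart;
    for my $i (0 .. length($qseq) - 1) {
        my $q = substr($qseq, $i, 1);
        my $s = substr($sseq, $i, 1);
        if ($q eq '-') {
            # gap in the query: the query coordinate does not advance
            push @out, "insertion in subject before query $qpos";
            next;
        }
        if ($s eq '-') {
            push @out, "deletion in subject at query $qpos";
        }
        elsif (uc($q) ne uc($s)) {
            push @out, "mismatch $q/$s at query $qpos";
        }
        $qpos++;
    }
    return @out;
}

print "$_\n" for mismatches('ACGT-ACGT', 'ACCTTAC-T', 101);
# prints:
# mismatch G/C at query 103
# insertion in subject before query 105
# deletion in subject at query 107
```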

and could someone please explain to me what's the purpose of those
gff_string methods?

     GFF = General Feature Format (it used to stand for Gene Finding
     Format). I do not know the URL by heart, but it's documented on the
     Sanger site (www.sanger.ac.uk). It is a simple ASCII exchange format,
     and the methods for de/serializing feature objects to and from it are
     mostly written. They take care of de/serializing the tag system; to
     de/serialize anything else, you would have to override them in each
     derived class (which is not a bad thing to do, but if I can, I'd
     rather save the work).
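A sketch of why the tag system makes serialization cheap: a generic writer only needs the nine standard GFF columns plus the tag hash. The attribute column below uses a GFF2-like `tag "value"; tag "value"` layout. This is an illustration with a hypothetical `to_gff` helper, not the exact output of BioPerl's `gff_string()`.

```perl
use strict;
use warnings;

# Hypothetical serializer: one tab-separated GFF line (seqname, source,
# feature, start, end, score, strand, frame, attributes). Missing score,
# strand or frame are written as '.'. Only the tag hash is serialized.
sub to_gff {
    my (%f) = @_;
    my $tags = $f{tags} || {};
    my $attr = join '; ', map { qq{$_ "$tags->{$_}"} } sort keys %$tags;
    return join "\t",
        $f{seqname}, $f{source}, $f{primary}, $f{start}, $f{end},
        (defined $f{score}  ? $f{score}  : '.'),
        (defined $f{strand} ? $f{strand} : '.'),
        (defined $f{frame}  ? $f{frame}  : '.'),
        $attr;
}

# Anything kept in the tag hash travels with the line; a value stored
# elsewhere in the object would be silently dropped by this writer.
print to_gff(
    seqname => 'query1', source => 'BLASTN', primary => 'similarity',
    start   => 101,      end    => 109,      score   => 50.4,
    strand  => '+',
    tags    => { expect => '2e-06', hit => 'HIT_0001' },
), "\n";
```

Reading the line back is the mirror image: split on tabs and parse the attribute column into the tag hash, which is why properties kept outside the tags would need per-class overrides.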

          Hilmar