[Biojava-l] Re: [Biojava-dev] reading pdb format or using tagvalue?
Matthew Pocock
matthew_pocock@yahoo.co.uk
Fri, 17 Jan 2003 11:56:40 +0000
Hi Russell,
The tag-value stuff assumes that each line can be broken into a single
tag with a value. Things like pdb don't look quite like this (multiple
types of values on some lines), but I recently added some handlers to
fool the system. You will need the 1.3 snapshot and a Java 1.4 or
higher VM.
Start off by creating a LineSplitParser instance. You will then have to
configure it to match PDB. For example, each record seems to have a 6
char tag, so you need to call lsp.setSplitOffset(6). Also, every line is
a new piece of data (unlike EMBL, where multiple lines with the same tag
are part of the same entry), so you need to call
lsp.setMergeSameTag(false). Continue in this vein until you think you
have something that should process the skeleton of the file.
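
To make those two settings concrete, here is a plain-Java sketch of what a
fixed-offset split with no merging amounts to. It does not use the tagvalue
classes at all: the class name PdbLineSplit, its methods and the sample
lines are made up for illustration, and the real LineSplitParser may differ
in details such as trimming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of BioJava.
public class PdbLineSplit {

    // Split one line at a fixed column: the first `offset` characters are
    // the tag, everything after them is the value. Both are trimmed here,
    // which is an assumption about what you would want for PDB.
    public static String[] split(String line, int offset) {
        if (line.length() <= offset) {
            // Short records such as "END" carry a tag and no value.
            return new String[] { line.trim(), "" };
        }
        return new String[] {
            line.substring(0, offset).trim(),
            line.substring(offset).trim()
        };
    }

    // One record per input line: consecutive lines with the same tag are
    // deliberately NOT merged, mirroring the "every line is new data" rule.
    public static List<String[]> splitAll(List<String> lines, int offset) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            records.add(split(line, offset));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up lines in the general shape of PDB records.
        List<String> lines = Arrays.asList(
            "HEADER    HYDROLASE                               01-JAN-00   1ABC",
            "REMARK   1 FIRST MADE-UP REMARK",
            "REMARK   1 SECOND MADE-UP REMARK");
        for (String[] record : splitAll(lines, 6)) {
            System.out.println("[" + record[0] + "] [" + record[1] + "]");
        }
    }
}
```

The two REMARK lines come out as two separate records, which is the
behaviour that turning off same-tag merging is meant to give you.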
Then, look at the demo code under demos-1.4/unigene/ParseUnigene.java
for a simple skeleton for hooking your customized parser to some debug
output. Once this is done, you should be able to see what kind of job
it's made of the pdb entries.
Now comes the fun bit. The values so far will be single strings for the
entire bit of the line that's not a tag. This is next to useless. You
really need to tokenize each line. You do this using a combination of
TagDelegator and RegexFieldFinder. Let's call the instance of
TagDelegator td. Now, for example, call td.setListener("HEADER",
headerHandler). You can make headerHandler an instance of
RegexFieldFinder, configure it with a regex to match the name and date
and ID, and name them sanely. Don't forget to pass in your debug
listener as the delegate for headerHandler - that way the events will
get dumped out. For entries like AUTHOR that are lists, you can
associate a listener that splits the output up. Use ChangeTable,
RegexSplitter and ValueChanger to describe the process.
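
As a rough picture of the tokenizing step, the plain-Java sketch below pulls
the name, date and ID out of a HEADER value with a regex and splits an
AUTHOR value on commas. It uses only java.util.regex, not RegexFieldFinder
or RegexSplitter. The class PdbFieldSketch, the regex, the field names and
the sample values are all assumptions for illustration, so check them
against real files before relying on them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of BioJava.
public class PdbFieldSketch {

    // Assumed layout of a HEADER value once the 6-char tag is stripped:
    // free-text name, a DD-MON-YY date, then a 4-character ID.
    private static final Pattern HEADER = Pattern.compile(
        "^(.+?)\\s+(\\d{2}-[A-Z]{3}-\\d{2})\\s+(\\w{4})\\s*$");

    // Turn one HEADER value into named fields, in a stable order.
    public static Map<String, String> parseHeader(String value) {
        Matcher m = HEADER.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a HEADER value: " + value);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", m.group(1));
        fields.put("date", m.group(2));
        fields.put("id", m.group(3));
        return fields;
    }

    // Split a list-valued entry such as AUTHOR into its items.
    public static List<String> splitAuthors(String value) {
        return Arrays.asList(value.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // Made-up values, not taken from a real PDB entry.
        System.out.println(parseHeader(
            "HYDROLASE                               01-JAN-00   1ABC"));
        System.out.println(splitAuthors("A.N.OTHER, J.DOE"));
    }
}
```

In the tag-value setup, each named field or list item would be passed on to
the delegate listener as its own event instead of being collected into a
Map or List as it is here.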
Sorry, this has got too long already. See how far you can get on your
own and then pester me. It's not that hard to write these things once
you're up to speed, but there's a steep learning curve.
Matthew
Russell Smithies wrote:
>
> Hi,
> Has anyone got an example of how to use Matthew's new
> biojava\bio\program\tagvalue package?
>
> I want to read 'tags' off .pdb files and get the property (atom x,y,z
> coords) back, and to do this for many files (everything Brookhaven/RCSB
> has, maybe?), so converting to xml first is probably a bit too
> time/resource-consuming.
>
> Maybe creating new Annotations is the better way to do it?
> Or can I trick SeqIOTools.readEmbl() to do it?
>
> Any ideas?
>
> thanx
> Russell
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk