[Biojava-dev] reading a subsequence from a .nib file
Josh Burdick
jburdick at keyfitz.org
Mon Jan 22 16:46:44 UTC 2007
I wrote some code to read a chunk of DNA sequence from a file in Jim
Kent's blat ".nib" file format. This is a simple format using four
bits/base.
I didn't attach the code, to avoid spamming the whole list, but it and a
(very crude!) JUnit test are at:
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java
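For readers who don't want to fetch the file, here is a minimal sketch of the random-access idea. It is not the NibFile.java code itself. The sketch assumes the layout used by Kent's nib.c: a 4-byte signature 0x6BE93D3A and a 4-byte base count, both in the writing machine's byte order, followed by two bases per byte with the first base in the high nibble. It also assumes nibble codes 0=T, 1=C, 2=A, 3=G, 4=N, with bit 0x8 marking masked (lowercase) bases. The class name NibSubsequenceSketch and its read() method are hypothetical; coordinates are 0-based and half-open.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: read bases [start, end) from a .nib file (layout assumed, see above). */
class NibSubsequenceSketch {
    static final int NIB_SIG = 0x6BE93D3A;   // assumed magic number
    static final int HEADER_BYTES = 8;       // signature + base count
    // Assumed nibble codes: 0=T, 1=C, 2=A, 3=G, 4=N (5..7 treated as N); bit 0x8 = masked
    static final char[] BASES = {'T', 'C', 'A', 'G', 'N', 'N', 'N', 'N'};

    static String read(String path, int start, int end) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] header = new byte[HEADER_BYTES];
            f.readFully(header);
            // The header is in the writer's byte order, so try both orders.
            ByteBuffer bb = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
            if (bb.getInt(0) != NIB_SIG) {
                bb.order(ByteOrder.BIG_ENDIAN);
                if (bb.getInt(0) != NIB_SIG) {
                    throw new IOException("not a .nib file: " + path);
                }
            }
            int length = bb.getInt(4);
            if (start < 0 || end > length || start > end) {
                throw new IllegalArgumentException("bad range [" + start + ", " + end + ")");
            }
            if (start == end) {
                return "";
            }

            // Seek straight to the bytes covering [start, end); two bases per byte.
            int firstByte = start / 2;
            int lastByte = (end - 1) / 2;
            byte[] packed = new byte[lastByte - firstByte + 1];
            f.seek(HEADER_BYTES + (long) firstByte);
            f.readFully(packed);

            StringBuilder sb = new StringBuilder(end - start);
            for (int pos = start; pos < end; pos++) {
                int b = packed[pos / 2 - firstByte] & 0xFF;
                // Even positions sit in the high nibble, odd positions in the low nibble.
                int nib = (pos % 2 == 0) ? (b >> 4) : (b & 0x0F);
                char base = BASES[nib & 0x7];
                sb.append((nib & 0x8) != 0 ? Character.toLowerCase(base) : base);
            }
            return sb.toString();
        }
    }
}
```

Because only the bytes covering the requested range are read after a single seek, pulling a short region out of a chromosome-sized file touches a handful of bytes rather than the whole file.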
You could use 2 bits/base, but then you can't represent ambiguous bases
such as N. Four bits/base seems like a reasonable compromise: sites that
have blat installed will already need the .nib files on a server
somewhere, and the extra bits allow repeat-masking information to be
included, which may be convenient.
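To make the 2-bit versus 4-bit trade-off concrete, here is a small sketch of packing bases into nibbles. The code assignments (T=0, C=1, A=2, G=3, N=4, bit 0x8 as the repeat-mask flag, first base of each pair in the high nibble) are assumed from Kent's nib.c rather than taken from NibFile.java, and NibCodeSketch is a hypothetical name. Three bits cover the five base codes and the fourth records masking; a 2-bit code has only four values, so it has no room for either N or the mask.

```java
/** Hypothetical sketch of 4-bit base codes; code assignments are assumed, see above. */
class NibCodeSketch {
    static final String BASES = "TCAGN";   // assumed order: T=0, C=1, A=2, G=3, N=4
    static final int MASK_BIT = 0x8;       // assumed repeat-mask (lowercase) flag

    /** Encode one base; lowercase (repeat-masked) bases get the mask bit set. */
    static int encode(char c) {
        int code = BASES.indexOf(Character.toUpperCase(c));
        if (code < 0) {
            code = 4;                      // anything unrecognized is stored as N
        }
        return Character.isLowerCase(c) ? (code | MASK_BIT) : code;
    }

    /** Decode one nibble back to a base, restoring lowercase for masked bases. */
    static char decode(int nibble) {
        char base = BASES.charAt(Math.min(nibble & 0x7, 4)); // codes 5..7 fall back to N
        return (nibble & MASK_BIT) != 0 ? Character.toLowerCase(base) : base;
    }

    /** Pack two bases into one byte, first base in the high nibble. */
    static byte pack(char first, char second) {
        return (byte) ((encode(first) << 4) | encode(second));
    }
}
```

With these codes, case survives a round trip: encode('g') sets the mask bit on G's code, and decode() turns it back into a lowercase g, which is how repeat-masking rides along in the same byte as the base.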
Also, it doesn't support writing a .nib file; again, presumably people
will be using Jim Kent's faToNib program to do that.
It would need some tweaking to be included in BioJava, because it
returns a plain String of ACGT rather than a PackedSequence object.
(Probably this would just involve rewriting the setupBuffer() and
addToBuffer() methods in the code.) Also, the coordinate information
could come from a Range object.
If similar code is already somewhere in BioJava, please ignore this;
but I couldn't find it with thirty seconds of Googling, so I figured it
hadn't been written...
Josh Burdick
programmer, Vivian Cheung's lab, Children's Hospital of Philadelphia
jburdick at keyfitz.org