[Biojava-dev] reading a subsequence from a .nib file

Josh Burdick jburdick at keyfitz.org
Mon Jan 22 16:46:44 UTC 2007


  I wrote some code to read a chunk of DNA sequence from a file in Jim
Kent's blat ".nib" format.  This is a simple format that uses four
bits per base, so two bases are packed into each byte.
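
  For anyone who hasn't looked inside a .nib file, here is a minimal
sketch of the idea.  It is not the NibFile.java linked in this message.
It assumes the layout as I understand it from the UCSC file format
notes: a 32-bit signature 0x6BE93D3A, a 32-bit base count (both in the
byte order of the machine that wrote the file), then two bases per byte
with the first base in the high nibble, codes 0=T 1=C 2=A 3=G 4=N, and
the top bit of each nibble set for masked bases.  The names NibSketch
and readBases are made up for this example, coordinates are taken to be
0-based, and masked bases come back in lower case.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class NibSketch {
    // Signature value assumed from the UCSC description of the format.
    static final int SIGNATURE = 0x6BE93D3A;
    // Assumed nibble codes: 0=T, 1=C, 2=A, 3=G, 4=N.
    static final char[] BASES = {'T', 'C', 'A', 'G', 'N'};

    // Decode one 4-bit value; the high bit marks a masked (lower-case) base.
    static char decode(int nibble) {
        int code = nibble & 0x7;
        char base = code < BASES.length ? BASES[code] : 'N';
        return (nibble & 0x8) != 0 ? Character.toLowerCase(base) : base;
    }

    // Read `length` bases starting at 0-based position `start`.
    static String readBases(String path, int start, int length)
            throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            // readInt() is big-endian; swap if the file was written
            // on a little-endian machine.
            int sig = in.readInt();
            boolean swapped;
            if (sig == SIGNATURE) {
                swapped = false;
            } else if (Integer.reverseBytes(sig) == SIGNATURE) {
                swapped = true;
            } else {
                throw new IOException("bad .nib signature: " + path);
            }
            int size = in.readInt();
            if (swapped) {
                size = Integer.reverseBytes(size);
            }
            if (start < 0 || length < 0 || start > size - length) {
                throw new IllegalArgumentException("range outside sequence");
            }
            if (length == 0) {
                return "";
            }
            // Only the bytes covering the requested range are read.
            int firstByte = start / 2;
            int lastByte = (start + length - 1) / 2;
            byte[] buf = new byte[lastByte - firstByte + 1];
            in.seek(8L + firstByte);
            in.readFully(buf);

            StringBuilder sb = new StringBuilder(length);
            for (int i = start; i < start + length; i++) {
                int b = buf[i / 2 - firstByte] & 0xFF;
                // Even positions sit in the high nibble, odd in the low.
                sb.append(decode(i % 2 == 0 ? b >> 4 : b & 0xF));
            }
            return sb.toString();
        }
    }
}
```

  Something like NibSketch.readBases("chr1.nib", 1000, 50) would then
return 50 bases starting at offset 1000 (the file name is only an
example).  Seeking straight to byte 8 + start/2 is what makes pulling a
small chunk out of a whole chromosome cheap.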

  To avoid spamming the whole list I haven't attached the code, but it
and a (very crude!) JUnit test are at:

http://www.keyfitz.org/jburdick/read_nib_file_java/NibFile.java
http://www.keyfitz.org/jburdick/read_nib_file_java/NibFileTest.java

  You could use 2 bits/base, but then you can't represent ambiguous
bases.  4 bits/base seems like a reasonable compromise.  Sites that have
"blat" installed will need to have the .nib files on a server somewhere
anyway, and with 4 bits/base repeat-masking can be stored alongside the
sequence, which may be convenient.

  Also, the code doesn't support writing a .nib file; presumably people
will use Jim Kent's faToNib program to do that.

  It would need some tweaking to be included in BioJava, because it
returns a plain String of ACGT, instead of a PackedSequence object.
(Probably this would just involve rewriting the setupBuffer() and
addToBuffer() methods in the code.)  Also, the coordinate information
could come from a Range object.

  If similar code is already somewhere in BioJava, please ignore this;
but I couldn't find it with thirty seconds of Googling, so I figured it
hadn't been written...

Josh Burdick
programmer, Vivian Cheung's lab, Children's Hospital of Philadelphia
jburdick at keyfitz.org
