[BioPython] Reading Roche 454 binary SFF files in Python
Jose Blanca
jblanca at btc.upv.es
Wed Apr 15 07:07:27 UTC 2009
> I was aware that some information was available about the SFF file
> format, and it should be possible to reverse engineer the format in
> order to read and write it directly from Biopython.
The SFF format is fully documented on the NCBI SRA web site:
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=formats#sff
> Right now with your code under the GPL, we can't incorporate it into
> Biopython, but if you and Bastien are prepared to offer it to
> Biopython under our MIT/BSD licence that could be very useful. Even
> without that, any documentation on the file format or example files
> you might be able to share could be valuable.
I guess it wouldn't be a problem to offer you the code under your licence,
but I don't think that's the best approach. The code as it stands is not
well suited to integration in a library, and it would be easier to rewrite
the SFF reading part from scratch. I could do that for you quickly. The main
problem is getting SFF files small enough to be used in the tests. If you
could provide some, I could write the code to extract the information from
the SFF file. It would be easy to build a generator that delivers the
sequences one by one.
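As a rough illustration of the generator idea, here is a minimal sketch in
Python 3. It is not the sff_extract code and not a Biopython API; the name
iter_sff_reads and the dictionary keys are hypothetical. It assumes the
big-endian field layout described in the NCBI documentation (a 31-byte fixed
common header, a 16-byte fixed read header, every section padded to a
multiple of 8 bytes) and that the optional index block sits after the last
read, so it is never met while iterating.

```python
import struct

# Fixed-size parts of the two headers, big-endian, no alignment padding.
# Common header: magic, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code.
COMMON_HEADER = struct.Struct(">IIQIIHHHB")   # 31 bytes
# Read header: read_header_length, name_length, number_of_bases,
# clip_qual_left, clip_qual_right, clip_adapter_left, clip_adapter_right.
READ_HEADER = struct.Struct(">HHIHHHH")       # 16 bytes

SFF_MAGIC = 0x2E736666  # the ASCII bytes ".sff"


def _pad8(size):
    """Number of padding bytes that round size up to a multiple of 8."""
    return (-size) % 8


def iter_sff_reads(handle):
    """Yield one dict per read from a binary file handle (hypothetical helper).

    Assumption: the index block, if present, follows the last read.
    """
    fields = COMMON_HEADER.unpack(handle.read(COMMON_HEADER.size))
    magic, version, _idx_offset, _idx_length, n_reads = fields[:5]
    header_length, _key_length, n_flows, _flow_format = fields[5:]
    if magic != SFF_MAGIC or version != 1:
        raise ValueError("not an SFF version 1 file")
    # Skip the flow characters, the key sequence and the header padding.
    handle.read(header_length - COMMON_HEADER.size)

    for _ in range(n_reads):
        (read_header_length, name_length, n_bases,
         clip_qual_left, clip_qual_right,
         _clip_adapter_left, _clip_adapter_right) = READ_HEADER.unpack(
            handle.read(READ_HEADER.size))
        rest = handle.read(read_header_length - READ_HEADER.size)
        name = rest[:name_length].decode("ascii")

        # Read data: uint16 flowgram values, then three per-base arrays
        # (flow index, base characters, quality scores), then padding.
        data_size = 2 * n_flows + 3 * n_bases
        data = handle.read(data_size + _pad8(data_size))
        flowgram = struct.unpack(">%dH" % n_flows, data[:2 * n_flows])
        pos = 2 * n_flows + n_bases          # skip flow_index_per_base
        bases = data[pos:pos + n_bases].decode("ascii")
        pos += n_bases
        qualities = list(data[pos:pos + n_bases])

        yield {
            "name": name,
            "bases": bases,
            "qualities": qualities,
            "flowgram": flowgram,
            # Clip positions are 1-based in the file; 0 means "not set".
            "clip_qual": (clip_qual_left, clip_qual_right),
        }
```

Usage would look like opening the file in binary mode and looping, for
example "for read in iter_sff_reads(open(path, 'rb')): ...", where path is
whatever small test file is available. Because it is a generator, only one
read is held in memory at a time.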
sff_extract is also able to split paired-end reads. That's the part that
Bastien wrote. Integrating that would be nice, but I think that in Biopython
it should be treated as an independent problem.
> P.S. Have you tested your sff_extract software on SFF files from the
> new Roche v2 software, released about the same time as the "titanium"
> 454 upgrade?
Not me, but I think Bastien has, and he found no problems with it. The SFF
format is well thought out and consistent; the 454 people did a much better
job than the ABI people did with the ABI format.
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)