[Bioperl-l] Re: Question on the scf file format

Heikki Lehvaslaiho heikki at ebi.ac.uk
Mon Oct 27 11:24:52 EST 2003


Guillaume,

In other words: Tony's fixes are only in the CVS head and in the developer
releases on our website. They never made it into the 1.2 release
series. It was an oversight, for which I apologise profusely.

Please install 1.3.02 from
http://bioperl.org/DIST/current_core_unstable.tar.gz
and try again.
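
In the meantime, here is a minimal Python sketch of what a reader has to do
for version 3 files, as far as I understand the SCF specification. Each trace
is stored as second-order differences, so the values have to be summed twice,
and every intermediate result has to be truncated to the sample width (the
"cast" Tony mentions). The function name and the choice of Python are only
for illustration; this is not the code in SCF.pm. The input is the first 14
words of the A trace, copied from the hex dump quoted below.

```python
import struct

def decode_v3_trace(raw, sample_size=2):
    """Undo the second-order delta encoding of one SCF v3 trace (sketch)."""
    count = len(raw) // sample_size
    code = "H" if sample_size == 2 else "B"      # unsigned 16-bit or 8-bit
    values = list(struct.unpack(">%d%s" % (count, code), raw))
    mask = 0xFFFF if sample_size == 2 else 0xFF
    # Two running sums. Masking after each addition emulates the unsigned
    # 16-bit (or 8-bit) cast, so "negative" deltas such as FF60 wrap around.
    for _ in range(2):
        running = 0
        for i, value in enumerate(values):
            running = (running + value) & mask
            values[i] = running
    return values

# Bytes 0x80..0x9B of the dump: the start of the A trace.
a_raw = bytes.fromhex(
    "00A0FF60000000100003FFFEFFFDFFFB"
    "FFFEFFFAFFFBFFFDFFFBFFFF"
)
a_trace = decode_v3_trace(a_raw)
print(a_trace)
# [160, 160, 160, 176, 195, 212, 226, 235, 242, 243, 239, 232, 220, 207]
```

Summing only once, or summing without the truncation, gives different
numbers, which would be consistent with the symptoms described.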

Yours,
	-Heikki




On Mon, 2003-10-27 at 15:38, Tony Cox wrote:
> On Mon, 27 Oct 2003, Guillaume Giraudon wrote:
> 
> Hi Guillaume,
> 
> This looks like it may be the "cast" you have to apply if you are using
> 8bit/16bit data values. Check the code I added to the Bioperl SCF.pm module to
> make this work properly for all SCF files.
> 
> Tony
> 
> 
> 
> +>Hi Jason, Hi Tony, Hi Heikki
> +>
> +>I'm sorry to bother you with this simple question but I saw one of your source files (scf.pm) on the web and thought you might be able to help me out on this matter : I am trying to write a web-based scf file viewer (in php). I came across a lot of documents that all seem to be based on the RFC I found at
> +>http://www.mrc-lmb.cam.ac.uk/pubseq/scf-rfc.html
> +>
> +>I have attached a zip file of the files I'm working with so that you might take a look at them. I'm comparing the results I get from my program with what Chromas (v1.45) gives me. So far, I believe I'm parsing the header correctly. What I get makes sense :
> +>
> +>scf_header Object
> +>(
> +>    [magic_number] => 779314022
> +>    [samples] => 10934
> +>    [samples_offset] => 128
> +>    [bases] => 899
> +>    [bases_left_clip] => 0
> +>    [bases_right_clip] => 0
> +>    [bases_offset] => 87600
> +>    [comments_size] => 364
> +>    [comments_offset] => 98388
> +>    [version] => 3.00
> +>    [sample_size] => 2
> +>    [code_set] => 0
> +>    [private_size] => 0
> +>    [private_offset] => 0
> +>    [spare] =>
> +>)
> +>
> +>Now when I start parsing the Samples section, I get confused. From what I can gather, it's composed of delta differences between successive samples (and not the values themselves, as I originally thought).
> +>
> +>Strangely, I believe I'm calculating my offsets fine because the very first values of all A, C, G and T match what I have in the raw_data.txt file (exported with chromas). But I can't seem to read the rest of the samples correctly.
> +>
> +>Here is a little HEX extract from the scf file :
> +>
> +>00000000h: 2E 73 63 66 00 00 2A B6 00 00 00 80 00 00 03 83 ; .scf..*¶...€...ƒ
> +>00000010h: 00 00 00 00 00 00 00 00 00 01 56 30 00 00 01 6C ; ..........V0...l
> +>00000020h: 00 01 80 54 33 2E 30 30 00 00 00 02 00 00 00 00 ; ..€T3.00........
> +>00000030h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000040h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000050h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000060h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000070h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000080h: 00 A0 FF 60 00 00 00 10 00 03 FF FE FF FD FF FB ; . ÿ`......ÿþÿýÿû
> +>00000090h: FF FE FF FA FF FB FF FD FF FB FF FF FF FF 00 00 ; ÿþÿúÿûÿýÿûÿÿÿÿ..
> +>000000a0h: FF FF 00 01 00 00 FF FD 00 01 FF FD 00 03 00 00 ; ÿÿ....ÿý..ÿý....
> +>000000b0h: 00 04 00 01 00 03 00 01 00 03 00 00 00 02 FF FF ; ..............ÿÿ
> +>000000c0h: 00 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>
> +>The first 128 bytes are the header and I seem to read that part fine. My samples_offset is 128 (80h) and my sample_size is 2. At that location (80h), the first value is '00 A0' which is 160 in decimal. That’s exactly my first value so that looks about right. But then I have FF 60. Is this supposed to be a delta ?
> +>I thought it could be but then, if it's an unsigned value, it’s a bit huge.
> +>I tried to consider it as a signed value but then again, I can't seem to get the same thing as Chromas.
> +>
> +>Here is a short extract of what I get :
> +>
> +>A	C	G	T
> +>160	0	72	6
> +>-160	0	-84	-8
> +>0	0	-5	-1
> +>16	0	2	2
> +>3	0	3	1
> +>-2	0	2	0
> +>-3	0	5	0
> +>-5	0	4	0
> +>-2	0	1	0
> +>-6	0	0	0
> +>-5	0	0	0
> +>-3	0	0	0
> +>-5	0	0	0
> +>-1	0	0	0
> +>
> +>Any idea of what I might be doing wrong ?
> +>
> +>Thank you in advance,
> +>
> +>G.Giraudon
> +>
> +>
> +>
> 
> ******************************************************
> Tony Cox			Email:avc at sanger.ac.uk
> Sanger Institute		WWW:www.sanger.ac.uk
> Wellcome Trust Genome Campus	Head,Software Services
> Hinxton				Tel: +44 1223 834244
> Cambs. CB10 1SA			Fax: +44 1223 494919
> ******************************************************
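
For anyone checking their own header parsing against the dump quoted above,
here is a small Python sketch. It assumes the field order shown in
Guillaume's object dump (nine big-endian 32-bit integers, the 4-character
version string, then four more 32-bit integers, with the rest of the 128
bytes spare). The names FIELDS and parse_scf_header are made up for this
example and are not part of Bioperl.

```python
import struct

FIELDS = ("magic_number", "samples", "samples_offset", "bases",
          "bases_left_clip", "bases_right_clip", "bases_offset",
          "comments_size", "comments_offset", "version",
          "sample_size", "code_set", "private_size", "private_offset")

def parse_scf_header(data):
    """Decode the fixed fields of an SCF header; spare words are ignored."""
    # ">" selects big-endian with no padding: 9*4 + 4 + 4*4 = 56 bytes.
    values = struct.unpack(">9I4s4I", data[:56])
    header = dict(zip(FIELDS, values))
    header["version"] = header["version"].decode("ascii")
    return header

# The first 56 bytes of the hex dump.
header_bytes = bytes.fromhex(
    "2E73636600002AB60000008000000383"
    "0000000000000000000156300000016C"
    "00018054332E30300000000200000000"
    "0000000000000000"
)
header = parse_scf_header(header_bytes)

# Four traces (A, C, G, T) of `samples` values, `sample_size` bytes each.
trace_bytes = 4 * header["samples"] * header["sample_size"]
print(header["samples"], header["version"], header["bases_offset"])
print(header["samples_offset"] + trace_bytes)
```

With these bytes the end of the samples section works out to 87600, the same
as bases_offset, so the header values are at least self-consistent.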
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________


