[Bioperl-l]ABI.pm and .ab1 files

Malay mbasu at mail.nih.gov
Wed Apr 28 21:40:10 EDT 2004


Kevin Roland Viel wrote:
> Malay,
> 
>    Thanks.  I diverted some attention to perl, but just not enough :(
> 
>    This still frustrates me, but I have found a kludge.  I use phred so I
> just added the -cd <directory> -cp 2.  The SCF file produced has a 128
> byte header followed by the data of interest.  I read this file directly
> using SAS (using the S370FPIB2. informat).  The file is in big endian
> format.  If anyone following this thread could suggest where and how
> the ab1 format stores these data, I would be very thankful, as it would
> save space (why keep both a .scf and an .ab1 file?).

Here you go:

An ABI file may start with a 128-byte header (if it was generated on a
Mac). So check bytes 0-3, and failing that bytes 128-131, for the string
"ABIF". If the Mac header is present, add 128 to every byte position and
every offset used below.
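A minimal Perl sketch of that check, assuming the whole file has been read into memory as raw bytes. The sub names (slurp_file, abif_start) are hypothetical helpers for illustration, not part of ABI.pm or BioPerl.

```perl
use strict;
use warnings;

# Read a whole file as raw bytes (no newline or encoding translation).
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                 # slurp mode: read everything at once
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Return where the ABIF data starts: 0 for a plain file, 128 when a
# Mac header precedes it, or undef when "ABIF" is found at neither place.
sub abif_start {
    my ($bytes) = @_;
    for my $start (0, 128) {
        return $start
            if length($bytes) >= $start + 4
            && substr($bytes, $start, 4) eq 'ABIF';
    }
    return;
}
```

For example, `abif_start(slurp_file('mysequence.ab1'))` gives the value to add to every position used in the later steps.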

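Bytes 18-21 hold a 32-bit integer giving the number of entries in the
index table. Call this number N. (All integers in the file are big-endian.)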

Bytes 26-29 hold a 32-bit integer giving the offset of the index table.
Call this number A.
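Reading N and A can be sketched in Perl as below, assuming the file is held in a byte string and that `$start` is 0 or 128 depending on the Mac header. Perl's `N` unpack template reads an unsigned 32-bit big-endian integer. The sub name read_header is hypothetical.

```perl
use strict;
use warnings;

# Return (N, A): the number of index entries (bytes 18-21) and the
# offset of the index table (bytes 26-29), both big-endian 32-bit.
# $start is 0, or 128 when a Mac header is present.
sub read_header {
    my ($bytes, $start) = @_;
    my $count      = unpack 'N', substr($bytes, $start + 18, 4);
    my $dir_offset = unpack 'N', substr($bytes, $start + 26, 4);
    return ($count, $dir_offset);
}
```

A is counted from the "ABIF" string, so the index itself sits at byte `$start + A` of the file.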

Go to position A and loop N times, reading 28 bytes each time. For each
28-byte entry, check whether its first 4 bytes are the string "DATA", and
increment a counter whenever they are. DATA entries 9-12 contain the
addresses of the traces, but they don't tell you which trace belongs to
which base. To find that out you also have to look for the 28-byte entry
starting with "FWO_". In an entry, bytes 20-23 hold a 32-bit integer
giving the address of the entry's data; note these four bytes for all
five entries (DATA 9-12 and FWO_). In each DATA entry, bytes 12-15 hold
a 32-bit integer giving the number of points in the trace.
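The index walk can be sketched in Perl as follows. It is a sketch under these assumptions: the file is a byte string, `$start` is the Mac-header adjustment, and the point count is taken from bytes 12-15 of each entry, which is where the ABIF directory-entry layout keeps the element count (bytes 8-11 hold the element type and element size). The sub name scan_directory is hypothetical.

```perl
use strict;
use warnings;

# Walk N 28-byte index entries starting at A. Returns a reference to the
# 9th-12th DATA entries (absolute data position and point count) and
# the raw bytes 20-23 of the FWO_ entry. Stored offsets are relative to
# the "ABIF" string, so $start (0 or 128) is added to them.
sub scan_directory {
    my ($bytes, $start, $n, $dir_offset) = @_;
    my (@traces, $fwo);
    my $data_count = 0;
    for my $i (0 .. $n - 1) {
        my $entry = substr($bytes, $start + $dir_offset + 28 * $i, 28);
        my $name  = substr($entry, 0, 4);
        if ($name eq 'DATA') {
            $data_count++;
            next unless $data_count >= 9 && $data_count <= 12;
            push @traces, {
                points => unpack('N', substr($entry, 12, 4)),
                offset => $start + unpack('N', substr($entry, 20, 4)),
            };
        }
        elsif ($name eq 'FWO_') {
            $fwo = substr($entry, 20, 4);
        }
    }
    return (\@traces, $fwo);
}
```

The counter mirrors the description above; bytes 4-7 of each entry also carry the ABIF tag number, which could be checked instead if the entries are not in order.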

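For FWO_, those four bytes are not an address: its data is only 4 bytes
long, so it is stored directly in bytes 20-23 of the entry. Read them,
one base per byte. For example, if you read A, then G, then C, then T,
DATA entries 9-12 hold the A, G, C and T traces in that order.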
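Turning the four FWO_ bytes into a lookup from base to trace position can be sketched like this. The sub name base_order is hypothetical; it assumes the FWO_ bytes have already been read from bytes 20-23 of that index entry.

```perl
use strict;
use warnings;

# Map each base letter in the 4-byte FWO_ value to the position (0-3)
# of its trace among DATA entries 9-12, e.g. "AGCT" -> {A=>0, G=>1, ...}.
sub base_order {
    my ($fwo) = @_;
    die "FWO_ value must be 4 bytes\n" unless length($fwo) == 4;
    my @bases = split //, $fwo;
    return { map { $bases[$_] => $_ } 0 .. $#bases };
}
```

With the order from the example, `base_order('AGCT')->{C}` is 2, i.e. the C trace is the third of DATA entries 9-12 (DATA 11).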

Now go to the address of each DATA entry (entries 9-12) and read a series
of 16-bit big-endian integers, as many as the number of points given in
that DATA entry.
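Reading one trace can be sketched in Perl as below, assuming the file is a byte string, the absolute position already includes the Mac-header adjustment, and the values are signed 16-bit integers (the `n!` unpack template, available since Perl 5.10). The sub name read_trace is hypothetical.

```perl
use strict;
use warnings;

# Return $points signed 16-bit big-endian trace values starting at
# absolute byte position $offset; die if the trace would run past
# the end of the data.
sub read_trace {
    my ($bytes, $offset, $points) = @_;
    die "trace runs past end of file\n"
        if $offset + 2 * $points > length($bytes);
    return unpack "n!$points", substr($bytes, $offset, 2 * $points);
}
```

The signed template matters for values with the high bit set: the bytes FF FF come back as -1 rather than 65535.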

There you are: you have your trace values. :)





> 
>    For what it's worth, I have attached a gif of my subregion.  I have found
> it very useful and very informative for review.
> 
> Regards,
> 
> Kevin
> 
> Kevin Viel
> Department of Epidemiology
> Rollins School of Public Health
> Emory University
> Atlanta, GA 30322
> 
> On Wed, 28 Apr 2004, Malay wrote:
> 
> 
>>Here is a way to do this outside of BioPerl:
>>
>>Download and install ABI.pm
>>
>>http://cpan.uwinnipeg.ca/cpan/authors/id/M/MA/MALAY/ABI-0.01.tar.gz
>>
>>use ABI;
>>
>>my $abi = ABI->new(-file=>"mysequence.abi");
>>my $seq = $abi->get_sequence(); # To get the sequence
>>my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
>>my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
>>my @base_calls = $abi->get_base_calls(); # Get the base calls
>>
>>Malay
>>malay at mail.nih.gov
>>
> 
>

I hope this helped.

Malay
mbasu at mail.nih.gov

