[Biojava-dev] ABI file parser

Yan Bai Yan.Bai at UTSouthwestern.edu
Tue Apr 4 15:37:00 UTC 2006


I need to parse the sequence metadata (sample name, comment, instrument model, run date/time, etc.) plus the quality calls (sample scores) and save them into a database. I guess I will have to modify some library files to get what I need, but I don't know where to start or which files I need to look into. Your input is highly appreciated.
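
For reference, here is a minimal standalone sketch of the kind of lookup involved. It is not BioJava code. It assumes the usual ABIF layout: a big-endian file with the "ABIF" magic, a 2-byte version, and then a 28-byte root entry pointing at a directory of 28-byte entries, where items of 4 bytes or fewer are stored inline in the offset field. The class AbifDirectory, its methods, and the "name + number" map keys are hypothetical names made up for this sketch. The tags mentioned in the comments (SMPL 1, CMNT 1, MODL 1, RUND 1, RUNT 1, PCON 2) are my reading of the ABIF format description and should be checked against real 3730 files.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: indexes the ABIF tag directory held in a byte array. */
final class AbifDirectory {
    static final int ENTRY_SIZE = 28; // assumed size of one directory entry

    /** One directory entry together with the raw bytes of its data item. */
    static final class Entry {
        final String name;
        final int number;
        final int elementType;
        final byte[] payload;

        Entry(String name, int number, int elementType, byte[] payload) {
            this.name = name;
            this.number = number;
            this.elementType = elementType;
            this.payload = payload;
        }
    }

    /**
     * Reads every directory entry. 'base' is the position of the "ABIF"
     * magic (0 for a plain file); this sketch assumes that all stored
     * offsets are relative to it.
     */
    static Map<String, Entry> read(byte[] file, int base) {
        if (!"ABIF".equals(ascii(file, base, 4))) {
            throw new IllegalArgumentException("no ABIF magic at offset " + base);
        }
        ByteBuffer buf = ByteBuffer.wrap(file); // ByteBuffer is big-endian by default
        int rootPos = base + 6;                 // skip 4-byte magic and 2-byte version
        int count = buf.getInt(rootPos + 12);   // number of directory entries
        int dirPos = base + buf.getInt(rootPos + 20);

        Map<String, Entry> entries = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            int pos = dirPos + i * ENTRY_SIZE;
            String name = ascii(file, pos, 4);
            int number = buf.getInt(pos + 4);
            int type = buf.getShort(pos + 8);
            int dataSize = buf.getInt(pos + 16);
            // Small items live inside the 4-byte offset field itself.
            int dataPos = dataSize <= 4 ? pos + 20 : base + buf.getInt(pos + 20);
            byte[] payload = Arrays.copyOfRange(file, dataPos, dataPos + dataSize);
            entries.put(name + number, new Entry(name, number, type, payload));
        }
        return entries;
    }

    static String ascii(byte[] bytes, int pos, int len) {
        return new String(bytes, pos, len, StandardCharsets.US_ASCII);
    }

    /** Decodes a Pascal string: one length byte followed by the characters. */
    static String pString(byte[] payload) {
        if (payload.length == 0) {
            return "";
        }
        int len = Math.min(payload[0] & 0xFF, payload.length - 1);
        return ascii(payload, 1, len);
    }
}
```

With something like this, the sample name would be AbifDirectory.pString(dir.get("SMPL1").payload) and the per-base quality values would be the bytes of dir.get("PCON2").payload, assuming those tags are present in the file.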

While reading the source code, I wondered about a variable named MacJunk which, according to its description, is junk prepended to the real data. Does it exist in Mac files only? It looks like a general, cross-platform offset applied to all data, not limited to Macintosh. Am I missing something here?
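
To make the question concrete, this is how I understand the idea, written as a small standalone sketch rather than the actual BioJava code. The assumption is that files transferred from a Macintosh can carry a 128-byte MacBinary header in front of the "ABIF" magic, so the parser looks for the magic at offset 0 and then at offset 128, and adds whichever offset matched to every later seek. For files without the header that offset is simply 0. The class and method names below are hypothetical.

```java
/** Hypothetical helper: locates the start of the ABIF data inside a file. */
final class AbifOffset {
    // Assumed size of the MacBinary header that may precede the real data.
    static final int MACBINARY_HEADER_SIZE = 128;

    /**
     * Returns the offset of the "ABIF" magic: 0 for a plain file, 128 for a
     * file with a MacBinary header, or -1 if the magic is at neither place.
     */
    static int findAbifOffset(byte[] file) {
        if (hasMagicAt(file, 0)) {
            return 0;
        }
        if (hasMagicAt(file, MACBINARY_HEADER_SIZE)) {
            return MACBINARY_HEADER_SIZE;
        }
        return -1;
    }

    /** True if the four bytes starting at pos spell "ABIF". */
    static boolean hasMagicAt(byte[] file, int pos) {
        return file.length >= pos + 4
                && file[pos] == 'A'
                && file[pos + 1] == 'B'
                && file[pos + 2] == 'I'
                && file[pos + 3] == 'F';
    }

    public static void main(String[] args) throws java.io.IOException {
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println("ABIF data starts at offset " + findAbifOffset(data));
    }
}
```

If that reading is right, the offset is applied to every file, but it is only non-zero for files that still carry the Macintosh header.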

Thanks,

Yan

>>> Richard Holland <richard.holland at ebi.ac.uk> 03/23/06 3:30 AM >>>
I've used the BioJava ABI parser to parse 3730 ABI files without any
problems, and it successfully reads both base calls and quality scores.

You should use the ABIFChromatogram method getBaseCalls() to return an
alignment of two sequences - the first sequence is the sequence data,
the second is a sequence made up of Integer scores.

cheers,
Richard

On Wed, 2006-03-22 at 14:25 -0700, Russ Kepler wrote:
> On Wednesday 22 March 2006 02:05 pm, Yan Bai wrote:
> 
> > Another question is about the ABI file parser, located in the package 
> > org.biojava.bio.program.ABIFParser. Comments of this file indicate that it
> > parses files from the 377 DNA sequencer, while our sequence files are generated
> > by a 3730 XL. Are there any mismatches between these two formats? Is there a
> > parser specific to the 3730? I couldn't find anything describing the 3730 XL
> > format like the one Clark Tibbett wrote.
> 
> The differences that I can really see are the addition of the quality calls and 
> (maybe) caller name.  I'm sure that there are others, but since I wasn't 
> looking for them I never really noticed their absence.  I've got a parser 
> that keeps the quality call values if you need it.
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
-- 
Richard Holland
European Bioinformatics Institute
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, UK
Tel: +44-(0)1223-494416
---------------
