[Bioperl-l] SeqIO

Marc Logghe Marc.Logghe at ablynx.com
Fri Mar 7 09:04:35 UTC 2008


Ahh, on a second look my reply did not make much sense. I was the
one who learnt something here :-)
I did not know that Bio::SeqIO already uses Bio::Tools::GuessSeqFormat
under the hood when no -format is given. I also learnt that you have to
be careful with the filename extension, because it seems to take
precedence over the content-based guess.
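
One way to sidestep the extension is to guess the format from the file
content and always pass it explicitly as -format. This is a minimal
sketch, not something settled in this thread. The helper name
open_by_content is made up for illustration, and the code assumes that
guess returns undef when it cannot decide.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

# Hypothetical helper (not part of BioPerl): open a sequence file with a
# format guessed from its content, so the filename extension plays no role.
sub open_by_content {
    my ($file) = @_;
    my $format = Bio::Tools::GuessSeqFormat->new(-file => $file)->guess;
    # Assumption: guess gives undef when no format could be determined.
    die "Could not guess the format of $file\n" unless defined $format;
    # An explicit -format means Bio::SeqIO does no guessing of its own.
    return Bio::SeqIO->new(-file => $file, -format => $format);
}

# Print the first sequence of every file named on the command line.
foreach my $file (@ARGV) {
    my $seq_object = open_by_content($file)->next_seq;
    print $seq_object->seq, "\n" if defined $seq_object;
}
```

This only helps as far as the guesser itself is right, of course.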
Regards,
Marc


> -----Original Message-----
> From: Staffa, Nick (NIH/NIEHS) [mailto:staffa at niehs.nih.gov]
> Sent: Friday, 7 March 2008 0:28
> To: Marc Logghe; Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] SeqIO
> 
> Thanks
> I really appreciate all the interest given and help generated.
> That sure sounds like a great idea, but I think
> Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
> Is there a substitute?
> It works great with
> >> !!NA_SEQUENCE 1.0
> >>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368 ..
> >>
> >>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> >> etc.
> 
> as seen in:
> gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
> guesser guesses gcg
> TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGA  etc.
> (yes, I added
> my $file_type = $guesser->guess;
> print "guesser guesses $file_type\n";
> )
> 
> BUT
> when applied to a GenBank sequence passed through the SeqLab editor and
> turned into GCG, to wit:
> !!NA_SEQUENCE 1.0
> LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
> DEFINITION  Human testis-specific PGK-2 gene for phosphoglycerate kinase
>             (ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
> ACCESSION   X05246 Y00261
> ...
> ...
> BASE COUNT      583 a    367 c    442 g    519 t
> ORIGIN
> 
>  HSPGK2G  Length: 1911  August 24, 1998 10:56  Type: N  Check: 4156 ..
> 
>        1  GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
> etc.
> 
> It thinks it is a flawed PIR:
> 
> gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
> guesser guesses pir
> 
> ------------- EXCEPTION  -------------
> MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
> LOCUS       HSPGK2G      1911 bp    DNA             PRI       12-SEP-1993
> 
> 
> Must look at why guesser is thinking PIR.
> 
> 
> 
> 
> On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:
> 
> > Hi Nick,
> > I don't think you should leave out the -format option. You have to leave
> > it in but the format should be provided by the B::T::GuessSeqFormat
> > object.
> > Something like:
> >
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > use Bio::SeqIO;
> > use Bio::Tools::GuessSeqFormat;
> >
> > $| = 1;    # unbuffered output
> > my $number_of_files = @ARGV;
> > if (!$number_of_files) { print "no files entered\n"; exit; }
> > foreach my $file (@ARGV) {
> >   # guess the format from the file content
> >   my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
> >   # hand the guessed format to Bio::SeqIO explicitly
> >   my $seqio_object = Bio::SeqIO->new(-file   => $guesser->file,
> >                                      -format => $guesser->guess);
> >   my $seq_object = $seqio_object->next_seq;
> >   my $sequence = $seq_object->seq;
> >   print "$sequence\n";
> > }
> >
> > HTH,
> > Marc
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
> >> Sent: Thursday, 6 March 2008 16:24
> >> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
> >> Cc: Chris Fields
> >> Subject: Re: [Bioperl-l] SeqIO
> >>
> >> Here's the scoop:
> >> When I use Jason's suggestion, (-format => 'gcg'),
> >> my program works without complaint on the original file that looks like:
> >> !!NA_SEQUENCE 1.0
> >>    NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368 ..
> >>
> >>        1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
> >> etc.
> >>
> >> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
> >> (which should be retro-gcg format (before version 11?)),
> >> my program runs, but there IS a complaint:
> >> Use of uninitialized value in scalar chomp at
> >> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
> >> BUT
> >> if I remove (-format => 'gcg'), I get no complaint, but the sequence
> >> returned still has its numbers embedded. This affects my calculations.
> >>
> >> Thanks, at least i know what my options are.
> >>
> >>
> >>
> >> Nick Staffa
> >> Telephone: 919-316-4569  (NIEHS: 6-4569)
> >> Scientific Computing Support Group
> >> NIEHS Information Technology Support Services Contract
> >> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov))
> >> National Institute of Environmental Health Sciences
> >> National Institutes of Health
> >> Research Triangle Park, North Carolina
> >




