[Bioperl-l] SeqIO
Staffa, Nick (NIH/NIEHS)
staffa at niehs.nih.gov
Thu Mar 6 23:27:31 UTC 2008
Thanks
I really appreciate all the interest given and help generated.
That sure sounds like a great idea, but I think
Bio::Tools::GuessSeqFormat needs more RIGOR before it declares itself.
Is there a substitute?
It works great with
>> !!NA_SEQUENCE 1.0
>> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 ..
>>
>> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> etc.
as seen in:
gir.niehs.nih.gov> CGwindows.pl TestDNA.seq.org | more
guesser guesses gcg
TGTTCGAATTCCGTGCGGTCCACCTCCCCTAGGAGCTCAGTGGGCTGGTTGGATTCCGTGCCATCCCGGCAGGGCA
GAGCCTCGGGA etc.
(yes, I added
my $file_type = $guesser->guess;
print "guesser guesses $file_type\n";
)
BUT
when applied to a genbank sequence passed thru the Seqlab editor and turned
into GCG, to wit:
!!NA_SEQUENCE 1.0
LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993
DEFINITION Human testis-specific PGK-2 gene for phosphoglycerate kinase
(ATP:3-phospho-D-glycerate 1-phosphotransferase, EC 2.7.2.3).
ACCESSION X05246 Y00261
...
...
BASE COUNT 583 a 367 c 442 g 519 t
ORIGIN
HSPGK2G Length: 1911 August 24, 1998 10:56 Type: N Check: 4156 ..
1 GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC
etc.
It thinks it is a flawed PIR:
gir.niehs.nih.gov> CGwindows.pl hspgk2g.seq | more
guesser guesses pir
------------- EXCEPTION -------------
MSG: PIR stream read attempted without leading '>P1;' [ !!NA_SEQUENCE 1.0
LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993
Must look at why guesser is thinking PIR.
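
Until the guesser is sorted out, one possible workaround is to test for the
GCG markers directly before asking Bio::Tools::GuessSeqFormat. The sketch
below is only an illustration under my own assumptions: looks_like_gcg is a
hypothetical helper, not part of BioPerl, and the two patterns are taken
from the files shown above (the "!!NA_SEQUENCE" first line and the
"Length: ... Check: ... .." divider line), not from any GCG specification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): returns 'gcg' when the
# text carries one of the GCG markers seen in the example files, and
# undef otherwise. Only the first $max_lines lines are examined.
sub looks_like_gcg {
    my ($text, $max_lines) = @_;
    $max_lines = 200 unless defined $max_lines;

    my $seen = 0;
    for my $line (split /\n/, $text) {
        last if ++$seen > $max_lines;

        # Newer files open with !!NA_SEQUENCE (or !!AA_SEQUENCE) on line 1.
        return 'gcg' if $seen == 1 && $line =~ /^!![NA]A_SEQUENCE\b/;

        # Divider line: "Name  Length: N  <date>  Type: N  Check: N  .."
        return 'gcg' if $line =~ /Length:\s*\d+.*Check:\s*\d+\s+\.\.\s*$/;
    }
    return;    # no GCG marker found; leave the decision to the guesser
}

# Small demonstration on in-memory text shaped like the files above.
my $genbank_as_gcg = join "\n",
    'LOCUS HSPGK2G 1911 bp DNA PRI 12-SEP-1993',
    'ORIGIN',
    'HSPGK2G Length: 1911 August 24, 1998 10:56 Type: N Check: 4156 ..',
    '1 GCCCCTCAAC AGCAAGTTGG TTCTTCAGCA TTAAGATCCA GGTGTCAGCC';

my $format = looks_like_gcg($genbank_as_gcg);
print 'format: ', (defined $format ? $format : 'unknown'), "\n";
```

In Marc's script this could be combined with the guesser by reading the head
of the file into a string and passing
-format => looks_like_gcg($head) || $guesser->guess
to Bio::SeqIO->new, so the guesser is consulted only when no GCG marker is
found.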
On 3/6/08 11:22 AM, "Marc Logghe" <Marc.Logghe at ablynx.com> wrote:
> Hi Nick,
> I don't think you should leave out the -format option. You have to leave
> it in but the format should be provided by the B::T::GuessSeqFormat
> object.
> Something like:
>
> #!/usr/bin/perl
> use strict;
> use Bio::SeqIO;
> use Bio::Tools::GuessSeqFormat;
>
> $| = 1;
> my $number_of_files = @ARGV;
> if(!$number_of_files){print "no files entered\n";exit;}
> foreach my $file (@ARGV){
> my $guesser = Bio::Tools::GuessSeqFormat->new(-file => $file);
> my $seqio_object = Bio::SeqIO->new(-file => $guesser->file, -format =>
> $guesser->guess);
> my $seq_object = $seqio_object->next_seq;
> my $sequence = $seq_object->seq;
> print "$sequence\n";
> }
>
> HTH,
> Marc
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Staffa, Nick (NIH/NIEHS)
>> Sent: donderdag 6 maart 2008 16:24
>> To: Heikki Lehvaslaiho; bioperl-l at lists.open-bio.org
>> Cc: Chris Fields
>> Subject: Re: [Bioperl-l] SeqIO
>>
>> Here's the scoop:
>> When I use Jason's suggestion, (-format => 'gcg'),
>> My program works without complaint on the original file that looks like:
>> !!NA_SEQUENCE 1.0
>> NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 ..
>>
>> 1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
>> et c.
>>
>> BUT if I remove the first line to test Bio::Tools::GuessSeqFormat,
>> (which should be retro-gcg format (before version 11?)),
>> my program runs, but there IS a complaint:
>> Use of uninitialized value in scalar chomp at
>> /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
>> BUT
>> If I remove (-format => 'gcg'), I get no complaint, but the sequence
>> returned still has its numbers embedded. This affects my calculations.
>>
>> Thanks, at least I know what my options are.
>>
>>
>>
>> Nick Staffa
>> Telephone: 919-316-4569 (NIEHS: 6-4569)
>> Scientific Computing Support Group
>> NIEHS Information Technology Support Services Contract
>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov))
>> National Institute of Environmental Health Sciences
>> National Institutes of Health
>> Research Triangle Park, North Carolina
>