[Bioperl-l] SeqIO
Staffa, Nick (NIH/NIEHS)
staffa at niehs.nih.gov
Thu Mar 6 15:23:34 UTC 2008
Here's the scoop:
When I use Jason's suggestion, (-format => 'gcg'),
my program works without complaint on the original file, which looks like:
!!NA_SEQUENCE 1.0
NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 ..
1 TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.
BUT if I remove the first line to test Bio::Tools::GuessSeqFormat
(the result should be retro-GCG format, from before version 11?),
my program runs, but there IS a complaint:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
BUT
If I remove (-format => 'gcg'), I get no complaint, but the sequence
returned still has its numbers embedded. This affects my calculations.
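
A stopgap for the embedded numbers, as a minimal sketch only: it assumes the
string returned by ->seq holds nothing but residue letters plus stray position
numbers and whitespace, and it simply deletes every character that is not a
letter. The helper name strip_non_residues is made up for illustration and is
not part of BioPerl. It would not remove a header line that had been swallowed
into the sequence, so it is no substitute for having the format recognised.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): drop anything that is
# not an ASCII letter, e.g. position numbers and whitespace.
sub strip_non_residues {
    my ($raw) = @_;
    (my $clean = $raw) =~ s/[^A-Za-z]//g;
    return $clean;
}

# A fragment of the undecoded output from this thread, with "51" embedded.
my $raw = "CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCC";
print strip_non_residues($raw), "\n";
```

In the program itself this would be applied to $sequence before it is passed
to windowscore.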
Thanks, at least I know what my options are.
Nick Staffa
Telephone: 919-316-4569 (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina
On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:
>
> Nick,
>
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
>
> /Length: .*Type: .*Check: .*\.\.$/
>
> It is the second line in a GCG file. If the first line matches some other
> format's regex, this one will not be evaluated.
>
> Let us know,
>
> -Heikki
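
To make the point about line order concrete, here is a minimal sketch that
applies the pattern quoted above to the first two lines of the file from this
thread. It is plain Perl and only exercises the regex; it does not reproduce
whatever else Bio::Tools::GuessSeqFormat does when it scans a file, and the
variable names are chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern quoted in this thread for recognising a GCG header line.
my $gcg_re = qr/Length: .*Type: .*Check: .*\.\.$/;

# First two lines of the example file, as posted.
my @lines = (
    '!!NA_SEQUENCE 1.0',
    'NewDNA Length: 810 March 5, 2008 18:26 Type: N Check: 3368 ..',
);

for my $line (@lines) {
    my $verdict = $line =~ $gcg_re ? 'matches' : 'does not match';
    print "$verdict: $line\n";
}
```

Only the "Length: ... Check: ... .." line satisfies the pattern, which is
consistent with the header being recognisable whether or not the
"!!NA_SEQUENCE" line is present.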
>
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: will read any format if the format
>> is specified.
>> I was hoping that I could write a program for which one needn't specify the format.
>> It'd be more user-friendly and useful.
>>
>> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>>> probably you should try specifying the format explicitly first- as in
>>> (-format => 'gcg')
>>>
>>> -j
>>>
>>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>>> I thought GCG format changed somewhere along the way, but maybe
>>>> I'm wrong? Regardless, you'll have to post this as a bug (along
>>>> with an example file).
>>>>
>>>> Also, kind of odd that the sequence data wasn't checked...
>>>>
>>>> chris
>>>>
>>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>>> So the Howto says that Bio::SeqIO will read almost any known format
>>>>> including GCG.
>>>>> So I create a GCG file with Seqlab and try to print out its sequence as a
>>>>> string. (I did guess at the way to get the sequence string):
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>> use strict;
>>>>> $| = 1;
>>>>> use Bio::SeqIO;
>>>>> my $number_of_files = @ARGV;
>>>>> if (!$number_of_files) { print "no files entered\n"; exit; }
>>>>> foreach my $file (@ARGV) {
>>>>>     my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>>>     my $seq_object = $seqio_object->next_seq;
>>>>>     my $sequence = $seq_object->seq;
>>>>>     print "$sequence\n";
>>>>>     my $status = &windowscore($sequence);
>>>>> }
>>>>>
>>>>> But what it returned was the entire contents of the file with no format
>>>>> decoding. Have I been deluded?
>>>>>
>>>>> NewDNALength:810March5,200818:26Type:NCheck:
>>>>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT
>>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG
>>>>> CGAAGGT
>>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC
>>>>> GGCTGCT
>>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT
>>>>> GCAGAGC
>>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG
>>>>> GCCAGCG
>>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG
>>>>> TCCCCTG
>>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4
>>>>> 51GGCAG
>>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG
>>>>> AGACATC
>>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG
>>>>> CCGCCC6
>>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT
>>>>> TCATGCG
>>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG
>>>>> CAGCCGC
>>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA
>>>>> GGG
>>>>>
>>>>>
>>>>>
>>>>> Nick Staffa
>>>>> Telephone: 919-316-4569 (NIEHS: 6-4569)
>>>>> Scientific Computing Support Group
>>>>> NIEHS Information Technology Support Services Contract
>>>>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov))
>>>>> National Institute of Environmental Health Sciences
>>>>> National Institutes of Health
>>>>> Research Triangle Park, North Carolina
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>
>
>