[Bioperl-l] SeqIO

Staffa, Nick (NIH/NIEHS) staffa at niehs.nih.gov
Thu Mar 6 10:23:34 EST 2008


Here's the scoop:
When I use Jason's suggestion (-format => 'gcg'),
my program works without complaint on the original file, which looks like:
!!NA_SEQUENCE 1.0
   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..

       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT
etc.

BUT if I remove the first line to test Bio::Tools::GuessSeqFormat
(the file should then be in the older GCG format, from before version 11?),
my program runs, but there IS a complaint:
Use of uninitialized value in scalar chomp at
/usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/gcg.pm line 118, <GEN0> line 1.
BUT if I remove (-format => 'gcg'), I get no complaint, but the sequence
returned still has its numbers embedded. This affects my calculations.
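Until the parser question is sorted out, one plain-Perl fallback is to pull the residues out of the file body yourself. This is a minimal sketch, assuming a single-sequence GCG file whose header line ends in "..": it skips everything up to and including that header, then keeps only letters from the remaining lines, so the position numbers and spacing disappear. `gcg_body_to_seq` is a hypothetical helper, not part of BioPerl, and it would also drop any '.' gap characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): extract the residues from a
# GCG-style file given as a list of lines. Lines up to and including the
# header that ends in ".." are skipped; on every later line, position
# numbers, spaces and any other non-letter characters are discarded.
sub gcg_body_to_seq {
    my @lines   = @_;      # copy, so the caller's lines are not modified
    my $in_body = 0;
    my $seq     = '';
    for my $line (@lines) {
        chomp $line;
        if (!$in_body) {
            # The GCG header line ends with two dots.
            $in_body = 1 if $line =~ /\.\.\s*$/;
            next;
        }
        (my $residues = $line) =~ s/[^A-Za-z]//g;
        $seq .= $residues;
    }
    return $seq;
}

# The first lines of the file shown above, without the "!!NA_SEQUENCE" line.
my @file = (
    "   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..\n",
    "\n",
    "       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT\n",
);
print gcg_body_to_seq(@file), "\n";
```

Because it only looks for the trailing "..", the same helper works whether or not the "!!NA_SEQUENCE 1.0" line is present.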

Thanks, at least I know what my options are.


 
Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

On 3/6/08 7:20 AM, "Heikki Lehvaslaiho" <heikki at sanbi.ac.za> wrote:

> 
> Nick,
> 
> This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:
> 
> /Length: .*Type: .*Check: .*\.\.$/
> 
> It is the second line in a GCG file. If the first line matches some other
> format's regex, this one will not be evaluated.
> 
> Let us know,
> 
> -Heikki
> 
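To see what that test does with the two files in this thread, here is a minimal plain-Perl sketch that applies only the quoted regex, line by line. It is not GuessSeqFormat's actual logic (which also tries the other formats' regexes), `first_gcg_header_line` is a hypothetical helper, and it assumes the header line has no trailing spaces after "..".

```perl
use strict;
use warnings;

# The GCG test quoted above, compiled once.
my $gcg_header = qr/Length: .*Type: .*Check: .*\.\.$/;

# The start of the sample file, with and without the "!!NA_SEQUENCE" line.
my @with_bang = (
    '!!NA_SEQUENCE 1.0',
    '   NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..',
);
my @without_bang = @with_bang[1 .. $#with_bang];

# Hypothetical helper: return the 1-based number of the first line that
# matches the GCG header regex, or 0 if no line does.
sub first_gcg_header_line {
    my @lines = @_;
    for my $i (0 .. $#lines) {
        return $i + 1 if $lines[$i] =~ $gcg_header;
    }
    return 0;
}

printf "with !! line:    header on line %d\n", first_gcg_header_line(@with_bang);
printf "without !! line: header on line %d\n", first_gcg_header_line(@without_bang);
```

In both cases the header line itself matches; only its position in the file changes.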
> On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
>> Verily,
>> One interpretation of the docs might be: it will read any format if the
>> format is specified.
>> I was hoping to write a program for which one needn't specify the format.
>> It'd be more user-friendly and useful.
>> 
>> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
>>> Probably you should try specifying the format explicitly first, as in
>>> (-format => 'gcg')
>>> 
>>> -j
>>> 
>>> On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
>>>> I thought GCG format changed somewhere along the way, but maybe
>>>> I'm wrong?  Regardless, you'll have to post this as a bug (along
>>>> with an example file).
>>>> 
>>>> Also, kind of odd that the sequence data wasn't checked...
>>>> 
>>>> chris
>>>> 
>>>> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
>>>>> So the HOWTO says that Bio::SeqIO will read almost any known format,
>>>>> including GCG.
>>>>> So I created a GCG file with SeqLab and tried to print out its
>>>>> sequence as a string. (I did guess at the way to get the sequence string.)
>>>>> 
>>>>> #!/usr/bin/perl -w
>>>>> use strict;
>>>>> $| = 1;    # unbuffered output
>>>>> use Bio::SeqIO;
>>>>> my $number_of_files = @ARGV;
>>>>> if (!$number_of_files) { print "no files entered\n"; exit; }
>>>>> foreach my $file (@ARGV) {
>>>>>     # no -format given, so Bio::SeqIO has to work out the format itself
>>>>>     my $seqio_object = Bio::SeqIO->new(-file => $file);
>>>>>     my $seq_object   = $seqio_object->next_seq;
>>>>>     my $sequence     = $seq_object->seq;
>>>>>     print "$sequence\n";
>>>>>     # windowscore() is my own scoring routine, not shown here
>>>>>     my $status = windowscore($sequence);
>>>>> }
>>>>> 
>>>>> But what it returned was the entire contents of the file with no format
>>>>> decoding. Have I been deluded?
>>>>> 
>>>>> NewDNALength:810March5,200818:26Type:NCheck:3368..1TGTTCGAATTCCGTGCGGTCCACCT
>>>>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGGCGAAGGT
>>>>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGCGGCTGCT
>>>>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGTGCAGAGC
>>>>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGGGCCAGCG
>>>>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAGTCCCCTG
>>>>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG451GGCAG
>>>>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGGAGACATC
>>>>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGGCCGCCC6
>>>>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCTTCATGCG
>>>>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCGCAGCCGC
>>>>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCAGGG
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
> 
> 
