[Bioperl-l] SeqIO

Heikki Lehvaslaiho heikki at sanbi.ac.za
Thu Mar 6 12:20:03 UTC 2008


Nick,

This is the regex that Bio::Tools::GuessSeqFormat uses to identify a gcg file:

/Length: .*Type: .*Check: .*\.\.$/

It is the second line in a GCG file. If the first line matches the regex for
some other format, this one will not be evaluated.
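
As a minimal sketch of how that pattern behaves (this is not the module's
actual code; the helper name gcg_header_line and the sample lines are made up
for illustration), the following applies the regex line by line using only
core Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern quoted above from Bio::Tools::GuessSeqFormat.
my $gcg_header = qr/Length: .*Type: .*Check: .*\.\.$/;

# Hypothetical helper (not part of BioPerl): returns the 1-based number
# of the first line matching the GCG header pattern, or 0 if none does.
sub gcg_header_line {
    my @lines = @_;
    for my $i (0 .. $#lines) {
        (my $line = $lines[$i]) =~ s/\r?\n\z//;    # strip the line ending
        return $i + 1 if $line =~ $gcg_header;
    }
    return 0;
}

# Made-up sample lines, for illustration only.
my @sample = (
    "NewDNA\n",
    "NewDNA  Length: 810  March 5, 2008 18:26  Type: N  Check: 3368  ..\n",
    "       1  TGTTCGAATT CCGTGCGGTC CACCTCCCCT AGGAGCTCAG TGGGCTGGTT\n",
);
print "header found on line ", gcg_header_line(@sample), "\n";

# Same header with a trailing space after "..": the $ anchor rejects it.
print "with trailing space: ",
    gcg_header_line("Length: 810  Type: N  Check: 3368  .. \n"), "\n";
```

Because the pattern is anchored with $, a header line with trailing whitespace
after the ".." would not match. Passing -format => 'gcg' to Bio::SeqIO->new,
as Jason suggests below, sidesteps the guessing altogether.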

Let us know,

	-Heikki

On Thursday 06 March 2008 05:09:11 Staffa, Nick (NIH/NIEHS) wrote:
> Verily,
> One interpretation of the docs might be: will read any format if the format
> is specified.
> I was hoping that I could write a program that one needn't specify format.
> It'd be more user-friendly and useful.
>
> On 3/5/08 9:33 PM, "Jason Stajich" <jason at bioperl.org> wrote:
> > probably you should try specifying the format explicitly first- as in
> > (-format => 'gcg')
> >
> > -j
> >
> > On Mar 5, 2008, at 6:22 PM, Chris Fields wrote:
> >> I thought GCG format changed somewhere along the way, but maybe
> >> I'm wrong?  Regardless, you'll have to post this as a bug (along
> >> with an example file).
> >>
> >> Also, kind of odd that the sequence data wasn't checked...
> >>
> >> chris
> >>
> >> On Mar 5, 2008, at 5:43 PM, Staffa, Nick (NIH/NIEHS) wrote:
> >>> So the Howto says that Bio::SeqIO will read almost any known format
> >>> including GCG.
> >>> So I create a GCG file with Seqlab and try to print out its
> >>> sequence as a string. I did guess at the way to get the sequence
> >>> string:
> >>>
> >>> #!/usr/bin/perl -w
> >>> use strict;
> >>> $| = 1;
> >>> use Bio::SeqIO;
> >>> my $number_of_files = @ARGV;
> >>> if(!$number_of_files){print "no files entered\n";exit;}
> >>> foreach my $file (@ARGV){
> >>> my $seqio_object = Bio::SeqIO->new(-file => $file);
> >>> my $seq_object = $seqio_object->next_seq;
> >>> my $sequence = $seq_object->seq;
> >>> print "$sequence\n";
> >>> my $status = &windowscore($sequence);
> >>> }
> >>>
> >>> But what it returned was the entire contents of the file with no
> >>> format
> >>> decoding. Have I been deluded?
> >>>
> >>> NewDNALength:810March5,200818:26Type:NCheck:
> >>> 3368..1TGTTCGAATTCCGTGCGGTCCACCT
> >>> CCCCTAGGAGCTCAGTGGGCTGGTT51GGATTCCGTGCCATCCCGGCAGGGCAGAGCCTCGGGAGGGGG
> >>> CGAAGGT
> >>> T101GCCCGGGGCCGTGCGCTGGGTGCTGCTGCTGCGGTGGCGGCGGCGGTGCC151TGCGGTTGCAGC
> >>> GGCTGCT
> >>> GGGGTTGCGCGTGGAAACCGCGCCCCGCACT201TGCGGCGGGCGAGCCCATCGCGCCGTAGTACAGGT
> >>> GCAGAGC
> >>> GCTGGGGG251GCGCCAGGATCCCCGGCATCGCAGGGCCCGAGGGGTCCGGCCCCACTCGC301ATGGG
> >>> GCCAGCG
> >>> GGCGGCTCTACGGACACTGCATAGTCCGAGACTGGAGC351GTAAGTGTAGGTGCCGGCCGCCGGGCAG
> >>> TCCCCTG
> >>> GCAGCGGGGCTGCAA401AGAAAGCCGGGTCCTGCTCCACGCCATCCAGCGGGGATGTGTCCGGAGTG4
> >>> 51GGCAG
> >>> AGGGTAGCCGTCGAGCGCGGGAGCGCCCAGTCCCTGGCAGTCCCG501ATAGTGGGGGCCCATGTGCGG
> >>> AGACATC
> >>> AGCGGAGGACCGGCCGGATAGC551CCGGCTCCGGGAAAGGCAGACCCAGGCCATCCATGGCCACGCGG
> >>> CCGCCC6
> >>> 01TCGGGACCAAGCGCGCCGGCCTGGGGCTCGACGAGAGCGTGCAGGAAGCC651TCCCTCCACCCGCT
> >>> TCATGCG
> >>> CTTCACCTGCTTGCGCCGCCGCGGCCGGT701ACTTGTAGTTGGGGTGGTCCTGCATATGCTGCACGCG
> >>> CAGCCGC
> >>> TCGGCC751TCTTCCACGAAGGGCCGCTTCTCTGCCAAGGTCAACGCCTTCCAAGACTT801GCCTGCA
> >>> GGG
> >>>
> >>>
> >>>
> >>> Nick Staffa
> >>> Telephone: 919-316-4569  (NIEHS: 6-4569)
> >>> Scientific Computing Support Group
> >>> NIEHS Information Technology Support Services Contract
> >>> (Science Task Monitor: Roy W. Reter (reter at niehs.nih.gov)
> >>> National Institute of Environmental Health Sciences
> >>> National Institutes of Health
> >>> Research Triangle Park, North Carolina
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>



-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


