[Bioperl-l] Bio::SeqIO issue
Kevin Brown
Kevin.M.Brown at asu.edu
Wed Aug 5 21:45:03 UTC 2009
I'm not sure, but I think the module is fasta, not Fasta. So it should
be -format=>'fasta', unless you're on a case-insensitive system that is
forgiving the capital...
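
One way to guard against feeding sequence-only files to the fasta parser is to peek at the data before picking the format. A minimal sketch in core Perl follows; `pick_format` is a hypothetical helper, not part of BioPerl, and it assumes that a FASTA file's first non-blank line starts with ">" and that anything else should be read as 'raw'. It returns the lowercase names so the case question does not come up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): choose a SeqIO format name
# by looking at the first non-blank line of the data.
sub pick_format {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                 # skip blank lines
        return $line =~ /^>/ ? 'fasta' : 'raw';   # ">" marks a FASTA header
    }
    return 'raw';                                 # empty input: no header to strip
}

print pick_format(">seq1 test\nACGT\n"), "\n";    # fasta
print pick_format("ACGTACGT\nTTGA\n"), "\n";      # raw

# The result could then be handed to SeqIO, e.g. (variable names are assumptions):
# my $in = Bio::SeqIO->new(-file => $path, -format => pick_format($head));
```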
Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Chris Fields
> Sent: Wednesday, August 05, 2009 2:38 PM
> To: Hilgert, Uwe
> Cc: BioPerl List
> Subject: Re: [Bioperl-l] Bio::SeqIO issue
>
> Uwe,
>
> Please keep replies on the list.
>
> It's very possible that's the issue; IIRC the fasta parser pulls out
> the full sequence in chunks (based on local $/ = "\n>") and splits the
> header off as the first line in that chunk. You could probably try
> leaving the format out and letting SeqIO guess it, or passing the file
> into Bio::Tools::GuessSeqFormat directly, but it's probably better to
> go through the files and add a file extension that corresponds to the
> format.
>
> chris
>
> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote:
>
> > Thanks, Chris. The files have no extension, but we indicate what
> > format to use, like in the manual:
> >
> > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta');
> >
> > I wonder now whether this could be exactly what causes the problem:
> > as we are telling SeqIO that the input files are in fasta format,
> > they are being treated as such (= first line removed), regardless of
> > whether they really are fasta?
> >
> > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > Uwe Hilgert, Ph.D.
> > Dolan DNA Learning Center
> > Cold Spring Harbor Laboratory
> >
> > C: (516) 857-1693
> > V: (516) 367-5185
> > E: hilgert at cshl.edu
> > F: (516) 367-5182
> > W: http://www.dnalc.org
> >
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, August 05, 2009 5:04 PM
> > To: Hilgert, Uwe
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Bio::SeqIO issue
> >
> > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:
> >
> >> Is my impression correct that Bio::SeqIO just assumes that sequences
> >> are being submitted in FASTA format?
> >
> > No. See:
> >
> > http://www.bioperl.org/wiki/HOWTO:SeqIO
> >
> > SeqIO tries to guess at the format using the file extension, and if
> > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's
> > possible that the extension is causing the problem, or that
> > GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced
> > to guess). In any case, it's always advisable to explicitly indicate
> > the format when possible.
> >
> > Relevant lines:
> >
> > return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
> > ...
> > return 'raw' if /\.(txt)$/i;
> >
> >> In our experience, implementing
> >> Bio::SeqIO led to the first line of files being cut off, regardless
> >> of whether the files were indeed fasta files or files that only
> >> contained sequence.
> >
> > Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.
> >
> >> In the latter case, this led to sequence submissions that had the
> >> first line of nucleotides removed. Has anyone tried to write a fix
> >> for this?
> >
> > This sounds like a bug, but we have very little to go on beyond your
> > description. What version of bioperl are you using, OS, etc? What
> > does your data look like? File extension?
> >
> > chris
> >
> >> Thanks,
> >>
> >> Uwe
> >>
> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >> Uwe Hilgert, Ph.D.
> >> Dolan DNA Learning Center
> >> Cold Spring Harbor Laboratory
> >>
> >> V: (516) 367-5185
> >> E: hilgert at cshl.edu
> >> F: (516) 367-5182
> >> W: http://www.dnalc.org
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
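
For reference, the chunked reading described in the quoted message above (records split on "\n>", first line of each chunk taken as the header) can be sketched in core Perl. This is only an illustration under stated assumptions, not the actual Bio::SeqIO::fasta code; `read_as_fasta` is a made-up name and the input is an in-memory string rather than a file. It shows why a sequence-only file read this way would lose its first line to the header.

```perl
use strict;
use warnings;

# Illustration only: split the input on "\n>" and treat the first line of
# each chunk as the header, whether or not it really is one.
sub read_as_fasta {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\n>";                      # record separator: start of next header
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;                      # drop a trailing "\n>" if present
        $chunk =~ s/^>//;                  # drop the leading ">" of the first record
        my ($header, @lines) = split /\n/, $chunk;
        push @records, { header => $header, seq => join('', @lines) };
    }
    close $fh;
    return @records;
}

my @ok  = read_as_fasta(">seq1\nACGT\nTTGA\n");
my @raw = read_as_fasta("ACGT\nTTGA\n");
print "$ok[0]{header} $ok[0]{seq}\n";      # seq1 ACGTTTGA
print "$raw[0]{header} $raw[0]{seq}\n";    # ACGT TTGA (first sequence line taken as header)
```

In the second call the first line of nucleotides ends up in the header slot, which matches the symptom reported in the thread when sequence-only files are read with -format => 'fasta'.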