[Bioperl-l] Bio::SeqIO::new possible wierdness

Brian Osborne brian_osborne at cognia.com
Wed Jan 28 22:02:00 EST 2004


Jason,

I'm a bit suspicious of search.open-bio.org. I enter a term like 'Root' or
'GFF' and get back a dozen hits or so. It's inconceivable to me that there
are only 12 messages in bioperl-l since 1999 containing the string 'GFF'.
Something's wrong, either with the search or the display. And if there are
no matches I see only a blank page, which is a bit inscrutable. Then if I
select 'no restriction', which I guess means everything in the selectable
list, I don't see the Bioperl matches anymore; I just see a dozen or so
Biojava matches.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
Sent: Wednesday, January 28, 2004 4:34 PM
To: Peter van Heusden
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Bio::SeqIO::new possible wierdness

The bioperl list is searchable (though bioperl-guts is not) at
http://search.open-bio.org
and/or Google works fine for me.


This is the change Lincoln made, though (I ran cvs log on Bio/Root/IO.pm
and found the last commit by Lincoln).  I had put the \*ARGV in there so
that we could use the magic <> operator (which allows STDIN or a list of
files to be used transparently as input).  This caused some problems with
tests in GFF, SeqFeature, or Registry.

Here is his log message:
revision 1.50
date: 2003/11/21 03:03:38;  author: lstein;  state: Exp;  lines: +2 -2
The following regression tests now pass: GFF, SeqFeature, Registry

--jason

jason at jason $ cvs diff -r 1.49 Bio/Root/IO.pm
Index: Bio/Root/IO.pm
===================================================================
RCS file: /home/repository/bioperl/bioperl-live/Bio/Root/IO.pm,v
retrieving revision 1.49
diff -r1.49 IO.pm
1c1
< # $Id: IO.pm,v 1.49 2003/10/28 21:58:54 jason Exp $
---
> # $Id: IO.pm,v 1.50 2003/11/21 03:03:38 lstein Exp $
435c435
<     my $fh = $self->_fh || \*ARGV;
---
>     my $fh = $self->_fh or return;
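
A minimal sketch of what that one-line change means in practice. The two
subs below (read_with_fallback and read_strict) are hypothetical stand-ins
for the old and new versions of line 435; they are not part of
Bio::Root::IO, and the in-memory filehandle is only there to keep the
example self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for revision 1.49: with no handle of our own,
# fall back to the magic ARGV handle (files named in @ARGV, else STDIN).
sub read_with_fallback {
    my ($own_fh) = @_;
    my $fh = $own_fh || \*ARGV;
    return scalar <$fh>;
}

# Hypothetical stand-in for revision 1.50: with no handle of our own,
# return immediately instead of touching ARGV or STDIN.
sub read_strict {
    my ($own_fh) = @_;
    my $fh = $own_fh or return;
    return scalar <$fh>;
}

# An in-memory filehandle keeps the example self-contained.
my $data = "first line\nsecond line\n";
open my $mem_fh, '<', \$data or die "cannot open in-memory handle: $!";

my $line  = read_strict($mem_fh);   # reads from the handle it was given
my $empty = read_strict(undef);     # no handle: returns undef, reads nothing
```

Calling read_with_fallback(undef) from a script started with no arguments
would wait on STDIN, which may be related to the clog-ups mentioned for the
DB::GFF tests, although the thread does not confirm the exact failure.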


On Wed, 28 Jan 2004, Peter van Heusden wrote:

> Jason Stajich wrote:
>
> >On Wed, 28 Jan 2004, Donald G. Jackson wrote:
> >
> >
> >
> > >>Personally, I like the fall-back but agree that $ARGV[0] shouldn't be it.
> > >>I'd suggest STDIN - if somebody calls new without a file/handle I think
> > >>they're more likely to be reading.  OTOH, guessing format would be tough.
> >>
> >>
> >
> >the guess format is trying to read off the top of the file I think - we
> >support a 'peek' type of reading into the file, by having the _pushback
> >functionality in Root::IO.
> >
> > >I would like to see something like this go into Root::IO rather than in
> > >SeqIO - and have Root::IO give back a filename if it knows what it is.
> >
> > >Also the Root::IO code could do something like this:
> > > $file = "-" unless defined $file;
> > > open my $fh => $file or die $!;
> > >
> > >Which will then read from stdin if no filename is sent in - right now we
> >don't really support that anymore because it was causing clog-ups in some
> >of the DB::GFF code/tests I think.
> >
> >Maybe we localize this to 'FormattedReaderWriters' -- all the
> >XXXIO(-format => 'XXX') modules so as to avoid the problems Lincoln saw.
> >
> >
> >
> >
> Can you point to where Lincoln "saw" this problem? The BioPerl mailing list
> archive is not searchable, and searching via Google doesn't turn
> anything up.
>
> Anyway, I'll look into Root::IO tomorrow and see what I come up with.
>
> Peter
>
> >
> >
> >>At the very least a warning would be appropriate, perhaps indicating the
> >>course of action.
> >>
> >>For xml handlers we can check the dtd and throw an error.  I will modify
> >>my SeqIO::tinyseq::tinyseqHandler to do so.
> >>
> >>Don Jackson
> >>
> >>
> >>
> >>Peter van Heusden wrote:
> >>
> >>
> >>
> >>>My review of the Bio::SeqIO::new method shows the following behaviour:
> >>>
> > >>>Missing both -file and -fh arguments: falls back to using $ARGV[0]
> > >>>(the first command line argument) as sequence filename. If this fails,
> > >>>gives an exception about "Unknown format".
> > >>>-file argument (without -fh argument):
> > >>>  * given, but file unreadable: throws exception
> > >>>  * undefined: reads $ARGV[0], as above.
> > >>>-fh argument (without -file argument):
> > >>>  * given, but not a filehandle: gives exception
> > >>>  * given, but an invalid filehandle (not open): gives exception
> > >>>  * undefined: reads $ARGV[0], as above.
> > >>>-format argument: if the sequence file doesn't correspond to the given
> > >>>format, some parsers give an error (e.g. EMBL), while others do not
> > >>>(GenBank) and instead silently give wrong results.
> > >>>-format argument without -file argument: Silently creates a SeqIO
> > >>>object which writes to STDOUT.
> >>>
> >>>I don't think that this $ARGV[0] shortcut should be in there - it
> >>>causes unnecessary potential confusion. Imagine a situation where -fh
> >>>or -file is specified (using a variable), but that variable somehow
> >>>does not get defined. In that case, the $ARGV[0] fallback behaviour
> >>>would be used, which might lead to a non-obvious error behaviour.
> >>>
> >>>I'd like to propose that either -file or -fh should be specified,
> >>>otherwise an exception is thrown. While I'm about it, I'm thinking of
> >>>migrating the exceptions to the new 'typed exceptions' that BioPerl
> >>>now provides - is there any consensus on exception type names?
> >>>
> >>>Peter
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at portal.open-bio.org
> >>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>>
> >>_______________________________________________
> >>Bioperl-l mailing list
> >>Bioperl-l at portal.open-bio.org
> >>http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>
> >>
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu