[Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe

Florent Angly florent.angly at gmail.com
Sun Aug 28 22:35:36 UTC 2011


Hi,

I implemented the format() getter method in Bio::SeqIO as discussed, 
essentially following the way proposed by Hilmar. The variant() method 
is not needed since Bio::SeqIO::fastq already has a get/set method for that.
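For readers following along, here is a minimal sketch of the idea, not the code that was committed: a getter that derives the format from the last component of the object's class name, as Hilmar's ref($in) suggestion implies. The My::SeqIO packages are hypothetical stand-ins so that the example runs without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::SeqIO (assumption: not the real module).
package My::SeqIO;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Getter only: the format is the last component of the object's class name.
sub format {
    my ($self) = @_;
    my ($format) = ref($self) =~ /::([^:]+)$/;
    return $format;
}

# Hypothetical stand-in for a format driver such as Bio::SeqIO::fasta.
package My::SeqIO::fasta;
our @ISA = ('My::SeqIO');

package main;

my $in = My::SeqIO::fasta->new( -file => 'input.fa' );
print $in->format, "\n";    # prints "fasta"
```

Because the value comes from the class itself, there is nothing to keep in sync, which is also why a setter makes little sense here.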

I noticed that there are plenty more Bio*IO modules that could benefit 
from having a format() method, e.g.:
     Bio::AlignIO
     Bio::ClusterIO
     Bio::FeatureIO
     Bio::MapIO
     Bio::OntologyIO
     Bio::SearchIO
     Bio::TreeIO
     Bio::Assembly::IO *
The code could be copy-pasted into each of them, but that would not be 
very elegant. Is there a way we could have all these IO modules share the 
same format() method?

* Note that the IO class for Bio::Assembly is called Bio::Assembly::IO, 
and not Bio::AssemblyIO as the naming of the other classes would suggest. 
This may be something to change in the future for consistency.

Florent


On 28/08/11 13:27, Chris Fields wrote:
> There is no reason the variant couldn't also be a method; it's fairly generic to Bio::SeqIO. FASTQ just happens to be the only parser that takes advantage of it (probably b/c I added it when I refactored FASTQ :)
>
> See the code for Bio::SeqIO::new to see what is done.  Again, like the format it only makes sense as a getter method.
>
> chris
>
> On Aug 28, 2011, at 4:08 AM, Florent Angly wrote:
>
>> Yes indeed, that's a very convenient way to implement a format() method that gets the format of the file. I'll try to implement it today. More logic may be involved because of the formats that take variants, e.g. the FASTQ format (Bio::SeqIO::fastq module, http://www.bioperl.org/wiki/Module:Bio::SeqIO::fastq) has 'sanger', 'illumina' and 'solexa' variants.
>> Florent
>>
>>
>> On 27/08/11 13:43, Hilmar Lapp wrote:
>>> The format is already available - it is in essence the class of the SeqIO instance:
>>>
>>> my $format = ref($in);
>>>
>>> Rather than passing that into SeqIO->new(), you can directly instantiate a new object from it:
>>>
>>> my $out = ref($in)->new(-file => ...);
>>>
>>> Would that address what you are trying to accomplish?
>>>
>>> -hilmar
>>>
>>> Sent with a tap.
>>>
>>> On Aug 27, 2011, at 8:12 PM, Florent Angly <florent.angly at gmail.com> wrote:
>>>
>>>> My proposal would be to store the format of a file somewhere in the Bio::SeqIO object and create a new get/set method in Bio::SeqIO called format() to store or access its value. The idea would be that the example code above could be rewritten as:
>>>>
>>>>     # Open the file and let BioPerl guess its format
>>>>     my $in = Bio::SeqIO->new( -file => $input_seqfile );
>>>>
>>>>     # Retrieve the format guessed by BioPerl
>>>>     my $format = $in->format( );
>>>>
>>>>     # Open the output file using the same format as the input file
>>>>     my $out = Bio::SeqIO->new( -file => ">".$output_seqfile, -format => $format );
>>>>
>>>>     # Now do the work...
>>>>
>>>> I think this is more elegant since it is more readable, requires less computation (the file format is guessed only once), and is more consistent with other Bio::SeqIO methods like alphabet(), which guesses the alphabet but also has a get/set method to access it.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l



