[Bioperl-l] Bio::SeqIO can't guess the format of data from a pipe

Chris Fields cjfields at illinois.edu
Sun Aug 28 03:27:34 UTC 2011


There is no reason the variant couldn't also be a method; it's fairly generic to Bio::SeqIO. FASTQ just happens to be the only parser that takes advantage of it (probably b/c I added it when I refactored FASTQ :)

See the code for Bio::SeqIO::new to see how the variant is handled. Again, like the format, it only makes sense as a getter method.

chris
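To make the getter-only idea concrete, here is a minimal, self-contained Perl sketch. The My::SeqIO classes are hypothetical stand-ins, not BioPerl code, so it runs without BioPerl installed. It assumes two things: the format name is the last component of the parser's class name (as with Bio::SeqIO::fastq), and a variant is reported in the 'format-variant' form (e.g. 'fastq-illumina'). As far as I can tell, that is the form Bio::SeqIO::new splits back into a format and a variant, but check it against the Bio::SeqIO::new source. The sketch also shows that ref($in) on its own yields the full class name, which is why it is used with ->new() directly rather than passed as -format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for Bio::SeqIO and two format-specific subclasses;
# NOT the real BioPerl classes, just enough to illustrate the idea.
package My::SeqIO;

sub new {
    my ($class, %args) = @_;
    # Only the optional variant is kept; a real parser would also open -file.
    return bless { variant => $args{-variant} }, $class;
}

# Getter only: the format is derived from the class name, so there is
# nothing to set. A variant, if present, is appended as "format-variant".
sub format {
    my $self = shift;
    (my $format = ref $self) =~ s/^.*:://;    # My::SeqIO::fastq -> fastq
    my $variant = $self->{variant};
    return defined $variant ? "$format-$variant" : $format;
}

package My::SeqIO::fasta;
our @ISA = ('My::SeqIO');

package My::SeqIO::fastq;
our @ISA = ('My::SeqIO');

package main;

my $fasta = My::SeqIO::fasta->new();
my $fastq = My::SeqIO::fastq->new( -variant => 'illumina' );

print ref($fasta), "\n";       # My::SeqIO::fasta (the class, not the format name)
print $fasta->format, "\n";    # fasta
print $fastq->format, "\n";    # fastq-illumina

# Re-using the reader's class directly, in the spirit of ref($in)->new(...):
my $copy = ref($fastq)->new( -variant => 'sanger' );
print $copy->format, "\n";     # fastq-sanger
```

Because format() is computed from the object's class (plus its variant), there is no separate value to store or keep in sync, which is why a setter adds little here.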

On Aug 28, 2011, at 4:08 AM, Florent Angly wrote:

> 
> Yes indeed, that's a very convenient way to implement a format() method that gets the format of the file. I'll try to implement it today. More logic may be involved because of the formats that take variants; e.g. the FASTQ format (Bio::SeqIO::fastq module, <http://www.bioperl.org/wiki/Module:Bio::SeqIO::fastq>) has 'sanger', 'illumina' and 'solexa' variants.
> Florent
> 
> 
> On 27/08/11 13:43, Hilmar Lapp wrote:
>> The format is already available - it is in essence the class of the SeqIO instance:
>> 
>> my $format = ref($in);
>> 
>> Rather than passing that into SeqIO->new(), you can directly instantiate a new object from it:
>> 
>> my $out = ref($in)->new(-file => ...);
>> 
>> Would that address what you are trying to accomplish?
>> 
>> -hilmar
>> 
>> On Aug 27, 2011, at 8:12 PM, Florent Angly <florent.angly at gmail.com> wrote:
>> 
>>> My proposal would be to store the format of a file somewhere in the Bio::SeqIO object and to create a new get/set method in Bio::SeqIO called format() to store or access its value. The idea would be that the example code above could be rewritten as:
>>> 
>>>    # Open the file and let BioPerl guess its format
>>>    my $in = Bio::SeqIO->new( -file => $input_seqfile );
>>> 
>>>    # Retrieve the format guessed by BioPerl
>>>    my $format = $in->format( );
>>> 
>>>    # Open the output file using the same format as the input file
>>>    my $out = Bio::SeqIO->new( -file => ">".$output_seqfile, -format => $format );
>>> 
>>>    # Now do the work...
>>> 
>>> I think this is more elegant since it is more readable, requires less computation (the file format is guessed only once), and is more consistent with other Bio::SeqIO methods like alphabet(), which guesses the alphabet but has a get/set method to access it.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




