[Bioperl-l] First release of Bio::Community

Francisco J. Ossandón fossandonc at hotmail.com
Fri Nov 29 22:48:33 UTC 2013


@Chris
Thanks for the links!

@Florent
I use the BioPerl and Bio-Root versions directly from Git and pull the
changes every time I see a new commit, so all my tests are always done in
the latest development code. ;)

Oh, by the way, your module requires Bio::Root::Version '1.006922', but in
the Git repository Bio::Root::Version is still '1.006902'
(https://github.com/bioperl/Bio-Root/blob/master/lib/Bio/Root/Version.pm)
and the bioperl Build.PL is also at '1.006902'
(https://github.com/bioperl/bioperl-live/blob/master/Build.PL)... Shouldn't
that be updated to the new number?

I have PerlIO::eol installed, so I know for certain that it doesn’t help with
this particular line-endings problem. Both methods fail. In fact, I added a
comment about that above the "if( !$HAS_EOL && !$param{-raw} && (defined
$line) ) {" line, and I had to adapt some of your tests for them to pass on
Windows
(https://github.com/bioperl/Bio-Root/commit/7295aea2a66a4aaa85a087d30eb53c0d00dccc2f):
+    # In Windows the "-raw" parameter has no effect, because Perl already discards
+    # the '\r' from the line when reading in text mode from the filehandle
+    # ($line = <$fh>), and put it back automatically when printing
+    if ($^O =~ m/mswin/i) {
+        is $win_rio->_readline( -raw => 1) , "VERSION     U71225.1 GI:2804359\n";
+    }
+    else {
+        is $win_rio->_readline( -raw => 1) , "VERSION     U71225.1 GI:2804359\r\n";
+    }

After lots of testing of the Windows-related BioPerl failures, I found that
this issue is not the fault of any particular module, since it comes from how
the Perl core is IMPLEMENTED on Windows. I don't think this is explained in
the official documentation, but now I'm sure what the problem is. Please,
EVERYONE, if you are going to use the seek/read functions, this is something
you have to take into consideration for your code to work on Windows...

You know that Windows uses 2 characters, '\r\n', as the line ending in its
files, so if you use Linux-Perl to read a line you will see both characters.
Well, when Windows-Perl reads from a file in TEXT MODE (the default), it
automatically DISCARDS the '\r' character from the line and only gives you
back the string with the '\n' character, so even when the '\r' is present in
the file (taking up space and affecting the byte positions), you will never
see it in the variable. This also means that "$line =~ s#\r\n#\n#;" only
has an effect on Linux, because the variable on Windows never sees the '\r'
in the first place. Then, when you print "text\n" normally into a file,
Windows-Perl automatically CONVERTS the '\n' into '\r\n' at the moment of
writing to the file...
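
A minimal sketch of the read side of this, for anyone who wants to see it
without a Windows machine. It assumes that opening a file through the ":crlf"
PerlIO layer (the layer Windows-Perl applies to text-mode filehandles by
default) reproduces the Windows behavior on any platform; the file content
and variable names are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a file with Windows line endings, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                       # no newline translation on write
print {$out} "text\r\n";
close $out or die "close failed: $!";

# Read it back through :crlf (assumed stand-in for Windows text mode).
open my $text_fh, '<:crlf', $path or die "open failed: $!";
my $text_line = <$text_fh>;
close $text_fh;

# Read it back raw (what Linux-Perl, or binmode, shows you).
open my $raw_fh, '<:raw', $path or die "open failed: $!";
my $raw_line = <$raw_fh>;
close $raw_fh;

my $size_on_disk = -s $path;
printf "text mode: %d chars, raw: %d bytes, on disk: %d bytes\n",
    length($text_line), length($raw_line), $size_on_disk;
```

The text-mode line is one character shorter than what is stored on disk,
which is the whole problem described in this message.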

This behavior makes a lot of sense, because normally you don’t have to bother
adding or removing '\r' when printing or reading specifically on Windows, and
it makes '\n' enough to work the same way on any operating system. BUT, it
affects SEEK/READ operations that rely on BYTE LENGTH, because '\r' is a
ghost character that doesn’t show up in the variable but takes up space in
the file. So when you ask for the length of a line from a $variable that
contains "text\n", Perl will tell you it is 5 because it already discarded
the '\r', but in the file the byte length is really 6 ("text\r\n"). This will
make all your indexes off by 1 character multiplied by the number of
lines involved, so the data is not read properly.
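
To make the drift concrete, here is a small sketch that compares line-start
offsets estimated by summing length() with the real offsets reported by
tell(). As an assumption for the example, the ":crlf" layer stands in for
Windows text mode, and the three-line file is invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Three lines with Windows line endings, written byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "one\r\n", "two\r\n", "three\r\n";
close $out or die "close failed: $!";

open my $fh, '<:crlf', $path or die "open failed: $!";
my $guessed = 0;                    # running offset estimated from length()
my (@guessed_starts, @real_starts);
while (1) {
    my $real = tell $fh;            # true byte offset of this line's start
    my $line = <$fh>;
    last unless defined $line;
    push @guessed_starts, $guessed;
    push @real_starts,    $real;
    $guessed += length $line;       # misses the hidden '\r'
}
close $fh;

for my $i (0 .. $#real_starts) {
    printf "line %d: guessed %2d, real %2d, drift %d\n",
        $i + 1, $guessed_starts[$i], $real_starts[$i],
        $real_starts[$i] - $guessed_starts[$i];
}
```

The drift grows by one byte per line already read, so the later a record
sits in the file, the further a length()-based index misses it.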

You have 3 options. The first is to always set the filehandle of your files
to BINMODE, so that Windows-Perl actually shows you the line content byte by
byte AS IT IS in the file, meaning that you will NOW SEE '\r\n'. But that
would mean manually adding '\r\n' when printing and replacing chomp with
"s#\n##g" and "s#\r##g" everywhere (chomp leaves the '\r' in binmode), which
would probably make code maintenance more difficult and bug-prone. The second
option is what I implemented in BioPerl
(https://github.com/bioperl/bioperl-live/commit/e3154535233e0fc1647a3793e15882d14c62c42c),
and what I plan to add to Bio-Community: detect this issue before parsing
the file and add an offset correction, which avoids changing everything else:
my $fh         = $self->_fh;
my $init_pos   = tell($fh);                      # byte position before reading
my $curr_line  = <$fh>;
my $pos_diff   = tell($fh) - $init_pos;          # real byte length of the line
my $correction = $pos_diff - length $curr_line;  # hidden bytes per line
seek $fh, $init_pos, 0; # Rewind position to proceed to read the file

The idea is to save the cursor's current byte position ($init_pos), then read
a line and subtract $init_pos from the new byte position ($pos_diff), which
gives you the real byte length of the line in the file. Then the real length
can be compared with the reported length() to see if there is a difference.
For Linux $correction will be 0 and indexes will be unchanged, but for Windows
$correction will be 1 and the indexes will be adjusted to account for the
ghost '\r', so the data will be read correctly. I like this approach because
it is not hard-wired to the OS like an "if ($^O =~ m/mswin/i)", since it
checks for the issue and acts in proportion to the result; an IF-Windows
giving a fixed offset would also not be robust, because it would fail if
Windows-Perl reads a file with Linux line endings.
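
The detection described above can be wrapped in a small function and tried
on both kinds of file. This is only a sketch under assumptions: the helper
names line_ending_correction() and write_raw() are hypothetical (they are
not the names used in the BioPerl commit), and the ":crlf" layer stands in
for Windows text mode so the example runs on any platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: number of bytes per line that length() does not see.
sub line_ending_correction {
    my ($fh) = @_;
    my $init_pos  = tell $fh;
    my $curr_line = <$fh>;
    return 0 unless defined $curr_line;          # empty file, nothing to fix
    my $correction = (tell($fh) - $init_pos) - length $curr_line;
    seek $fh, $init_pos, 0 or die "seek failed: $!";   # rewind
    return $correction;
}

# Hypothetical helper: write $content byte for byte, return the file path.
sub write_raw {
    my ($content) = @_;
    my ($out, $path) = tempfile(UNLINK => 1);
    binmode $out;
    print {$out} $content;
    close $out or die "close failed: $!";
    return $path;
}

my $crlf_path = write_raw("ab\r\ncd\r\n");       # Windows line endings
my $lf_path   = write_raw("ab\ncd\n");           # Linux line endings

open my $crlf_fh, '<:crlf', $crlf_path or die "open failed: $!";
open my $lf_fh,   '<:crlf', $lf_path   or die "open failed: $!";
my $crlf_correction = line_ending_correction($crlf_fh);
my $lf_correction   = line_ending_correction($lf_fh);

# Use the correction to find the second line from length() alone.
my $first        = <$crlf_fh>;
my $second_start = length($first) + $crlf_correction;
seek $crlf_fh, $second_start, 0 or die "seek failed: $!";
my $second = <$crlf_fh>;

print "CRLF correction: $crlf_correction, LF correction: $lf_correction\n";
print "second line: $second";
```

Both files are read through the same layer, yet only the one with '\r\n'
endings gets a non-zero correction, which is why measuring is more robust
than testing $^O.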

The third option would be to never use length($line) in the first place, and
instead call tell($fh) twice (before and after reading the line) and take the
difference to get the line's byte length directly.
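
A minimal sketch of this third option, again assuming the ":crlf" layer as a
stand-in for Windows text mode and using an invented three-line file. The
index stores the start offset and byte length of every line using tell()
only, then one line is fetched directly with seek().

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "alpha\r\n", "beta\r\n", "gamma\r\n";
close $out or die "close failed: $!";

open my $fh, '<:crlf', $path or die "open failed: $!";

# Build the index: [ start offset, byte length ] for every line.
my @index;
while (1) {
    my $start = tell $fh;
    my $line  = <$fh>;
    last unless defined $line;
    push @index, [ $start, tell($fh) - $start ];   # never calls length()
}

# Jump straight to the third line using the stored offset.
my ($start, $bytes) = @{ $index[2] };
seek $fh, $start, 0 or die "seek failed: $!";
my $third = <$fh>;
close $fh;

print "third line starts at byte $start and takes $bytes bytes: $third";
```

Because every number in the index comes from tell(), it stays correct no
matter which line endings the file has or which platform reads it.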

It took me a while to find the exact place to implement this fix, but I
finally made this change in my local copy of Bio-Community and that allowed
many tests to pass. I have not committed it yet because there are still
messages like "MSG: Error: Got 4 columns at line 3 but got a different
number (1) at the previous line ", so I want to try to fix those before
pushing changes. As soon as I can make everything pass on Windows and
Linux, I will push it to the repo.

Please think of this issue when you write seek & read in your future code.
=)

Cheers,

Francisco J. Ossandón


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On behalf of Florent Angly
Sent: Thursday, November 28, 2013 20:30
To: Francisco J. Ossandón
CC: Fields, Christopher J; BioPerl List
Subject: Re: [Bioperl-l] First release of Bio::Community

Francisco >
At the moment, Bio::Community has to use the last released version of
BioPerl (1.6.922) so that people can install it easily through CPAN.

When you use the development version of BioPerl, do you still have the
issue?
I should point out that the module Bio::Root::IO, which handles processing
line endings, uses either in-house code or a third-party module
(PerlIO::eol, http://search.cpan.org/dist/PerlIO-eol/eol.pm).
See the clarification I made at:
https://github.com/bioperl/Bio-Root/commit/6f570df89c7202383b098a5774accddec2453508
It would be good to know whether the two methods fail or only one of them on
Windows.
Cheers,

Florent



On Fri, Nov 29, 2013 at 7:48 AM, Francisco J. Ossandón
<fossandonc at hotmail.com> wrote:
> Congratulations on your module, Florent!
> I have been testing the code on Windows and there are several tests failing.
> Several failures are caused by the same 'seek/read' problem that I found in
> BioPerl, which happens because Windows Perl hides the '\r' character
> when reading files, so it's absent from the variable when using m// and
> length() but is present in the file. I still need to debug more.
>
> By the way, what's the difference between 'sub', 'func' and 'method'??
>
> Cheers,
>
> Francisco J. Ossandon
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On behalf of Fields, Christopher J
> Sent: Thursday, November 28, 2013 1:05
> To: Florent Angly
> CC: BioPerl List
> Subject: Re: [Bioperl-l] First release of Bio::Community
>
> Done!
>
> http://news.open-bio.org/news/2013/11/initial-release-of-bioperl-biocommunity-distribution/
> https://twitter.com/obf_news/status/405909716867366912
>
> chris
>
> On Nov 25, 2013, at 6:34 AM, Florent Angly <florent.angly at gmail.com> wrote:
>
>> Thanks Hilmar and Chris. Yes, I don't mind the announcement being
>> posted on these websites.
>> Best,
>> Florent
>>
>> On Sat, Nov 23, 2013 at 5:07 AM, Fields, Christopher J 
>> <cjfields at illinois.edu> wrote:
>>> Very good point!  Florent, let me know if you want to add the post
>>> there, I can probably get you set up (or post it for you directly if
>>> you want).
>>>
>>> chris
>>>
>>> On Nov 22, 2013, at 9:07 AM, Hilmar Lapp <hlapp at drycafe.net> wrote:
>>>
>>>> Awesome! Florent, would you mind this going up as a blog post on
>>>> Open Bio? Also, have you thought about posting this to Ecolog?
>>>>
>>>>      -hilmar
>>>>
>>>> On Nov 21, 2013, at 11:07 PM, Florent Angly wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> Some time ago, I announced that I was working on a set of BioPerl 
>>>>> modules collectively forming the Bio-Community distribution. These 
>>>>> Moose-based modules provide objects to represent communities, 
>>>>> metacommunities and their members, and they also provide many 
>>>>> methods to interact with them, perform various ecological
>>>>> operations (e.g. rarefaction, taxonomic summary, subsampling), and
>>>>> to read/write them to file in multiple formats.
>>>>>
>>>>> Today, I am happy to announce the release of the first version of 
>>>>> these modules, which can be obtained from CPAN:
>>>>> http://search.cpan.org/search?query=Bio-Community&mode=dist
>>>>>
>>>>> Obviously, this is just the beginning for these modules and I hope
>>>>> that interested developers will join me to expand them.
>>>>>
>>>>> Best,
>>>>>
>>>>> Florent
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> ===========================================================
>>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>>>> ===========================================================
