[Bioperl-l] Enquiry on gi_taxid_nucl.dmp.gz

Roy Chaudhuri roy.chaudhuri at gmail.com
Wed Aug 25 07:12:15 EDT 2010


> Also it would be safer for the split to match on whitespace, given
> that you want the first two columns from the file.  Doing this would
> eliminate the need for the chomp on the line above.
>
>    my ($gi, $taxid) = split(/\s+/, $_);
>
> instead of
>
>    chomp;
>    my ($gi, $taxid) = split(" ", $_,2);

Sorry to be pedantic, but according to perldoc -f split: "As a special 
case, specifying a PATTERN of space (' ') will split on white space just 
as "split" with no arguments does."

The only difference between patterns of " " and /\s+/ is that the latter 
will return an initial null field if there is leading white space, which 
may or may not be what you want.

$ perl -e 'print join("-", split(" ", " 1\t2  3")), "\n"'
1-2-3
$ perl -e 'print join("-", split(/\s+/, " 1\t2  3")), "\n"'
-1-2-3

Cheers.
Roy.

