[BioRuby] EMBL / ENA parser error
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Wed Dec 7 16:32:49 UTC 2011
Hi Michael,
We should first check the official documentation provided by EMBL-EBI.
EMBL User Manual:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
In "3.4.7 The OS Line", two examples that don't start with
an uppercase letter are shown.
> OS unidentified bacterium B8
> OS uncultured proteobacterium
Therefore, the issue should be treated as a bug in BioRuby.
The regexp and/or the logic for OS lines will be changed.
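
As a rough illustration only (not the actual change to be made in BioRuby),
one possible relaxation is to let the first character of the organism name
be either case. The method name parse_os_line and the constant OS_NAME below
are hypothetical and do not exist in embl/common.rb; the hash keys follow
the snippet quoted below.

```ruby
# Hypothetical sketch: same pattern as the quoted one, except that the
# leading character class is [A-Za-z] instead of [A-Z].
OS_NAME = /([A-Za-z][a-z]* *[\w \:\'\+\-]+[\w])/

# Parse the text of a single OS line (with or without the "OS" tag).
# Returns a hash shaped like the one built in the quoted code:
# 'os' is the organism name, 'name' is the parenthesized part or nil.
def parse_os_line(line)
  body = line.sub(/\AOS\s+/, '').strip
  m = OS_NAME.match(body)
  raise ArgumentError, "Error: OS Line.\n#{line}" unless m
  { 'name' => body[/(\(.+\))/, 1], 'os' => m[1] }
end

p parse_os_line("OS   uncultured nematode")
p parse_os_line("OS   unidentified bacterium B8")
p parse_os_line("OS   Homo sapiens (human)")
```

This keeps the rest of the pattern untouched, so it may still reject other
valid organism names; it is only meant to show where the uppercase
assumption sits.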
Thank you for reporting the issue,
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org
On Thu, 01 Dec 2011 16:30:22 +0000
Michael Paulini <mh6 at sanger.ac.uk> wrote:
> Hi fellow biorubysts,
>
> I tried to parse EMBL/ENA entry DQ471885 with the bioruby EMBL parser,
> and it dies when it tries to parse:
> OS uncultured nematode
>
> due to the regexp in embl/common.rb being:
> ==================================
> if tmp =~ /([A-Z][a-z]* *[\w\d \:\'\+\-]+[\w\d])/
>   org = $1
>   tmp =~ /(\(.+\))/
>   os.push({'name' => $1, 'os' => org})
> else
>   raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n"
> end
> ================================
> as it doesn't start with an uppercase letter.
>
> Should we change the regexp, or file a bug with ENA?
>
> thanks,
>
> Michael
>
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby