[BioPython] general fix for RX line

Jeffrey Chang jchang@SMI.Stanford.EDU
Sun, 9 Apr 2000 15:24:43 -0700 (PDT)


Do you have a copy of the entry you are using?  In Swiss-Prot Release 38,
accession P42288 is in entry DADR_DIDMA, and does not have an RX line that
breaks the parser.

My philosophy about parsers is that they should stick to the defined
format as closely as reasonably possible.  Having a very specific parser
will lower the chances of passing along bad data, and prevent having to
write more checks in client code.

However, since no format is perfect, occasionally exceptions do pop up
that will require loosening that definition.  In that case, I
conservatively create exceptions to let through data that should pass.

For the Swiss-Prot RX line, the user manual defines the syntax as:
RX   BIBLIOGRAPHIC_DATABASE_NAME; IDENTIFIER.
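
To make "stick to the defined format" concrete, here is a minimal sketch of a
strict check for that syntax. The function name and the regular expression are
hypothetical illustrations, not the code currently in the Swiss-Prot parser,
and the pattern assumes database names are made of letters and underscores.

```python
import re

# Hypothetical strict pattern for the documented syntax:
#   RX   BIBLIOGRAPHIC_DATABASE_NAME; IDENTIFIER.
# Assumption: database names contain only letters and underscores.
_RX_STRICT = re.compile(
    r"^RX   (?P<database>[A-Za-z_]+); (?P<identifier>[^;.\s]+)\.$"
)


def parse_rx_strict(line):
    """Return (database, identifier), or raise ValueError on any deviation."""
    match = _RX_STRICT.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("I don't understand RX line %r" % line)
    return match.group("database"), match.group("identifier")
```

With this sketch, "RX   MEDLINE; 94067048." parses cleanly, while any line
with text after the final period is rejected rather than passed along.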

This works for every entry in Releases 37 and 38, but breaks for one
entry, CLD1_HUMAN, in Release 39.  Thus, I made an exception for that one
case.  However, if the format has changed in Release 39, either in the
documentation or in practice, then perhaps it is time to loosen the parser
more generally.  Please let me know.
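
If loosening does turn out to be warranted, one possible shape for it is
sketched below. This is only an illustration under assumptions: the function
name is hypothetical, and it assumes the only deviation is extra text after
the terminating period, as in the line from the report quoted below. The extra
text is returned separately so that client code can see it instead of having
it silently dropped.

```python
import re

# Hypothetical loosened pattern: the documented syntax, followed by
# optional trailing text after the terminating period.
_RX_LOOSE = re.compile(
    r"^RX   (?P<database>\w+); (?P<identifier>[^;.\s]+)\.(?P<extra>.*)$"
)


def parse_rx_loose(line):
    """Return (database, identifier, extra); extra is '' for standard lines."""
    match = _RX_LOOSE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("I don't understand RX line %r" % line)
    return (
        match.group("database"),
        match.group("identifier"),
        match.group("extra").strip(),
    )
```

A line that lacks the semicolon or the period would still raise an error, so
the definition is widened in one direction only.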

Thanks,
Jeff




On Sun, 9 Apr 2000, Cayte wrote:

>   On accession p42288, the SwissProt parser gave the error:
> 
>   assert len(cols) == 3, "I don't understand RX line %s" \
> AssertionError: I don't understand RX line RX   MEDLINE; 94067048. [NCBI, ExPASy
> Israel, Japan]
> 
> 
>   When I looked at the code, it had a patch for a particular case
> (MEDLINE; 99132301).  We may need a more general fix.
> 
>                                                                            Cayte
>