[Bioperl-l] Porting Entrez Gene parser to Biojava, Biopython, Biophp, even C++

Mingyi Liu mingyi.liu at gpc-biotech.com
Sun Mar 13 22:26:35 EST 2005


Mingyi Liu wrote:

> This has nothing to do with ASN. It is all about how uniform the data 
> structure could be.  In fact, consider what happens if NCBI decides to do
> {
>  tag id 12345,
>  tag str "whatever"
> }

Oops, I really meant:
{
  tag id 12345,
  tag str "whatever",
  tag id 34567
}

I switched to str just as an example, but forgot that this makes my
example incorrect.  So now the structure has to become:

      'tag' => [
           {
             'id' => '12345',
             'str' => 'whatever'
           },
           {
             'id' => '34567'
           }
         ]
or, more sensibly, one element per hash:
      'tag' => [
           {
             'id' => '12345'
           },
           {
             'str' => 'whatever'
           },
           {
             'id' => '34567'
           }
         ]
The latter is my approach.  Again, your approach would require users to
test the reference type before dealing with the content, and to design
two ways of handling it.  In my approach, users always treat the content
as an array: one design, and no reference testing needed.  If you read
my comment on the data-structure trimming function, you'll see some
more considerations along these lines.  It's still not perfect; I hope
that's not too surprising, and not a reason to dismiss my parser
altogether. ;-)
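To make the trade-off concrete, here is a minimal Perl sketch. The layouts and the helper names uniform_ids and mixed_ids are hypothetical illustrations, not the parser's actual API. The "mixed" layout assumes the alternative stores a single element as a bare hash and repeated elements as an array of hashes; the uniform layout always uses an array of single-key hashes.

```perl
use strict;
use warnings;

# Uniform layout (hypothetical parsed output): 'tag' is always an
# array ref of hashes, even when there is only one element.
my %uniform = (
    tag => [
        { id  => '12345' },
        { str => 'whatever' },
        { id  => '34567' },
    ],
);

# Mixed layout (assumed alternative): a lone element is a plain hash,
# repeated elements become an array of hashes.
my %mixed_single = ( tag => { id => '12345' } );
my %mixed_multi  = ( tag => [ { id => '12345' }, { id => '34567' } ] );

# Uniform: one code path, no ref() test needed.
sub uniform_ids {
    my ($rec) = @_;
    return map { $_->{id} } grep { exists $_->{id} } @{ $rec->{tag} };
}

# Mixed: the caller must inspect the reference type first.
sub mixed_ids {
    my ($rec) = @_;
    my $tag   = $rec->{tag};
    my @items = ref($tag) eq 'ARRAY' ? @$tag : ($tag);
    return map { $_->{id} } grep { exists $_->{id} } @items;
}

print join(',', uniform_ids(\%uniform)), "\n";      # 12345,34567
print join(',', mixed_ids(\%mixed_single)), "\n";   # 12345
print join(',', mixed_ids(\%mixed_multi)), "\n";    # 12345,34567
```

The extra ref() branch in mixed_ids is exactly the second code path users would have to write for every repeatable field under the mixed layout.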

Regards,

Mingyi


