[Bioperl-l] Fwd: problem with bioperl (where's the Mus?)

Anand C. Patel acpatel at usa.net
Sun Aug 23 17:17:08 UTC 2009


On Aug 23, 2009, at 9:38 AM, Hilmar Lapp wrote:
>> Common name -- still "genbank common name" in name_class in the  
>> taxon_name table for "house mouse", which I think the module is  
>> looking for as "common name".
>
> If you are loading the NCBI taxonomy first, this is coming from  
> NCBI, not one of the scripts or BioPerl, and hence we have no  
> control over it. Are you saying that there is no designated name of  
> class 'common name' for Mus musculus in the NCBI taxonomy dump?
>
> Also, the common name being present or not should have no bearing on  
> the lineage array, where the actual problem is, so I don't  
> understand right now how this would be connected to the problem you  
> are seeing.
>
>>
>> It's not behaving differently despite reloading the sequences.
>>
>> I've created a horrible munge that fixes it for cosmetic purposes:
>> my $species = $seq->species;
>> my $justspecies = $species->scientific_name();
>> my $binspecies = $species->binomial();
>>
>> my $gbstring2 = $gbstring;
>>
>> $gbstring2 =~ s/$binspecies/$justspecies/g;
>> $gbstring2 =~ s/$justspecies/$binspecies/g;
>
> I don't understand what you are trying to achieve here - it seems  
> like you are making a substitution and then reverting it? Also,  
> $species->scientific_name() and $species->binomial() should be  
> identical for Mus musculus - are you finding different values being  
> returned?
>
> So in essence, I wouldn't expect your above code snippet to have any  
> effect, for both of these reasons. How do you find $gbstring2 to be  
> different from $gbstring at the end of this block of code?
>
> 	-hilmar

I should have been clearer.

Code snippet:
my $species = $seq->species;
print "common name = ",$species->common_name, "\n";
print "scientific name = ",$species->scientific_name, "\n";
print "species = ",$species->species, "\n";
print "genus = ",$species->genus, "\n";
print "sub_species = ",$species->sub_species, "\n";
print "binomial = ",$species->binomial, "\n";
print "ncbi_taxid = ",$species->ncbi_taxid, "\n";

Output:
common name =
scientific name = musculus
species = musculus
genus = Mus
sub_species =
binomial = Mus musculus
ncbi_taxid = 10090

The common name is missing, even though I loaded the NCBI taxonomy
using the provided script. It is present ONLY under the name class
"genbank common name".

So, what I get in $gbstring is:
LOCUS       NM_017474               2935 bp    dna     linear   ROD 13-AUG-2009
DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3), mRNA.
ACCESSION   NM_017474 XM_978159
VERSION     NM_017474.2  GI:255918210
KEYWORDS    .
SOURCE      musculus
   ORGANISM  musculus
             Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria;
             Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
             Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda;
             Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires;
             Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus.

What I get in $gbstring2 is:
LOCUS       NM_017474               2935 bp    dna     linear   ROD 13-AUG-2009
DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3), mRNA.
ACCESSION   NM_017474 XM_978159
VERSION     NM_017474.2  GI:255918210
KEYWORDS    .
SOURCE      Mus musculus
   ORGANISM  Mus musculus
             Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria;
             Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
             Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda;
             Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires;
             Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus.

Not perfect -- common name is still missing, but better.
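
To spell out why the two substitutions quoted above are not a no-op when scientific_name returns "musculus" and binomial returns "Mus musculus", here is a minimal stand-alone sketch. It uses plain strings instead of BioPerl objects; the sub name munge_species and the shortened sample record are made up for illustration, and the \Q...\E quoting is an addition that was not in the original munge.

```perl
use strict;
use warnings;

# Apply the same two substitutions as the workaround, in the same order.
# $binomial and $short are plain strings here (hypothetical stand-ins for
# $species->binomial and $species->scientific_name).
sub munge_species {
    my ($text, $binomial, $short) = @_;

    # Step 1: collapse every full binomial to the bare species epithet,
    # so the text contains only one form of the name.
    $text =~ s/\Q$binomial\E/$short/g;

    # Step 2: expand every bare epithet to the full binomial.
    $text =~ s/\Q$short\E/$binomial/g;

    return $text;
}

# Shortened sample record with the same mix of names as the real output.
my $gbstring = join "\n",
    'DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3), mRNA.',
    'SOURCE      musculus',
    '  ORGANISM  musculus';

print munge_species($gbstring, 'Mus musculus', 'musculus'), "\n";
```

Step 1 matters because of the DEFINITION line: running only step 2 would turn the "Mus musculus" that is already there into "Mus Mus musculus". With both steps the DEFINITION line ends up unchanged and the SOURCE and ORGANISM lines gain the genus.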
I could go through the taxon_name table and change every name_class of
"genbank common name" to "common name" to see whether that fixes it.
Any other thoughts?
Thanks,
Anand

> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================