[Bioperl-l] Fwd: problem with bioperl (where's the Mus?)
Anand C. Patel
acpatel at usa.net
Sun Aug 23 17:17:08 UTC 2009
On Aug 23, 2009, at 9:38 AM, Hilmar Lapp wrote:
>> Common name -- still "genbank common name" in name_class in the
>> taxon_name table for "house mouse", which I think the module is
>> looking for as "common name".
>
> If you are loading the NCBI taxonomy first, this is coming from
> NCBI, not one of the scripts or BioPerl, and hence we have no
> control over it. Are you saying that there is no designated name of
> class 'common name' for Mus musculus in the NCBI taxonomy dump?
>
> Also, the common name being present or not should have no bearing on
> the lineage array, where the actual problem is, so I don't
> understand right now how this would be connected to the problem you
> are seeing.
>
>>
>> It's not behaving differently despite reloading the sequences.
>>
>> I've created a horrible munge that fixes it for cosmetic purposes:
>> my $species = $seq->species;
>> my $justspecies = $species->scientific_name();
>> my $binspecies = $species->binomial();
>>
>> my $gbstring2 = $gbstring;
>>
>> $gbstring2 =~ s/$binspecies/$justspecies/g;
>> $gbstring2 =~ s/$justspecies/$binspecies/g;
>
> I don't understand what you are trying to achieve here - it seems
> like you are making a substitution and then reverting it? Also,
> $species->scientific_name() and $species->binomial() should be
> identical for Mus musculus - are you finding different values being
> returned?
>
> So in essence, I wouldn't expect your above code snippet to have any
> effect, for both of these reasons. How do you find $gbstring2 to be
> different from $gbstring at the end of this block of code?
>
> -hilmar
I should have been clearer.
Code snippet:
my $species = $seq->species;
print "common name = ",$species->common_name, "\n";
print "scientific name = ",$species->scientific_name, "\n";
print "species = ",$species->species, "\n";
print "genus = ",$species->genus, "\n";
print "sub_species = ",$species->sub_species, "\n";
print "binomial = ",$species->binomial, "\n";
print "ncbi_taxid = ",$species->ncbi_taxid, "\n";
Output:
common name =
scientific name = musculus
species = musculus
genus = Mus
sub_species =
binomial = Mus musculus
ncbi_taxid = 10090
The common name is missing, even though I loaded the NCBI taxonomy
using the provided script. In the taxon_name table, "house mouse" is
ONLY present with name_class "genbank common name", not "common name".
So, what I get in $gbstring is:
LOCUS       NM_017474               2935 bp    dna     linear   ROD 13-AUG-2009
DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3),
            mRNA.
ACCESSION   NM_017474 XM_978159
VERSION     NM_017474.2  GI:255918210
KEYWORDS    .
SOURCE      musculus
  ORGANISM  musculus
            Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria;
            Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
            Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda;
            Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires;
            Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus.
What I get in $gbstring2 is:
LOCUS       NM_017474               2935 bp    dna     linear   ROD 13-AUG-2009
DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3),
            mRNA.
ACCESSION   NM_017474 XM_978159
VERSION     NM_017474.2  GI:255918210
KEYWORDS    .
SOURCE      Mus musculus
  ORGANISM  Mus musculus
            Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria;
            Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata;
            Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda;
            Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires;
            Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus.
Not perfect -- common name is still missing, but better.
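To spell out why the two substitutions are not a no-op: because
scientific_name returns only "musculus" here, the first pass collapses
every "Mus musculus" to "musculus", and the second pass expands every
"musculus" back to "Mus musculus", so the genus is never doubled. A less
invasive variant would touch only the SOURCE and ORGANISM lines. The
following is a minimal sketch of that idea, not a fix for the underlying
problem; the literal strings are stand-ins for the values shown above,
and the record is shortened for illustration.

```perl
use strict;
use warnings;

# Stand-ins for the values returned in this thread (assumptions for
# illustration; in real code they come from $seq->species).
my $justspecies = 'musculus';        # what scientific_name returns here
my $binspecies  = 'Mus musculus';    # what binomial returns here

# Shortened stand-in for $gbstring.
my $gbstring = join "\n",
    'DEFINITION  Mus musculus chloride channel calcium activated 3 (Clca3),',
    '            mRNA.',
    'SOURCE      musculus',
    '  ORGANISM  musculus',
    '';

# Rewrite the name only when it is the entire value of a SOURCE or
# ORGANISM line; \Q...\E quotes any regex metacharacters in the name.
(my $gbstring2 = $gbstring) =~
    s/^([ \t]*(?:SOURCE|ORGANISM)[ \t]+)\Q$justspecies\E[ \t]*$/$1$binspecies/mg;

print $gbstring2;
```

With this version the DEFINITION line is never modified, and a name
that contains regex metacharacters cannot break the pattern.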
I could replace every "genbank common name" value of name_class in the
taxon_name table with "common name" and see if this fixes it.
Any other thoughts?
Thanks,
Anand
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================