[Bioperl-l] how to find the gene's name ?

Heikki Lehvaslaiho heikki at ebi.ac.uk
Wed Mar 31 06:42:26 EST 2004


Laure,

If you are dealing with Swiss-Prot entries, the gene name(s) can be accessed with:

    for ($seq->annotation()->get_Annotations('gene_name')) {
        print $_->value, "\n";
    }

See Bio::AnnotationI for more information about annotations. (Gene name is a 
standard feature of a Swiss-Prot entry, but occurs only occasionally in 
nucleotide sequence entries as part of the feature table, so it makes sense 
to keep it in the generic storage space offered by annotations.)



Molecular weight is not stored in the object, but recalculated from the 
sequence using the following code:

  # get_mol_wt returns [lower, upper] bounds; needs "use Bio::Tools::SeqStats;"
  my $mw = Bio::Tools::SeqStats->get_mol_wt($seq->primary_seq)->[0];

It might actually be wise to save the molecular weight in an annotation object, 
retrieve it from there, and calculate it only if it is not present. That should 
speed up the processing.
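
A rough sketch of that caching idea follows. The annotation key 'mol_wt' and 
the helper name mol_wt_cached are made up for this example, not BioPerl 
conventions, and the cache only lives as long as the sequence object does:

```perl
use strict;
use warnings;
use Bio::Tools::SeqStats;
use Bio::Annotation::SimpleValue;

# Return the molecular weight of $seq, computing it at most once per object.
# ASSUMPTION: 'mol_wt' is a hypothetical annotation key chosen for this sketch.
sub mol_wt_cached {
    my ($seq) = @_;
    my $collection = $seq->annotation;

    # Reuse the stored value if an earlier call already saved one.
    my ($cached) = $collection->get_Annotations('mol_wt');
    return $cached->value if $cached;

    # get_mol_wt returns a reference to [lower bound, upper bound];
    # keep the lower bound, as in the one-liner above.
    my $mw = Bio::Tools::SeqStats->get_mol_wt($seq->primary_seq)->[0];

    $collection->add_Annotation(
        'mol_wt',
        Bio::Annotation::SimpleValue->new(-value => $mw),
    );
    return $mw;
}
```

Calling mol_wt_cached($seq) twice on the same object should compute the weight 
only the first time and read it back from the annotation the second time.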

The keywords are easily accessed (maybe your example sequence does not have 
keywords?):

  my $kw_string = $seq->keywords;


Lastly, you are missing the "Org Crossref (RX)" line. I am not sure what you mean by it.
RX lines hold the cross-references of literature references, which are another 
type of annotation object:

    foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
        print "MEDLINE=" . $ref->medline . "\n" if $ref->medline;
    }

If, on the other hand, you are after the taxid, as in:

OX   NCBI_TaxID=9606;

then this code will print it out:

    print "taxid= ", $seq->species->ncbi_taxid, "\n";


The best way to find out how Swiss-Prot fields are treated is to look into the 
Bio::SeqIO::swiss module.
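
To tie the pieces above together, here is a minimal sketch of a loop over a 
local Swiss-Prot file. The helper names entry_fields and dump_file are invented 
for illustration, and how gene_name and keywords are represented may differ 
between BioPerl versions, so treat this as a starting point, not a reference:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Collect the fields discussed above from one sequence object.
# ASSUMPTION: entry_fields is a hypothetical helper written for this sketch.
sub entry_fields {
    my ($seq) = @_;

    my @genes   = map { $_->value }
                  $seq->annotation->get_Annotations('gene_name');
    my $name    = $seq->display_id;
    my $acc     = $seq->accession_number;
    my $length  = $seq->length;
    # keywords() is only available on rich sequence objects, so check first.
    my $kw      = $seq->can('keywords') ? $seq->keywords : '';
    # species may be undefined if the entry carries no organism lines.
    my $species = $seq->species;
    my $taxid   = $species ? $species->ncbi_taxid : undef;

    return {
        entry_name => $name,
        accession  => $acc,
        length     => $length,
        gene_names => \@genes,
        keywords   => $kw,
        taxid      => $taxid,
    };
}

# Print one tab-separated line per entry of the Swiss-Prot file at $path.
sub dump_file {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'swiss');
    while (my $seq = $in->next_seq) {
        my $f = entry_fields($seq);
        print join("\t",
            $f->{entry_name}, $f->{accession}, $f->{length},
            join(',', @{ $f->{gene_names} }),
            defined $f->{taxid} ? $f->{taxid} : ''), "\n";
    }
}
```

It would be called as dump_file('uniprot_sprot.dat'), where the file name is 
only a placeholder for wherever your local copy of the database lives.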

Yours,

	-Heikki

On Wednesday 31 Mar 2004 07:20, you wrote:
> Hello, it's me again, sorry to annoy you, but I'm blocked :(
>
> So as I said, I try to extract the gene's name. I need to add some
> precisions. First, I work on Swissprot Database, I've downloaded it to work
> in local. The thing I want to do is to keep only the informations that I
> need to resolve my project that is to create a web interface so as we could
> find a protein by entrying the composition in acid amino.
> So I must create my own database by extracting information from swissprot,
> I need to find:
> -Entry name (it's ok)
> -Primary accession number (it's ok)
> -Seq length (it's ok)
> -Seq weight (I can't find it... :( )
> -Seq protein (it's ok)
> -Protein name (it's ok)
> -Gene name (Can't find it)
> -Organelle (ok)
> -Species name (ok)
> -Taxonomy (ok)
> -Org crossref (can't find it)
> -keyword list (the fonction $seq->keywords() doesn't work or I don't know
> how to use it...)
>
> I've tried the algorithm you send me but it tells that it can't call the
> method has_tag without package or object reference at line
> print $_->get_tag_values('gene'), "\n" if $_->has_tag('gene');
>
> So The thing I try to do is to open the local database of Swissprot and
> write in my own file those informations, but I can't find the Seq weight,
> the Gene's name, the Org crossref (RX) and the keywords...
> You can find my algorithm in the joint file.
>
> Thank you very much.
>
> Laure.

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

