[Bioperl-l] swiss prot

Heikki Lehvaslaiho heikki@ebi.ac.uk
Thu, 12 Apr 2001 16:31:55 +0100


OK. They moved the dbfetch script over to the main site here at EBI.
Unfortunately, the main page has some typos (I have a constant and
loving relationship with typos, I am sure you've noticed that) and
some of the examples are not working properly. The Easter holidays are
here and I cannot get the fixes onto the main web site, so please
ignore those errors for now.

Have a look at:

	http://www.ebi.ac.uk/cgi-bin/dbfetch

Sean's entries:

http://www.ebi.ac.uk/cgi-bin/dbfetch?db=swall&format=fasta&id=P00916,O39869

are both retrieved.
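
For anyone who wants to script this, here is a minimal Python sketch of
how such a query URL can be put together. The base URL and the db,
format and id parameters are the ones in the example above; the
function name dbfetch_url and its defaults are only my assumptions for
illustration, not part of the dbfetch script itself.

```python
from urllib.parse import urlencode

# Base URL as given in this message; everything else here is a hypothetical helper.
DBFETCH_URL = "http://www.ebi.ac.uk/cgi-bin/dbfetch"


def dbfetch_url(ids, db="swall", fmt="fasta", base=DBFETCH_URL):
    """Build a dbfetch query URL for one or more entry identifiers."""
    if isinstance(ids, str):
        ids = [ids]
    if not ids:
        raise ValueError("at least one identifier is required")
    # Several identifiers go into one comma-separated 'id' parameter;
    # safe="," keeps the commas readable instead of percent-encoding them.
    query = urlencode({"db": db, "format": fmt, "id": ",".join(ids)}, safe=",")
    return f"{base}?{query}"


if __name__ == "__main__":
    print(dbfetch_url(["P00916", "O39869"]))
```

The sketch only builds the URL; fetching it and handing the result to a
parser is left to whatever client code you already have.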


The next step is to get suggestions for which databases should be
added to the script, and to find volunteers to design storage objects
and the corresponding Bio::DB modules.

At the moment we have SWALL, PDB, Medline and Ensembl available.

I've updated Bio::DB::EMBL. It is up to Jason what he does with
Bio::DB::Swissprot.

Any takers for Medline literature reference object design?

	-Heikki

Heikki Lehvaslaiho wrote:
> 
> I think I have a solution, but it is not ready yet.
> 
> Rodrigo was teasing me to make the emblfetch CGI script into a general
> dbfetch, and I took him seriously. 8-) The script is in its testing
> phase here at EBI. It offers an easy way to access any local SRS
> database. The database-specific parameters are kept in an
> easy-to-modify hash (it has to be modified within the script, for
> speed). I debugged it using EMBL, Medline (serves XML!), and Ensembl.
> It took me exactly one minute to add SWALL to it. SWALL is a weekly
> updated SWISS-PROT + SP-TrEMBL + TrEMBLnew.
> 
> In a short while (a week or so, depending on how many bug fixes and
> feature changes others want before the release) we should be able to
> point Bio::DB::Swissprot to this script. I am going to distribute the
> dbfetch script so that, hopefully, most SRS maintainers install it and
> people can use the SRS server closest to them.
> 
>         -Heikki
> 
> Jason Stajich wrote:
> >
> > This is a TrEMBL entry, not a SWISS-PROT one. <sigh> The swiss format
> > expects ID_DIVISION in the ID line. There is no really good way to
> > determine this on the fly in Bio::DB::EMBL since we pass the stream
> > to a SeqIO object.
> >
> > [sprot]  http://www.expasy.org/cgi-bin/get-sprot-raw.pl?P00916
> > [TrEMBL] http://www.expasy.org/cgi-bin/get-sprot-raw.pl?O39869
> >
> > Bioperl: here is my fix - please let me know if you think this is
> > acceptable and I'll submit the fix.
> >
> > I am assigning division to UNK for the TrEMBL entry even though we could
> > probably deduce it from OC lines - I don't want to deal with that right
> > now... (the _DIVISION part of the ID is now optional; the negated
> > character classes are kept, since \S would also match '_' and ';').
> >
> > RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/swiss.pm,v
> > retrieving revision 1.36
> > diff -r1.36 swiss.pm
> > 153c153
> > <    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
> > ---
> > >    $line =~ /^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/
> > 155c155,161
> > <    $name = $1."_".$2;
> > ---
> > >    if( $2 ) {
> > >        $name = $1."_".$2;
> > >        $seq->division($2);
> > >    } else {
> > >        $name = $1;
> > >        $seq->division('UNK');
> > >    }
> > 157d162
> > <    $seq->division($2);
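
For anyone who wants to try the optional-division idea outside Perl,
here is a minimal Python sketch of the same ID-line parsing. It is an
illustration only, not the code in swiss.pm: the name parse_id_line is
made up, and the pattern is my own rendering. It uses the negated
classes [^\s_] and [^\s;], because \S on its own also matches '_' and
';' and a greedy first group would then swallow the whole NAME_DIVISION
token, leaving the division unset.

```python
import re

# The "_DIVISION" suffix is optional, so IDs without an underscore still match.
ID_LINE = re.compile(
    r"^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);"
)


def parse_id_line(line):
    """Return (name, division) for an ID line, or None if it does not match."""
    m = ID_LINE.match(line)
    if m is None:
        return None
    entry, division = m.group(1), m.group(2)
    if division:
        # ID of the form ENTRY_DIVISION: keep the full name and the division.
        return f"{entry}_{division}", division
    # No underscore part in the ID: fall back to 'UNK' as in the patch.
    return entry, "UNK"
```

Called on a line whose ID has no underscore part, it returns the bare
ID with 'UNK' as the division; lines that do not start with ID give
None.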
> >
> > On Tue, 10 Apr 2001, Xiangyun Wang wrote:
> >
> > > Hi,
> > >
> > > I am using the Bio::DB::Swissprot module to retrieve protein
> > > sequences by their IDs.
> > >
> > > But some proteins (such as Q9EPU5) can't be retrieved.
> > >
> > > What's the problem here?
> > >
> > > Thanks
> > > Sean
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> > >
> >
> > Jason Stajich
> > jason@chg.mc.duke.edu
> > Center for Human Genetics
> > Duke University Medical Center
> > http://www.chg.duke.edu/
> >
> 

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________