[BioSQL-l] PubMed IDs from SwissProt

Hilmar Lapp hlapp at gnf.org
Tue Aug 30 21:53:28 EDT 2005


The annotation is taken from what's in the source record, so I'm 
assuming you're referring to those references that have a PubMed as 
well as a MEDLINE ID annotated in the SwissProt record.

If only one ID is provided, that ID is stored in the database (via a
foreign key from the Reference table to Dbxref). So if the MEDLINE ID
is absent, the PubMed ID takes its place, provided it was present in
the source entry. Note that no on-the-fly lookup against any external
site is done to find the other ID if only one is given.
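
The fallback described above can be sketched roughly as follows. This
is only an illustration of the stated behaviour, not the actual
bioperl-db code: the function name pick_reference_id, the plain-hash
record, the database labels, and the IDs are all made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): choose the single ID
# that ends up stored for a reference. MEDLINE wins if present,
# otherwise the PubMed ID is used. No external lookup is attempted.
# Returns (label, ID), or an empty list if neither ID is given.
sub pick_reference_id {
    my ($ref) = @_;    # plain hash standing in for a reference annotation
    return ('MEDLINE', $ref->{medline}) if defined $ref->{medline};
    return ('PUBMED',  $ref->{pubmed})  if defined $ref->{pubmed};
    return;
}

# Made-up placeholder IDs, for illustration only.
my @both   = pick_reference_id({ medline => 'M-ID', pubmed => 'P-ID' });
my @pubmed = pick_reference_id({ pubmed => 'P-ID' });
print "both given:  @both\n";
print "pubmed only: @pubmed\n";
```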

If both IDs are present, the relational model currently doesn't permit
storing both, because the relationship between Dbxref and Reference
is 1:n, not n:n. That is, there is a foreign key in the Reference
table, not an association table between the two.

You could alter the schema, and Bio/DB/BioSQL/ReferenceAdaptor.pm in
bioperl-db accordingly, in order to store both IDs, but then you
would no longer be in sync with biosql/bioperl-db development.

If your main goal is to change preference from the MEDLINE ID to the 
PubMed ID you can achieve that relatively easily by writing a 
SeqProcessor and cheating a little on the reference annotation objects, 
e.g. like this (not tested, so may contain typos, but you get the 
idea):

package PubmedProcessor;
use vars qw(@ISA);
use strict;
use Bio::Seq::BaseSeqProcessor;
@ISA = qw(Bio::Seq::BaseSeqProcessor);

# check the POD of Bio::Seq::BaseSeqProcessor to understand what
# this method does
sub process_seq {
    my ($self,$seq) = @_;
    foreach my $ref ($seq->annotation->get_Annotations('reference')) {
        # don't bother if there's no PubMed ID anyway
        next unless $ref->pubmed();
        # swap the two IDs: pretend the PubMed ID is the MEDLINE ID to
        # fool the preference order in bioperl-db
        my $id = $ref->medline();
        $ref->medline($ref->pubmed());
        $ref->pubmed($id);
    }
    return ($seq);
}
1;

Then supply the module to load_seqdatabase.pl using the --pipeline 
command line argument (see the POD).
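
To sanity-check the swap without a database or a SwissProt file, one
option is to run the same logic against a small stand-in object. The
sketch below is not tested against bioperl-db itself: MockRef and
swap_ids are hypothetical names, the IDs are placeholders, and the
accessors are assumed to set a value only when a defined argument is
passed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a reference annotation object; only the two
# accessors used by the processor are mimicked. A value is stored only
# when a defined argument is passed (an assumption for this sketch).
package MockRef;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub medline {
    my $self = shift;
    $self->{medline} = $_[0] if defined $_[0];
    return $self->{medline};
}
sub pubmed {
    my $self = shift;
    $self->{pubmed} = $_[0] if defined $_[0];
    return $self->{pubmed};
}

package main;

# Same swap as in the loop body of process_seq above.
sub swap_ids {
    my ($ref) = @_;
    return $ref unless $ref->pubmed();   # nothing to do without a PubMed ID
    my $id = $ref->medline();
    $ref->medline($ref->pubmed());
    $ref->pubmed($id);
    return $ref;
}

my $ref = swap_ids(MockRef->new(medline => 'M-ID', pubmed => 'P-ID'));
printf "medline slot: %s, pubmed slot: %s\n", $ref->medline, $ref->pubmed;
```

After the swap the PubMed ID sits in the MEDLINE slot, which is the
one the loader prefers, so that is the ID that ends up stored.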


Hth,

	-hilmar

On Aug 30, 2005, at 2:16 AM, Silke Trissl wrote:

> Hello,
>
> we are using BioSQL to store SwissProt. Currently we only get
> MEDLINE-ID's from the literature references.
>
> My question now is, is there an easy way - like adding an additional
> argument when starting the filling process - to get PubMed ID's from
> SwissProt as well or instead.
>
> We are using BioPerl to fill a PostgreSQL database.
>
> Thanks for any help in advance.
>
> 	Silke Trissl
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


