[Bioperl-pipeline] any hope of getting the example to work?
Shawn
shawnh@fugu-sg.org
02 Oct 2002 00:39:27 +0800
--=-I1MKKiDf3zdbmL6HHWs6
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Ok, the problem with storing the genbank features was that
in Bio::EnsEMBL::DBSQL::FeatureAdaptor, it was validating the features
by checking $feat->isa("Bio::EnsEMBL::SeqFeatureI").
Naturally, SeqIO returns Bio::SeqFeature::Generic which inherits from
Bio::SeqFeatureI instead. The simple solution would be to convert each
bioperl seqfeature to a ensembl seqfeature before storing.
I have attached the following load_contigs.pl script which loads genbank
files which works with ensembl-4-1 and bioperl-live.
Also since each feature is associated with an analysis in the Ensembl
schema, I artificially set the analysis to one which has the format of
the file as the logic_name. Change it as u wish.
hope this helps.
shawn
On Tue, 2002-10-01 at 22:39, Andy Nunberg wrote:
> At 09:58 AM 10/1/2002 +0800, you wrote:
> >Hi Andy,
> >
> >sorry we have been tied up with the ciona genome work.
> >
> >
> >> Hi all,
> >> since I cant get the first script in your example to work because of the
> >> XML module, do you have any ideas how to fix this? can you send an older
> >> verision that works?
> >
> >give me a day or 2 ( I know I have been saying this) and I will create a
> >dump of a blast pipeline for you. Kiran is working hard on the XML.
> >
> >I'm assuming you will want to store your features in Ensembl to view them.
> >
> >This will be the layout:
> >
> >You have your BACs stored in Ensembl (the reason why this is needed is because
> > the features need a reference sequence for viewing them on the
> >browser)
> >
> >Contigs will be retrieved from EnsEMBL and blast against dbfiles in the
> >bioperl-pipeline.
> >(Your dbfiles will be your proprietary data).
> >
> >Features are stored into the contig table in Ensembl.
> >
> >> website not theirs) and how to load genbank files into it.
> >
> >in the ensembl-pipeline/scripts directory, there is a load_scaffolds.pl
> >script which
> >works on fasta files. You could convert your genbank files to fasta and load
> >them or
> >just replace the following line:
> >
> >my $seqio = new Bio::SeqIO(-format=>'Fasta',
> > -file=>$seqfile);
> >with
> >
> >my $seqio = new Bio::SeqIO(-format=>'genbank',
> > -file=>$seqfile);
> >
> >this should work with the stable release of ensembl
> >
> >Note though, with genbank, the script will try and store the features as
> well,
> >and you will get a bunch of errors for that since the ensembl and bioperl
> >features
> >aren't really compatible. You can ignore them since you are only interested in
> >the sequences. Or you could just comment out the storing of features.
> Actually I want the features from Genbank, because I want to see where my
> sequences are hitting what features
> on the BAC. Is this possible to do?
>
> >
> >> I'd love to see a course at the O'Reilly meeting where I can learn to do
> >> these basic things. (IE How to set up EnsEMBL database and populate with
> >> your own data.. How to setup a basic bioperl-pipeline..I'd be happy just
> >> setting up a couple of BLASTs!)
> >
> >when we get stable hopefully :)
> >
> >
> >shawn
> >
> >--
> >********************************
> >* Shawn Hoon
> >* http://www.fugu-sg.org/~shawnh
> >********************************
>
> *******************************************************************
> Andy Nunberg, Ph.D
> Computational Biologist
> Orion Genomics, LLC
> (314) 615-6989
> http://www.oriongenomics.com
>
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
>
--=-I1MKKiDf3zdbmL6HHWs6
Content-Disposition: attachment; filename=load_contigs.pl
Content-Transfer-Encoding: quoted-printable
Content-Type: text/x-perl; name=load_contigs.pl; charset=ISO-8859-1
#!/usr/local/bin/perl -w
=3Dhead1 NAME
load_scaffolds.pl
=3Dhead1 SYNOPSIS
=20
load_scaffolds.pl=20
=3Dhead1 DESCRIPTION
This script loads a fasta file into the ensembl clone/contig/dna schema. BE=
WARE! It is for gunshot sequencing projects like fugu, where one clone =3D =
one contig is the rule (and they are usually referred to as scaffolds. It d=
oes not deal with multiple contigs per clone.
If the option -pipe is set it also fills the ensembl pipeline InputIdAnalys=
is with each contig id and value 1 (i.e. ready to be analysed)
=3Dhead1 OPTIONS
-host host name for database (gets put as host=3D in locator)
-dbname For RDBs, what name to connect to (dbname=3D in locator)
-dbuser For RDBs, what username to connect as (dbuser=3D in locator)
-dbpass For RDBs, what password to use (dbpass=3D in locator)
-format file format(fasta,genbank etc)=20
-help displays this documentation with PERLDOC
=3Dcut
use strict;
use vars qw($USER $PASS $DB $HOST $DSN);
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::SeqFeature;
use Bio::EnsEMBL::Analysis;
use Bio::SeqIO;
use Bio::EnsEMBL::PerlDB::Contig;
use Bio::EnsEMBL::PerlDB::Clone;
use Getopt::Long;
my $host =3D 'mysql';
my $port =3D '';
my $dbname =3D '';
my $dbuser =3D 'root';
my $dbpass =3D undef;
my $format =3D "fasta";
my $help =3D 0;
my $verbose =3D 0;
&GetOptions(=20
'host:s' =3D> \$host,
'port:n' =3D> \$port,
'dbname:s' =3D> \$dbname,
'dbuser:s' =3D> \$dbuser,
'dbpass:s' =3D> \$dbpass,
'format:s' =3D> \$format,
'verbose' =3D> \$verbose,
'h|help' =3D> \$help
);
if ($help) {
exec('perldoc', $0);
}
$dbname || exec('perldoc',$0);
$SIG{INT} =3D sub {my $sig=3Dshift;die "exited after SIG$sig";};
my $db =3D Bio::EnsEMBL::DBSQL::DBAdaptor->new(-dbname=3D>$dbname,-user=3D=
>$dbuser,-host=3D>$host,-pass=3D>$dbpass);
my ($seqfile) =3D @ARGV;
if( !defined $seqfile ) { die 'cannot load because sequence file to load se=
quences from was not passed in as argument';}
my $seqio =3D Bio::SeqIO->new(-format =3D>lc($format),
-file =3D>$seqfile);
my $count =3D 0;
while ( my $seq =3D $seqio->next_seq ) {
$seq =3D &convert_seq_features($seq,$format);
my $cloneid=3D $seq->id;
my $contigid =3D $cloneid.".1";
$verbose && print STDERR ("Parsed contig $contigid : contig length ".$s=
eq->length."\n");
my $clone =3D new Bio::EnsEMBL::PerlDB::Clone;
my $contig =3D new Bio::EnsEMBL::PerlDB::Contig;
$clone->htg_phase(3);
$clone->id($cloneid);
$contig->id($contigid);
$contig->internal_id($count++);
$contig->seq($seq); =20
$clone->add_Contig($contig);
eval {=20
$db->write_Clone($clone);
$verbose && print STDERR "Written ".$clone->id." scaffold into db\n"=
;
};
if( $@ ) {
print STDERR "Could not write clone into database, error was $@\n";
}
}
sub convert_seq_features {
my ($seq,$format) =3D @_;
my @feat =3D $seq->all_SeqFeatures;
my $anal =3D Bio::EnsEMBL::Analysis->new(-logic_name=3D>$format);
$anal->dbID(1);
my @ens;
foreach my $feat (@feat){
my $ens =3D Bio::EnsEMBL::SeqFeature->new(-start=3D>$feat->star=
t,
-end =3D> $feat->end,
-strand =3D> $feat->str=
and,
-source_tag=3D>$feat->s=
ource_tag,
-primary_tag=3D>$feat->=
primary_tag,
-analysis=3D>$anal,
-score=3D>100,
-seqname =3D> $feat->s=
eqname);
push @ens, $ens;
}
$seq->flush_SeqFeatures;
$seq->add_SeqFeature(@ens);
return $seq;
}
--=-I1MKKiDf3zdbmL6HHWs6--