[Bioperl-l] simple PrimarySeq question
Hilmar Lapp
hlapp at gmx.net
Mon Jul 2 22:36:19 EDT 2007
Check out the POD of Bio::Seq::SeqBuilder; the synopsis should have
examples of what you want to do:
use Bio::SeqIO;
# usually you won't instantiate this yourself - a SeqIO object -
# you will have one already
my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "genbank");
my $builder = $seqin->sequence_builder();
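# if you need only sequence, id, and description (e.g. for
# conversion to FASTA format):
$builder->want_none();
$builder->add_wanted_slot('display_id','desc','seq');
# if you want everything except the sequence and features
$builder->want_all(1); # this is the default if it's untouched
$builder->add_unwanted_slot('seq','features');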
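Put together as a complete script, the idea might look like the sketch below. It assumes a GenBank file name is given on the command line; the helper name read_light_seqs is made up for illustration and is not part of BioPerl. The builder is told to fill in only the id, description and sequence, so feature and annotation objects should not be built.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper (not a BioPerl function): read GenBank entries
# from a filehandle, asking the parser to populate only the display id,
# description and sequence.
sub read_light_seqs {
    my ($fh) = @_;
    my $seqin = Bio::SeqIO->new(-fh => $fh, -format => 'genbank');

    # Restrict what the sequence builder collects
    my $builder = $seqin->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('display_id', 'desc', 'seq');

    my @seqs;
    while (my $seq = $seqin->next_seq()) {
        push @seqs, $seq;
    }
    return @seqs;
}

# Usage: perl light_seqs.pl 1.gb
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    for my $seq (read_light_seqs($fh)) {
        # FASTA-style output from the three slots we asked for
        printf ">%s %s\n%s\n",
            $seq->display_id, $seq->desc // '', $seq->seq // '';
    }
}
```

The objects you get back are still whatever the SeqIO stream normally creates, just with fewer slots filled in. If you need a PrimarySeq specifically, calling primary_seq() on each one should give you the underlying object, as with any Bio::Seq.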
Let us know if that doesn't answer your question.
Note that this is currently only implemented for GenBank format.
On Jul 2, 2007, at 8:41 PM, niels at genomics.dk wrote:
> Kevin,
> Thanks, but I didn't put the question very clearly, sorry. Yes, SeqIO
> gets entries from file, and from those large parsed entries I can get a
> simplified primary_seq object. But the SeqIO object includes feature
> and annotation objects etc. that take time to make, and I wish to know
> if there is a way to get a primary_seq object without this overhead. I
> apologize if I overlooked it in the docs.
> Niels
>> Start by having a look at the following link:
>> http://bioperl.org/cgi-bin/deob_interface.cgi
>> SeqIO is how one reads or writes sequences to/from files.
>> Bio::PrimarySeq is just an object that holds information about a
>> sequence obtained from a file.
>> As for how to parse a Genbank file into a list of features:
>> my $file = Bio::SeqIO->new(-format => 'genbank', -file => "id.gb");
>> my @sorted_features;
>> while (my $seq = $file->next_seq())
>> {
>>     my @features = $seq->all_SeqFeatures;
>>     # keep only the features whose primary tag is 'CDS'
>>     for my $f (@features)
>>     {
>>         my $tag = $f->primary_tag;
>>         if ($tag eq 'CDS')
>>         {
>>             # @sorted_features holds the CDS feature objects
>>             # obtained from the GenBank file
>>             push @sorted_features, $f;
>>         }
>>     }
>> }
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Niels Larsen
>>> Sent: Monday, July 02, 2007 1:45 PM
>>> Cc: bioperl-l List
>>> Subject: [Bioperl-l] simple PrimarySeq question
>>> I write hoping someone could show me how to create a
>>> PrimarySeq object without parsing features and all first. The
>>> lines below return
>>> "Can't locate object method "next_seq" via package
>>> "Bio::PrimarySeq" at ./tst2 line 16."
>>> whereas calling Bio::SeqIO-> gives no error, but a too big object.
>>> The GenBank record after the __END__ is the "1.gb" file. I
>>> could not find out how from the tutorial or the
>>> Bio::PrimarySeq description.
>>> Niels L
>>> #!/usr/bin/env perl
>>> use strict;
>>> use warnings FATAL => qw ( all );
>>> use Data::Dumper;
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> my ( $seq_h, $seq );
>>> $seq_h = Bio::PrimarySeq->new( -file => "1.gb", -format => 'genbank' );
>>> # $seq_h = Bio::SeqIO->new( -file => "1.gb", -format => 'genbank' );
>>> $seq = $seq_h->next_seq();
>>> # print Dumper( $seq );
>>> __END__
>>> LOCUS X60065 9 bp mRNA linear MAM 14-NOV-2006
>>> DEFINITION B.bovis beta-2-gpI mRNA for beta-2-glycoprotein I.
>>> ACCESSION X60065 REGION: 1..9
>>> VERSION X60065.1 GI:5
>>> KEYWORDS beta-2 glycoprotein I.
>>> SOURCE Bos taurus (cattle)
>>> ORGANISM Bos taurus
>>> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>> Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>> Pecora; Bovidae; Bovinae; Bos.
>>> AUTHORS Bendixen,E., Halkier,T., Magnusson,S., Sottrup-Jensen,L. and
>>> Kristensen,T.
>>> TITLE Complete primary structure of bovine beta 2-glycoprotein I:
>>> localization of the disulfide bridges
>>> JOURNAL Biochemistry 31 (14), 3611-3617 (1992)
>>> PUBMED 1567819
>>> REFERENCE 2 (bases 1 to 9)
>>> AUTHORS Kristensen,T.
>>> TITLE Direct Submission
>>> JOURNAL Submitted (11-JUN-1991) T. Kristensen, Dept of Mol Biology,
>>> University of Aarhus, C F Mollers Alle 130, DK-8000 Aarhus C,
>>> FEATURES Location/Qualifiers
>>> source 1..9
>>> /organism="Bos taurus"
>>> /mol_type="mRNA"
>>> /db_xref="taxon:9913"
>>> /clone="pBB2I"
>>> /tissue_type="liver"
>>> gene <1..>9
>>> /gene="beta-2-gpI"
>>> CDS <1..>9
>>> /gene="beta-2-gpI"
>>> /codon_start=1
>>> /product="beta-2-glycoprotein I"
>>> /protein_id="CAA42669.1"
>>> /db_xref="GI:6"
>>> /db_xref="GOA:P17690"
>>> /db_xref="UniProtKB/Swiss-Prot:P17690"
>>> sig_peptide <1..>9
>>> /gene="beta-2-gpI"
>>> 1 ccagcgctc
>>> //
