[Bioperl-l] Location having both "before" and "after" (<x..>y)
jlklassen at wisc.edu
jlklassen at wisc.edu
Wed Jul 24 14:36:40 EDT 2013
Thanks for the clarification, Jason. Works nicely.
All the best,
Jonathan
On Wednesday, July 17, 2013 3:09:59 PM UTC-5, Jason Stajich wrote:
>
> You want to make it a fuzzy location I believe.
>
> Does this work better?
>
> Bio::SeqFeature::Generic->new( -location =>
> Bio::Location::Fuzzy->new(-start => $start, -end => $end, -strand =>
> $strand)));
>
> Jason
> On Jul 16, 2013, at 7:36 AM, jlkl... at wisc.edu <javascript:> wrote:
>
> > Hello all,
> >
> > I am attempting to output a CDS annotation where the CDS spans both the
> > beginning and ending of my contig, according to the NCBI standard:
> > http://www.ncbi.nlm.nih.gov/genbank/examples.wgs#partialcds. According
> to
> > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation, my understanding
> is
> > that BioPerl supports both "BEFORE" and "AFTER" annotations, and indeed
> I
> > can output an appropriate gbk with either the "BEFORE" or "AFTER"
> > individually. However, when I have them both together I get something
> like
> > this:
> >
> > LOCUS Contig_381.1 220 bp dna linear UNK
> > ACCESSION unknown
> > FEATURES Location/Qualifiers
> > CDS <3
> > /locus_id="M32_06772"
> > /gene="Contig_381.1_1"
> >
> > /translation="AYGARAKGGVVAWEPLPVQYADYALWQQELLGGGGDAGGVVSQG
> > LAYWEGVLEGLPQELELPVDRPRPVRPS"
> > ORIGIN
> > 1 gggcgtacgg ggcgcgtgcc aagggtgggg tggtggcgtg ggagccgttg
> ccggtgcagt
> > 61 acgcggatta cgcgttgtgg cagcaggagc tgctcggggg tgggggtgat
> gcgggcggtg
> > 121 tggtctcgca agggttggcg tattgggagg gggtgctgga ggggttgccg
> caggagctgg
> > 181 agttgccggt ggaccggccg cgtccggtgc gtccgtccta
> > //
> >
> > You can see that the the CDS location annotation has the appropriate 5'
> > annotation, but not the 3' (should be >218). Note that the translation
> is
> > parsed from the Prodigal output and not generated directly in BioPerl.
> I'm
> > a somewhat new to BioPerl, so am hoping that I'm missing something
> > relatively simple, but a search of the mailing list did not reveal an
> > obvious answer. I'm using BioPerl v1.6.901 on Ubuntu LINUX 12.04
> (Biolinux
> > 7). My code is to create a multiple gbk file from a set of bacterial
> gene
> > models, as predicted using Prodigal using multiple contigs as a
> template.
> >
> > Many thanks in advance for the help
> >
> > Jonathan Klassen
> > Postdoctoral Fellow
> > University of Wisconsin-Madison
> >
> >
> >
> > abbreviated code:
> >
> > use strict;
> > use warnings;
> > use Bio::Seq;
> > use Bio::SeqIO;
> > use Bio::SeqUtils;
> > use Bio::SeqFeature::Generic;
> >
> > my $ffn_obj = Bio::SeqIO->new(-file => $ffn_infile, -format => 'fasta',
> > -alphabet => 'dna');
> >
> > while (my $each_gene = $ffn_obj->next_seq){
> >
> > # parse prodigal ffn fasta headers
> >
> > my $gene_id = $each_gene->display_id;
> > $gene_id =~ /^(.*)_\d+$/ or die "Cannot match gene_id: $gene_id";
> > my $contig_id = $1;
> > my $gene_desc = $each_gene->desc;
> > $gene_desc =~ /\# (\d+) \# (\d+) # (-1|1) # .*;partial=(0|1)(0|1);/
> or
> > die "Cannot match gene_desc: $gene_desc";
> > my $start = $1;
> > my $end = $2;
> > my $orientation = $3;
> > my $five_partial = $4;
> > my $three_partial = $5;
> >
> > # accommodates partial genes
> >
> > ##############################################
> > # comment out these next 6 lines to reproduce the error
> > ##############################################
> >
> > if ($five_partial == 1){
> > $start = "<".$start;
> > }
> > if ($three_partial == 1){
> > $end = ">".$end;
> > }
> >
> > # creates annotation feature for this gene
> >
> > my $feature = new Bio::SeqFeature::Generic(
> > -start => $start,
> > -end => $end,
> > -strand => $orientation,
> > -primary_tag => 'CDS',
> > );
> >
> > # hash of arrays, keyed by contig_id
> >
> > push @{$genes{$contig_id}{gene_obj}}, $each_gene;
> > push @{$genes{$contig_id}{feature_obj}}, $feature;
> > push @{$genes{$contig_id}{gene_id}}, $gene_id;
> > push @{$genes{$contig_id}{locus_id}}, $locus_id;
> > }
> >
> > # read through fna, annotating and outputting genes and prots
> >
> > my $genome_gbk = Bio::SeqIO->new(-file => ">$output_prefix.gbk",
> -format
> > => 'genbank', alphabet => 'dna' );
> > my $fna_obj = Bio::SeqIO->new(-file => $fna_infile, -format => 'fasta',
> > -alphabet => 'dna');
> > while (my $contig = $fna_obj->next_seq){
> > my $contig_id = $contig->display_id;
> >
> > for my $a (0..$#{$genes{$contig_id}{gene_obj}}){
> > $contig->add_SeqFeature($genes{$contig_id}{feature_obj}[$a]);
> > }
> > $genome_gbk->write_seq($contig);
> > }
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Biop... at lists.open-bio.org <javascript:>
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.... at gmail.com <javascript:>
> ja... at bioperl.org <javascript:>
>
>
More information about the Bioperl-l
mailing list