[Bioperl-l] Location having both "before" and "after" (<x..>y)

jlklassen at wisc.edu jlklassen at wisc.edu
Wed Jul 24 14:36:40 EDT 2013


Thanks for the clarification, Jason.  Works nicely.

All the best,

Jonathan

On Wednesday, July 17, 2013 3:09:59 PM UTC-5, Jason Stajich wrote:
>
> You want to make it a fuzzy location I believe. 
>
> Does this work better? 
>
> Bio::SeqFeature::Generic->new( -location => 
> Bio::Location::Fuzzy->new(-start => $start, -end => $end, -strand => 
> $strand))); 
>
> Jason 
> On Jul 16, 2013, at 7:36 AM, jlkl... at wisc.edu <javascript:> wrote: 
>
> > Hello all, 
> > 
> > I am attempting to output a CDS annotation where the CDS spans both the 
> > beginning and ending of my contig, according to the NCBI standard: 
> > http://www.ncbi.nlm.nih.gov/genbank/examples.wgs#partialcds.  According 
> to 
> > http://www.bioperl.org/wiki/HOWTO:Feature-Annotation, my understanding 
> is 
> > that BioPerl supports both "BEFORE" and "AFTER" annotations, and indeed 
> I 
> > can output an appropriate gbk with either the "BEFORE" or "AFTER" 
> > individually.  However, when I have them both together I get something 
> like 
> > this: 
> > 
> > LOCUS       Contig_381.1             220 bp    dna     linear   UNK 
> > ACCESSION   unknown 
> > FEATURES             Location/Qualifiers 
> >     CDS             <3 
> >                     /locus_id="M32_06772" 
> >                     /gene="Contig_381.1_1" 
> > 
> > /translation="AYGARAKGGVVAWEPLPVQYADYALWQQELLGGGGDAGGVVSQG 
> >                     LAYWEGVLEGLPQELELPVDRPRPVRPS" 
> > ORIGIN       
> >        1 gggcgtacgg ggcgcgtgcc aagggtgggg tggtggcgtg ggagccgttg 
> ccggtgcagt 
> >       61 acgcggatta cgcgttgtgg cagcaggagc tgctcggggg tgggggtgat 
> gcgggcggtg 
> >      121 tggtctcgca agggttggcg tattgggagg gggtgctgga ggggttgccg 
> caggagctgg 
> >      181 agttgccggt ggaccggccg cgtccggtgc gtccgtccta 
> > // 
> > 
> > You can see that the the CDS location annotation has the appropriate 5' 
> > annotation, but not the 3' (should be >218).  Note that the translation 
> is 
> > parsed from the Prodigal output and not generated directly in BioPerl. 
>  I'm 
> > a somewhat new to BioPerl, so am hoping that I'm missing something 
> > relatively simple, but a search of the mailing list did not reveal an 
> > obvious answer.  I'm using BioPerl v1.6.901 on Ubuntu LINUX 12.04 
> (Biolinux 
> > 7).  My code is to create a multiple gbk file from a set of bacterial 
> gene 
> > models, as predicted using Prodigal using multiple contigs as a 
> template. 
> > 
> > Many thanks in advance for the help 
> > 
> > Jonathan Klassen 
> > Postdoctoral Fellow 
> > University of Wisconsin-Madison 
> > 
> > 
> > 
> > abbreviated code: 
> > 
> > use strict; 
> > use warnings; 
> > use Bio::Seq; 
> > use Bio::SeqIO; 
> > use Bio::SeqUtils; 
> > use Bio::SeqFeature::Generic; 
> > 
> > my $ffn_obj = Bio::SeqIO->new(-file => $ffn_infile, -format => 'fasta', 
> > -alphabet => 'dna'); 
> > 
> > while (my $each_gene = $ffn_obj->next_seq){ 
> > 
> >    # parse prodigal ffn fasta headers 
> > 
> >    my $gene_id = $each_gene->display_id; 
> >    $gene_id =~ /^(.*)_\d+$/ or die "Cannot match gene_id: $gene_id"; 
> >    my $contig_id = $1; 
> >    my $gene_desc = $each_gene->desc; 
> >    $gene_desc =~ /\# (\d+) \# (\d+) # (-1|1) # .*;partial=(0|1)(0|1);/ 
> or 
> > die "Cannot match gene_desc: $gene_desc"; 
> >    my $start         = $1; 
> >    my $end           = $2; 
> >    my $orientation   = $3; 
> >    my $five_partial  = $4; 
> >    my $three_partial = $5; 
> > 
> >    # accommodates partial genes 
> > 
> > ############################################## 
> > # comment out these next 6 lines to reproduce the error 
> > ############################################## 
> > 
> >    if ($five_partial == 1){ 
> >        $start = "<".$start; 
> >    } 
> >    if ($three_partial == 1){ 
> >        $end = ">".$end; 
> >    } 
> > 
> >    # creates annotation feature for this gene 
> > 
> >    my $feature = new Bio::SeqFeature::Generic( 
> >        -start       => $start, 
> >        -end         => $end, 
> >        -strand      => $orientation, 
> >        -primary_tag => 'CDS', 
> >    ); 
> > 
> >    # hash of arrays, keyed by contig_id 
> > 
> >    push @{$genes{$contig_id}{gene_obj}}, $each_gene; 
> >    push @{$genes{$contig_id}{feature_obj}}, $feature; 
> >    push @{$genes{$contig_id}{gene_id}}, $gene_id; 
> >    push @{$genes{$contig_id}{locus_id}}, $locus_id; 
> > } 
> > 
> > # read through fna, annotating and outputting genes and prots 
> > 
> > my $genome_gbk  = Bio::SeqIO->new(-file => ">$output_prefix.gbk", 
> -format 
> > => 'genbank', alphabet => 'dna'    ); 
> > my $fna_obj = Bio::SeqIO->new(-file => $fna_infile, -format => 'fasta', 
> > -alphabet => 'dna'); 
> > while (my $contig = $fna_obj->next_seq){ 
> >    my $contig_id = $contig->display_id; 
> > 
> >    for my $a (0..$#{$genes{$contig_id}{gene_obj}}){ 
> >        $contig->add_SeqFeature($genes{$contig_id}{feature_obj}[$a]); 
> >    } 
> >    $genome_gbk->write_seq($contig); 
> > } 
> > 
> > _______________________________________________ 
> > Bioperl-l mailing list 
> > Biop... at lists.open-bio.org <javascript:> 
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l 
>
> Jason Stajich 
> jason.... at gmail.com <javascript:> 
> ja... at bioperl.org <javascript:> 
>
>


More information about the Bioperl-l mailing list