[Bioperl-l] Finding locations of a string within a fasta file

Charles Hauser charlesh at stedwards.edu
Sun Jul 16 23:32:38 UTC 2006


Hi Chris,

Thanks for the info.

Unfortunately, I did not make clear that the sequence is unannotated,
i.e. there is no GenBank record.  I need to extract the locations of
the gaps from a raw FASTA file.
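
For illustration, here is a minimal sketch of the kind of scan this
calls for. It assumes plain Perl with no BioPerl dependency and treats
a gap as any run of N or n. The names find_gaps and read_fasta are
hypothetical helpers made up for this example, and positions are
reported 1-based and inclusive. Bio::SeqIO could be used to read the
records instead of the hand-rolled reader.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) for every run of N/n.
sub find_gaps {
    my ($seq) = @_;
    my @gaps;
    while ($seq =~ /N+/gi) {
        # $-[0] is the 0-based offset of the match start,
        # $+[0] is the 0-based offset just past the match end.
        push @gaps, [ $-[0] + 1, $+[0] ];
    }
    return @gaps;
}

# Read FASTA records from a filehandle; returns a list of [id, sequence].
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;               # strip newline and trailing blanks
        if ($line =~ /^>(\S*)/) {
            push @records, [ $1, '' ];    # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;     # append to the current sequence
        }
    }
    return @records;
}

# When given a file name, print one tab-separated line per gap:
# id, start, end.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    for my $rec (read_fasta($fh)) {
        my ($id, $seq) = @$rec;
        print join("\t", $id, @$_), "\n" for find_gaps($seq);
    }
    close $fh;
}
```

Saved under a name of your choosing and run with the FASTA file as its
only argument, this prints the scaffold id and the start and end of
each gap. The whole scaffold is held in memory as one string, which
should be fine for typical scaffold sizes but is an assumption worth
checking for very large files.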


ch

On Jul 15, 2006, at 4:22 PM, Chris Fields wrote:

> You can retrieve the original GenBank CONTIG file using
> Bio::DB::GenBank if the format is set to 'gb' (it is now set to
> 'gbwithparts' by default).  The CONTIG lines are currently stored in
> a series of Bio::Annotation::SimpleValue objects; get the accessions
> using the following script.
>
> use strict;
> use warnings;
>
> use Bio::DB::GenBank;
> use Bio::SeqIO;  # needed for the Bio::SeqIO->new call below
>
> my $factory = Bio::DB::GenBank->new(-format => 'gb');
>
> my $seq = $factory->get_Seq_by_id(shift);
>
> my $seqout = Bio::SeqIO->new(-fh => \*STDOUT,
>                              -format => 'genbank');
>
> # greps only annotations with CONTIG tagname, joins all together
> my $contig = join '', grep {$_->tagname eq 'CONTIG'}
>     $seq->get_Annotations();
>
> # split each region, getting rid of gaps and join(), then split into
> # acc/span
> for (grep {$_ !~ m{gap|join}}
>      split ',', $contig) {
>     my ($acc, $span) = split ':', $_;
>     $span =~ s{\)}{}g; # spurious ')'
>     print "ACC: $acc\n\tSpan:$span\n";
> }
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Charles Hauser
>> Sent: Saturday, July 15, 2006 2:30 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Finding locations of a string within a fasta file
>>
>> All,
>>
>> I'm trying to determine where (the start .. end positions) within a
>> genomic scaffold sequence gaps occur.
>> The gaps are denoted as runs of N's.
>>
>> Suggestions on how to easily retrieve this would be appreciated.
>>
>> ch



