[Bioperl-l] Finding locations of a string within a fasta file
Charles Hauser
charlesh at stedwards.edu
Sun Jul 16 23:32:38 UTC 2006
Hi Chris,
Thanks for the info.
Unfortunately, I should have made clear that the sequence is
unannotated, i.e. there is no GenBank record. I need to extract the
locations of the gaps from a raw FASTA file.
ch
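For the raw-FASTA case, a minimal plain-Perl sketch might look like the following. It assumes gaps are any run of `N` or `n`, reports 1-based inclusive start/end coordinates, and needs no BioPerl. `find_gaps` is a hypothetical helper name, not a BioPerl method. It relies on Perl's built-in `@-` and `@+` arrays, which hold the start offset and end offset of the most recent successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_gaps() is a hypothetical helper, not part of BioPerl.
# Returns 1-based, inclusive [start, end] pairs for each run of N/n.
sub find_gaps {
    my ($seq) = @_;
    my @gaps;
    while ($seq =~ m/[Nn]+/g) {
        # $-[0]: 0-based offset where the match starts;
        # $+[0]: 0-based offset just past the match end.
        push @gaps, [ $-[0] + 1, $+[0] ];
    }
    return @gaps;
}

# Usage: perl find_gaps.pl scaffolds.fa
# Prints "id<TAB>start<TAB>end" for every gap in every record.
# Sequence lines are concatenated first, so a run of N's that
# wraps across a line break is reported as a single gap.
if (@ARGV) {
    my ($id, $seq) = (undef, '');
    my $report = sub {
        return unless defined $id;
        printf "%s\t%d\t%d\n", $id, @$_ for find_gaps($seq);
    };
    while (my $line = <>) {
        chomp $line;
        if ($line =~ /^>(\S*)/) {
            $report->();              # flush the previous record
            ($id, $seq) = ($1, '');
        }
        else {
            $line =~ s/\s+//g;        # drop stray whitespace and CRs
            $seq .= $line;
        }
    }
    $report->();                      # flush the last record
}
```

If BioPerl is installed, the hand-rolled reader could be replaced by a `Bio::SeqIO` loop (`next_seq`, then `display_id` and `seq` on each record), passing each sequence string to the same regex loop.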
On Jul 15, 2006, at 4:22 PM, Chris Fields wrote:
> You can retrieve the original GenBank CONTIG record using
> Bio::DB::GenBank if the format is set to 'gb' (it is now set to
> 'gbwithparts' by default). The CONTIG lines are currently stored in a
> series of Bio::Annotation::SimpleValue objects; get the accessions
> using the following script.
>
> use strict;
> use warnings;
>
> use Bio::DB::GenBank;
>
> # accession (or GI) to fetch, taken from the command line
> my $id = shift or die "usage: $0 accession\n";
>
> # 'gb' rather than 'gbwithparts' keeps the CONTIG line in the record
> my $factory = Bio::DB::GenBank->new(-format => 'gb');
> my $seq     = $factory->get_Seq_by_id($id);
>
> # fetch only the CONTIG annotations and join their values together
> my $contig = join '', map { $_->value }
>     $seq->annotation->get_Annotations('CONTIG');
>
> # strip the leading join( so the first accession is kept, split into
> # regions, drop gap() entries, then split each region into acc/span
> $contig =~ s{^join\(}{};
> for (grep { $_ !~ m{gap} } split ',', $contig) {
>     my ($acc, $span) = split ':', $_;
>     $span =~ s{\)}{}g;    # remove the closing ')' of join(...)
>     print "ACC: $acc\n\tSpan: $span\n";
> }
>
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Charles Hauser
>> Sent: Saturday, July 15, 2006 2:30 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Finding locations of a string within a fasta file
>>
>> All,
>>
>> I'm trying to determine where gaps occur (their start..end positions)
>> within a genomic scaffold sequence. The gaps are denoted as runs of N's.
>>
>> Suggestions on how to easily retrieve this would be appreciated.
>>
>> ch
>