[Bioperl-l] (no subject)
Hermann Norpois
hnorpois at googlemail.com
Mon May 7 14:03:38 EDT 2012
Hello,
I wrote a script to retrieve promoter sequences from genbank (see below)
that doesnt work in the get_stream modus. Input is an array (or list) of
geneids. In my output file (test4.fasta) only the last header is followed
by a sequence (see below). I guess I need a second stream but I do not know
how to construct.
Thank you very much for helping.
Hermann
*script:
*#!/bin/perl -w
use strict;
use warnings;
use Bio::DB::EntrezGene;
use Bio::SeqIO;
use Bio::DB::GenBank;
my $seqio_obj = Bio::SeqIO->new(-file => ">test4.fasta", -format => 'fasta'
); # outputfile
my $db = new Bio::DB::EntrezGene;
my $seqio = $db->get_Stream_by_id([54161, 12064, 71661]); # Gene ids
while ( my $seq = $seqio->next_seq ) {
my $ac = $seq->annotation;
for my $ann ($ac->get_Annotations('dblink')) {
if ($ann->database eq "Evidence Viewer") {
# get the sequence identifier, the start, and the stop
my ($contig,$from,$to) = $ann->url =~
/contig=([^&]+).+from=(\d+)&to=(\d+)/;
my $chr_start = $from-700;
my $chr_stop = $from;
my $strand = 1;
my $gb = Bio::DB::GenBank->new(-format => 'fasta',
-seq_start => $chr_start,
-seq_stop => $chr_stop,
-strand => $strand
);
print "$contig\t$from\t$to\n\t$chr_start\t$chr_stop\n"; #
For control: Printing coordinates
my $obj = $gb->get_Seq_by_id($contig);
$seqio_obj->write_seq($obj);
}
}
}
*test4.fasta*
>gi|28485536:10326633-10326632 Mus musculus chromosome 2 genomic contig,
strain C57BL/6J
>gi|28526280:23304000-23303999 Mus musculus chromosome 6 genomic contig,
strain C57BL/6J
>gi|372099010:6466323-6467023 Mus musculus strain C57BL/6J chromosome 7
genomic contig, GRCm38 C57BL/6J MMCHR7_CTG5_2
CCCCTCCCCGCAGCTTGATTCCTATAAAAACCTGCCATTTTGGATGAATGTGCTGTTCGC
CCTTGGCTCCTTTCTTGGTCCACTTGCCCTCTCTCTTCTCTCTCTCTCCTTTCACTTTCT...
More information about the Bioperl-l
mailing list