[Bioperl-l] some contigs do not work for sequence retrievel

Hans-Rudolf Hotz hrh at fmi.ch
Tue May 8 16:47:16 UTC 2012


Hi Hermann


I can't give you the full answer, as I am not familiar enough with the 
inner works of the "Bio::DB::GenBank" module.


However, as first idea, you might wanna check the NCBI annotation:

for the "Evidence Viewer" (why are you using this link?):

" 54161" links to: 
http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=10090&contig=NT_039353.1&gene=Copg&lid=54161&from=57085128&to=57110783

NT_039353.1 is no longer the current sequence version, see:
http://www.ncbi.nlm.nih.gov/nuccore/NT_039353.1



"18619" links to: 
http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=10090&contig=NT_187032.1&gene=Penk&lid=18619&from=1083535&to=1088444

NT_187032.1  IS the current sequence version, see:
http://www.ncbi.nlm.nih.gov/nuccore/NT_187032.1



maybe someone can jump in and explain, why in this particular case 
fetching of an old sequence version is not possible. It usually just 
works for me.


Regards, Hans

On 05/08/2012 06:16 PM, Hermann Norpois wrote:
> Hello,
>
> for getting a sequence 5 prime upstream of TTS I wrote a script that works
> for some geneids but not for all. I always get a contig and coordinates. I
> do not have an idea why I do not get a sequence ( I only get fasta
> headers). Actually the sequence ID should be out of importance if I see
> that a contig is detected. Has anybody an idea?
>
> Thanks
> Hermann Norpois
>
>
> #!/bin/perl -w
> use strict;
> use Bio::DB::EntrezGene;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> my $id = "12064"; #Works with geneid 18619 (Penk1) but not with 54161
> (copg) or 12064 (bdnf)
>
> my $seqio_obj = Bio::SeqIO->new(-file =>  ">bdnf.fasta", -format =>  'fasta'
> );
>
> my $db = new Bio::DB::EntrezGene;
>
> my $seq = $db->get_Seq_by_id($id);
>
> my $ac = $seq->annotation;
>
> for my $ann ($ac->get_Annotations('dblink')) {
>      if ($ann->database eq "Evidence Viewer") {
>                  # get the sequence identifier, the start, and the stop
>          my ($contig,$from,$to) = $ann->url =~
>            /contig=([^&]+).+from=(\d+)&to=(\d+)/;
>                   my $chr_start = $from-700;
>                   my $chr_stop = $from;
>                 #  my $strand = 1;
>                  print "CONTIG:\t$contig\tFROM\t$from\tTO\t$to\n\tFETCHING
> SEQUENCE FROM\t$chr_start\tTO\t$chr_stop\n"; # Control that something was
> detected.
>                  my $gb  = Bio::DB::GenBank->new(-format     =>  'fasta',
>                                  -seq_start  =>  $chr_start,
>                                  -seq_stop   =>  $chr_stop,
>                  #               -strand     =>  $strand
>                  #               -complexity =>  1
>                                  );
>                #  $gb->request_format('fasta');
>                  my $obj = $gb->get_Seq_by_id($contig);
>
>                  $seqio_obj->write_seq($obj);
>
>      }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



More information about the Bioperl-l mailing list