[Bioperl-l] Bioperl-l Digest, Vol 71, Issue 15

Fri Apr 10 14:05:06 UTC 2009

Dereje,

There's a HOWTO that discusses a similar approach (using local
GenBank and Entrez Gene files):

http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

But the provided script uses Gene IDs, not chromosome names. The more
general suggestion would be to look at the module Bio::DB::Fasta, which
indexes local FASTA files and retrieves subsequences by sequence ID,
start and end.
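
As a rough illustration of the Bio::DB::Fasta route, here is a minimal
sketch, not a tested solution. It assumes the genome is in a FASTA file
whose sequence IDs match the chrName column (for example "chr1"), that
the regions file has exactly three tab-separated columns, and that
coordinates are 1-based and inclusive. The file names and the helper
names parse_region and fasta_record are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one tab-delimited line into (chrName, seqStart, seqEnd).
# Returns an empty list for blank, comment, or malformed lines.
sub parse_region {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return if $line =~ /^\s*(?:#|\z)/;
    my ($chr, $start, $end) = split /\t/, $line;
    return unless defined $end && $start =~ /^\d+\z/ && $end =~ /^\d+\z/;
    return ($chr, $start, $end);
}

# Format one FASTA record with a "chr:start-end" header, 60 bases per line.
sub fasta_record {
    my ($chr, $start, $end, $seq) = @_;
    (my $wrapped = $seq) =~ s/(.{1,60})/$1\n/g;
    return ">$chr:$start-$end\n$wrapped";
}

# Extraction runs only when both file arguments are given.
if (@ARGV == 2) {
    my ($fasta, $regions) = @ARGV;
    require Bio::DB::Fasta;                # BioPerl, loaded only when needed
    my $db = Bio::DB::Fasta->new($fasta);  # builds the index once, then reuses it
    open my $in, '<', $regions or die "Cannot open $regions: $!";
    while (my $line = <$in>) {
        my ($chr, $start, $end) = parse_region($line) or next;
        my $seq = $db->seq($chr, $start, $end);  # assumed 1-based, inclusive
        unless (defined $seq) {
            warn "No sequence found for $chr:$start-$end\n";
            next;
        }
        print fasta_record($chr, $start, $end, $seq);
    }
    close $in;
}
else {
    warn "Usage: $0 genome.fa regions.tsv\n";
}
```

Run it as "perl extract_regions.pl genome.fa regions.tsv > out.fa" (the
script name is hypothetical). The first run is slower because the index
has to be built; after that, lookups read straight from the local file,
so nothing depends on the network.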

Brian O.

On Mar 31, 2009, at 6:59 PM, demis001 wrote:

>
> Hi,
>
> I am new to BioPerl and this forum, and I do not even know how to
> start a new post. I have one question for you guys.
>
> Is there a BioPerl module that lets me retrieve a sequence by
> chromosome name, seqStart and seqEnd from a formatted human genome
> database that I have downloaded to my Linux desktop?
>
> I used to do this with a Perl $URI object, which is really slow
> because the process depends on the network. To be more specific, I
> took chrName, seqStart and seqEnd and queried the Ensembl database for
> the sequences one by one using the Perl $URI object.
>
> I thought it might be easier to process the data locally against an
> indexed database with a BioPerl module, if one is designed for this
> purpose.
>
> Input: a tab-delimited (CSV-style) file with millions of rows, each
> containing chrName, seqStart and seqEnd, plus a locally
> formatted/indexed human genome. Output: FASTA records, each containing
> the sequence, with a header carrying the chromosome name and location
> parsed from the input.
>
> Sorry if I posted in the wrong section of the forum. I am happy to get
> any recommendations.
> Thanks
>
> Govind Chandra wrote:
>>
>> Hi,
>>
>> The code below
>>
>>
>> ====== code begins =======
>> #use strict;
>> use Bio::SeqIO;
>>
>> $infile='NC_000913.gbk';
>> my $seqio=Bio::SeqIO->new(-file => $infile);
>> my $seqobj=$seqio->next_seq();
>> my @features=$seqobj->all_SeqFeatures();
>> my $count=0;
>> foreach my $feature (@features) {
>>  unless($feature->primary_tag() eq 'CDS') {next;}
>>  print($feature->start(),"   ", $feature->end(), "   ",$feature->strand(),"\n");
>>  $ac=$feature->annotation();
>>  $temp1=$ac->get_Annotations("locus_tag");
>>  @temp2=$ac->get_Annotations();
>>  print("$temp1   $temp2[0] @temp2\n");
>>  if($count++ > 5) {last;}
>> }
>>
>> print(ref($ac),"\n");
>> exit;
>>
>> ======= code ends ========
>>
>> produces the output
>>
>> ========== output begins ========
>>
>> 190   255   1
>> 0
>> 337   2799   1
>> 0
>> 2801   3733   1
>> 0
>> 3734   5020   1
>> 0
>> 5234   5530   1
>> 0
>> 5683   6459   -1
>> 0
>> 6529   7959   -1
>> 0
>> Bio::Annotation::Collection
>>
>> =========== output ends ==========
>>
>> $ac is-a Bio::Annotation::Collection but does not actually contain any
>> annotation from the feature. Is this how it should be? I cannot figure
>> out what is wrong with the script. Earlier I used to use has_tag(),
>> get_tag_values() etc. but the documentation says these are deprecated.
>>
>> Perl is 5.8.8. BioPerl version is 1.6 (installed today). Output of
>> uname -a is
>>
>> Linux n61347 2.6.18-92.1.6.el5 #1 SMP Fri Jun 20 02:36:06 EDT 2008
>> x86_64 x86_64 x86_64 GNU/Linux
>>
>> Thanks in advance for any help.
>>
>> Govind
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>