[Bioperl-l] [How to add features in genbank flat file]

Jason Stajich jason.stajich at duke.edu
Thu Mar 24 20:51:28 EST 2005


You seem annoyed that no one solved the problem for you. I hope you
realize that if you want a specific feature, you can also modify the
module yourself and submit a patch to the project.

As for the specifics of your problem: if you point out which Entrez
key-value pairs need to be set to get the SNP data, we can add them to
GenBank::Query as an option.
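
For illustration, the extra-feature switches in the viewer URL from the
quoted wget script (extrafeatpresent, ef_SNP, ef_CDD, ef_MGC, ef_HPRD) are
plain key-value pairs. An option of this kind could carry an ordered list
of them and append them to the request. This is a minimal sketch in core
Perl only. build_query_url and uri_escape_simple are hypothetical names,
not BioPerl methods, and whether NCBI still honours these parameters is an
assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): percent-encode one query component.
# RFC 3986 unreserved characters are kept; everything else becomes %XX.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: append an ordered list of key => value pairs to a base URL.
# A list (not a hash) is used so the parameter order stays predictable.
sub build_query_url {
    my ($base, @pairs) = @_;
    my @parts;
    while (my ($k, $v) = splice(@pairs, 0, 2)) {
        push @parts, uri_escape_simple($k) . '=' . uri_escape_simple($v);
    }
    return @parts ? "$base?" . join('&', @parts) : $base;
}

# Example: the same kind of switches as in the quoted wget URL.
my $url = build_query_url(
    'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi',
    db               => 'nucleotide',
    val              => 'NM_178432',
    dopt             => 'gbwithparts',
    extrafeatpresent => 1,
    ef_SNP           => 1,
);
print "$url\n";
# prints http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=NM_178432&dopt=gbwithparts&extrafeatpresent=1&ef_SNP=1
```

Keeping the pairs as a list rather than a hash means callers control the
order, which makes the generated URLs easy to compare.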

Blank lines are removed as part of the SeqIO parsing, but I suppose a
state variable could be added to genbank.pm so that they are not skipped
while in the 'COMMENT' state, if this is a critical feature for you.
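
This is a rough illustration of the state-variable idea. It is deliberately
standalone rather than a patch to genbank.pm, whose internals may differ.
The reader remembers which top-level keyword it is under and keeps blank
lines only while that keyword is COMMENT. extract_comment is a hypothetical
name. The sketch assumes the usual flat-file layout, where keywords start
in column 1 and continuation lines are indented by 12 spaces.

```perl
use strict;
use warnings;

# Hypothetical sketch, not BioPerl code: pull the COMMENT block out of a
# GenBank flat-file record (given as a list of lines), preserving blank lines.
sub extract_comment {
    my @lines = @_;
    my $state = '';    # current top-level keyword, e.g. 'LOCUS' or 'COMMENT'
    my @comment;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^([A-Z]+)\s*(.*)$/) {
            # A new top-level keyword starts in column 1: switch state.
            $state = $1;
            push @comment, $2 if $state eq 'COMMENT';
        }
        elsif ($state eq 'COMMENT') {
            if ($line =~ /^\s*$/) {
                push @comment, '';    # keep the blank line instead of skipping it
            }
            else {
                (my $text = $line) =~ s/^ {12}//;    # drop the 12-column indent
                push @comment, $text;
            }
        }
        # Lines under any other keyword are ignored in this sketch.
    }
    return join("\n", @comment);
}

# Example with an illustrative (made-up) record: prints the two
# paragraphs separated by an empty line.
print extract_comment(
    "LOCUS       EXAMPLE",
    "COMMENT     First paragraph.",
    "",
    "            Second paragraph.",
    "FEATURES             Location/Qualifiers",
), "\n";
```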

If you are just downloading GenBank files, it looks like you have a good
solution, so I'm glad you were able to figure it out.
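
If a plain download is all you need, the quoted wget script can also drop
the temporary file and File::Cat. With wget's `-O -` option, the page is
written to STDOUT, and list-form system() keeps the accession away from the
shell. This is a minimal sketch that assumes wget is on the PATH. The URL
is copied from the quoted script and may no longer be served by NCBI.
wget_argv is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the wget argument list: -q silences progress output and "-O -"
# sends the downloaded page to STDOUT, so no temporary file is needed.
sub wget_argv {
    my ($url) = @_;
    return ('wget', '-q', '-O', '-', $url);
}

# Only download when an accession is given on the command line.
if (@ARGV) {
    my $acc = $ARGV[0];
    # Same viewer URL as in the quoted script (it may no longer work).
    my $url = 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1'
            . "&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end"
            . '&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32';
    # List-form system() bypasses the shell, so $acc is never shell-interpreted.
    system(wget_argv($url)) == 0
        or die "wget failed for $acc\n";
}
```

Usage, with a hypothetical script name: perl fetch_gb.pl NM_178432 > NM_178432.gb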

-jason

> Hello,
> No one seems to have a solution to the problem I posted a month ago.
>
> So I changed my approach and now use 'wget' to get the GenBank sequences.
> I get the full GenBank entry, with most of the features.
> This also avoids another bug: with the BioPerl script I used, COMMENT
> lines are not formatted the way they are on NCBI, and blank lines are
> removed.
>
>
> 	#!/usr/bin/perl -w
> 	
> 	use strict;
> 	use diagnostics;
> 	use File::Cat;
> 	
> 	my $acc=$ARGV[0] or die "\n\tThe accession number you are looking for is missing.\n\tTry something like: $0 NM_178432\n\n";
> 	
> 	`wget -O output_file.tmp "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32" 2>/dev/null`;
> 	
> 	cat ("output_file.tmp", \*STDOUT);
> 	unlink("output_file.tmp");
> 	
> 	# wget -O output_file 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32'
> 	
> 	exit;
>
>
> Sorry, I no longer use BioPerl to query GenBank (only for other
> applications), but BioPerl 1.5 has not fixed the COMMENT bug or the
> missing features.
>
>> Hello,
>> I saw that the GenBank web site has changed:
>> features like SNPs are no longer included in the EST flat files.
>> On the NCBI web site, we must click on 'features: SNP' to add them to
>> our flat file.
>> With BioPerl 1.4 or 1.5 it's the same: the variation features are no
>> longer included in the EST flat files that I download.
>> Here is the script I use:
>> 	#!/usr/bin/perl -w
>> 	
>> 	use strict;
>> 	use Bio::DB::GenBank;
>> 	use Bio::DB::Query::GenBank;
>> 	use Bio::SeqIO;
>> 	my $acc=$ARGV[0] or die "\n\tThe accession number you are looking for is missing.\n\tTry something like: $0 NM_178432\n\n";
>> 	
>> 	$acc=$acc."[Accession]";
>> 	
>> 	my $query_string = "$acc";
>> 	my $query = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query_string);
>> 	
>> 	my $gb = Bio::DB::GenBank->new;
>> 	my $stream = $gb->get_Stream_by_query($query);
>> 	
>> 	my $out=Bio::SeqIO->new(-format=>'genbank');
>> 	my $seq = $stream->next_seq();
>> 	
>> 	# write_seq() prints the record itself (to STDOUT here) and returns
>> 	# true on success, so its return value should not be printed.
>> 	$out->write_seq($seq);
>> 	
>> 	exit;
>> How can I include most of the features in my nucleotide flat files?
>> Thanks
>
> -- 
> Sébastien Moretti
> http://igs.cnrs-mrs.fr/
> CNRS - IGS
> 31 chemin Joseph Aiguier
> 13402 Marseille cedex
>



