[Bioperl-l] [How to add features in genbank flat file]

Sebastien Moretti sebastien.moretti at igs.cnrs-mrs.fr
Thu Mar 24 06:05:27 EST 2005


Hello,
No one seems to have a solution to the problem I posted a month ago.

So I changed my approach and now use 'wget' to fetch the GenBank sequences.
That way I get the full GenBank entry, with most of the features.
It also avoids another bug: with the BioPerl script I used, COMMENT lines 
are not formatted well (not the way they appear at NCBI), and blank 
lines are removed.


	#!/usr/bin/perl -w
	
	use strict;
	use diagnostics;
	use File::Cat;
	
	# The accession number is the first command-line argument.
	my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
	
	# Download the full GenBank entry (with the extra features) into a temporary file.
	`wget -O output_file.tmp "http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=$acc&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32" 2>/dev/null`;
	
	# Print the downloaded entry to STDOUT, then remove the temporary file.
	cat ("output_file.tmp", \*STDOUT);
	unlink("output_file.tmp");
	
	# Equivalent command line:
	# wget -O output_file 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&qty=1&c_start=1&val=NM_178432&dopt=gbwithparts&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32'
	
	exit;
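
As an aside, the same download can probably be done without wget and without 
a temporary file. What follows is only a sketch under assumptions: it uses 
HTTP::Tiny (a core module since Perl 5.14, so not available on older 
installations), the helper name genbank_url and the accession check are 
hypothetical additions, and the URL is simply the one from the wget command 
above, with no guarantee that NCBI keeps serving it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # assumption: Perl 5.14 or later, where this module is core

# Build the viewer.fcgi URL for one accession number.
# The query string is copied from the wget command; only 'val' changes.
sub genbank_url {
    my ($acc) = @_;
    # Hypothetical sanity check (not in the original script): accept only
    # letters, digits, '_' and '.', so nothing odd ends up in the URL.
    die "Invalid accession number\n"
        unless defined $acc && $acc =~ /^[A-Za-z0-9_.]+$/;
    return sprintf(
        'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi'
          . '?db=nucleotide&qty=1&c_start=1&val=%s&dopt=gbwithparts'
          . '&send=Send&sendto=t&from=begin&to=end&extrafeatpresent=1'
          . '&ef_SNP=1&ef_CDD=8&ef_MGC=16&ef_HPRD=32',
        $acc
    );
}

# Fetch and print the entry only when an accession is given on the command line.
if (@ARGV) {
    my $response = HTTP::Tiny->new->get(genbank_url($ARGV[0]));
    die "Download failed: $response->{status} $response->{reason}\n"
        unless $response->{success};
    print $response->{content};
}
```

Called with an accession number as its only argument (for example 
NM_178432), it prints the entry to STDOUT, as the wget version does.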


Sorry, I no longer use BioPerl to query GenBank (only for other applications), 
but BioPerl 1.5 has fixed neither the COMMENT bug nor the missing features.

> Hello,
> I saw that the GenBank web site has changed:
> features like 'SNPs' are no longer included in the EST flat files.
> At the NCBI web site, we must click on 'features: SNP' to add them to our flat 
> file.
> 
> With BioPerl 1.4 or 1.5 it is the same: the variation features are no longer 
> included in the EST flat files that I download.
> 
> Here is the script I use:
> 	#!/usr/bin/perl -w
> 	
> 	use strict;
> 	use Bio::DB::GenBank;
> 	use Bio::DB::Query::GenBank;
> 	use Bio::SeqIO;
> 	my $acc=$ARGV[0] or die "\n\tThe accession number you seek for is missing.\n\tTry something like: $0 NM_178432\n\n";
> 	
> 	$acc=$acc."[Accession]";
> 	
> 	my $query_string = "$acc";
> 	my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
> 	                                                 -query=>$query_string);
> 	
> 	my $gb = new Bio::DB::GenBank;
> 	my $stream = $gb->get_Stream_by_query($query);
> 	
> 	my $out=Bio::SeqIO->new(-format=>'genbank');
> 	my $seq = $stream->next_seq();
> 	
> 	my $result=$out->write_seq($seq);
> 	$result =~ s/^1.*$//;
> 	#print $out->write_seq($seq);
> 	print $result;
> 	
> 	exit;
> 
> How can I add most of the features to my nucleotide flat files?
> 
> Thanks

-- 
Sébastien Moretti
http://igs.cnrs-mrs.fr/
CNRS - IGS
31 chemin Joseph Aiguier
13402 Marseille cedex

