[Bioperl-l] question about GenPept.pm
Dimitar Kenanov
dimitark at bii.a-star.edu.sg
Thu Nov 25 02:20:28 UTC 2010
Hi guys,
I want to get some genomes and proteomes from NCBI in FASTA format. I
found that I have to use 'download_query_genbank.pl' for that. It works,
but not the way I would like. It uses the modules GenPept and GenBank.
They retrieve the data as FASTA, but with a different header format
than the one I want.
Example:
a) I want the FASTA header to look like this:
>gi|5834889|ref|NP_006959.1|COX3_10021 cytochrome c oxidase subunit III
[Caenorhabditis elegans]
sequence here...
b) but it comes out like this:
>COX3_10021 cytochrome c oxidase subunit III [Caenorhabditis elegans]
sequence here...
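To make the difference between (a) and (b) concrete, here is a minimal sketch that assembles the header in (a) from its parts. The helper name ncbi_fasta_header and its field names are hypothetical, chosen only for this illustration; they are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: build a header like (a) from its four parts.
# Field names (gi, accession, id, desc) are assumptions for this sketch.
sub ncbi_fasta_header {
    my (%f) = @_;
    return sprintf('>gi|%s|ref|%s|%s %s', @f{qw(gi accession id desc)});
}

print ncbi_fasta_header(
    gi        => '5834889',
    accession => 'NP_006959.1',
    id        => 'COX3_10021',
    desc      => 'cytochrome c oxidase subunit III [Caenorhabditis elegans]',
), "\n";
```

Format (b) is the same line with the "gi|...|ref|...|" prefix missing.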
I need the gi number and the NP accession as well. So I dug around a
bit, and after playing with 'download_query_genbank.pl' I managed to
make GenBank return the FASTA sequences in the format I want.
I made the following changes:
1. added a $retformat option to the Getopt options
2. modified this section:
if( $options{'-db'} eq 'protein' ) {
    ### DIMITAR ###
    if( $retformat eq 'fasta' ) {
        $dbh = Bio::DB::GenPept->new(-verbose => $debug,
                                     -format  => 'Fasta');
    ### END DIMITAR ###
    } else {
        $dbh = Bio::DB::GenPept->new(-verbose => $debug);
    }
} else {
    ### DIMITAR ###
    if( $retformat eq 'fasta' ) {
        $dbh = Bio::DB::GenBank->new(-verbose => $debug,
                                     -format  => 'Fasta');
    ### END DIMITAR ###
    } else {
        $dbh = Bio::DB::GenBank->new(-verbose => $debug);
    }
}
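The two changes above could be sketched more compactly as follows. This is only an illustration, not the attached script: the option name 'retformat' and the helper names parse_retformat and constructor_args are assumptions, and the BioPerl calls appear only in comments so that the sketch runs without BioPerl installed.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: read a --retformat option from an argument list.
sub parse_retformat {
    my @argv = @_;
    my $retformat = '';
    GetOptionsFromArray(\@argv, 'retformat=s' => \$retformat)
        or die "could not parse options\n";
    return lc $retformat;
}

# Hypothetical helper: build the constructor arguments once, adding
# -format only when FASTA output was requested.
sub constructor_args {
    my ($debug, $retformat) = @_;
    my @args = (-verbose => $debug);
    push @args, (-format => 'Fasta') if $retformat eq 'fasta';
    return @args;
}

my $retformat = parse_retformat('--retformat', 'fasta');
my @args      = constructor_args(0, $retformat);
print "@args\n";

# In the real script the arguments would then be passed on, e.g.:
#   my $class = $options{'-db'} eq 'protein' ? 'Bio::DB::GenPept'
#                                            : 'Bio::DB::GenBank';
#   $dbh = $class->new(@args);
```

Building the argument list in one place avoids repeating the if/else for each database class.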
But I have a problem with GenPept: I still can't get the sequences with
the full FASTA header described above. This is interesting, because the
GenPept and GenBank modules are almost identical, except that GenBank
uses NCBIHelper's new method directly, while GenPept has its own new
method, which in turn still calls NCBIHelper's.
With my modification I pass the format I want, but then it somehow
reverts to the default set in GenPept, which is 'gp', while I need it to
be 'fasta'.
If I change the default format in GenPept to fasta it works, but that
just gets the job done without adding the needed flexibility.
Any help would be appreciated. I will keep looking for a solution as
well.
Cheers
PS: I attach the modified 'download_query_genbank.pl'.
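One way such a revert could happen is a subclass constructor that sets the default format after the parent constructor has already stored the caller's choice. Whether GenPept.pm actually does this has not been checked here; the sketch below uses toy classes (Toy::Helper, Toy::Overwrites, Toy::Respects, all hypothetical) only to show the pattern and one possible way around it.

```perl
use strict;
use warnings;

# Toy stand-in for a shared parent class; not the real NCBIHelper.
package Toy::Helper;
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->request_format($args{'-format'}) if defined $args{'-format'};
    return $self;
}
sub request_format {
    my ($self, $fmt) = @_;
    $self->{format} = lc $fmt if defined $fmt;
    return $self->{format};
}

# Subclass whose constructor always applies its default afterwards,
# overwriting whatever the caller passed in.
package Toy::Overwrites;
our @ISA = ('Toy::Helper');
sub default_format { 'gp' }
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    $self->request_format($self->default_format);
    return $self;
}

# Subclass that applies its default only when no format was given.
package Toy::Respects;
our @ISA = ('Toy::Helper');
sub default_format { 'gp' }
sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);
    $self->request_format($self->default_format)
        unless defined $self->request_format;
    return $self;
}

package main;
print Toy::Overwrites->new(-format => 'Fasta')->request_format, "\n"; # gp
print Toy::Respects->new(-format => 'Fasta')->request_format, "\n";   # fasta
print Toy::Respects->new->request_format, "\n";                       # gp
```

If the real module follows the first pattern, a change along the lines of the second would keep 'gp' as the default while still honouring a format passed by the caller.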
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: download_query_genbank.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20101125/9645c803/attachment-0004.pl>