From xupeng86 at gmail.com Wed Jan 5 03:58:12 2011
From: xupeng86 at gmail.com (徐鹏)
Date: Wed, 5 Jan 2011 16:58:12 +0800
Subject: [BioSQL-l] Is there any tools can convert a bacteria_accession number( hole genome) to ffn format( gene multi fasta) ?
Message-ID:

Are there any tools that can convert a bacterial accession number (whole genome) to
ffn format (gene multi-FASTA), or that can convert sequences in BioSQL to GenBank files?

Many thanks!

From biopython at maubp.freeserve.co.uk Wed Jan 5 04:39:08 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 5 Jan 2011 09:39:08 +0000
Subject: [BioSQL-l] Is there any tools can convert a bacteria_accession number( hole genome) to ffn format( gene multi fasta) ?
In-Reply-To:
References:
Message-ID:

On Wed, Jan 5, 2011 at 8:58 AM, 徐鹏 wrote:
>
> Are there any tools that can convert a bacterial accession number (whole genome) to
> ffn format (gene multi-FASTA),

You can download *.ffn files from the NCBI's FTP site, e.g.
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/

If you want most/all of the available genomes as ffn files, I would
just download them all as a gzipped file:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.ffn.tar.gz

Alternatively, you can probably do this via the NCBI Entrez API.
I've not tried though. My guess is you'd need to map the genome
accession to a list of gene IDs (using ELink), then fetch them
as FASTA entries (using EFetch).

> or that can convert sequences in BioSQL to GenBank files?
>
> Many thanks!

If you have loaded the genomes into a BioSQL database (e.g.
from the GenBank files), then you can easily get the genomes
back again as SeqRecord objects, and save those as GenBank
files. However, in order to get the nucleotide sequences of the
genes you would have to use the SeqFeature objects and their
extract method.

Peter

From biopython at maubp.freeserve.co.uk Wed Jan 5 04:40:54 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 5 Jan 2011 09:40:54 +0000
Subject: [BioSQL-l] Is there any tools can convert a bacteria_accession number( hole genome) to ffn format( gene multi fasta) ?
In-Reply-To:
References:
Message-ID:

On Wed, Jan 5, 2011 at 9:39 AM, Peter wrote:
> On Wed, Jan 5, 2011 at 8:58 AM, 徐鹏 wrote:
>>
>> Are there any tools that can convert a bacterial accession number (whole genome) to
>> ffn format (gene multi-FASTA),
>
> You can download *.ffn files from the NCBI's FTP site, e.g.
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
>
> If you want most/all of the available genomes as ffn files, I would
> just download them all as a gzipped file:
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.ffn.tar.gz
>
> Alternatively, you can probably do this via the NCBI Entrez API.
> I've not tried though. My guess is you'd need to map the genome
> accession to a list of gene IDs (using ELink), then fetch them
> as FASTA entries (using EFetch).

All of the above remarks would apply to BioPerl, Biopython, etc.
(and are not really relevant to the BioSQL mailing list).

>> or that can convert sequences in BioSQL to GenBank files?
>>
>> Many thanks!
>
> If you have loaded the genomes into a BioSQL database (e.g.
> from the GenBank files), then you can easily get the genomes
> back again as SeqRecord objects, and save those as GenBank
> files. However, in order to get the nucleotide sequences of the
> genes you would have to use the SeqFeature objects and their
> extract method.

The above applies if you are using BioSQL with Biopython.
I would expect BioPerl etc. to offer similar functionality.

Peter

From hlapp at drycafe.net Wed Jan 5 09:23:26 2011
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Wed, 5 Jan 2011 09:23:26 -0500
Subject: [BioSQL-l] Is there any tools can convert a bacteria_accession number( hole genome) to ffn format( gene multi fasta) ?
In-Reply-To:
References:
Message-ID: <7EAD47CE-7279-4C99-A560-5ED02836BAED@drycafe.net>

On Jan 5, 2011, at 4:40 AM, Peter wrote:

> The above applies if you are using BioSQL with Biopython.
> I would expect BioPerl etc. to offer similar functionality.

Indeed. Thanks for answering, Peter!

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

From xupeng86 at gmail.com Thu Jan 6 00:19:01 2011
From: xupeng86 at gmail.com (徐鹏)
Date: Thu, 6 Jan 2011 13:19:01 +0800
Subject: [BioSQL-l] BioSQL-l Digest, Vol 79, Issue 1
In-Reply-To:
References:
Message-ID:

It seems that BioPerl doesn't have any scripts that can convert GenBank
files to .fna/.ffn/.faa etc. FASTA formats, does it?
Bio::SeqIO doesn't seem able to tackle this problem.

From biopython at maubp.freeserve.co.uk Thu Jan 6 05:34:51 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Jan 2011 10:34:51 +0000
Subject: [BioSQL-l] BioSQL-l Digest, Vol 79, Issue 1
In-Reply-To:
References:
Message-ID:

On Thu, Jan 6, 2011 at 5:19 AM, 徐鹏 wrote:
>
> It seems that BioPerl doesn't have any scripts that can convert GenBank
> files to .fna/.ffn/.faa etc. FASTA formats, does it?
> Bio::SeqIO doesn't seem able to tackle this problem.

Try asking on the BioPerl mailing list - I'm sure it can be done.

Peter

From cjfields at illinois.edu Thu Jan 6 07:20:29 2011
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 6 Jan 2011 06:20:29 -0600
Subject: [BioSQL-l] BioSQL-l Digest, Vol 79, Issue 1
In-Reply-To:
References:
Message-ID: <408F810D-41D4-4569-BE16-9E4DD0B27FAC@illinois.edu>

On Jan 6, 2011, at 4:34 AM, Peter wrote:

> On Thu, Jan 6, 2011 at 5:19 AM, 徐鹏 wrote:
>>
>> It seems that BioPerl doesn't have any scripts that can convert GenBank
>> files to .fna/.ffn/.faa etc. FASTA formats, does it?
>> Bio::SeqIO doesn't seem able to tackle this problem.
>
> Try asking on the BioPerl mailing list - I'm sure it can be done.
>
> Peter

See the BioPerl SeqIO HOWTO for this:

http://www.bioperl.org/wiki/HOWTO:SeqIO

Basically:

    use Bio::SeqIO;

    # create one SeqIO object to read in, and another to write out
    my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
                                 '-format' => $infileformat);
    my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
                                  '-format' => $outfileformat);

    # write each entry in the input file to the output file
    while (my $inseq = $seq_in->next_seq) {
       $seq_out->write_seq($inseq);
    }

You may have to configure the sequence display ID and description to suit your needs.

chris

From biopython at maubp.freeserve.co.uk Thu Jan 6 07:36:26 2011
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 6 Jan 2011 12:36:26 +0000
Subject: [BioSQL-l] BioSQL-l Digest, Vol 79, Issue 1
In-Reply-To: <408F810D-41D4-4569-BE16-9E4DD0B27FAC@illinois.edu>
References: <408F810D-41D4-4569-BE16-9E4DD0B27FAC@illinois.edu>
Message-ID:

Hi Chris & 徐鹏,

I've CC'd the BioPerl mailing list (this started on the BioSQL list).

2011/1/6 Chris Fields:
> See the BioPerl SeqIO HOWTO for this:
>
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> Basically:
>
>    use Bio::SeqIO;
>
>    # create one SeqIO object to read in, and another to write out
>    my $seq_in = Bio::SeqIO->new('-file' => "<$infile",
>                                 '-format' => $infileformat);
>    my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
>                                  '-format' => $outfileformat);
>
>    # write each entry in the input file to the output file
>    while (my $inseq = $seq_in->next_seq) {
>       $seq_out->write_seq($inseq);
>    }
>
> You may have to configure the sequence display ID and description to suit your needs.
>
> chris

Hi Chris,

I think that just covers the easy case, getting one FASTA record
per GenBank record (i.e. one FASTA sequence for the whole plasmid
or chromosome), which is what the NCBI use *.fna for on their FTP
site.

What about the second part of this request, getting the gene
sequences in FASTA as nucleotides (NCBI use *.ffn) and
proteins/amino acids (NCBI use *.faa)? This would require looking
at the gene/CDS features in the GenBank file (and again, rebuilding
the exact sequence name the NCBI use in their FASTA files is hard).

Peter

P.S. There is a Biopython example of this here:
http://www.warwick.ac.uk/go/peter_cock/python/genbank2fasta/
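
For readers following the thread, the feature-based step Peter describes (looking at the gene/CDS features rather than converting whole records) can be sketched roughly as below. This is only an illustration using the Python standard library, not the code behind the link above and not how NCBI builds its files. It assumes the CDS coordinates have already been read from the GenBank feature table into a hypothetical Cds tuple (0-based start, end-exclusive, one contiguous span, strand +1 or -1). The names Cds, extract, translate and to_fasta are made up for this example, and the FASTA headers carry only a locus tag rather than NCBI's naming scheme. The standard genetic code is used; bacterial table 11 assigns the same amino acids but allows extra start codons, which this sketch does not special-case.

```python
from typing import NamedTuple


class Cds(NamedTuple):
    """Hypothetical stand-in for one CDS feature read from a GenBank file."""
    locus_tag: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: int  # +1 (forward) or -1 (reverse)


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def extract(genome: str, cds: Cds) -> str:
    """Slice the gene out of the genome; reverse-complement minus-strand genes."""
    sub = genome[cds.start:cds.end]
    if cds.strand == -1:
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


def translate(nucleotides: str) -> str:
    """Translate up to (not including) the first stop codon; unknown codons give X."""
    seq = nucleotides.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino = _CODON_TABLE.get(seq[i:i + 3], "X")
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def to_fasta(records, width: int = 70) -> str:
    """Format (name, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy genome carrying the same small gene once on each strand.
genome = "CCATGAAATAGGGCTATTTCATCC"
genes = [Cds("toy_0001", 2, 11, +1), Cds("toy_0002", 13, 22, -1)]

# Nucleotide gene sequences (the *.ffn case) and their proteins (the *.faa case).
ffn = to_fasta((g.locus_tag, extract(genome, g)) for g in genes)
faa = to_fasta((g.locus_tag, translate(extract(genome, g))) for g in genes)
print(ffn, faa, sep="")
```

With real GenBank files the coordinates can be joins or fuzzy positions, which is why the SeqFeature objects and their extract method mentioned earlier in the thread are the better tool; the sketch only shows what that step amounts to for simple features.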