[BioSQL-l] BioSQL-l Digest, Vol 79, Issue 1
Peter
biopython at maubp.freeserve.co.uk
Thu Jan 6 12:36:26 UTC 2011
Hi Chris & 徐朋,
I've CC'd the BioPerl mailing list (this started on the BioSQL list).
2011/1/6 Chris Fields <cjfields at illinois.edu>:
> See the BioPerl SeqIO HOWTO for this:
>
> http://www.bioperl.org/wiki/HOWTO:SeqIO
>
> Basically:
>
> # create one SeqIO object to read in, and another to write out
> my $seq_in = Bio::SeqIO->new('-file'   => "<$infile",
>                              '-format' => $infileformat);
> my $seq_out = Bio::SeqIO->new('-file'   => ">$outfile",
>                               '-format' => $outfileformat);
>
> # write each entry in the input file to the output file
> while (my $inseq = $seq_in->next_seq) {
>     $seq_out->write_seq($inseq);
> }
>
> You may have to configure the sequence display ID and description to suit your needs.
>
> chris
Hi Chris,
I think that covers only the easy case: one FASTA record per
GenBank record (i.e. one FASTA sequence for the whole plasmid or
chromosome), which is what the NCBI use *.fna for on their FTP site.
What about the second part of this request, getting the gene sequences
in FASTA as nucleotides (NCBI use *.ffn) and proteins/amino acids
(NCBI use *.faa)? This would require looking at the gene/CDS features
in the GenBank file (and again, rebuilding the exact sequence name the
NCBI use in their FASTA files is hard).
Peter
P.S. There is a Biopython example of this here:
http://www.warwick.ac.uk/go/peter_cock/python/genbank2fasta/
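For the gene-level case, here is a minimal Python sketch using Biopython's
SeqIO.parse and SeqFeature.extract. It is not the script behind the URL
above. The function names fasta_record and genbank_cds_to_fasta, the file
path arguments, and the use of locus_tag as the FASTA name are assumptions
made for illustration, and it makes no attempt to rebuild the names the NCBI
use in their own FASTA files.

```python
def fasta_record(name, description, sequence, width=70):
    """Format one FASTA record, wrapping the sequence at `width` columns."""
    sequence = str(sequence)
    header = f">{name} {description}".rstrip()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return header + "\n" + "\n".join(lines) + "\n"


def genbank_cds_to_fasta(genbank_path, ffn_path, faa_path):
    """Write CDS nucleotides (*.ffn style) and proteins (*.faa style).

    Assumption: records are named by locus_tag, which is NOT the naming
    scheme the NCBI use in their own FASTA files.
    """
    # Biopython is imported here so fasta_record() has no dependencies.
    from Bio import SeqIO

    with open(ffn_path, "w") as ffn, open(faa_path, "w") as faa:
        for record in SeqIO.parse(genbank_path, "genbank"):
            for feature in record.features:
                if feature.type != "CDS":
                    continue
                qualifiers = feature.qualifiers  # values are lists of strings
                name = qualifiers.get("locus_tag", ["unknown"])[0]
                product = qualifiers.get("product", [""])[0]
                # extract() handles reverse-strand and joined locations.
                nucleotides = feature.extract(record.seq)
                ffn.write(fasta_record(name, product, nucleotides))
                # Use the annotated translation when present; CDS features
                # without one (e.g. pseudogenes) are skipped in the protein file.
                if "translation" in qualifiers:
                    protein = qualifiers["translation"][0]
                    faa.write(fasta_record(name, product, protein))
```

A call would look like genbank_cds_to_fasta("genome.gbk", "genome.ffn",
"genome.faa"), where all three file names are placeholders.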