[Biopython-dev] [Bug 2819] New: Bio.SeqIO support for NCBI protein tables (*.ptt files)
bugzilla-daemon at portal.open-bio.org
Wed Apr 22 12:14:47 EDT 2009
http://bugzilla.open-bio.org/show_bug.cgi?id=2819
Summary: Bio.SeqIO support for NCBI protein tables (*.ptt files)
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
On their FTP site the NCBI provide a range of files for each
genome/plasmid/chromosome, e.g.
ftp://ftp.ncbi.nih.gov/genomes/Protozoa/Cryptosporidium_parvum/
The *.ptt files are simple tab-separated tables listing all the proteins. They
correspond to the CDS features in the GenBank file.
This enhancement bug is about adding "ptt" as an input file format in Bio.SeqIO
(and potentially as an output format too), where a single ptt file gives a
single SeqRecord object containing a SeqFeature object for each protein. The
header line gives the sequence length, so an UnknownSeq can be used for the
SeqRecord's seq property.
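
As a rough illustration of the parsing step only (not the patch itself), here is
a minimal standard-library sketch. It assumes the usual ptt layout: a title line
ending in a range such as "1..5000", a protein count line, a tab-separated
column header line that includes "Location" and "Strand", then one row per
protein. The names parse_ptt and PttEntry and the sample data are made up for
this example.

```python
import re
from dataclasses import dataclass


@dataclass
class PttEntry:
    """One protein row: 1-based inclusive coordinates plus the raw columns."""
    start: int
    end: int
    strand: int  # +1 for "+", -1 for anything else
    fields: dict


def parse_ptt(lines):
    """Return (sequence_length, entries) from an iterable of ptt lines.

    Assumed layout (not checked against every NCBI file):
      line 1: title ending in "<start>..<end>"
      line 2: "<N> proteins"
      line 3: tab-separated column names
      rest:   one tab-separated row per protein
    """
    lines = iter(lines)
    title = next(lines).rstrip("\r\n")
    match = re.search(r"(\d+)\.\.(\d+)\s*$", title)
    if match is None:
        raise ValueError("No sequence range found in title line")
    seq_length = int(match.group(2))
    next(lines)  # skip the protein count line
    columns = next(lines).rstrip("\r\n").split("\t")
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        row = dict(zip(columns, line.split("\t")))
        start, end = row["Location"].split("..")
        strand = 1 if row["Strand"] == "+" else -1
        entries.append(PttEntry(int(start), int(end), strand, row))
    return seq_length, entries


# Made-up sample data, for illustration only.
SAMPLE = [
    "Example organism chromosome 1 - 1..5000\n",
    "2 proteins\n",
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n",
    "190..255\t+\t21\t1001\tgeneA\tloc0001\t-\t-\thypothetical protein\n",
    "400..1200\t-\t266\t1002\tgeneB\tloc0002\t-\t-\tkinase\n",
]
seq_length, entries = parse_ptt(SAMPLE)
```

In a Bio.SeqIO parser each entry would presumably become a SeqFeature whose
location starts at start - 1 (Biopython locations are zero-based), with
seq_length used to size the placeholder sequence.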
One example application of this would be to draw a GenomeDiagram showing the
protein locations. This can be done using the SeqFeature objects from parsing
a GenBank file, but using the ptt file will be much faster.
See earlier suggestions on the mailing list (part of the GFF thread):
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005725.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005745.html
Patch to follow...
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.