[Biopython-dev] [Bug 2819] New: Bio.SeqIO support for NCBI protein tables (*.ptt files)

bugzilla-daemon at portal.open-bio.org
Wed Apr 22 12:14:47 EDT 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2819

           Summary: Bio.SeqIO support for NCBI protein tables (*.ptt files)
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


On their FTP site the NCBI provide a range of files for each
genome/plasmid/chromosome, e.g.
ftp://ftp.ncbi.nih.gov/genomes/Protozoa/Cryptosporidium_parvum/

The *.ptt files are simple tab-separated tables listing all the proteins. Each
row corresponds to a CDS feature in the matching GenBank file.
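As a rough illustration of how simple the format is, the sketch below reads the protein rows with Python's standard csv module. It assumes the layout seen in NCBI ptt files: a description line ending in the sequence range, an "N proteins" line, a column-header line (Location, Strand, Length, PID, Gene, Synonym, Code, COG, Product), then one tab-separated row per protein. The function name read_ptt_rows and the sample data are made up for illustration.

```python
import csv
import io

# Made-up sample in the assumed ptt layout: two preamble lines,
# a column-header line, then one tab-separated row per protein.
SAMPLE_PTT = (
    "Example genome, complete sequence - 1..5000\n"
    "2 proteins\n"
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
    "190..255\t+\t21\t111\tgeneA\tX0001\t-\t-\thypothetical protein\n"
    "337..2799\t-\t820\t222\tgeneB\tX0002\t-\tCOG0460E\texample kinase\n"
)


def read_ptt_rows(handle):
    """Yield one dict per protein row (hypothetical helper), keyed by column name."""
    next(handle)  # skip the description line (ends with the sequence range)
    next(handle)  # skip the "N proteins" line
    # QUOTE_NONE so quote characters inside product names are kept literally.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        start, end = row["Location"].split("..")
        # Convert the 1-based inclusive range to 0-based half-open coordinates.
        row["start"] = int(start) - 1
        row["end"] = int(end)
        yield row


for row in read_ptt_rows(io.StringIO(SAMPLE_PTT)):
    print(row["Synonym"], row["Strand"], row["start"], row["end"], row["Product"])
```

The same function works on a real file opened in text mode, since csv only needs an iterator over lines.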

This enhancement bug is about adding "ptt" as an input file format in Bio.SeqIO
(and potentially as an output format too), where a single ptt file gives a
single SeqRecord object containing a SeqFeature object for each protein. The
header line gives the sequence length, so an UnknownSeq can be used for the
SeqRecord's seq property.
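A minimal sketch of the proposed mapping follows. It uses plain dataclasses as hypothetical stand-ins for SeqRecord and SeqFeature so that it runs without Biopython: one record per file, its length taken from the header line (the value that would size the UnknownSeq), and one feature per protein row. The header pattern "<description> - 1..<length>" and the column names are assumptions about the ptt layout; PttRecord, PttFeature and parse_ptt are hypothetical names, not Biopython API.

```python
import io
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class PttFeature:
    """Hypothetical stand-in for a SeqFeature: one CDS per ptt row."""
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive
    strand: int  # +1 or -1
    qualifiers: dict


@dataclass
class PttRecord:
    """Hypothetical stand-in for a SeqRecord whose sequence is known only by length."""
    description: str
    length: int
    features: List[PttFeature] = field(default_factory=list)


def parse_ptt(handle):
    """Return a single PttRecord for a whole ptt file (one record per file)."""
    header = handle.readline().rstrip("\r\n")
    # Assumed header layout: "<description> - 1..<length>"
    match = re.match(r"^(.*) - 1\.\.(\d+)$", header)
    if match is None:
        raise ValueError(f"Unexpected ptt header line: {header!r}")
    record = PttRecord(description=match.group(1), length=int(match.group(2)))
    handle.readline()  # skip the "N proteins" line
    columns = handle.readline().rstrip("\r\n").split("\t")
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        values = dict(zip(columns, line.split("\t")))
        start, end = values.pop("Location").split("..")
        strand = 1 if values.pop("Strand") == "+" else -1
        # Remaining columns (PID, Gene, Product, ...) become qualifiers.
        record.features.append(PttFeature(int(start) - 1, int(end), strand, values))
    return record


SAMPLE_PTT = (
    "Example genome - 1..5000\n"
    "2 proteins\n"
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
    "190..255\t+\t21\t111\tgeneA\tX0001\t-\t-\thypothetical protein\n"
    "337..2799\t-\t820\t222\tgeneB\tX0002\t-\tCOG0460E\texample kinase\n"
)
rec = parse_ptt(io.StringIO(SAMPLE_PTT))
print(rec.length, len(rec.features))
```

Keeping the length on the record (rather than any sequence data) is what makes this cheap: nothing beyond the table itself has to be read.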

One example application of this would be to draw a GenomeDiagram showing the
protein locations. This can already be done using the SeqFeature objects from
parsing a GenBank file, but parsing the ptt file will be much faster, since it
is far smaller and contains no sequence data.

See earlier suggestions on the mailing list (part of the GFF thread):
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005725.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005745.html

Patch to follow...

