[Biopython-dev] Preparing for Biopython 1.50 (beta)

Iddo Friedberg idoerg at gmail.com
Mon Mar 16 20:49:39 EDT 2009


I have.

For one thing, GenBank has some new files that break the current parser.

LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007


This is a typical header for an environmental sequence (notice the ENV).
Note that this does not necessarily have to be a next-gen sequence; it
can also be Sanger. The point is that it is not genome-associated, but
obtained using metagenomic methods.

To our business: the "rc" breaks the parser.
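
To make the problem concrete, here is a minimal sketch of a tolerant
LOCUS-line reader. It is not Biopython's actual parser: the function
name parse_locus_line and the KNOWN_UNITS set are hypothetical, and
reading "rc" as a size unit alongside "bp" and "aa" is an assumption
based only on where the token sits in the line.

```python
# Hypothetical helper, NOT Biopython's parser: split a LOCUS line on
# whitespace and accept "rc" where "bp" or "aa" would normally appear.
KNOWN_UNITS = {"bp", "aa", "rc"}  # "rc" as a unit is an assumption


def parse_locus_line(line):
    """Return the leading LOCUS fields as a dict; raise ValueError otherwise."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % line)
    name, size, unit = tokens[1], tokens[2], tokens[3]
    if unit not in KNOWN_UNITS:
        raise ValueError("unexpected size unit %r" % unit)
    # Everything after the unit (molecule type, topology, division, date)
    # is kept as raw tokens; this sketch does not interpret it.
    return {"name": name, "size": int(size), "unit": unit, "rest": tokens[4:]}


locus = parse_locus_line(
    "LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007"
)
```

Splitting on whitespace rather than on fixed columns is a design choice
made for the sketch; a column-based parser would need the same extra
unit in whatever check it applies to that field.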





The file itself is attached. Note that at the end it does not have a
sequence, but rather a WGS field that points to sequence files.
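
As a rough illustration of what a parser could do with that WGS line,
the sketch below reads the accession range and counts the records it
spans. The names parse_wgs_range and ACCESSION are hypothetical, and
the letters-then-digits accession layout is an assumption taken from
the attached file, not from any format specification.

```python
import re

# Assumed accession layout: a run of letters followed by a run of digits.
ACCESSION = re.compile(r"([A-Za-z]+)(\d+)")


def parse_wgs_range(line):
    """Return (first, last, count) for a 'WGS  first-last' line."""
    keyword, _, value = line.strip().partition(" ")
    if keyword != "WGS":
        raise ValueError("not a WGS line: %r" % line)
    first, _, last = value.strip().partition("-")
    last = last or first  # tolerate a single accession with no range
    m_first = ACCESSION.fullmatch(first)
    m_last = ACCESSION.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("cannot interpret WGS range: %r" % value)
    count = int(m_last.group(2)) - int(m_first.group(2)) + 1
    return first, last, count


first, last, count = parse_wgs_range("WGS         ABDH01000001-ABDH01055108")
```

For the attached record the count comes out to 55108, the same number
that appears before "rc" on the LOCUS line.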

I'll actually be happy to take this one.

./I




On Mon, 2009-03-16 at 16:15 -0700, David Schruth wrote:
> I've got some 454 and Solid data you could test it on too.
> 
> Has anybody else looked into how these other two Next Gen formats might 
> complicate things?
> 
> Brad Chapman wrote:
> > Peter;
> >
> >   
> >> I think we should probably do another release soon 
> >>     
> >
> > Good call. +1 from me.
> >
> >   
> >> I'd like to include the following changes as part of the beta, but it
> >> would be sensible to have someone else try these out first.  Any
> >> volunteers?
> >>
> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
> >>     
> >
> > The code for this looked good when I reviewed it earlier. I will
> > test it out with some solexa reads from here this week; any reason
> > not to check the patch and files into CVS? Then I can fire up my
> > coal-powered revision control system, feed two punch cards into the
> > mouth of the machine, hope the vacuum tubes don't burn out again,
> > and check it out locally.
> >
> > Brad
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >   
-- 
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org
-------------- next part --------------
LOCUS       ABDH01000000           55108 rc    DNA     linear   ENV 26-NOV-2007
DEFINITION  Termite gut metagenome, whole genome shotgun sequencing project.
ACCESSION   ABDH00000000
VERSION     ABDH00000000.1  GI:161074815
PROJECT     GenomeProject:19107
DBLINK      Project:19107
KEYWORDS    WGS.
SOURCE      termite gut metagenome
  ORGANISM  termite gut metagenome
            unclassified sequences; metagenomes; organismal metagenomes.
REFERENCE   1  (bases 1 to 55108)
  AUTHORS   Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
            Richardson,T.H., Stege,J.T., Cayouette,M., McHardy,A.C.,
            Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M.,
            Martin,H.G., Kunin,V., Dalevi,D., Madejska,J., Kirton,E., Platt,D.,
            Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N.C.,
            Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C.,
            Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.D., Chang,C.,
            Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and
            Leadbetter,J.R.
  TITLE     Metagenomic and functional analysis of hindgut microbiota of a
            wood-feeding higher termite
  JOURNAL   Nature 450 (7169), 560-565 (2007)
   PUBMED   18033299
REFERENCE   2  (bases 1 to 55108)
  AUTHORS   Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
            Richardson,T.H., Stege,J.T., Cayouette,M., Djordjevic,G.,
            Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Garcia Martin,H.,
            Kunin,V., McHardy,A.C., Dalevi,D., Madejska,J., Kirton,E.,
            Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N.,
            Kyrpides,N., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M.,
            Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.,
            Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P.
            and Leadbetter,J.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (27-JUN-2007) Microbial Ecology Program, US DOE Joint
            Genome Institute, 2800 Mitchell Drive B100, Walnut Creek, CA
            94598-1698, USA
COMMENT     The termite gut metagenome whole genome shotgun (WGS) project has
            the project accession ABDH00000000.  This version of the project
            (01) has the accession number ABDH01000000, and consists of
            sequences ABDH01000001-ABDH01055108.
            URL -- http://www.jgi.doe.gov
            JGI Project ID:4001605
            Contact: Philip Hugenholtz (PHugenholtz at lbl.gov)
            sampling site latitude: N10.11.260; sampling site longitude:
            W083.51.345; sampling site altitude: 310 m AMSL; sample type: lumen
            content; host species: Nasutitermes sp.; anatomic site: gut,
            proctodeal segment 3, lumen; association type: symbiosis; sample
            treatment and preservation: termites were collected, transported to
            laboratory alive within 36 hours, P3 gut lumen fluid was extracted
            and stored frozen in buffered saline solution until DNA extraction.
            The JGI and collaborators endorse the principles for the
            distribution and use of large scale sequencing data adopted by the
            larger genome sequencing community and urge users of this data to
            follow them. It is our intention to publish the work of this
            project in a timely fashion and we welcome collaborative
            interaction on the project and analysis.
            (http://www.genome.gov/page.cfm?pageID=10506376).
FEATURES             Location/Qualifiers
     source          1..55108
                     /organism="termite gut metagenome"
                     /mol_type="genomic DNA"
                     /isolation_source="Nasutitermes sp. proctodeal segment 3
                     gut lumen"
                     /db_xref="taxon:433724"
                     /environmental_sample
                     /country="Costa Rica"
                     /lat_lon="10.1877 N 83.8558 W"
                     /note="metagenomic"
WGS         ABDH01000001-ABDH01055108
//

