[Biopython-dev] Preparing for Biopython 1.50 (beta)
Iddo Friedberg
idoerg at gmail.com
Mon Mar 16 20:49:39 EDT 2009
I have.
For one thing, GenBank has some new files that break the current parser.
LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007
This is a typical header for an environmental sequence (notice the ENV).
Note that this does not necessarily have to be a next-gen sequence. It
can also be Sanger. The point is, it's not genome associated, but
obtained using metagenomic methods.
To our business: the "rc" breaks the parser.
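To make the problem concrete, here is a minimal sketch of a LOCUS line splitter that does not assume the token after the length is "bp". It is not the Bio.GenBank code; the function name and the returned dictionary keys are made up for illustration, and it assumes the eight whitespace-separated fields seen in the header above. My guess is that "rc" is a record count, since 55108 matches the number of accessions in the WGS range of the attached file.

```python
def parse_locus_line(line):
    """Split a whitespace-delimited LOCUS line into named fields.

    Hypothetical helper for illustration only; not Biopython's parser.
    Assumes exactly eight tokens, as in the header quoted in this message.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS layout: %r" % line)
    _, name, length, unit, mol_type, topology, division, date = tokens
    if not length.isdigit():
        raise ValueError("length is not an integer: %r" % length)
    return {
        "name": name,
        "length": int(length),
        "unit": unit,  # "bp" in ordinary records, "rc" in this one
        "mol_type": mol_type,
        "topology": topology,
        "division": division,
        "date": date,
    }


record = parse_locus_line(
    "LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007"
)
print(record["unit"], record["division"])
```

The point of the sketch is only that the unit is kept as data instead of being matched against a fixed string, so a caller can decide what to do with a non-"bp" record.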
The file itself is attached. Note that at the end it does not have a
sequence, but rather a WGS field that points to sequence files.
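For the WGS line, something like the following could turn the range into individual accessions. Again this is only a sketch under assumptions: the helper names are hypothetical, and the accession pattern (four letters, a two-digit version, then a zero-padded serial number) is inferred from the attached record alone.

```python
import re

# Assumed accession layout, inferred from the attached record:
# four letters + two-digit version, followed by a zero-padded serial.
ACCESSION_RE = re.compile(r"^([A-Z]{4}\d{2})(\d+)$")


def parse_wgs_range(value):
    """Return (prefix, start, end, width) for a 'FIRST-LAST' WGS range."""
    first, last = value.strip().split("-")
    m_first = ACCESSION_RE.match(first)
    m_last = ACCESSION_RE.match(last)
    if not m_first or not m_last:
        raise ValueError("unrecognised accession in %r" % value)
    if m_first.group(1) != m_last.group(1):
        raise ValueError("range spans different prefixes: %r" % value)
    width = len(m_first.group(2))
    return m_first.group(1), int(m_first.group(2)), int(m_last.group(2)), width


def accession_at(prefix, number, width):
    """Rebuild one accession from its prefix and serial number."""
    return f"{prefix}{number:0{width}d}"


prefix, start, end, width = parse_wgs_range("ABDH01000001-ABDH01055108")
print(end - start + 1)                    # number of records in the range
print(accession_at(prefix, start, width))  # first accession in the range
```

For this file the count comes out equal to the number on the LOCUS line, which is why I read "rc" as a record count.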
I'll actually be happy to take this one.
./I
On Mon, 2009-03-16 at 16:15 -0700, David Schruth wrote:
> I've got some 454 and Solid data you could test it on too.
>
> Has anybody else looked into how these other two Next Gen formats might
> complicate things?
>
> Brad Chapman wrote:
> > Peter;
> >
> >
> >> I think we should probably do another release soon
> >>
> >
> > Good call. +1 from me.
> >
> >
> >> I'd like to include the following changes as part of the beta, but it
> >> would be sensible to have someone else try these out first. Any
> >> volunteers?
> >>
> >> Bug 2767 - Bio.SeqIO support for FASTQ and QUAL files
> >>
> >
> > The code for this looked good when I reviewed it earlier. I will
> > test it out with some solexa reads from here this week; any reason
> > not to check the patch and files into CVS? Then I can fire up my
> > coal-powered revision control system, feed two punch cards into the
> > mouth of the machine, hope the vacuum tubes don't burn out again,
> > and check it out locally.
> >
> > Brad
> > _______________________________________________
> > Biopython-dev mailing list
> > Biopython-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biopython-dev
> >
--
Iddo Friedberg, Ph.D.
CALIT2 Atkinson Hall MC #0446
University of California San Diego
9500 Gilman Drive
La Jolla, CA 92093-0446 USA
+1 (858) 534-0570
http://iddo-friedberg.org
-------------- next part --------------
LOCUS ABDH01000000 55108 rc DNA linear ENV 26-NOV-2007
DEFINITION Termite gut metagenome, whole genome shotgun sequencing project.
ACCESSION ABDH00000000
VERSION ABDH00000000.1 GI:161074815
PROJECT GenomeProject:19107
DBLINK Project:19107
KEYWORDS WGS.
SOURCE termite gut metagenome
ORGANISM termite gut metagenome
unclassified sequences; metagenomes; organismal metagenomes.
REFERENCE 1 (bases 1 to 55108)
AUTHORS Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
Richardson,T.H., Stege,J.T., Cayouette,M., McHardy,A.C.,
Djordjevic,G., Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M.,
Martin,H.G., Kunin,V., Dalevi,D., Madejska,J., Kirton,E., Platt,D.,
Szeto,E., Salamov,A., Barry,K., Mikhailova,N., Kyrpides,N.C.,
Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M., Murillo,C.,
Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.D., Chang,C.,
Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P. and
Leadbetter,J.R.
TITLE Metagenomic and functional analysis of hindgut microbiota of a
wood-feeding higher termite
JOURNAL Nature 450 (7169), 560-565 (2007)
PUBMED 18033299
REFERENCE 2 (bases 1 to 55108)
AUTHORS Warnecke,F., Luginbuhl,P., Ivanova,N., Ghassemian,M.,
Richardson,T.H., Stege,J.T., Cayouette,M., Djordjevic,G.,
Aboushadi,N., Sorek,R., Tringe,S.G., Podar,M., Garcia Martin,H.,
Kunin,V., McHardy,A.C., Dalevi,D., Madejska,J., Kirton,E.,
Platt,D., Szeto,E., Salamov,A., Barry,K., Mikhailova,N.,
Kyrpides,N., Matson,E.G., Ottesen,E.A., Zhang,X., Hernandez,M.,
Murillo,C., Acosta,L.G., Rigoutsos,I., Tamayo,G., Green,B.,
Chang,C., Rubin,E.M., Mathur,E.J., Robertson,D.E., Hugenholtz,P.
and Leadbetter,J.R.
TITLE Direct Submission
JOURNAL Submitted (27-JUN-2007) Microbial Ecology Program, US DOE Joint
Genome Institute, 2800 Mitchell Drive B100, Walnut Creek, CA
94598-1698, USA
COMMENT The termite gut metagenome whole genome shotgun (WGS) project has
the project accession ABDH00000000. This version of the project
(01) has the accession number ABDH01000000, and consists of
sequences ABDH01000001-ABDH01055108.
URL -- http://www.jgi.doe.gov
JGI Project ID:4001605
Contact: Philip Hugenholtz (PHugenholtz at lbl.gov)
sampling site latitude: N10.11.260; sampling site longitude:
W083.51.345; sampling site altitude: 310 m AMSL; sample type: lumen
content; host species: Nasutitermes sp.; anatomic site: gut,
proctodeal segment 3, lumen; association type: symbiosis; sample
treatment and preservation: termites were collected, transported to
laboratory alive within 36 hours, P3 gut lumen fluid was extracted
and stored frozen in buffered saline solution until DNA extraction.
The JGI and collaborators endorse the principles for the
distribution and use of large scale sequencing data adopted by the
larger genome sequencing community and urge users of this data to
follow them. It is our intention to publish the work of this
project in a timely fashion and we welcome collaborative
interaction on the project and analysis.
(http://www.genome.gov/page.cfm?pageID=10506376).
FEATURES Location/Qualifiers
source 1..55108
/organism="termite gut metagenome"
/mol_type="genomic DNA"
/isolation_source="Nasutitermes sp. proctodeal segment 3
gut lumen"
/db_xref="taxon:433724"
/environmental_sample
/country="Costa Rica"
/lat_lon="10.1877 N 83.8558 W"
/note="metagenomic"
WGS ABDH01000001-ABDH01055108
//