[BioPython] BioSQL.DBSeqRecord._get_keywords() / working with BioSQL

Brad Chapman chapmanb@arches.uga.edu
Wed, 29 May 2002 09:52:28 -0400


Hi Murple;

> What is your opinion, is biopython better than biojava or bioperl?
> Java would be as fine as Python for me, Perl I don't speak and have no
> intent to learn.

I think Jason answered this better than I could've. Personally, I think
biopython is best, but I'm a little biased :-).

> How many people are involved in biopython? What can I do to support?
> (I have good programming background but lack some understanding of the
> biology part. Trying to learn.)

We are always looking for people to help with coding. The biggest help
is always just to delve into the code, find something you think is
lacking, and add it.

> At the moment I think I don't know enough about bio* to write 
> documentation and also I'm not sure if I will stick with it. Just 
> evaluating if it fits my needs. But I agree that documentation is 
> lacking. Maybe I can help by asking wrong questions so you find out what 
> documentation is missing.

Yup, I've been feeling very sheepish about the docs with everyone
tearing on 'em these past few days. I wish I had more time to work on
this, as I actually like writing documentation and think it's very
important.

I'm going to try to do some BioSQL documentation since there is so much
interest in it. Hopefully I'll be able to crank some out rather quickly.

> Here is another of these questions: While parsing a GenBank entry, is 
> there any information loss? Would it be possible to "unparse" a 
> SeqRecord back into a flatfile? Is there already code for this in 
> biopython? If not, and I want to write such a thing, where are the best
> places to start?

There is definitely information loss parsing into a SeqRecord object.
Right now the biopython code is very good at representing the
information in feature tables of GenBank files, but not so good at
representing the "meta-information" in these files (i.e. keywords,
organism stuff, references, comments, etc). We definitely could use some
general classes to represent this sort of information.
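
To give a rough idea of what such a general class might look like, here
is a minimal sketch. Everything in it is hypothetical: the names
RecordMetadata, Reference and keywords_line are made up for illustration
and are not part of Biopython, and the formatting ignores GenBank line
wrapping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reference:
    """One literature reference from a GenBank entry (hypothetical class)."""
    authors: str = ""
    title: str = ""
    journal: str = ""


@dataclass
class RecordMetadata:
    """Hypothetical container for the non-feature parts of a GenBank entry."""
    keywords: List[str] = field(default_factory=list)
    organism: str = ""
    taxonomy: List[str] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)
    comment: str = ""

    def keywords_line(self):
        """Format the keywords as a single GenBank-style KEYWORDS line.

        Assumption: long keyword lists are not wrapped across lines here.
        """
        if not self.keywords:
            # GenBank writes a lone period when an entry has no keywords.
            return "KEYWORDS    ."
        return "KEYWORDS    " + "; ".join(self.keywords) + "."
```

An object like this could travel alongside a SeqRecord, so that the
keywords, organism and references are not lost and could be written back
out later.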

With regard to writing things back out in various formats, this is
something that Andrew is working on with his BioFormats stuff. The
rudiments of this are in Biopython right now and are starting to get
going, but I don't think it's finished. Andrew can probably tell you
more about the current status, but the ideas are there for a nice
system.

If you need to write GenBank out, the GenBank-specific Record class can
be written back out in GenBank format, so you can parse and unparse
GenBank like this:

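from Bio import GenBank

# RecordParser builds a GenBank-specific Record (not a SeqRecord)
parser = GenBank.RecordParser()
record = parser.parse(open("your_file.gb"))
# Printing the Record writes it back out in GenBank format
print(record)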

I wrote this a while back based on a request -- personally I never spend
much time writing anything out but FASTA files so I don't use it too
much.

> By the way: all your emails were doubled. Is this a problem with the
> mailing list or your email setup?

Nah, I was just copying to you since I wasn't sure if you were on the
list. I'll stop doing that!

Brad
-- 
PGP public key available from http://pgp.mit.edu/