[Biopython-dev] writing
Andrew Dalke
adalke at mindspring.com
Fri Dec 21 06:03:09 EST 2001
Only one more email after this! (And it's a summary.)
The opposite of reading is writing.
I want to make file conversion easy. Here's the example in Bioperl's
SeqIO perldoc:
    $format1 = shift;
    $format2 = shift || die "Usage: reformat format1 format2 < input > output";
    use Bio::SeqIO;
    $in = Bio::SeqIO->newFh(-format => $format1 );
    $out = Bio::SeqIO->newFh(-format => $format2 );
    print $out $_ while <$in>;
It should be just as easy for Biopython -- even easier since we have
autodetection.
    import sys
    from Bio import SeqRecord
    if len(sys.argv) != 2:
        sys.exit("Usage: reformat output_format < input > output")
    writer = SeqRecord.make_writer(sys.argv[1])
    for record in SeqRecord.readFile():
        writer.write(record)
(Same number of lines, about the same number of characters, and
I could have done
    map(SeqRecord.make_writer(sys.argv[1]).write, SeqRecord.readFile())
instead of the last three lines :)
Again, there needs to be some resolution system, to figure out the
output converter associated with a given format name. There's a twist
here that Bioperl doesn't capture - versions. People are going to
want the output in "swissprot" version and there may be support for
writing it in "swissprot/version=38" and "swissprot/version=39"
versions, so something needs to figure out that 39 is probably better
than 38 (or force the user to disambiguate).
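A rough sketch of how that resolution step might work. The
"name/version=N" key syntax comes from the examples above; the function
name resolve_format and the list of registered names are hypothetical,
and "highest number wins" is only the "probably better" guess, not a
settled rule.

```python
def resolve_format(name, registered):
    """Return the registered format key that best matches `name`.

    An exact match wins. Otherwise, among keys of the form
    "<name>/version=<N>", the one with the highest N is chosen.
    """
    if name in registered:
        return name
    prefix = name + "/version="
    candidates = []
    for key in registered:
        if key.startswith(prefix):
            version = key[len(prefix):]
            if version.isdigit():
                candidates.append((int(version), key))
    if not candidates:
        raise KeyError("no writer registered for format %r" % name)
    # Assumption: a larger version number is the better default.
    return max(candidates)[1]


formats = ["fasta", "swissprot/version=38", "swissprot/version=39"]
print(resolve_format("swissprot", formats))             # swissprot/version=39
print(resolve_format("swissprot/version=38", formats))  # swissprot/version=38
```

Forcing the user to disambiguate instead would mean raising an error
whenever more than one candidate is found.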
There are a few other things I haven't figured out here.
I make the writer with 'make_writer'. This is a function in the
SeqRecord module scope. It looks like this:
    def make_writer(output_format = "fasta", outfile = sys.stdout):
        ...
The 'Writer' object created writes SeqRecord objects in the correct
format, on the given file handle. I am somewhat worried that finer
control may be needed, e.g., for "minimal" vs. "complete" output
generation. I decided to defer worrying until there is more than one
output generator for a given format.
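To make the idea concrete, here is a minimal sketch of what such a
function could look like, assuming a single FASTA writer. The
FastaWriter class, the _WRITERS table and the Record stand-in (with
'id' and 'seq' attributes) are all hypothetical and are not existing
Biopython code. Unlike the signature above, the sketch uses None as the
default so that sys.stdout is looked up when the function is called.

```python
import io
import sys
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord with just an id and a sequence.
Record = namedtuple("Record", "id seq")


class FastaWriter:
    """Hypothetical writer: emits one FASTA entry per record."""

    def __init__(self, outfile):
        self.outfile = outfile

    def write(self, record):
        self.outfile.write(">%s\n%s\n" % (record.id, record.seq))


# Hypothetical table mapping format names to writer classes.
_WRITERS = {"fasta": FastaWriter}


def make_writer(output_format="fasta", outfile=None):
    # Resolve sys.stdout at call time rather than at definition time.
    if outfile is None:
        outfile = sys.stdout
    try:
        writer_class = _WRITERS[output_format]
    except KeyError:
        raise ValueError("unknown output format: %r" % output_format)
    return writer_class(outfile)


# Usage: write one record into an in-memory file.
buffer = io.StringIO()
writer = make_writer("fasta", buffer)
writer.write(Record("seq1", "ACGT"))
print(buffer.getvalue(), end="")
```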
I am not sure that "write" is the appropriate method name. There's
something to be said for "append", since that's the opposite of
iteration. That is,
    results = []
    for x in data:
        results.append(x)
has exactly the same functional form as
    writer = make_writer()
    for x in data:
        writer.write(x)
It's also possible that some writers will return strings, rather than
write to a file, as in
    convert = toString(output_format)
    for x in data:
        sys.stdout.write(convert(x))
In this case you can see that 'write' in Python traditionally
takes a string, not an object.
On the other hand, it isn't obvious that 'append' is how to write a
record, and nearly everyone will be writing them.
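The two styles need not exclude each other. As a sketch only: a string
converter can be wrapped in a small adapter that offers both 'write' and
'append', so the same loop works on a list or a writer. The name
toString follows the example above; ConvertingWriter, the Record
stand-in and the FASTA-only converter are hypothetical.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord with just an id and a sequence.
Record = namedtuple("Record", "id seq")


def toString(output_format):
    """Return a function that converts one record to a string."""
    if output_format != "fasta":
        raise ValueError("unknown output format: %r" % output_format)

    def convert(record):
        return ">%s\n%s\n" % (record.id, record.seq)

    return convert


class ConvertingWriter:
    """Hypothetical adapter: turns a string converter into a writer."""

    def __init__(self, convert, outfile):
        self.convert = convert
        self.outfile = outfile

    def write(self, record):
        self.outfile.write(self.convert(record))

    # The same operation under the list-like name.
    append = write


data = [Record("a", "ACGT"), Record("b", "TTGA")]
results = []
outfile = io.StringIO()
# The loop body is identical whether the sink is a list or a writer.
for sink in (results, ConvertingWriter(toString("fasta"), outfile)):
    for x in data:
        sink.append(x)
```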
I'm still thinking about that "io" object, used like this:
    writer = SeqRecord.io.make_writer(sys.argv[1])
    for record in SeqRecord.io.readFile():
        writer.write(record)
That makes it easier to standardize the interface, since integration
is then a matter of:
    io = StandardIOFramework(SeqRecord)
and 'io' can have
    io.register_reader(format, builder)
    io.register_writer(format, writer)
    builder = io.resolve_reader(format)
    writer = io.resolve_writer(format)
    for record in io.readFile(open("something.txt")):
        ...
    for record in io.readString("SFSDFSDFSDF"):
        ...
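A minimal sketch of such a framework, to show that the interface above
fits in two small dictionaries. The method names follow the list above;
everything else (the 'format' argument with a "fasta" default in place
of autodetection, the read_fasta builder, FastaWriter and the Record
stand-in) is a hypothetical assumption.

```python
import sys
from collections import namedtuple
from io import StringIO

# Hypothetical stand-in for a SeqRecord with just an id and a sequence.
Record = namedtuple("Record", "id seq")


def read_fasta(record_class, infile):
    """Yield one record per FASTA entry (minimal parser, no validation)."""
    name, chunks = None, []
    for line in infile:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield record_class(name, "".join(chunks))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield record_class(name, "".join(chunks))


class FastaWriter:
    def __init__(self, outfile):
        self.outfile = outfile

    def write(self, record):
        self.outfile.write(">%s\n%s\n" % (record.id, record.seq))


class StandardIOFramework:
    def __init__(self, record_class):
        self.record_class = record_class
        self._readers = {}
        self._writers = {}

    def register_reader(self, format, builder):
        self._readers[format] = builder

    def register_writer(self, format, writer):
        self._writers[format] = writer

    def resolve_reader(self, format):
        return self._readers[format]

    def resolve_writer(self, format):
        return self._writers[format]

    def make_writer(self, output_format="fasta", outfile=None):
        if outfile is None:
            outfile = sys.stdout
        return self.resolve_writer(output_format)(outfile)

    def readFile(self, infile=None, format="fasta"):
        # No autodetection here: the caller names the format.
        if infile is None:
            infile = sys.stdin
        return self.resolve_reader(format)(self.record_class, infile)

    def readString(self, text, format="fasta"):
        return self.readFile(StringIO(text), format)


io = StandardIOFramework(Record)
io.register_reader("fasta", read_fasta)
io.register_writer("fasta", FastaWriter)

out = StringIO()
writer = io.make_writer("fasta", out)
for record in io.readString(">a\nACGT\n>b\nTT\nGA\n"):
    writer.write(record)
```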
Andrew
dalke at dalkescientific.com