[Biopython] pd.df.to_csv(..., compression="bgzip")?

Dan Bolser dan.bolser at outsee.co.uk
Thu Mar 14 18:06:34 EDT 2024


For the record, I just used pysam.tabix_compress:
https://pysam.readthedocs.io/en/latest/api.html#pysam.tabix_compress
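
tabix_compress writes BGZF, and Peter's point quoted further down (that BGZF is just a special kind of gzip file) can be checked with the standard library alone. The sketch below is only an illustration of the block layout as I understand it from the SAM/BAM specification, not how pysam or Bio.bgzf are implemented. The names bgzf_block, to_bgzf and MAX_CHUNK are hypothetical helpers made up for this example.

```python
import gzip
import io
import struct
import zlib

# Hypothetical helpers for illustration; not part of pysam or Biopython.
MAX_CHUNK = 65280  # uncompressed bytes per block, so each block stays under 64 KiB


def bgzf_block(data: bytes) -> bytes:
    """Wrap one chunk as a BGZF block, which is also a complete gzip member."""
    if len(data) > MAX_CHUNK:
        raise ValueError("chunk too large for one BGZF block")
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    deflated = deflater.compress(data) + deflater.flush()
    total_size = 18 + len(deflated) + 8  # header + payload + trailer
    header = struct.pack(
        "<BBBBIBBH",
        0x1F, 0x8B,  # gzip magic bytes
        8,           # compression method: deflate
        4,           # flags: an extra field is present
        0,           # modification time (unset)
        0,           # extra flags
        0xFF,        # operating system: unknown
        6,           # length of the extra field
    )
    # The "BC" extra subfield stores the total block size minus one.
    extra = struct.pack("<BBHH", ord("B"), ord("C"), 2, total_size - 1)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + deflated + trailer


def to_bgzf(payload: bytes) -> bytes:
    """Split payload into blocks and append the empty end-of-file block."""
    blocks = [
        bgzf_block(payload[i:i + MAX_CHUNK])
        for i in range(0, len(payload), MAX_CHUNK)
    ]
    blocks.append(bgzf_block(b""))  # empty block marks the end of the file
    return b"".join(blocks)


# A tab-separated table large enough to span several blocks.
table = b"chrom\tpos\tvalue\n" + b"".join(
    b"chr1\t%d\t%d\n" % (i, i * i) for i in range(20000)
)
compressed = to_bgzf(table)

# The plain gzip module reads the blocks back as one continuous stream.
with gzip.open(io.BytesIO(compressed), "rb") as handle:
    roundtrip = handle.read()
assert roundtrip == table
```

For real files pysam.tabix_compress or Bio.bgzf are still the tools to use. This only shows why an ordinary gzip reader copes with their output.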

On Wed, 13 Mar 2024 at 11:22, Dan Bolser <dan.bolser at outsee.co.uk> wrote:

> Nice idea, I would never have thought of that.
>
> Thanks Peter!
>
> On Wed, Mar 13, 2024, 11:18 AM Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
>
>> Ah. I would give it a file handle then:
>>
>> from Bio import bgzf
>>
>> with bgzf.open("example.txt.bgz", "w") as bgzf_handle:
>>     my_data_frame.to_csv(bgzf_handle, ...)
>>
>> I would expect that to work according to
>>
>> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
>> - possibly with an explicit compression=None added?
>>
>> Peter
>>
>>
>> On Wed, Mar 13, 2024 at 11:08 AM Dan Bolser <dan.bolser at outsee.co.uk>
>> wrote:
>> >
>> > pandas.DataFrame.to_csv is the function that writes data.
>> > pandas.read_csv silently handles decompression as needed.
>> >
>> >
>> >
>> > On Wed, Mar 13, 2024, 10:49 AM Peter Cock <p.j.a.cock at googlemail.com>
>> > wrote:
>> >>
>> >> Yes. BGZF is just a special kind of GZIP file; if all you are doing is
>> >> decompressing it for reading, then the standard gzip.open(...)
>> >> is fine.
>> >>
>> >> Peter
>> >>
>> >>
>> >> On Wed, Mar 13, 2024 at 10:03 AM Dan Bolser <dan.bolser at outsee.co.uk>
>> >> wrote:
>> >>>
>> >>> bgzip is a 'bio' thing, so thought I'd ask here. It's perhaps not
>> >>> 'biopython', but it's bio/python.
>> >>>
>> >>> On Tue, 12 Mar 2024 at 19:11, Sean Brimer <skbrimer at gmail.com> wrote:
>> >>>>
>> >>>> Hi Dan,
>> >>>>
>> >>>> This feels more like a pandas issue than a biopython issue. That
>> >>>> said, I think you could just use gzip. I think. bgzip for samtools was
>> >>>> built on top of gzip so it probably decompresses in a similar way.
>> >>>>
>> >>>> On Tue, Mar 12, 2024 at 12:52 PM Dan Bolser <dan.bolser at outsee.co.uk>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I can pass `compression="gzip"` to pandas.DataFrame.to_csv, but not
>> >>>>> "bgzip". How could pandas be updated to support bgzip?
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> _______________________________________________
>> >>>>> Biopython mailing list  -  Biopython at biopython.org
>> >>>>> https://mailman.open-bio.org/mailman/listinfo/biopython
>> >>>
>>
>