[Biopython] pd.df.to_csv(..., compression="bgzip")?

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 13 07:18:09 EDT 2024


Ah. I would give it a file handle then:

from Bio import bgzf

with bgzf.open("example.txt.bgz", "w") as bgzf_handle:
    my_data_frame.to_csv(bgzf_handle, ...)

I would expect that to work according to
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
- possibly with an explicit compression=None added?

Peter
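
The point made further down the thread, that BGZF is a special kind of gzip file and can be read with the standard gzip.open(...), can be checked with the Python standard library alone. The following is a minimal sketch, not Biopython's implementation. make_bgzf_block and write_toy_bgzf are hypothetical helpers that hand-build BGZF blocks (gzip members carrying a 'BC' extra subfield, ending with an empty block) purely for illustration. For real output, Bio.bgzf or the bgzip tool would be the sensible choice.

```python
import gzip
import os
import struct
import tempfile
import zlib


def make_bgzf_block(data: bytes) -> bytes:
    """Pack a small chunk of data into one BGZF block (hypothetical helper)."""
    # Raw DEFLATE stream: negative wbits means no zlib or gzip wrapper.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    compressed = compressor.compress(data) + compressor.flush()
    # Ordinary gzip header with the FEXTRA flag set, followed by the
    # 'BC' extra subfield holding BSIZE (total block size minus one).
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139,  # gzip magic bytes
        8,        # compression method: DEFLATE
        4,        # flags: FEXTRA
        0,        # modification time
        0,        # extra flags
        255,      # operating system: unknown
        6,        # length of the extra field
        66, 67,   # subfield identifier 'B', 'C'
        2,        # subfield payload length
        len(compressed) + 25,  # BSIZE: 18 header + data + 8 trailer - 1
    )
    # gzip trailer: CRC32 and length of the uncompressed data.
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + compressed + trailer


def write_toy_bgzf(path, chunks):
    """Write each chunk as its own block, then the empty end-of-file block."""
    with open(path, "wb") as handle:
        for chunk in chunks:
            handle.write(make_bgzf_block(chunk))
        handle.write(make_bgzf_block(b""))


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt.bgz")
write_toy_bgzf(path, [b"id,value\n", b"a,1\n", b"b,2\n"])

# The standard gzip module reads the concatenated blocks as one stream.
with gzip.open(path, "rt") as handle:
    text = handle.read()
print(text)
```

Reading this way gives plain sequential decompression only. The block-level random access that BGZF exists to provide needs a BGZF-aware reader such as Bio.bgzf.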


On Wed, Mar 13, 2024 at 11:08 AM Dan Bolser <dan.bolser at outsee.co.uk> wrote:
>
> pandas.to_csv is the function that writes data. pandas.read_csv silently handles decompression as needed.
>
>
>
> On Wed, Mar 13, 2024, 10:49 AM Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> Yes. BGZF is just a special kind of GZIP file. If all you are doing is
>> decompressing it for reading, then the standard gzip.open(...)
>> is fine.
>>
>> Peter
>>
>>
>> On Wed, Mar 13, 2024 at 10:03 AM Dan Bolser <dan.bolser at outsee.co.uk> wrote:
>>>
>>> bgzip is a 'bio' thing, so thought I'd ask here. It's perhaps not 'biopython', but it's bio/python.
>>>
>>> On Tue, 12 Mar 2024 at 19:11, Sean Brimer <skbrimer at gmail.com> wrote:
>>>>
>>>> Hi Dan,
>>>>
>>>> This feels more like a pandas issue than a Biopython issue. That said, I think you could just use gzip: bgzip for samtools was built on top of gzip, so it probably decompresses in a similar way.
>>>>
>>>> On Tue, Mar 12, 2024 at 12:52 PM Dan Bolser <dan.bolser at outsee.co.uk> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I can pass `compression="gzip"` to pandas.DataFrame.to_csv, but not "bgzip". How could pandas be updated to support bgzip?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> _______________________________________________
>>>>> Biopython mailing list  -  Biopython at biopython.org
>>>>> https://mailman.open-bio.org/mailman/listinfo/biopython

