
Update kaggle datasets scripts with polars

Sarah Cocher requested to merge update-kaggle-with-polars into main

What does this MR do and why

[NOT URGENT, MINOR UPDATE]

References

Since MR !26 (merged) introduced polars in the project's pyproject.toml, it's best to align the "download datasets from kaggle" script so it uses polars instead of pandas. That way the project does not carry two dataframe libraries for the same job.

I also took the opportunity to bump the kagglehub package version. Newer versions fix the issue where .csv files were delivered as zip files (following an issue I opened in their repo ;) ), so I removed the workaround I had implemented for it.

How to set up and validate locally

poetry install -E kaggle_datasets
poetry run python scripts/download_kaggle_datasets.py
Edited by Sarah Cocher
