WTF Memory - Exploring Memory Usage

Overview

An educational repository to learn about memory usage in data libraries using the memray profiler. It includes examples of loading and processing data with libraries like DuckDB, Apache Arrow, and Delta Lake, as well as the data interchange between them.

This project started as an attempt to improve the dbt-duckdb integration with Delta tables by understanding memory consumption when data is exchanged between DuckDB and the Delta writer in Arrow format. A visualization of memory allocation during that transfer can be found here.

Learnings

dbt only sends SQL commands to the database, and those queries are not always optimized for memory efficiency
Arrow is very useful. DuckDB has to expose Arrow data to the caller as an iterator because its buffer manager can swap memory blocks, which is why fixed pointers don’t work
Changes to an open source project have to be small and incremental to be accepted. I invested a lot of time but could not ensure backward compatibility, so my PR was not accepted
Memray is excellent; its graph representation helps you visualize memory usage and understand what your program is doing wrong
Understanding RAM usage is important for the performance of a data application
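
The iterator point above can be illustrated without DuckDB or Arrow. The sketch below is a stand-in, not the project's code: plain Python lists play the role of Arrow record batches, `write_batch` is a hypothetical sink in place of the Delta writer, and the standard library's `tracemalloc` is swapped in for memray so the example has no dependencies. It compares peak allocation when all batches are collected first against handing them to the writer one at a time.

```python
import tracemalloc


def produce_batches(rows, batch_rows):
    """Yield lists of integers, standing in for Arrow record batches."""
    for start in range(0, rows, batch_rows):
        yield list(range(start, min(start + batch_rows, rows)))


def write_batch(batch):
    """Hypothetical sink standing in for a Delta writer; returns rows written."""
    return len(batch)


def export_materialized(rows, batch_rows):
    # Collect every batch before writing, similar to fetching one full table.
    table = list(produce_batches(rows, batch_rows))
    return sum(write_batch(batch) for batch in table)


def export_streamed(rows, batch_rows):
    # Hand each batch to the writer as soon as it is produced.
    return sum(write_batch(batch) for batch in produce_batches(rows, batch_rows))


def peak_bytes(func, *args):
    """Run func and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    for export in (export_materialized, export_streamed):
        written, peak = peak_bytes(export, 200_000, 10_000)
        print(f"{export.__name__}: rows={written} peak={peak / 1e6:.2f} MB")
```

Both flows write the same number of rows, but the streamed flow keeps only a batch or two alive at any time, so its peak should be much lower. For real workloads memray is the better tool, because it also tracks allocations made by native extension code, which `tracemalloc` does not see.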

Related tweets

Doing some baby unscientific experiments with different code flows while exporting for the dbt-duckdb
We are particularly interested in the difference between b (current flow) and c (potentially a new flow) #duckdb pic.twitter.com/0W0HG52lKQ
— Aleksandar Milicevic (@milicevica23) March 4, 2024

Playing around with memray and delta-rs and i am not sure if i understand what i see
1. Why is .arrow() not stable but it takes around 1 GB?
2. What is happening with resident memory?
3. if we give arrow format to delta writer it copies data again? pic.twitter.com/fg38REN8fw
— Aleksandar Milicevic (@milicevica23) February 25, 2024
