Rust in Data Science: From Underdog to Ecosystem Builder

Dec 1, 2025

Our Journey: Why We Shifted to Rust

When we first decided to look beyond R and Python for data science, it wasn’t because we were chasing performance benchmarks or intrigued by language design. It was something more mundane — deployment pain.

Our product had to run across multiple machines: some hosted by us, others deployed on client infrastructure. Most of our code was written in R, and that worked fine in development. But in production, things quickly turned into a maintenance nightmare:

  • Adding or updating a library meant manually replicating the change across every machine.
  • Security policies sometimes forced us to upgrade R itself, and that kicked off the same tedious process.
  • A small team like ours couldn’t afford to spend cycles babysitting environments.

We realized the problem wasn’t R as a language — it was the nature of interpreted, environment-dependent deployments. What we needed was a language that compiled to a binary. Something we could build once, ship once, and know it would run the same way everywhere.

That’s what led us to Rust.

The immediate win was ease of deployment. No more fiddling with library versions, no more dependency hell across machines. We could just ship a binary and move on. At first, that was all we wanted. But once we settled into Rust, we began to discover something unexpected: a growing ecosystem for data science.


The Rust Advantage

What started as a deployment fix for our data science infrastructure turned into a realization: Rust also solved deeper problems that had frustrated us in the past.

🚀 Speed: Compiled and optimized, Rust avoids the overhead of interpreted execution.

🛡️ Safety: Strict compile-time rules around ownership and memory prevent entire classes of bugs.

⚙️ Resource efficiency: Memory discipline allows more efficient parallelism.

🔗 Interoperability: Bindings to Python and WebAssembly make Rust usable alongside existing tools.

The resource efficiency point was especially striking. On paper, a server with 16 cores and 64 GB of RAM should handle 10–12 parallel workers easily. In practice, interpreted languages typically parallelize with separate worker processes, and each one duplicates so much interpreter state and data that you hit the memory ceiling quickly, fall into swap, and performance craters.

In on-prem B2B deployments, this is a dealbreaker. For clients, the cost of software isn’t just the license — it’s also the hardware they need to provision. If your software is inefficient, their costs balloon.

Rust’s memory ownership model flipped this for us:

  • Any single thread runs faster because the code is compiled.
  • We could run more threads concurrently without exhausting memory.

The result wasn’t just incremental speed — it was both faster individual threads and greater effective parallelism. In practice, that meant happier clients, lower infrastructure costs, and systems that actually scaled the way we had hoped.


Discovering the Ecosystem

Initially, Rust was just a deployment solution for our data pipelines. But as we used it more, we realized we didn’t have to give up our workflows around tabular data, machine learning, and databases. In fact, the Rust ecosystem offered tools that were not only “good enough,” but in some cases better than what we had before.

📊 Polars — DataFrames reimagined

What it is: A DataFrame library written in Rust.

Why it matters: Blazing fast, memory efficient, with support for lazy evaluation and Arrow integration.

Our take: For us, this was the bridge. Coming from R’s data.table, Polars felt powerful, expressive, and much more robust in production.

🤖 Linfa & the ML ecosystem

What it is: A Rust ML toolkit inspired by scikit-learn.

Why it matters: Covers classical ML algorithms — regression, clustering, classification — with a clean API.

Our take: It gave us confidence that we could cover 80% of “bread-and-butter ML” without leaving Rust.

Other ML crates worth watching:

  • SmartCore — another Rust ML library, similar in scope to Linfa.
  • Burn — a PyTorch-like deep learning framework in Rust.
  • Candle (by Hugging Face) — designed for efficient training and inference, with real industry backing.

🔗 Connectors and integration crates

Real-world data science doesn’t live in CSVs — it lives in databases, warehouses, and streams. Rust’s ecosystem is strong here too:

  • SQLx — async, compile-time checked queries for Postgres, MySQL, and SQLite.
  • tokio-postgres / mysql_async — lower-level async drivers.
  • arrow-flight — high-speed data transfer between systems.
  • Arrow2 — in-memory columnar data format, the backbone of many Rust data tools.

For us, these crates meant we could keep Rust at the heart of the workflow, without falling back to Python or R for integration.


Community & Momentum

The most surprising part has been the community momentum.

  • Polars is already mainstream in Python and Rust circles.
  • Linfa and SmartCore are bringing ML parity closer.
  • Burn and Candle prove Rust is serious about deep learning.
  • SQLx and Arrow2 make Rust a first-class citizen in data pipelines.

It reminded us of Python in its early 2000s surge — still rough at the edges, but with energy and direction.


Conclusion

Our path to Rust didn’t start with curiosity about a new language. It started with something painfully practical: we needed a better way to deploy. But once we made the switch, we discovered that Rust wasn’t just a solution for deployment — it was an ecosystem taking shape, with the potential to rival what Python and R built over decades.

Rust for data science isn’t going to replace Python or R tomorrow. But the qualities that brought us to it — ease of deployment, speed, safety, and memory efficiency — are the same ones that make its ecosystem compelling.

