§Rustic Net
A high-performance, ergonomic, and extensible Machine Learning Accelerator (MLA) framework in Rust. Built for both research and production use with a focus on performance and developer experience.
§Key Features
- Tensor Operations: Efficient multi-dimensional array operations with support for various data types
- Cross-Device Support: Seamless CPU and GPU execution with a unified API
- Zero-Cost Abstractions: Leverages Rust’s type system and ownership model for optimal performance
- Thread-Safe: Designed for concurrent execution with minimal synchronization overhead
- Minimal Dependencies: Core functionality with minimal external dependencies
- FFI Ready: Easy integration with other languages through C-compatible interfaces
- SIMD Acceleration: Utilizes CPU vector instructions for maximum performance
- Parallel Processing: Multi-threaded execution for CPU-bound operations
§Quick Start
Add this to your Cargo.toml:

```toml
[dependencies]
rustic_net = { version = "0.1", features = ["parallel"] }
```
Basic usage (the `?` operator requires a function that returns a `Result`, so the example is wrapped in `main`):

```rust
use rustic_net::tensor::{Tensor, Device};
use rustic_net::RusticNetInitTracing;

fn main() -> Result<(), String> {
    // Initialize logging (optional but recommended for debugging)
    RusticNetInitTracing();

    // Create and manipulate tensors
    let t1 = Tensor::from_vec(vec![1.0, 2.0, 3.0], &[3], Device::default())?;
    let t2 = t1.relu()?;
    assert_eq!(t2.to_vec(), vec![1.0, 2.0, 3.0]);
    Ok(())
}
```
§Feature Flags
The following features can be enabled in your Cargo.toml:
- `parallel` (enabled by default): Enables multi-threaded execution using Rayon for parallel processing
- `cuda`: Enables CUDA support for GPU acceleration (requires the CUDA toolkit)
- `wasm`: Enables WebAssembly support for running in browsers and other WASM environments
- `simd`: Enables SIMD acceleration for CPU operations (enabled by default on supported platforms)
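As a sketch, a Cargo.toml entry that opts out of the defaults and enables CUDA instead might look like this (feature names are taken from the list above; the exact default feature set is an assumption):

```toml
[dependencies]
rustic_net = { version = "0.1", default-features = false, features = ["cuda", "simd"] }
```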
§Performance Considerations
- For best performance, enable the `parallel` feature and ensure your tensors are large enough to benefit from parallel processing.
- When working with CUDA, ensure your tensors are large enough to overcome the overhead of transferring data to and from the GPU.
- The library uses 32-bit floating-point numbers (`f32`) by default for optimal performance on most hardware.
§Error Handling
Most operations return a `Result<T, String>`, where errors are returned as human-readable strings. It’s recommended to use the `?` operator for ergonomic error handling.
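The `Result<T, String>` pattern can be sketched with plain standard-library code; `parse_dim` and `shape_from_args` below are hypothetical helpers for illustration, not part of rustic_net's API:

```rust
// A fallible step that reports errors as human-readable strings,
// mirroring the crate's Result<T, String> convention.
fn parse_dim(s: &str) -> Result<usize, String> {
    s.parse::<usize>()
        .map_err(|e| format!("invalid dimension `{s}`: {e}"))
}

// The `?` operator propagates the String error to the caller,
// so happy-path code stays linear.
fn shape_from_args(args: &[&str]) -> Result<Vec<usize>, String> {
    let mut shape = Vec::with_capacity(args.len());
    for a in args {
        shape.push(parse_dim(a)?);
    }
    Ok(shape)
}

fn main() {
    assert_eq!(shape_from_args(&["2", "3"]), Ok(vec![2, 3]));
    assert!(shape_from_args(&["2", "x"]).is_err());
}
```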
§Thread Safety
All public types in this crate are `Send` and `Sync`, making them safe to use across thread boundaries. The library manages thread pools internally when the `parallel` feature is enabled.
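Because `Send + Sync` types can be shared read-only across threads behind an `Arc` with no locking, usage can look like the standard-library sketch below; `FakeTensor` is a stand-in struct for illustration, not the crate's `Tensor`:

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for a tensor-like type; plain data, so it is Send + Sync,
// matching the guarantee the crate documents for its public types.
struct FakeTensor {
    data: Vec<f32>,
}

fn sum(t: &FakeTensor) -> f32 {
    t.data.iter().sum()
}

fn main() {
    let t = Arc::new(FakeTensor { data: vec![1.0, 2.0, 3.0] });

    // Each worker gets its own Arc clone; the tensor itself is never copied.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let t = Arc::clone(&t);
            thread::spawn(move || sum(&t))
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), 6.0);
    }
}
```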
Re-exports§
Modules§
Macros§
- trace_fn - A macro to automatically instrument functions with tracing spans.
- trace_operation - Wraps an operation with start/end logging.
- trace_tensor_op - Instruments tensor operations with shape tracking.
Functions§
- RusticNetInitTracing - Initializes the global tracing subscriber with production-appropriate defaults.
- RusticNetInitTracingWith