API Reference
PagedList Class
Disk-backed list implementation for large-scale data processing.
PagedList is a memory-efficient alternative to Python lists for datasets that exceed available system memory. It splits data into chunks that are persisted to disk and loaded on demand, while keeping list-like interface semantics. This lets data processing workflows scale beyond the size of RAM.
- Features:
Transparent disk-backed storage with automatic memory management
Configurable chunk size (chunk_size) for tuning performance
Multi-threaded processing for data transformations (map, serialize)
Error handling and resource cleanup
Type-annotated operations with input validation
- class paged_list.paged_list.PagedList(chunk_size: int = 50000, disk_path: str = 'data', auto_cleanup: bool = True)
Bases: object
A disk-backed list-like object that stores most of its data on disk.
When the list grows too large to keep in memory, its data is split into .pkl chunk files in the disk_path directory (data/ by default). When a slice is retrieved, only the chunks that overlap it are loaded into memory.
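A minimal sketch of the slice-loading idea, assuming chunk files named chunk_<n>.pkl that each hold chunk_size records. The helpers write_chunks and load_slice and the file-naming scheme are hypothetical illustrations, not part of the paged_list API. The sketch handles only non-negative start/stop with a step of 1.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical helper: write records into chunk files chunk_0.pkl, chunk_1.pkl, ...
# (assumed layout, not PagedList's actual file format).
def write_chunks(records: List[Record], disk_path: str, chunk_size: int) -> int:
    os.makedirs(disk_path, exist_ok=True)
    n_chunks = 0
    for start in range(0, len(records), chunk_size):
        path = os.path.join(disk_path, f"chunk_{n_chunks}.pkl")
        with open(path, "wb") as f:
            pickle.dump(records[start:start + chunk_size], f)
        n_chunks += 1
    return n_chunks

def load_slice(disk_path: str, chunk_size: int, start: int, stop: int) -> List[Record]:
    """Return records[start:stop], opening only the chunk files that overlap the range."""
    result: List[Record] = []
    if stop <= start:
        return result
    first_chunk = start // chunk_size
    last_chunk = (stop - 1) // chunk_size
    for chunk_id in range(first_chunk, last_chunk + 1):
        path = os.path.join(disk_path, f"chunk_{chunk_id}.pkl")
        if not os.path.exists(path):
            break  # the requested range runs past the last chunk
        with open(path, "rb") as f:
            chunk = pickle.load(f)
        base = chunk_id * chunk_size
        # Translate the global range into offsets within this chunk.
        lo = max(start - base, 0)
        hi = min(stop - base, len(chunk))
        result.extend(chunk[lo:hi])
    return result

with tempfile.TemporaryDirectory() as tmp:
    write_chunks([{"id": i} for i in range(10)], tmp, chunk_size=4)  # chunks: 0-3, 4-7, 8-9
    sliced = load_slice(tmp, 4, 3, 9)  # overlaps all three chunks
    middle = load_slice(tmp, 4, 5, 7)  # opens only chunk_1.pkl
```

The chunk range is computed with integer division, so a slice that falls inside a single chunk never touches the other files.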
- combine_chunks() -> List[Dict[str, Any]]
Loads and combines all chunks into a single list (if needed).
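Combining chunks amounts to reading every chunk file in order and concatenating the results. The sketch below assumes the hypothetical chunk_<n>.pkl naming; combine_chunk_files is an illustrative helper, not the library's implementation.

```python
import os
import pickle
import re
import tempfile
from typing import Any, Dict, List

Record = Dict[str, Any]

def combine_chunk_files(disk_path: str) -> List[Record]:
    """Load every chunk_<n>.pkl file in disk_path and concatenate them in index order."""
    pattern = re.compile(r"chunk_(\d+)\.pkl$")
    indexed = []
    for name in os.listdir(disk_path):
        m = pattern.match(name)
        if m:
            indexed.append((int(m.group(1)), name))
    combined: List[Record] = []
    # Sort numerically so chunk_10 comes after chunk_2 (a plain string sort would not).
    for _, name in sorted(indexed):
        with open(os.path.join(disk_path, name), "rb") as f:
            combined.extend(pickle.load(f))
    return combined

with tempfile.TemporaryDirectory() as tmp:
    for i in range(12):
        with open(os.path.join(tmp, f"chunk_{i}.pkl"), "wb") as f:
            pickle.dump([{"id": i}], f)
    combined = combine_chunk_files(tmp)
```

Because the result holds every record at once, this is the operation to avoid when the full dataset does not fit in memory.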
- extend(iterable: Iterable[Dict[str, Any]]) -> None
Extends the list by adding multiple items at once.
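One way extend can stay memory-bounded is to buffer incoming items and spill each full chunk to disk. ChunkWriter below is a hypothetical class sketching that buffer-and-flush pattern under the assumed chunk_<n>.pkl layout; it is not PagedList's actual code.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]

class ChunkWriter:
    """Hypothetical sketch: keep at most chunk_size records in memory, spilling full chunks to disk."""

    def __init__(self, disk_path: str, chunk_size: int) -> None:
        self.disk_path = disk_path
        self.chunk_size = chunk_size
        self.buffer: List[Record] = []
        self.n_chunks = 0
        os.makedirs(disk_path, exist_ok=True)

    def append(self, item: Record) -> None:
        self.buffer.append(item)
        if len(self.buffer) >= self.chunk_size:
            self._flush()

    def extend(self, iterable: Iterable[Record]) -> None:
        # Consume the iterable lazily so a generator is never fully materialized.
        for item in iterable:
            self.append(item)

    def _flush(self) -> None:
        path = os.path.join(self.disk_path, f"chunk_{self.n_chunks}.pkl")
        with open(path, "wb") as f:
            pickle.dump(self.buffer, f)
        self.n_chunks += 1
        self.buffer = []

with tempfile.TemporaryDirectory() as tmp:
    writer = ChunkWriter(tmp, chunk_size=3)
    writer.extend({"id": i} for i in range(7))
    files = sorted(os.listdir(tmp))
    remaining = len(writer.buffer)
```

With seven items and a chunk size of three, two full chunks are written and one item remains buffered until more data arrives.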
- index(value: Dict[str, Any], start: int = 0, stop: int | None = None) -> int
Return index of first occurrence of value.
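A search with start/stop bounds can scan one chunk at a time while tracking each chunk's global offset. The sketch below uses a hypothetical chunked_index function over an iterable of chunks. It assumes that a missing value raises ValueError, as Python's list.index does, and it does not handle negative bounds.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]

def chunked_index(chunks: Iterable[List[Record]], value: Record,
                  start: int = 0, stop: Optional[int] = None) -> int:
    """Return the global index of the first record equal to value within [start, stop).

    Chunks are examined one at a time, so only one chunk needs to be in memory.
    Raises ValueError when value is absent (assumed to mirror list.index).
    """
    offset = 0  # global index of the first record in the current chunk
    for chunk in chunks:
        if stop is not None and offset >= stop:
            break
        chunk_end = offset + len(chunk)
        if chunk_end > start:  # skip chunks that lie entirely before start
            for local, record in enumerate(chunk):
                pos = offset + local
                if pos < start:
                    continue
                if stop is not None and pos >= stop:
                    break
                if record == value:
                    return pos
        offset = chunk_end
    raise ValueError(f"{value!r} is not in list")

chunks = [[{"id": 0}, {"id": 1}], [{"id": 2}, {"id": 1}]]
first = chunked_index(iter(chunks), {"id": 1})            # found in the first chunk
later = chunked_index(iter(chunks), {"id": 1}, start=2)   # skips the first chunk entirely
```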
- map(func: Callable[[Dict[str, Any]], Dict[str, Any]], max_workers: int | None = None) -> None
Applies func to every record, processing the chunks with multiple threads.
- Parameters:
func (callable) – Function to apply to each record in the chunk.
max_workers (int, optional) – The maximum number of threads to use. Defaults to None, which means the number of threads will be determined by the system.
- Returns:
None. Records are modified in place.
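A minimal sketch of chunk-parallel mapping with the standard library's ThreadPoolExecutor, assuming one task per chunk. map_chunks is a hypothetical helper. For simplicity it keeps the chunks in memory, whereas a disk-backed version would load each chunk, transform it, and write it back inside the task.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]

def map_chunks(chunks: List[List[Record]], func: Callable[[Record], Record],
               max_workers: Optional[int] = None) -> None:
    """Apply func to every record, one chunk per task, replacing records in place."""
    def process(chunk: List[Record]) -> None:
        for i, record in enumerate(chunk):
            chunk[i] = func(record)

    # max_workers=None lets ThreadPoolExecutor choose its default pool size.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so an exception raised in a worker is re-raised here.
        list(pool.map(process, chunks))

chunks = [[{"x": 1}, {"x": 2}], [{"x": 3}]]
map_chunks(chunks, lambda r: {**r, "y": r["x"] * 10}, max_workers=2)
```

Threads pay off mainly when each task spends time on disk I/O. For CPU-bound pure-Python functions, the global interpreter lock limits how much speedup additional threads can provide.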
- reverse() -> None
Reverse the list in place.
Warning: This operation loads all data into memory and may be slow for large lists.
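The memory cost in the warning follows from the straightforward approach: concatenate every chunk, reverse the whole sequence, and re-split it. reverse_chunks is a hypothetical helper illustrating that approach under the assumption that chunks are re-split to chunk_size. It is not necessarily how PagedList implements reverse.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

def reverse_chunks(chunks: List[List[Record]], chunk_size: int) -> List[List[Record]]:
    """Reverse the full sequence and re-split it into chunks of chunk_size.

    Every record is held in memory at once, which is the cost the warning describes.
    """
    combined = [record for chunk in chunks for record in chunk]
    combined.reverse()
    return [combined[i:i + chunk_size] for i in range(0, len(combined), chunk_size)]

chunks = [[{"id": 0}, {"id": 1}, {"id": 2}], [{"id": 3}]]
result = reverse_chunks(chunks, chunk_size=3)
```

Re-splitting after the reversal keeps every chunk full except possibly the last one.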
- serialize(max_workers: int | None = None) -> None
Serializes the list of dictionaries by converting certain types to JSON strings, then updates the underlying files using multiple threads.
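The documentation does not say which types are converted. The sketch below assumes nested dicts and lists are JSON-encoded and that other values are left unchanged. serialize_record and serialize_chunks are hypothetical helpers that work on in-memory chunks rather than on files.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]

# Assumption: nested dicts and lists are the "certain types" that get JSON-encoded.
JSON_ENCODED_TYPES = (dict, list)

def serialize_record(record: Record) -> Record:
    """Return a copy of record with dict/list values replaced by JSON strings."""
    return {k: json.dumps(v) if isinstance(v, JSON_ENCODED_TYPES) else v
            for k, v in record.items()}

def serialize_chunks(chunks: List[List[Record]], max_workers: Optional[int] = None) -> None:
    """Serialize every chunk in place, one thread task per chunk."""
    def process(chunk: List[Record]) -> None:
        chunk[:] = [serialize_record(r) for r in chunk]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(process, chunks))  # re-raise any worker exception

chunks = [[{"id": 1, "tags": ["a", "b"], "meta": {"k": 1}}]]
serialize_chunks(chunks)
```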
Main Module
Command-line interface and demonstration utilities for paged-list.
This module provides entry points for running demonstrations and examples of the PagedList disk-backed list implementation.