Implementing sparse_range: From a Python Discuss Idea to a Sieve Stress Test
I found an interesting thread over on the Python Discuss forum titled "Possibility to exclude ranges from range". The initial discussion revolved around a common (?), developer need of : how to cleanly skip specific blocks of ints in a range of ints without writing clunky, nested if/continue logic, and without doing something memory-heavy like casting everything into a massive set.
The consensus was that Python's native range is beautiful because it’s a lightweight, memory-efficient arithmetic engine. It doesn't store numbers in RAM; it just calculates them on the fly. But the moment you need to punch holes in it—say, "give me all numbers from 1 to 1000, except for multiples of 5, and except for the block from 200 to 300"—you lose that structural elegance.
This inspired me to build a clean, production-grade abstraction to solve this kind of thng: sparse_range.
The Architectural Concept
The core idea began with a simple mathematical definition: a Sparse range instance is created by a set of (python) ranges S_in that hold possible output values and a set of ranges S_ex that exclude possible output values. A Sparse_range instance can be called with a start, stop, and step argument that generate trial output candidates that are checked against S_in and S_ex to see if they are allowed/filtered. An item is eligible for the final output if its value is present in at least one of our permitted baseline ranges S_in, unless that value happens to be intercepted by one of our forbidden exclusion ranges S_ex. On top of that structure, the outer sparse_range wrapper behaves like a native Python range—it accepts its own outer start, stop, and step constraints, matching the target window requested by the user.
To make the evaluation stateless and fast(?), the engine must evaluate values on the fly. However, native Python range objects are immutable and opaque; they don't natively maintain operational iteration states, nor do they know how to coordinate with one another.
To bridge this gap, the engine internally upgrades every raw Python range passed to collections S_in and S_ex into custom RangePointers.
Why We Need RangePointer and Direction Alignment
When the outer loop steps through values, the internal component ranges must follow along. If the outer loop is counting upwards (step > 0), all internal evaluation cursors must track upwards. If the outer loop reverses direction and counts downwards (step < 0), every single internal range must generate the same values, but in the reverse order, and without using up extra memory for storing values.
If they don't match directions, the leapfrogging logic breaks. A forward-running pointer cannot efficiently tell a backward-running outer loop where the next valid alignment boundary is without wasting CPU cycles scanning dead space or whatever.
Therefore, when a sparse_range instance is invoked, the engine inspects the outer step direction and enforces a unified traversal direction across every single internal S_in and S_ex sequence. If an internal range's native direction opposes the outer loop, it must be completely mathematically inverted.
The Range Reversal Formula
Reversing an arithmetic progression isn't as simple as swapping the start and stop boundaries. Because Python ranges stop before reaching the termination boundary, reversing a range requires recalculating its new alignment based on its exact step constraints.
To reverse a range mathematically, the engine computes:
New Step: The step sign is flipped (
-step).New Start: The final valid item generated by the original range becomes the new starting point. This is computed using the remainder of the length of the sequence:
new_start = start + (length - 1) × stepNew Stop: The original
startboundary shifts out by one step inverted to act as the non-inclusive terminal wall:new_stop = start - step
By normalizing all RangePointer instances to track in the exact same direction as the outer loop, the engine can efficiently step through candidate numbers, skipping blocks of excluded data in O(1) or O(K) time (where K is the number of active ranges), keeping the memory footprint at absolute zero.
The code
sparse_range.py
A run of included tests
Running Factory Pattern Limits Verification Suite (-1000 to +1000)...
✅ PASS: Standard positive crawl -> args: (0, 200, 2)
✅ PASS: Negative direction crawl -> args: (400, -400, -5)
✅ PASS: Large step slice jump -> args: (-1000, 1000, 25)
✅ PASS: Completely out of bounds window -> args: (600, 900, 1)
✅ PASS: Empty S_in Edge Case -> args: (0, 50, 1)
Test Summary: 5/5 passed successfully.
The Ultimate Stress Test: A Declarative Sieve
It's easy to write a basic range filter that skips a static block of numbers. But to prove that this RangePointer inversion and alignment math is completely airtight across dozens of concurrent boundaries, I decided to implement a completely declarative Sieve of Eratosthenes.
Instead of allocating a large mutable array of booleans in memory and procedurally striking out composites, we can use pure sequence logic:
The Permitted Range S_in: Our baseline pool of candidate of ints starting at 2:
[range(2, limit + 1)].The Exclusion ranges S_ex: A list of independent arithmetic progression ranges tracking the composites for every factor m up to √(limit).
Each exclusion range/stream starts at m × m (instead of m × 2). Any composite smaller than m × m has a prime factor smaller than m, meaning it has already been intercepted by an earlier stream. For example by the time m=5 runs, numbers like 10, 15, and 20 are already masked out by the ranges/streams for 2 and 3. The first unseen composite is 5 × 5 = 25.
Putting it to the Test
Although the module sparse_range.py has its own tests, to further verify the integrity of the design I put together a test module called sparse_range_sieve_test.py. It packages the sieve creation into an isolated factory function and validates the output against a traditional, trusted array sieve across both a forward sweep and a reversed backward pass.
The Verdict
When running forward, the generator's internal pointers march sequentially alongside the main consumer loop.
The true architectural challenge happens during the backward traversal pass (step = -1). When executing prime_sieve_factory(limit, 1, -1), the engine must manage 13 independent composite exclusion ranges simultaneously (for m = 2, 3, ..., 14). Each sub-range pointer must instantly compute its upper boundary, snap precisely to its maximum valid multiple less than or equal to 200, and step backward in perfect synchronization without dropping or skipping values.
The execution checks out perfectly:
Beginning Sparse Range Sieve Evaluation Suite (N = 200)...
----------------------------------------------------------------
👉 Running Forward Traversal Validation...
✅ PASS: Forward matching. Found 46 primes.
👉 Running Backward Traversal Validation...
✅ PASS: Backward matching. Reversed sequences match perfectly.
If you are following the Python Discuss thread, this demonstrates that handling arbitrary sequence exclusions cleanly and statelessly via pure index-pointer stream math isn't just an elegant idea—it's highly reliable even under heavy algorithmic stress.
Disclosure
I used Gemini AI to help in this. I was able to state algorithms and get Gemini to fill in with implementations, state changes and get Gemini to implement those too. I am used to creating algorithms without AI, this time I was able to state what I wanted in some detail and have Gemini add yet more detail. I could show python, pseudo-code, r textual descriptions as needed and get Gemini to fill in. The tests went from my description to code by Gemini - the sieve example test idea and spec, and debug was by me with Gemini improving my limit on m from m*2 to m*m for example before implementing as told.
No comments:
Post a Comment