The numpy.arange() function generates a linear sequence of numbers with a uniform step size.
This is similar to another function, numpy.linspace(), which also generates a linear sequence with a uniform step size.
Let’s understand how we can use this function to generate different sequences.
array = numpy.arange(start, stop, step, dtype=None)
start-> The starting point (included) of the range. Defaults to 0.
stop-> The ending point (excluded) of the range.
step-> The step size of the sequence. Defaults to 1. This can be any real number except zero.
dtype-> The type of the output array. If dtype is not given (or is provided as None), the data type is inferred from the types of the other input arguments.
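The defaults and the dtype inference described above can be checked with a short sketch. The variable names are only illustrative, and the exact integer width (for example int64) depends on the platform, so the comments avoid naming one.

```python
import numpy as np

# Only stop is given: start defaults to 0 and step defaults to 1.
a = np.arange(5)
print(a, a.dtype)  # an integer dtype; the exact width is platform dependent

# A float argument makes the inferred dtype a floating-point type.
b = np.arange(0, 1, 0.25)
print(b, b.dtype)

# An explicit dtype overrides the inference.
c = np.arange(3, dtype=np.float64)
print(c, c.dtype)

# A negative step counts downwards; stop is still excluded.
d = np.arange(10, 0, -2)
print(d)
```

Note that a negative step is valid as long as start is greater than stop; only a step of zero is rejected.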
Let us take a simple example to understand this:
    import numpy as np

    a = np.arange(0.02, 2, 0.1, None)
    print('Linear Sequence from 0.02 to 2:', a)
    print('Length:', len(a))
This generates a linear sequence from 0.02 (included) up to 2 (excluded) with a step size of 0.1. The number of elements is ceil((2 - 0.02) / 0.1) = ceil(19.8) = 20, which is the length of the resulting numpy array.
    Linear Sequence from 0.02 to 2: [0.02 0.12 0.22 0.32 0.42 0.52 0.62 0.72 0.82 0.92 1.02 1.12 1.22 1.32 1.42 1.52 1.62 1.72 1.82 1.92]
    Length: 20
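NumPy's documentation gives the length of the result as ceil((stop - start) / step). The sketch below compares that formula with the actual array length for a few inputs; expected_length is a hypothetical helper written for this illustration, not part of numpy.

```python
import math
import numpy as np

def expected_length(start, stop, step):
    # Hypothetical helper: element count according to ceil((stop - start) / step).
    return max(0, math.ceil((stop - start) / step))

cases = [(0.02, 2, 0.1), (0, 10, 1), (0, 10, 3)]
lengths = []
for start, stop, step in cases:
    arr = np.arange(start, stop, step)
    lengths.append((len(arr), expected_length(start, stop, step)))
    print(start, stop, step, '->', len(arr), expected_length(start, stop, step))
```

With non-integer steps, floating-point rounding can occasionally change the element count by one, which is why the NumPy documentation suggests numpy.linspace() for such cases.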
Here is another example, which generates the numbers from 0 to 9 using arange() with the default step size of 1:
    >>> np.arange(0, 10)
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
A step size of 0 does not describe a valid sequence: computing the number of elements would mean dividing the range by 0, so the call raises a ZeroDivisionError exception.
    import numpy as np

    # Invalid Step Size!
    a = np.arange(0, 10, 0)

    ZeroDivisionError: division by zero
NOTE: This function is a bit different from numpy.linspace(), which, by default, includes both the starting point and the endpoint in the sequence. numpy.linspace() also does not take the step size as an argument; it takes the number of elements in the sequence instead.
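A small side-by-side sketch of the difference, using the same interval for both functions. The values 0, 10 and 2.5 are chosen only for illustration.

```python
import numpy as np

# arange: you choose the step; the stop value (10) is excluded.
a = np.arange(0, 10, 2.5)

# linspace: you choose the number of points; the endpoint (10) is included by default.
b = np.linspace(0, 10, 5)

# With endpoint=False, linspace leaves out the stop value, as arange does.
c = np.linspace(0, 10, 4, endpoint=False)

print(a)
print(b)
print(c)
```

Here a and c hold the same four values, while b has one extra element, the endpoint 10.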
A simple example
Let's now put all of this together into a simple example to demonstrate the linearity of the sequences generated by numpy.arange().
The following code plots two linear sequences, one from 0 to 20 and one from 0 to 10 (endpoints excluded), using numpy.arange(). The points in each sequence are uniformly spaced, which shows that the resulting arrays are linear.
    import numpy as np
    import matplotlib.pyplot as plt

    y = np.zeros(5)

    # Construct two linear sequences
    # First one has a step size of 4 units
    x1 = np.arange(0, 20, 4)

    # Second one has a step size of 2 units
    x2 = np.arange(0, 10, 2)

    # Plot (x1, [0, 0, ..])
    plt.plot(x1, y, 'o')

    # Plot (x2, [0.5, 0.5, ..])
    plt.plot(x2, y + 0.5, 'o')

    # Set limit for y on the plot
    plt.ylim([-0.5, 1])

    plt.show()
As you can see, the orange dots represent a linear sequence from 0 to 10 with a step size of 2 units, but since 10 is not included, the sequence is [0, 2, 4, 6, 8]. Similarly, the blue dots represent the sequence [0, 4, 8, 12, 16].
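The uniform spacing can also be checked numerically rather than visually. This sketch uses numpy.diff(), which returns the differences between consecutive elements; every difference should equal the step size.

```python
import numpy as np

x1 = np.arange(0, 20, 4)
x2 = np.arange(0, 10, 2)

# Gaps between consecutive elements of each sequence
print(np.diff(x1))  # [4 4 4 4]
print(np.diff(x2))  # [2 2 2 2]
```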
numpy.arange() vs range()
A major reason for using the numpy module is speed: numpy is a Python interface to lower-level compiled code, most of it written in C.
Many operations in numpy are vectorized, meaning that an operation is applied to a whole array at once in compiled code rather than element by element in a Python loop. Because of this, numpy usually performs much better than pure Python on large arrays and sequences.
As a result, generating a sequence with numpy.arange() is much faster than iterating over Python's native range() to produce a similar linear sequence.
We should not mix numpy's vectorized operations with a Python loop. Doing so slows performance drastically, because the iteration then runs in native Python.
For example, the below snippet shows how you should NOT use numpy.
    for i in np.arange(100):
        pass
The recommended way is to use the numpy operation directly on the whole array, without a Python loop.
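As a rough illustration of the two styles, the sketch below computes a sum of squares once with a Python loop and once with a single vectorized expression. The task and the variable names are assumptions chosen for this example.

```python
import numpy as np

values = np.arange(100)

# Discouraged: a Python-level loop visits the elements one at a time.
total_loop = 0
for v in values:
    total_loop += int(v) * int(v)

# Preferred: one vectorized expression over the whole array.
total_vectorized = int(np.sum(values * values))

print(total_loop, total_vectorized)
```

Both versions give the same result, but the second one leaves the iteration to numpy's compiled code.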
Let's test the difference in performance using Python's timeit module.
    import timeit
    import numpy as np

    # For smaller arrays
    print('Array size: 1000')

    # Total time over 10000 runs
    print('range():', timeit.timeit('for i in range(1000): pass', number=10000))
    print('np.arange():', timeit.timeit('np.arange(1000)', number=10000, setup='import numpy as np'))

    # For large arrays
    print('Array size: 1000000')

    # Total time over 10 runs
    print('range():', timeit.timeit('for i in range(1000000): pass', number=10))
    print('np.arange():', timeit.timeit('np.arange(1000000)', number=10, setup='import numpy as np'))
    Array size: 1000
    range(): 0.18827421900095942
    np.arange(): 0.015803234000486555
    Array size: 1000000
    range(): 0.22560399899884942
    np.arange(): 0.011916546000065864
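As you can see, numpy.arange() works particularly well for large sequences. For a size of 1000000 it is almost 20 times as fast as the plain Python loop, compared with about 12 times for a size of 1000. Keep in mind that the range() timing includes iterating over every element in Python, while the np.arange() timing covers only creating the array.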
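For a comparison in which both sides only build the full sequence, one option is to time list(range(n)) against np.arange(n). This is a sketch under that assumption; the timings depend on the machine and the numpy version, so no expected numbers are shown.

```python
import timeit

n = 1_000_000

# Build the whole sequence in both cases, with no Python-level iteration.
t_list = timeit.timeit('list(range(n))', number=10, globals={'n': n})
t_arange = timeit.timeit('np.arange(n)', setup='import numpy as np',
                         number=10, globals={'n': n})

print('list(range(n)):', t_list)
print('np.arange(n):  ', t_arange)
```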
numpy.arange() should be the default choice when working with larger arrays.
For smaller arrays, where the difference in performance is small, you can use either of the two methods.