Python – numpy.arange()


As a linear sequence generator, numpy.arange() returns an array of numbers in linear space, with the same step size between every pair of consecutive values.

This is similar to another function, numpy.linspace(), which also generates a linear sequence with a uniform step size.

Let’s understand how we can use this function to generate different sequences.


Syntax

Format:

array = numpy.arange([start, ]stop, [step, ]dtype=None)

Here,

  • start -> The starting point (included) of the range. It defaults to 0.
  • stop -> The end point (excluded) of the range.
  • step -> The step size of the sequence. It defaults to 1 and can be any real number except zero, including a negative number (to count downwards).
  • dtype -> The type of the output array. If dtype is not given (or is None), the data type is inferred from the types of the other input arguments.
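A minimal sketch of how the optional arguments behave when some of them are left out. The specific values are arbitrary and chosen only for illustration:

```python
import numpy as np

# Only stop given: start defaults to 0 and step defaults to 1
a = np.arange(5)                # values 0, 1, 2, 3, 4

# start and stop given: step still defaults to 1
b = np.arange(2, 6)             # values 2, 3, 4, 5

# A negative step counts downwards; stop (0) is still excluded
c = np.arange(10, 0, -3)        # values 10, 7, 4, 1

# dtype overrides the inferred type (integer inputs alone would give an integer array)
d = np.arange(3, dtype=float)   # values 0.0, 1.0, 2.0

for arr in (a, b, c, d):
    print(arr.tolist(), arr.dtype)
```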

Let us take a simple example to understand this:

import numpy as np
 
a = np.arange(0.02, 2, 0.1)
 
print('Linear Sequence from 0.02 to 2:', a)
print('Length:', len(a))

This generates a linear sequence from 0.02 (included) up to 2 (excluded) with a step size of 0.1. The number of elements is ceil((stop - start)/step) = ceil((2 - 0.02)/0.1) = ceil(19.8) = 20, which is the length of the resulting numpy array.

Output

Linear Sequence from 0.02 to 2: [0.02 0.12 0.22 0.32 0.42 0.52 0.62 0.72 0.82 0.92 1.02 1.12 1.22 1.32
 1.42 1.52 1.62 1.72 1.82 1.92]
Length: 20
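The element count above follows from the rule that arange() returns ceil((stop - start) / step) elements. A quick sketch that reproduces that arithmetic for the same example (math.ceil is used here only to repeat the calculation by hand):

```python
import math
import numpy as np

start, stop, step = 0.02, 2, 0.1
a = np.arange(start, stop, step)

# arange() produces ceil((stop - start) / step) elements
expected_len = math.ceil((stop - start) / step)

print(len(a), expected_len)  # 20 20

# The first value is start; the last is start + (len - 1) * step, just below stop
print(a[0], a[-1])
```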

Here is another example, which generates the numbers from 0 to 9 with the default step size of 1:

>>> np.arange(0, 10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

A step size of 0 does not define a valid sequence: the number of elements is computed by dividing the range by the step, so a step of 0 raises a ZeroDivisionError exception.

import numpy as np

# Invalid Step Size!
a = np.arange(0, 10, 0)

Output

ZeroDivisionError: division by zero

NOTE: This function differs from numpy.linspace(), which by default includes both the start and the end point in the sequence. numpy.linspace() also does not take a step size as an argument; instead, it takes the number of elements in the sequence.
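To make the difference concrete, here is a small comparison. The interval [0, 1] and the step 0.25 are arbitrary choices; 0.25 is exactly representable in floating point, so the values can be compared exactly:

```python
import numpy as np

# arange: you choose the step; stop is excluded
a = np.arange(0, 1, 0.25)                  # values 0.0, 0.25, 0.5, 0.75

# linspace: you choose the number of samples; stop is included by default
b = np.linspace(0, 1, 5)                   # values 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False makes linspace exclude stop, which matches arange here
c = np.linspace(0, 1, 4, endpoint=False)   # values 0.0, 0.25, 0.5, 0.75

print(a.tolist())
print(b.tolist())
print(c.tolist())
```

When you know the spacing you want, arange() is the natural fit; when you know how many points you want (and want the end point included), linspace() is usually simpler.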


A simple example

Let’s now put all of this together into a simple example to demonstrate the linearity of the sequences generated by numpy.arange().

The following code uses numpy.arange() to build two sequences, one over [0, 20) and one over [0, 10), and plots them to show that the generated points are evenly spaced, i.e. the resulting arrays are linear.

import numpy as np
import matplotlib.pyplot as plt

y = np.zeros(5)

# Construct two linear sequences
# First one has a step size of 4 units
x1 = np.arange(0, 20, 4)

# Second one has a step size of 2 units
x2 = np.arange(0, 10, 2)

# Plot (x1, [0, 0, ..])
plt.plot(x1, y, 'o')

# Plot (x2, [0.5, 0.5, ..])
plt.plot(x2, y + 0.5, 'o')

# Set limit for y on the plot
plt.ylim([-0.5, 1])

plt.show()

Output

[Figure: Numpy Arange plot, x1 as blue dots at y = 0 and x2 as orange dots at y = 0.5]

As you can see, the orange dots represent a linear sequence from 0 to 10 having a step size of 2 units, but since 10 is not included, the sequence is [0, 2, 4, 6, 8]. Similarly, the blue dots represent the sequence [0, 4, 8, 12, 16].
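The same uniformity can also be checked numerically instead of visually. This sketch uses np.diff(), which returns the differences between consecutive elements; for a sequence built with arange(), every difference should equal the step:

```python
import numpy as np

x1 = np.arange(0, 20, 4)
x2 = np.arange(0, 10, 2)

# Differences between consecutive elements: constant for an arange() sequence
print(x1.tolist(), np.diff(x1).tolist())  # [0, 4, 8, 12, 16] [4, 4, 4, 4]
print(x2.tolist(), np.diff(x2).tolist())  # [0, 2, 4, 6, 8] [2, 2, 2, 2]
```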


numpy.arange() vs range()

The whole point of using the numpy module is to make operations run as fast as possible: NumPy's core routines are implemented in compiled C code, and numpy is the Python interface to them.

Many operations in numpy are vectorized, meaning that the loop over the elements runs inside this compiled code instead of in the Python interpreter. Because of this, numpy delivers its best performance on large arrays and sequences.

Therefore, numpy.arange() is much faster than producing the same linear sequence by looping over Python's native range().

Performance Test

Avoid mixing numpy's vectorized operations with Python loops. Iterating over a numpy array in a Python for loop slows performance down drastically, because every iteration runs in the Python interpreter.

For example, the snippet below shows how you should NOT use numpy:

for i in np.arange(100):
    pass

Instead, call the numpy operation directly:

np.arange(100)
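To make the difference concrete, here is a small sketch that computes the same result both ways. The sum-of-squares task is an illustrative choice, not part of the original benchmark:

```python
import numpy as np

n = 1000

# Slow: a Python loop over a numpy array, one element at a time
total_loop = 0
for i in np.arange(n):
    total_loop += int(i) * int(i)  # convert to Python int before multiplying

# Fast: one vectorized expression, evaluated in compiled code
a = np.arange(n)
total_vec = int((a * a).sum())

print(total_loop == total_vec)  # True
```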

Let’s test the difference in performance using Python’s timeit module.

import timeit
import numpy as np

# For smaller arrays
print('Array size: 1000')

# Total time for 10000 runs (timeit returns the sum, not the average)
print('range():', timeit.timeit('for i in range(1000): pass', number=10000))
print('np.arange():', timeit.timeit('np.arange(1000)', number=10000, setup='import numpy as np'))

# For large arrays
print('Array size: 1000000')

# Total time for 10 runs
print('range():', timeit.timeit('for i in range(1000000): pass', number=10))
print('np.arange():', timeit.timeit('np.arange(1000000)', number=10, setup='import numpy as np'))

Output

Array size: 1000
range(): 0.18827421900095942
np.arange(): 0.015803234000486555
Array size: 1000000
range(): 0.22560399899884942
np.arange(): 0.011916546000065864

As you can see, numpy.arange() works particularly well for large sequences. In this run it was roughly 12 times as fast as the Python loop for 1000 elements and almost 19 times as fast for 1000000 elements. The exact ratios depend on your machine and NumPy version.

Therefore, numpy.arange() is the better choice when working with large arrays.

For small arrays, where the absolute difference is only a few microseconds per call, you can use either of the two approaches.
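One caveat about the benchmark above: range() is lazy, so the range() timings mostly measure the cost of the Python for loop rather than the cost of creating the numbers. If you need the values materialized in memory, a fairer sketch compares list(range(n)) with np.arange(n). The size, repeat counts, and the choice of taking the minimum are assumptions made for this sketch, and the results will vary by machine:

```python
import timeit

n = 1_000_000
setup = f"import numpy as np; n = {n}"

# timeit.repeat returns one total per repeat; the minimum is the least noisy
# estimate, and dividing by `number` gives seconds per call
t_list = min(timeit.repeat("list(range(n))", setup=setup, number=10, repeat=3)) / 10
t_arr = min(timeit.repeat("np.arange(n)", setup=setup, number=10, repeat=3)) / 10

print(f"list(range(n)): {t_list * 1e3:.2f} ms per call")
print(f"np.arange(n):   {t_arr * 1e3:.2f} ms per call")
```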

