How to use Python numpy.where() Method


In Python, we can use the numpy.where() function to select elements from a numpy array, based on a condition.

Not only that, but we can perform some operations on those elements if the condition is satisfied.

Let’s look at how we can use this function, using some illustrative examples!


Syntax of Python numpy.where()

This function accepts an array-like condition (for example, a NumPy array of integers or booleans).

It returns a new NumPy array after selecting elements based on that condition, which is an array-like object of boolean values.

For example, condition can take the value array([True, True, True]), which is a boolean array. (NumPy arrays can hold booleans directly, and numeric values can also be cast to bool.)

For example, if condition is array([True, True, False]) and our array is a = np.array([[1, 2, 3]]), then applying the condition to the array (a[:, condition]) gives array([[1, 2]]).

import numpy as np

a = np.arange(10)
print(a[a <= 2]) # Will only capture elements <= 2 and ignore others

Output

[0 1 2]

NOTE: The condition can be written directly as a <= 2. This is the recommended form, since writing out the boolean array by hand is tedious.
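As a minimal sketch of that equivalence, the snippet below builds the boolean mask by hand and compares it with the mask produced by the comparison a <= 2. The names explicit_mask and comparison_mask are illustrative only and are not part of NumPy.

```python
import numpy as np

a = np.arange(10)

# Hand-written boolean mask: True for the first three positions only
explicit_mask = np.array([True, True, True] + [False] * 7)

# The same mask, produced by the comparison expression
comparison_mask = a <= 2

print(np.array_equal(explicit_mask, comparison_mask))  # True
print(a[explicit_mask])  # [0 1 2]
```

Both masks select the same elements, so the comparison form is simply the shorter way to write it.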

But what if we want to preserve the shape of the original array, and not drop the elements that fail the condition? We can use numpy.where() for this.

numpy.where(condition [, x, y])

We have two more parameters x and y. What are those?

Basically, what this says is that if condition holds true for some element in our array, the new array will choose elements from x.

Otherwise, if it’s false, elements from y will be taken.

With that, our final output array will be an array with elements from x wherever condition = True, and elements from y whenever condition = False.

Note that although x and y are optional, if you specify x, you MUST also specify y. This is because every position in the output array needs a value: one taken from x where the condition is True, and one taken from y where it is False.
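The following sketch illustrates the rule about x and y. It assumes that NumPy rejects a call that passes x without y by raising a ValueError; the flag error_raised is a hypothetical helper used only for this demonstration.

```python
import numpy as np

a = np.arange(5)

try:
    # x is given, but y is missing
    np.where(a > 2, a)
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)

# Supplying both x and y works as expected
result = np.where(a > 2, a, 0)
print(result)  # [0 0 0 3 4]
```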

NOTE: The same logic applies to both single and multi-dimensional arrays. In both cases, we select elements based on the condition. Also remember that the shapes of x, y and condition are broadcast together.
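To make this concrete, here is a small sketch that applies the same condition to a 1D and a 2D array. The scalars 1 and -1 stand in for x and y and are broadcast to the shape of the condition. The array names are made up for illustration.

```python
import numpy as np

one_d = np.array([-1, 2, -3, 4])
two_d = np.array([[-1, 2],
                  [-3, 4]])

# Scalars for x and y are broadcast against the condition
signs_1d = np.where(one_d > 0, 1, -1)
signs_2d = np.where(two_d > 0, 1, -1)

print(signs_1d.shape)     # (4,)
print(signs_2d.shape)     # (2, 2)
print(signs_2d.tolist())  # [[-1, 1], [-1, 1]]
```

In both cases the output has the same shape as the condition array.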

Now, let us look at some examples, to understand this function properly.


Using Python numpy.where()

Suppose we want to take only positive elements from a numpy array and set all negative elements to 0. Let’s write the code using numpy.where().

1. Replace Elements with numpy.where()

We’ll use a 2-dimensional random array here, keep the positive elements, and replace everything else with 0.

import numpy as np

# Random initialization of a (2D array)
a = np.random.randn(2, 3)
print(a)

# b will be all elements of a whenever the condition holds true (i.e only positive elements)
# Otherwise, set it as 0
b = np.where(a > 0, a, 0)

print(b)

Possible Output

[[-1.06455975  0.94589166 -1.94987123]
 [-1.72083344 -0.69813711  1.05448464]]
[[0.         0.94589166 0.        ]
 [0.         0.         1.05448464]]

As you can see, only the positive elements are retained, and every other element has been replaced with 0!

2. Using numpy.where() with only a condition

There may be some confusion regarding the above code, as some of you may think that the more intuitive way would be to simply write the condition like this:

import numpy as np

a = np.random.randn(2, 3)
b = np.where(a > 0)
print(b)

If you now try running the above code, with this change, you’ll get an output like this:

(array([0, 1]), array([2, 1]))

If you observe closely, b is now a tuple of numpy arrays, one per dimension of a. The first array holds the row indices and the second holds the column indices of the positive elements, so here the positive elements sit at positions (0, 2) and (1, 1). Why do we get indices instead of values?

Whenever we provide just a condition, this function is equivalent to np.asarray(condition).nonzero().

In our example, np.asarray(a > 0) returns a boolean array after applying the condition, and np.nonzero(arr_like) returns the indices of the non-zero (here, True) elements of arr_like.
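A short sketch of this behaviour follows. It uses a fixed array instead of random values so that the result is reproducible; the variable names are illustrative.

```python
import numpy as np

a = np.array([[-1.0, 0.5, -2.0],
              [-0.3, 1.5, 2.5]])

# With only a condition, np.where returns one index array per dimension
rows, cols = np.where(a > 0)
nz_rows, nz_cols = np.nonzero(a > 0)

print(rows, cols)  # [0 1 1] [1 1 2]
print(np.array_equal(rows, nz_rows) and np.array_equal(cols, nz_cols))  # True

# The index arrays can be used to pull out the matching values
print(a[rows, cols])  # [0.5 1.5 2.5]
```

Indexing a with the returned tuple gives the same values as the boolean mask a[a > 0].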

So, we’ll now look at a simpler example, that shows us how flexible we can be with numpy!

import numpy as np

a = np.arange(10)

b = np.where(a < 5, a, a * 10)

print(a)
print(b)

Output

[0 1 2 3 4 5 6 7 8 9]
[ 0  1  2  3  4 50 60 70 80 90]

Here, the condition is a < 5, which will be the boolean array [True True True True True False False False False False], x is the array a, and y is the array a * 10. So, we choose from a only where a < 5, and from a * 10 where a >= 5.

So, this transforms all elements >= 5, by multiplication with 10. This is what we get indeed!
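For 1D inputs, the selection can be pictured as an element-by-element choice. The sketch below compares np.where() with a rough pure-Python equivalent; the list comprehension is only an illustration of the logic, not how NumPy implements it.

```python
import numpy as np

a = np.arange(10)
condition = a < 5

vectorized = np.where(condition, a, a * 10)

# Rough equivalent for 1D inputs: take x where the condition is True, else y
elementwise = [int(x) if c else int(y) for c, x, y in zip(condition, a, a * 10)]

print(vectorized.tolist())                 # [0, 1, 2, 3, 4, 50, 60, 70, 80, 90]
print(vectorized.tolist() == elementwise)  # True
```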


Broadcasting with numpy.where()

If we provide all of condition, x, and y arrays, numpy will broadcast them together.

import numpy as np

a = np.arange(12).reshape(3, 4)

b = np.arange(4).reshape(1, 4)

print(a)
print(b)

# Broadcasts (a < 5, a, and b * 10)
# of shape (3, 4), (3, 4) and (1, 4)
c = np.where(a < 5, a, b * 10)

print(c)

Output

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[0 1 2 3]]
[[ 0  1  2  3]
 [ 4 10 20 30]
 [ 0 10 20 30]]

Again, the output is selected element by element based on the condition, but here, b * 10 is broadcast to the shape of a. (Its first dimension has only one element, so there will be no errors during broadcasting.)

So, b effectively becomes [[0 1 2 3] [0 1 2 3] [0 1 2 3]], which means b * 10 is three copies of the row [0 10 20 30], and we can select elements from this broadcast array.

So the shape of the output is the same as the shape of a.
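As a sketch of what broadcasting does here, the snippet below expands b * 10 to the shape of a explicitly with np.broadcast_to() and checks that np.where() gives the same result either way. The names y_expanded, c_implicit and c_explicit are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = np.arange(4).reshape(1, 4)

# Explicitly expand b * 10 from shape (1, 4) to shape (3, 4)
y_expanded = np.broadcast_to(b * 10, a.shape)
print(y_expanded.shape)  # (3, 4)

c_implicit = np.where(a < 5, a, b * 10)      # NumPy broadcasts for us
c_explicit = np.where(a < 5, a, y_expanded)  # same inputs, pre-expanded

print(np.array_equal(c_implicit, c_explicit))  # True
```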


Conclusion

In this article, we learned how to use the Python numpy.where() function to select elements from arrays based on a condition array.

