Hey, readers! In this article, we will look at five data distributions available in NumPy, in detail.
So, let us begin!! 🙂
NumPy Data Distribution – Quick Overview
Data distribution is an important concept in data science and analysis. To analyze data well, we need to understand how its values behave.
That is, we need to understand how the data values are spread across their range. A distribution tells us how frequently each value, or each range of values, occurs.
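As a small illustration of this idea, the sketch below counts how many values of a made-up array fall into each of four equal-width ranges using np.histogram(). The data and the number of bins are assumptions chosen only for the example.

```python
import numpy as np

# Made-up data values, used only for illustration.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Split the range [min, max] into 4 equal-width bins and count the values in each.
counts, edges = np.histogram(values, bins=4)

# Print each range together with the number of values that fall inside it.
for count, low, high in zip(counts, edges[:-1], edges[1:]):
    print(f"{low:.0f} to {high:.0f}: {count} values")
```

Most of the values land in the lower ranges, which is the kind of shape a distribution describes.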
In this article, we will focus on the following data distributions offered by the NumPy random module:
- Zipf distribution
- Pareto distribution
- Rayleigh distribution
- Exponential distribution
- Random distribution with choice() function
1. Random Distribution
With a random distribution, we can generate random data values in which each value occurs with a probability that we specify. In NumPy, we can achieve this using the choice() function.
With the choice() function, we draw random samples from a given set of values according to a list of probabilities.
random.choice(a, size=None, replace=True, p=None)
- a: The data values to sample from. Its length must be equal to the number of entries in p.
- size: The shape of the output array.
- replace: Whether a value may be drawn more than once. The default is True.
- p: The probability of drawing each element of a. The values in p must sum to 1. If p is omitted, every element is equally likely.
from numpy import random
# Draw a 2x1 array from [2, 4, 6, 8] with the given probabilities
info = random.choice([2, 4, 6, 8], p=[0.1, 0.3, 0.2, 0.4], size=(2, 1))
print(info)
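To see what p actually controls, the following sketch draws a large number of samples and compares the observed frequency of each value with its probability. It uses the seeded np.random.default_rng() generator instead of the module-level random.choice() shown above. That is a choice made here for repeatability, and the seed and sample count are arbitrary assumptions.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(seed=42)

values = [2, 4, 6, 8]
probs = [0.1, 0.3, 0.2, 0.4]

# Draw many samples so the observed frequencies settle near p.
samples = rng.choice(values, p=probs, size=100_000)

# Count how often each value was drawn and convert the counts to fractions.
uniques, counts = np.unique(samples, return_counts=True)
observed = counts / samples.size

for value, target, freq in zip(uniques, probs, observed):
    print(f"value={value}  p={target:.2f}  observed={freq:.3f}")
```

With this many samples, each observed fraction should sit close to the corresponding entry of p.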
2. Rayleigh Distribution
The Rayleigh distribution is a continuous probability distribution that is widely used in signal processing. For example, it describes the magnitude of a signal with two independent, zero-mean, normally distributed components that share the same standard deviation.
- scale: The scale parameter, which is also the mode of the distribution (default 1.0). Larger values give a flatter, wider distribution.
- size: The shape of the output array.
from numpy import random
# Draw a 2x2 array of Rayleigh-distributed values
info = random.rayleigh(scale=1.5, size=(2, 2))
print(info)
Output (values will differ on each run):
[[0.706009   2.83950694]
 [1.79522459 1.42889097]]
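The link to signal processing can be sketched as follows: the length of a two-component vector whose components are independent normal values with standard deviation equal to scale follows a Rayleigh distribution with that scale. The example below compares direct draws with this hand-built version and with the theoretical mean, scale * sqrt(pi / 2). The seed and sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed for repeatability
scale = 1.5
n = 200_000

# Direct draws from the Rayleigh distribution.
direct = rng.rayleigh(scale=scale, size=n)

# The same distribution built by hand: the magnitude of a 2-D vector whose
# components are independent normal values with standard deviation = scale.
x = rng.normal(loc=0.0, scale=scale, size=n)
y = rng.normal(loc=0.0, scale=scale, size=n)
constructed = np.sqrt(x**2 + y**2)

# Theoretical mean of a Rayleigh distribution with this scale.
expected_mean = scale * np.sqrt(np.pi / 2)

print("direct mean:     ", direct.mean())
print("constructed mean:", constructed.mean())
print("expected mean:   ", expected_mean)
```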
3. Exponential Distribution
The exponential distribution models the time we have to wait until the next event occurs, assuming that events happen independently at a constant average rate.
- scale: The inverse of the rate of occurrence (scale = 1 / rate). It is also the mean waiting time between events. The default is 1.0.
- size: The shape of the output array.
from numpy import random
# Draw a 2x2 array of exponentially distributed values
info = random.exponential(scale=1.5, size=(2, 2))
print(info)
Output (values will differ on each run):
[[0.21999314 3.49214755]
 [1.45176936 2.92176755]]
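Because scale is the inverse of the rate, a process with a given rate is sampled by passing 1 / rate. The sketch below assumes a hypothetical rate of 4 events per hour and checks two known properties of the exponential distribution: the mean waiting time equals scale, and the fraction of waits longer than scale is exp(-1). The rate, seed, and sample count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for repeatability

rate = 4.0          # hypothetical: 4 events per hour on average
scale = 1.0 / rate  # mean waiting time in hours

# Simulate many waiting times between consecutive events.
waits = rng.exponential(scale=scale, size=200_000)

# The sample mean should be close to scale.
mean_wait = waits.mean()

# The fraction of waits longer than the mean should be close to exp(-1).
frac_longer = (waits > scale).mean()

print("mean wait:", mean_wait, "expected:", scale)
print("fraction longer than mean:", frac_longer, "expected:", np.exp(-1))
```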
4. Pareto Distribution
The Pareto principle says that roughly 80 percent of the outcomes of an event come from only 20 percent of the factors. The Pareto distribution takes its inspiration from this idea: a few values are very large, while most are small. NumPy provides the pareto() function to draw random values from it. Note that pareto() samples the Pareto II (Lomax) form of the distribution, whose values start at 0.
- a: The shape of the distribution. It must be positive.
- size: The shape of the output array.
from numpy import random
# Draw a 2x2 array of Pareto-distributed values
info = random.pareto(a=1.5, size=(2, 2))
print(info)
Output (values will differ on each run):
[[ 2.4042859  10.06819341]
 [ 0.97075808  0.63631779]]
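NumPy's pareto() draws from the Pareto II (Lomax) distribution, whose values start at 0. A classical Pareto distribution with minimum value m can be obtained by adding 1 and multiplying by m. The sketch below shows that conversion and compares the sample median with the theoretical median, m * 2 ** (1 / a). The values of a and m, the seed, and the sample count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed for repeatability

a = 3.0  # shape parameter
m = 2.0  # hypothetical minimum value of the classical Pareto distribution

# Pareto II (Lomax) samples: the smallest possible value is 0.
lomax = rng.pareto(a, size=200_000)

# Shift and rescale to get classical Pareto samples that start at m.
classical = (lomax + 1.0) * m

# Theoretical median of the classical Pareto distribution.
expected_median = m * 2 ** (1 / a)

print("smallest Lomax value:    ", lomax.min())
print("smallest classical value:", classical.min())
print("sample median:", np.median(classical), "expected:", expected_median)
```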
5. Zipf Distribution
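Zipf's law states that the frequency of the zth most common value is proportional to 1/z. In other words, the second most common value occurs about half as often as the most common one, the third about one third as often, and so on.
Based on this idea, NumPy provides the zipf() function, which draws positive integers k with a probability proportional to k ** (-a).
- a: The distribution parameter (the exponent). It must be greater than 1.
- size: The shape of the output array.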
from numpy import random
# Draw a 2x2 array of Zipf-distributed integers
info = random.zipf(a=1.5, size=(2, 2))
print(info)
Output (values will differ on each run):
[[ 1  1]
 [ 2 29]]
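Since zipf() draws the integer k with a probability proportional to k ** (-a), the value 1 should appear about 2 ** a times as often as the value 2. The sketch below checks this with a = 2, where the expected ratio is 4. The parameter value, seed, and sample count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed for repeatability

a = 2.0  # distribution parameter, must be greater than 1
samples = rng.zipf(a, size=200_000)

# Count how often the two most common values occur.
count_1 = np.count_nonzero(samples == 1)
count_2 = np.count_nonzero(samples == 2)

# P(1) / P(2) = 1 ** (-a) / 2 ** (-a) = 2 ** a
ratio = count_1 / count_2

print("observed ratio:", ratio, "expected:", 2 ** a)
```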
With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
For more such posts related to Python programming and its modules, stay tuned with us. Till then, Happy Learning!! 🙂