Pandas is one of the most useful Python libraries for **data manipulation and analysis**. It is a single package, yet it offers a large number of functions that assist us in all kinds of operations. Among them are various **statistical functions**, which compute the statistical measures of the data. In this story, let’s see some of the top statistical functions offered by pandas.

## Loading the Data For Statistical Functions

To see how all these statistical functions work, we need data. For this, we are going with **coffee sales data**, which is fairly large and has multiple features.

```
#data
import pandas as pd
data = pd.read_csv('coffeesales.csv')
data.head(5)
```

Well, our data is now ready to be explored statistically. Before moving forward, let’s look at some basic properties of our data.

**Shape**

```
#shape
data.shape
```

(4248, 9)

We have over 4,000 rows and 9 features in our data.

**Features**

```
#features
data.columns
```

Index(['order_date', 'market', 'region', 'product_category', 'product', 'cost', 'inventory', 'net_profit', 'sales'], dtype='object')
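Column names alone don’t tell us how pandas interpreted each column, and that matters for the statistical functions that follow. Here is a minimal sketch of checking the types, assuming a tiny made-up CSV (the rows and the ISO date format are hypothetical, not taken from the coffee sales file):

```python
import io
import pandas as pd

# Made-up rows that only mimic a few of the column names shown above.
csv_text = """order_date,market,sales
2023-01-05,Retail,120
2023-01-06,Wholesale,340
"""

# parse_dates asks pandas to convert order_date from text to datetime64.
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["order_date"])

# dtypes lists the type pandas assigned to every column.
dtypes = toy.dtypes
print(dtypes)
```

On the real data, `data.dtypes` gives the same kind of listing, which tells you which columns the numeric functions will pick up.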

I think this should be enough. Now, let’s explore our data using some of the top statistical functions offered by pandas.

## 1. Describe

The **describe** function is one of the most useful in pandas. It reports summary statistics such as the count, mean, standard deviation, minimum, maximum, and the percentiles (25%, 50%, and 75% by default).

**More read:** Data Describe Library In Python For Data Exploration

```
#describe
data.describe()
```

Using this one-liner, we can quickly get enough information to understand our data. In the above output, we can easily find key information such as the maximum sales, the minimum cost, and more.

The describe function is the best fit for summary statistics. It works very well with pandas DataFrames and returns the results in a flash.

By default, describe summarizes only the numeric columns, so it won’t consider the categorical columns present in our data. Passing `include='all'` brings them in as well.
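To see what describe does with a text column, here is a small sketch on a made-up DataFrame (the `toy` frame and its values are hypothetical stand-ins for the coffee sales data):

```python
import pandas as pd

# Hypothetical stand-in for the coffee sales data (values are made up).
toy = pd.DataFrame({
    "market": ["Retail", "Wholesale", "Retail", "Retail"],
    "sales": [120, 340, 95, 210],
})

# Default behaviour: only the numeric column is summarized.
numeric_summary = toy.describe()

# include="all" adds text columns, reported as count, unique, top and freq.
full_summary = toy.describe(include="all")

print(numeric_summary)
print(full_summary)
```

For the text column, `top` is the most frequent value and `freq` is how often it appears; the numeric rows such as mean and std are left as NaN for it.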

## 2. Min, Max and idxmin, idxmax

You are probably well aware of min and max already. But idxmin and idxmax are also among the coolest functions I have ever seen.

`min and max` – These functions return the minimum and maximum values in a particular column.

`idxmin and idxmax` – These functions return the index labels of those min and max values. Isn’t it cool 😛

```
#Min (the Series method skips missing values and is faster than the built-in min)
data['sales'].min()
```

**17**

```
#Max (the Series method skips missing values and is faster than the built-in max)
data['sales'].max()
```

**912**

```
#idxmin
data['sales'].idxmin()
```

**154**

```
#idxmax
data['sales'].idxmax()
```

**1154**

Here, you can see that the min and max values are 17 and 912 respectively. The value 17 sits at index 154 and the value 912 at index 1154. That’s something awesome 😛
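One detail worth knowing: idxmin and idxmax return index labels, not positions, so you can hand the result straight to `.loc` to pull the whole row. A minimal sketch on made-up data (the `toy` frame, its product names and its custom index are all hypothetical):

```python
import pandas as pd

# Hypothetical data with a non-default index to show that labels are returned.
toy = pd.DataFrame(
    {"product": ["Latte", "Mocha", "Espresso", "Decaf"],
     "sales": [120, 340, 95, 210]},
    index=[10, 11, 12, 13],
)

low_label = toy["sales"].idxmin()    # label of the smallest sales value
high_label = toy["sales"].idxmax()   # label of the largest sales value

# .loc looks rows up by label, so it pairs naturally with idxmin/idxmax.
lowest_row = toy.loc[low_label]
highest_row = toy.loc[high_label]

print(low_label, lowest_row["product"])
print(high_label, highest_row["product"])
```

If several rows tie for the minimum or maximum, the label of the first one is returned.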

## 3. nsmallest and nlargest

The nsmallest function returns the n smallest values. You pass the number of values to return along with the column to rank by. For example, if you pass 3, it returns the rows with the 3 smallest values in that column.

Similarly, `nlargest` works as the opposite of nsmallest. It returns the n largest values present in the data. We will see them in action below.

```
#smallest
data.nsmallest(3,'sales')
```

Pretty awesome. We got the 3 rows with the smallest values in the sales column of our data.

```
#largest
data.nlargest(3,'sales')
```

Well, as expected, we got the 3 rows with the largest sales values. You can pass whatever number you want.
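What happens when values tie? Both functions accept a `keep` argument for that. Here is a small sketch on made-up data (the `toy` frame and its values are hypothetical), which also shows that nlargest picks the same rows as sorting and taking the head:

```python
import pandas as pd

# Hypothetical data with a tie at the bottom (two rows with sales of 95).
toy = pd.DataFrame({
    "product": ["Latte", "Mocha", "Espresso", "Decaf", "Chai"],
    "sales": [120, 340, 95, 95, 210],
})

# keep="first" is the default: only the first tied row is returned.
smallest_first = toy.nsmallest(1, "sales")

# keep="all" returns every row tied for the last place, so more than n rows.
smallest_all = toy.nsmallest(1, "sales", keep="all")

# nlargest selects the same rows as sorting in descending order and taking the head.
top_two = toy.nlargest(2, "sales")
sorted_top_two = toy.sort_values("sales", ascending=False).head(2)

print(smallest_all)
print(top_two)
```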

## 4. Corr

Correlation is one of the most useful tools for understanding the relationships among features in our data. It describes the degree to which two variables move together.

In simple words, correlation measures how strongly two variables are linearly related, and in which direction. Keep in mind that it does not tell us whether one variable causes the other.

```
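#correlation (numeric_only=True skips text columns; pandas 2.0+ raises an error on them otherwise)
data.corr(numeric_only=True)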
```

That’s it. We got the correlation results. Here we can see that sales and cost, as well as sales and net_profit, are highly positively correlated.

The correlation coefficient ranges from -1 to +1. Here, +1 is a perfect positive correlation and -1 is a perfect negative correlation, while values near 0 mean little or no linear relationship.
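To make the two ends of the scale concrete, here is a sketch on made-up numbers (the `toy` frame is hypothetical, and the `numeric_only` argument assumes pandas 1.5 or later). Sales is built as an exact linear function of cost, and inventory falls as cost rises:

```python
import pandas as pd

# Hypothetical data constructed to hit both ends of the correlation scale.
toy = pd.DataFrame({
    "market": ["Retail", "Wholesale", "Retail", "Retail", "Wholesale"],
    "cost": [10, 20, 30, 40, 50],
    "sales": [25, 45, 65, 85, 105],      # exactly 2 * cost + 5
    "inventory": [50, 40, 30, 20, 10],   # decreases as cost increases
})

# Full matrix over the numeric columns only; the text column is skipped.
corr_matrix = toy.corr(numeric_only=True)

# Correlation for a single pair of columns (Pearson by default).
cost_vs_sales = toy["cost"].corr(toy["sales"])
cost_vs_inventory = toy["cost"].corr(toy["inventory"])

print(corr_matrix)
print(cost_vs_sales, cost_vs_inventory)
```

The first pair comes out at (approximately) +1 and the second at -1, since both relationships are perfectly linear.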

## 5. Sample, Unique and Value_counts

`Sample`

You can use the sample function to draw random rows from the data. Let’s see how it works.

```
#sample
data.sample(5)
```

Well, the sample function returned random rows from the data. It helps with data inspection.
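Because the rows are random, every call gives a different result. If you need the same sample each time, or a fraction of the data instead of a fixed count, a sketch like this may help (the `toy` frame and the seed values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: 100 rows with sales values 0..99.
toy = pd.DataFrame({"sales": range(100)})

# random_state fixes the seed, so both calls return identical rows.
first = toy.sample(5, random_state=42)
second = toy.sample(5, random_state=42)

# frac draws a share of the rows instead of a fixed number (10% here).
tenth = toy.sample(frac=0.1, random_state=0)

print(first)
print(len(tenth))
```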

`Unique`

We don’t get many functions in the statistics category that work with categorical data. But we do have the unique function, which returns the distinct values of a specific variable.

```
#unique
data['market'].unique()
```

array(['Wholesale', 'Retail'], dtype=object)

Yeah, we have 2 markets over which products were sold: Wholesale and Retail. This function is something serious 😛

`Value_counts`

We know how to see the unique values in the data. The value_counts function goes a step further and returns how many times each of those values occurs.

Let’s check it out!

```
#value count
data['market'].value_counts()
```

Retail 2544 Wholesale 1704 Name: market, dtype: int64

That’s cool. We can see the count of each value. These functions are especially useful for working with categorical data.
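Sometimes shares are easier to read than raw counts. Here is a minimal sketch on a made-up Series (the `market` values are hypothetical, not the real counts from the data):

```python
import pandas as pd

# Hypothetical market column with 3 Retail rows and 1 Wholesale row.
market = pd.Series(["Retail", "Wholesale", "Retail", "Retail"], name="market")

# Raw counts per value, most frequent first.
counts = market.value_counts()

# normalize=True returns proportions that sum to 1 instead of counts.
shares = market.value_counts(normalize=True)

# nunique gives just the number of distinct values.
n_markets = market.nunique()

print(counts)
print(shares)
print(n_markets)
```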

I would like to plot this because I have stories without visualizations 😛 So, let’s chain value_counts with plot and add one more trick to your statistical functions list.

```
#plot
data['market'].value_counts().plot(kind = 'bar')
```

Now, it looks better than ever.

## Wrapping Up – Statistical Functions in Python

The statistical functions that pandas offers help us understand the statistical nature of the data, and the resulting numbers suggest what to do next. I hope the functions I showed here come in handy in your assignments.

That’s all for now. Happy Python!!!

**More read:** Statistics and Python