Java Stream distinct() Function to Remove Duplicates

The Java Stream distinct() method returns a new stream containing only the distinct elements of the original stream. It's useful for removing duplicate elements from a collection before processing them.

Java Stream distinct() Method

  • The elements are compared using the equals() method, so the stream elements must have a proper implementation of equals().
  • If the stream is ordered, the encounter order is preserved. Of several equal elements, the one occurring first is the one kept in the distinct elements stream.
  • If the stream is unordered, there is no guarantee about which of several equal elements is kept, and the resulting stream elements can be in any order.
  • Stream distinct() is a stateful intermediate operation.
  • Using distinct() with an ordered parallel stream can perform poorly because preserving the encounter order requires significant buffering. If the order doesn't matter, removing the ordering constraint with unordered() may help; if it does matter, go with sequential stream processing.
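
The ordering points above can be illustrated with a small sketch. The class and method names here (DistinctOrderingSketch, distinctInEncounterOrder, distinctUnordered) are made up for illustration and are not part of the Stream API. The second method collects into a Set because, once the ordering constraint is dropped, no particular order should be relied upon.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only; the class and method names are hypothetical.
class DistinctOrderingSketch {

    // Sequential stream over an ordered source (a List):
    // the first occurrence of each element is kept, in encounter order.
    static List<String> distinctInEncounterOrder(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    // Parallel stream with the ordering constraint removed via unordered().
    // The result goes into a Set because no element order is guaranteed.
    static Set<String> distinctUnordered(List<String> input) {
        return input.parallelStream().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> words = List.of("b", "a", "b", "c", "a");
        System.out.println(distinctInEncounterOrder(words)); // [b, a, c]
        System.out.println(distinctUnordered(words));        // a, b and c in unspecified order
    }
}
```

Whether unordered() actually speeds things up depends on the data and the machine, so it is worth measuring before relying on it.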

Remove Duplicate Elements using distinct()

Let’s see how to use the Stream distinct() method to remove duplicate elements from a collection.


jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
list ==> [1, 2, 3, 4, 3, 2, 1]

jshell> List<Integer> distinctInts = list.stream().distinct().collect(Collectors.toList());
distinctInts ==> [1, 2, 3, 4]

Processing only Unique Elements using Stream distinct() and forEach()

Since distinct() is an intermediate operation, we can chain the terminal forEach() operation after it to process only the unique elements.


jshell> List<Integer> list = List.of(1, 2, 3, 4, 3, 2, 1);
list ==> [1, 2, 3, 4, 3, 2, 1]

jshell> list.stream().distinct().forEach(x -> System.out.println("Processing " + x));
Processing 1
Processing 2
Processing 3
Processing 4

Stream distinct() with custom objects

Let’s look at a simple example of using distinct() to remove duplicate elements from a list of custom objects.


package com.journaldev.java;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class JavaStreamDistinct {

	public static void main(String[] args) {
		List<Data> dataList = new ArrayList<>();
		dataList.add(new Data(10));
		dataList.add(new Data(20));
		dataList.add(new Data(10));
		dataList.add(new Data(20));

		System.out.println("Data List = "+dataList);

		List<Data> uniqueDataList = dataList.stream().distinct().collect(Collectors.toList());

		System.out.println("Unique Data List = "+uniqueDataList);
	}

}

class Data {
	private int id;

	Data(int i) {
		this.setId(i);
	}

	public int getId() {
		return id;
	}

	public void setId(int id) {
		this.id = id;
	}

	@Override
	public String toString() {
		return String.format("Data[%d]", this.id);
	}
}

Output:


Data List = [Data[10], Data[20], Data[10], Data[20]]
Unique Data List = [Data[10], Data[20], Data[10], Data[20]]

The distinct() method didn’t remove the duplicate elements because we didn’t implement the equals() method in the Data class, so the equals() method inherited from the Object superclass was used to identify equal elements. The Object class equals() method implementation is:


public boolean equals(Object obj) {
    return (this == obj);
}

Although the Data objects had the same ids, they were different object instances, so they were considered not equal. That’s why it’s very important to implement the equals() method if you are planning to use the Stream distinct() method with custom objects.
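
The identity comparison can be seen in isolation with a minimal sketch. It assumes a hypothetical class named Plain that, like the original Data class, overrides neither equals() nor hashCode(); the wrapper class and helper method names are also made up for illustration.

```java
import java.util.List;

// Illustrative sketch only; all names here are hypothetical.
class IdentityEqualsSketch {

    // Does NOT override equals() or hashCode(), so Object's versions apply.
    static class Plain {
        final int id;

        Plain(int id) {
            this.id = id;
        }
    }

    // Counts the elements that survive distinct().
    static long countDistinct(List<?> items) {
        return items.stream().distinct().count();
    }

    public static void main(String[] args) {
        Plain a = new Plain(10);
        Plain b = new Plain(10);

        System.out.println(a.equals(b)); // false: same id, different instances
        System.out.println(a.equals(a)); // true: same instance

        // Only the repeated reference to 'a' is treated as a duplicate.
        System.out.println(countDistinct(List.of(a, b, a))); // 2
    }
}
```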

Note that hash-based collection classes use both the hashCode() and equals() methods to check whether two objects are equal, and objects that are equal must return the same hash code. So it’s better to provide an implementation for both of them.


@Override
public int hashCode() {
	final int prime = 31;
	int result = 1;
	result = prime * result + id;
	return result;
}

@Override
public boolean equals(Object obj) {
	// Trace output only, to show when equals() gets called
	System.out.println("Data equals method");
	if (this == obj)
		return true;
	if (obj == null)
		return false;
	if (getClass() != obj.getClass())
		return false;
	Data other = (Data) obj;
	if (id != other.id)
		return false;
	return true;
}

Tip: You can easily generate the equals() and hashCode() methods using the “Eclipse > Source > Generate equals() and hashCode()” menu option.
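
If you would rather write the two methods by hand, the java.util.Objects helper class can shorten them. The following is a minimal sketch, not the generated code: KeyedData is a hypothetical stand-in for the Data class, and KeyedDataDemo with its unique() method exists only for this illustration.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Data class, with hand-written equals() and hashCode().
final class KeyedData {
    private final int id;

    KeyedData(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof KeyedData))
            return false;
        return id == ((KeyedData) obj).id;
    }

    @Override
    public int hashCode() {
        // Equal ids always yield equal hash codes, as the contract requires.
        return Objects.hash(id);
    }

    @Override
    public String toString() {
        return String.format("KeyedData[%d]", id);
    }
}

// Illustrative driver class; the names are hypothetical.
class KeyedDataDemo {

    static List<KeyedData> unique(List<KeyedData> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<KeyedData> dataList = List.of(
                new KeyedData(10), new KeyedData(20), new KeyedData(10), new KeyedData(20));
        System.out.println(unique(dataList)); // [KeyedData[10], KeyedData[20]]
    }
}
```

Making the id field final is a design choice of this sketch: an object whose hash code changes while it sits inside a hash-based collection may no longer be found there.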

The output after adding equals() and hashCode() implementation is:


Data List = [Data[10], Data[20], Data[10], Data[20]]
Data equals method
Data equals method
Unique Data List = [Data[10], Data[20]]

Reference: Stream distinct() API Doc
