Java 8 Streams: An Introduction
With the release of Java 8 in 2014, Oracle brought Java more in line with many of today’s modern languages. The addition of Streams and Lambdas was a much-needed breath of fresh air for the platform and brought with it a new wave of declarative programming. In this tutorial we’ll be going over Streams, some of their features, some of their drawbacks, and a variety of use cases.
What is a Stream?
A Stream is, effectively, a read-once data view. Wow, that’s a lot of keywords. Let’s take a step back and talk about each part of that statement before we keep going.
- Read-once
- Streams can be read only once, by a terminal operator, before being exhausted. Much like a queue or stack, once you pop an element off the Stream it’s gone from the Stream. Unlike a queue or stack, however, an exhausted Stream can never be repopulated.
- Data view
- You’ll notice I specifically call these out not as Data Structures but instead as Data Views. Streams don’t hold actual data in themselves like a collection, they simply facilitate the retrieval of data from a defined data set and allow you to view/interact with the data.
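To see the read-once behavior in action, here’s a minimal sketch (the class and method names are just for illustration). Once a terminal operator has consumed a Stream, any further terminal operation on that same Stream instance is rejected with an IllegalStateException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReadOnceDemo {
    // Returns true if a second terminal operation on the same Stream is rejected.
    static boolean secondTerminalOpFails() {
        List<String> names = Arrays.asList("Ada", "Grace", "Linus");
        Stream<String> stream = names.stream();

        // First terminal operation: reads every element and exhausts the Stream.
        stream.collect(Collectors.toList());

        try {
            // Second terminal operation on the same, already-consumed Stream.
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            // OpenJDK message: "stream has already been operated upon or closed"
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOpFails()); // prints true
    }
}
```

If you need to read the same data twice, create a fresh Stream from the source (for example, call names.stream() again) rather than reusing the old one.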
Additionally, Streams have a variety of important features that define them:
- They give access to a variety of declarative lazy manipulation methods that can change the type/contents of the stream (by creating new streams)
- They give access to a set of declarative terminal operators that can create a new reusable piece of data
- They allow for infinite Streams of data
- They have built-in functionality for parallelization (which we won’t be covering in this article)
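As a quick illustration of an infinite Stream, the sketch below uses Stream.iterate to describe the endless sequence 1, 2, 4, 8, and so on (InfiniteStreamDemo and firstPowersOfTwo are hypothetical names). Because the Stream is lazy, nothing is computed until collect runs, and limit is what keeps that terminal operator from looping forever.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreamDemo {
    // Stream.iterate describes an unbounded sequence: 1, 2, 4, 8, ...
    // limit(n) truncates it so the terminal collect can finish.
    static List<Integer> firstPowersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstPowersOfTwo(5)); // prints [1, 2, 4, 8, 16]
    }
}
```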
So, to sum it up, a Stream is an object that allows the access, manipulation, and collection of data. This, paired with Streams’ extensive collection of declarative methods, makes them extremely useful for writing easy-to-read, concise code.
What are Lazy and Terminal operators?
One of the most important Stream concepts is the distinction between Lazy and Terminal operators (the Java documentation calls these intermediate and terminal operations). Lazy operators are instructions that aren’t carried out immediately: as you chain them together, nothing actually happens beyond defining a pipeline of actions. It isn’t until a Terminal operator is called that the pipeline is executed and the Stream consumed. Examples of Lazy operators are “map”, “flatMap”, and “filter”, and examples of Terminal operators are “collect”, “reduce”, and “forEach”.
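Here’s a small sketch that makes the laziness visible (LazinessDemo and countProcessing are illustrative names). A side effect inside map records each element it touches: the record is still empty right after the pipeline is defined, and only fills up once the terminal collect runs. Side effects like this are handy for a demo but are generally discouraged in real pipelines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessDemo {
    // Returns {elements processed before the terminal op, elements processed after it}.
    static int[] countProcessing() {
        List<String> processed = new ArrayList<>();

        // Define the pipeline; map is lazy, so this lambda does not run yet.
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .map(s -> {
                    processed.add(s); // side effect used only to observe laziness
                    return s.toUpperCase();
                });
        int before = processed.size();

        // collect is a terminal operator, so the whole pipeline runs now.
        pipeline.collect(Collectors.toList());
        int after = processed.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = countProcessing();
        System.out.println("before: " + counts[0] + ", after: " + counts[1]); // prints before: 0, after: 3
    }
}
```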
What can we do with Streams?
Now that we have a high-level understanding of what a Stream is, let’s talk about how to use it. Consider the following simplistic problem:
“Find all odd numbers between 0 and 1000”
Traditional Java has us approach the problem in an imperative fashion:
List<Integer> oddNumbers = new ArrayList<>();
for (int i = 0; i <= 1000; i++) {
    if (i % 2 != 0) {
        oddNumbers.add(i);
    }
}
Simple to understand, and a fairly small 6 lines, not bad! Let’s see how we do with Streams:
IntStream oddNumbers = IntStream
.rangeClosed(0, 1000)
.filter(x -> x % 2 != 0);
Great! We’ve ended up with an IntStream of odd numbers in a single statement, so we’re done, right? Unfortunately, this is deceptive. IntStream is a specialized kind of Stream, one that holds primitive ints rather than Integer objects, and the result of filter is another IntStream rather than a list. Let’s see what we have to do to actually match the for loop:
List<Integer> oddNumbers = IntStream.rangeClosed(0, 1000)
.filter(x -> x % 2 != 0)
.boxed()
.collect(Collectors.toList());
There we go. Still a single statement, still easy to read, but now we’re ending with a list of Integers instead of a Stream of ints: boxed() converts each int into an Integer, and collect gathers them into a List. However, in this simple example we’re not really getting the most out of Streams. Let’s add some additional complexity to the example.
“Find all odd numbers between 0 and 1000 that are multiples of 3 but not of 5”
Again, traditional Java looks like this:
List<Integer> oddNumbers = new ArrayList<>();
for (int i = 0; i <= 1000; i++) {
    if (i % 2 != 0 && i % 3 == 0 && i % 5 != 0) {
        oddNumbers.add(i);
    }
}
Still the same number of lines, but that if statement is starting to get a bit unwieldy. Let’s look at our Stream equivalent:
List<Integer> oddNumbers = IntStream
.rangeClosed(0, 1000)
.filter(x -> x % 2 != 0)
.filter(x -> x % 3 == 0)
.filter(x -> x % 5 != 0)
.boxed()
.collect(Collectors.toList());
Hey, that’s better! Much easier to read. Admittedly, we could have done something similar in traditional Java by breaking each part of the if statement into its own nested if block, but that adds a lot of visual clutter and makes the code harder to follow. It’s also worth noting that chained filter calls will likely be optimized by the HotSpot JIT compiler, so don’t worry too much about creating extra filters. But what about a more complex question, maybe one involving objects?
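If you’d rather keep each condition named but pass a single filter, the default and and negate methods on java.util.function.IntPredicate can compose them. The sketch below (class and constant names are hypothetical) does that and also checks that the result matches the for loop version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OddMultiplesDemo {
    static final IntPredicate IS_ODD = x -> x % 2 != 0;
    static final IntPredicate MULTIPLE_OF_3 = x -> x % 3 == 0;
    static final IntPredicate MULTIPLE_OF_5 = x -> x % 5 == 0;

    // Stream version: the three conditions composed into one predicate.
    static List<Integer> withStream() {
        return IntStream.rangeClosed(0, 1000)
                .filter(IS_ODD.and(MULTIPLE_OF_3).and(MULTIPLE_OF_5.negate()))
                .boxed()
                .collect(Collectors.toList());
    }

    // Imperative version from the text, for comparison.
    static List<Integer> withLoop() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= 1000; i++) {
            if (i % 2 != 0 && i % 3 == 0 && i % 5 != 0) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(withStream().equals(withLoop())); // prints true
        System.out.println(withStream().subList(0, 5));     // prints [3, 9, 21, 27, 33]
    }
}
```

Whether one composed predicate or three chained filter calls reads better is mostly a matter of taste; both produce the same list.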
“Given a list of Users, find all users with invalid email addresses and create a map of user type ID to list of user objects”
This question has a bit more meat to it. Now we have to work with a list of user objects that each have an email address and a user type (we’ll define this in a minute), and collect the users whose email addresses are invalid into a Map of user type to list of users. One could easily use a method like this in an application to find users to exclude from an email action. Let’s start by defining a couple of things:
public class User {
    private final String emailAddress;
    private final Long id;
    private final Integer userTypeId;
    //Getters
}
public Boolean isValidEmail(String emailAddress) {...}
Great, now that we have a user object and a method to validate email addresses, let’s look at what implementations of the question look like. A traditional Java implementation looks something like this:
Map<Integer, List<User>> userMap = new HashMap<>();
for (User user : users) {
    // Keep only users whose email address fails validation
    if (!isValidEmail(user.getEmailAddress())) {
        Integer userTypeId = user.getUserTypeId();
        List<User> value = userMap.getOrDefault(userTypeId, new ArrayList<>());
        value.add(user);
        userMap.put(userTypeId, value);
    }
}
The inner if block is a bit hectic, but it’s readable for now. This method is about as imperative as you can get, and there isn’t anything wrong with that with the exception that it can create hard-to-read code. Let’s look at the Stream solution.
Map<Integer, List<User>> userMap = users
.stream()
.filter(user -> !isValidEmail(user.getEmailAddress()))
.collect(Collectors.groupingBy(User::getUserTypeId));
The declarative nature of the Stream approach creates a more readable solution and increases maintainability, which is a major benefit.
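Putting the pieces together, here’s a self-contained sketch of the Stream solution. The isValidEmail shown is a hypothetical, deliberately naive check (it only looks for an @), the User class is nested with an assumed constructor to keep the example in one file, and the sample users are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class InvalidEmailGrouping {
    static class User {
        private final String emailAddress;
        private final Long id;
        private final Integer userTypeId;

        User(String emailAddress, Long id, Integer userTypeId) {
            this.emailAddress = emailAddress;
            this.id = id;
            this.userTypeId = userTypeId;
        }

        String getEmailAddress() { return emailAddress; }
        Long getId() { return id; }
        Integer getUserTypeId() { return userTypeId; }
    }

    // Hypothetical, deliberately naive check; real email validation is far stricter.
    static boolean isValidEmail(String emailAddress) {
        return emailAddress != null && emailAddress.contains("@");
    }

    // Keeps only users with invalid emails and groups them by user type ID.
    static Map<Integer, List<User>> invalidUsersByType(List<User> users) {
        return users
                .stream()
                .filter(user -> !isValidEmail(user.getEmailAddress()))
                .collect(Collectors.groupingBy(User::getUserTypeId));
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(
                new User("ada@example.com", 1L, 1),
                new User("not-an-email", 2L, 1),
                new User(null, 3L, 2),
                new User("grace@example.com", 4L, 2));

        Map<Integer, List<User>> result = invalidUsersByType(users);
        System.out.println(result.get(1).get(0).getId()); // prints 2
        System.out.println(result.get(2).get(0).getId()); // prints 3
    }
}
```

One thing to watch for: Collectors.groupingBy throws a NullPointerException if the classifier returns null, so users with a null userTypeId would need to be filtered out or mapped to a default key first.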
Streams don’t only have to be used against Collections though. Consider the following:
“Given a User object with a list of Product objects, determine the sum of all the products’ prices”
First let’s define the missing parts:
public class User {
    private final String emailAddress;
    private final Long id;
    private final Integer userTypeId;
    private final List<Product> cart;
    //Getters
}
public class Product {
    private final Long id;
    private final Double price;
    private final String productName;
    //Getters
}
Using a Stream this can be done in a declarative fashion with little trouble:
double total = Stream.of(user)
.map(User::getCart)
.flatMap(Collection::stream)
.map(Product::getPrice)
.reduce(0D, (x, y) -> x + y);
This gives the code a very clear declarative structure. This is just a fraction of the functions that come built into Streams, but hopefully it gives you a starting point for spotting places where Streams could be useful to you!
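Here’s a runnable version of the cart example, with the User and Product classes trimmed down to the fields the calculation needs (the constructors and sample data are assumptions for illustration). Double::sum is used as an equivalent shorthand for (x, y) -> x + y.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Stream;

class CartTotal {
    // Trimmed-down Product: only the fields this example needs.
    static class Product {
        private final String productName;
        private final Double price;

        Product(String productName, Double price) {
            this.productName = productName;
            this.price = price;
        }

        String getProductName() { return productName; }
        Double getPrice() { return price; }
    }

    // Trimmed-down User: only the cart matters here.
    static class User {
        private final List<Product> cart;

        User(List<Product> cart) {
            this.cart = cart;
        }

        List<Product> getCart() { return cart; }
    }

    static double cartTotal(User user) {
        return Stream.of(user)
                .map(User::getCart)           // Stream<List<Product>>
                .flatMap(Collection::stream)  // Stream<Product>
                .map(Product::getPrice)       // Stream<Double>
                .reduce(0D, Double::sum);     // boxed Double running total
    }

    public static void main(String[] args) {
        User user = new User(Arrays.asList(
                new Product("Pen", 1.5),
                new Product("Pad", 2.25),
                new Product("Ink", 3.0)));
        System.out.println(cartTotal(user)); // prints 6.75
    }
}
```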
What are the downsides?
Now for the elephant in the room. Streams don’t come consequence-free, and it’s important to understand the restrictions that come with using them. These restrictions include thread safety, performance, and memory usage. The most important ones to talk about are performance cost and memory usage, which go hand in hand.
Each Stream operation adds overhead from creating (and later garbage-collecting) objects. Consider the last case:
Stream.of(user)
.map(User::getCart)
.flatMap(Collection::stream)
.map(Product::getPrice)
.reduce(0D, (x, y) -> x + y);
In this one block of code we’ve:
- Created a Stream<User> object
- Created a new Stream<List<Product>>
- Created a new Stream<Product>
- Created a new Stream<Double>
- Created N boxed Double objects through the reduce step’s arithmetic, where N is the size of the user’s cart
That’s a lot of object creation. This example could easily be reduced in number of operations, of course:
user.getCart()
.stream()
.map(Product::getPrice)
.reduce(0D, (x, y) -> x + y);
Which reduces the actual object creation to:
- Create a Stream<Product>
- Create a new Stream<Double>
- Create N boxed Double objects through the reduce step’s arithmetic, where N is the size of the user’s cart
But that’s still 2 + N object creations. These excess object creations add up over time, especially with longer Stream chains, so if you’re working in a performance-critical system where every cycle counts, or in a memory-limited system, Streams are likely not your best choice.
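If boxing is the main concern, one option is to switch to a primitive stream. The sketch below (with a trimmed-down, hypothetical Product class) uses mapToDouble to turn each price into a primitive double, so DoubleStream.sum() keeps the running total as a double instead of allocating a new Double at every step. The Stream objects themselves are still created, so this trims the N per-element allocations rather than removing the overhead entirely.

```java
import java.util.Arrays;
import java.util.List;

class CartTotalPrimitive {
    // Trimmed-down Product with a name and a boxed price, as in the article.
    static class Product {
        private final String productName;
        private final Double price;

        Product(String productName, Double price) {
            this.productName = productName;
            this.price = price;
        }

        String getProductName() { return productName; }
        Double getPrice() { return price; }
    }

    // mapToDouble unboxes each price into a DoubleStream, so sum() keeps the
    // running total as a primitive double rather than a new Double per step.
    static double cartTotal(List<Product> cart) {
        return cart.stream()
                .mapToDouble(Product::getPrice)
                .sum();
    }

    public static void main(String[] args) {
        List<Product> cart = Arrays.asList(
                new Product("Pen", 1.5),
                new Product("Pad", 2.25),
                new Product("Ink", 3.0));
        System.out.println(cartTotal(cart)); // prints 6.75
    }
}
```

Note that the DoubleStream.sum() documentation allows compensated summation, so for some inputs its result can differ in the last bits from a plain left-to-right reduce.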
Additionally, Streams offer little in the way of thread safety, primarily because they are “consume once,” but also because they are Data Views. Because a Stream can be consumed only once, handing one off between threads is rarely an effective strategy, so let’s focus on the Data View perspective instead. Consider the following code:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class Pojo {
    private int tracker;
    public Pojo(int tracker) {
        this.tracker = tracker;
    }
    public int getTracker() {
        return tracker;
    }
    public void setTracker(int tracker) {
        this.tracker = tracker;
    }
}
class Main {
    public static void main(String... args) {
        List<Pojo> pojoList = new ArrayList<>();
        pojoList.add(new Pojo(1));
        pojoList.add(new Pojo(2));
        pojoList.add(new Pojo(3));
        // Define the pipeline; nothing is read from pojoList yet
        Stream<Integer> pojoStream = pojoList
            .stream()
            .map(Pojo::getTracker);
        // Mutate the first element before the terminal operation runs
        pojoList.get(0).setTracker(5);
        // Prints 5, 2 and 3, one per line
        pojoStream.forEach(System.out::println);
    }
}
Remember, since Streams are Data Views, they read from their source only when the Terminal operator runs, so they reflect the state of the objects at that moment rather than when the Stream was created. As such, the final line of the main method prints 5, 2, and 3 rather than 1, 2, and 3. It should also be noted that while changing the state of a collection’s elements from within a stream of that collection won’t raise an error (although the Stream documentation discourages such side effects), structurally modifying the collection itself, such as adding or removing elements while the terminal operation is running, will typically fail with a ConcurrentModificationException.
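To see the structural-modification case, here’s a minimal sketch (names are illustrative) that adds to an ArrayList while a Stream over it is running its terminal forEach. With OpenJDK’s ArrayList this raises a ConcurrentModificationException; the ArrayList documentation describes this fail-fast behavior as best-effort, so it shouldn’t be relied on for correctness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class StructuralModificationDemo {
    // Returns true if growing the source list during the terminal operation fails.
    static boolean addDuringForEachFails() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            // Each add changes the list's size while the Stream is still traversing it.
            numbers.stream().forEach(n -> numbers.add(n * 10));
            return false;
        } catch (ConcurrentModificationException e) {
            // ArrayList's fail-fast check (best-effort, per its documentation)
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addDuringForEachFails()); // prints true
    }
}
```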
So that’s the basics!
Streams are an extremely useful tool for writing clean, readable, maintainable code, and can greatly simplify the work required to do many operations. They do come at a cost, primarily performance related, but if you’re not heavily restricted by CPU cycles or memory they will likely be fine to use. Remember though, Streams are a tool like any other and there are times when using a traditional for loop will be more effective than a Stream, so examine your use-case, consider the ramifications, and code wisely!