Spring Batch Example

Filed Under: Spring

Welcome to Spring Batch Example. Spring Batch is a spring framework module for execution of batch job. We can use spring batch to process a series of jobs.

Spring Batch Example

Before going through spring batch example program, let’s get some idea about spring batch terminologies.

  • A job can consist of ‘n’ number of steps. Each step contains Read-Process-Write task or it can have single operation, which is called tasklet.
  • Read-Process-Write is basically read from a source like Database, CSV etc. then process the data and write it to a source like Database, CSV, XML etc.
  • Tasklet means doing a single task or operation like cleaning of connections, freeing up resources after processing is done.
  • Read-Process-Write and tasklets can be chained together to run a job.

Spring Batch Example

Let us consider a working example for implementation of spring batch. We will consider the following scenario for implementation purpose.

A CSV file containing data needs to be converted as XML along with the data and tags will be named after the column name.

Below are the important tools and libraries used for spring batch example.

  1. Apache Maven 3.5.0 – for project build and dependencies management.
  2. Eclipse Oxygen Release 4.7.0 – IDE for creating spring batch maven application.
  3. Java 1.8
  4. Spring Core 4.3.12.RELEASE
  5. Spring OXM 4.3.12.RELEASE
  6. Spring JDBC 4.3.12.RELEASE
  7. Spring Batch 3.0.8.RELEASE
  8. MySQL Java Driver 5.1.25 – use based on your MySQL installation. This is required for Spring Batch metadata tables.

Spring Batch Example Directory Structure

Below image illustrates all the components in our Spring Batch example project.

spring batch example

Spring Batch Maven Dependencies

Below is the content of pom.xml file with all the required dependencies for our spring batch example project.


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.journaldev.spring</groupId>
	<artifactId>SpringBatchExample</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<name>SpringBatchDemo</name>
	<url>http://maven.apache.org</url>

	<properties>
		<jdk.version>1.8</jdk.version>
		<spring.version>4.3.12.RELEASE</spring.version>
		<spring.batch.version>3.0.8.RELEASE</spring.batch.version>
		<mysql.driver.version>5.1.25</mysql.driver.version>
		<junit.version>4.11</junit.version>
	</properties>

	<dependencies>

		<!-- Spring Core -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-core</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<!-- Spring jdbc, for database -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-jdbc</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<!-- Spring XML to/back object -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-oxm</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<!-- MySQL database driver -->
		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<version>${mysql.driver.version}</version>
		</dependency>

		<!-- Spring Batch dependencies -->
		<dependency>
			<groupId>org.springframework.batch</groupId>
			<artifactId>spring-batch-core</artifactId>
			<version>${spring.batch.version}</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.batch</groupId>
			<artifactId>spring-batch-infrastructure</artifactId>
			<version>${spring.batch.version}</version>
		</dependency>

		<!-- Spring Batch unit test -->
		<dependency>
			<groupId>org.springframework.batch</groupId>
			<artifactId>spring-batch-test</artifactId>
			<version>${spring.batch.version}</version>
		</dependency>

		<!-- Junit -->
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>${junit.version}</version>
			<scope>test</scope>
		</dependency>

		<dependency>
			<groupId>com.thoughtworks.xstream</groupId>
			<artifactId>xstream</artifactId>
			<version>1.4.10</version>
		</dependency>

	</dependencies>
	<build>
		<finalName>spring-batch</finalName>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-eclipse-plugin</artifactId>
				<version>2.9</version>
				<configuration>
					<downloadSources>true</downloadSources>
					<downloadJavadocs>false</downloadJavadocs>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>2.3.2</version>
				<configuration>
					<source>${jdk.version}</source>
					<target>${jdk.version}</target>
				</configuration>
			</plugin>
		</plugins>
	</build>
</project>

Spring Batch Processing CSV Input File

Here is the content of our sample CSV file for spring batch processing.


1001,Tom,Moody, 29/7/2013
1002,John,Parker, 30/7/2013
1003,Henry,Williams, 31/7/2013

Spring Batch Job Configuration

We have to define spring bean and spring batch job in a configuration file. Below is the content of job-batch-demo.xml file, it’s the most important part of spring batch project.


<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:batch="http://www.springframework.org/schema/batch" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/batch
		http://www.springframework.org/schema/batch/spring-batch-3.0.xsd
		http://www.springframework.org/schema/beans 
		http://www.springframework.org/schema/beans/spring-beans-4.3.xsd
	">

	<import resource="../config/context.xml" />
	<import resource="../config/database.xml" />

	<bean id="report" class="com.journaldev.spring.model.Report"
		scope="prototype" />
	<bean id="itemProcessor" class="com.journaldev.spring.CustomItemProcessor" />

	<batch:job id="DemoJobXMLWriter">
		<batch:step id="step1">
			<batch:tasklet>
				<batch:chunk reader="csvFileItemReader" writer="xmlItemWriter"
					processor="itemProcessor" commit-interval="10">
				</batch:chunk>
			</batch:tasklet>
		</batch:step>
	</batch:job>

	<bean id="csvFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">

		<property name="resource" value="classpath:csv/input/report.csv" />

		<property name="lineMapper">
			<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
				<property name="lineTokenizer">
					<bean
						class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
						<property name="names" value="id,firstname,lastname,dob" />
					</bean>
				</property>
				<property name="fieldSetMapper">
					<bean class="com.journaldev.spring.ReportFieldSetMapper" />

					<!-- if no data type conversion, use BeanWrapperFieldSetMapper to map 
						by name <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper"> 
						<property name="prototypeBeanName" value="report" /> </bean> -->
				</property>
			</bean>
		</property>

	</bean>

	<bean id="xmlItemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">
		<property name="resource" value="file:xml/outputs/report.xml" />
		<property name="marshaller" ref="reportMarshaller" />
		<property name="rootTagName" value="report" />
	</bean>

	<bean id="reportMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
		<property name="classesToBeBound">
			<list>
				<value>com.journaldev.spring.model.Report</value>
			</list>
		</property>
	</bean>

</beans>
  1. We are using FlatFileItemReader to read CSV file, CustomItemProcessor to process the data and write to XML file using StaxEventItemWriter.
  2. batch:job – This tag defines the job that we want to create. Id property specifies the ID of the job. We can define multiple jobs in a single xml file.
  3. batch:step – This tag is used to define different steps of a spring batch job.
  4. Two different types of processing style is offered by Spring Batch Framework, which are “TaskletStep Oriented” and “Chunk Oriented”. Chunk Oriented style is used in this example refers to reading the data one by one and creating ‘chunks’ that will be written out, within a transaction boundary.
  5. reader: spring bean used for reading the data. We have used csvFileItemReader bean in this example that is instance of FlatFileItemReader.
  6. processor: this is the class which is used for processing the data. We have used CustomItemProcessor in this example.
  7. writer: bean used to write data into xml file.
  8. commit-interval: This property defines the size of the chunk which will be committed once processing is done. Basically it means that ItemReader will read the data one by one and ItemProcessor will also process it the same way but ItemWriter will write the data only when it equals the size of commit-interval.
  9. Three important interface that are used as part of this project are ItemReader, ItemProcessor and ItemWriter from org.springframework.batch.item package.

Spring Batch Model Class

First of all we are reading CSV file into java object and then using JAXB to write it to xml file. Below is our model class with required JAXB annotations.


package com.journaldev.spring.model;

import java.util.Date;

import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "record")
public class Report {

	private int id;
	private String firstName;
	private String lastName;
	private Date dob;

	@XmlAttribute(name = "id")
	public int getId() {
		return id;
	}

	public void setId(int id) {
		this.id = id;
	}

	@XmlElement(name = "firstname")
	public String getFirstName() {
		return firstName;
	}

	public void setFirstName(String firstName) {
		this.firstName = firstName;
	}

	@XmlElement(name = "lastname")
	public String getLastName() {
		return lastName;
	}

	public void setLastName(String lastName) {
		this.lastName = lastName;
	}

	@XmlElement(name = "dob")
	public Date getDob() {
		return dob;
	}

	public void setDob(Date dob) {
		this.dob = dob;
	}

	@Override
	public String toString() {
		return "Report [id=" + id + ", firstname=" + firstName + ", lastName=" + lastName + ", DateOfBirth=" + dob
				+ "]";
	}

}

Note that the model class fields should be same as defined in the spring batch mapper configuration i.e. property name="names" value="id,firstname,lastname,dob" in our case.

Spring Batch FieldSetMapper

A custom FieldSetMapper is needed to convert a Date. If no data type conversion is required, then only BeanWrapperFieldSetMapper should be used to map the values by name automatically.

The java class which extends FieldSetMapper is ReportFieldSetMapper.


package com.journaldev.spring;

import java.text.ParseException;
import java.text.SimpleDateFormat;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

import com.journaldev.spring.model.Report;

public class ReportFieldSetMapper implements FieldSetMapper<Report> {

	private SimpleDateFormat dateFormat = new SimpleDateFormat("dd/MM/yyyy");

	public Report mapFieldSet(FieldSet fieldSet) throws BindException {

		Report report = new Report();
		report.setId(fieldSet.readInt(0));
		report.setFirstName(fieldSet.readString(1));
		report.setLastName(fieldSet.readString(2));

		// default format yyyy-MM-dd
		// fieldSet.readDate(4);
		String date = fieldSet.readString(3);
		try {
			report.setDob(dateFormat.parse(date));
		} catch (ParseException e) {
			e.printStackTrace();
		}

		return report;

	}

}

Spring Batch Item Processor

Now as defined in the job configuration an itemProcessor will be fired before itemWriter. We have created a CustomItemProcessor.java class for the same.


package com.journaldev.spring;

import org.springframework.batch.item.ItemProcessor;

import com.journaldev.spring.model.Report;

public class CustomItemProcessor implements ItemProcessor<Report, Report> {

	public Report process(Report item) throws Exception {
		
		System.out.println("Processing..." + item);
		String fname = item.getFirstName();
		String lname = item.getLastName();
		
		item.setFirstName(fname.toUpperCase());
		item.setLastName(lname.toUpperCase());
		return item;
	}

}

We can manipulate data in ItemProcessor implementation, as you can see that I am converting first name and last name values to upper case.

Spring Configuration Files

In our spring batch configuration file, we have imported two additional configuration files – context.xml and database.xml.


<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="
		http://www.springframework.org/schema/beans 
		http://www.springframework.org/schema/beans/spring-beans-4.3.xsd">

	<!-- stored job-meta in memory -->
	<!--  
	<bean id="jobRepository"
		class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
		<property name="transactionManager" ref="transactionManager" />
	</bean>
 	 -->
 	 
 	 <!-- stored job-meta in database -->
	<bean id="jobRepository"
		class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
		<property name="dataSource" ref="dataSource" />
		<property name="transactionManager" ref="transactionManager" />
		<property name="databaseType" value="mysql" />
	</bean>
	
	<bean id="transactionManager"
		class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
	 
	<bean id="jobLauncher"
		class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
		<property name="jobRepository" ref="jobRepository" />
	</bean>

</beans>
  • jobRepository – The JobRepository is responsible for storing each Java object into its correct meta-data table for spring batch.
  • transactionManager– this is responsible for committing the transaction once size of commit-interval and the processed data is equal.
  • jobLauncher – This is the heart of spring batch. This interface contains the run method which is used to trigger the job.

<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:jdbc="http://www.springframework.org/schema/jdbc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.springframework.org/schema/beans 
		http://www.springframework.org/schema/beans/spring-beans-4.3.xsd
		http://www.springframework.org/schema/jdbc 
		http://www.springframework.org/schema/jdbc/spring-jdbc-4.3.xsd">

	<!-- connect to database -->
	<bean id="dataSource"
		class="org.springframework.jdbc.datasource.DriverManagerDataSource">
		<property name="driverClassName" value="com.mysql.jdbc.Driver" />
		<property name="url" value="jdbc:mysql://localhost:3306/Test" />
		<property name="username" value="test" />
		<property name="password" value="test123" />
	</bean>

	<bean id="transactionManager"
		class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

	<!-- create job-meta tables automatically -->
	<!-- <jdbc:initialize-database data-source="dataSource"> <jdbc:script location="org/springframework/batch/core/schema-drop-mysql.sql" 
		/> <jdbc:script location="org/springframework/batch/core/schema-mysql.sql" 
		/> </jdbc:initialize-database> -->
</beans>

Spring Batch uses some metadata tables to store batch jobs information. We can get them created from spring batch configurations but it’s advisable to do it manually by executing the SQL files, as you can see in commented code above. From security point of view, it’s better to not give DDL execution access to spring batch database user.

Spring Batch Tables

Spring Batch tables very closely match the Domain objects that represent them in Java. For example – JobInstance, JobExecution, JobParameters and StepExecution map to BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS and BATCH_STEP_EXECUTION respectively.

ExecutionContext maps to both BATCH_JOB_EXECUTION_CONTEXT and BATCH_STEP_EXECUTION_CONTEXT.

The JobRepository is responsible for saving and storing each java object into its correct table.
spring batch tables
Below are the details of each meta-data table.

  1. Batch_job_instance: The BATCH_JOB_INSTANCE table holds all information relevant to a JobInstance.
  2. Batch_job_execution_params: The BATCH_JOB_EXECUTION_PARAMS table holds all information relevant to the JobParameters object.
  3. Batch_job_execution: The BATCH_JOB_EXECUTION table holds data relevant to the JobExecution object. A new row gets added every time a Job is run.
  4. Batch_step_execution: The BATCH_STEP_EXECUTION table holds all information relevant to the StepExecution object.
  5. Batch_job_execution_context: The BATCH_JOB_EXECUTION_CONTEXT table holds data relevant to an Job’s ExecutionContext. There is exactly one Job ExecutionContext for every JobExecution, and it contains all of the job-level data that is needed for that particular job execution. This data typically represents the state that must be retrieved after a failure so that a JobInstance can restart from where it had failed.
  6. Batch_step_execution_context: The BATCH_STEP_EXECUTION_CONTEXT table holds data relevant to an Step’s ExecutionContext. There is exactly one ExecutionContext for every StepExecution, and it contains all of the data that needs to persisted for a particular step execution. This data typically represents the state that must be retrieved after a failure so that a JobInstance can restart from where it failed.
  7. Batch_job_execution_seq: This table holds the data execution sequence of job.
  8. Batch_step_execution_seq: This table holds the data for sequence for step execution.
  9. Batch_job_seq: This table holds the data for sequence of job in case we have multiple jobs we will get multiple rows.

Spring Batch Test Program

Our Spring Batch example project is ready, final step is to write a test class to execute it as a java program.


package com.journaldev.spring;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class App {
	public static void main(String[] args) {

		String[] springConfig = { "spring/batch/jobs/job-batch-demo.xml" };

		ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(springConfig);

		JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
		Job job = (Job) context.getBean("DemoJobXMLWriter");

		JobParameters jobParameters = new JobParametersBuilder().addLong("time", System.currentTimeMillis())
				.toJobParameters();

		try {

			JobExecution execution = jobLauncher.run(job, jobParameters);
			System.out.println("Exit Status : " + execution.getStatus());

		} catch (Exception e) {
			e.printStackTrace();
		}

		System.out.println("Done");
		context.close();
	}
}

Just run above program and you will get output xml like below.


<?xml version="1.0" encoding="UTF-8"?><report><record id="1001"><dob>2013-07-29T00:00:00+05:30</dob><firstname>TOM</firstname><lastname>MOODY</lastname></record><record id="1002"><dob>2013-07-30T00:00:00+05:30</dob><firstname>JOHN</firstname><lastname>PARKER</lastname></record><record id="1003"><dob>2013-07-31T00:00:00+05:30</dob><firstname>HENRY</firstname><lastname>WILLIAMS</lastname></record></report>

That’s all for Spring Batch example, you can download final project from below link.

Reference: Official Guide

Comments

  1. HAJARI Youssef says:

    Hello
    Thank you for this Tutorial, but i have a request
    when i want to run the project i need the script who can create batch Database Table BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_JOB_EXECUTION_PARAMS and BATCH_STEP_EXECUTION
    if you can give it to me please !

    1. Pankaj says:

      Spring Batch automatically creates the table. However, in production, I wouldn’t suggest you provide DDL privileges to the app. You can check the Spring Batch JAR file, they have the SQL script to create these tables manually.

  2. veera says:

    Hi ,

    i followed your articles with clean examples.thanks for sharing your knowledge and clarifying doubts.

    i have one doubt spring batch,

    Question :

    suppose a batch running 100 records but job is failed 50th some other i need to restart the job automatically where it stopped or need skip the exception record …its like cyclic process failed record moved to list r file..

    can u help me ……

  3. Chandra says:

    The java class which implements** FieldSetMapper is ReportFieldSetMapper.
    Also that, please highlight the class/interface names which are only available from Spring Framework. Otherwise, as the readers of articles , we get wrong perception, that even, for instance, ReportFieldSetMapper is provided by Spring.
    On the whole, , article is very interesting. Thanks.

  4. malla says:

    Hi Pankaj

    Could you provide the following scripts for database.xml

Leave a Reply

Your email address will not be published. Required fields are marked *

close
Generic selectors
Exact matches only
Search in title
Search in content
Search in posts
Search in pages