What is Batch processing?
Batch processing is a mode of processing in which a series of automated, often complex jobs is executed without user interaction. A batch process typically handles bulk data and runs for a long time.
What is Spring Batch?
Spring Batch is a lightweight framework for developing the batch applications used in enterprise systems.
Apart from bulk processing, Spring Batch also provides features such as:
- Logging and tracing
- Transaction management
- Job processing statistics
- Job restart
- Skip and Resource management
1). Spring Batch applications are flexible. To alter the order of processing in an application, we only need to change an XML file.
2). They are easy to maintain. A Spring Batch job consists of steps, and each step can be decoupled, tested, and updated without affecting the other steps.
3). Using partitioning techniques, you can scale Spring Batch applications. These techniques allow us to execute the steps of a job in parallel and to execute a single step across multiple threads.
4). In case of a failure, we can restart the job from where it stopped, because the steps are decoupled.
5). Spring Batch provides support for a large set of readers and writers such as XML, flat file, CSV, MySQL, Hibernate, JDBC, Mongo, Neo4j, etc.
6). A Spring Batch job can be launched from web applications, Java programs, the command line, etc.
7). It supports chunk-oriented processing, which gives us an easy way to manage the size of our transactions. Items are read and processed one by one, and once the chunk size is reached the whole chunk is written and the transaction is committed.
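The chunk-oriented behaviour described in point 7 can be sketched in plain Java. This is only an illustration of the idea, not Spring Batch code: the class ChunkLoopSketch, the method run, and the upper-casing "processor" are all hypothetical names chosen for the example, and the "commit" is simulated by adding the chunk to a result list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of chunk-oriented processing; not Spring Batch code.
class ChunkLoopSketch {

    // Reads items one by one, processes each, buffers the results,
    // and "commits" the buffer whenever it reaches chunkSize.
    // Returns the committed chunks so the grouping is visible.
    static List<List<String>> run(Iterator<String> reader, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> committed = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            String item = reader.next();       // read one item
            chunk.add(item.toUpperCase());     // process it (hypothetical logic)
            if (chunk.size() == chunkSize) {   // chunk size reached
                committed.add(chunk);          // write the chunk and commit
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            committed.add(chunk);              // final, possibly smaller chunk
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(run(input.iterator(), 2)); // prints [[A, B], [C, D], [E]]
    }
}
```

With five items and a chunk size of 2, the sketch commits three times; a larger chunk size means fewer, bigger transactions.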
Apart from these, Spring Batch applications also support:
a). Automatic retry and skip after failure. We can configure skip logic to skip items that throw an exception, and retry logic to decide whether the batch job should retry the failed operation.
b). Tracking status and statistics during the batch execution and after completing the batch processing.
c). Running concurrent jobs.
d). Services such as logging (we can trace the execution of the steps in the persisted database), resource management, skip, and restart of processing.
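The retry and skip behaviour from point a) can be pictured with a small plain-Java sketch. It is a simplified model under stated assumptions, not the Spring Batch API: RetrySkipSketch, processAll, and the maxAttempts parameter are hypothetical, and every RuntimeException is treated as retryable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of retry-then-skip; not Spring Batch code.
class RetrySkipSketch {

    // Tries the operation up to maxAttempts times per item.
    // If every attempt fails, the item is recorded in 'skipped' and the run continues.
    static <I, O> List<O> processAll(List<I> items, Function<I, O> operation,
                                     int maxAttempts, List<I> skipped) {
        List<O> results = new ArrayList<>();
        for (I item : items) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    results.add(operation.apply(item));
                    done = true;
                } catch (RuntimeException e) {
                    // failed attempt: the loop retries while attempts remain
                }
            }
            if (!done) {
                skipped.add(item); // retries exhausted, skip the item
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        List<Integer> parsed =
                processAll(Arrays.asList("1", "x", "3"), Integer::parseInt, 2, skipped);
        System.out.println(parsed + " skipped=" + skipped); // prints [1, 3] skipped=[x]
    }
}
```

Here the item "x" fails on both attempts and is skipped, while the rest of the batch still completes.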
Architecture of Spring Batch
The Spring Batch architecture contains three main components, namely Application, Batch Core, and Batch Infrastructure.
- Application contains all the jobs and the code we write using the Spring Batch framework.
- Batch Core contains all the API classes that are needed to control and launch a Batch Job.
- Batch Infrastructure contains the readers, writers, and services used by both application and Batch core components.
JOB and STEP : A Job is the batch process that is to be executed. It runs from start to finish without interruption. A Job may contain one or more steps, where each step is an independent part of the job that contains the information necessary to define and execute that part.
A batch job is configured within the tags <job></job>, e.g:
<job id="job_id">
    <step id="step1" next="step2"/>
    <step id="step2" next="step3"/>
    <step id="step3" next="step4"/>
    <step id="step4"/>
</job>
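By default a job is restartable: if an execution fails or is stopped and you launch the job again with the same parameters, it is restarted as a new execution of the same job instance. If you want to prevent this (restart), set the restartable attribute to false, e.g: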
<job id="job_id" restartable="false">
    ...
</job>
Each chunk-oriented step is composed of an ItemReader, an ItemProcessor (optional), and an ItemWriter.
ItemReader reads data into a Spring Batch application from a particular source, whereas an ItemWriter writes data from the Spring Batch application to a particular destination.
Following are some of the predefined ItemReader classes provided by Spring Batch to read from various sources.
- FlatFileItemReader : read data from flat files.
- StaxEventItemReader : read data from XML files.
- StoredProcedureItemReader : read data from the stored procedures of a database.
- JdbcPagingItemReader : read data from relational databases.
- MongoItemReader : read data from MongoDB.
- Neo4jItemReader : read data from Neo4j.
Following are some of the predefined ItemWriter classes provided by Spring Batch to write to various destinations.
- FlatFileItemWriter : write data into flat files.
- StaxEventItemWriter : write data into XML files.
- JdbcBatchItemWriter : write data into relational databases.
- MongoItemWriter : write data into MongoDB.
- Neo4jItemWriter : write data into Neo4j.
ItemProcessor is used to process the data. The ItemProcessor interface represents the processor: it receives an item, processes it, and returns the processed result. If the given item is not valid, the processor returns null, which filters the item out so that it is not passed to the ItemWriter.
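The null-means-filtered convention can be illustrated with a short plain-Java sketch. It uses java.util.function.Function in place of the real ItemProcessor interface, and the names ProcessorFilterSketch, process, and processAndFilter are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of filtering in a processor; not Spring Batch code.
class ProcessorFilterSketch {

    // Hypothetical processor: trims a name and returns null for blank input,
    // which marks the item as invalid.
    static String process(String raw) {
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    // Applies the processor to each item and keeps only non-null results,
    // which is the list a writer would receive.
    static <I, O> List<O> processAndFilter(List<I> items, Function<I, O> processor) {
        List<O> toWrite = new ArrayList<>();
        for (I item : items) {
            O result = processor.apply(item);
            if (result != null) {
                toWrite.add(result);
            }
        }
        return toWrite;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" ann ", "   ", "bob");
        System.out.println(processAndFilter(input, ProcessorFilterSketch::process)); // prints [ann, bob]
    }
}
```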
What is the difference between Step, Chunk and Tasklet?
Step is a domain object that encapsulates an independent, sequential phase of a batch job and contains all of the information necessary to define and control the actual batch processing.
A step can be processed using either a Chunk or a Tasklet.
When to use which?
- Tasklet - If the step consists of a single granular task, Tasklet processing is used. It is typically used in scenarios involving a single task, such as deleting a resource or executing a query.
- Chunk - If the step is complex and involves reading, processing, and writing items, chunk-oriented processing is used. It is typically used where several aggregated operations need to be run, such as copying, processing, and transferring data.
Tasklet example:
<job id="taskletJob">
    <step id="callingStoredProc">
        <tasklet ref="callProc"/>
    </step>
</job>
Chunk example:
<job id="my_read_job">
    <step id="step1">
        <tasklet>
            <!-- commit-interval is required by the chunk element; the value 10 is illustrative -->
            <chunk reader="sqlserverlReader" writer="fileWriter" processor="CustomitemProcessor" commit-interval="10"/>
        </tasklet>
    </step>
</job>
Spring Batch provides a long list of readers and writers; you can use any one of them depending on your requirement.
Chunk has the following attributes :
# reader : It represents the name of the item reader bean. It accepts the value of the type org.springframework.batch.item.ItemReader.
# writer : It represents the name of the item writer bean. It accepts the value of the type org.springframework.batch.item.ItemWriter.
# processor : It represents the name of the item processor bean. It accepts the value of the type org.springframework.batch.item.ItemProcessor.
# commit-interval : It is used to specify the number of items to be processed before committing the transaction.
JobRepository : It provides Create, Retrieve, Update, and Delete (CRUD) operations for the JobLauncher, Job, and Step implementations. We need to define a job repository in an XML file, e.g:
<job-repository id="myjobRepository"/>
or
<job-repository id="myjobRepository"
    data-source="dataSource" transaction-manager="transactionManager"
    isolation-level-for-create="SERIALIZABLE"
    table-prefix="BATCH_" max-varchar-length="1000"/>
In-Memory Repository : If you don't want to persist the domain objects of Spring Batch in a database, you can configure the in-memory version of the jobRepository. In Spring Batch versions up to 4.x this is provided by MapJobRepositoryFactoryBean (deprecated in 4.3 and removed in 5.0), e.g:
<bean id="myjobRepository"
    class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">
    <property name="transactionManager" ref="transactionManager"/>
</bean>
Job Instance : It represents a logical run of a job. It is created when we run the job. Each job instance is identified by the name of the job and the parameters passed to it at launch.
If a JobInstance execution fails, the same JobInstance can be executed again. Hence, each JobInstance can have multiple job executions.
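The relationship between a job instance and its executions can be modelled in a few lines of plain Java. This is a simplified sketch, not how Spring Batch stores its metadata: JobInstanceSketch, instanceKey, recordExecution, the job name importJob, and the date parameter values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of job instances and executions; not Spring Batch code.
class JobInstanceSketch {

    // Maps an instance key (job name + parameters) to the statuses of its executions.
    private final Map<String, List<String>> executionsByInstance = new HashMap<>();

    // Builds the identity of a job instance. The TreeMap sorts the parameter
    // names, so the order in which parameters were added does not matter.
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<>(parameters);
    }

    // Records one execution and returns how many executions the instance has now.
    int recordExecution(String jobName, Map<String, String> parameters, String status) {
        List<String> runs = executionsByInstance.computeIfAbsent(
                instanceKey(jobName, parameters), key -> new ArrayList<>());
        runs.add(status);
        return runs.size();
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    public static void main(String[] args) {
        JobInstanceSketch repository = new JobInstanceSketch();
        Map<String, String> firstDay = new HashMap<>();
        firstDay.put("date", "2024-01-01");
        Map<String, String> secondDay = new HashMap<>();
        secondDay.put("date", "2024-01-02");

        repository.recordExecution("importJob", firstDay, "FAILED");     // first execution
        repository.recordExecution("importJob", firstDay, "COMPLETED");  // same instance, second execution
        repository.recordExecution("importJob", secondDay, "COMPLETED"); // different parameters, new instance
        System.out.println(repository.instanceCount()); // prints 2
    }
}
```

Launching the same job name with the same parameters adds an execution to the existing instance, while changing a parameter creates a new instance.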
What is a JobLauncher?
The jobLauncher bean is used to configure the JobLauncher and is associated with the class org.springframework.batch.core.launch.support.SimpleJobLauncher.
This bean has one property named jobRepository, which specifies the name of the bean that defines the job repository.
<bean id="jobLauncher"
    class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    <property name="jobRepository" ref="myjobRepository"/>
</bean>
What is a TransactionManager?
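The transactionManager bean is used to configure the transaction manager that the job repository and the steps rely on. The bean must implement the interface org.springframework.transaction.PlatformTransactionManager. The example below uses ResourcelessTransactionManager, which does not manage real transactions and is suited to in-memory or test setups; with a relational database, a transaction manager such as org.springframework.jdbc.datasource.DataSourceTransactionManager is used instead.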
<bean id="transactionManager"
    class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>
What is DataSource?
The dataSource bean is used to configure the DataSource. This bean is associated with the class org.springframework.jdbc.datasource.DriverManagerDataSource and takes the following properties:
- driverClassName : This specifies the class name of the driver used to connect with the database.
- url : This specifies the URL of the database.
- username : This specifies the username to connect with the database.
- password : This specifies the password to connect with the database.
e.g:
<bean id="sqlserverdataSource"
    class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver"/>
    <property name="url" value="jdbc:sqlserver://localhost:54552;databaseName=springtest"/>
    <property name="username" value="test_user"/>
    <property name="password" value="test_user"/>
</bean>
<bean id="mysqldataSource"
    class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
    <property name="url" value="jdbc:mysql://localhost:54552/details"/>
    <property name="username" value="test_user"/>
    <property name="password" value="test_user"/>
</bean>