Batch Applications tutorial on WildFly

This tutorial discusses Batch Applications for the Java Platform (JSR-352), which can be used to define, implement, and run batch jobs. Batch jobs are composed of a set of tasks that can be executed automatically, without user interaction. These tasks are typically executed periodically or when resource usage is low, and they often process large amounts of information such as log files, database records, or images. Examples include billing, report generation, data format conversion, and image processing. The batch framework is a rather rich one, as it includes a Java API, an XML job configuration, and a batch runtime.

Batch applications are broken down into a set of steps which specify their execution order. A simple batch job might just process a set of records sequentially, while more advanced ones may specify additional elements such as decision elements or parallel execution of steps, as sketched below.
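
For instance, the execution order of steps is expressed in the job XML through the next attribute; here is a minimal sketch of a two-step sequence (the step ids and batchlet refs are purely illustrative):

<job id="orderedJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="loadData" next="generateReport">
        <batchlet ref="loadDataBatchlet"/>
    </step>
    <!-- Runs only after loadData completes -->
    <step id="generateReport">
        <batchlet ref="reportBatchlet"/>
    </step>
</job>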

Before diving into the example, some definitions first: what is a step? Put simply, a step is an independent, sequential phase of a batch job. Steps can be chunk-oriented or task-oriented.

Chunk-oriented steps process data by reading items from a source, applying some transformation/business logic to each item, and storing the results. Chunk steps operate on one item at a time and group the results into a chunk. The results are stored when the chunk reaches a configurable size. Chunk-oriented processing makes storing results more efficient and facilitates transaction demarcation.

Task-oriented steps, on the other hand, execute actions other than processing single items from a source. A typical example of a task-oriented step might be some DDL on a database or an operation on a file system. By way of comparison, a chunk-oriented step fits massive, long-running tasks, while a task-oriented step fits a set of batch operations that are to be executed periodically.

JSR 352 also defines a roll-your-own kind of step, called a batchlet, which is how task-oriented steps are implemented. A batchlet is free to use anything to accomplish the step, such as sending an e-mail.
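
A minimal batchlet sketch follows; the class name and the example actions are illustrative, not part of this tutorial's sample:

import javax.batch.api.AbstractBatchlet;
import javax.inject.Named;

@Named
public class MyBatchlet extends AbstractBatchlet {

    // Performs the whole step in a single shot and returns its exit status
    @Override
    public String process() throws Exception {
        // e.g. run some DDL, move a file, send an e-mail...
        return "COMPLETED";
    }
}

It would be referenced from the job XML with <batchlet ref="myBatchlet"/> in place of a <chunk> element.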

In this tutorial we will learn how to use chunk-oriented steps. Each chunk step is in turn broken into three parts:

  • The Reader chunk part, which reads single items from a source of data (database, file system, LDAP, etc.)
  • The Processor chunk part, which manipulates one item at a time using the logic defined by the application (e.g. sorting, filtering, transforming data)
  • The Writer chunk part, which writes out the items that have been processed in the earlier phase


Due to their nature, chunk steps are usually long-running activities; therefore it is possible to bookmark their progress using checkpoints. A checkpoint can be used to restart the execution of a step that has been interrupted.
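
As a sketch of how a reader can cooperate with checkpoints, it is enough to override checkpointInfo() and honor the checkpoint passed back to open() on restart (the names and checkpoint format below are illustrative):

import java.io.Serializable;
import javax.batch.api.chunk.AbstractItemReader;
import javax.inject.Named;

@Named
public class CheckpointedReader extends AbstractItemReader {

    private int lineNumber;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        // On restart the runtime passes back the last committed checkpoint,
        // so we can skip the lines that were already processed
        lineNumber = (checkpoint != null) ? (Integer) checkpoint : 0;
        // ... open the data source and skip lineNumber lines ...
    }

    @Override
    public Object readItem() throws Exception {
        lineNumber++;
        // ... read and return the next item, or null when the source is exhausted ...
        return null;
    }

    // Invoked at every chunk commit; the returned value is persisted by the runtime
    @Override
    public Serializable checkpointInfo() throws Exception {
        return lineNumber;
    }
}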

In this tutorial we will see a simple yet powerful example of a batch job which takes as input a CSV file that is read, processed, and inserted into a database. This example has been taken from Arun Gupta's Java EE 7 samples (https://github.com/arun-gupta/javaee7-samples/tree/master/batch/chunk-csv-database) and has been slightly modified in its configuration to use the default WildFly datasource, the WildFly Java EE 7 dependencies, and the JBoss Maven plugin.

The Job file

Each job must be named uniquely and its definition file must be placed in the META-INF/batch-jobs directory of your application (for a web application, that is WEB-INF/classes/META-INF/batch-jobs). So here's our job definition file (myJob.xml):

<job id="myJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="myStep" >
        <chunk item-count="3">
            <reader ref="myItemReader"/>
            <processor ref="myItemProcessor"/>
            <writer ref="myItemWriter"/>
        </chunk>    
    </step>
</job> 

This is the job definition file, which describes how many steps and chunks we are going to execute, the references to their implementations, and the size of each chunk, via the item-count attribute. The three implementations follow.

Writing the ItemReader

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Serializable;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.batch.api.chunk.AbstractItemReader;
import javax.inject.Named;

@Named
public class MyItemReader extends AbstractItemReader {

    private BufferedReader reader;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        // The CSV file is loaded from the classpath, where it is packaged under META-INF
        reader = new BufferedReader(
                new InputStreamReader(
                    getClass().getResourceAsStream("/META-INF/mydata.csv")
                )
            );
    }

    @Override
    public String readItem() {
        try {
            // Each line of the CSV file is one item; null signals the end of the data
            return reader.readLine();
        } catch (IOException ex) {
            Logger.getLogger(MyItemReader.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }

    @Override
    public void close() throws Exception {
        reader.close();
    }
}

Note that the class is annotated with the @Named annotation. Because the @Named annotation uses the default value, the Contexts and Dependency Injection (CDI) name for this bean is myItemReader.
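
For reference, here is what a couple of lines of mydata.csv (packaged under META-INF) might look like; the data is purely illustrative, in the name,date format which the processor below expects:

Joe Doe,1/12/95
Jane Doe,3/07/96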

Writing the ItemProcessor

Our MyItemProcessor follows a pattern similar to that of MyItemReader, but it is in charge of creating a Person object from the CSV line of text which has been read:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.StringTokenizer;
import javax.batch.api.chunk.ItemProcessor;
import javax.inject.Named;

@Named
public class MyItemProcessor implements ItemProcessor {

    SimpleDateFormat format = new SimpleDateFormat("M/dd/yy");

    @Override
    public Person processItem(Object t) {
        System.out.println("processItem: " + t);

        // Split the CSV line into its name and date fields
        StringTokenizer tokens = new StringTokenizer((String) t, ",");

        String name = tokens.nextToken();
        String date;

        try {
            date = tokens.nextToken();
            format.setLenient(false);
            format.parse(date);
        } catch (ParseException e) {
            // Returning null filters the item out: it is not passed to the writer
            return null;
        }

        return new Person(name, date);
    }
}

The processItem() method receives (from the batch runtime) a String object which is tokenized and used to create a Person object as output. Notice that the type of object returned by an ItemProcessor can be different from the type of object it receives from the ItemReader.
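
The Person class itself is just a plain JPA entity. Here is a minimal sketch, assuming the two-argument constructor used by the processor (the actual class in Gupta's sample may differ):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Person {

    @Id
    @GeneratedValue
    private int id;

    private String name;
    private String dateOfBirth;

    // A no-argument constructor is required by JPA
    public Person() {
    }

    public Person(String name, String dateOfBirth) {
        this.name = name;
        this.dateOfBirth = dateOfBirth;
    }
}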

Writing the ItemWriter

Following is the ItemWriter which, as we said, is in charge of persisting the items (Person objects) to the default database (ExampleDS).

import java.util.List;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Named
public class MyItemWriter extends AbstractItemWriter {

    @PersistenceContext
    EntityManager em;

    // Receives one whole chunk (up to item-count items) per invocation;
    // the batch runtime wraps each chunk in a transaction
    @Override
    public void writeItems(List<Object> list) {
        System.out.println("writeItems: " + list);
        for (Object person : list) {
            em.persist(person);
        }
    }
}
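
For the injected EntityManager to work against the default WildFly datasource, a persistence.xml along these lines is needed (the unit name and the schema-generation property are illustrative choices):

<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
    <persistence-unit name="myPU">
        <!-- ExampleDS is the datasource shipped out of the box with WildFly -->
        <jta-data-source>java:jboss/datasources/ExampleDS</jta-data-source>
        <properties>
            <!-- Create the tables at deployment time -->
            <property name="javax.persistence.schema-generation.database.action" value="create"/>
        </properties>
    </persistence-unit>
</persistence>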

Starting a Batch Job from a Servlet

Note that the mere presence of a job XML file or other batch artifacts (such as an ItemReader) doesn't mean that a batch job is automatically started when the application is deployed. A batch job must be initiated explicitly, for example from a servlet, an Enterprise JavaBeans (EJB) timer, or an EJB business method.
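
For example, here is a minimal sketch of starting the job periodically from an EJB timer (the class name and the schedule are illustrative):

import java.util.Properties;
import javax.batch.runtime.BatchRuntime;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

@Singleton
public class NightlyBatchStarter {

    // Fires every day at 2:00 AM and submits the job to the batch runtime
    @Schedule(hour = "2", minute = "0", persistent = false)
    public void startJob() {
        BatchRuntime.getJobOperator().start("myJob", new Properties());
    }
}

In this tutorial, however, we will trigger the job from a servlet: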

protected void processRequest(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");
    try (PrintWriter out = response.getWriter()) {
        out.println("<html>");
        out.println("<head>");
        out.println("<title>CSV-to-Database Chunk Job</title>");
        out.println("</head>");
        out.println("<body>");
        out.println("<h1>CSV-to-Database Chunk Job</h1>");
        // Obtain the JobOperator and submit the job defined in myJob.xml
        JobOperator jo = BatchRuntime.getJobOperator();
        long jid = jo.start("myJob", new Properties());
        out.println("Job submitted: " + jid + "<br>");
        out.println("<br><br>Check server.log for output, also look at \"myJob.xml\" for Job XML.");
        out.println("</body>");
        out.println("</html>");
    } catch (JobStartException | JobSecurityException ex) {
        Logger.getLogger(TestServlet.class.getName()).log(Level.SEVERE, null, ex);
    }
}

The first step is to obtain an instance of JobOperator. This can be done by calling the following:

JobOperator jo = BatchRuntime.getJobOperator(); 

The servlet then creates an empty Properties object, which could be used to pass input parameters to the job. Finally, a new batch job is started by calling the following:

jo.start("myJob", new Properties());

The job name is nothing but the name of the job XML (JSL) file, minus the .xml extension. The Properties parameter serves to pass any input data to the job.

The batch runtime assigns a unique ID, called the execution ID, to identify each execution of a job, whether it is a freshly submitted job or a restarted one. Many of the JobOperator methods take the execution ID as a parameter. Using the execution ID, a program can obtain the current (and past) execution status and other statistics about the job. The JobOperator.start() method returns the execution ID of the job that was started.
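
For instance, a minimal sketch of submitting the job and then inspecting its status with the returned execution ID (the class and method names are illustrative):

import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.batch.runtime.JobExecution;

public class JobStatusCheck {

    public static void checkStatus() {
        JobOperator jo = BatchRuntime.getJobOperator();
        long jid = jo.start("myJob", new Properties());

        // getJobExecution() looks up the execution by its ID;
        // getBatchStatus() reports STARTING, STARTED, COMPLETED, FAILED, etc.
        JobExecution execution = jo.getJobExecution(jid);
        System.out.println("Batch status: " + execution.getBatchStatus());
    }
}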

Compiling the Batch Code

In order to compile the project, I've changed the original pom.xml by including the org.jboss.spec.javax.batch dependency. Note also that I'm using the new jboss-javaee-7.0 BOM while, as far as I know, the jboss-javaee-7.0-with-hibernate BOM is still not available, hence the Java EE 6 Hibernate BOM below.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <groupId>org.javaee7.batch</groupId>
   <artifactId>batch-samples</artifactId>
   <version>1.0-SNAPSHOT</version>
   <packaging>war</packaging>
   <name>Batch Applications for the Java Platform (JSR-352) Example</name>
   <description>Batch Applications for the Java Platform (JSR-352) Example</description>
   <url>http://jboss.org/jbossas</url>
   <licenses>
      <license>
         <name>Apache License, Version 2.0</name>
         <distribution>repo</distribution>
         <url>http://www.apache.org/licenses/LICENSE-2.0.html</url>
      </license>
   </licenses>
   <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      <version.jboss.maven.plugin>7.4.Final</version.jboss.maven.plugin>
      <version.jboss.spec.javaee.6.0>3.0.2.Final</version.jboss.spec.javaee.6.0>
      <version.war.plugin>2.1.1</version.war.plugin>
      <version.compiler.plugin>2.3.1</version.compiler.plugin>
      <!-- maven-compiler-plugin -->
      <maven.compiler.target>1.7</maven.compiler.target>
      <maven.compiler.source>1.7</maven.compiler.source>
      <version.jboss.bom>1.0.4.Final</version.jboss.bom>
   </properties>
   <dependencyManagement>
      <dependencies>
         <dependency>
            <groupId>org.jboss.spec</groupId>
            <artifactId>jboss-javaee-7.0</artifactId>
            <version>1.0.0.Beta2</version>
            <type>pom</type>
            <scope>import</scope>
         </dependency>
         <dependency>
            <groupId>org.jboss.bom</groupId>
            <artifactId>jboss-javaee-6.0-with-hibernate</artifactId>
            <version>${version.jboss.bom}</version>
            <type>pom</type>
            <scope>import</scope>
         </dependency>
      </dependencies>
   </dependencyManagement>
   <dependencies>
      <dependency>
         <groupId>org.hibernate.javax.persistence</groupId>
         <artifactId>hibernate-jpa-2.0-api</artifactId>
         <scope>provided</scope>
      </dependency>
      <dependency>
         <groupId>org.hibernate</groupId>
         <artifactId>hibernate-validator</artifactId>
         <scope>provided</scope>
         <exclusions>
            <exclusion>
               <groupId>org.slf4j</groupId>
               <artifactId>slf4j-api</artifactId>
            </exclusion>
         </exclusions>
      </dependency>
      <!-- Import the Batch API which is included in WildFly 8 -->
      <dependency>
         <groupId>org.jboss.spec.javax.batch</groupId>
         <artifactId>jboss-batch-api_1.0_spec</artifactId>
         <version>1.0.0.Final</version>
      </dependency>
      <!-- Import the CDI API -->
      <dependency>
         <groupId>javax.enterprise</groupId>
         <artifactId>cdi-api</artifactId>
         <scope>provided</scope>
      </dependency>
      <!-- Import the Common Annotations API (JSR-250) -->
      <dependency>
         <groupId>org.jboss.spec.javax.annotation</groupId>
         <artifactId>jboss-annotations-api_1.1_spec</artifactId>
         <scope>provided</scope>
      </dependency>
      <!-- Import the Servlet API -->
      <dependency>
         <groupId>org.jboss.spec.javax.servlet</groupId>
         <artifactId>jboss-servlet-api_3.0_spec</artifactId>
         <scope>provided</scope>
      </dependency>
   </dependencies>
   <build>
      <!-- Set the name of the war, used as the context root when the app is deployed -->
      <finalName>${project.artifactId}</finalName>
      <plugins>
         <plugin>
            <artifactId>maven-war-plugin</artifactId>
            <version>${version.war.plugin}</version>
            <configuration>
                <!-- Java EE 7 doesn't require web.xml, Maven needs to catch up! -->
               <failOnMissingWebXml>false</failOnMissingWebXml>
            </configuration>
         </plugin>
         <!-- JBoss AS plugin to deploy war -->
         <plugin>
            <groupId>org.jboss.as.plugins</groupId>
            <artifactId>jboss-as-maven-plugin</artifactId>
            <version>${version.jboss.maven.plugin}</version>
         </plugin>
          <!-- Compiler plugin enforces Java 1.7 compatibility and activates
                 annotation processors -->
         <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>${version.compiler.plugin}</version>
            <configuration>
               <source>${maven.compiler.source}</source>
               <target>${maven.compiler.target}</target>
            </configuration>
         </plugin>
      </plugins>
   </build>
</project>

Download the Maven project for this example from http://www.mastertheboss.com/code/chunk-csv-database.zip