How to initialize an Array in Java in 4 simple ways

This article discusses array initialization in Java, showing multiple ways to initialize an array, some of which you probably don’t know!

Basic Array Initialization in Java

Firstly, some background. Java classifies types as primitive types, user-defined types, and array types. An array is a region of memory that stores values in equal-size, contiguous slots, which we call elements. An array is declared with the element type and one or more pairs of square brackets that indicate the number of dimensions. A single pair of brackets means a one-dimensional array.

Example:

String array[];

On the other hand, this is a two-dimensional array:

double[][] matrix;

So you basically specify the data type and the declared variable name. Note that declaring an array does not initialize it. You can initialize an array, and assign memory to it, by providing just the array size or also the contents of the array.

How to init an array specifying the array size

The following example shows how to initialize an array of Strings which contains 2 elements:

String array[] = new String[2];

How to init an array using Array Literal

The following examples show how to initialize arrays of different types, in a single line, using array literals:

String array[] = new String[] { "Pear", "Apple", "Banana" };
int[] datas = { 12, 48, 91, 17 };
char grade[] = { 'A', 'B', 'C', 'D', 'F' };
float[][] matrixTemp = { { 1.0F, 2.0F, 3.0F }, { 4.0F, 5.0F, 6.0F }};
int x = 1, y[] = { 1, 2, 3, 4, 5 }, k = 3;

You can of course also split the declaration from the assignment:

int[] array;
     
array = new int[]{2,3,5,7,11};

When retrieving an element by its index, it is important to remember that the index starts from 0.

Therefore, you can retrieve each element by its index as follows:

for (int i = 0; i < 5; i++) {
    System.out.println(array[i]);
}

On the other hand, if you want to init a multidimensional array in a loop, you need a nested loop, one per dimension. Example:

double[][] temperatures = new double[3][2];

for (int row = 0; row < temperatures.length; row++)
   for (int col = 0; col < temperatures[row].length; col++)
      temperatures[row][col] = Math.random()*100;

Finally, if you don’t need to operate on the array index, you can use the enhanced for loop (for-each) to iterate through the array:

int[] array = new int[]{2,3,5,7,11};

for (int a:array)
   System.out.println(a);

Initializing an array in Java using Arrays.copyOf()

The java.util.Arrays.copyOf(int[] original, int newLength) method copies the specified array, truncating or padding with zeros if necessary, so that the copy has the specified length.

Example:

int[] array = new int[]{2,3,5,7,11};
int[] copy =  Arrays.copyOf(array, 5);
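
The example above copies with the same length as the source. As a hedged illustration of the truncating and padding behavior:

int[] truncated = Arrays.copyOf(array, 3); // {2, 3, 5}
int[] padded = Arrays.copyOf(array, 8);    // {2, 3, 5, 7, 11, 0, 0, 0} - padded with zeros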

A similar option also exists in the System class, using the System.arraycopy static method:

int[] src  = new int[]{2,3,5,7,11};
int[] dest = new int[5];
System.arraycopy( src, 0, dest, 0, src.length );

The core difference is that Arrays.copyOf does not just copy elements, it also creates a new array. On the other hand, System.arraycopy copies into an existing array.

In most cases, System.arraycopy will be faster because it uses a direct native memory copy. Arrays.copyOf copies using Java code, although the JIT compiler can apply clever special-case optimizations to improve performance.

Using Arrays functions to fill an array

The method Arrays.setAll sets all elements of an array using a generator function. This is the most flexible option as it lets you use a Lambda expression to initialize an array using a generator. Example:

int[] arr = new int[10];
Arrays.setAll(arr, (index) -> 1 + index);

This can be useful, for example, to quickly initialize an Array of Objects:

Customer[] customerArray = new Customer[7];
// setting values to customerArray using setAll() method
Arrays.setAll(customerArray, i -> new Customer(i+1, "Index "+i));

A similar function is Arrays.fill, which is the best choice if you don’t need a generator function but just need to initialize the whole array with a single value. Here is how to fill an array of 10 ints with the value 1:

int [] myarray = new int[10];
Arrays.fill(myarray, 1);

Finally, if you want to convert a Collection, such as a List, into an array, you can use the toArray() method on the Collection. Example:

List<String> list = Arrays.asList("Apple", "Pear", "Banana");
String[] array = list.toArray(new String[list.size()]);

Initialize an array in Java using the Stream API

Finally, you can also use the Java 8 Stream API to make a copy of an array into another. Let’s check an example:

String[] strArray = {"apple", "tree", "banana"};
String[] copiedArray = Arrays.stream(strArray).toArray(String[]::new);

Arrays.stream also does a shallow copy of objects, when using non-primitive types.
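
As a quick illustration of the shallow copy (Person here is a hypothetical class with a name field), both arrays end up pointing at the same objects:

Person[] people = { new Person("Alice"), new Person("Bob") };
Person[] copy = Arrays.stream(people).toArray(Person[]::new);
System.out.println(people[0] == copy[0]); // true: same object references, i.e. a shallow copy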

In this tutorial we have covered four basic strategies to initialize, prefill and iterate over an array in Java.

Troubleshooting OutOfMemoryError: Direct buffer memory

The java.nio.DirectByteBuffer class is a special implementation of java.nio.ByteBuffer that has no byte[] lying underneath. The main feature of DirectByteBuffer is that the JVM will try to work natively on the allocated memory, without any additional buffering, so operations performed on it may be faster than those performed on ByteBuffers backed by arrays.

We can allocate such ByteBuffer by calling:

ByteBuffer directBuffer = ByteBuffer.allocateDirect(64);

When such an object is created via the ByteBuffer.allocateDirect() call, it allocates the specified amount (capacity) of native memory using the malloc() OS call. This memory is released only when the given DirectByteBuffer object is garbage collected and its internal “cleanup” method is called (the most common scenario), or when this method is invoked explicitly via getCleaner().clean().

Symptoms of the Direct Buffer Memory issue

As we said, Direct Buffers are allocated in native memory space, outside of the JVM’s established heap/perm gen. If this memory space outside of heap/perm is exhausted, the java.lang.OutOfMemoryError: Direct buffer memory error will be thrown.

A good runtime indicator of a growing Direct Buffer allocation is the size of the Non-Heap Java Memory usage, which can be collected with any tool, like jconsole:

In terms of Operating System, the amount of Memory used by a Java process includes the following elements: Java Heap Size + Metaspace + CodeCache + DirectByteBuffers + Jvm-native-c++-heap.

You can obtain this information using the following command:

pmap -x [PID]

The above command will display the amount of RSS (in KB) for the process, as you can see from the third column of the output:

total kB         14391640 12343808 12272896

Once you know the full size of the JVM process, you have to subtract the Java Heap Size + Metaspace to get a rough estimate of the JVM native memory size.
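
For example (the heap and Metaspace figures are purely illustrative): with the ~12 GB of RSS shown above, a heap capped at -Xmx8g and roughly 1 GB of Metaspace, around 3 GB would be attributable to native allocations such as DirectByteBuffers, thread stacks, the code cache and the JVM’s own C++ heap.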

Java Native Memory Tracking

A good option is to enable Native Memory Tracking in the JVM, which can be done through the following settings:

-XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics

When Native Memory Tracking is enabled, you can request a report of the JVM memory usage using the following command:

jcmd <pid> VM.native_memory

If you look at the jcmd output, you will find at the bottom the amount of native memory committed/used in the Internal (committed) section:

Native Memory Tracking:

Total: reserved=1334532KB, committed=369276KB
-                 Java Heap (reserved=524288KB, committed=132096KB)
                            (mmap: reserved=524288KB, committed=132096KB) 
 
-                     Class (reserved=351761KB, committed=112629KB)
                            (classes #19111)
                            (  instance classes #17977, array classes #1134)
                            (malloc=3601KB #66765) 
                            (mmap: reserved=348160KB, committed=109028KB) 
                            (  Metadata:   )
                            (    reserved=94208KB, committed=92824KB)
                            (    used=85533KB)
                            (    free=7291KB)
                            (    waste=0KB =0.00%)
                            (  Class space:)
                            (    reserved=253952KB, committed=16204KB)
                            (    used=12643KB)
                            (    free=3561KB)
                            (    waste=0KB =0.00%)
 
-                    Thread (reserved=103186KB, committed=9426KB)
                            (thread #100)
                            (stack: reserved=102712KB, committed=8952KB)
                            (malloc=352KB #524) 
                            (arena=122KB #198)
 
-                      Code (reserved=249312KB, committed=23688KB)
                            (malloc=1624KB #7558) 
                            (mmap: reserved=247688KB, committed=22064KB) 
 
-                        GC (reserved=71049KB, committed=56501KB)
                            (malloc=18689KB #13308) 
                            (mmap: reserved=52360KB, committed=37812KB) 
 
-                  Compiler (reserved=428KB, committed=428KB)
                            (malloc=302KB #923) 
                            (arena=126KB #5)
 
-                  Internal (reserved=1491KB, committed=1491KB)
                            (malloc=1451KB #4873) 
                            (mmap: reserved=40KB, committed=40KB) 
 
-                     Other (reserved=1767KB, committed=1767KB)
                            (malloc=1767KB #50) 
 
-                    Symbol (reserved=21908KB, committed=21908KB)
                            (malloc=19503KB #252855) 
                            (arena=2406KB #1)
 
-    Native Memory Tracking (reserved=5914KB, committed=5914KB)
                            (malloc=349KB #4947) 
                            (tracking overhead=5565KB)

Setting MaxDirectMemorySize

There is a JVM parameter named -XX:MaxDirectMemorySize which allows you to set the maximum amount of memory which can be reserved for Direct Buffer usage. As a matter of fact, for JDK 8, this value is set to 64MB:

private static long directMemory = 64 * 1024 * 1024;

However, by digging into sun.misc.VM you will see that, if not configured, it derives its value from Runtime.getRuntime().maxMemory(), thus the value of -Xmx. So if you don’t configure -XX:MaxDirectMemorySize but do configure -Xmx2g, the “default” MaxDirectMemorySize will also be 2 GB, and the total JVM memory usage of the app (heap + direct) may grow up to 2 + 2 = 4 GB.
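
For example (the values are purely illustrative and myapp.jar is a placeholder), you can cap the direct memory explicitly so it no longer follows -Xmx:

java -Xmx2g -XX:MaxDirectMemorySize=512m -jar myapp.jar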

Collecting the Heap Dump

Even if the DirectByteBuffer is allocated outside of the JVM Heap, the JVM still provides important hints. In fact, when the JVM requests a DirectByteBuffer, there will be a reference to it in the Heap.

From the Heap Dump, you can therefore check how much native memory these DirectByteBuffers are using.
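
If you don’t have a dump yet, you can capture one with the standard JDK tools (the PID and file path are placeholders):

jcmd <pid> GC.heap_dump /tmp/heap.hprof

or, alternatively:

jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>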

If you are using an advanced tool like JXRay report (https://jxray.com/), it’s enough to load your Heap dump and it will automatically pinpoint your Off-Heap memory usage, with the relevant amounts already calculated:

With another tool like Eclipse MAT, you have to calculate it yourself by using the following OQL expression:

SELECT x, x.capacity FROM java.nio.DirectByteBuffer x WHERE ((x.capacity > 1024 * 1024) and (x.cleaner != null))

The above query will list all DirectByteBuffer which have been allocated and not released and whose capacity is bigger than 1MB.

Checking the Reference chain

After checking how much native memory your DirectByteBuffers are using, the next step is to walk the reference chain and try to understand who is holding the ByteBuffers.

Still using Eclipse MAT, you can right-click on the result of your OQL query (the x.capacity field) and choose “merge shortest path to GC roots”. That will show you which class is holding the memory for the DirectBuffer, thus preventing it from being garbage collected:

So, in this case you have your XNIO worker threads holding a reference to your DirectBuffers. This might be either a temporary problem or a bug.

If it’s a temporary problem (such as a spike in native memory which gradually reduces), that might be something you can tune, for example by reducing the number of io threads used by your application.

In WildFly / JBoss EAP the number of io-threads to create for your workers is configured in the io subsystem:

/subsystem=io/worker=default/:read-resource(recursive=false)
{
    "outcome" => "success",
    "result" => {
        "io-threads" => undefined,
        "stack-size" => 0L,
        "task-keepalive" => 60,
        "task-max-threads" => undefined
    }
}

If not specified, a default will be chosen, which is calculated as cpuCount * 2.
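
To set an explicit value (the number below is purely illustrative), you can use the CLI and then reload the server:

/subsystem=io/worker=default:write-attribute(name=io-threads,value=4)
reload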

Another option is to limit the per-thread DirectByteBuffer cache size using the jdk.nio.maxCachedBufferSize JVM property:

-Djdk.nio.maxCachedBufferSize=<bytes>

The above JVM property limits the size (in bytes) of the direct buffers that are cached per thread.

Finally, if you are using WildFly application server or JBoss EAP, a more drastic solution is to disable direct buffers, at the expense of increased Heap usage:

/subsystem=io/buffer-pool=default:write-attribute(name=direct-buffers,value=false)

Out of Memory caused by allocation failures

When using G1GC (the default Garbage Collector since Java 9) there are additional options to manage an allocation failure. First of all, some definitions: a GC allocation failure means that the garbage collector could not move objects from the young generation to the old generation fast enough, because it does not have enough memory in the old generation. In order to address this issue, there are some potential solutions (a combined example follows the list):

  • Increase the number of concurrent marking threads by setting the -XX:ConcGCThreads value. Increasing the number of concurrent marking threads will make garbage collection run faster, at the price of a higher CPU cost.
  • You can force the G1 Garbage Collector to start the marking phase earlier by lowering the -XX:InitiatingHeapOccupancyPercent value. The default value is 45, which means the G1 marking phase begins only when heap usage reaches 45%. By reducing this value, the G1 marking phase starts earlier so that a Full GC can be avoided.
  • Set -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=1. This enables prompt flushing of softly referenced objects. As it turns out, Direct Buffers are stored outside the Heap and a reference to them is generally held as a PhantomReference in the tenured generation. If there is no pressure to run a garbage collection on the tenured generation, you might hit an Out of Memory error because of the accumulation of soft references in the tenured generation.
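
Putting these options together, a hedged example of a tuned command line (the actual values depend on your workload and heap size, and myapp.jar is a placeholder) could be:

java -XX:+UseG1GC -XX:ConcGCThreads=4 -XX:InitiatingHeapOccupancyPercent=35 -XX:SoftRefLRUPolicyMSPerMB=1 -jar myapp.jar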

Tuning glibc

glibc is the default native memory allocator for Java applications. As a performance improvement, memory freed by the application may not be returned to the OS immediately. This performance improvement, however, comes at the price of increased memory fragmentation. The fragmentation can grow unboundedly, eventually causing an Out of Memory error.

MALLOC_ARENA_MAX is an environment variable that controls how many memory pools (arenas) glibc can create. By default, it is 8 * CPU cores. You can experiment with reducing this value to 2 or 1 and see if the Out of Memory issue goes away. The lower the value, the fewer memory pools are created (at the expense of reduced allocation performance).

export MALLOC_ARENA_MAX=1

Explicit Garbage Collection disabled?

In some cases, it can be that memory allocated by direct buffers may accumulate for a long time before it is collected. In the long run that’s not really a leak, but it will increase peak memory usage. In this case, the explicit Garbage collection (done with System.gc()) is there to free buffers when the reserveMemory limit is hit.

The OpenJDK invokes System.gc() during direct ByteBuffer allocation to provide a hint and hope for timely reclamation of direct memory by the GC.

So, it is worth checking if you are using DisableExplicitGC in your JVM settings:

-XX:+DisableExplicitGC

(Reference: https://stackoverflow.com/questions/32912702/impact-of-setting-xxdisableexplicitgc-when-nio-direct-buffers-are-used)

Check Open issues

In most cases, the issue is in some library used by your application. Therefore, you don’t have direct control over the source code to fix the issue. So it is worth checking for known issues in frameworks that use DirectByteBuffer, such as Netty:

https://issues.redhat.com/browse/NETTY-424

Also, check if your specific version of the application server (WildFly / EAP ) needs to be upgraded to fix an older issue for the DirectByteBuffer.

Thanks to Francisco De Melo for taking the time to review and improve this article. Francisco runs a cool blog on Java/JDK at: https://franciscomelojr.ca/

Simple strategies to test your Java applications with LDAP

LDAP is commonly used in Security realms as a source of authentication and authorization information. This tutorial will teach you three simple strategies for starting an LDAP Server in minutes in order to secure your Enterprise applications.

Option 1: Use an Embedded LDAP Server

The first example uses an embedded ApacheDS LDAP server with a preconfigured LDIF file containing some example LDAP data (username, firstName, lastName, email), but also some custom attributes (postal code, street).

In order to start the ApacheDS based LDAP server you just need the pom.xml file which contains a reference to the keycloak-util-embedded-ldap package:

<dependency>
    <groupId>org.keycloak</groupId>
    <artifactId>keycloak-util-embedded-ldap</artifactId>
    <scope>test</scope>
</dependency>

Then, specify in your exec-maven-plugin which Java class to start and include in its System Properties the ldif file to be loaded:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <configuration>
        <mainClass>org.keycloak.util.ldap.LDAPEmbeddedServer</mainClass>
        <classpathScope>test</classpathScope>
        <systemProperties>
            <systemProperty>
                <key>ldap.ldif</key>
                <value>ldap-example-users.ldif</value>
            </systemProperty>
        </systemProperties>
    </configuration>
</plugin>

That being said, you can start the LDAP server as follows:

mvn exec:java -Pldap

Here is the expected output:

You can find the pom.xml file and the ldif file in our Github repository: https://github.com/fmarchioni/mastertheboss/tree/master/ldap/embedded

Option 2: Use Docker to start LDAP

The second example we will show in this tutorial uses Docker and OpenLDAP. The most commonly used OpenLDAP image is osixia/openldap. You can start it as follows:

$ docker run --env LDAP_ORGANISATION="keycloak" --env LDAP_DOMAIN="keycloak.org" --env LDAP_ADMIN_PASSWORD="admin" osixia/openldap

Then, provided that you have installed LDAP Client tools, load the LDIF file using the ldapadd command. For example, in order to use the same example from keycloak:

$ ldapadd -f ldap-example-users.ldif -x -h 172.17.0.2 -p 389 -D "cn=admin,dc=keycloak,dc=org" -w "admin" -c

adding new entry "dc=keycloak,dc=org"
ldap_add: Already exists (68)
adding new entry "ou=People,dc=keycloak,dc=org"
adding new entry "ou=RealmRoles,dc=keycloak,dc=org"
adding new entry "ou=FinanceRoles,dc=keycloak,dc=org"
adding new entry "uid=jbrown,ou=People,dc=keycloak,dc=org"
adding new entry "uid=bwilson,ou=People,dc=keycloak,dc=org"
adding new entry "cn=ldap-user,ou=RealmRoles,dc=keycloak,dc=org"
adding new entry "cn=ldap-admin,ou=RealmRoles,dc=keycloak,dc=org"
adding new entry "cn=accountant,ou=FinanceRoles,dc=keycloak,dc=org"

You might have noticed that the OpenLDAP Docker image already created an entry for “dc=keycloak,dc=org”, therefore the first line of the LDIF file was skipped in this case. Besides that, we have loaded the same structure with Users and Roles.
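
To double-check that the entries are available, you can run a quick search against the same container (a hedged example, reusing the IP and credentials from the ldapadd command above):

$ ldapsearch -x -h 172.17.0.2 -p 389 -D "cn=admin,dc=keycloak,dc=org" -w admin -b "ou=People,dc=keycloak,dc=org" "(uid=jbrown)"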

Option 3: Use a free Online LDAP Server

There are several free online LDAP servers which can be used in read-only mode to test your applications. My favourite one is available at: https://www.forumsys.com

You can use the following settings to connect to the online server:

ldap.urls= ldap://ldap.forumsys.com:389/
ldap.base.dn= dc=example,dc=com
ldap.username= cn=read-only-admin,dc=example,dc=com
ldap.password= password
ldap.user.dn.pattern = uid={0}
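
With these settings, a quick connectivity test with the OpenLDAP client tools could look like this (a hedged example built from the parameters above):

$ ldapsearch -x -h ldap.forumsys.com -p 389 -D "cn=read-only-admin,dc=example,dc=com" -w password -b "dc=example,dc=com" "(uid=einstein)"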

Here is a snapshot from the list of users:

You can connect to individual Users (uid) or the two Groups (ou) that include:

ou=mathematicians,dc=example,dc=com

  • riemann
  • gauss
  • euler
  • euclid

ou=scientists,dc=example,dc=com

  • einstein
  • newton
  • galieleo
  • tesla

All user passwords are “password”.

That’s all. In this tutorial we have covered three strategies to quickly start an LDAP server for testing/developing applications that use LDAP as a repository.

How to find out which JAXB implementation is used in your code

When using WildFly or JBoss EAP, the JAXB implementation is defined by the following specification in module.xml:

<module name="javax.xml.bind.api" xmlns="urn:jboss:module:1.7">


    <dependencies>
        <module name="javax.activation.api" export="true"/>
        <module name="javax.xml.stream.api"/>
        <module name="com.sun.xml.bind" services="import"/>
        <module name="javax.api"/>
    </dependencies>

    <resources>
        <resource-root path="jboss-jaxb-api_2.3_spec-1.0.1.Final-redhat-1.jar"/>
    </resources>
</module>

In order to check the actual JAXB implementation, you can just create a new instance of JAXBContext and print its implementation class name obtained via getClass():

@WebServlet(value = "/jaxb" )
public class DemoServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse res) {
        try {
            PrintWriter out = res.getWriter();
            JAXBContext jc = JAXBContext.newInstance(Employee.class);
            String jcClassName = jc.getClass().getName();
            out.println(jcClassName);
        
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
 
}


@XmlRootElement
public class Employee {
    private int id;
    private String name;
    private float salary;

    public Employee() {}
    public Employee(int id, String name, float salary) {
        super();
        this.id = id;
        this.name = name;
        this.salary = salary;
    }
    @XmlAttribute
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    @XmlElement
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    @XmlElement
    public float getSalary() {
        return salary;
    }
    public void setSalary(float salary) {
        this.salary = salary;
    }
}

If you want to check the exact version of your JAXB implementation, the following code will do it:

if (jc instanceof com.sun.xml.bind.v2.runtime.JAXBContextImpl) {
                out.println("JAXB Version: " +
                        ((com.sun.xml.bind.v2.runtime.JAXBContextImpl) jc).getBuildId());
}

Furthermore, if you want to check which implementation is used for a specific factory class (this example inspects the JAXP factories), you can do it as follows:

/**
    * Print the JAXB Implementation information
    */
public static void outputJaxpImplementationInfo() {
       logger.debug(getImplementationInfo("DocumentBuilderFactory", DocumentBuilderFactory.newInstance().getClass()));
       logger.debug(getImplementationInfo("XPathFactory", XPathFactory.newInstance().getClass()));
       logger.debug(getImplementationInfo("TransformerFactory", TransformerFactory.newInstance().getClass()));
       logger.debug(getImplementationInfo("SAXParserFactory", SAXParserFactory.newInstance().getClass()));
}

/**
    * Get the JAXB implementation information for a particular class
    * @param componentName
    * @param componentClass
    * @return
    */
private static String getImplementationInfo(String componentName, Class componentClass) {
       CodeSource source = componentClass.getProtectionDomain().getCodeSource();
       return MessageFormat.format(
               "{0} implementation: {1} loaded from: {2}",
               componentName,
               componentClass.getName(),
               source == null ? "Java Runtime" : source.getLocation());
}

Building JAXB applications using Java 11 or newer

According to the release-notes, Java 11 removed the Java EE modules:

java.xml.bind (JAXB)- REMOVED
  • Java 8 – OK
  • Java 9 – DEPRECATED
  • Java 10 – DEPRECATED
  • Java 11 – REMOVED

You can fix the issue by using alternate versions of the Java EE technologies. Simply add Maven dependencies that contain the classes you need:

<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>com.sun.xml.bind</groupId>
  <artifactId>jaxb-core</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>com.sun.xml.bind</groupId>
  <artifactId>jaxb-impl</artifactId>
  <version>2.3.0</version>
</dependency>

Jakarta EE 8 update 

For Jakarta EE users, you can fix the issue by using Jakarta XML Binding from Jakarta EE 8:

<dependency>
  <groupId>jakarta.xml.bind</groupId>
  <artifactId>jakarta.xml.bind-api</artifactId>
  <version>2.3.3</version>
</dependency>

Solving java.lang.OutOfMemoryError: Metaspace error

The java.lang.OutOfMemoryError: Metaspace error indicates that the amount of native memory allocated for Java class metadata is exhausted. Let’s see how this issue can be solved in standalone applications and cloud applications.

In Java 8 and later, the maximum amount of memory allocated for Java classes (MaxMetaspaceSize) is by default unlimited, so in most cases there is no need to change this setting. On the other hand, if you want to limit the amount of memory allocated for Java classes, you can set it as follows:

java -XX:MaxMetaspaceSize=3200m  

The thing is that -XX:MaxMetaspaceSize is just an upper limit. The current Metaspace size (i.e. the committed size) will be smaller. In fact, there is a setting called MaxMetaspaceFreeRatio (default 70%) which means that the actual Metaspace size will never exceed 230% of its occupancy.

For the Metaspace to grow, it first has to fill up, forcing a garbage collection in an attempt to free objects; only when it cannot meet its MinMetaspaceFreeRatio (default 40%) goal will it expand the current Metaspace. That can, however, not be greater than 230% of the occupancy after the GC cycle.
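
For example (values purely illustrative, myapp.jar is a placeholder), the limit and the free-ratio goals can all be set explicitly:

java -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -XX:MinMetaspaceFreeRatio=40 -XX:MaxMetaspaceFreeRatio=70 -jar myapp.jar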

How Java Hotspot manages MetaSpace Data

The Java Hotspot manages the space used for metadata as follows: space is requested from the OS and then divided into chunks. A class loader allocates space for metadata from its chunks.

Class metadata is deallocated when the corresponding Java class is unloaded and its chunks are recycled for reuse or returned to the OS. Java classes are unloaded as a result of garbage collection, and garbage collections may be triggered in order to unload classes and deallocate class metadata. When the space committed for class metadata reaches a certain threshold (a high-water mark), a garbage collection is triggered.
After the garbage collection, the high-water mark may be raised or lowered depending on the amount of space freed from class metadata.

Checking MetaSpace capacity with jstat

The simplest way to monitor the MetaSpace size is by means of the jstat tool which is available in the JDK. When used with the option -gcmetacapacity it provides the following information:

jstat -gcmetacapacity (PID)  

For example:

MCMN   MCMX      MC       CCSMN CCSMX       CCSC    YGC   FGC    FGCT    CGC    CGCT       
0.0   374784.0  140360.0  0.0   253952.0    21168.0  23     0    0.000     6    0.046   

And here is a description of the Labels:

  • MCMN: Minimum metaspace capacity (kB).
  • MCMX: Maximum metaspace capacity (kB).
  • MC: Metaspace capacity (kB).
  • CCSMN: Compressed class space minimum capacity (kB).
  • CCSMX: Compressed class space maximum capacity (kB).
  • CCSC: Compressed class space capacity (kB).
  • YGC: Number of young generation GC events.
  • FGC: Number of full GC events.
  • FGCT: Full garbage collection time.
  • CGC: Number of concurrent GC events.
  • CGCT: Concurrent garbage collection time.
  • GCT: Total garbage collection time.

Other interesting options include the parameter -gcutil:

 $ jstat -gcutil (PID) | awk '{print($5)}' 

This will print the Metaspace utilization as a percentage of the space’s current capacity.

Querying the Metaspace from a Heap Dump

Further inspection can be performed through a Heap Dump.

Then, if you have a look at the OQL console, you can execute OQL queries to perform ad hoc analysis on your classes. For example, by executing the following query, you can get a list of the classes loaded by each Classloader:

select map(sort(map(heap.objects('java.lang.ClassLoader'), '{loader: it, count: it.classes.elementCount }'), 'lhs.count < rhs.count'), 'toHtml(it) + "<br/>"')

This can be a precious hint to determine if a Classloader is loading an increasing number of classes.

Monitoring MetaSpace Size with Java Native Memory tracking

A good way to monitor the exact amount of Metadata is by using Native Memory Tracking, which can be enabled through the following settings:

-XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics

When Native Memory Tracking is enabled, you can request a report on the JVM memory usage using the following command:

$ jcmd <pid> VM.native_memory

If you look at the jcmd output, you will find at the bottom the amount of native memory committed/used in the Internal (committed) section:

Total: reserved=1334532KB, committed=369276KB
-                 Java Heap (reserved=524288KB, committed=132096KB)
                            (mmap: reserved=524288KB, committed=132096KB)
 
-                     Class (reserved=351761KB, committed=112629KB)
                            (classes #19111)
                            (  instance classes #17977, array classes #1134)
                            (malloc=3601KB #66765)
                            (mmap: reserved=348160KB, committed=109028KB)
                            (  Metadata:   )
                            (    reserved=94208KB, committed=92824KB)
                            (    used=85533KB)
                            (    free=7291KB)
                            (    waste=0KB =0.00%)
                            (  Class space:)
                            (    reserved=253952KB, committed=16204KB)
                            (    used=12643KB)
                            (    free=3561KB)
                            (    waste=0KB =0.00%)

In the line beginning with Metaspace, the used value is the amount of space used for loaded classes. The committed value is the amount of space available for chunks. The reserved value is the amount of space reserved (but not necessarily committed) for metadata.

OutOfMemoryError: Metaspace on OpenShift/Kubernetes

When using the openjdk Image on OpenShift/Kubernetes, the default maximum value for the Metaspace is -XX:MaxMetaspaceSize=100m. You might have noticed that setting this value through the JAVA_OPTIONS environment variable doesn’t work, as the default value is appended at the end:

VM Arguments: -Xms128m -Xmx1024m -XX:MetaspaceSize=128M -XX:MaxMetaspaceSize=256m    -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

The correct way to set the MaxMetaspaceSize is through the GC_MAX_METASPACE_SIZE environment variable. For example, if you are using a deployment.yaml file to deploy your application with JKube, the following settings will override the default values for MetaspaceSize and MaxMetaspaceSize:

spec:
  template:
    spec:
      containers:
      - env:
        - name: JAVA_OPTIONS
          value: '-Xms128m -Xmx1024m'
        - name: GC_MAX_METASPACE_SIZE
          value: '256'
        - name: GC_METASPACE_SIZE
          value: '96'

How to solve java.lang.OutOfMemoryError: GC overhead limit exceeded

The error “java.lang.OutOfMemoryError: GC overhead limit exceeded” is fairly common for old JDKs (mostly JDK 1.6 and JDK 1.7). Let’s see how to solve it.

According to the JDK Troubleshooting guide, the “java.lang.OutOfMemoryError: GC overhead limit exceeded” error indicates that the garbage collector is running all the time and the Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection, is recovering less than 2% of the heap, and has been doing so for the last 5 consecutive garbage collections, then a java.lang.OutOfMemoryError is thrown. This exception is typically thrown because the amount of live data barely fits into the Java heap, leaving little free space for new allocations.

This is meant to prevent applications from running for an extended period of time while making little or no progress reclaiming objects.

Before talking about the possible solutions, it is worth knowing that this feature can be disabled with the following option:

java -XX:-UseGCOverheadLimit  

Disabling this throttle, however, will just postpone the memory issue, which will soon turn into a “java.lang.OutOfMemoryError: Java heap space”.

Possible solutions:

1) Check for memory leaks with a memory profiling tool like Eclipse MAT (https://www.eclipse.org/mat/), VisualVM, etc., and fix any memory leaks. To do that, include the following option in your JVM so that a Heap dump is created upon an Out of Memory error:

-XX:+HeapDumpOnOutOfMemoryError

Then, open the Heap Dump with Eclipse MAT and generate a Leak Suspects report as in this example:

As you can see, MAT has found one leak suspect, which occupies 71% of the application’s memory, taken by instances of class java.util.LinkedList. If you click on the “Details” link you will see some more info about where the instances reside and why they are so big. That should also be evident by clicking on the “See stacktrace” link, which will print the stack trace of that Thread.

2) If you cannot find any memory leak, increase the heap size if the current heap is not enough. For example:

java -Xmx6g

3) Also, if you don’t have memory leaks in your application, it is recommended to upgrade to a newer version of the JDK which uses the G1GC algorithm. The throughput goal for the G1 GC is 90 percent application time and 10 percent garbage collection time. As a proof of concept, consider the following class, which reproduces the OutOfMemoryError: GC overhead limit exceeded with JDK 1.6 using the Parallel GC:

 

import java.util.Map;
import java.util.Random;

class Main {
    public static void main(String args[]) throws Exception {
        // Keep adding new keys to a live map: the heap fills with objects that can never be reclaimed
        Map map = System.getProperties();
        Random rnd = new Random();
        while (true) {
            map.put(rnd.nextInt(), "val");
            System.gc();
        }
    }
}

The same application code will not trigger the OutOfMemoryError: GC overhead limit exceeded when upgrading to JDK 1.8 and using the G1GC algorithm.

 

4) If the new generation size is explicitly defined with JVM options (e.g. -XX:NewSize, -XX:MaxNewSize), decrease the size or remove the relevant JVM options entirely to unconstrain the JVM and provide more space in the old generation for long lived objects.

5) Enable the standard garbage collection logging options and analyze the logging. For example:

java -verbose:gc -Xloggc:gc.log.`date +%Y%m%d%H%M%S` -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime

6) Also check for promotion failures and frequent full GC sweeps that fail to free up much memory:

java -XX:+PrintGCDetails -XX:+PrintPromotionFailure -XX:PrintFLSStatistics=1

 

Simplest way to read a File in a String with Java 8

Do you need a quick hack to read a text file into a Java String? With Java 8 this is a piece of cake! See this example:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFile {
    public static void main(String[] args) throws IOException {
        // Read the whole file content into a String (using the platform default charset)
        String text = new String(Files.readAllBytes(Paths.get("myfile.txt")));
        System.out.println(text);
    }
}

Speed up your Java coding with Lombok

We all know that one of the most annoying things in Java is the amount of boilerplate code we need to write when building our applications. Think for example of adding constructors with all fields, getter and setter methods, Logger static fields, not to mention the boilerplate code needed to use common patterns in your Java code.

Although IDEs are able to generate boilerplate code so you don’t have to actually write it, you still have to maintain it, for example if field names change or if you add or remove fields. Also, a class with lots of fields becomes bloated with code, making it more complicated to read.

By using project Lombok you can include some simple annotations in your code that will, under the hood, place all the boilerplate code in your classes.

How to install Lombok

In order to get started you need a couple of things. First you need to add the Lombok library to your project. If you are using Maven:

    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.6</version>
        <scope>provided</scope>
    </dependency>

Next, you need to install the Lombok plugin for your IDE so that it can find the “hidden” boilerplate code which is created by the annotations. For example, if you are using IntelliJ IDEA, install the Lombok plugin from the Settings of your project:

Once the Lombok plugin has been installed, you are ready to add annotations to your code. Let’s see an example class:

public class Person {
    String firstName;
    String lastName;


    public static void main(String[] args) {

    }
}

We would typically need to add getter/setter methods to our POJO, along with constructors with fields, a toString() method, and an overriding equals(). All this can be done in one shot by adding the @Data annotation to the class:

import lombok.Data;

public @Data
class Person {
    String firstName;
    String lastName;


    public static void main(String[] args) {

    }
}

As you can see, even if no boiler plate code has been added to the class, all the stuff is actually available and ready to be used:

Some useful Lombok annotations

Besides the @Data annotation, which adds a bulk of boilerplate code, you can choose to add just single pieces, like a constructor with all arguments, using @AllArgsConstructor:

@AllArgsConstructor
public class Person {
    String firstName;
    String lastName;


    public static void main(String[] args) {

    }
}

You can then also add getter and setter methods using the @Getter and @Setter annotations:

@AllArgsConstructor
@Getter @Setter
public class Person {
    String firstName;
    String lastName;


    public static void main(String[] args) {

    }
}

In much the same way, you can add the equals and hashCode methods with @EqualsAndHashCode and the toString method using @ToString.
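
For example, the following sketch adds only equals/hashCode and toString to the same Person class:

import lombok.EqualsAndHashCode;
import lombok.ToString;

@EqualsAndHashCode
@ToString
public class Person {
    String firstName;
    String lastName;
}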

Avoiding the Null Pointer Exception trap

As you certainly know, in order to avoid NullPointerExceptions in your code you often end up including boilerplate code in your classes that checks whether fields/parameters are null. This can be avoided using the elegant @NonNull annotation. Here is an example:

 import lombok.NonNull;

public class NonNullExample extends Something {
  private String name;
  
  public NonNullExample(@NonNull Person person) {
    super("Hello");
    this.name = person.getName();
  }
}

The check generated by @NonNull looks like if (param == null) throw new NullPointerException("param is marked @NonNull but is null"); and is inserted at the very top of your method. For constructors, the null-check is inserted immediately following any explicit this() or super() calls.
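
In other words, the constructor above ends up roughly equivalent to the following sketch (not the literal Lombok output):

public NonNullExample(Person person) {
    super("Hello");
    if (person == null) {
        throw new NullPointerException("person is marked @NonNull but is null");
    }
    this.name = person.getName();
}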

Adding Design Patterns using Lombok Annotations

Another area where Lombok can be pretty useful is Design Patterns. For example, consider the Builder pattern that allows us to write readable, understandable code to set up complex objects. You can include a static builder method with the @Builder annotation:

    @Builder
    public class Person {
    	String firstName;
    	String lastName;

    	public static void main(String[] args) {
    		Person emp = new PersonBuilder().firstName("John")
    				.lastName("Doe")
    				.build();
    	}
    }

Another Pattern that you can implement with Lombok is the Delegate pattern. Any field or no-argument method can be annotated with @Delegate to let lombok generate delegate methods that forward the call to this field (or the result of invoking this method).

Lombok delegates all public methods of the field’s type (or method’s return type), as well as those of its supertype except for all methods declared in java.lang.Object.

You can pass any number of classes into the @Delegate annotation’s types parameter. If you do that, then lombok will delegate all public methods in those types (and their supertypes, except java.lang.Object) instead of looking at the field/method’s type. Here is an example:

import java.util.ArrayList;
import java.util.Collection;

import lombok.Delegate;

public class DelegationExample {
    private interface SimpleCollection {
        boolean add(String item);

        boolean remove(Object item);
    }

    @Delegate(types = SimpleCollection.class)
    private final Collection<String> collection = new ArrayList<String>();
}


class ExcludesDelegateExample {
    long counter = 0L;

    private interface Add {
        boolean add(String x);

        boolean addAll(Collection<? extends String> x);
    }

    @Delegate(excludes = Add.class)
    private final Collection<String> collection = new ArrayList<String>();

    public boolean add(String item) {
        counter++;
        return collection.add(item);
    }

    public boolean addAll(Collection<? extends String> col) {
        counter += col.size();
        return collection.addAll(col);
    }
}

Logging easier with Lombok

Another area where we have to create boilerplate code is logging, typically by placing a static final log field in each class, initialized with the name of the class. This can be avoided by putting the appropriate variant of @Log on your class (whichever one applies to the logging system you use). Here is an example:

import lombok.extern.java.Log;
import lombok.extern.slf4j.Slf4j;

@Log
public class LogExample {
  
  public static void main(String... args) {
    log.severe("severe error!");
  }
}

@Slf4j
public class LogExampleOther {
  
  public static void main(String... args) {
    log.error("error!");
  }
}

@CommonsLog(topic="CounterLog")
public class LogExampleCategory {

  public static void main(String... args) {
    log.error("Calling the 'CounterLog' with a message");
  }
}

Here is the list of available Log annotations:

  • @CommonsLog: Creates private static final org.apache.commons.logging.Log log = org.apache.commons.logging.LogFactory.getLog(LogExample.class);
  • @Flogger: Creates private static final com.google.common.flogger.FluentLogger log = com.google.common.flogger.FluentLogger.forEnclosingClass();
  • @JBossLog: Creates private static final org.jboss.logging.Logger log = org.jboss.logging.Logger.getLogger(LogExample.class);
  • @Log: Creates private static final java.util.logging.Logger log = java.util.logging.Logger.getLogger(LogExample.class.getName());
  • @Log4j: Creates private static final org.apache.log4j.Logger log = org.apache.log4j.Logger.getLogger(LogExample.class);
  • @Log4j2: Creates private static final org.apache.logging.log4j.Logger log = org.apache.logging.log4j.LogManager.getLogger(LogExample.class);
  • @Slf4j: Creates private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(LogExample.class);
  • @XSlf4j: Creates private static final org.slf4j.ext.XLogger log = org.slf4j.ext.XLoggerFactory.getXLogger(LogExample.class);

Checkout the Lombok project for more information: https://projectlombok.org/

Getting started with Java-based Machine Learning Libraries

There are over 70 Java-based open source machine learning projects listed on the MLOSS.org website, and probably many more unlisted projects live at university servers, GitHub, or Bitbucket. In this article, we will review the major libraries and platforms, the kind of problems they can solve, the algorithms they support, and the kind of data they can work with.

 

Weka

Waikato Environment for Knowledge Analysis (WEKA) is a machine learning library that was developed at the University of Waikato, New Zealand, and is probably the most well-known Java library. It is a general purpose library that is able to solve a wide variety of machine learning tasks, such as classification, regression, and clustering. It features a rich graphical user interface, command-line interface, and Java API. You can check out Weka at http://www.cs.waikato.ac.nz/ml/weka/.

Currently, Weka contains 267 algorithms in total: data preprocessing (82), attribute selection (33), classification and regression (133), clustering (12), and association rules mining (7). Graphical interfaces are well suited for exploring your data, while the Java API allows you to develop new machine learning schemes and use the algorithms in your applications.

Weka is distributed under the GNU General Public License (GNU GPL), which means that you can copy, distribute, and modify it as long as you track changes in source files and keep it under GNU GPL. You can even distribute it commercially, but you must disclose the source code or obtain a commercial license.

In addition to several supported file formats, Weka features its own default data format, ARFF, to describe data by attribute-data pairs. It consists of two parts. The first part contains a header, which specifies all of the attributes and their types, for instance, nominal, numeric, date, and string. The second part contains the data, where each line corresponds to an instance. The last attribute in the header is implicitly considered the target variable and missing data is marked with a question mark. Consider the following example:

@RELATION person_dataset
@ATTRIBUTE `Name` STRING
@ATTRIBUTE `Height` NUMERIC
@ATTRIBUTE `Eye color` {blue, brown, green}
@ATTRIBUTE `Hobbies` STRING
@DATA
'Bob', 185.0, blue, 'climbing, sky diving'
'Anna', 163.0, brown, 'reading'
'Jane', 168.0, ?, ?

The file consists of three sections. The first section starts with the @RELATION  keyword, specifying the dataset name. The next section starts with the @ATTRIBUTE keyword, followed by the attribute name and type. The available types are STRING, NUMERIC, DATE, and a set of categorical values. The last attribute is implicitly assumed to be the target variable that we want to predict. The last section starts with the @DATA keyword, followed by one instance per line. Instance values are separated by commas and must follow the same order as attributes in the second section.  
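
As a minimal, hedged sketch of the Java API (the file name is a placeholder and the dataset is assumed to have a nominal class as its last attribute), you can load an ARFF file and build a classifier like this:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaExample {
    public static void main(String[] args) throws Exception {
        // Load the dataset and mark the last attribute as the target variable
        Instances data = new DataSource("dataset.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Train a C4.5 decision tree (J48) and print the resulting model
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);
    }
}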

Weka’s Java API is organized into the following top-level packages:

  • weka.associations: These are data structures and algorithms for association rules learning, including Apriori, predictive Apriori, FilteredAssociator, FP-Growth, Generalized Sequential Patterns (GSP), hotSpot, and Tertius.
  • weka.classifiers: These are supervised learning algorithms, evaluators, and data structures. The package is further split into the following components:
  • weka.classifiers.bayes: This implements Bayesian methods, including Naive Bayes, Bayes net, Bayesian logistic regression, and so on.
  • weka.classifiers.evaluation: These are supervised evaluation algorithms for nominal and numerical prediction, such as evaluation statistics, confusion matrix, ROC curve, and so on.
  • weka.classifiers.functions: These are regression algorithms, including linear regression, isotonic regression, Gaussian processes, Support Vector Machines (SVMs), multilayer perceptron, voted perceptron, and others.
  • weka.classifiers.lazy: These are instance-based algorithms such as k-nearest neighbors, K*, and lazy Bayesian rules.
  • weka.classifiers.meta: These are supervised learning meta-algorithms, including AdaBoost, bagging, additive regression, random committee, and so on.
  • weka.classifiers.mi: These are multiple-instance learning algorithms, such as citation k-nearest neighbors, diverse density, AdaBoost, and others.
  • weka.classifiers.rules: These are decision tables and decision rules based on the separate-and-conquer approach, RIPPER, PART, PRISM, and so on.
  • weka.classifiers.trees: These are various decision trees algorithms, including ID3, C4.5, M5, functional tree, logistic tree, random forest, and so on.
  • weka.clusterers: These are clustering algorithms, including k-means, CLOPE, Cobweb, DBSCAN hierarchical clustering, and FarthestFirst.
  • weka.core: These are various utility classes such as the attribute class, statistics class, and instance class.
  • weka.datagenerators: These are data generators for classification, regression, and clustering algorithms.
  • weka.estimators: These are various data distribution estimators for discrete/nominal domains, conditional probability estimations, and so on.
  • weka.experiment: These are a set of classes supporting necessary configuration, datasets, model setups, and statistics to run experiments.
  • weka.filters: These are attribute-based and instance-based selection algorithms for both supervised and unsupervised data preprocessing.
  • weka.gui: These are graphical interfaces implementing the Explorer, Experimenter, and KnowledgeFlow applications. The Weka Explorer allows you to investigate datasets and algorithms, as well as their parameters, and to visualize datasets with scatter plots and other visualizations. The Weka Experimenter is used to design batches of experiments, but it can only be used for classification and regression problems. The Weka KnowledgeFlow implements a visual drag-and-drop user interface to build data flows and, for example, load data, apply filters, build classifiers, and evaluate them.

Java Machine Learning Library

The Java Machine Learning Library (Java-ML) is a collection of machine learning algorithms with a common interface for algorithms of the same type. It only features the Java API, and so it is primarily aimed at software engineers and programmers. Java-ML contains algorithms for data preprocessing, feature selection, classification, and clustering. In addition, it features several Weka bridges to access Weka’s algorithms directly through the Java-ML API. It can be downloaded from http://java-ml.sourceforge.net.

Java-ML is also a general-purpose machine learning library. Compared to Weka, it offers more consistent interfaces and implementations of recent algorithms that are not present in other packages, such as an extensive set of state-of-the-art similarity measures and feature-selection techniques, for example, dynamic time warping (DTW), random forest attribute evaluation, and so on. Java-ML is also available under the GNU GPL license.

Java-ML supports all types of files as long as they contain one data sample per line and the features are separated by a symbol such as a comma, semicolon, or tab. The library is organized around the following top-level packages:

  • net.sf.javaml.classification: These are classification algorithms, including Naive Bayes, random forests, bagging, self-organizing maps, k-nearest neighbors, and so on
  • net.sf.javaml.clustering: These are clustering algorithms such as k-means, self-organizing maps, spatial clustering, Cobweb, ABC, and others
  • net.sf.javaml.core: These are classes representing instances and datasets
  • net.sf.javaml.distance: These are algorithms that measure instance distance and similarity, for example, Chebyshev distance, cosine distance/similarity, Euclidean distance, Jaccard distance/similarity, Mahalanobis distance, Manhattan distance, Minkowski distance, Pearson correlation coefficient, Spearman’s footrule distance, DTW, and so on
  • net.sf.javaml.featureselection: These are algorithms for feature evaluation, scoring, selection, and ranking, for instance, gain ratio, ReliefF, Kullback-Leibler divergence, symmetrical uncertainty, and so on
  • net.sf.javaml.filter: These are methods for manipulating instances by filtering, removing attributes, setting classes or attribute values, and so on
  • net.sf.javaml.matrix: This implements in-memory or file-based arrays
  • net.sf.javaml.sampling: This implements sampling algorithms to select a subset of datasets
  • net.sf.javaml.tools: These are utility methods on dataset, instance manipulation, serialization, Weka API interface, and so on
  • net.sf.javaml.utils: These are utility methods for algorithms, for example, statistics, math methods, contingency tables, and others

Apache Mahout

The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures, such as Hadoop, using the MapReduce paradigm, which is an approach for processing and generating large datasets with a parallel, distributed algorithm using a cluster of servers.

Mahout features a console interface and a Java API with scalable algorithms for clustering, classification, and collaborative filtering. It is able to solve three business problems:

  • Item recommendation: Recommending items such as People who liked this movie also liked
  • Clustering: Sorting of text documents into groups of topically-related documents
  • Classification: Learning which topic to assign to an unlabelled document

Mahout features the following libraries:

  • org.apache.mahout.cf.taste: These are collaborative filtering algorithms based on user-based and item-based collaborative filtering and matrix factorization with ALS
  • org.apache.mahout.classifier: These are in-memory and distributed implementations, including logistic regression, Naive Bayes, random forest, hidden Markov models (HMM), and multilayer perceptron
  • org.apache.mahout.clustering: These are clustering algorithms such as canopy clustering, k-means, fuzzy k-means, streaming k-means, and spectral clustering
  • org.apache.mahout.common: These are utility methods for algorithms, including distances, MapReduce operations, iterators, and so on
  • org.apache.mahout.driver: This implements a general-purpose driver to run main methods of other classes
  • org.apache.mahout.ep: This is the evolutionary optimization using the recorded-step mutation
  • org.apache.mahout.math: These are various math utility methods and implementations in Hadoop
  • org.apache.mahout.vectorizer: These are classes for data presentation, manipulation, and MapReduce jobs

Apache Spark

Apache Spark, or simply Spark, is a platform for large-scale data processing built atop Hadoop, but, in contrast to Mahout, it is not tied to the MapReduce paradigm. Instead, it uses in-memory caches to extract a working set of data, process it, and repeat the query. This is reported to be up to ten times as fast as a Mahout implementation that works directly with data stored on disk. It can be grabbed from https://spark.apache.org.

There are many modules built atop Spark, for instance, GraphX for graph processing, Spark Streaming for processing real-time data streams, and MLlib, a machine learning library featuring classification, regression, collaborative filtering, clustering, dimensionality reduction, and optimization.

Spark’s MLlib can use a Hadoop-based data source, for example, Hadoop Distributed File System (HDFS) or HBase, as well as local files. The supported data types include the following:

Local vectors are stored on a single machine. Dense vectors are presented as an array of double-typed values, for example, (2.0, 0.0, 1.0, 0.0), while sparse vector is presented by the size of the vector, an array of indices, and an array of values, for example, [4, (0, 2), (2.0, 1.0)].

Labelled point is used for supervised learning algorithms and consists of a local vector labelled with double-typed class values. The label can be a class index, binary outcome, or a list of multiple class indices (multiclass classification). For example, a labelled dense vector is presented as [1.0, (2.0, 0.0, 1.0, 0.0)].

Local matrices store a dense matrix on a single machine. It is defined by matrix dimensions and a single double-array arranged in a column-major order.

Distributed matrices operate on data stored in Spark’s Resilient Distributed Dataset (RDD), which represents a collection of elements that can be operated on in parallel. There are three presentations: row matrix, where each row is a local vector that can be stored on a single machine, row indices are meaningless; indexed row matrix, which is similar to row matrix, but the row indices are meaningful, that is, rows can be identified and joins can be executed; and coordinate matrix, which is used when a row cannot be stored on a single machine and the matrix is very sparse.
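
As a hedged Java illustration of the local data types described above (using the RDD-based MLlib API):

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;

public class MLlibDataTypes {
    public static void main(String[] args) {
        // Dense vector (2.0, 0.0, 1.0, 0.0)
        Vector dense = Vectors.dense(2.0, 0.0, 1.0, 0.0);

        // Sparse vector of size 4 with non-zero values at indices 0 and 2
        Vector sparse = Vectors.sparse(4, new int[]{0, 2}, new double[]{2.0, 1.0});

        // Labelled point: class label 1.0 attached to the dense vector
        LabeledPoint point = new LabeledPoint(1.0, dense);

        System.out.println(dense + " " + sparse + " " + point);
    }
}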

Spark’s MLlib API library provides interfaces for various learning algorithms and utilities, as outlined in the following list:

  • org.apache.spark.mllib.classification: These are binary and multiclass classification algorithms, including linear SVMs, logistic regression, decision trees, and Naive Bayes
  • org.apache.spark.mllib.clustering: These are k-means clustering algorithms (see the example after this list)
  • org.apache.spark.mllib.linalg: These are data representations, including dense vectors, sparse vectors, and matrices
  • org.apache.spark.mllib.optimization: These are the various optimization algorithms that are used as low-level primitives in MLlib, including gradient descent, stochastic gradient descent (SGD), update schemes for distributed SGD, and the limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
  • org.apache.spark.mllib.recommendation: These are model-based collaborative filtering techniques implemented with alternating least squares matrix factorization
  • org.apache.spark.mllib.regression: These are regression learning algorithms, such as linear least squares, decision trees, Lasso, and Ridge regression
  • org.apache.spark.mllib.stat: These are statistical functions for samples in sparse or dense vector format to compute the mean, variance, minimum, maximum, counts, and nonzero counts
  • org.apache.spark.mllib.tree: This implements classification and regression decision tree-learning algorithms
  • org.apache.spark.mllib.util: These are a collection of methods used for loading, saving, preprocessing, generating, and validating the data
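
As a taste of the API, the following is a minimal, self-contained sketch, running Spark in local mode on hypothetical toy data, that clusters a handful of 2D points with the k-means implementation from org.apache.spark.mllib.clustering:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class KMeansExample {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("KMeansExample").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Four toy points forming two obvious clusters
        JavaRDD<Vector> points = sc.parallelize(Arrays.asList(
                Vectors.dense(0.0, 0.0),
                Vectors.dense(0.1, 0.1),
                Vectors.dense(9.0, 9.0),
                Vectors.dense(9.1, 9.1)));

        // Train k-means with k=2 clusters and at most 20 iterations
        KMeansModel model = KMeans.train(points.rdd(), 2, 20);

        System.out.println("Cluster centers: " + Arrays.toString(model.clusterCenters()));
        sc.stop();
    }
}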

 

Learn more

If you found this article helpful and want to learn more about machine learning with Java, you can explore Machine Learning in Java – Second Edition. Written by Ashish Singh Bhatia and Bostijan Kaluza, it provides the techniques and tools you need to quickly gain insight from complex data.

 

Top 5 solutions for Java Http Clients

In this tutorial we will review some of the best solutions for implementing an HTTP client in Java. You can run the HTTP client on top of the WildFly application server or as part of any Java process.

First of all, it is worth mentioning that WildFly ships with an embedded web server, but it does not provide its own HTTP client libraries, therefore you have to use one of the following options:

Use Java built-in HttpURLConnection

This is the simplest solution and it does not require any additional library to be included in your classpath:

URL url = new URL("http://www.acme.com");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("GET");

// Read the response body line by line
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(con.getInputStream()))) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
}
con.disconnect();

You can additionally send parameters along with your HTTP Request:

Map<String, String> params = new HashMap<>();
params.put("key1", "val1");
params.put("key2", "val2");
 
con.setDoOutput(true);
DataOutputStream out = new DataOutputStream(con.getOutputStream());
// ParameterStringBuilder is a helper that URL-encodes the map as "key1=val1&key2=val2"
out.writeBytes(ParameterStringBuilder.getParamsString(params));
out.flush();
out.close();
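
The ParameterStringBuilder class used above is not part of the JDK; it is just a small helper. A minimal sketch of what such a helper could look like:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

public class ParameterStringBuilder {

    // Turns {key1=val1, key2=val2} into "key1=val1&key2=val2", URL-encoding keys and values
    public static String getParamsString(Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (result.length() > 0) {
                result.append("&");
            }
            result.append(URLEncoder.encode(entry.getKey(), "UTF-8"));
            result.append("=");
            result.append(URLEncoder.encode(entry.getValue(), "UTF-8"));
        }
        return result.toString();
    }
}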

Use Java 9’s new HttpClient API

If you are using Java 9 or 10, you can use the incubating HTTP client API to initiate and handle communication via HTTP: a jdk.incubator.http.HttpClient is used to send requests, which are built via jdk.incubator.http.HttpRequest and answered with jdk.incubator.http.HttpResponse. Starting with Java 11, the same API is standardized in the java.net.http package.

Check this example:

HttpClient client = HttpClient.newHttpClient();

HttpRequest request = HttpRequest.newBuilder()
    .uri(new URI("http://www.acme.com/"))
    .build();

HttpResponse<String> response = client.send(request, HttpResponse.BodyHandler.asString());

System.out.println(response.statusCode());
System.out.println(response.body());
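
If you are on Java 11 or newer, the same example can be written against the standardized java.net.http package; note that HttpResponse.BodyHandlers.ofString() replaces the incubator's BodyHandler.asString():

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Java11HttpClientExample {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(new URI("http://www.acme.com/"))
                .build();

        // Send the request synchronously and collect the body as a String
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}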

Using Apache HttpComponents HttpClient

The Apache Commons HttpClient project is well known to developers, but it has reached end of life and is no longer being developed. It has been replaced by the Apache HttpComponents project in its HttpClient and HttpCore modules, which offer better performance and more flexibility.

This example demonstrates how to process HTTP responses using a response handler. This is the recommended way of executing HTTP requests and processing HTTP responses. This approach enables the caller to concentrate on the process of digesting HTTP responses and to delegate the task of system resource deallocation to HttpClient. The use of an HTTP response handler guarantees that the underlying HTTP connection will be released back to the connection manager automatically in all cases.

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class ClientWithResponseHandler {

    public final static void main(String[] args) throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            HttpGet httpget = new HttpGet("http://www.acme.com");

            System.out.println("Executing request " + httpget.getRequestLine());

            // Create a custom response handler
            ResponseHandler<String> responseHandler = new ResponseHandler<String>() {

                @Override
                public String handleResponse(
                        final HttpResponse response) throws ClientProtocolException, IOException {
                    int status = response.getStatusLine().getStatusCode();
                    if (status >= 200 && status < 300) {
                        HttpEntity entity = response.getEntity();
                        return entity != null ? EntityUtils.toString(entity) : null;
                    } else {
                        throw new ClientProtocolException("Unexpected response status: " + status);
                    }
                }

            };
            String responseBody = httpclient.execute(httpget, responseHandler);
            System.out.println("----------------------------------------");
            System.out.println(responseBody);
        } finally {
            httpclient.close();
        }
    }

}

In order to compile examples using the Http Components library, you need to include in your pom.xml:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>

The following code snippet shows you how to send a POST request with a JSON body using HttpClient. The HTTP payload in this example is placed in an object called StringEntity:

public void postJson() 
  throws ClientProtocolException, IOException {
    CloseableHttpClient client = HttpClients.createDefault();
    HttpPost httpPost = new HttpPost("http://www.acme.com");
 
    String json = "{\"id\":1,\"name\":\"Frank\"}";
    StringEntity entity = new StringEntity(json);
    httpPost.setEntity(entity);
    httpPost.setHeader("Accept", "application/json");
    httpPost.setHeader("Content-type", "application/json");
 
    CloseableHttpResponse response = client.execute(httpPost);
    client.close();
}

Please note that we have a specialized tutorial on writing high performance Java HTTP Clients with Apache HTTP Client: Writing high performance Java HTTP Client applications

Use the Unirest HTTP API

Unirest is a set of lightweight HTTP libraries available in multiple languages, built and maintained by Mashape, who also maintain the open-source API Gateway Kong. Using Unirest can be that simple:

Unirest.post("http://httpbin.org/post")
  .queryString("name", "John")
  .field("surname", "Smith")
  .asJson();

Besides the standard post and get requests, you can also create multipart requests for uploading files: simply pass along a File or an InputStream Object as a field:

HttpResponse<JsonNode> jsonResponse = Unirest.post("http://acme.com/post")
  .header("accept", "application/json")
  .field("parameter", "value")
  .field("file", new File("/tmp/file"))
  .asJson();

Asynchronous requests are supported as well by using anonymous callbacks, or direct method placement:

Future<HttpResponse<JsonNode>> future = Unirest.post("http://acme.com/post")
  .header("accept", "application/json")
  .field("param1", "value1")
  .field("param2", "value2")
  .asJsonAsync(new Callback<JsonNode>() {

    public void failed(UnirestException e) {
        System.out.println("The request has failed");
    }

    public void completed(HttpResponse<JsonNode> response) {
         int code = response.getStatus();
         Map<String, String> headers = response.getHeaders();
         JsonNode body = response.getBody();
         InputStream rawBody = response.getRawBody();
    }

    public void cancelled() {
        System.out.println("The request has been cancelled");
    }

});
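
The returned Future can then be consumed later. A small sketch, assuming the asynchronous call above, that blocks only when the result is actually needed:

// Wait for the asynchronous request to complete and inspect the status code
HttpResponse<JsonNode> asyncResponse = future.get();
System.out.println("Status: " + asyncResponse.getStatus());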

To use Unirest with Maven, include the library:

<dependency>
    <groupId>com.mashape.unirest</groupId>
    <artifactId>unirest-java</artifactId>
    <version>1.4.9</version>
</dependency>

Also, since Unirest also brings in the Apache HTTP libraries, the following dependencies must be included in your pom.xml:

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.3.6</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpasyncclient</artifactId>
  <version>4.0.2</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpmime</artifactId>
  <version>4.3.6</version>
</dependency>
<dependency>
  <groupId>org.json</groupId>
  <artifactId>json</artifactId>
  <version>20140107</version>
</dependency>

Using OkHttp

OkHttp is an HTTP client with an eye on efficiency: it provides native HTTP/2 support, connection pooling, transparent GZIP compression, and response caching.

Using OkHttp is easy. Its request/response API is designed with fluent builders and immutability.
The following code downloads a URL and prints its contents as a string:

OkHttpClient client = new OkHttpClient();

String run(String url) throws IOException {
  Request request = new Request.Builder()
      .url(url)
      .build();

  Response response = client.newCall(request).execute();
  return response.body().string();
}

Posting to a Server is also pretty simple. The following example posts a JSON body:

public static final MediaType JSON
    = MediaType.parse("application/json; charset=utf-8");

OkHttpClient client = new OkHttpClient();

String post(String url, String json) throws IOException {
  RequestBody body = RequestBody.create(JSON, json);
  Request request = new Request.Builder()
      .url(url)
      .post(body)
      .build();
  Response response = client.newCall(request).execute();
  return response.body().string();
}

Finally, the list of dependencies is really minimal:

<dependency>
   <groupId>com.squareup.okhttp3</groupId>
   <artifactId>okhttp</artifactId>
   <version>3.12.0</version>
</dependency>