Java is an amazing language: rich in features, strong in performance, and backed by a vast ecosystem. Coming from a Java developer, intersect that with AI use cases and the question naturally arises: is Java suitable for AI/ML workloads? Truth be told, the short answer is YES! In this article, I will highlight some of the latest advancements and the areas where we, the Java developer community, can feel at ease doing some of the most interesting things out there. This may give us all some inspiration to bridge and strengthen this intersection further in 2024.
Let’s start with the most interesting bits that have taken the tech world by storm. Although still in their infancy, LLMs came to serious attention once OpenAI released ChatGPT to the masses. It has changed how we work, how we search, and how we ask questions. Most importantly, it has opened up new opportunities for generating content for specific use cases. One of the projects in this area is LangChain, originally a Python-based library that now also has a Java variant. Dmytro Liubarskyi, the author of LangChain4j, made his initial commit on June 20th this year, making it one of the most interesting projects for LLM-related work in the Java space.
LangChain
LangChain is a framework that enables and enhances the use of LLMs for more use cases than simplistic prompt engineering, where questions are simply sent to an LLM. It introduces concepts such as Chains, so that APIs and datasets can be vectorized and shared with the LLM to give it context. It enables context/memory and techniques like RAG (Retrieval-Augmented Generation). All in all, it builds on the basic model and enables all of us to write applications for interesting new use cases.
Let’s take a look at a basic example.
- Create an embedding store. In this example, the simple InMemoryEmbeddingStore is used. However, quite a few other options are possible, e.g., Chroma, Redis, PgVector, etc.
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
- A prevalent method for handling and searching through unstructured data is to embed it and save the resulting vectors. When a query is made, the unstructured query is embedded as well. In our example, the documents are split based on segment size and stored in the vector store.
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .documentSplitter(DocumentSplitters.recursive(500, 0))
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        .build();
- The following code loads a document. As per our instructions, it will split this up and load it into the vector store.
Document document = loadDocument(toPath("example-files/story-about-happy-carrot.txt"));
ingestor.ingest(document);
- The program then retrieves the embedding vectors that most closely match the query’s embedding. Essentially, a vector store is responsible for maintaining these embedded data records and executing vector-based searches on your behalf.
ConversationalRetrievalChain chain = ConversationalRetrievalChain.builder()
        .chatLanguageModel(OpenAiChatModel.withApiKey(ApiKeys.OPENAI_API_KEY))
        .retriever(EmbeddingStoreRetriever.from(embeddingStore, embeddingModel))
        // .chatMemory() // you can override default chat memory
        // .promptTemplate() // you can override default prompt template
        .build();
- And finally, sending our query to the LLM:
String answer = chain.execute("Who is Charlie?");
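Under the hood, the retriever’s similarity search boils down to comparing embedding vectors, typically by cosine similarity. The following is a minimal, self-contained sketch of that idea in plain Java; it uses no LangChain4j code, and the three-dimensional toy "embeddings" are made up for illustration only:

```java
import java.util.List;

public class CosineSearch {

    // Cosine similarity between two equally sized vectors
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the index of the stored vector most similar to the query
    static int mostSimilar(List<float[]> store, float[] query) {
        int best = 0;
        double bestScore = -1;
        for (int i = 0; i < store.size(); i++) {
            double score = cosine(store.get(i), query);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy "embeddings" for three text segments
        List<float[]> store = List.of(
                new float[]{1.0f, 0.0f, 0.0f},
                new float[]{0.0f, 1.0f, 0.0f},
                new float[]{0.7f, 0.7f, 0.0f});
        float[] query = {0.9f, 0.1f, 0.0f};
        System.out.println("Best match: segment " + mostSimilar(store, query));
    }
}
```

A real vector store does the same ranking at scale, with indexing structures to avoid comparing the query against every stored vector.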
For more in-depth examples, follow the langchain4j-examples repository.
Quarkus
Clement Escoffier, in a recent post on Quarkus.io, introduced the first version of the LangChain4j extension. Relying solely on the knowledge baked into a Large Language Model (LLM) might not suffice, so the Quarkus LangChain4j extension introduces features to augment AI capabilities for application developers using Quarkus. The extension provides the RegisterAiService annotation, used declaratively in much the same way REST applications are developed in Quarkus. With this annotation, developers can bring in facilities such as memory and beans, as well as inject embeddings, stores, and ingestors. Another interesting annotation is Tool: it lets the LLM invoke Quarkus code as required, e.g., by placing @Tool on a method that calls a Panache entity to access data from the database.
@ApplicationScoped
public class CustomerRepository implements PanacheRepository<Customer> {

    @Tool("get the customer name for the given customerId")
    public String getCustomerName(long id) {
        return find("id", id).firstResult().name;
    }
}
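To illustrate the other side of the wiring, a hypothetical AI service interface might look like the sketch below. The interface name, prompt text, and method signature are assumptions made for illustration, not code from the extension’s documentation; the extension generates the implementation behind the annotated interface:

```java
// Hypothetical AI service; the tools attribute points the LLM
// at the @Tool methods of CustomerRepository shown above.
@RegisterAiService(tools = CustomerRepository.class)
public interface CustomerSupportAssistant {

    @SystemMessage("You are a support agent of an online store.")
    String chat(@UserMessage String question);
}
```

Injecting such an interface into a bean and calling chat(...) would let the model decide when to call getCustomerName to fetch data it needs.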
LangChain4j is not the only Gen-AI support Quarkus can offer; it also integrates well with the Semantic Kernel. The reference superheroes Quarkus app showcases the use of Semantic Kernel. Let’s delve into Semantic Kernel.
Semantic Kernel
To build awesome Gen-AI apps, LangChain is not the only option out there. Earlier in July this year, Microsoft announced Semantic Kernel for Java, an open-source library similar to LangChain, specifically for use cases with Azure AI and OpenAI. It empowers developers to use a variety of prompts as distinct skills, link these prompts together, and establish shared contexts for them. It also offers a framework for managing the prompting pipeline and applying specialized design patterns. SK supports prompt templating, function chaining, vectorized memory, and intelligent planning capabilities out of the box. One of its most interesting features is the ability to add Skills to a program, so prompts can be enhanced, together with the familiar integration with memory and context similar to LangChain. Semantic Kernel keeps its compatibility with Java 8, which might be great for certain types of Java applications, even though the language has by now gained many more features up to JDK 21.
The following code is from the Quarkus superheroes app.
- Creating a textCompletion function using the OpenAI client.
var textCompletion = SKBuilders.chatCompletion()
        .withOpenAIClient(this.openAIAsyncClientInstance.get())
        .build();
- Creating the Semantic Kernel with the text completion
var kernel = SKBuilders.kernel()
        .withDefaultAIService(textCompletion)
        .build();
- Registering the skills to be used. A skill refers to a domain of expertise made available to the kernel as a single function, or as a group of functions related to the skill. A function is represented by a “skprompt.txt” and, optionally, a “config.json”. In this case it is a NarrationSkill.
var skill = kernel.importSkillFromResources("skills", "NarrationSkill", "NarrateFight");
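The prompt file itself is plain text with template variables. A hypothetical skprompt.txt for a narration-style function could look like the following; the wording and variable names are made up for illustration and are not taken from the superheroes app:

```
Narrate an epic fight between two superheroes.
The fight is between {{$winner}} and {{$loser}}, and {{$winner}} wins.
Keep the narration under 100 words and keep it family friendly.
```

At invocation time, the kernel substitutes the {{$...}} variables from the shared context before sending the rendered prompt to the model.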
Spring AI
The Spring Framework offers an alternative to LangChain and Semantic Kernel with Spring AI. It provides similar concepts, such as memory, prompts, function chaining, transformers, and retrievers. Like LangChain, Spring AI enables integration with multiple LLM providers, such as HuggingFace, which brings more flexibility for Java developers. For example, a simple RAG (Retrieval-Augmented Generation) flow looks like this:
- Loading documents
JsonLoader jsonLoader = new JsonLoader(bikesResource,
        "name", "price", "shortDescription", "description");
List<Document> documents = jsonLoader.load();
- Adding the vectorized data to the Vector store
VectorStore vectorStore = new InMemoryVectorStore(embeddingClient);
vectorStore.add(documents);
- Similarity search
List<Document> similarDocuments = vectorStore.similaritySearch(message);
The detailed example can be found here
These advancements demonstrate the growing integration of LLMs in Java applications, showcasing AI-enhanced capabilities in practical settings. There is more to come. Java’s role in the enterprise, and the evolution it has been through, bring so much more to this field. Take the example of Apache Camel or Mule: bringing more integrations into this space is a strength of Java, and it would be great to see more of it next year.
That’s some of the amazing things to look forward to in 2024 and get our hands dirty with.
One might ask, however: can Java do more in the areas of AI/ML? Let’s take a look at some of those advancements and how they could lead the industry to make use of the Java language in more ways than ever before.
Vectors
“A vector computation consists of a sequence of operations on vectors. A vector comprises a (usually) fixed sequence of scalar values, where the scalar values correspond to the number of hardware-defined vector lanes. A binary operation applied to two vectors with the same number of lanes would, for each lane, apply the equivalent scalar operation on the corresponding two scalar values from each vector.”
– JEP 448
Vectors are important for training models primarily because of the performance optimizations they bring. Imagine running training jobs over thousands of features multiplied by a huge number of data inputs. The complexity is high, and if the operations were done one scalar at a time, they would take ages to complete. A good example is NumPy, a popular tool in the Python ecosystem used for vectorized operations, enabling the handling and manipulation of data when training machine learning models. The ability to take advantage of SIMD (Single Instruction, Multiple Data) instructions on modern CPUs, or of GPU acceleration, boosts the performance of the training process.
In recent years, Java has made advancements in these areas and continues to do so. The Java Vector API was first proposed in JEP 338 and integrated into JDK 16 as an incubating API. As of JDK 22 (JEP 460), it is in its seventh incubation. Another notable effort in this direction is Project Valhalla, which aims to enhance Java’s object model with, e.g., value classes and objects.
The following code is a basic example: two vectors are loaded lane by lane under the mask defined, each vector is multiplied by itself, the two squares are added, and the sum is negated. The resulting vector is then multiplied by a single scalar value and stored in a variable named vm.
static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

public static void vectorComputation(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i += SPECIES.length()) {
        var m = SPECIES.indexInRange(i, a.length);
        var va = FloatVector.fromArray(SPECIES, a, i, m);
        var vb = FloatVector.fromArray(SPECIES, b, i, m);
        var vc = va.mul(va)
                .add(vb.mul(vb))
                .neg();
        vc.intoArray(c, i, m);
        System.out.println(vc);
        // multiply the whole vector with a single float
        var vm = vc.mul(5.0f);
        System.out.println(vm);
    }
}
Running the example with the incubator module enabled:

java --enable-preview --add-modules jdk.incubator.vector VectorExample

Output:

[-2.0, -8.0, -18.0, -32.0, -0.0, -0.0, -0.0, -0.0]
[-10.0, -40.0, -90.0, -160.0, -0.0, -0.0, -0.0, -0.0]
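For comparison, here is the scalar equivalent of the kernel above: each element becomes -(a[i]² + b[i]²), and a second step multiplies every element by 5.0f. With inputs a = b = {1, 2, 3, 4} it reproduces the first four (unmasked) lanes of the printed output, but where the Vector API can process SPECIES.length() elements per instruction, this loop handles one element at a time:

```java
import java.util.Arrays;

public class ScalarExample {

    // Scalar version of the vector kernel: c[i] = -(a[i]^2 + b[i]^2)
    public static void scalarComputation(float[] a, float[] b, float[] c) {
        for (int i = 0; i < a.length; i++) {
            c[i] = -(a[i] * a[i] + b[i] * b[i]);
        }
    }

    // Second step: multiply every element by a single scalar
    public static float[] scale(float[] c, float factor) {
        float[] out = new float[c.length];
        for (int i = 0; i < c.length; i++) {
            out[i] = c[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {1f, 2f, 3f, 4f};
        float[] c = new float[a.length];
        scalarComputation(a, b, c);
        System.out.println(Arrays.toString(c));             // [-2.0, -8.0, -18.0, -32.0]
        System.out.println(Arrays.toString(scale(c, 5.0f))); // [-10.0, -40.0, -90.0, -160.0]
    }
}
```

The JIT compiler can sometimes auto-vectorize such a loop, but the Vector API makes the SIMD shape of the computation explicit and reliable.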
A detailed video with examples
GPU Support
GPU support is crucial for AI/ML workloads because of GPUs’ parallel processing capabilities, computational power, and optimization for the types of calculations commonly encountered in machine learning tasks. GPU acceleration helps speed up training, reduce inference time, and ultimately makes AI/ML applications more efficient and practical. Good examples are real-time use cases like autonomous vehicles, image processing, etc.
TornadoVM serves as a plugin for OpenJDK and GraalVM, enabling developers to offload JVM applications onto diverse hardware platforms, including multi-core CPUs, GPUs, and FPGAs. It’s important to note that this integration has not yet become an integral part of core Java frameworks. Alternatively, leveraging GraalVM to enhance the native experience and harness the potential of native instruction sets presents another approach.
Additionally, Project Babylon is driven by the objective of expanding Java’s applicability to diverse programming paradigms, encompassing SQL, differentiable programming, machine learning models, and GPUs. This endeavor opens up exciting new possibilities, making it a development worth monitoring.
Summary
Author: Shaaf Syed
Shaaf is a Principal Architect at Red Hat, mostly developing code with Keycloak, Quarkus, and Knative. For the last 15 years, he has helped customers create and adopt open source solutions for applications, cloud and managed services, continuous integration environments, and frameworks. Shaaf is a technical editor at InfoQ and spends his time writing about Kubernetes, security, and Java.