Java, migrations argh #@! and now Large Language Models

Shaaf Syed

3 months ago

In today’s post, let’s talk about modernization, a word so loaded that it shows up in almost every executive presentation. Well, maybe that’s an exaggeration on my part, but it does feel that way. Like most of my peers, I have taken on the challenge of demystifying this word in the context of Java applications. Today we will talk about transforming, upgrading, and updating a Java application to a future state, which mostly means refactoring. I won’t talk about moving an application from VMs to Kubernetes or a cloud platform, for example, which could be termed re-platforming. There are typically four considerations that drive a modernization decision.

Technical debt: An inability to change the stack and frameworks leaves organizations stuck and makes every upgrade or update harder.

Flexibility: The inability to make changes to the application in a timely manner. As a result, organizations struggle to bring in new features at their own pace.

Security risks: In some cases, older frameworks and applications threaten the business as well as application and IT operations, e.g., through data leaks.

Costs: Maintaining legacy applications is expensive; vendor support costs for outdated technology are usually higher. Furthermore, skills are costly and hard to find.

Brief history of Konveyor

Migrating legacy Java applications to modern frameworks like Spring Boot, Quarkus, or Micronaut can be daunting. It takes time, is costly, and carries a business risk that calls for validation and a well-defined migration process. Konveyor, an open source project in the CNCF Sandbox, aims to make migrations easier and more cost-effective. It provides a suite of tools that help with the migration process, e.g., migrating apps from virtual machines to containers and static code analysis of Java, .NET, and Go applications, and, most recently, Konveyor AI, a tool leveraging Gen-AI.

Code migrations and static code analysis

Analyzing the code base is one of the first steps in any migration. This can be done manually by a person going over every code segment. However, this person, often referred to as a ‘time traveler’ in the tech industry, needs a deep understanding of the current state of the code and frameworks in use, as well as the future state. They are called ‘time travelers’ because, much like a time traveler in science fiction, they know both a distant past and a future relative to the current point in time.

Enter static code analysis, which gives us insights into the current code base, the as-is state. With Konveyor static code analysis, it is also possible to specify a target framework, so Konveyor reports a list of incidents if, for example, a code block needs to be migrated from Java 8 to 17, javax to Jakarta EE, EJB to REST, JMS to reactive messaging, etc. Konveyor uses the Language Server Protocol (LSP) for static code analysis.

Konveyor rules are written in YAML. A rule consists of metadata, conditions, and actions, and it instructs the analyzer to take the specified actions when the given conditions match. For example, the following rule condition checks for the javax.ejb.Stateful annotation.

  when:
    or:
    - java.referenced:
        location: ANNOTATION
        pattern: javax.ejb.Stateful
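
To make the "condition matches, so take an action" flow concrete, here is a minimal Java sketch. It is only an illustration: the class and method names are hypothetical, and a naive text match stands in for the java.referenced condition, which the real analyzer resolves through a language server rather than by searching strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a rule: a condition plus an incident message.
final class StatefulRuleSketch {

    // Stand-in for the java.referenced condition. This naive text match only
    // illustrates the control flow; it is not how the analyzer finds references.
    static boolean matches(String source) {
        boolean qualifiedUse = source.contains("@javax.ejb.Stateful");
        boolean importedUse = source.contains("import javax.ejb.Stateful;")
                && source.contains("@Stateful");
        return qualifiedUse || importedUse;
    }

    // The "action": report one incident per file whose source matches.
    static List<String> analyze(Map<String, String> sourcesByFile) {
        List<String> incidents = new ArrayList<>();
        for (Map.Entry<String, String> file : sourcesByFile.entrySet()) {
            if (matches(file.getValue())) {
                incidents.add(file.getKey() + ": references javax.ejb.Stateful");
            }
        }
        return incidents;
    }

    public static void main(String[] args) {
        Map<String, String> sources = Map.of(
                "CartBean.java",
                "import javax.ejb.Stateful;\n@Stateful\npublic class CartBean {}",
                "Product.java",
                "public class Product {}");
        analyze(sources).forEach(System.out::println);
    }
}
```

Only CartBean.java is reported here, because it is the only source that both imports and uses the annotation.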

More complex rules can form a ruleset. This way, frameworks, technologies, or certain domain areas can be grouped together. Following is another example of checking localhost use in code and files within a source base.

when:
  builtin.filecontent:
    filePattern: .*\.(java|properties|jsp|jspf|tag|xml|txt|yaml)
    pattern: http(s)?://((localhost)|(127\.0\.0\.1))+(:[0-9]+)?(/.*)?
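
To see what these two expressions accept, the following Java sketch applies them with java.util.regex. The expressions are copied from the rule; everything else is an assumption made for illustration: the class and method names are hypothetical, and it assumes the file pattern must match the whole file name while the content pattern may match anywhere in a line.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mimics the filecontent rule with plain Java regexes.
final class LocalhostRuleSketch {

    // Both expressions are copied from the rule; only the Java string
    // escaping (doubled backslashes) differs.
    static final Pattern FILE_PATTERN =
            Pattern.compile(".*\\.(java|properties|jsp|jspf|tag|xml|txt|yaml)");
    static final Pattern CONTENT_PATTERN =
            Pattern.compile("http(s)?://((localhost)|(127\\.0\\.0\\.1))+(:[0-9]+)?(/.*)?");

    // True when the file name is in scope and the line contains a localhost URL.
    static boolean flags(String fileName, String line) {
        return FILE_PATTERN.matcher(fileName).matches()
                && CONTENT_PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        // true: properties file with a localhost URL
        System.out.println(flags("application.properties", "catalog.url=http://localhost:8080/api"));
        // true: Java file with a loopback address
        System.out.println(flags("Client.java", "String url = \"https://127.0.0.1\";"));
        // false: the host is not localhost
        System.out.println(flags("application.properties", "catalog.url=https://catalog.example.com"));
        // false: .md files are outside the file pattern
        System.out.println(flags("README.md", "see http://localhost:8080"));
    }
}
```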

Analysis can be executed from the command line using the Kantra binary.

konveyor-analyzer --rules rules-file.yaml ...

The results are reported as HTML and YAML files. The latter is also used by the VS Code extension, which reports incidents on code blocks.

The Konveyor community has built about 2400 rules, classified by type of migration.

Introducing LLMs to Static code analysis

Earlier this year, we embarked on a journey to combine the strength of static code analysis with Large Language Models (LLMs). Most LLM use cases have focused on code generation, for example through a chat UI. This is great for sparring, and with some trial and error one perhaps learns faster, though not without making mistakes along the way. A chat UI is inherently a transactional conversation focused on solving a specific issue, where the user repeatedly tries to explain their context and boundaries to the LLM so that the generated output is helpful. The process can be painstaking.

What if we could provide context from static code analysis and give the LLM enough of it to generate more predictable code for our use case and application context?

This is precisely the journey the Konveyor community embarked on. Let’s take a look at how this is done. The tool is called Konveyor AI (Kai).

Static Code Analysis: The tool starts by running analyzer-lsp on the codebase, identifying specific migration issues based on predefined or custom rulesets.

Generate Code Suggestions: Using the static analysis data and relevant solved examples, the LLM creates suggestions for resolving the migration issues. Finally, the suggestions are displayed in the IDE.

Example: Java EE to Quarkus

To build this experience, we created a demo that takes a simple Java EE application and migrates it to Quarkus. In this demo, we go through a standard migration scenario where a coolstore app (an e-store with cool swag) that is written in Java EE and deployed on a platform like WildFly is migrated to Quarkus. Let’s take a look at what’s going on.

The first step is static code analysis with Jakarta EE and Quarkus as targets.
Next, we move the simple javax namespaces to jakarta. This is a simple use case that, in most cases, could likely also be done without an LLM.
Next, we convert JMS beans to Quarkus reactive messaging. This is a big leap from the previous namespace changes. The static code analysis detects that JMS changes are needed when moving to Quarkus, and the LLM integration provides comprehensive reasoning and a git patch for the move to reactive messaging.
We also do the same for EJB to REST.
Note that application.properties also changes as the changes are made to the codebase.
Finally, the static code analysis also identifies which files we no longer need.

Okay, so that’s the process that we just went through. Looks great, doesn’t it!?

Let’s dissect this flow of changes further for our understanding.

Large Language Models (LLMs) have a limited context size. Only a certain number of tokens can be processed in a given request. Using static code analysis, Kai can narrow the problem down to specific code areas and generate meaningful results. Furthermore, since Kai can reuse the static code analysis from multiple applications, the few-shot prompting technique gives the LLM additional context to generate relevant code, even when dealing with unfamiliar frameworks.
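
As a rough illustration of that idea, the sketch below assembles a few-shot prompt from a single incident. This is not Kai's prompt format: the class, the method, and the prompt wording are all hypothetical, and the only point is that the prompt carries the rule message, a few solved examples, and the flagged snippet rather than the whole codebase.

```java
import java.util.List;

// Hypothetical sketch: these names are illustrative and do not come from Kai.
final class FewShotPromptSketch {

    // A previously solved incident: the code before and after the fix.
    static final class SolvedExample {
        final String before;
        final String after;

        SolvedExample(String before, String after) {
            this.before = before;
            this.after = after;
        }
    }

    // Builds a prompt from one incident only: the rule message, a few solved
    // examples, and the flagged snippet, instead of the whole codebase.
    static String buildPrompt(String ruleMessage, List<SolvedExample> examples, String flaggedSnippet) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Migration issue: ").append(ruleMessage).append("\n\n");
        for (int i = 0; i < examples.size(); i++) {
            SolvedExample example = examples.get(i);
            prompt.append("Example ").append(i + 1).append(" (before):\n")
                  .append(example.before).append("\n");
            prompt.append("Example ").append(i + 1).append(" (after):\n")
                  .append(example.after).append("\n\n");
        }
        prompt.append("Code to migrate:\n").append(flaggedSnippet).append("\n");
        return prompt.toString();
    }

    public static void main(String[] args) {
        String prompt = buildPrompt(
                "Replace the javax.inject import with jakarta.inject",
                List.of(new SolvedExample("import javax.inject.Named;", "import jakarta.inject.Named;")),
                "import javax.inject.Inject;");
        System.out.println(prompt);
    }
}
```

Keeping the prompt scoped to one incident is what keeps the request within the model's context size.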

Another challenge that we face is repeatable code changes. For example, a simple pattern for changing logging across the application’s entire codebase shouldn’t need to be triggered by the user every time. What if this could be done reactively and perhaps all at once? Drawing inspiration from Microsoft’s CodePlan research should allow Kai to automatically propagate changes to multiple related files.
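
One way to picture change propagation is a worklist over a file dependency map. The sketch below is neither CodePlan's algorithm nor Kai's implementation; the names and the shape of the dependency map are assumptions. It only shows how a single edit can fan out to the files that directly or indirectly depend on the edited one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of propagating one change to related files.
final class ChangePropagationSketch {

    // dependents maps a file to the files that reference it.
    // Returns every file reachable from changedFile, in breadth-first order.
    static Set<String> filesToRevisit(String changedFile, Map<String, List<String>> dependents) {
        Set<String> toRevisit = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(changedFile);
        while (!worklist.isEmpty()) {
            String current = worklist.poll();
            for (String dependent : dependents.getOrDefault(current, List.of())) {
                // Skip the original file and anything already queued, so cycles terminate.
                if (!dependent.equals(changedFile) && toRevisit.add(dependent)) {
                    worklist.add(dependent);
                }
            }
        }
        return toRevisit;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
                "Logger.java", List.of("OrderService.java", "CartService.java"),
                "OrderService.java", List.of("OrderResource.java"));
        System.out.println(filesToRevisit("Logger.java", dependents));
    }
}
```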

Finally, Kai also includes an agent that iteratively refines code suggestions by checking the validity of the initial output and providing feedback to the LLM. This helps ensure the quality of the final code solution and offers a way to interact with the code base reactively.
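
The refinement loop can be sketched as follows. This is a simplified illustration under stated assumptions, not Kai's agent: the LLM call and the validity check are passed in as plain functions, and all names, including the attempt limit, are hypothetical.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of a generate, validate, feed back loop.
final class RefinementLoopSketch {

    // suggest:  (issue, feedback from the previous attempt) -> candidate code
    // validate: candidate -> empty if acceptable, otherwise an error message
    static Optional<String> refine(String issue,
                                   BiFunction<String, String, String> suggest,
                                   Function<String, Optional<String>> validate,
                                   int maxAttempts) {
        String feedback = "";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = suggest.apply(issue, feedback);
            Optional<String> error = validate.apply(candidate);
            if (!error.isPresent()) {
                return Optional.of(candidate); // the candidate passed validation
            }
            feedback = error.get(); // hand the error back for the next attempt
        }
        return Optional.empty(); // no valid candidate within the attempt limit
    }

    public static void main(String[] args) {
        // Stub "LLM": gets it wrong first, then corrects itself once it sees feedback.
        BiFunction<String, String, String> suggest = (issue, feedback) ->
                feedback.isEmpty() ? "import javax.inject.Inject;" : "import jakarta.inject.Inject;";
        // Stub validator: rejects anything that still references javax.
        Function<String, Optional<String>> validate = code ->
                code.contains("javax.") ? Optional.of("still uses javax") : Optional.<String>empty();
        System.out.println(refine("javax to jakarta", suggest, validate, 3));
    }
}
```

Bounding the number of attempts keeps the loop from running indefinitely when the validator never accepts a candidate.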

Recap

Konveyor AI (Kai) simplifies modernizing legacy Java applications by integrating static code analysis with Large Language Models (LLMs). Designed to assist in complex migrations, Kai analyzes codebases using Konveyor’s Language Server Protocol (LSP)-based static analysis, identifying migration issues based on YAML-defined rulesets. These rules detect tasks such as migrating namespaces (e.g., javax to jakarta), converting EJB to REST, or transitioning JMS to Quarkus reactive messaging.

Work on the Kai project is underway, and you are all welcome to try it out. The community is producing evaluation builds as Kai takes more shape toward a comprehensive user experience.

Author: Shaaf Syed

Shaaf is a Principal Architect at Red Hat and a contributor to the Konveyor community, a CNCF Sandbox project. He mostly develops code with Java, Node, and recently AI/ML. For the last 15 years, he has helped customers create and adopt open source solutions for applications, cloud and managed services, continuous integration environments, and frameworks. Shaaf is a technical editor at InfoQ and spends his time writing about Kubernetes, security, and Java.
