[DISCUSS] IBM using LLMs to convert COBOL to Java

bahmanm@lemmy.ml · edited 1 year ago


FoxBJK@midwest.social · 1 year ago

Converting ancient code to a more modern language seems like a great use for AI, in all honesty. There aren’t a lot of COBOL devs out there, but once it’s Java, the number of coders available to fix/improve whatever ChatGPT spits out jumps exponentially!

gravitas_deficiency@sh.itjust.works · 1 year ago

The fact that you say that tells me that you don’t know very much about software engineering. This whole thing is a terrible idea, and has the potential to introduce tons of incredibly subtle bugs and security flaws. ML + LLM is not ready to be used for stuff like this at the moment in anything outside of an experimental context. Engineers are generally - and with very good reason - deeply wary of “too much magic” and this stuff falls squarely into that category.

FoxBJK@midwest.social · 1 year ago

All of that is mentioned in the article. Given how much it cost last time a company tried to convert from COBOL, don’t be surprised when you see more businesses opt for this cheaper path. Even if it only converts half of the codebase, that’s still a huge improvement.

Doing this manually is a tall order…

sugar_in_your_tea@sh.itjust.works · 1 year ago

And doing it manually is probably cheaper in the long run, especially considering that COBOL tends to power some very mission critical tasks, like financial systems.

The process should be:

1. set up a way to have part of your codebase in your new language
2. write tests for the code you’re about to port
3. port the code
4. go to 2 until it’s done

If you already have a robust test suite, step 2 becomes much easier.
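To make steps 2 and 3 concrete, here is a minimal sketch of a characterization ("golden master") test. It is written in Python only to keep it short; the same idea applies to a COBOL-to-Java port. Everything in it is hypothetical: `legacy_interest_cents` merely stands in for a routine being ported, and the test cases are made up. The point is that the expected values are recorded from the old code before the port, so a port that merely looks equivalent gets caught.

```python
from decimal import Decimal, ROUND_HALF_UP


def legacy_interest_cents(balance_cents, rate_bp):
    # Hypothetical stand-in for the old routine: exact decimal math,
    # with halves rounded up.
    exact = Decimal(balance_cents) * Decimal(rate_bp) / Decimal(10000)
    return int(exact.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def naive_port_cents(balance_cents, rate_bp):
    # Looks equivalent, but round() on a float rounds halves to even.
    return round(balance_cents * rate_bp / 10000)


def careful_port_cents(balance_cents, rate_bp):
    # Integer-only port that rounds halves up (non-negative inputs only).
    return (balance_cents * rate_bp + 5000) // 10000


# Made-up inputs: (balance in cents, rate in basis points).
CASES = [(100000, 125), (250, 100), (999, 333), (0, 500), (450, 100)]

# Step 2: record what the legacy code does *before* touching it.
GOLDEN = {case: legacy_interest_cents(*case) for case in CASES}


def mismatches(port):
    # Step 3 check: every case where the port disagrees with the recording.
    return [case for case in CASES if port(*case) != GOLDEN[case]]


print(mismatches(naive_port_cents))
print(mismatches(careful_port_cents))
```

The naive port disagrees with the recording only on inputs that land exactly on a half cent, which is the kind of subtle bug that is easy to miss in review and easy to catch with recorded outputs.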

We’re doing this process on a simpler task of going from Flow (JavaScript with types) to TypeScript, but I did a larger transition from JavaScript to Go and Ruby to Python using the same strategy and I’ve seen lots of success stories with other changes (e.g. C to Rust).

If AI is involved, I would personally use it only for step 2 because writing tests is tedious and usually pretty easy to review. However, I would never use it for both step 2 and 3 because of the risk of introducing subtle bugs. LLMs don’t understand the code, they merely spot patterns and that’s absolutely not what you want.

gravitas_deficiency@sh.itjust.works · edited 1 year ago

Yeah, I read the article.

They’re MASSIVELY handwaving a lot of detail away. Moreover, they’re taking the “we’ll fix it in post” approach by suggesting “we can just run an armful of security analysis software on the code after the system spits something out”. While that’s a great sentiment, you (and everyone considering this approach) need to consider that complex systems are pretty much NEVER perfect. There WILL be misses. Add to this the fact that a ton of the organizations that still use COBOL are banks, which are generally considered fairly critical to the day-to-day operation of our society, and you can see why I am incredibly skeptical of this whole line of thinking.

I’m sure the IBM engineers who made the thing are extremely good at what they do, but at the same time, I have a lot less faith in the organizations that will actually employ the system. In fact, I wouldn’t be terribly shocked to find that banks would assign an inappropriately junior engineer to the task - perhaps even an intern - because “it’s as simple as invoking a processing pipeline”. This puts a truly hilarious amount of trust into what’s effectively a black box.

Additionally, for a good engineer, learning any given programming language isn’t actually that hard. And if these transition efforts are done in what I would consider to be the right way, you’d also have a team of engineers who know both the input and output languages such that they can go over (at the very, very least) critical and logically complex areas of the code to ensure accuracy. But since this is all about saving money, I’d bet that step simply won’t be done.

IHeartBadCode@kbin.social · 1 year ago

For those who have never worked on legacy systems: anyone who suggests “we’ll fix it in post” is asking you to do something that just CANNOT happen.

The systems I code for, if something breaks, we’re going to court over it. Not “oh no, let’s patch it real quick”; your ass is going to be cross-examined on why the eff your system just wrote thousands of legal contracts that cannot be upheld as valid.

Yeah, the “fix it in post” shit that any article suggests, especially the one linked here, should be considered trash written by someone with no remote idea how deep in shit you can be if you start getting wild hairs up your ass about changing out parts of a critical system.

gravitas_deficiency@sh.itjust.works · 1 year ago

And that’s precisely the point I’m making. The systems we’re talking about here are almost exclusively banking systems. If you don’t think there will be some Fucking Huge Lawsuits over any and all serious bugs introduced by this - and there will be bugs introduced by this - you straight up do not understand what it’s like to develop software for mission-critical applications.

PuppyOSAndCoffee@lemmy.ml · 1 year ago

Trusting IBM engineers, perhaps…sales/marketing? Oooh now I am skeptical.

Kerfuffle@sh.itjust.works · 1 year ago

“Even if it only converts half of the codebase, that’s still a huge improvement.”

The problem is it’ll convert 100% of the code base but (you hope) 50% of it will actually be correct. Which 50%? That’s left as an exercise to the reader. There’s also no human, no plan, and not necessarily any logic behind how it was converted, so code like that can be very difficult to understand, and you can’t ask the person who wrote it why stuff is a certain way.

Understanding large, complex codebases one didn’t write is a difficult task even under pretty ideal conditions.

PuppyOSAndCoffee@lemmy.ml · edited 1 year ago

First, odds are only half the code is used, and in that half, 20% has bugs that the system design obscures. It’s that 20% that tends to take the lion’s share of modernization effort.

It wasn’t a bug then, though it was there, but it is a bug now.

FoxBJK@midwest.social · 1 year ago

“The problem is it’ll convert 100% of the code base”

Please go read the article. They specifically say they aren’t doing this.

Kerfuffle@sh.itjust.works · 1 year ago

I was speaking generally. In other words, the LLM will convert 100% of what you tell it to but only part of the result will be correct. That’s the problem.

FoxBJK@midwest.social · 1 year ago

And in this case they’re not doing that:

“IBM built the Code Assistant for IBM Z to be able to mix and match COBOL and Java services,” Puri said. “If the ‘understand’ and ‘refactor’ capabilities of the system recommend that a given sub-service of the application needs to stay in COBOL, it’ll be kept that way, and the other sub-services will be transformed into Java.”

So you might feed it your COBOL code and find it only converts 40%.

Kerfuffle@sh.itjust.works · 1 year ago

“So you might feed it your COBOL code and find it only converts 40%.”

I’m afraid you’re completely missing my point.

The system gives you a recommendation that has a 50% chance of being correct.

Let’s say the system recommends converting 40% of the code base.

The system converts 40% of the code base. 50% of the converted result is correct.

50% is a random number picked out of thin air. The point is that what you end up with has a good chance of being incorrect and all the problems I mentioned originally apply.
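Spelled out as arithmetic, using the same made-up numbers (40% of the code base converted, 50% of the converted result correct), a small sketch looks like this. `conversion_outcome` is a hypothetical helper, not anything from IBM’s tool.

```python
from fractions import Fraction


def conversion_outcome(converted, correct_rate):
    # Shares of the WHOLE code base, as exact fractions.
    good = converted * correct_rate          # converted and correct
    bad = converted * (1 - correct_rate)     # converted but wrong
    untouched = 1 - converted                # left in the old language
    return good, bad, untouched


good, bad, untouched = conversion_outcome(Fraction(2, 5), Fraction(1, 2))
print(good, bad, untouched)  # 1/5 1/5 3/5
```

So with these placeholder numbers, a fifth of the whole code base ends up converted correctly and another fifth ends up converted incorrectly, and nothing in the output tells you which fifth is which.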

FoxBJK@midwest.social · 1 year ago

One would hope that IBM’s selling a product that has a higher success rate than a coinflip, but the real question is long-term project cost. Given the example of a $700 million project, how much does AI need to convert successfully before it pays for itself? If we end up with 20% of the original project successfully done by AI, that’s massive savings.
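A rough sketch of that break-even question is below. The $700 million figure and the 20% share come from the example above; the review overhead and the tool cost are pure assumptions picked for illustration, and both function names are hypothetical.

```python
from fractions import Fraction


def net_savings(project_cost, ai_share, review_overhead, tool_cost):
    # Manual work the AI output replaces, minus the human review/fix effort
    # spent on that output (assumed to be a fraction of the replaced cost),
    # minus the cost of the tool itself.
    replaced = project_cost * ai_share
    return replaced - replaced * review_overhead - tool_cost


def break_even_share(project_cost, review_overhead, tool_cost):
    # Share of the project the AI must handle for net_savings to hit zero.
    return Fraction(tool_cost) / (project_cost * (1 - review_overhead))


# Costs in millions of dollars. Assumed: half of the replaced cost goes
# back into review, and the tool costs 20 million.
print(net_savings(700, Fraction(1, 5), Fraction(1, 2), 20))  # 50
print(break_even_share(700, Fraction(1, 2), 20))             # 2/35
```

Under those assumptions, 20% done by AI nets about $50 million, and the break-even point is roughly 6% of the project. The answer is very sensitive to the review overhead, which is exactly the number nobody knows up front.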

The software’s only going to get better, and in spite of how lucrative a COBOL career is, we don’t exactly see a sharp increase in COBOL devs coming out of schools. We either start coming up with viable ways to move on from this language or we admit it’s too essential to ever be forgotten and mandate every CompSci student learn it before graduating.

HellAwaits@lemm.ee · 1 year ago

Is ChatGPT magic to people? ChatGPT should never be used in this way because the potential of critical errors is astronomically high. IBM doesn’t know what it’s doing.

socsa@lemmy.ml · 1 year ago

I’m more alarmed at the conversation in this thread about migrating these COBOL apps to Java. Maybe I am the one who is out of touch, but what the actual fuck? Is it just because of the large Java hiring pool? If you are effectively starting from scratch, why in the ever loving fuck would you pick Java?

NightAuthor@beehaw.org · 1 year ago

Java is the new COBOL; all the enterprises love it.

LeylaLove [she/her, love/loves]@hexbear.net · 1 year ago

This is what I’m thinking. Even the few people I know IRL who know COBOL from their starting days say it’s a giant pain in the ass as a language. It’s not like it’s really gonna cost all that much time to do compared to paying labor to rewrite it from the base, even if they don’t end up using it. Sure, correcting bad code can take a lot of time to do manually. But important code being in COBOL is a ticking time bomb; they gotta do something.

gravitas_deficiency@sh.itjust.works · 1 year ago

Counterpoint: if it ain’t broke, don’t fix it.

FaceDeer@kbin.social · edited 1 year ago

Counter counterpoint: The longer you let it sit the more obsolete the language becomes and the harder it becomes to fix it when something does break.

This is essentially preventative maintenance.

gravitas_deficiency@sh.itjust.works · 1 year ago

Counter^3 point: a system that was thoroughly engineered and tested a long time ago, and that still fulfills all the technical requirements that the system must meet will simply not spontaneously break.

Analogously: this would be like using an ML + LLM to rewrite the entire Linux kernel in Rust. While an (arguably) admirable goal, doing that in one fell swoop would be categorically rejected by the Linux community, to the extent that if some group of people somehow unilaterally just merged that work, the rest of the Linux kernel dev community would almost certainly trigger a fork of the entire kernel, with the vast majority of the community using the forked version as the new source of truth.

This is not preventative maintenance. This is fixing something that’s not broken, that has moreover worked reliably, performantly (enough), and correctly for literal decades. You do not let a black box rewrite your whole codebase in another language and then expect everything to magically work.


IBM taps AI to translate COBOL code to Java | TechCrunch