
Can Meta AI code? I tested it against Llama, Gemini and ChatGPT – it wasn’t even close

Originally posted on ZDNET.

I threw my suite of simple coding tests against Meta AI. The results proved there’s really only one AI chatbot worth your time for programming.

How well do AI tools write code? Over the past year or so, I’ve been putting large language models through a series of tests to see how well they handle some fairly basic programming challenges.

The idea is simple: if they can’t handle these basic challenges, it’s probably not worth asking them to do anything more complex. On the other hand, if they can handle these basic challenges, they might become helpful assistants to programmers looking to save some time.

To set this benchmark, I’ve been using three tests (and just added a fourth). They are:

  1. Writing a WordPress plugin: This tests basic web development using the PHP programming language inside WordPress. It also requires a bit of user interface building. If an AI chatbot passes this test, it can help create rudimentary code as an assistant to web developers (a minimal sketch of the scaffolding such a plugin needs follows this list). I originally documented this test in “I asked ChatGPT to write a WordPress plugin I needed. It did it in less than 5 minutes.”
  2. Rewriting a string function: This test evaluates how an AI chatbot updates a utility function for better functionality. If an AI chatbot passes this test, it might be able to help create tools for programmers. If it fails, first-year programming students can probably do a better job. I originally documented this test in “OK, so ChatGPT just debugged my code. For real.”
  3. Finding an annoying bug: This test requires intimate knowledge of how WordPress works because the obvious answer is wrong. If an AI chatbot can answer this correctly, then its knowledge base is pretty complete, even with frameworks like WordPress. I originally documented this test in “OK, so ChatGPT just debugged my code. For real.”
  4. Writing a script: This test asks an AI chatbot to program using two fairly specialized programming tools not known to many users. It essentially tests the AI chatbot’s knowledge beyond the big languages. I originally documented this test in “Google unveils Gemini Code Assist and I’m cautiously optimistic it will help programmers.”
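For readers who haven’t built a WordPress plugin before, here is a minimal sketch of the scaffolding the first test calls for: a plugin header, an admin menu entry, and a small form with a Randomize button. The plugin name, slug, and function names below are illustrative assumptions on my part, not the test prompt and not output from any of the chatbots tested.

```php
<?php
/**
 * Plugin Name: Name Randomizer (illustrative sketch)
 * Description: Minimal scaffolding of the kind the plugin test asks for.
 */

// Register an admin menu entry whose page holds the randomizer form.
add_action( 'admin_menu', function () {
    add_menu_page(
        'Name Randomizer',        // page title
        'Name Randomizer',        // menu title
        'manage_options',         // required capability
        'name-randomizer-sketch', // menu slug
        'nr_sketch_render_page'   // callback that prints the UI
    );
} );

// Print a textarea for the names and a Randomize submit button.
function nr_sketch_render_page() {
    ?>
    <div class="wrap">
        <h1>Name Randomizer</h1>
        <form method="post">
            <?php wp_nonce_field( 'nr_sketch_randomize' ); ?>
            <textarea name="nr_names" rows="10" cols="40"><?php
                echo esc_textarea( wp_unslash( $_POST['nr_names'] ?? '' ) );
            ?></textarea>
            <p><input type="submit" class="button button-primary" value="Randomize"></p>
        </form>
    </div>
    <?php
}
```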

I’m going to take you through each test and compare the results to those of the other AI chatbots that I’ve tested. That way, you’ll be better able to gauge how AI chatbots differ when it comes to coding performance.

This time, I’m putting Meta’s new Meta AI to the test. Let’s get started.

In the first test, writing the WordPress plugin, both AI chatbots generated the required fields, but ChatGPT’s presentation was cleaner and included headings for each of the fields. ChatGPT also placed the Randomize button in a location that better fit the functionality.

In terms of operation, ChatGPT took in a set of names and produced randomized results, as expected. Unfortunately, Meta AI took in a set of names, flashed something, and then presented a white screen. This is commonly described in the WordPress world as “The White Screen of Death.”
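To show what “working” looks like here, below is a rough sketch of the randomization step the plugins were expected to perform, assuming one name per line in the submitted form; the function name and nonce check are my own illustrative choices, not either chatbot’s actual code. A white screen like the one Meta AI produced generally means a fatal PHP error somewhere in a handler like this, for example from calling an undefined function.

```php
// Shuffle the submitted names, one per line, and hand back the new order.
function nr_sketch_randomize_names() {
    // Bail out quietly if nothing was submitted or the nonce check fails.
    if ( empty( $_POST['nr_names'] ) || ! check_admin_referer( 'nr_sketch_randomize' ) ) {
        return array();
    }

    // Split on newlines, trim whitespace, and drop empty entries.
    $names = array_filter( array_map( 'trim', explode( "\n", wp_unslash( $_POST['nr_names'] ) ) ) );

    shuffle( $names ); // randomize the order in place
    return $names;
}
```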

Here are the aggregate results of this and previous tests:

  • Meta AI: Interface: adequate, functionality: fail
  • Meta Code Llama: Complete failure
  • Google Gemini Advanced: Interface: good, functionality: fail
  • ChatGPT: Interface: good, functionality: good

The second test, rewriting a string function, checks dollars-and-cents conversions. Meta AI had four main problems:

  • It changed values that were already correct.
  • It didn’t properly test for numbers with multiple decimal points.
  • It failed completely if a dollar amount had fewer than two decimal places (in other words, it would fail with $5 or $5.2 as inputs).
  • It rejected correct numbers once processing was complete because it had formatted those numbers incorrectly.
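To make those failure modes concrete, here is a hedged sketch of the behavior the test expects, assuming valid input is a dollar figure with at most two decimal places; the function name and exact validation rules are my own illustration, not the test’s actual code.

```php
// Accept strings like "$5", "$5.2", or "5.20"; reject anything with more
// than one decimal point or more than two decimal digits; pad valid
// amounts to two decimals without changing their value.
function normalize_dollar_amount( $input ) {
    $value = ltrim( trim( $input ), '$' );

    // Reject inputs with multiple decimal points (e.g. "5.2.0").
    if ( substr_count( $value, '.' ) > 1 ) {
        return false;
    }

    // Reject anything that isn't digits plus an optional decimal part.
    if ( ! preg_match( '/^\d+(\.\d{0,2})?$/', $value ) ) {
        return false;
    }

    // "5" becomes "5.00" and "5.2" becomes "5.20"; "5.25" is untouched.
    return number_format( (float) $value, 2, '.', '' );
}

// normalize_dollar_amount( '$5' )    returns "5.00"
// normalize_dollar_amount( '$5.2' )  returns "5.20"
// normalize_dollar_amount( '5.2.0' ) returns false
```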
