Professor Caplan reports that ChatGPT, the new artificial intelligence that is all the rage, can pass his economics midterm and get an A.

What about computer science? I gave GPT one of the problems I give my students…

Write a Java program that calculates the sum of numbers from 1 to 10,000 (including 1 and 10,000) but omitting numbers that are divisible by three and also omitting numbers whose hundredth digit is 2 or 3 (e.g. 1201 or 3313).

Bing/GPT solves the problem with ease:

    public class SumOfNumbers {
        public static void main(String[] args) {
            int sum = 0;
            for (int i = 1; i <= 10000; i++) {
                if (i % 3 != 0 && (i / 100) % 10 != 2 && (i / 100) % 10 != 3) {
                    sum += i;
                }
            }
            System.out.println("The sum of numbers from 1 to 10,000 (including 1 and 10,000) but omitting numbers that are divisible by three and also omitting numbers whose hundredth digit is 2 or 3 is: " + sum);
        }
    }

Maybe it somehow knew just this particular answer. What if I tweak the question a bit?

Write a Java program that calculates the sum of numbers from 1 to 10,000 (including 1 and 10,000) but omitting numbers where the hundredth digit is divisible by 3.

Bing/GPT gets it almost correct:

    public class SumOfNumbers {
        public static void main(String[] args) {
            int sum = 0;
            for (int i = 1; i <= 10000; i++) {
                if ((i / 100) % 3 != 0) {
                    sum += i;
                }
            }
            System.out.println("The sum of numbers from 1 to 10,000 (including 1 and 10,000) but omitting numbers where the hundredth digit is divisible by 3 is: " + sum);
        }
    }

What about other questions I often ask… such as…

Write a Java function which can invert the elements in an array of integers.

Bing/GPT gets it right, complete with a detailed explanation…

    public static void invertArray(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n / 2; i++) {
            int temp = arr[i];
            arr[i] = arr[n - i - 1];
            arr[n - i - 1] = temp;
        }
    }

What about database work? Here is a basic question I ask…

What can you say about the attributes A and B if A→B and B→A?

Bing/GPT tells me that…

*If A→B and B→A, then A and B are said to be functionally dependent on each other. This means that the value of A uniquely determines the value of B, and vice versa. In other words, if you know the value of A, you can determine the value of B, and if you know the value of B, you can determine the value of A*

This is correct.
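As a rough illustration of what the mutual dependency means on actual data, the sketch below checks whether a small table of (A, B) rows violates A→B or B→A. The class name, the `holds` helper and the sample rows are all hypothetical. Note also that a functional dependency is a constraint on the schema, so a check on one table instance can only show that the dependency is not violated there.

```java
import java.util.HashMap;
import java.util.Map;

class FunctionalDependency {
    // Returns true if, within these rows, the value in column `from`
    // always appears with the same value in column `to`.
    static boolean holds(int[][] rows, int from, int to) {
        Map<Integer, Integer> seen = new HashMap<>();
        for (int[] row : rows) {
            Integer previous = seen.putIfAbsent(row[from], row[to]);
            if (previous != null && previous != row[to]) {
                return false; // same `from` value paired with two different `to` values
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical data: column 0 is A, column 1 is B.
        int[][] oneToOne = {{1, 10}, {2, 20}, {1, 10}};
        int[][] manyToOne = {{1, 10}, {2, 10}};
        System.out.println(holds(oneToOne, 0, 1) && holds(oneToOne, 1, 0));   // both directions hold
        System.out.println(holds(manyToOne, 0, 1) && holds(manyToOne, 1, 0)); // B does not determine A
    }
}
```

When both directions hold, the values of A and B are in one-to-one correspondence, which is what the quoted answer describes.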

What about a more technical database question?

Bing/GPT answers…

SELECT COUNT(*) FROM table_name WHERE A = B

Unfortunately, this extends to the exams themselves. Bing/GPT gets much of an exam right. In the current “introduction to programming” exam, there was only one question that GPT got obviously wrong, and even there I would have awarded some points ‘for effort’.

Bing/GPT can pass my introductory computer science courses.

I further asked it to solve a problem using the C++ simdjson library. We are definitely well beyond introductory courses here. GPT did well enough, as the answer is almost correct:

Then I asked it to produce its own SIMD routine in C++ to count the number of characters equal to ‘.’ and it did so… It is almost correct…

If the second answer is correct, it is only by coincidence: the program as written will improperly exclude 1200 from the sum, but include 1300.
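To make the two counterexamples concrete, here is a minimal sketch that evaluates both conditions on 1200 and 1300. The class and method names are made up for illustration; the first condition is the one from the “almost correct” program, and the second isolates the hundreds digit before testing divisibility.

```java
class HundredsDigitCheck {
    // Condition from the "almost correct" program: tests the whole quotient i / 100.
    static boolean keptByQuotientTest(int i) {
        return (i / 100) % 3 != 0;
    }

    // Condition the question asks for: isolate the hundreds digit first.
    static boolean keptByDigitTest(int i) {
        int hundredsDigit = (i / 100) % 10;
        return hundredsDigit % 3 != 0;
    }

    public static void main(String[] args) {
        for (int i : new int[] {1200, 1300}) {
            System.out.println(i + ": quotient test keeps it = " + keptByQuotientTest(i)
                    + ", digit test keeps it = " + keptByDigitTest(i));
        }
    }
}
```

For 1200 the quotient is 12, which is divisible by 3, although the hundreds digit is 2. For 1300 the quotient is 13, which is not divisible by 3, although the hundreds digit is 3.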

You are right but the Bing ChatBot returns the correct code:

    public class Sum {
        public static void main(String[] args) {
            int sum = 0;
            for (int i = 1; i <= 10000; i++) {
                int hundredthDigit = (i / 100) % 10;
                if (hundredthDigit % 3 != 0) {
                    sum += i;
                }
            }
            System.out.println(sum);
        }
    }

There has actually been research on this:

The Robots Are Coming: Exploring the Implications of OpenAI Codex on Introductory Programming. https://dl.acm.org/doi/pdf/10.1145/3511861.3511863

Made a video covering that paper: https://youtu.be/kvXsKPt3aRM

Prof. Crista Lopes has been trying it on compilers/PL questions: https://tagide.com/education/the-end-of-programming-as-we-know-it/

Write a Java program that calculates the sum of numbers from 1 to 10,000 (including 1 and 10,000) but omitting numbers where the hundredth digit is divisible by 3.

wrong:

`if ((i / 100) % 3 != 0)`

gives sum = 33006700

correct:

`if ((i / 100) % 10 != 3)`

gives sum = 45155500

“correct:

`if ((i / 100) % 10 != 3)`

gives sum = 45155500” from the first solution is still wrong.

`if ((i / 100) % 10 % 3 != 0)`

omits numbers where the hundredth digit is divisible by 3.

You are very much correct. It should have been

`if ((i / 100) % 10 % 3 != 0)` …

Or

`if ((i / 100) % 30 != 0)` …
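The two conditions proposed just above can be compared by brute force over the whole range. The sketch below (class and method names are hypothetical) looks for the first number on which they disagree. It suggests that the `% 30` variant is not equivalent to the `% 10 % 3` one: the two already differ at 300, where the quotient is 3 and `3 % 30` is not zero.

```java
class ConditionComparison {
    // Keeps i when its hundreds digit is not divisible by 3.
    static boolean digitTest(int i) {
        return (i / 100) % 10 % 3 != 0;
    }

    // The alternative proposed in the thread.
    static boolean mod30Test(int i) {
        return (i / 100) % 30 != 0;
    }

    // Returns the first i in 1..10000 where the two tests disagree, or -1 if none.
    static int firstDisagreement() {
        for (int i = 1; i <= 10000; i++) {
            if (digitTest(i) != mod30Test(i)) {
                return i;
            }
        }
        return -1;
    }

    // Sum of the numbers in 1..10000 kept by the digit test.
    static long sumWithDigitTest() {
        long sum = 0;
        for (int i = 1; i <= 10000; i++) {
            if (digitTest(i)) {
                sum += i;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("first disagreement: " + firstDisagreement());
        System.out.println("sum with digit test: " + sumWithDigitTest());
    }
}
```

Note that the digit test also drops 1 to 99 and 10,000, since their hundreds digit is 0 and 0 is divisible by 3. Whether that is the intended reading of the question is a separate matter.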

For the second question, it’s interesting that

1. If you start a fresh conversation with Bing chat, it gets it right without errors, so it is probably influenced by the first answer somehow.

2. Even if it gets it wrong, asking it to double-check may fix the answer. So I guess that if students review their solutions before submitting the exam paper, they might fix them.

But what does this mean for us?

Clearly, online exams are dead.

Homeworks are dead, too.

But if people can pass homeworks by pasting them into ChatGPT, they will do so, and they will not learn as much as if they had done them themselves. But then they will score worse in the exams…

Or do we allow ChatGPT and make everything much harder, since the computer will do all the easy parts for us in the future?

In this security environment, you are correct!

Having a strong dislike of Luddites in all forms, I tend to object on principle to statements like “we should forbid technology X”. Even disregarding principle, it tends not to work.

Instead, let’s use this opportunity to try and improve our educational system, which is a disaster anyway. We could, for example, present the “slightly incorrect” second program to students and ask them to explain WHY it’s incorrect. (Incidentally, I asked ChatGPT this question and it gave an incorrect explanation…)

Ultimately, I think “homework is dead” is correct. Online exams, at least in this field, which is my main interest, will probably be more similar to a pair programming / mob programming session, where the student(s) work together with the professor to create an application, maybe similar to today’s hackathons. That would definitely not scale anywhere near today’s “mass exam” model though.

Idk, those are unrealistic programming situations, and they will further disconnect the “education” world from programming. I would already argue that such courses make you harder to teach once you’re slowing down tickets on my Jira board. xD

The ‘.’-counting SIMD code assumes the input length is a multiple of 16. When it isn’t, as in the sample given, it also counts ‘.’s beyond the end of the input.
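The usual remedy is to process only the full 16-byte blocks in the wide loop and to handle the leftover bytes separately. The sketch below shows that structure in plain Java. It is not SIMD code and not the routine GPT produced: the inner loop over `BLOCK` bytes merely stands in for one vector comparison, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DotCounter {
    static final int BLOCK = 16; // stands in for one 128-bit register of bytes

    static int countDots(byte[] input) {
        int count = 0;
        int i = 0;
        // Full blocks: the part a SIMD loop could process 16 bytes at a time.
        for (; i + BLOCK <= input.length; i += BLOCK) {
            for (int j = 0; j < BLOCK; j++) {
                if (input[i + j] == '.') {
                    count++;
                }
            }
        }
        // Tail: the remaining bytes, handled one by one so nothing past the end is read.
        for (; i < input.length; i++) {
            if (input[i] == '.') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 21 bytes: one full block plus a 5-byte tail.
        byte[] sample = "a.b.c.d.e.f.g.h.i.j.k".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countDots(sample));
    }
}
```

In C++ the same split applies, with the difference that reading past the end there is undefined behavior instead of an exception.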

Normally such problems are solved with Gauss sums (one sum from 1 to n, then subtract the sums of the numbers that have to be excluded). For a computer science exam, that should be the solution that gets the most points, since it is the more efficient one.

Even though those models are impressive in producing a result, there are two main issues. Some results are wrong even though they resemble the correct one, but even worse, the results are trivial and unoptimized. If programmers ever come to really rely on such tools, we’ll end up with a huge amount of unoptimized software whose costs will far exceed the savings made in writing that software.

Expecting 100% correct answers is a failing in usage, not in the AI’s accuracy. It exists to let you sculpt your answer, and is useless in a one-on-one question-and-answer session. I would argue it is you getting it wrong by designing a test that asks a fish to fly. You need to engineer prompts, not flail around in the air 😀
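For the first question in the post, the Gauss-sum approach could look roughly like the sketch below. It starts from the sum 1 + … + 10,000, subtracts the multiples of 3, then subtracts each block of 100 numbers whose hundreds digit is 2 or 3, adding back the multiples of 3 in that block so they are not removed twice. The helper names are hypothetical, and a brute-force loop is included only to cross-check the result.

```java
class GaussSum {
    // Sum of 1..n by Gauss's formula.
    static long triangular(long n) {
        return n * (n + 1) / 2;
    }

    // Sum of the multiples of m that lie in [lo, hi], assuming 0 <= lo <= hi.
    static long sumOfMultiples(long m, long lo, long hi) {
        long first = (lo + m - 1) / m; // smallest k with k * m >= lo
        long last = hi / m;            // largest k with k * m <= hi
        return m * (triangular(last) - triangular(first - 1));
    }

    static long closedForm() {
        long total = triangular(10000) - sumOfMultiples(3, 1, 10000);
        for (int block = 0; block < 100; block++) {
            int digit = block % 10; // hundreds digit shared by 100*block .. 100*block + 99
            if (digit == 2 || digit == 3) {
                long lo = 100L * block;
                long hi = lo + 99;
                // Remove the block, except its multiples of 3, which are already gone.
                total -= sumOfMultiples(1, lo, hi) - sumOfMultiples(3, lo, hi);
            }
        }
        return total;
    }

    static long bruteForce() {
        long sum = 0;
        for (int i = 1; i <= 10000; i++) {
            if (i % 3 != 0 && (i / 100) % 10 != 2 && (i / 100) % 10 != 3) {
                sum += i;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(closedForm() + " " + bruteForce());
    }
}
```

The closed form touches 100 blocks instead of 10,000 numbers. At this input size the difference is negligible, but the approach scales to ranges where the loop would not.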

If it couldn’t solve introductory programming questions, then it would straight up suck. If people are unable to answer these questions without ChatGPT then they’ll fail later courses anyways, so who cares?

It should absolutely be banned in the appropriate contexts. Don’t worry about the silly people calling us Luddites. We do not hand toddlers a calculator and set them to the task of long calculations. No, they are taught each step from counting to long division. The calculator isn’t very useful anywhere in that process. Those that think they learn with them are frequently deferring their learning for later when they do symbolic long division, etc … There is only so far such a metaphorical can may be kicked down the road.

When it comes to these assignments and exams, those yelling out “Luddite” have forgotten the point of these assessments and have a warped view of the education process. Yes, we can ask students to bug hunt (though these models also appear to be able to do some bug hunting as well… so the perception of that as a higher-order skill doesn’t seem to ring very true), but they ignore that bug hunting is a somewhat different skill from writing de novo code, and students should be learning both skills. Turning one’s classroom into a meta-analysis of an AI (or any other seamless technology) is often not very conducive to the students learning the basic skills they would need to make the meta-critique. That is often quite a different topic… Again to the calculator analogy: a discussion of overflow conditions with students that haven’t mastered basic arithmetic is largely fruitless. And for another analogy, a discussion of open banking and API compliance by large banks isn’t really conducive to learning the basics of finance. As dismal as the view may seem to some looking at the modern education system, an embrace of ChatGPT, and/or similar models, would be detrimental rather than helpful in the vast majority of cases. In short, subjects are taught in a certain order for a reason, and shortcut tools have typically been banned precisely because they are not useful for learning those skills, even if they are reintroduced later (e.g. calculators banned while learning arithmetic but making a reappearance while learning calculus).

Finally… my computer science exams were on paper in the 2010s. Sure, our homework was turned in online and autograded by computer but when they really wanted to check if you had the skills, then you put pen or pencil to paper. People seem to find this shocking today while I am instead shocked that people think this is impossible and we should all just let everyone cheat with AI because it is magically unbannable. Yes, it was more difficult during the pandemic, but we still very much have the traditional way of making assessments at our disposal and it still works as well as ever. Those looking for a revolution will be sorely disappointed.

Personally I’m a fan of homework and project-style skills and think exams can be overemphasized… but those that think ChatGPT will push towards higher-level skills are mistaken. It merely highlights the potential for abuse that has existed in these areas forever. Yes, it used to be cheating off a friend’s homework and now both friends ask ChatGPT for the answer… but it really isn’t that much of a new threat to education… except in the more subtle ways. I have no worries on the cheating front. Going back to the calculator analogy again… I worry about the people who, 20 years later, can’t calculate a 15% tip without a calculator, and about everyone that learns from it and internalizes the mistakes it makes because they sound like a reasonable explanation of what is going on.

As for “prompt engineering.” Yeah… we should totally work extra hard to communicate what we mean to the computer and guide it to the right answer… yeah. Not very useful for any of those times where you don’t know the right answer.

We need to investigate the limitations of today’s (GPT-4) language models more. Though they improve pretty fast, they have inherent limitations. They are genuinely not great at math, as they were not designed to solve math problems. Their primary skills are linguistic, and coincidentally they can program too, because the transformer model excels at translation, and coding is merely translation into a programming language.

Like it or not, this model will soon accompany us everywhere, and the skill of writing good prompts is definitely something that AI will teach us, thus continuing a symbiotic relationship.

For exam assignments we should select tasks that are difficult for language models today. There are plenty; we just need to understand the models’ limitations better. One good example is telling what’s wrong with a given piece of code. And it is not necessarily bug hunting: you can ask students to refactor some code to make it clearer, which is a very important aspect of a programmer’s work, on par with debugging.

I also believe soft skills are becoming more and more important, and a conversation about sketching possible solutions with some analysis is a great way to select great developers, though I am not sure how to apply it to juniors. The idea is that we as humans have intuition about possible solutions, while generative models tend to generate solutions one by one, and during a live conversation it is not easy to use ChatGPT. Being able to explain any part of the code is a great skill in teamwork anyway, so why not put more stress on it during exams and assignments?