Annoyances with GPT4-Turbo

This is a non-exhaustive list of things I've noticed about GPT4 that irritate me. Some of them are specific to programming; some are more general. I'm writing it in the hope that people making custom training runs / DPO datasets for open source language models like Mixtral will be able to avoid these specific behavioral patterns, especially in models trained on synthetic GPT data.

Bullet Point Spam

  • The tendency to always use bullet point lists with bolded markdown when explaining literally any concept, in response to seemingly any question. This is particularly annoying because the model will ramble about vaguely related things that are ultimately irrelevant to the question asked, e.g.:
```
USER: "What would be the drawbacks involved in making a custom language model sampler?"
GPT4: "Creating a custom language model sampler involves significant challenges and drawbacks. Here are some of the key considerations:

1. **Complexity:** Developing a language model sampler requires a deep understanding of natural language processing (NLP), machine learning algorithms, and sometimes linguistics. It's a complex task that involves not just coding but also a theoretical understanding of how language works.
2. **Resources:** Training a language model, even a small one, requires substantial computational resources. Large-scale models like those from OpenAI or Google require clusters of high-end GPUs or TPUs, which can be prohibitively expensive for individuals or small organizations.
3. **Data:** A good language model requires a large and diverse dataset for training. The process of collecting, cleaning, and preparing this data is time-consuming and resource-intensive. Moreover, there could be legal and ethical considerations regarding the use of certain datasets.
4. **Bias and Fairness:** All language models are subject to biases present in their training data. Mitigating these biases is a significant challenge and requires ongoing effort to ensure the model doesn't propagate or amplify harmful stereotypes or misinformation.
5. **Ethical Concerns:** There are various ethical concerns surrounding the development of AI, such as the potential for misuse, the environmental impact of training large models, and the implications for job displacement in certain sectors.
... it does this for 12 different bullet points so I'm stopping here for brevity ...
```

Most of this text is irrelevant to the question: it doesn't ask anything about building a language model in general, it asks specifically about sampling. Sampling is a small post-processing step on the model's output logits; it doesn't require any extra data or "ethical" analysis. The response does a wonderful job of evading the actual request.
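For context, a sampler is just the piece that converts each step's output logits into a concrete token choice. Here's a minimal sketch of what that actually involves, assuming a plain NumPy array of logits and illustrative temperature / top-k parameters (not any particular library's API):

```python
import numpy as np

def sample_token(logits, temperature=0.8, top_k=40):
    """Pick the next token id from one step's raw logits.
    This is the entire scope of a 'custom sampler': a bit of
    post-processing on a vector of scores. No training data,
    GPU clusters, or ethics review involved."""
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Mask out everything except the top_k highest-scoring tokens.
    cutoff = np.sort(logits)[-top_k]
    logits = np.where(logits < cutoff, -np.inf, logits)

    # Softmax over the survivors, then draw one token id.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

Every drawback the quoted answer lists (data collection, compute clusters, bias mitigation) belongs to training the model, which happens long before a function like this ever runs.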

Asking GPT4 not to use bullet points or numbered lists like this also seems to backfire, ironically. I tried various system prompts via the API, as well as the ChatGPT interface itself, to get it to avoid this behavior, but it seems overfit to express answers in this particular pattern no matter what (probably caused by RLHF).
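For reference, the API attempts looked roughly like the following (the system prompt wording and model name here are illustrative, not the exact ones I used); the bolded numbered lists come back regardless:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "Answer in plain prose. Do not use bullet points, "
                       "numbered lists, or markdown headings.",
        },
        {
            "role": "user",
            "content": "What would be the drawbacks involved in making a "
                       "custom language model sampler?",
        },
    ],
)
print(response.choices[0].message.content)
```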

Needlessly Apologetic

  • Picture this: you give GPT4 a request to build a function step by step, and then present the complete code. You import what it gives you and run the code. It seems to work as expected, but you're not sure if there are edge cases yet. You try to coax it into analyzing what it created with a request like, "are you sure this works as intended?" Regardless of whether or not it is actually functional, what happens is universally:

An apology that almost always mentions confusion.

What makes this one obnoxious is that it pops up even when there is no hint whatsoever that the user is confused about anything.
In one exchange, I clearly communicated a desired change to the code, and it still prefaced the response with an apology.

What makes this even more irritating is:

  • if you ask the model whether there's anything wrong with the code, it immediately jumps to the conclusion that, because the user asked, whatever it provided must be wrong in some way. This happens even if you ask it to deliberately analyze what it wrote first, before reaching a conclusion.

Too Complicated For Me

This is probably the most popular complaint w.r.t. GPT4's aggressive alignment / preference tuning. You ask the model to perform some arbitrary task that it clearly can do, but it rambles about its limited scope as an AI language model. Or, it does what you asked, but sneakily prefaces the "solution" with terms like "high-level overview" or "pseudo-code", and then follows with something like...

```
// .. your existing code

// Fill in working code here
```

Fence Sitting

This one is also obvious. If you ask it any question that involves some level of subjectivity, even a mathematical one, it immediately uses that subjectivity as a way to dodge the question entirely.

Misc Phrases

"As an AI language model" - Well known as the generic filler GPT-ism.
"It is important to note that..." - This one kills me.
"However, it is important to note" - Another variant of the above
"I'm sorry, but" - Generic refusal, sometimes used to apologize to the user sincerely for a "mistake", more commonly is used to refuse the request period.
