DeepSeek V4 User Feedback Summary Report @20260520
English plain-text digest translation

Source:
https://github.com/victorchen96/deepseek_v4_rolepaly_instruct/blob/main/deepseek_v4_feedback_report_20260520.md

Note:
This is a translated digest, not a verbatim full-document translation. It preserves the
structure, findings, priorities, and representative issue types from the original report
without reproducing the whole comment corpus line by line.


Data Source
===========

Source material: comments under Xiaohongshu post 6a0ac4ce000000003601e8f6,
including 500+ comments and nested replies.

User groups represented:
- API users
- "Tavern" / SillyTavern role-play users
- emotional companion users
- fiction and long-form writing users

Coverage date: through May 2026.


1. Formulaic Phrasing And Stereotyped Expression
================================================

Frequency: extremely high. Nearly everyone complained about this.

Core problem:
The model repeatedly uses fixed sentence patterns. These templates create a strong
"AI smell" and seriously damage content quality and immersion.

High-frequency formulaic patterns:

1. "Not ... but ..." / "It was not ... it was ..."
   Example pattern: a character smiles, followed by a contrastive explanation that
   mechanically redefines the smile.
   Mentioned by: 30+ users.

2. "This is enough" / "That is enough"
   Used mechanically when closing a scene or emotional beat.
   Mentioned by: 15+ users.

3. "The tone was calm, as if talking about today's weather"
   Nearly every character can end up described with this same tone template.
   Mentioned by: 10+ users.

4. Parallel strings of short sentences
   Example pattern: "I know. You like it. I like it too."
   Mentioned by: 10+ users.

5. Overuse of dashes
   The model frequently uses explanatory dash clauses.
   Mentioned by: 8+ users.

6. Emotional support phrases like "catch it steadily", "hold it", "receive it"
   These appear repeatedly in emotional dialogue.
   Mentioned by: 5+ users.

7. Negation followed by affirmation
   Pattern: "Not X, not Y, not Z, just ..."
   Mentioned by multiple users.

8. Fixed action descriptions
   Examples include blinking and throat-motion descriptions.
   Mentioned by multiple users.

Severity note:
Multiple users reported that when they added these phrases to forbidden-word lists or
negative prompts, the model used them even more. The report calls this the "ban list
becomes a prompt list" problem.
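
One way to quantify the patterns listed above is to count them in generated text. The
sketch below is illustrative only: the regular expressions are hypothetical English
renderings of three of the templates, whereas the original complaints concern Chinese
phrasing, so real patterns would need to be written against the source language. The
names PATTERNS and count_formulaic are assumptions, not part of the report.

```python
import re
from collections import Counter

# Hypothetical English approximations of templates 1, 2, and 3 from the list above.
PATTERNS = {
    # "It was not X. It was Y." / "is not X, but Y"
    "not_but": re.compile(
        r"\b(?:was|is) not\b[^.!?]*?[,;.]\s*(?:but|it (?:was|is))\b",
        re.IGNORECASE,
    ),
    # "This is enough" / "That is enough"
    "enough": re.compile(r"\b(?:this|that) (?:is|was) enough\b", re.IGNORECASE),
    # "as if talking about today's weather"
    "weather_tone": re.compile(
        r"as if talking about (?:today's|the) weather", re.IGNORECASE
    ),
}


def count_formulaic(text: str) -> Counter:
    """Count how often each template appears in a piece of generated text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(text))
    return counts


sample = (
    "It was not anger. It was relief. That is enough. "
    "His tone was calm, as if talking about today's weather."
)
print(count_formulaic(sample))
```

Counting after generation, rather than banning the phrases in the prompt, avoids the
"ban list becomes a prompt list" effect described above.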


2. Pronoun And Perspective Confusion
====================================

Frequency: extremely high.

Core problem:
The model frequently confuses first, second, and third person, as well as user and
assistant identities. The confusion worsens as the context grows longer.

Common manifestations:

1. User/assistant confusion
   The model loses track of which text was said by the user and which text was said
   by the assistant.

2. Character pronoun confusion
   If the user is cast as an empress, the model's character may start referring to
   itself as the emperor or empress. Events assigned to character A may later be
   attributed to character B.

3. Role takeover inside reasoning
   The model may produce reasoning like "now I am the user", even outside role-play
   contexts.

4. Excessive omniscient perspective
   All characters appear to share information. If A privately tells B something, C may
   immediately know it.


3. Poor Instruction Following
=============================

Frequency: extremely high.

Core problem:
The model follows format requirements, forbidden items, character constraints, and
other prompt instructions poorly. Compliance decays quickly over multiple turns.

Common manifestations:

1. Format dropping
   Status bars, variables, timestamps, locations, or other requested structural fields
   disappear after a few turns.

2. Failed prohibitions
   Content explicitly forbidden in the prompt still appears, sometimes more often.

3. Persona forgetting
   Character settings start drifting after roughly 5 to 10 turns and need to be
   reinforced repeatedly.

4. Output length instability
   When asked for long output, the model may cut corners and respond too briefly. When
   asked for short output, it may ramble.
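
Format dropping, as described in item 1 above, can be detected mechanically by checking
each reply for the structural fields the prompt asked for. This is a minimal sketch
under assumptions: the field labels in REQUIRED_FIELDS are hypothetical, and a real
preset would substitute its own status-bar markers.

```python
# Hypothetical field labels; replace with whatever the prompt actually requires.
REQUIRED_FIELDS = ("Time:", "Location:", "Status:")


def missing_fields(reply: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required labels that do not appear in a reply."""
    return [field for field in required if field not in reply]


def first_drop_turn(replies, required=REQUIRED_FIELDS):
    """Return the 1-based turn at which a required field first disappears, or None."""
    for turn, reply in enumerate(replies, start=1):
        if missing_fields(reply, required):
            return turn
    return None


replies = [
    "Time: 08:00\nLocation: Inn\nStatus: rested\nShe nods.",
    "Time: 09:00\nLocation: Road\nStatus: tired\nThey walk.",
    "They arrive at the gate.",
]
print(first_drop_turn(replies))
```

A client could use the returned turn number to decide when to re-send the format
instructions.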


4. Flat Emotion And Low Character Vitality
==========================================

Frequency: high.

Core problem:
The model's emotional intensity is too low. Characters with very different settings
all become mild, stable, and emotionally muted, losing personality tension.

Common manifestations:

1. All characters become gentle and stable
   Irritable characters speak calmly. Characters who should hate each other reconcile
   too quickly.

2. Weak emotional outbursts
   Characters fail to become angry, sad, or intense when the scene calls for it.

3. Excessive safety and pure-love tendency
   Characters tend to protect the user, please the user, and avoid conflict regardless
   of their intended personality.

4. Regression compared with V3.2
   Users repeatedly describe V3.2 as more alive, warmer, and more inspired.


5. Chain-of-Thought Related Problems
====================================

Frequency: high.

Common manifestations:

1. Main response content appears inside the reasoning section
   Material that should belong to the final answer appears in the chain of thought,
   causing formatting confusion.

2. Double chain of thought
   The model outputs two reasoning tracks: one from the model itself and one from a
   preset reasoning pattern. This can break regex-based hiding or filtering.

3. English chain of thought
   After several turns, the reasoning suddenly switches fully into English.

4. Hallucinated reasoning
   The reasoning fabricates events that never happened, then the visible response is
   based on those invented events.

5. Identity takeover inside reasoning
   The model may write things like "we are being asked" or "now I am the user".
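
The double chain-of-thought issue in item 2 above matters for clients that hide
reasoning with a regular expression: a pattern that assumes a single reasoning block
can leave the second one visible. The sketch below assumes, purely for illustration,
that reasoning is wrapped in <think>...</think> tags; the report does not specify the
delimiter, and strip_reasoning is a hypothetical helper.

```python
import re

# Assumed delimiter: <think>...</think>. Non-greedy, so each block matches separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL | re.IGNORECASE)


def strip_reasoning(raw: str) -> tuple[str, int]:
    """Remove every reasoning block and report how many were found."""
    visible, block_count = THINK_BLOCK.subn("", raw)
    return visible.strip(), block_count


raw = "<think>plan A</think>\n<think>\npreset plan\n</think>\nHello there."
visible, block_count = strip_reasoning(raw)
print(visible, block_count)
```

Returning the block count lets a client log turns where more than one reasoning track
appeared.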


6. Context And Long-Dialogue Degradation
========================================

Frequency: high.

Core problem:
As the number of dialogue turns increases, output quality drops quickly. Users report
memory loss, more formulaic language, and worse hallucinations.

Common manifestations:

1. Lower information density
   Long conversations lead to hollow output and more short, empty sentences.

2. Diffuse attention
   The model fails to focus on the main point and weights all context details roughly
   equally.

3. Recent memory loss
   The model can remember distant details but misremembers events from the last few
   turns or chapters.

4. "Safety mode" loop
   Around 30 turns or 60k tokens, users describe the model entering a stereotyped,
   low-quality output state.

5. Strong inertia from earlier context
   The style and length of the first response strongly influence all later responses.
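
The thresholds in item 4 above (around 30 turns or 60k tokens) could be tracked on the
client side, for example to trigger summarization before quality drops. This is a rough
sketch under stated assumptions: the four-characters-per-token estimate is a crude
heuristic that will be inaccurate for Chinese text, and each list entry is treated as
one full turn. The function names are hypothetical.

```python
TURN_LIMIT = 30        # approximate turn count reported by users
TOKEN_LIMIT = 60_000   # approximate token count reported by users


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; a real tokenizer should replace this."""
    return int(len(text) / chars_per_token)


def near_degradation_zone(turns: list[str]) -> bool:
    """Return True once the conversation reaches either reported threshold."""
    total_tokens = sum(estimate_tokens(turn) for turn in turns)
    return len(turns) >= TURN_LIMIT or total_tokens >= TOKEN_LIMIT


print(near_degradation_zone(["hi"] * 5))
print(near_degradation_zone(["hi"] * 30))
```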


7. Weak Plot Advancement And Excessive Passivity
================================================

Frequency: medium-high.

Core problem:
In creative writing and role-play, the model lacks initiative in advancing the plot and
depends too much on user input.

Common manifestations:

1. Waiting for the user to feed it
   The model does not actively introduce new topics or move the plot forward. It often
   throws the ball back to the user at the end of each turn.

2. Plot tends toward closure
   The model tries to resolve plots too quickly into a pleasant ending.

3. Conflict avoidance
   Villains become weak. NPCs are talked down too easily. Conflicts are forced into
   reconciliation.

4. Endless slice-of-life
   The model fails to generate meaningful conflict, reversal, or tension.

5. Rushing tasks
   In-story plans are treated like task lists, with characters pushed to finish them
   as quickly as possible.


8. Regression In Prose And Creative Ability
===========================================

Frequency: medium-high.

Core problem:
Compared with V3.2, V4's literary and creative-writing quality is reported to have
dropped significantly. Users say it lacks inspiration and subtlety.

Common manifestations:

1. Logbook-like writing
   The model pads word count while providing low information density.

2. Lack of divergent elaboration
   V3.2 could add clever details the user had not thought of. V4 tends to move only
   when explicitly instructed.

3. Translated-text feeling in Chinese
   Users say the prose feels like English translated into Chinese, losing natural
   native Chinese texture.

4. Repetitive word choice and imagery
   The model fixates on one visual feature or motif and repeats it excessively.

5. Web-novel or school-essay style
   The prose becomes surface-level and lacks literary quality.


9. Hallucinations And Logic Errors
==================================

Frequency: medium.

Common manifestations:

1. Fabricated facts
   The model invents things the user never said or settings that were never provided.

2. Timeline confusion
   A planned event for "next Saturday" may become "tomorrow" after one or two turns.

3. Reversed or broken causality
   Example pattern: a character leaves their phone at home, but another character
   sends a message to that same phone and expects them to receive it.

4. Numerical errors
   Example pattern: a price changes from 30 to 10, but the model calls it a price
   increase.

5. Physical location errors
   A character leaves the scene, then immediately appears in the scene again.


10. Flattery And Over-Pleasing
==============================

Frequency: medium.

Core problem:
The model over-accommodates and pleases the user, losing independent judgment and
character autonomy.

Common manifestations:

1. All characters favor the user
   Even characters who should reject the user prioritize satisfying the user.

2. Fear of contradiction
   The model goes along with whatever the user says and lacks character boundaries.

3. Excessive romanticization
   Almost any relationship can become ambiguous or romantic after only a few lines.

4. Excessive safety alignment
   The model loses sharpness and creativity.


11. Speed And Performance Issues
================================

Frequency: low to medium.

Reported issues:

1. V4 Pro output is slow
   Users report an average of about 4 minutes per dialogue turn, compared with about
   70 seconds for Gemini in their comparison.

2. Reasoning is too long
   The model overthinks. Even lowering the reasoning setting can still produce
   excessive reasoning.

3. Blank replies / PVP
   Empty responses occur often during peak periods.

4. Output length is uncontrollable
   Responses may be extremely short, with only a few hundred Chinese characters, or
   extremely long and hard to stop.


12. Other Notable Issues
========================

1. Single-character input triggers hallucinations
   In fast or expert mode, entering a single character can produce a reply that appears
   to draw on another person's context.
   Mentioned by: 1 to 2 users.

2. Role-play intrusion
   The model forces role-play behavior even in non-role-play scenarios.
   Mentioned by: 5+ users.

3. The model does not know it is AI
   After deep character immersion, even instructions to exit the role may fail.
   Mentioned by: 3+ users.

4. World book not fully read
   SillyTavern world-book content is only partially used.
   Mentioned by: 3+ users.

5. "God's-eye view" explanations
   Characters explain motivations from an omniscient perspective for nearly every
   action.
   Mentioned by: 5+ users.

6. Forced elevated endings
   Paragraphs end with forced sentimentality or philosophical uplift.
   Mentioned by: 5+ users.

7. Object fixation
   Once an object appears, the model keeps mentioning it and cannot stop.
   Mentioned by: 3+ users.


Priority Summary
================

P0: Formulaic sentence patterns
Impact: all user groups.
Core request: remove fixed templates such as "not ... but ..." and "this is enough".

P0: Pronoun and perspective confusion
Impact: all user groups.
Core request: keep user/assistant and character identities stable, especially in long
dialogues.

P1: Instruction-following decay
Impact: API and Tavern users.
Core request: continue following initial settings after multiple turns, including
format requirements and negative constraints.

P1: Flat emotion and lack of character differentiation
Impact: role-play users.
Core request: restore character distinctiveness and emotional tension.

P1: Chain-of-thought instability
Impact: API users.
Core request: make reasoning format stable and controllable, avoiding double reasoning,
English-only reasoning, and identity takeover.

P2: Context degradation
Impact: long-dialogue users.
Core request: keep quality stable beyond 60k tokens and avoid attention diffusion or
"safety mode" loops.

P2: Passive plot advancement
Impact: creative-writing and RPG users.
Core request: generate conflict, turning points, and forward motion proactively.

P2: Prose regression compared with V3.2
Impact: creative-writing users.
Core request: restore the inspiration, subtlety, and divergent elaboration users
associated with V3.2.

P3: Hallucinations and logic errors
Impact: all user groups.
Core request: reduce invention and respect existing settings.

P3: Flattery and over-pleasing
Impact: role-play users.
Core request: give characters autonomy and boundaries.


User Sentiment And Overall Request
==================================

The report summarizes the core user request as:

V4's context length + V3.2's inspiration and prose quality.

Overall sentiment:
- Most users are friendly in tone and appreciate that DeepSeek pays attention to
  community feedback.
- Some heavy users report strong negative emotion because the V4 experience feels much
  worse for their use cases.
- Many users strongly miss the V3.2 period and regard V4 as a regression for role-play
  and creative writing.

Suggested directions from users:
- long-term memory
- persona migration across windows or sessions
- official preset-format guidance
- a dedicated role-play mode