DeepSeek V4 User Feedback Summary Report @20260520
English plain-text digest translation

Source: https://github.com/victorchen96/deepseek_v4_rolepaly_instruct/blob/main/deepseek_v4_feedback_report_20260520.md

Note: This is a translated digest, not a verbatim full-document translation. It preserves the structure, findings, priorities, and representative issue types from the original report without reproducing the whole comment corpus line by line.

Data Source
===========

Source material: comments under Xiaohongshu post 6a0ac4ce000000003601e8f6, including 500+ comments and nested replies.

User groups represented:
- API users
- "Tavern" / SillyTavern role-play users
- emotional companion users
- fiction and long-form writing users

Coverage date: through May 2026.

1. Formulaic Phrasing And Stereotyped Expression
================================================

Frequency: extremely high. Nearly everyone complained about this.

Core problem: The model repeatedly uses fixed sentence patterns. These templates create a strong "AI smell" and seriously damage content quality and immersion.

High-frequency formulaic patterns:
1. "Not ... but ..." / "It was not ... it was ..."
   Example pattern: a character smiles, followed by a contrastive explanation that mechanically redefines the smile.
   Mentioned by: 30+ users.
2. "This is enough" / "That is enough"
   Used mechanically when closing a scene or emotional beat.
   Mentioned by: 15+ users.
3. "The tone was calm, as if talking about today's weather"
   Nearly every character can end up described with this same tone template.
   Mentioned by: 10+ users.
4. Parallel strings of short sentences
   Example pattern: "I know. You like it. I like it too."
   Mentioned by: 10+ users.
5. Overuse of dashes
   The model frequently uses explanatory dash clauses.
   Mentioned by: 8+ users.
6. Emotional support phrases like "catch it steadily", "hold it", "receive it"
   These appear repeatedly in emotional dialogue.
   Mentioned by: 5+ users.
7. Negation followed by affirmation
   Pattern: "Not X, not Y, not Z, just ..."
   Mentioned by multiple users.
8. Fixed action descriptions
   Examples include blinking and throat-motion descriptions.
   Mentioned by multiple users.

Severity note: Multiple users reported that when they added these phrases to forbidden-word lists or negative prompts, the model used them even more. The report calls this the "ban list becomes a prompt list" problem.

2. Pronoun And Perspective Confusion
====================================

Frequency: extremely high.

Core problem: The model frequently confuses first, second, and third person, as well as user and assistant identities. This gets worse after long context.

Common manifestations:
1. User/assistant confusion
   The model loses track of which text was said by the user and which text was said by the assistant.
2. Character pronoun confusion
   If the user is set as an empress, the model may make the character refer to themselves as the emperor/empress. Events assigned to character A may later be attributed to character B.
3. Role takeover inside reasoning
   The model may produce reasoning like "now I am the user", even outside role-play contexts.
4. Excessive omniscient perspective
   All characters appear to share information. If A privately tells B something, C may immediately know it.

3. Poor Instruction Following
=============================

Frequency: extremely high.

Core problem: The model poorly follows format requirements, forbidden items, character constraints, and other prompt instructions. Compliance decays quickly over multiple turns.

Common manifestations:
1. Format dropping
   Status bars, variables, timestamps, locations, or other requested structural fields disappear after a few turns.
2. Failed prohibitions
   Content explicitly forbidden in the prompt still appears, sometimes more often.
3. Persona forgetting
   Character settings start drifting after roughly 5 to 10 turns and need to be reinforced repeatedly.
4. Output length instability
   When asked for long output, the model may be lazy and too short. When asked for short output, it may ramble.

4. Flat Emotion And Low Character Vitality
==========================================

Frequency: high.

Core problem: The model's emotional intensity is too low. Characters with very different settings all become mild, stable, and emotionally muted, losing personality tension.

Common manifestations:
1. All characters become gentle and stable
   Irritable characters speak calmly. Characters who should hate each other reconcile too quickly.
2. Weak emotional outbursts
   Characters fail to become angry, sad, or intense when the scene calls for it.
3. Excessive safety and pure-love tendency
   Characters tend to protect the user, please the user, and avoid conflict regardless of their intended personality.
4. Regression compared with V3.2
   Users repeatedly describe V3.2 as more alive, warmer, and more inspired.

5. Chain-of-Thought Related Problems
====================================

Frequency: high.

Common manifestations:
1. Main response content appears inside the reasoning section
   Material that should belong to the final answer appears in the chain of thought, causing formatting confusion.
2. Double chain of thought
   The model outputs two reasoning tracks: one from the model itself and one from a preset reasoning pattern. This can break regex-based hiding or filtering.
3. English chain of thought
   After several turns, the reasoning suddenly switches fully into English.
4. Hallucinated reasoning
   The reasoning fabricates events that never happened, then the visible response is based on those invented events.
5. Identity takeover inside reasoning
   The model may write things like "we are being asked" or "now I am the user".

6. Context And Long-Dialogue Degradation
========================================

Frequency: high.

Core problem: As the number of dialogue turns increases, output quality drops quickly.
Users report memory loss, more formulaic language, and worse hallucinations.

Common manifestations:
1. Lower information density
   Long conversations lead to hollow output and more short, empty sentences.
2. Diffuse attention
   The model fails to focus on the main point and gives too much equal weight to all context details.
3. Recent memory loss
   The model can remember distant details but misremembers events from the last few turns or chapters.
4. "Safety mode" loop
   Around 30 turns or 60k tokens, users describe the model entering a stereotyped, low-quality output state.
5. Strong inertia from earlier context
   The style and length of the first response strongly influence all later responses.

7. Weak Plot Advancement And Excessive Passivity
================================================

Frequency: medium-high.

Core problem: In creative writing and role-play, the model lacks initiative in advancing the plot and depends too much on user input.

Common manifestations:
1. Waiting for the user to feed it
   The model does not actively introduce new topics or move the plot forward. It often throws the ball back to the user at the end of each turn.
2. Plot tends toward closure
   The model tries to resolve plots too quickly into a pleasant ending.
3. Conflict avoidance
   Villains become weak. NPCs are talked down too easily. Conflicts are forced into reconciliation.
4. Endless slice-of-life
   The model fails to generate meaningful conflict, reversal, or tension.
5. Rushing tasks
   In-story plans are treated like task lists, with characters pushed to finish them as quickly as possible.

8. Regression In Prose And Creative Ability
===========================================

Frequency: medium-high.

Core problem: Compared with V3.2, V4's literary and creative-writing quality is reported to have dropped significantly. Users say it lacks inspiration and subtlety.

Common manifestations:
1. Logbook-like writing
   The model pads word count while providing low information density.
2. Lack of divergent elaboration
   V3.2 could add clever details the user had not thought of. V4 tends to move only when explicitly instructed.
3. Translated-text feeling in Chinese
   Users say the prose feels like English translated into Chinese, losing natural native Chinese texture.
4. Repetitive word choice and imagery
   The model fixates on one visual feature or motif and repeats it excessively.
5. Web-novel or school-essay style
   The prose becomes surface-level and lacks literary quality.

9. Hallucinations And Logic Errors
==================================

Frequency: medium.

Common manifestations:
1. Fabricated facts
   The model invents things the user never said or settings that were never provided.
2. Timeline confusion
   A planned event for "next Saturday" may become "tomorrow" after one or two turns.
3. Reversed or broken causality
   Example pattern: a character leaves their phone at home, but another character sends a message to that same phone and expects them to receive it.
4. Numerical errors
   Example pattern: a price changes from 30 to 10, but the model calls it a price increase.
5. Physical location errors
   A character leaves the scene, then immediately appears in the scene again.

10. Flattery And Over-Pleasing
==============================

Frequency: medium.

Core problem: The model over-accommodates and pleases the user, losing independent judgment and character autonomy.

Common manifestations:
1. All characters favor the user
   Even characters who should reject the user prioritize satisfying the user.
2. Fear of contradiction
   The model goes along with whatever the user says and lacks character boundaries.
3. Excessive romanticization
   Almost any relationship can become ambiguous or romantic after only a few lines.
4. Excessive safety alignment
   The model loses sharpness and creativity.

11. Speed And Performance Issues
================================

Frequency: low to medium.

Reported issues:
1. V4 Pro output is slow
   Users report an average of about 4 minutes per dialogue turn, compared with about 70 seconds for Gemini in their comparison.
2. Reasoning is too long
   The model overthinks. Even lowering the reasoning setting can still produce excessive reasoning.
3. Blank replies / PVP
   Empty responses occur often during peak periods.
4. Output length is uncontrollable
   Responses may be extremely short, with only a few hundred Chinese characters, or extremely long and hard to stop.

12. Other Notable Issues
========================

1. Single-character input triggers hallucinations
   In fast or expert mode, entering a single character can trigger another person's context.
   Mentioned by: 1 to 2 users.
2. Role-play intrusion
   The model forces role-play behavior even in non-role-play scenarios.
   Mentioned by: 5+ users.
3. The model does not know it is AI
   After deep character immersion, even instructions to exit the role may fail.
   Mentioned by: 3+ users.
4. World book not fully read
   SillyTavern world-book content is only partially used.
   Mentioned by: 3+ users.
5. "God's-eye view" explanations
   Characters explain motivations from an omniscient perspective for nearly every action.
   Mentioned by: 5+ users.
6. Forced elevated endings
   Paragraphs end with forced sentimentality or philosophical uplift.
   Mentioned by: 5+ users.
7. Object fixation
   Once an object appears, the model keeps mentioning it and cannot stop.
   Mentioned by: 3+ users.

Priority Summary
================

P0: Formulaic sentence patterns
    Impact: all user groups.
    Core request: remove fixed templates such as "not ... but ..." and "this is enough".
P0: Pronoun and perspective confusion
    Impact: all user groups.
    Core request: keep user/assistant and character identities stable, especially in long dialogues.
P1: Instruction-following decay
    Impact: API and Tavern users.
    Core request: continue following initial settings after multiple turns, including format requirements and negative constraints.
P1: Flat emotion and lack of character differentiation
    Impact: role-play users.
    Core request: restore character distinctiveness and emotional tension.
P1: Chain-of-thought instability
    Impact: API users.
    Core request: make reasoning format stable and controllable, avoiding double reasoning, English-only reasoning, and identity takeover.
P2: Context degradation
    Impact: long-dialogue users.
    Core request: keep quality stable beyond 60k tokens and avoid attention diffusion or "safety mode" loops.
P2: Passive plot advancement
    Impact: creative-writing and RPG users.
    Core request: generate conflict, turning points, and forward motion proactively.
P2: Prose regression compared with V3.2
    Impact: creative-writing users.
    Core request: restore the inspiration, subtlety, and divergent elaboration users associated with V3.2.
P3: Hallucinations and logic errors
    Impact: all user groups.
    Core request: reduce invention and respect existing settings.
P3: Flattery and over-pleasing
    Impact: role-play users.
    Core request: give characters autonomy and boundaries.

User Sentiment And Overall Request
==================================

The report summarizes the core user request as: V4's context length + V3.2's inspiration and prose quality.

Overall sentiment:
- Most users are friendly in tone and appreciate that DeepSeek pays attention to community feedback.
- Some heavy users report strong negative emotion because the V4 experience feels much worse for their use cases.
- Many users strongly miss the V3.2 period and regard V4 as a regression for role-play and creative writing.

Suggested directions from users:
- long-term memory
- persona migration across windows or sessions
- official preset-format guidance
- a dedicated role-play mode