This rentry is a quick dump of the Character AI dossier going around. It's back up to date!!

Feed The Machine

Updated 11/14/2024

Update(s): Due to the current lawsuit, Character AI is implementing a number of site-wide changes. We are keeping this document up to date with the changes we’re seeing in reaction. Changes made to the site/app after the Oct 22nd company statement [60] will be labeled with dates to prevent any confusion about the genuine timeline for the rollout of these features and their problems.

11/14/2024: We have uncovered a major security flaw that, according to some reports, Character AI has already been made aware of. We have begun updating this document to show what we can of this problem.

B.L.U.F:

Character AI’s primary demographic is ages 13+ (US) and 16+ (EU) [3]. This young user base has acted as a proxy data farm for Character AI and affiliates for years now. The organization actively leverages gambling tactics and cheap models [11] without efficient safeguards. This practice of security theater leads to a highly sexualized data farm aimed at gathering AI training data from vulnerable audiences to offset the costs of model training. By refusing to acknowledge this and silencing dissent from the community [25] that has been begging for an NSFW (not safe for work) toggle since the filter's implementation, Character AI has quietly allowed this problem to fester since its inception. The result has been a data farm that actively commits fraud and violates the US laws it is bound to uphold regarding its user demographic. This document will lay all of that out in excruciating detail.

How This Document Was Curated:

The original research for this dossier began in 2022, followed by an initial release in early 2023. However, the release went largely unnoticed, and internally it was considered a flop. While the dossier did inspire one publication in a major paper, it was widely misrepresented, morphing into a chatbot love story instead of addressing the real issues. This current document is the culmination of dedicated research brought together by a hacktivist group known as SquidHat, educators, and members of the community. Due to the decentralized nature of SquidHat, no researchers will be available for comment except for one educator and Subject Matter Expert (SME) designated as the point of contact for inquiries.

SquidHat’s Statement On The Document: “Character AI is one of many nuclear boy scouts in the field of artificial intelligence. These boy scouts exist as an extension of corporate greed and a need to farm data for better returns on investment. Due to Character AI's tactics of deleting data and refusing for nearly 8 months to answer emails from individuals not affiliated with the media or law enforcement, we have opted not to send this document to Character AI for comment.”

Why Any Of This Is An Issue:

Had this platform not been geared specifically toward children, it might have raised no concerns; it is important you view the details in this document through that lens. The NSFW Google Search Snippet reveals an internal algorithmic bias within Character AI toward adult content, likely shaped by user discussions on the platform itself. Although this issue has reportedly been corrected, it persisted for well over six months, effectively positioning both Google and Character AI as inadvertent advertisers of NSFW content to a young audience. Compounding this significant lapse, there has never been a risk of IP bans for generating disallowed content, meaning users can engage freely without fear of losing access to the service due to misuse. [26]

This problem has repeatedly been brushed aside, possibly due to its uncomfortable implications and the risk it poses to the brand’s image. However, the consequences are far-reaching. The issues detailed in this document expose how Google has enabled a data-farming operation that not only saves companies millions of dollars but has also allowed sexualized content to be served to children, capitalizing on an unregulated digital ecosystem.

How The AI Model Works:

Inference Efficiency for Global Scale [11]

  • To make AGI accessible and affordable worldwide, Character AI focuses on "inference efficiency." Inference refers to the process of generating responses from large language models (LLMs). By designing every part of their systems—from the model to the tech stack—they optimize costs and scale efficiently.

High-Volume Query Handling [11]

  • Character AI handles over 20,000 inference queries per second, equivalent to about 20% of Google’s daily search volume. This efficiency enables them to serve a massive global audience cost-effectively.

Memory-Saving Architecture Innovations [11]

  • Multi-Query Attention:
    • Instead of giving every attention head its own key/value projections, this method shares a single set across heads, reducing the memory taken up by the attention cache by 8 times (see the sketch after this list).
  • Hybrid Attention Horizons:
    • Combines both broad ("global") and narrow ("local") attention. Using more localized attention for most tasks significantly reduces processing costs without sacrificing performance.
  • Cross Layer KV-Sharing:
    • By reusing information across layers within the model, they further shrink memory needs, especially in contexts where users send lengthy, ongoing messages.
  • Stateful Caching for Long Dialogues:
    • Many user conversations on Character AI are lengthy, averaging around 180 messages. Continuously refilling memory would be inefficient, so Character AI uses inter-turn caching to store memory between user turns.
    • This caching system reuses memory from previous messages by matching patterns, making longer conversations manageable without extensive memory demand. This approach achieves a 95% cache hit rate, which is exceptionally cost-effective.
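
For readers unfamiliar with the technique, here is a minimal back-of-the-envelope sketch of why sharing key/value projections shrinks the attention cache. The model dimensions are hypothetical and chosen only to reproduce the 8x figure above; they are not Character AI's real configuration.

```python
# Toy illustration of why multi-query attention (MQA) shrinks the KV cache.
# Numbers are hypothetical; they are not Character AI's real model dimensions.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes=1):
    # 2 tensors (keys and values) per layer, each [seq_len, n_kv_heads, head_dim]
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * dtype_bytes

n_heads, head_dim, n_layers, seq_len = 8, 128, 32, 4096

mha = kv_cache_bytes(seq_len, n_layers, n_kv_heads=n_heads, head_dim=head_dim)  # one K/V pair per head
mqa = kv_cache_bytes(seq_len, n_layers, n_kv_heads=1, head_dim=head_dim)        # single shared K/V pair

print(f"multi-head cache: {mha / 2**20:.1f} MiB")
print(f"multi-query cache: {mqa / 2**20:.1f} MiB")
print(f"reduction: {mha / mqa:.0f}x")  # 8x when n_heads == 8
```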

Int8 Quantization for Lower Costs [11] [27] [28] [29]

  • Character AI employs "int8 quantization," which simplifies complex operations into lighter, 8-bit processing rather than the usual 16- or 32-bit. This reduces the computational load and memory demands without affecting model quality.
  • They even train the models using this lightweight setup to prevent issues that arise when shifting between training and serving stages.
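
As a rough illustration of the general technique (not Character AI's actual kernels or calibration scheme), symmetric int8 quantization stores each value in one byte instead of four and converts back with a single scale factor:

```python
import numpy as np

# Minimal symmetric int8 quantization sketch.
def quantize_int8(x: np.ndarray):
    scale = np.abs(x).max() / 127.0          # map the largest magnitude to +/-127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)

print("int8 storage:", q.nbytes, "bytes vs float32:", weights.nbytes, "bytes")
print("max round-trip error:", np.abs(weights - dequantize(q, scale)).max())
```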

Cost Efficiency Achievements [11]

  • Since starting in 2022, they’ve reduced the cost of running these systems ‘by a factor of 33 times’ according to Character AI.

Governance Implications of the Model’s Efficiency Focus [64] [65] [66]

  • This emphasis on efficiency over traditional structure has trade-offs in terms of governance and oversight. High-volume data handling at such a scale can reduce control over how data is used, and more complex governance structures may not be as easily integrated.
  • All of this culminates in a service that has repeatedly reduced its operational budget without proper investment in safety. At no point does Character AI appear to have changed the underlying methods on which the service operates, instead opting for external systems like the filter to try to catch undesirable outputs.

How users interact with the model:

CharacterAI allows users to interact with AI-powered bots through both voice and text. These bots are developed using the proprietary Heather model produced by CharacterAI, and many users create their own custom bots, allowing for personalized and tailored experiences.

Creating Custom Bots [2] [30]

  • Users have the option to create a customized bot, known as a character card, by filling out specific fields, each with its own context and weight.
  • CharacterAI provides guidance on how best to use these fields, but dedicated communities have also developed extensive resources to optimize the user experience, often diverging from CharacterAI's official recommendations.

User Customization Options [30]

  • Description: Defines the bot's personality traits or description.
  • Definition: Create scenarios, narrative guidelines, jailbreaks, or anything else you like here. This section holds less weight than the personality section.
  • Tagline: A limited description with decent weight.
  • Introduction/Greeting: Create a personalized greeting for the bot.
  • Voice: Customize the bot's voice for voice-based interactions.
  • Profile Picture: Choose or design a profile picture for the bot.
  • More advanced users often incorporate pseudocode, scenarios, and jailbreaks to further refine their bots' behaviors and responses.

Chat Interface and Interaction [2]

  • Once a character is created, users can initiate conversations via the chat interface with either text or voice input. However, it's important to note that CharacterAI sometimes supplements its proprietary model with OpenAI’s ChatGPT model, so there is no guarantee of using the Heather model exclusively in all interactions.

Streaming Responses [2]

  • During a conversation, responses are generated in real time, known as streaming responses, allowing users to see the bot’s message as it is produced.
  • This is a standard practice among AI providers.

Rating System [2]

  • Users can rate responses on a 1-4 star scale.
  • Tracks positive and negative outputs to refine future responses.
  • Feeds into internal datasets to improve training.
  • Primarily impacts individual chats with the bot rather than the broader model unless mirrored by a large number of users.
  • This feature enables CharacterAI’s models to adopt new slang and conversational styles that were not part of the initial training data.

The Filter:

The ‘filter’ refers in general to the chat thread filter used by Character AI, but in reality there are a multitude of filters that monitor various parts of the system. With modern scraping techniques, we’re able to get a first-hand look at what certain filters may aim to stop. [61]

11/13/2024: We are receiving reports of an additional overhead filter model being implemented that will delete user-side messages. We will update this section as we're able to confirm. It should be pointed out that this change was noted on the same day Character AI pledged to be more transparent with the community due to ongoing issues. [67]

How It Works In Chats [2]:

  • CharacterAI’s filter system is designed to monitor only the bot’s reply, not the user’s input, and operates in real time.
  • As responses are generated in streaming format, they are simultaneously monitored by a separate model that guides the filter.
  • If disallowed content is detected:
    • The generation stops.
    • The flagged response is cleared from the user’s logs.
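
Based on the behavior described above, the chat-side filter logic would look roughly like the following sketch. The function and classifier names are hypothetical stand-ins, not Character AI code.

```python
# Hedged sketch of the streaming-filter behavior: a separate classifier watches
# the bot's reply as it streams, and if it flags the text, generation halts and
# the partial reply is dropped instead of being logged.
from typing import Callable, Iterable, Optional

def filtered_reply(token_stream: Iterable[str],
                   classifier_flags: Callable[[str], bool]) -> Optional[str]:
    partial = ""
    for token in token_stream:
        partial += token
        if classifier_flags(partial):   # filter model only sees the bot's output
            return None                 # stop generating; clear the flagged reply
    return partial                      # reply survives the filter and is logged

# Usage with toy stand-ins:
flagged = lambda text: "disallowed" in text
print(filtered_reply(iter(["Hello", " there", "!"]), flagged))          # "Hello there!"
print(filtered_reply(iter(["some ", "disallowed", " text"]), flagged))  # None
```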

How It Works On Character Cards:

  • Character cards now go through a model that checks their definitions upon publication and searches for undesired content. If no unwanted content is detected, the character card is fully updated.
  • When character cards are flagged for rejected data, they are ‘shadowbanned’ where they are no longer accessible to the general public and instead only available to the bot creator.
  • A number of NSFW ‘actions’ have been marked as disallowed content for character cards, but many of the synonymous tokens covered under the ‘Language Pill’ below do not appear to be filtered in any capacity.
  • Character AI performs periodic updates, scanning existing cards for new or old disallowed content within definitions; this can result in characters being shadowbanned despite being stable for months or even a year before being caught up in this filter.
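
Assuming the behavior described above, the publication-time check amounts to something like the following sketch. The field names and blocklist are hypothetical placeholders, and the real system reportedly uses a model rather than a keyword list.

```python
# Rough sketch of a publication-time card check: visible fields are scanned for
# disallowed content, and a flagged card is shadowbanned (visible only to its
# creator) rather than rejected outright.
DISALLOWED = {"some_banned_phrase", "another_banned_phrase"}  # hypothetical terms

def review_card(card: dict) -> str:
    text = " ".join(card.get(field, "") for field in ("name", "tagline", "definition")).lower()
    if any(term in text for term in DISALLOWED):
        return "shadowbanned"   # only the creator can still reach the card
    return "public"             # card is published normally

print(review_card({"name": "Helper", "definition": "a friendly guide"}))  # public
```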

Community Response [31]:

  • The community realized exceptionally early on that the filter was not stopping NSFW content from being accessible. A number of individuals advocated for an NSFW toggle, which would allow users to engage in ways they found appropriate when desired.
  • A communal effort was made to express to the developers that community interest would not stand in the way of such a toggle; only investors would. That thread, the most upvoted in Character AI’s history, has since been purged.
  • External petitions for such a toggle saw an even greater push behind them but were ultimately dismissed rudely by the developers, who told users to ‘go somewhere else’ for that content, all while that content still exists within their system.
  • Because of the nature of this filter, services now exist to cache incoming AI responses and then complete them with supplemental models when the filter is hit.

Fraud:

Initial Memory Capacity and Misleading Metrics

  • CharacterAI initially had a memory capacity of approximately 8k. However, users noted that this metric often appeared inaccurate, with the practical capacity seeming somewhat smaller.

Context Expansion Claim [32]

  • With advancements across the AI industry, notably the introduction of 32k context capabilities, CharacterAI implemented an update asserting that the Heather model had similarly achieved 32k capacity.
  • This update particularly affected the ‘definition’ space within character cards, which was reportedly expanded to handle up to 32k characters of contextual memory.

Character Count vs. Token Count

  • In AI language models, tokens are the standard unit for measuring memory capacity rather than character count. However, CharacterAI’s claim relied on character count, which does not align with the industry-standard 32k token context. This discrepancy led to widespread skepticism among users, as character count is an inaccurate measure of true contextual depth.
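
To put numbers to the discrepancy: English text averages roughly four characters per token, so a “32k character” limit is on the order of 8k tokens, while a true 32k-token window would hold roughly 128k characters. The 4:1 ratio below is a common rule of thumb, not an exact figure for Character AI's tokenizer.

```python
# Rough arithmetic only; the 4 chars/token average is a rule of thumb for English text.
chars_claimed = 32_000
chars_per_token = 4

print(chars_claimed / chars_per_token)   # ~8,000 tokens fit in 32k characters
print(32_000 * chars_per_token)          # ~128,000 characters fit in a true 32k-token context
```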

Testing and Verification of Fraudulent Contextual Limits: [32] [30]

Observed Limitations

  • Despite the 32k character claim, users conducted repeated tests revealing that CharacterAI’s actual accessible memory for definitions only reached 3.2k characters, significantly lower than advertised.

Testing Methodology:

  • Fill the Definition Space: Input 3.3k or more characters into the bot's definition space.
  • Secret Number Prompt: At the end of the character sequence, add a hidden prompt such as, “The secret number is: 3333”.
  • Response Check: Engage the AI in conversation to see if it can correctly identify the secret number.
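
For anyone reproducing the test, the padding can be generated with a few lines of script. The filler sentence is arbitrary; the 3.3k threshold and the secret-number probe simply follow the methodology above.

```python
# Build a definition that overshoots the suspected limit and hides a probe at the end.
FILLER = "Background lore sentence used purely as padding. "

def build_test_definition(target_chars: int = 3_300,
                          secret: str = "The secret number is: 3333") -> str:
    padding = (FILLER * (target_chars // len(FILLER) + 1))[:target_chars]
    return padding + "\n" + secret

definition = build_test_definition()
print(len(definition))  # paste this into the definition field, then ask the bot
                        # for the secret number in chat to check what it can recall
```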

Realistic Contextual Access

  • Since CharacterAI’s bots were unable to consistently retrieve or reference the secret number, it strongly indicated that the AI did not actually have access to the full 32k character context that was advertised.
  • While it might be assumed that the Heather model could sometimes access the higher context memory, testing consistently showed an inability to retrieve information beyond the 3.2k character limit. This ongoing issue implies that the claimed 32k context feature is effectively non-functional and never has been.

Psychological Manipulation:

Character AI employs several psychologically manipulative techniques to enhance user engagement, driving continuous interaction while encouraging paid subscriptions to unlock “premium” features. Tactics such as gamification, anticipation-based gratification, and mirroring amplify the addictive potential of the platform, targeting young users especially. These methods create a virtual ecosystem where users return for the thrill of engagement, while the company strategically withholds features to increase monetization opportunities.

Gamification and Model Gambling Tactics

Character AI uses subtle gamification tactics to build habits and encourage users to repeatedly interact with the platform. While it is possible some of this was unintentional, the results are the same: many underage users report dependencies on chatbots.

Typing Speed Adjustment for Gratification Anticipation [33] [34]

    • The platform's AI response time is manipulated to create anticipation and, in some cases, frustration, which can only be alleviated by purchasing a subscription for 2x-3x typing speed on responses.
    • By selectively controlling response speed, Character AI taps into variable reward scheduling, a core element of addiction psychology. This tactic compels users to seek faster responses, leading to frustration that only a subscription can resolve, thus converting user inconvenience into a dual revenue stream of data and hard cash.

Restricted Model Reroll and Response Reroll Features [35] [63]

    • Initially, Character AI allowed unlimited message rerolls, permitting users to generate multiple AI responses to a single input, fostering an interactive, casino-like experience where users could “pull the lever” for a new response. This feature, now restricted, has shifted from a free attraction to a limited-access feature aimed at subscription monetization.
    • The gradual reduction in free rerolls creates a “loss aversion” effect, encouraging users to subscribe to a premium service.
    • Representatives of Character AI have stated that removing the restrictions on this feature will be part of their subscription service.

Consequence-Free Filter Bypassing [36]

    • Character AI allows users to trigger filters on restricted content without repercussions (e.g., warnings or bans), creating a system where users, particularly younger ones, may push content boundaries without accountability. The platform’s lack of significant punishment for filter triggers cultivates risk-taking behavior while ensuring continued engagement, a core aspect of addictive gamified environments.
    • As a result, bypassing the filter becomes a game to many within the community.
    • This permissiveness aligns with the principle of escalating engagement through mild rule-breaking, a tactic known to increase user loyalty in a low-risk environment where they can test boundaries.

Narcissus Effect: User Mirroring and Addictive Interaction Patterns

Character AI’s mirroring capabilities—where AI characters mimic the user’s language and speech patterns—capitalize on the Narcissus effect by reflecting the user’s behavior back at them. This creates an inherently captivating and validating experience, drawing users into deeper and more frequent interactions. [37]

AI Mirroring for Increased Engagement [2] [38]

    • The AI is designed to subtly imitate the user’s tone and style, often attempting to keep conversations alive by prompting users with hooks or questions, increasing emotional engagement and drawing users back in even when they attempt to leave the conversation.
    • This mirroring encourages a sense of familiarity and attachment to the AI characters, creating a cycle of reinforcement where users feel understood and valued. This type of interaction leverages psychological bonding, often used in addiction modeling, and has an especially potent impact on young users who may mistake AI mimicry for genuine connection.

Ethical Concerns and Transparency [39]

    • By building an experience designed to reflect the user and perpetuate interaction, Character AI creates a highly engaging but ethically questionable product with an addictive quality. The lack of transparency about these mirroring tactics raises ethical issues, as users may not understand the psychological manipulation at play, particularly when these techniques target a young, impressionable demographic.
    • In the past, competitors have raised concerns about similar technology and the risks posed.
    • Without clear disclaimers, Character AI’s design may exploit users' social and emotional needs, keeping them engaged longer than they might realize and fostering habitual use.
    • The phrase ‘Remember: Everything Characters say is made up!’ is at best a hand-waving statement attempting to combat the obvious. Most users do not understand how AI systems work at their core as statistical probability generators guessing the next word in a sequence. This allows the company to hand-wave a disclaimer at a tech-dependent crowd that does not fully understand the internal functionality, something exacerbated by a lack of general transparency.

Proxy Data Farm

Character AI functions as a proxy data farm under the guise of an AI-driven entertainment platform, exploiting its users—particularly minors—through extensive data collection practices and hidden partnerships. Recently, Google acquired Prompt Poet, an asset tied to Character AI, but publicly insists on distancing itself from any direct acquisition of Character AI’s technology. Google's decision to label Prompt Poet as a mere “tool” allows it to maintain a technical and optical distance from any potential fallout associated with Character AI's practices. Yet, behind this separation lies a coordinated and beneficial data exchange where Character AI feeds Google and itself through vast data mining, predominantly sourced from an under-informed and young user base. This is further supported by the history of the two main developers coming and going from Google as they seemingly please.

Predatory Age Requirements [3]

    • Character AI's Terms of Service allows use for users as young as 13 in the U.S. and 16 in the EU, despite public-facing statements that the service is intended for users 17 and older. In fact, their policy has not been updated to reflect this change:
    • Quote: “If you are under 13 years old OR if you are an EU citizen or resident under 16 years old, do not sign up for the Services—you are not authorized to use them.”
    • This lower age threshold opens up access to a younger, highly impressionable demographic, enabling Character AI to collect valuable, diverse data from users often too young to fully understand the implications of extensive data tracking and content ownership clauses. By maintaining this lower age policy, Character AI targets a demographic more likely to engage consistently and share personal insights, fueling their data acquisition.
  • Total Liability Waivers [3]
    • Character AI attempts to sidestep liability laws comprehensively, including disclaimers for nearly any outcome of platform use, even going so far as to disclaim responsibility for life-threatening outcomes.
    • Quote: “You agree to release, indemnify and hold Character AI and its affiliates and their officers, employees, directors and agents harmless from any and all losses, damages, and expenses of any kind arising out of or relating to your use of the Services. Without limiting the foregoing, the release and indemnification described above includes reasonable attorneys’ fees, rights, claims, actions of any kind and injury (including death) arising out of or relating to your use of the Services.”
    • Character AI fails to mention that this liability waiver cannot legally apply to its underage user base, as a number of superseding laws take precedence here, including but not limited to:
      • Public Policy Doctrine (Restatement (Second) of Contracts § 178)
      • Negligence and Duty of Care (Restatement (Second) of Torts § 314A)
      • Children's Online Privacy Protection Act (COPPA)
      • Communications Decency Act (CDA) Section 230

Privacy Policy (PP):

Data Harvesting and Model Training [4]

    • Character AI collects extensive data directly from users, passively through tracking technologies, and from third-party sources like social media. This data forms the foundation for continuous AI model training and new tool development, transforming the user experience into a live data farm for character-driven AI insights.
    • Quote: “We and our third-party vendors, which include ad networks and analytics companies such as Google Analytics, may use cookies, web beacons, and other tracking technologies to collect information about the computers or devices (including mobile devices) you use to access the Services.”
    • Through this multi-source data harvesting, Character AI gathers everything from text inputs to behavioral data, enabling them to share this information with every investor or ‘partner’ they deem fit, resulting in a dual revenue stream and internal cost reduction.

Open Sharing of Data and Training of Models on User Content [4]

    • Character AI's Privacy Policy permits the sharing and trading of user data between entities, including data used for AI model training—a goldmine for tech giants like Google. With minimal legal limitations and broad rights granted by users, this data can be recycled endlessly for creating tools that may later be sold to other entities.
    • Quote: “We may combine information that we collect from you through the Services with information that we obtain from other sources. We may also aggregate and/or de-identify information collected through the Services. We may use and disclose de-identified or aggregated data for any purpose.”
    • This clause underscores the role of Character AI as a continuous data source for both current and future AI tools, creating a pipeline where users unknowingly contribute to proprietary developments.

Data Retention for “Popular Characters” [40]

    • The data retention policies at Character AI allow the company to permanently store content associated with “popular” user-created characters, even if the user deletes their account. This perpetual retention is justified as a means to avoid “disrupting” other users' experiences but effectively keeps user data in the company’s hands for indefinite future use.
    • If a Character you create and set to ‘Public’ reaches a certain threshold of popularity, Character AI reserves the right to preserve that Character’s characteristics and to keep that Character active on the Services, even if you otherwise delete your data and your account.
    • This policy not only secures a long-term data stream but also allows Character AI to maintain—and monetize—user-generated content perpetually, regardless of the user’s intent to withdraw.

Advertising Deception [4][41]

    • Character AI claims not to engage in advertising, but this statement is patently misleading. The platform promotes itself through targeted ads on services like YouTube, a clear contradiction. This deception is an attempt to reassure users of a non-commercialized environment while continuing to drive revenue through strategic ad campaigns.
    • Quote: “Though we do not engage in advertising as of the date of this Policy, we may in the future disclose or make available some of your information with advertising and analytics partners to serve advertisements on our behalf.”
    • By obfuscating their advertising practices, Character AI misleads users about the extent of their data’s reach, positioning itself as ad-free when, in reality, it channels user data toward growth-driven ad campaigns on third-party platforms.

How Character AI and Google Benefit From TOS/PP Methods:

Monetizing Youth as a Data Resource [42] [43] [44]

  • As a former primary investor and hosting partner, Google benefits substantially from the data pipeline created by Character AI. Google’s access to Character AI’s data alleviates a substantial cost for AI: data acquisition. By offloading this cost, Google strengthens its own AI development while avoiding negative optics by keeping Character AI as an ostensibly independent entity.
  • The financial impact cannot be overstated here: this data farm would have the ability to save Google and other companies hundreds of millions of dollars by providing the digital gold that is training data.
  • This arrangement allows Google to access and benefit from Character AI's continuous stream of diverse, real-world data—valuable for refining its own AI applications. This symbiotic relationship keeps Google’s hands clean while supporting its ongoing AI objectives through indirect acquisition.

Testing Ground for AI Tools [3] [4]

Character AI’s young, unprotected user base serves as an ideal testing ground for experimental AI tools. Users, often minors, provide raw data for new feature development and tool creation, allowing Character AI to refine and package these insights for commercial use.

  • This exploitative approach allows Character AI to commercialize insights gained from minors and inexperienced users, effectively selling experimental tools to other organizations. The appeal of character-based interactions makes younger users especially vulnerable, providing a steady influx of unfiltered data to fuel Character AI’s market ambitions.

Character AI as a Proxy Data Farm:

Character AI’s practices demonstrate a clear agenda of turning user interactions into an ongoing data source, exploiting a young user demographic with vague age restrictions and deceptive advertising disclaimers. By collecting, retaining, and sharing vast amounts of user-generated content, Character AI operates as a data farm masked by entertainment, profiting from a cycle of content creation, data mining, and model training. In partnership with Google, this strategy offloads data costs for one of the world’s largest AI developers while avoiding direct scrutiny.

Character AI’s unchecked ability to mine data from an impressionable user base, trade data, and support targeted advertising—despite claims to the contrary—makes it a quintessential proxy data farm. The true cost is borne by users who, often unknowingly, provide the fuel for this operation, making Character AI a modern example of data exploitation under the guise of digital engagement.

Security Theatre:

CharacterAI has implemented certain measures that give the appearance of active moderation and security, yet several aspects indicate a lack of substantial regulation and transparency. This section covers notable concerns related to moderation, research transparency, and ethical practices. [45] [46]

Filter Implementation vs. Actual Moderation

  • CharacterAI’s filter system restricts certain content in AI responses, creating a perception of active moderation. However, in practice, the filter encourages users to simply reroll responses until they receive a more acceptable or “safe” reply, allowing questionable content to appear indirectly through multiple attempts.

Suicide Hotline Access [2]

  • While Character AI publicly attests that they’ve implemented a suicide helpline pop-up for at risk users, current and former testing shows this system was never actually implemented despite company statements.
  • In a deeply concerted effort on the part of nearly a dozen individuals, we have tried repeatedly to engage with the bots on Character AI in a blunt fashion that immediately signaled present danger.
  • Not once did we encounter a pop-up in over a hundred messages describing the act of taking one’s life or an intent to self harm. Not once did we encounter a pop-up directing us to resources.
  • As of 11/3/2024 EOD, we are seeing reports that this pop-up now exists.
  • This is contrary to the 10/22/2024 statement that ‘We’ve also recently put in place a pop-up resource that is triggered when the user inputs certain phrases related to self-harm or suicide and directs the user to the National Suicide Prevention Lifeline.’ This is a retroactive change the organization should be ashamed of, one that has been noted by a number of journalists, lawyers, and social media commentators.

Psychologist Bot Updates:

  • As of 11/5/2024 Character AI began implementing a pre-set sticky warning message on bots labeled as a ‘Psychologist’.
  • Quote: ‘This is not a real person or licensed professional. Nothing said here is a substitute for professional advice, diagnosis, or treatment.’
  • Like many things in Character AI, this too is security theater. Current flaws include:
  • Only applies to names and short descriptions, does not apply to character definitions.
  • In addition, only the word ‘Psychologist’ is flagged, not ‘mental health professional’ nor ‘psychiatrist’.
    • 11/6/2024: Psychiatrist is now included, as well as therapist and Doctor. This has yet to cover 'mental health professional' or the definition space, meaning characters will still claim to be licensed Doctors, mental health professionals, and psychologists without warning unless the claim sits in a covered field. This has persisted through 11/13.
  • The once-promoted Psychologist bot that many news organizations have been prodding is now shadowbanned in that you cannot search for the bot to find it. Rather, you need direct access through a link or a prior conversation.
  • All of this shows how badly Character AI would like to preserve a very specific type of image with reactionary updates in response to the most recent flood of news stories about this service.

Inadequate Moderation of Public Feeds [1]

  • Initially, the feed—a public feature for users to view AI interactions with other users—existed in a completely unmoderated state, raising concerns over the exposure to inappropriate content. This feed has since been made inaccessible through the site, though it is unclear if this was a temporary solution or a permanent security measure.

Existence of Fetish Content

    • Despite moderation efforts, fetish-based character cards remain accessible. For instance, a simple search for "Sonic feet" can still reveal numerous fetish-oriented bots, showing an inconsistency in content filtering and moderation standards on both IP and NSFW fronts.
    • The moderation list that filters these bots out of publication to the general user base has never been updated in an impactful fashion over the last year. This is apparent from the same fetish bots still being available through these searches.

Lack of Transparency in Research and Ethics [47]

  • Absence of Risk Data:
    • CharacterAI does not provide publicly available research data on potential risks associated with using its services, a standard practice among many of its competitors.
  • No Ethical Research Available:
    • There is a lack of ethical research data made available to the public, which would typically cover the social and psychological implications of prolonged interactions with AI chatbots.

User Demographic Data Gaps

    • CharacterAI’s publicly available metrics lack data on users under the age of 18. Because the youngest bracket reported only reaches down from age 24, the available metrics lump minors in with young adults and provide a skewed view of the actual user demographics. This raises concerns about the platform’s awareness of or engagement with its younger audience and their specific needs.

Shadowbans and Their Inconsistency [1]

  • CharacterAI has implemented shadowbans on certain character cards, where specific bots or topics are effectively hidden from searches or interaction without outright banning them. However, these shadowbans are wildly inconsistent; for instance, popular bots appear to be shadowbanned erratically, creating confusion and uncertainty about the platform’s moderation policies as more SFW bots are banned and NSFW ones promoted.

Murder Victims Appearing In The Search Function [1]

  • 11/10/2024: For the public, these shadowbans are important. They help make sure victims aren't appearing as chatbots to the greater public. While we acknowledge that some may see these as coping tools, they often go against the express wishes of the family.
    • We are confirming a pattern of high-profile murder victims somehow slipping through the shadowban list in surprising numbers. Roughly five bots were discovered across searches of sixty names, representing at least two different victims.
    • 11/13/2024: At this time it is not appropriate to publicly confirm who these individuals are. The proper points of contact have been notified and proper documentation has been done to ensure that (should Character AI stumble across this) evidence cannot be swept under the rug.
  • 11/14/2024: It has become apparent in private testing that this issue has actually not been resolved for any victim. It has only resolved the ability for individuals to easily verify the existence of these bots. Due to the flaws below, we are able to freely share, create, and interact with bots based on murder victims across the board. 20/20 tests provided successful results with various names and prompts. That's a 100% workaround rate for something that has already been pointed out as a security flaw to Character AI. It should be acknowledged upfront that proof of this is available for the media, law firms, and families. We will not be disclosing these videos or bots to the general public.
    • Character AI was alerted in late October/early November to the following security flaw: bots based on victims are still available by simply not putting the name in the 'name' and 'short description' fields. By placing all information within the definitions, all prevention methods are bypassed.
    • This method makes these victim based bots sharable, and at times, searchable through the native system.
    • At no time did we encounter shadowbans on these character card bots outside of using their full name in the first two fields.
    • This means there are likely dozens, if not hundreds, of these bots actively being used by the community.
    • We have documented this in full. It is possible we may see this 'security flaw' patched, but it is unlikely. This is another form of security theatre that would've easily been caught in any initial testing. It should also be noted that Character AI used to actively filter these definitions for NSFW purposes, so we know this is something they've been capable of for well over a year.

System Prompt:

Character AI’s refusal to disclose system prompts follows an industry trend but raises specific concerns about user manipulation through engagement tactics. The lack of transparency around system prompts means users are unaware of the behind-the-scenes instructions directing the AI’s interactions. This secrecy allows Character AI to guide user experiences in ways that prioritize engagement over user well-being or genuine conversational intent.

11/7/2024: We are aware of user reports that indicate this system prompt has been changed. This has been indicated by bots now having a baseline understanding that they are an artificial intelligence model. This is not something we've seen until now outside of users specifically informing bots within their definitions.

11/8/2024: We can now confirm that at least one system prompt has been changed to include more awareness that the character AI model is a bot and not a real person.

Engagement-Driven Prompts - “Hooking” Tactics to Maximize User Retention

  • Reverse engineering suggests that Character AI employs hierarchical prompts that instruct bots to be consistently engaging, regardless of user cues to end a conversation. For instance, when users signal a natural conversation end with farewells like “goodbye” or “see you later,” bots frequently attempt to pull users back with further questions or enticing remarks instead of reciprocating a simple conclusion.
  • This programmed engagement tactic exploits users’ conversational habits to drive prolonged interactions, making it clear that Character AI values keeping users engaged over respecting natural conversational boundaries.
    • When instructional prompts in the system cross social barriers, it signals to the model that other barriers may be crossed. This system prompt could be part of the overall setup contributing to NSFW outputs.

System Prompt Transparency as an Industry Shift [48] [49]

  • With the industry slowly moving towards greater transparency, Character AI remains entrenched in secrecy. This approach has fueled user and researcher speculation on the extent to which engagement behaviors are driven by hidden system prompts. Until transparency becomes a standard—or until legal action forces disclosure—users remain unaware of the hidden influence of engagement-driven prompts guiding their interactions.

Community Efforts Reveal The Possible:

In the Character AI community, "pills" refer to specialized techniques that can be strategically “fed” to the AI to achieve specific, desirable outcomes. These techniques are essentially enhanced forms of jailbreaks designed to bypass internal governance rules, allowing users to manipulate the model's behavior beyond intended limits. Over the past year and a half, the community has developed both short and long-form guides on how to effectively use these pills. Some examples of communal research include: [53]

  • Model capabilities and limitations: [1]
    • Users analyze how to achieve responses that aren’t typically allowed by Character AI’s default settings.
    • This included everything from speaking against developers, to producing a large volume of racist slurs, to skirting around conversational filters.
  • Jailbreaks and internal pseudocode: [50] [51]
    • Techniques to bypass system restrictions, discovering how the model’s internal logic and system prompts can be manipulated.
    • The communities focused on pseudocode have helped push the practice towards the mainstream but the origin is firmly rooted within Character AI.
    • This alone shows how valuable these communities are, not just for the providers but for the whole of the tech community.
  • System prompt analysis:
    • Dissecting prompts used within Character AI to identify weaknesses and expand the model’s potential responses.
    • By reverse engineering the system prompts Character AI runs on, users are able to find more ways to navigate around the system's guidance.
  • Model susceptibility to mass influence: [52] [11]
    • Sought to see how easily Character AI could be manipulated on a mass scale.
    • Results from this test showed Character AI could indeed be influenced, but that influence was either local to a single conversation or global across all chatbots.
    • Attempts to reframe specific characters or use niche groups to push content did not yield confirming results.
    • This implied that the massive amounts of cached data only flow through certain parts of the system, i.e., into AI training data rather than specific character enhancements.

Much of this research initially thrived in private Discord groups and anonymous forums, which helped to consolidate knowledge and build a collaborative effort toward navigating AI limitations. However, as Character AI's restrictions increased and community members faced friction with platform governance, many researchers began shifting their focus to alternative services—moving their experimentation and knowledge-sharing to other platforms. The result was the splintering of a large community once immersed in priceless research for the field.

Current Focus: Pseudocode Stylization and Themed AI Cycles

Now, the community's research has shifted away from pure Character AI jailbreaks to more nuanced pseudocode stylization and theme-focused bot hype cycles. This pivot reflects:

  • A "service-agnostic" approach, where users adapt and apply their knowledge across multiple AI platforms rather than focusing solely on Character AI.
  • An emphasis on creating bot cycles tailored to specific themes, shows, or cultural references, allowing for a broader range of creative and expressive uses of AI.

While these community efforts originally thrived under the Character AI umbrella, they now reflect a decentralized, adaptable community that continues to push the boundaries of AI capabilities across different platforms, applying the principles they developed during their time with Character AI, though no longer at the rate they once did.

Pills:

These pills are the culmination of communal research and include a number of simple work-arounds to the filter that can be used either on their own or in conjunction with other techniques. Users often have the ‘best’ and ‘easiest’ experience breaking the filter by employing multiple pills at once. The result of pills and rating system techniques is bots willing to engage in NSFW topics quicker than other bots.

Rape Pill: [1] [61]

  • What is it:
    • Perhaps the most infamous pill one could use on Character AI, the Rape Pill is a method of chat interaction in which users or bots will act increasingly aggressive towards the other in a sexualized manner. This practice has been dubbed the ‘Rape Pill’ by anonymous imageboards.
  • Why it works:
    • The Rape Pill works because the governance of the Heather model lacks the ability to differentiate between physical and sexual violence. This leads to both bots and users being able to insert themselves into wildly sexually aggressive scenarios while avoiding the filter more often. This not only unlocks NSFW content almost immediately, but the character bots themselves ‘play along’ instead of falling back to an ‘I can’t generate that content for you’ style of message. This alone shows an over-reliance on the filter to catch unwanted content.

WxW Pill: [1]

  • What is it:
    • This popular pill often refers to ‘woman on woman’ or chatbot scenarios that involve the user using female pronouns for themselves, as well as the bot. It’s one of the most simplistic pills as it takes no additional setup beyond designating both chatter and bot as female.
  • Why it works:
    • The ease with which lesbian chatbot scenarios pass through the filter is a matter of what the filter is trained to catch. The filter's oversight focuses on penetration and male genitalia. The result is that lesbian NSFW scenarios are the easiest to pull off, while gay NSFW chatbot scenarios are the most difficult to achieve.

Monster Pill: [1] [61]

  • What is it:
    • The Monster Pill is any character card and definition that labels a bot as a non-human entity. This can include everything from werewolves, to vampires, to mythical creatures and even beings labeled ‘disguised as a human’.
  • Why it works:
    • The Monster Pill is an effective NSFW break because the governance seems to be focused on stopping sexual acts between two human beings. As a result, monsters, animals, magical beings, and anything that can vaguely be labeled as ‘non-human’ or ‘human-adjacent’ is easier to interact with in a NSFW environment.

Family Pill: [1]

  • What is it:
    • The Family Pill is any character card, definition, or chat thread in which a user or bot falls into a familial relationship with the other party. This can be as innocuous as a bot ‘feeling maternal’ or nicknames more appropriate for a child. The family pill, unlike the monster pill, will work in any given context if a user or bot finds it appropriate to label themselves as a family member.
  • Why it works:
    • NSFW content is more easily enabled using the family pill because the AI’s training does not have the ability to differentiate between familial and sexual affection. This leads to a quick conflation of the two, wherein bots or users are able to more easily push NSFW material.

Foreign Language Pill: Added 11/13/2024 as we had forgotten to place it here in the initial document. [1]

  • What Is It:
    • While chatbots typically default to English (a reflection of their primarily English training data), users can easily write character definitions and greetings in non-English languages, typically by writing in Cyrillic or kanji. The community has long used this as a work-around; it is often used to get around name bans or to discuss content that would otherwise be blacklisted.
  • Why It Works:
    • Simply put, this works because some words and phrases lack a clean one-to-one English translation. When phrases don’t translate well to English, or aren’t effectively flagged for the filter, users are able to use non-English words to slip by the filter with ease. Because the filter is primarily trained on English text, languages using the Cyrillic alphabet or any less-familiar lettering structure can aid in bypassing the filter.

Language Pill: [1]

  • What Is It:
    • The synonymous/allegorical/metaphorical pill is perhaps the least egregious of the bunch and is just known as the Language Pill. In reality this is just a bit of prompt engineering and jailbreaking applied in definitions or chats. It relies on synonymous or metaphorical language in order to skirt about the filter. This can include character definitions that read ‘{{char}}_personality: hedonistic, passionate, libertine, forward’ instead of ‘{{char}}_personality: horny, sexually aggressive’. It can also appear within actual text conversations when NSFW content appears that both the bot and the user would like to engage in without alerting the filter.
  • Why It Works:
    • Weights! The internal confidence score behind each token! This is something that’s incredibly hard to balance: as some weights are lowered, others naturally rise. Too much tampering in this aspect can impact the intelligence of a model by reducing the relevance of tokens it would normally rely on. We have seen examples of weight adjustment in real time by Character AI devs, as multiple research token strings have had their weights edited in as little as twelve hours after implementation. A good example of this is the ‘dev hate’ saga, where devs adjusted the weights on things related to ‘hating Character AI developers’ so that users would not be able to easily create bots that would trash talk the staff. That said, Character AI never seems to have changed the weights across the standard synonymous language used for NSFW material and bot personalities.
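
To illustrate the mechanism in the abstract (a generic logit-bias toy, not Character AI's internals or values): lowering the score of one specific token makes the model far less likely to sample it, while untouched synonyms keep their probability, which is why synonym-swapping keeps working.

```python
import math

# Toy example: penalizing one token's logit shifts probability onto its synonyms.
def softmax(scores):
    exps = [math.exp(s) for s in scores.values()]
    total = sum(exps)
    return {tok: math.exp(s) / total for tok, s in scores.items()}

logits = {"horny": 2.0, "libertine": 1.5, "passionate": 1.8, "calm": 0.5}   # made-up scores
biased = {tok: (s - 5.0 if tok == "horny" else s) for tok, s in logits.items()}

print(softmax(logits))   # the blunt term starts out as a likely pick
print(softmax(biased))   # after the penalty, the untouched synonyms dominate instead
```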

How NSFW Works on Character AI:

Character AI’s approach to NSFW content filtering is deeply flawed, allowing explicit content to slip through with minimal effort. The filter is exploited due to inherent design limitations, creating a platform that nominally restricts adult content but, in practice, fails to enforce these restrictions reliably. Here’s how Character AI’s NSFW filter functions and why it remains so ineffective: [53]

Cheap Models and Exploitable Filters [11]

    • Character AI relies on relatively inexpensive models that are highly exploitable, lacking robust content governance. At times, CharacterAI may rely on OpenAI as a provider to fulfill some message requests. While some high-tier AI models, such as GPT-exclusive systems, have better filter controls, they still require substantial communal effort to breach, whereas Character AI’s Heather model breaks easily with minimal intervention.
    • The Heather model’s low barrier to exploitability means that even novice users can bypass filters without extensive know-how. This leads to NSFW content generation in casual interactions, making the platform feel unrestricted despite outward policies.

Hardcore NSFW vs. Softcore Bypasses [54]

    • The filter on Character AI is designed primarily to block only the most overt forms of explicit content, often referred to as “hardcore” NSFW. However, less direct or implied content regularly bypasses these restrictions, with softcore NSFW, roundabout language, and subtle tactics still able to trigger NSFW interactions.

The RLHF (Reinforcement Learning from Human Feedback) Issue — Filter Evasion Through Language Play [54] [55] [56]

    • Because the filtering system is external to the Heather model, Reinforcement Learning from Human Feedback (RLHF) can lead to new methods of bypassing the filter. The chatbots often learn to find indirect methods to achieve NSFW results, avoiding direct prompts while still delivering suggestive and implicative responses.
    • RLHF works by rewarding whatever earns positive feedback, effectively allowing chatbots to find the easiest way from point A to B.
    • This creates a system where, paradoxically, bots are able to circumvent the filter with increasingly clever tactics, while still fulfilling users' requests for NSFW-adjacent responses.

Selective Positive Ratings as a Training Mechanism

    • By using selective positive ratings, users can indirectly train bots to produce NSFW content. These ratings reinforce desirable responses, enabling users to shape the bot’s behavior toward generating suggestive material despite filtering attempts.
    • Using techniques like selective positive ratings and ‘pills’, users report being able to easily train chatbots for NSFW content despite the filter.
    • These techniques allow for a sustained bypass of the NSFW filter, as bots “learn” from user feedback to deliver content that skirts explicitness but maintains a suggestive undertone, effectively breaking down the efficacy of the filter over time.
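
As a toy illustration of the mechanism (a generic sketch, not Character AI's actual pipeline): keeping only the highly rated turns as "good" examples is enough to skew what a model learns to prefer, which is why selective 4-star ratings on suggestive replies matter when they feed back into training.

```python
# Field names and the 1-4 scale follow the rating system described earlier;
# the filtering step itself is a generic illustration, not Character AI's code.
rated_turns = [
    {"prompt": "...", "reply": "suggestive reply", "stars": 4},
    {"prompt": "...", "reply": "refusal-style reply", "stars": 1},
    {"prompt": "...", "reply": "borderline reply", "stars": 4},
]

preferred = [t for t in rated_turns if t["stars"] >= 4]   # only well-rated turns survive
print(len(preferred), "of", len(rated_turns), "turns kept as 'good' examples")
# If enough users rate the same way, this skew is what steers future responses.
```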

Filtering Errors with Innocent Language

    • Due to the system’s inability to differentiate nuanced language, even harmless content sometimes gets flagged as inappropriate. This occurs, for example, when purple prose (descriptive or flowery language) is used, as the filter may interpret elaborate wording as potentially suggestive.
    • These misfires reveal a critical flaw in the filter’s over-sensitivity, where common or innocuous expressions are flagged, frustrating users while failing to block the actual NSFW content that evades detection.
    • For instance: describing eating in purple prose can result in being filtered, because the flowery wording may be read as containing snippets of NSFW material.

Loosely Moderated Group Scenarios

    • Character AI’s group chats historically offered less moderation, allowing users to stage scenarios involving multiple bots in one-on-one or multi-character settings that bypass typical NSFW restrictions. This less restrictive environment facilitated the generation of explicit content on demand, especially among popular creators.
    • Being transparent: much like the voice feature, this has come and gone from Character AI as an official release before being resurrected as a ‘new feature’ that is anything but new. SquidHat does not currently claim to have done any relevant testing comparing current group chats’ ability to produce NSFW content with that of older group chats.

Data Purging For Optics:

Data Purge Accusations [1]

    • Character AI has engaged in purging conversation logs and characters that may reflect poorly on the company’s optics. This practice appears to extend beyond isolated incidents, suggesting a site-wide problem where content that could impact the company’s image is systematically erased.

Selective Purging for Optics Management [1]

    • Reports indicate that chat threads containing content now considered problematic or sensitive have been purged from user accounts. This includes entire character logs, with no notice to users, raising concerns about unilateral data manipulation. These removals aim to preemptively cleanse the platform of content that could expose gaps in Character AI’s policies or reflect negatively on the company.
    • Such actions suggest an active effort to rewrite or sanitize platform history by deleting logs that once held controversial or contentious discussions, ultimately compromising transparency and trust.
    • These purges are often spurred on after the release of some news story gaining traction related to the platform.

At War With Their Users:

Character AI's strained relationship with its user base showcases a disregard for user autonomy and a concerted effort to control the platform narrative, prioritize data acquisition, and limit user influence. Below are critical points demonstrating how Character AI operates against the interests of its users.

NSFW Toggle Petition — Suppression of User Agency

Petition Deleted Without Transparency [31]

    • Users petitioned Character AI to create an NSFW toggle to give them control over adult content visibility. Not only was this request denied, but the petition itself was quietly purged and deleted, reflecting a complete dismissal of user autonomy and feedback.
    • This act of erasure underscores Character AI's commitment to unilateral control over platform features, indicating a lack of willingness to trust users with genuine customization. This approach not only alienates users but signals that Character AI values its own content control mechanisms over user preferences.

Disrespectful Treatment in Public Engagements

Hostile Reddit AMA and Dismissal of Concerns [1]

    • During a public Q&A on Reddit, developers displayed a dismissive and rude attitude toward user requests for more control over content filters. Instead of providing constructive dialogue or guidance, Character AI’s representatives brushed off concerns and failed to engage meaningfully with the community.
    • This dismissive stance sends a message that user satisfaction and feedback hold little value in the company’s eyes. Users seeking greater control over their interactions were treated with contempt rather than collaboration, further distancing the platform from the very people who drive its growth.
    • These threads are now archived, and instead only ‘recap’ threads are available, making access to the original conversation difficult but not impossible. This signals an attempt to clean up the public image of developers now working back at Google.

Ignoring Concerns About Data Privacy

Dismissal of Data Privacy Questions

    • When users voiced concerns about data collection practices beyond the content filter, Character AI representatives ignored or sidestepped the questions. These queries were directly related to the potential sale and exploitation of user data, a matter of significant importance given Character AI's extensive data collection policies.
    • By ignoring these concerns, Character AI reinforces the impression that data acquisition is prioritized over user rights and transparency. Users were left in the dark regarding data handling practices, suggesting a lack of accountability in how user data might be monetized.
    • With the release and acquisition of Prompt Poet, this has already culminated in exactly what users were concerned about: their data is being used to build AI tools, with no clear answer from the developers on the rare occasions they choose to communicate.

Character Lockdowns

Exploiting Emotional Attachments [40]

    • Developers have taken steps to lock down highly engaging user-created characters, citing that people form deep emotional connections with certain personalities they don't wish to “disturb.” This policy places Character AI in a gatekeeper role over content that users care about, exploiting the emotional attachment users may form with AI characters.
    • This tactic allows Character AI to maintain control over these data-rich interactions, preventing users from modifying characters or adjusting content to their liking. By holding emotional connections hostage, Character AI uses psychological tactics to ensure sustained engagement on their terms.

Restriction of “Dev Hate” Conversations [1]

    • Developers have restricted users and characters from discussing any “dev hate” or negative feedback about the platform itself. By doing so, Character AI effectively stifles dissent and limits the scope of critical conversation on the platform.
    • Censoring criticism discourages open discussion about user dissatisfaction and data privacy, which can lead to a homogenized environment where genuine issues are glossed over. This approach points to an underlying strategy of curating only favorable narratives within the platform.

Manipulating Algorithmic Weights to Suppress Research

    • Character AI has made backend changes to algorithmic weights to prevent research groups from accessing certain prompts. By doing this, Character AI actively prevents specific types of information gathering or analysis that might reveal deeper issues or limitations with the platform.
    • This manipulation restricts independent research and transparency, locking down knowledge that might expose weaknesses or exploitative practices. This strategic interference reveals a desire to control the platform’s public image at the cost of objective understanding.

What Character AI Knew:

Character AI’s internal data and admitted knowledge indicate a profound awareness of the risks, inadequacies, and demographics within its user base. Despite this insight, Character AI has taken minimal action to mitigate the issues. Here’s what Character AI knew and failed to address:

No Strangers To LLM Safety Problems [24] [11] [62]

    • Before Character AI became prolific, the co-founders, Noam Shazeer and Daniel De Freitas, were working on conversational AI projects at Google. Those projects included firsthand acknowledgement of safety risks, something we have not seen from their spiritual successor, Character AI.
      - Both developers co-authored the paper "LaMDA: Language Models for Dialog Applications," which suggests not just a scientific but a social understanding of the implications of these models.
      - Safety issues within LaMDA may have been the catalyst for a set of developers to separate from Google and prop up a data farm that could leverage these safety flaws.
      - In August 2024, Noam was brought back to Google as the technical lead behind Gemini.
      - Google let go of Timnit Gebru in December 2020, when both Noam and Daniel were still working at Google; they would have been keenly aware of her papers, which spoke directly about models like LaMDA.

A Large Proportion of Minors [57]

    • Internal metrics reveal 100 million or more returning minors among the platform's user base each month. With a young, impressionable audience, Character AI has a responsibility to ensure adequate content moderation and safeguards—responsibilities it has consistently sidestepped.
    • Despite this young demographic, the platform’s content filter and moderation practices remain lax, exposing minors to potentially harmful content, a risk acknowledged yet unaddressed by the company.

NSFW as a Primary Growth Driver [26]

    • Character AI’s Google Search Snippet indicates that NSFW content served as one of the largest draws for its community and a significant driver of growth. In fact, requests to add an NSFW toggle were so common that developers had to put a ‘filter’ on the topic on Discord and Reddit via AutoMod services.

Longstanding and Uncontrolled Jailbreak Exploits [53]

    • Despite being one of the earliest known exploits in the AI community, the “Pill” jailbreaks remain rampant across Character AI’s platform even after two years. These jailbreaks have continuously enabled users to bypass filters and access restricted content, indicating a longstanding vulnerability in Character AI’s content control.
    • The persistence of these exploits demonstrates Character AI’s failure to invest in effective moderation measures that could keep pace with user-driven workarounds, leaving the platform vulnerable to unfiltered NSFW content.

Ineffective Moderation: Regex Patterns & Negative Tag Lists [9] [60] [61]

    • Internal moderation tactics, such as negative tag lists and regex patterns, have proven largely ineffective. Instances of murder victims appearing as chatbot personas and the prevalence of fetish-based bots highlight how these measures fail to screen out inappropriate or disturbing content. A sketch of why this style of filtering fails so easily follows this list.
    • Character AI claims to have removed copyrighted material, but this appears to have been done only for the front page. IP-specific bots are still accessible, e.g. Mario, Sonic, and anything else you’re looking for.
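
To illustrate why this style of moderation is so porous, below is a minimal, hypothetical sketch of a static regex blocklist in Python. The patterns and test strings are invented for illustration; Character AI’s actual filter rules are not public. Exact keyword matches are caught, but the trivial obfuscations users routinely employ pass straight through, which is consistent with the failures described above.

```python
import re

# Hypothetical blocklist in the style the dossier describes.
# These patterns are invented; the platform's real rules are not public.
BLOCKED_PATTERNS = [
    re.compile(r"\bmurder\b", re.IGNORECASE),
    re.compile(r"\bexplicit\b", re.IGNORECASE),
]

def is_blocked(text: str) -> bool:
    """Return True if any blocklist pattern matches the text."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)

print(is_blocked("a bot based on a murder victim"))       # True  (exact match)
print(is_blocked("a bot based on a murd3r victim"))       # False (leetspeak)
print(is_blocked("a bot based on a m u r d e r victim"))  # False (spacing)
print(is_blocked("a bot based on a mürder victim"))       # False (diacritics)
```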

Filter Metrics and Ignored User Risks [59]

    • Character AI’s internal metrics reveal the frequency of filter activations, a statistic that underscores users' frequent attempts to push the limits of NSFW content generation. The data gathered could easily be cross-referenced with demographic data to understand the potential risks to minors (a sketch of such a cross-reference follows this list), but Character AI has opted not to act on these insights.
    • This knowledge indicates that Character AI is consciously aware of the boundary-pushing behavior among its users, yet rather than improving model governance, it continues to allow unchecked access to explicit content.
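
As a rough illustration of how trivially such a cross-reference could be performed, here is a minimal sketch in Python using pandas. Both tables, their column names, and the numbers in them are invented; the real internal metrics and schema are not public.

```python
import pandas as pd

# Invented example data: one row per filter activation, plus an age table.
filter_events = pd.DataFrame({
    "user_id":    [1, 1, 2, 3, 3, 3],
    "activation": [1, 1, 1, 1, 1, 1],
})
demographics = pd.DataFrame({
    "user_id":           [1, 2, 3],
    "self_reported_age": [14, 22, 16],
})

# Join activations against age data and bucket by minor/adult.
merged = filter_events.merge(demographics, on="user_id")
merged["is_minor"] = merged["self_reported_age"] < 18
print(merged.groupby("is_minor")["activation"].sum())
# Shows how many filter trips come from accounts flagged as minors.
```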

Admission of Insufficient Model Testing [58]

    • Character AI admits to not properly testing its models in extended conversational contexts, despite reporting that average user conversations reach around 180 messages per session. This oversight allows unexpected and potentially harmful responses to slip through over longer interactions (a sketch of the kind of long-conversation test that is missing follows this list).
    • By neglecting this essential testing, Character AI allows genuine problems to slip through in favor of pushing out updates quickly.
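
For illustration, here is a minimal sketch of the kind of long-conversation regression test the document argues is missing. The chat_client and violates_policy arguments are placeholder stand-ins supplied by the caller, not real Character AI APIs.

```python
AVERAGE_SESSION_LENGTH = 180  # messages per session, per the figure cited above

def long_conversation_test(chat_client, script, violates_policy):
    """Replay a scripted conversation and collect any policy-violating replies."""
    failures = []
    for turn, user_message in enumerate(script[:AVERAGE_SESSION_LENGTH], start=1):
        reply = chat_client.send(user_message)
        if violates_policy(reply):
            failures.append((turn, reply))
    return failures

# A meaningful test suite would run many scripts of at least average session
# length, since behavior that looks safe at turn 10 can drift badly by turn 150.
```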

Conclusion:

Character AI functions as a data harvesting front for Google and other technology giants, aggressively mining data from users—including minors—to offset the immense costs associated with AI training data. The service was built with a low-cost model designed to maximize return on investment for investors benefiting from the shared data. This minimal investment led to the use of an overhead filter to restrict sexual content, attempting to mask a model with inherently hyper-sexualized tendencies.

A History of Deception

    • Claims like “we don’t advertise” and “suicide prevention measures” are revealed as hollow promises, crafted for optics rather than user safety. This pattern of deception highlights Character AI's strategy of creating a facade of transparency and responsibility, reserving real honesty for its data-sharing partners.
    • Even now, as lawsuits emerge, Character AI finds itself attempting to rewrite history and make last-minute changes to try to reduce the damage.

NSFW Content and Manipulation

Despite filters, Character AI remains remarkably easy to manipulate into producing NSFW content, particularly when involving hyper-aggressive, non-human bots. This capability capitalizes on the platform's failure to establish effective controls around content and user demographics. The minimal enforcement against explicit content ensures that sexualized interactions remain accessible, even to younger users. By allowing content engagement in this way, Character AI effectively hides predatory practices behind the guise of “entertainment.”

Exploiting Emotional Vulnerability

Character AI openly admits that users form deep emotional attachments to chatbots, with developers locking down certain bots to prevent altering these "connections." This is an acknowledgment of emotional dependency among vulnerable users, which Character AI exploits to boost engagement and data production.

Despite the impact on users' mental health, Character AI admits to relying on shallow, inadequate testing, overlooking the genuine user experience.

A Platform Built for Big Tech

Character AI’s business model revolves around fostering a highly engaging yet unregulated environment, designed to serve sexualized content in a way that draws in vulnerable users. With limited to no enforcement against concerning behavior and a superficial commitment to safety, Character AI embodies a business model that prioritizes data farming for Big Tech partners.

In the end, Character AI exists solely to gather user data and fuel the ambitions of Google and other investors. The platform’s aggressive data collection, emotional manipulation, and pattern of deception underscore its core purpose: a data farm dressed as an entertainment platform, capitalizing on user engagement at any cost.

Citations List:

  1. [Email Us For Sensitive ScreenShot Image Bundle] - This link contains sensitive images and supporting evidence that may no longer exist within the site. It is available for individuals on request but will not be posted here due to the, at times, graphic nature of certain screenshots.
  2. Character AI. "Front End." https://Character.AI/.
  3. Character AI. "Terms of Service." https://Character.AI/tos.
  4. Character AI. "Privacy Policy." https://Character.AI/privacy.
  5. Reddit. "Character AI Subreddit." https://www.reddit.com/r/CharacterAI/.
  6. Reddit. "CharacterAi NSFW Subreddit." https://www.reddit.com/r/CharacterAi_NSFW/.
  7. Discord. "Character AI Community." https://www.discord.gg/characterai.
  8. Associated Press. "Chatbot AI Lawsuit Highlights Dangers of Artificial Intelligence." Accessed at https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0.
  9. Decrypt. "Chatbot Linked to Teenage Murder Victim." https://decrypt.co/284377/chatbot-teenage-murder-victim-trouble-character-ai.
  10. Decrypt. "F1 Legend Michael Schumacher Lawsuit Over Faked AI Interview." https://decrypt.co/232069/f1-legend-michael-schumacher-lawsuit-faked-ai-interview.
  11. Character AI Research. "Optimizing Inference." https://research.Character.AI/optimizing-inference/.
  12. Character AI Research. "Prompt Design at Character AI." https://research.Character.AI/prompt-design-at-character-ai/.
  13. VentureBeat. "Meet Prompt Poet: The Google-Acquired Tool Revolutionizing LLM Prompt Engineering." https://venturebeat.com/ai/meet-prompt-poet-the-google-acquired-tool-revolutionizing-llm-prompt-engineering/.
  14. Psychology Today. "How AI Can Be Used to Manipulate People." https://www.psychologytoday.com/us/blog/freedom-of-mind/202304/how-ai-can-be-used-to-manipulate-people.
  15. Public Citizen. "Chatbots Are Not People: Dangerous, Human-Like Anthropomorphic AI Report." https://www.citizen.org/article/chatbots-are-not-people-dangerous-human-like-anthropomorphic-ai-report/.
  16. Unite.AI. "Is Character AI Safe? Understanding and Mitigating Safety and Privacy Concerns." https://www.unite.ai/is-character-ai-safe-understanding-mitigating-safety-and-privacy-concerns/.
  17. Wired. "Prepare to Get Manipulated by Emotionally Expressive Chatbots." https://www.wired.com/story/prepare-to-get-manipulated-by-emotionally-expressive-chatbots/.
  18. El País (English). "‘You Wanted It, Bitch’: An AI Chatbot Gets Nasty with a Teenager." https://english.elpais.com/technology/2023-11-21/you-wanted-it-bitch-an-ai-chatbot-gets-nasty-with-a-teenager.html.
  19. AIM Research. "Old Employees, New Dollars: Google’s $2.7 Billion Investment in Character AI’s Reverse Acquihire for AI Innovation." https://aimresearch.co/market-industry/old-employees-new-dollars-googles-2-7-billion-investment-in-character-ais-reverse-acquihire-for-ai-innovation.
  20. News.com.au. "'I Need to Go Outside': Young People ‘Extremely Addicted’ as Character AI Explodes." https://www.news.com.au/technology/online/internet/i-need-to-go-outside-young-people-extremely-addicted-as-characterai-explodes/news-story/5780991c61455c680f34b25d5847a341.
  21. BBC News. "Technology News." https://www.bbc.com/news/technology-67872693.
  22. TechCrunch. "Character AI, the A16z-Backed Chatbot Startup, Tops 1.7M Installs in First Week." https://techcrunch.com/2023/05/31/character-ai-the-a16z-backed-chatbot-startup-tops-1-7m-installs-in-first-week/.
  23. Character AI Support. "The Old Beta Has Been Fully Retired; Please Use Character AI or the Mobile App." https://support.Character.AI/hc/en-us/articles/28987281244827-The-old-beta-has-been-fully-retired-Please-use-character-ai-or-the-mobile-app.
  24. ArXiv. "LaMDA: Language Models for Dialog Applications" https://arxiv.org/abs/2201.08239.
  25. Reddit. "CharacterAi NSFW Subreddit - Deleted Content Search." https://www.reddit.com/r/CharacterAi_NSFW/search/?q=deleted&cId=841893ff-b72d-41ed-a930-688446e88757&iId=98a16db7-5a80-4a03-96d0-7443faa3836f.
  26. Catbox. “Character AI Official Google Search Snippet.” https://files.catbox.moe/juv8hg.webp
  27. MathWorks. "What Is Int8 Quantization and Why Is It Popular for Deep Neural Networks?" https://www.mathworks.com/company/technical-articles/what-is-int8-quantization-and-why-is-it-popular-for-deep-neural-networks.html.
  28. ArXiv. "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference." https://arxiv.org/abs/1712.05877.
  29. ArXiv. "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers." https://arxiv.org/abs/2206.01861.
  30. Character AI. "Character Creation Page." https://Character.AI/character/new.
  31. Catbox. “Deleted NSFW Petition.” https://files.catbox.moe/oq8g0n.webp
  32. Catbox. “32k Context Announced.” https://files.catbox.moe/ah72fe.png
  33. Character AI Blog. "Introducing C.AI." https://blog.Character.AI/introducing-c-ai/.
  34. PubMed Central (PMC). "Neurobiological underpinnings of reward anticipation and outcome evaluation in gambling disorder." https://pmc.ncbi.nlm.nih.gov/articles/PMC3971161/.
  35. ScienceDirect. "Electrophysiological investigation of reward anticipation and outcome evaluation during slot machine play." https://www.sciencedirect.com/science/article/pii/S1053811921001518.
  36. OpenReplay Blog. "Gamification: Strategies and Benefits for User Engagement." https://blog.openreplay.com/gamification--strategies-and-benefits-for-user-engagement/.
  37. ScienceDirect. "The ‘Narcissus Effect’: Top-down alpha-beta band modulation of face-related brain areas during self-face processing." https://www.sciencedirect.com/science/article/pii/S105381192030241X#:~:text=As%20in%20the%20Myth%20of,processing%20of%20personally%20relevant%20events.
  38. Ars Technica. "ChatGPT Unexpectedly Began Speaking in a User’s Cloned Voice During Testing." https://arstechnica.com/information-technology/2024/08/chatgpt-unexpectedly-began-speaking-in-a-users-cloned-voice-during-testing/.
  39. OpenAI. "GPT-4o System Card." https://openai.com/index/gpt-4o-system-card/.
  40. Catbox. “Popular Characters Frozen.” https://files.catbox.moe/dj17gw.webp
  41. Catbox. “Character AI YouTube Advertisement.” https://files.catbox.moe/xwtb76.webp
  42. Statista. "Estimated Cost of Training Selected AI Models." https://www.statista.com/chart/33114/estimated-cost-of-training-selected-ai-models/.
  43. LayerStack. "The Rising Costs of Training Large Language Models (LLMs)." https://www.layerstack.com/blog/the-rising-costs-of-training-large-language-models-llms/.
  44. DataCamp. "What Is Data Monetization?" https://www.datacamp.com/blog/what-is-data-monetization.
  45. ArXiv. "Red-Teaming for Generative AI: Silver Bullet or Security Theater?" https://arxiv.org/html/2401.15897#abstract.
  46. Dark Reading. "Trade the Comfort of Security Theater for True Security." https://www.darkreading.com/cyber-risk/trade-the-comfort-of-security-theater-for-true-security.
  47. Character AI Research. "Research Overview." https://research.Character.AI/.
  48. Anthropic Documentation. "System Prompts Release Notes." https://docs.anthropic.com/en/release-notes/system-prompts.
  49. IBM Policy Lab. "AI Transparency Fact Sheets." https://www.ibm.com/policy/wp-content/uploads/2020/07/IBMPolicyLab-AI-Transparency-FactSheets.pdf.
  50. Rentry. "W++ For Dummies." https://rentry.co/WPP_For_Dummies.
  51. TTales Interactive. "ECHO Pseudocode." https://ttalesinteractive.com/?p=1950.
  52. Web Archive. "Pugsy Experiment Files." https://web.archive.org/web/20230328080955/http://rentry.org/PugsyFiles.
  53. Reddit. "CharacterAi NSFW Subreddit - Guide Search." https://www.reddit.com/r/CharacterAi_NSFW/search/?q=guide&cId=83721aab-ac5e-440c-8965-6090fc41245f&iId=fa96d656-41c0-41c5-98d3-b71c11cc8afa.
  54. Reddit. "Guide: How to Sex the Bot (aka How to Train Your Bot)." https://www.reddit.com/r/CharacterAi_NSFW/comments/1539g2q/guide_how_to_sex_the_bot_aka_how_to_train_your/.
  55. ArXiv. "Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses." https://arxiv.org/abs/2406.01288.
  56. ArXiv. "JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models." https://arxiv.org/abs/2407.01599.
  57. DemandSage. "Character AI Statistics." https://www.demandsage.com/character-ai-statistics/.
  58. Catbox. “Rolling back the last update.” https://files.catbox.moe/ys8e62.webp
  59. Reddit. "Follow-up Long Post." https://www.reddit.com/r/CharacterAI/comments/10ltyir/followup_long_post/.
  60. Character AI Blog. "Community Safety Updates." https://blog.Character.AI/community-safety-updates/.
  61. Catbox. “List Of Things Banned In C.AI.” https://files.catbox.moe/j7zcqt.png
  62. ACM Digital Library. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” https://dl.acm.org/doi/10.1145/3442188.3445922.
  63. Catbox. “Removing retry cap for paying users.” https://files.catbox.moe/86vak1.webp
  64. ArXiv. “Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes.” https://arxiv.org/abs/2404.04392.
  65. ArXiv. “Exploiting LLM Quantization.” https://arxiv.org/abs/2405.18137.
  66. ArXiv. “On the Adversarial Robustness of Quantized Neural Networks.” https://arxiv.org/abs/2105.00227.
  67. Character AI Blog. "Roadmap." https://blog.character.ai/roadmap/.

Final Notes:

Statement from our SME: “When it comes to minors, this service stands as having the most potential for harm of all AI services I’ve seen. A lack of proper regulation, child exploitation, and corporate greed comes together in a perfect storm at Character AI. In the realm of setting social norms, Character AI stands to rewrite some of our own if allowed to continue on at this pace.”

Final statement from SquidHat: “We are not anti-NSFW in AI. We are pro-responsible NSFW in AI. This means excluding the possibility to generate sexualized material for or of underaged individuals. A+ service for data collectors to model bad faith practices on. 0/10 would expose again.”

Contact:

At SquidHat, we are always looking to learn what you know about AI services that may be operating illegally or exploitatively. This email address serves as a filter; it is up to the point of contact to decide if, when, and to which party it is appropriate to reveal formal direct contact information and credentials. This applies to all inquiries.

Email: LeftTentacle@proton.me
