LLMs Can Now Deanonymize Online Users with 90% Precision by Analyzing Writing Patterns
A new research paper titled "Large-scale online deanonymization with LLMs" demonstrates that anonymous usernames provide diminishing protection against modern AI systems. The study shows that large language models can piece together a person's public trail across different platforms by analyzing their writing style and content, achieving 68% recall at 90% precision—meaning 9 out of 10 matches are correct.
What the Researchers Built
The research team developed a three-stage LLM-based pipeline for linking anonymous user accounts across different online platforms. Unlike traditional methods that rely on exact string matching or simple metadata comparison, this approach uses LLMs to extract identity-revealing patterns from unstructured text posts.
How the System Works
The system employs LLMs for three distinct tasks:
Identity Hint Extraction: The LLM analyzes raw text from user posts to extract personal clues, writing style patterns, topic preferences, and other identity-revealing information. This goes beyond simple keyword matching to understand semantic content and stylistic fingerprints.
Candidate Search: The system searches through massive pools of potential matches across different platforms, using the extracted identity hints to narrow down possibilities from what would otherwise be an intractable search space.
Reasoning-Based Comparison: For the most promising candidate pairs, the LLM performs detailed comparison reasoning to determine whether they represent the same person, rejecting weak matches that don't withstand scrutiny.
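The three stages above can be sketched in a few dozen lines. Everything here is a hypothetical illustration: the `stub_llm` function, the prompts, and the word-overlap candidate search are toy stand-ins for the paper's actual models, prompts, and retrieval machinery.

```python
from dataclasses import dataclass

@dataclass
class Post:
    user: str
    text: str

def extract_hints(llm, posts):
    """Stage 1: ask the LLM to distill identity-revealing clues from raw posts."""
    joined = "\n".join(p.text for p in posts)
    return llm(f"List personal clues, style markers, and topics in:\n{joined}")

def search_candidates(hints, candidate_pool, top_k=5):
    """Stage 2: narrow a large pool cheaply (word overlap stands in for real retrieval)."""
    hint_words = set(hints.lower().split())
    def overlap(c):
        return len(hint_words & set(c.text.lower().split()))
    return sorted(candidate_pool, key=overlap, reverse=True)[:top_k]

def compare(llm, hints, candidate):
    """Stage 3: detailed LLM reasoning over only the most promising pairs."""
    verdict = llm(f"Same author? Hints: {hints}\nCandidate: {candidate.text}\nAnswer yes or no.")
    return verdict.strip().lower().startswith("yes")

def link_account(llm, posts, candidate_pool):
    hints = extract_hints(llm, posts)
    return [c for c in search_candidates(hints, candidate_pool)
            if compare(llm, hints, c)]

# Toy stand-in for a real LLM call, so the sketch runs end to end.
def stub_llm(prompt):
    if prompt.startswith("List"):
        return "rust seattle"
    candidate_part = prompt.split("Candidate:")[1]
    return "yes" if "seattle" in candidate_part else "no"

posts = [Post("anon42", "I love rust, greetings from seattle")]
pool = [Post("jane_doe", "rust fan in seattle"), Post("bob", "gardening tips")]
matches = link_account(stub_llm, posts, pool)
```

The key design point the sketch preserves is the funnel shape: the expensive reasoning step in stage 3 only ever sees the handful of candidates that survive the cheap filter in stage 2.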
Key Results and Benchmarks
The researchers tested their approach on three real-world scenarios:
- Hacker News to LinkedIn matching: Linking anonymous technical discussion forum users to their professional profiles
- Cross-community Reddit matching: Identifying the same users across different movie-related subreddits
- Temporal Reddit matching: Tracking the same users across different time periods on the same platform
The system achieved 68% recall at 90% precision, meaning it correctly identified 68% of true matches while maintaining that 9 out of 10 identified matches were correct. This represents a dramatic improvement over traditional methods, which "stay near 0%" according to the researchers.
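To make the recall/precision trade-off concrete, here is the arithmetic on an illustrative head count (the totals below are invented for the example, not taken from the paper):

```python
# Illustrative numbers only: suppose 100 true cross-platform pairs exist.
true_pairs = 100
recall, precision = 0.68, 0.90

true_positives = recall * true_pairs                 # 68 pairs correctly linked
total_predicted = true_positives / precision         # about 75.6 links claimed in all
false_positives = total_predicted - true_positives   # about 7.6 of those are wrong
```

In other words, for every 100 real identity pairs, the system surfaces roughly 76 links, of which roughly 68 are right and 8 are wrong, while traditional baselines recover almost none.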
| Scenario | Recall | Precision | Traditional methods |
| --- | --- | --- | --- |
| HN → LinkedIn | 68% | 90% | Near 0% |
| Cross-community Reddit | Similar performance | Similar performance | Near 0% |
| Temporal Reddit | Similar performance | Similar performance | Near 0% |

Why This Matters for Online Privacy
The research demonstrates that pseudonyms and anonymous usernames have become significantly less effective as privacy protection mechanisms. Historically, linking a person across different sites required extensive manual investigation or sophisticated technical analysis. The paper shows that LLMs can now automate this process at scale using only publicly available writing samples.
As the authors note, "The problem is that pseudonyms often seemed safe only because linking a person across sites used to take lots of careful manual work." This research effectively eliminates that barrier, making large-scale deanonymization feasible with current AI technology.
Technical Implications
The system's performance remains robust even as candidate pools grow, which is crucial for real-world applications where platforms may have millions of users. The reasoning step proves particularly valuable, beating simple matching approaches by a wide margin and maintaining accuracy at scale.
This suggests that public writing alone—without metadata, IP addresses, or other traditional tracking methods—can now be sufficient to link accounts or identify individuals across the internet.
gentic.news Analysis
This research represents a significant escalation in the capabilities available for online deanonymization. While previous methods could sometimes link accounts through stylistic analysis or topic modeling, they required substantial manual tuning and typically achieved much lower accuracy rates. The 90% precision at 68% recall demonstrated here crosses a practical threshold where such systems become operationally useful for both legitimate investigations and potential misuse.
From a technical perspective, the most interesting aspect is the three-stage pipeline design. By separating hint extraction, candidate search, and reasoning comparison, the researchers have created a modular system that could be adapted to various LLM architectures and scaled efficiently. The fact that this works with general-purpose LLMs rather than specialized models trained specifically for authorship attribution suggests the technique could be widely deployed without extensive retraining.
Practitioners should note that this development fundamentally changes the risk calculus for online anonymity. The traditional advice of "use different usernames on different platforms" may no longer provide meaningful protection against determined adversaries with access to modern LLMs. This has implications for whistleblowers, journalists, activists, and ordinary users who rely on pseudonymity for safety or privacy.
Frequently Asked Questions
How accurate is the LLM-based deanonymization system?
The system achieves 68% recall at 90% precision, meaning it correctly identifies 68% of true matches while ensuring that 9 out of 10 of its positive identifications are correct. This is a dramatic improvement over traditional methods, which struggle to achieve meaningful accuracy at this scale.
What types of online platforms can this system work against?
The researchers tested the system on Hacker News, LinkedIn, and Reddit, demonstrating effectiveness across technical forums, professional networks, and social discussion platforms. The approach relies on analyzing writing style and content, so it should work on any platform where users generate substantial text content.
Can users protect themselves against this type of deanonymization?
Traditional protection methods like using different usernames are now less effective. More sophisticated approaches might include deliberately varying writing style, using translation tools to alter linguistic fingerprints, or minimizing personally identifiable information in posts. However, the research suggests that determined adversaries with sufficient data can overcome many such countermeasures.
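The "linguistic fingerprint" mentioned above can be made concrete with a toy stylometric comparison. Character trigram cosine similarity is a classical authorship-attribution feature, used here only as a simple illustration of the kind of signal such systems exploit; the sample texts are invented, and the paper's LLM-based approach is far richer than this.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Character trigram counts: a classic stylometric feature."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Two samples in one (hypothetical) writer's voice, and one in a very different voice.
same_a = "Honestly, I reckon the scheduler is the real bottleneck here."
same_b = "Honestly, I reckon the garbage collector is the bottleneck."
other  = "LOL no way!!! u gotta try the new patch its sooo fast"

sim_same = cosine(char_ngrams(same_a), char_ngrams(same_b))
sim_diff = cosine(char_ngrams(same_a), char_ngrams(other))
```

Even this crude measure scores the same-voice pair well above the cross-voice pair, which is why superficially varying word choice is a weaker defense than it feels: low-level habits like punctuation, contractions, and phrasing survive.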
Is this technology currently being used in the wild?
The paper describes academic research, but similar capabilities could be developed by various entities including law enforcement, intelligence agencies, private investigators, or malicious actors. The public availability of this research lowers the barrier to developing such systems.
Paper reference: "Large-scale online deanonymization with LLMs" – arXiv:2602.16800