Feeding AI My Life I Fed 15 Years of My Private Conversations to an AI and Asked It Who I Am – Part 1

15 years of private chat messages fed to an AI. 850,982 messages, one question: Who am I? Part 1 - the method behind the madness.

English / March 17, 2026, at 12:45 PM / Published in AI

Somewhere around March 2026, sitting in my apartment in Phú Mỹ Hưng with a sugar-loaded Highlands coffee that had gone cold two hours ago (and frankly, it wasn’t great when it was warm either), I had an idea that was either brilliant or profoundly unhinged. Possibly both. Probably both.

Here’s the setup: I had roughly 1,440,000 chat messages from my twenties sitting somewhere hidden underneath data dumps and binary dust on my Proton Drive. Private conversations, all of them – Trillian log files, ICQ archives, Skype exports, Facebook data dumps, the full digital exhaust of a young, possibly immature guy talking too much to too many random people across too many platforms. Messages I’d sent to friends, romantic interests, flirty interests, fuck interests, basically all sorts of interests, plus strangers from the internet and people I’d completely forgotten existed. Alongside those: nearly 700 blog posts from this very site, spanning 2008 to the present. Sixteen years of me, talking. Publicly and privately. Excessively, in both cases – though mostly in the digital realm, because this was where I’d gone looking for people who actually got it, after the ones around me turned out to be a different species wearing similar clothes…

What if I fed all of it to an AI and asked it to tell me who I am?

Not in a “fun personality quiz” kind of way. In an “apply established psychological frameworks, code behavioral indicators, track patterns across a decade and a half, and produce a formal psychological profile” kind of way. The kind of analysis you’d find in a clinical research paper, except the subject, the analyst, and the person requesting the analysis are all the same slightly-unhinged individual. Why? Well, because why not – you can’t tell if something is a good or bad idea until you try it out (well okay, sometimes you CAN, but this wasn’t the case here).

So I did it.

This two-part series covers the entire project: how and why I built it (this post), and what it found (Part 2). Fair warning – Part 2 gets a bit personal, while this one here stays mostly in nerd territory. Relatively. Absolutely relatively.

Why This Exists

I’ve been writing about myself on this blog for a long time. Posts about identity, about the meaning of life, about emptiness, about love, about inner chaos. I’ve written from a psychiatric ward. I’ve written about losing someone I loved. Over sixteen years of public self-reflection, and I’m reasonably good at it. At least that’s what I keep telling myself, and at this point I’ve been doing it long enough that the confidence feels earned rather than delusional.

But there’s a difference between self-reflection and self-analysis, and it’s the kind of difference that sounds semantic until it isn’t. Reflection is narrative – you tell yourself a story about who you are and why, and if you’re a decent writer with a blog and too much free time, you publish that story and call it personal growth. Analysis is structural – you look at the patterns underneath the stories, the ones you can’t see because you’re inside them, the wiring behind the dashboard. I’ve always been better at telling the story than examining the wiring.

The chat logs changed the equation. Those aren’t curated self-presentations. Those aren’t blog posts where I get to choose the angle and the lighting. Those are me – unfiltered, in real time, age 21 to 25, talking to hundreds of people across multiple platforms. The raw behavioral data of a person who had grown up as an outsider in rural Germany, surrounded by people but connected to almost none of them, and who had finally found a space – the internet, in all its chaotic, anonymous, beautiful ugliness – where being different wasn’t a defect but a feature. A person who didn’t know he was being recorded for future analysis. (He was recording himself, technically. Past-me was thorough about archiving, bless his obsessive little heart. Present-me is grateful. Future-me will probably be horrified).

The question wasn’t really “who am I?” I have a decent working theory on that at this point. The question was: do the patterns in my actual behavior match the story I’ve been telling myself? And if not – where are the gaps? Where did I construct a narrative that’s more flattering, more dramatic, or just plain wrong compared to what the data shows?

Spoiler: the gaps exist. And some of them are fascinating.

The Data

Chat Archives

The primary dataset: 850,982 normalized messages after deduplication from 1,440,410 raw messages. Spanning December 17, 2009 to December 31, 2013. Extracted from 9,098 Trillian source files covering five protocols (ICQ, ASTRA, Skype, Facebook, MSN).

That’s four years of private conversations with 441 unique conversation partners – after cross-platform consolidation, because the same person might be “xXsomethingXx” on ICQ (yeah, this cringy xXblobXx format was lit late-MySpace-era shit back then) and their real name on Skype; the normalization pipeline figured that out. Platform distribution: Skype 44.1%, ICQ 24.7%, Facebook 20.5%, ASTRA 10.7%, MSN less than 0.1%. Direction split: outgoing 49.6%, incoming 50.4% – nearly perfectly balanced across the full corpus, which sounds healthy and egalitarian until you learn that it masks dramatic temporal swings that tell a very different story.
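For illustration, here is a minimal sketch of how a platform distribution and a direction split like the ones above can be computed. It is not the actual pipeline code: the toy records and the `share` helper are assumptions made up for this example, and the real corpus has 850,982 rows rather than eight.

```python
from collections import Counter

# Hypothetical toy records: (platform, direction). Illustration only.
messages = [
    ("Skype", "out"), ("Skype", "in"), ("ICQ", "out"),
    ("Facebook", "in"), ("Skype", "out"), ("ASTRA", "in"),
    ("ICQ", "in"), ("Skype", "in"),
]


def share(counter):
    """Convert raw counts into percentages rounded to one decimal place."""
    total = sum(counter.values())
    return {key: round(100 * count / total, 1) for key, count in counter.items()}


platform_share = share(Counter(platform for platform, _ in messages))
direction_share = share(Counter(direction for _, direction in messages))

print(platform_share)
print(direction_share)
```

A corpus-wide split like this is a single average, which is why it can look balanced while individual quarters swing heavily in one direction. The same function can be applied per quarter to make those swings visible.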

These are Trillian .log and .xml files. If you’ve never used Trillian because you grew up after the Internet’s Bronze Age: it was a multi-protocol messenger client that could connect to ICQ, AIM, MSN, Skype, and basically every chat platform simultaneously. And it logged everything by default. Unencrypted. I kept those logs. For over a decade, they just… sat there. Untouched. A digital time capsule of every dumb joke, every flirtation, every 3 AM existential crisis typed in Comic Sans, every fight, every declaration of love, every lie, every “lol” that wasn’t really laughing. All of it, timestamped and preserved by a guy who hoarded data like other people hoard metal festival bracelets.

The primary language is German, because that’s what you speak when you’re a 21-year-old near Munich talking to other Germans on ICQ. English appears occasionally, increasing over time – foreshadowing, if you’re into that sort of thing.

Blog Corpus

Secondary dataset: 691 blog posts from lui.vn, 2008 through 2026. Of these, 93 are in the Personal Lore category and 42 in Love, Sex & Identity (a surprisingly low number, considering the extent of my, um, “love life”) – those formed the primary analytical material, though the full corpus was used for context. Language shifts from German to English around 2021, tracking my transition to writing for an international audience from Sài Gòn.

The blog is the public mirror. The chat logs are the private one. The overlap period – 2010 through 2013, when both exist simultaneously – is analytically gold, the kind of dataset a behavioral researcher would sell a kidney for (or at least a toe): it lets you compare what I was saying in private to what I was performing in public during the same weeks. Same person, same time period, two entirely different audiences. The divergences between those two mirrors turned out to be one of the most interesting findings in the entire analysis, but that’s a Part 2 story.

The Tool

Let’s be explicit about this: the entire analysis was conducted using Claude Opus 4.6, an AI model by Anthropic. Not a therapist. Not a psychologist. Not a human being of any kind. An AI language model processing text and applying frameworks that were specified by a purpose-built analytical rubric. If this makes you skeptical – good. Hold that skepticism. It’s appropriate.

Here’s what Claude is extraordinarily good at: pattern recognition across large text corpora, applying structured analytical frameworks with machine-level consistency, and synthesizing findings across multiple dimensions without getting tired, sweaty, distracted, or emotionally uncomfortable when the data gets ugly. Here’s what Claude is not: a licensed mental health professional. It can’t observe body language, read tone of voice, or pick up on the thousand nonverbal cues that a real clinician would use to contextualize a patient’s words. It processes text. That’s it. That’s the whole sensory apparatus – a very sophisticated, very thorough, very text-only sensory apparatus.

Everything that follows – every pattern identified, every indicator flagged, every framework applied – should be understood within that constraint. The findings are framed as “patterns consistent with” and “indicators suggest” rather than diagnoses, because only a licensed professional can diagnose and Claude explicitly cannot. (This is an important distinction, and I’m going to repeat it in Part 2 because the findings there get into territory where the distinction genuinely matters).

What Claude CAN do is something no human therapist practically could: read almost one million messages, hold the entire corpus in analytical memory, and systematically apply the same framework across seventeen quarterly time periods with perfect consistency. No fatigue, no forgetting, no unconscious bias toward confirming what it found last session, no “hmm, let’s revisit that next week” because the hour is up. That’s the trade-off: you lose clinical intuition but gain exhaustive, systematic coverage. Whether that trade-off is worth it depends on what you’re looking for. I was looking for patterns. Patterns, it turns out, are exactly what machines are good at.

The Numbers

The full project consumed approximately 3.6 million tokens across all sessions – roughly equivalent to 2.6 million words of combined input and output. For context, that’s about twenty-six novels’ worth of text processing. Twenty-six novels. About me. I’m choosing to find this funny rather than mortifying, though the line between the two is thin. Very thin. Maybe even dashed or dotted.
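The conversions above are rules of thumb rather than exact counts. As a quick sanity check, this sketch derives the ratios the three figures imply. Only the three input numbers come from the text; the variable names are made up for the example.

```python
# Figures quoted in the text (all approximate).
TOKENS = 3_600_000
WORDS = 2_600_000
NOVELS = 26

# Ratios implied by those figures; neither is a measured value.
words_per_token = WORDS / TOKENS   # about 0.72 words per token
words_per_novel = WORDS / NOVELS   # 100,000 words per "novel"

print(f"{words_per_token:.2f} words per token")
print(f"{words_per_novel:,.0f} words per novel")
```

So the "twenty-six novels" comparison assumes a novel of about 100,000 words. The true words-per-token ratio depends on the tokenizer and on the language of the text.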

The analysis ran across multiple sessions over several days: seventeen quarterly analysis passes through the chat logs, a full blog corpus review, a structured Q&A phase where I answered questions the analysis surfaced (more on this later – this phase turned out to be critical), and the final profile assembly. It was a marathon, not a sprint, and by the end of it I had the strange experience of knowing that an AI had read more of my words than any human being ever has or probably ever will. Genuinely surreal experience, would recommend, 10/10.

The Analytical Framework

You can’t just throw close to a million messages at an AI and say “tell me about myself.” Well, you can. But that gets you vibes. I wanted structure.

So before feeding a single message to Claude, I built a formal analytical framework – a document that defined exactly what to look for, how to look for it, and what psychological models to use as lenses. Think of it as the rubric for a research paper, except the research paper is about me, written by a machine, and requested by the same person who is both the subject and the methodologist, which is either rigorous self-examination or a very elaborate form of narcissism depending on how you squint at it. (I prefer the first interpretation, but I’m biased. The analysis, incidentally, had opinions about this bias too).

The framework defines eight analytical dimensions, each grounded in established psychological research:

Social Communication & Interaction Patterns
How do I actually communicate with people? Not “how do I think I communicate” (I have a blog full of self-assured opinions about that) but what does the behavioral data show? Using the DSM-5 criteria for Autism Spectrum Disorder, the Broad Autism Phenotype Questionnaire, and the Autism Spectrum Quotient as structured lenses for coding specific behaviors: reciprocity patterns, topic management, social calibration across different contexts, textual pragmatics. Not to diagnose. To provide a systematic coding scheme for something that’s otherwise impressionistic.
Restricted & Repetitive Patterns
Do I have special interests that recur with unusual intensity? Ritualistic language? Insistence on sameness? DSM-5 Domain B criteria as the lens, adapted for text-based analysis. (Spoiler: the answer to the “unusual intensity” question is yes, and the chat logs prove it in ways that are both vindicating and slightly embarrassing).
Attachment & Interpersonal Dynamics
This is the big one. Using Bartholomew & Horowitz’s Four-Category Attachment Model, Bowlby’s Attachment Theory, and Gottman’s Four Horsemen of Relationship Conflict to map how I form, maintain, and dissolve relationships across different types and time periods. The chat logs are basically an attachment dynamics laboratory – 441 conversation partners, four years, every message preserved. You can literally watch relationships form and collapse in the data. One friendship goes from 11,629 messages in a single quarter to 125 in the next. A 98.9% decline. Not because of a fight. Because my attention had moved to someone new. That number hit different when I first saw it. It still does.
Personality Structure & Traits
The Big Five (Costa & McCrae), with additional indicators from the HEXACO model and the PID-5 for maladaptive traits. Operationalized indicators like planning behavior, follow-through on commitments, emotional stability patterns, response to conflict. The gap between “what Luit says he’s going to do” and “what Luit actually does” turned out to be one of the most analytically productive dimensions in the entire framework, which tells you something about me that I’d rather it didn’t.
Shadow Dynamics & Harmful Patterns
The one I specifically requested to be unflinching about, because there’s no point doing this if you’re going to flinch at the parts where you were an asshole. The Dark Triad inventory (Machiavellianism, narcissism, psychopathy), Beck’s cognitive distortion framework (all-or-nothing thinking, catastrophizing, emotional reasoning), and Vaillant’s defense mechanism hierarchy – from mature defenses like humor and sublimation all the way down to immature ones like projection, denial, and systematically lying to dozens of people about who you are. That last one isn’t a hypothetical example.
Cognitive Style & Information Processing
Kahneman’s Dual Process Theory, the Need for Cognition Scale, Baron-Cohen’s Systemizing Quotient. How do I actually think? Analytical VS intuitive, depth of processing, decision-making patterns, and – critically – the gap between insight and action. Because it turns out you can be extraordinarily good at understanding your own patterns while being mediocre at changing them, and the data has receipts.
Identity Development & Cultural Integration
Cass’s model for sexual identity development, Berry’s acculturation model for cultural integration (somehow indirectly relevant since I’m a German living in Vietnam, which is a sentence that still sounds slightly surreal to me even after eight years), and Erikson’s psychosocial development stages. Tracking how identity evolved across the entire corpus – from a fabricated persona on ICQ to whatever I am now, which is at least authentic, if occasionally confusing.
Longitudinal Change Tracking
The meta-dimension. Using Prochaska’s Transtheoretical Model of Change and the Post-Traumatic Growth Inventory to track how each of the above dimensions evolves over time. This is what turns individual snapshots into a developmental narrative – not just “what are the patterns” but “how did the patterns change, and what made them change?”

Each dimension specifies operationalized indicators – specific, observable behavioral markers that can be identified in text data. Not “seems anxious” but “initiates contact more than 60% of the time, responds to periods of silence within minutes, and writes longer messages during perceived relational threats.” Quantifiable. Codable. Pattern-able. The kind of precision that makes the humanities major in me uncomfortable and the engineer in me extremely satisfied.
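To make "operationalized" concrete, here is a hedged sketch of how the first indicator (initiating contact more than 60% of the time) could be coded. The 60% threshold comes from the text. Everything else is an assumption for illustration: the six-hour gap that defines a new conversation, the function name, and the sample data. It is not the rubric the analysis actually used.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=6)   # assumption: a longer silence starts a new conversation
INITIATION_THRESHOLD = 0.60        # threshold quoted in the text


def initiation_share(messages, me="me"):
    """Fraction of conversations opened by `me`.

    `messages` is a list of (timestamp, sender) tuples sorted by time.
    """
    starters = []
    previous = None
    for timestamp, sender in messages:
        if previous is None or timestamp - previous > SESSION_GAP:
            starters.append(sender)   # first message after a long silence
        previous = timestamp
    if not starters:
        return 0.0
    return sum(1 for sender in starters if sender == me) / len(starters)


# Hypothetical sample: four conversations, three opened by "me".
chat = [
    (datetime(2012, 4, 1, 9, 0), "me"),
    (datetime(2012, 4, 1, 9, 2), "partner"),
    (datetime(2012, 4, 2, 20, 0), "me"),
    (datetime(2012, 4, 2, 20, 5), "partner"),
    (datetime(2012, 4, 4, 8, 0), "partner"),
    (datetime(2012, 4, 5, 23, 0), "me"),
]

share = initiation_share(chat)
flagged = share > INITIATION_THRESHOLD
print(share, flagged)
```

The choice of session gap changes the result, so any real coding scheme would need to fix that definition before comparing partners or quarters.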

The Processing

Phase 1: Data Normalization

Before any analysis could happen, the raw data needed cleaning, because decade-old chat logs from five different platforms are exactly as messy as you’d expect them to be – which is to say, a nightmare wrapped in inconsistent XML wrapped in character encoding issues that would make a Unicode specialist weep.

Those 1,440,410 raw messages included duplicates (Trillian sometimes logged the same conversation on multiple protocols, so a single Skype chat could appear twice with different timestamps), format inconsistencies across platforms, and encoding artifacts from a decade of software transitions that turned German umlauts into hieroglyphics. A custom normalization pipeline – written in Python, built with Claude’s help – deduplicated, standardized timestamps, consolidated cross-platform identities (6 confirmed same-person merges across platforms, because 2010-era internet identity was a beautifully fragmented mess), and produced the clean 850,982-message corpus.
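The pipeline itself is not reproduced here, so the following is only a minimal sketch of the three steps named above: identity consolidation, umlaut repair, and deduplication. The alias table, the 120-second duplicate window, and the dict field names are all assumptions invented for the example.

```python
from datetime import datetime, timedelta

DUP_WINDOW = timedelta(seconds=120)          # assumption: tolerance for shifted timestamps
ALIASES = {"xXsomethingXx": "person_017"}    # hypothetical handle-to-identity map


def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text   # already clean, or not this kind of damage


def normalize(raw):
    """raw: iterable of dicts with ts, platform, sender, recipient, text."""
    last_seen = {}
    clean = []
    for original in sorted(raw, key=lambda m: m["ts"]):
        msg = dict(original)
        msg["sender"] = ALIASES.get(msg["sender"], msg["sender"])
        msg["recipient"] = ALIASES.get(msg["recipient"], msg["recipient"])
        msg["text"] = fix_mojibake(msg["text"])
        key = (msg["sender"], msg["recipient"], msg["text"])
        previous = last_seen.get(key)
        if previous is not None and msg["ts"] - previous <= DUP_WINDOW:
            continue   # same message logged again by another protocol
        last_seen[key] = msg["ts"]
        clean.append(msg)
    return clean


raw = [
    # "Sch\u00c3\u00b6n" is how "Schön" looks after a wrong Latin-1 decode.
    {"ts": datetime(2011, 3, 5, 22, 14, 3), "platform": "Skype",
     "sender": "me", "recipient": "xXsomethingXx",
     "text": "Sch\u00c3\u00b6n, dass du da bist"},
    {"ts": datetime(2011, 3, 5, 22, 14, 41), "platform": "ICQ",
     "sender": "me", "recipient": "xXsomethingXx",
     "text": "Sch\u00f6n, dass du da bist"},
    {"ts": datetime(2011, 3, 5, 22, 20, 0), "platform": "Skype",
     "sender": "xXsomethingXx", "recipient": "me", "text": "lol"},
]

clean = normalize(raw)
print(len(clean), clean[0]["text"])
```

One trade-off in this sketch: a time-window rule also drops a genuinely repeated message (two identical "lol"s within two minutes), so a real pipeline would likely need a stricter check, such as requiring the two copies to come from different platforms.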

Each message was tagged with: timestamp, source platform, sender, recipient, raw content, and detected language. The blog posts got similar treatment: date, category, language, word count. Everything structured, everything queryable. Past-me archived compulsively. Present-me made that archive useful. The circle of neurotic data hoarding closes.
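As a sketch of what "structured and queryable" can look like, the six tags listed above map naturally onto a small record type. The class name, field names, and sample rows below are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    """One normalized chat message; fields mirror the tags listed in the text."""
    timestamp: datetime
    platform: str
    sender: str
    recipient: str
    content: str
    language: str   # detected language, e.g. "de" or "en"


# Hypothetical sample rows.
corpus = [
    Message(datetime(2010, 5, 1, 23, 10), "ICQ", "me", "person_017", "na, noch wach?", "de"),
    Message(datetime(2010, 5, 1, 23, 11), "ICQ", "person_017", "me", "klar", "de"),
    Message(datetime(2013, 8, 9, 14, 0), "Skype", "me", "person_203", "see you later", "en"),
]

# Example query: my outgoing English messages.
outgoing_english = [m for m in corpus if m.sender == "me" and m.language == "en"]
print(len(outgoing_english))
```

Freezing the dataclass keeps normalized records immutable, so later analysis passes cannot accidentally alter the corpus they are reading.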

Phase 2: Quarterly Analysis Passes

The chat logs were divided into seventeen quarterly periods (Q4 2009 through Q4 2013). Each quarter was processed as a separate analytical pass: the raw messages for that period were loaded, the eight-dimensional framework was applied, and structured intermediate findings were produced with specific evidence citations – not “the subject seems to have attachment anxiety” but “in Q2 2012, outgoing message ratio to Partner X increased from 52% to 67% coinciding with a 340% increase in message frequency, consistent with hyperactivation of the attachment system.”
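The seventeen-period count follows directly from the date range. As a small sketch (the function name is an assumption, the dates are the corpus boundaries given earlier), the quarterly buckets can be enumerated like this:

```python
from datetime import date


def quarters_between(start, end):
    """List every calendar quarter from the one containing `start`
    through the one containing `end`, inclusive."""
    labels = []
    year, quarter = start.year, (start.month - 1) // 3 + 1
    end_key = (end.year, (end.month - 1) // 3 + 1)
    while (year, quarter) <= end_key:
        labels.append(f"Q{quarter} {year}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return labels


periods = quarters_between(date(2009, 12, 17), date(2013, 12, 31))
print(len(periods), periods[0], periods[-1])
```

Note that the first bucket is a partial one: Q4 2009 contains only the last two weeks of December.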

This is where the quantitative patterns started emerging. Message volumes per partner per quarter. Outgoing-to-incoming ratios that shifted like mood rings across the timeline. Response latency patterns (how quickly I replied, and how that speed correlated with who I was talking to – the data is honest about crushes in a way that’s almost cruel). The concentration of communication – what percentage of total messages went to the top 1, 3, and 5 conversation partners. These numbers told stories that the text content alone couldn’t, and some of those stories were ones I would have preferred not to hear again.
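Two of these metrics are simple enough to sketch. The per-partner counts below are invented for the example, and the function names are assumptions. The only figures taken from the article are 11,629 and 125, the quarter-over-quarter message counts for one friendship.

```python
from collections import Counter


def top_k_share(counts, k):
    """Share of all messages that went to the k most active partners."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


def percent_change(before, after):
    """Relative change from one period to the next, in percent."""
    return 100 * (after - before) / before


# Hypothetical message counts per partner for a single quarter.
quarter_counts = Counter({"A": 500, "B": 200, "C": 150, "D": 100, "E": 50})

concentration = {k: top_k_share(quarter_counts, k) for k in (1, 3, 5)}
decline = percent_change(11_629, 125)

print(concentration)
print(round(decline, 1))
```

With the article's two figures, `percent_change` gives roughly -98.9, matching the 98.9% decline quoted for that friendship.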

Phase 3: Blog Corpus Review

The blog was analyzed separately, focusing on the “Personal Lore” and “Love, Sex & Identity” categories. The goal here was twofold: identify the public self-narrative themes and compare them with the private behavioral data from overlapping periods. Where did the blog tell the same story as the chat logs? Where did it diverge? And when it diverged – was I consciously shaping a narrative, or had I genuinely remembered things differently than they happened?

(The answer, it turns out, was a mixture of both. Which is somehow worse than I expected.)

Phase 4: The Q&A Phase

This was the part I didn’t expect to matter as much as it did.

After the initial analysis passes, Claude had questions. Not small questions – structural questions that the data alone couldn’t answer. What happened between specific people? What was the context for certain behavioral shifts? Were there events outside the chat logs that explained patterns inside them? The AI could see the WHAT, but not always the WHY.

I answered honestly. Uncomfortably, in some cases. And some of those answers fundamentally changed the interpretation of the findings. One answer in particular – about a close friend’s suicide that occurred during a period of maximum communication intensity with a romantic partner – recontextualized an entire year of behavioral data. The correlation the AI had identified was real (the numbers don’t lie), but the causal story was completely different from what the data alone suggested. What looked like pure romantic obsession was, in part, two people processing a shared trauma in real time while simultaneously falling in love, which is a very different thing – messier, darker, and more human than the neat pattern the algorithm had initially drawn.

This phase is why I don’t think you can fully automate this kind of analysis. The data provides the skeleton. The human provides the flesh. Neither one alone tells the truth – the Human In The Loop is a thing!

Phase 5: Synthesis and Profile Assembly

All dimensional findings were cross-referenced. Patterns that appeared across multiple dimensions were flagged – not as separate findings but as clusters, because avoidant attachment plus intellectualization defense plus professional silence isn’t three separate data points, it’s one behavioral system wearing three different hats. Contradictions between dimensions were surfaced too (high openness combined with rigid behavior patterns sounds like a data error, but it’s actually analytically interesting – it means the openness is genuine and the rigidity is a coping mechanism, and the tension between them is where a lot of the interesting psychology lives).

The temporal dimension was overlaid across everything: how did each pattern evolve across the sixteen-year span? What changed, what didn’t, and what changed in ways that surprised even the person who lived through the changes?

The final output: a formal psychological profile document following established clinical report structure. Executive summary, methodology, dimensional findings, cross-dimensional patterns, longitudinal trajectory, condition indicators, and recommendations. A document about me, written by a machine, that I find simultaneously fascinating and slightly terrifying to have in my possession.

What I Expected VS What Happened

I expected the analysis to confirm things I already knew about myself. It did that – but the confirming wasn’t the interesting part. The interesting part was the stuff I didn’t know, or knew but had never seen quantified, or knew but had framed differently in my self-narrative because the truth was less flattering than the story.

I expected to feel exposed. I did – but again, differently than anticipated. It wasn’t the content of the messages that was difficult (though some of it was objectively cringeworthy; I was 21 and on ICQ, with my browser window open, at night, overflowing with tiny little sexy hormones – you understand the risks you’re taking with that). It was seeing the patterns rendered in numbers. Watching the outgoing ratio swing from 42.8% to 59.8% over two years and knowing exactly what that curve means in human terms. Watching a friendship go from 11,629 messages in one quarter to 125 in the next and feeling the weight of a number that represents a person I stopped paying attention to. Watching myself, quantified, across time. It’s one thing to know you have patterns. It’s another thing entirely to see the spreadsheet.

I expected the AI to be either too harsh or too gentle. It was neither. Claude applied the frameworks consistently, cited evidence for every claim, and maintained the clinical framing I’d specified. When the data showed manipulative behavior, it documented it. When the data showed growth, it documented that too. No flinching, no cheerleading. Just pattern recognition at a scale no human could sustain, delivered with the emotional temperature of a well-written research paper. Which is exactly what I asked for, and exactly what I got, and somehow still felt like more than I bargained for.

Part 2 covers what those patterns actually are – the findings, the things that surprised me, the things that didn’t, and what it means to sit with an unflinching analysis of who you were, who you became, and what the data suggests about the distance between the two. It gets personal. It gets uncomfortable. And at least one number in it will, I think, be hard to forget once you’ve fully understood it (#clickbaitForDummies).

Part 2: The Findings.

A Note on Ethics and Privacy

This analysis was conducted on my data, about me, at my request. But chat logs are inherently dyadic – they contain other people’s words, lives, and vulnerabilities alongside mine. Every person who appears in the findings (with two exceptions) has been anonymized. No real names, no identifying details, no way to trace a finding back to a specific person. The analysis is about my patterns, not theirs. Their words were context for understanding me; they are not reproduced, quoted, or evaluated.

The two exceptions: Jayden, my current partner, who knows about and consented to this project. And Phương, who passed away in November 2024 and whose name I keep because he deserves to be remembered, and because I’ve already written publicly about our story.

If you recognize yourself in any of what follows: first, thank you for being part of my life during those years. No matter what role you played or what the circumstances were, you had an impact on my past and thus on who I am today, even if it was only on a platonic DNA-sharing level. Second, the analysis is about me, not you. You were the context. I was the variable.

Disclosure

This analysis was conducted entirely by Claude Opus 4.6, an AI model by Anthropic. All psychological frameworks were applied by the AI based on a purpose-built analytical rubric – not by a licensed therapist, psychologist, or psychiatrist. AI-generated analysis is not a substitute for professional mental health assessment. If anything in this series resonates with your own experience, please talk to an actual human professional. They’re better at this. The full disclosure lives at the end of Part 2, where the findings make it more relevant.

Hero image: Selfie taken by me on Nov 2, 2009. It was visually extended on March 16, 2026, at 12:07 AM with the help of google/nano-banana-pro.