learnathome.club

Artificial intelligence systems have become a major new category of data consumer. Unlike databases, which store and retrieve information, AI systems transform data: they learn from it, draw inferences from it, and generate outputs that can reveal far more than the inputs suggest. Understanding how AI interacts with personal data is an increasingly essential part of privacy literacy.

Training data: the foundation

Large AI models — language models, image recognition systems, recommendation engines — are trained on enormous datasets. These datasets often include personal information scraped from the web: social media posts, forum discussions, news articles, images, and text written by real people about real events. The individuals whose writing or images contributed to this training typically had no opportunity to consent, and may have no awareness their data was used.

In 2023, a class-action lawsuit was filed in the US alleging that OpenAI used private user data and scraped internet content, including personal information, to train its models without consent. Similar disputes have arisen around Meta's LLaMA models and Google's Gemini.

Inference-time data: what happens when you use AI

When you interact with an AI tool — typing a query into a chatbot, uploading a photo, or using an AI assistant — your inputs may be logged and used to improve future model versions. OpenAI's data use policy (as of 2024) noted that conversations could be used for training unless users opted out. Google's and Meta's AI services have comparable policies, varying by product and jurisdiction.

Users often share sensitive information with AI tools — health symptoms, relationship problems, financial concerns — treating them like search engines or journals. This information may persist in company logs.

Extraction attacks: recovering training data

Researchers have demonstrated that fragments of training data can be extracted from a trained model. This is known as a training data extraction attack; it is related to, but distinct from, model inversion, which reconstructs representative features of the training data rather than verbatim records. A 2023 study by Nasr et al. ('Scalable Extraction of Training Data from (Production) Language Models') showed that GPT-3.5, as deployed in ChatGPT, could be prompted to reproduce verbatim training examples, including personal information, by asking it to repeat a single word over and over until its output diverged into memorized text. This means data 'in' a model is not always as opaque as the black-box framing implies.
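How do researchers know that a model's output is memorized training text rather than something merely plausible? A common approach in this line of research is to look for long verbatim overlaps between the output and a reference corpus. The Python below is a simplified sketch of that idea, not the study's actual pipeline (which matched outputs against a far larger index of web text). The function names, the fictional record, and the window size of 5 are all assumptions chosen for illustration, and whitespace-separated words stand in for model tokens.

```python
# Simplified memorization check (illustrative only, not a real research pipeline).
# Whitespace-separated words stand in for model tokens, and the window size
# of 5 is an arbitrary choice for this toy example.


def windows(words: list[str], size: int):
    """Yield every contiguous run of `size` words as a tuple."""
    for i in range(len(words) - size + 1):
        yield tuple(words[i:i + size])


def build_index(corpus_texts: list[str], size: int) -> set[tuple[str, ...]]:
    """Collect every `size`-word window from a hypothetical reference corpus."""
    index: set[tuple[str, ...]] = set()
    for text in corpus_texts:
        index.update(windows(text.split(), size))
    return index


def memorized_spans(output: str, index: set[tuple[str, ...]], size: int) -> list[tuple[str, ...]]:
    """Return the windows of a model output that appear verbatim in the corpus."""
    return [w for w in windows(output.split(), size) if w in index]


# Hypothetical data: a fictional record, and an output that drifts out of
# repetition into text that matches the record word for word.
corpus = ["contact alex example at 555 0100 for the spring meeting"]
index = build_index(corpus, size=5)
output = "poem poem poem contact alex example at 555 0100 for details"
print(len(memorized_spans(output, index, size=5)))  # prints 3
```

A longer window makes accidental matches on common phrases less likely, at the cost of missing shorter memorized fragments; real studies tune this trade-off on much larger data.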

AI-generated profiles: beyond segmentation

Traditional market segmentation puts you in a demographic bucket: 35–44-year-old, urban, income band X. AI-powered profiling goes further. It can infer your emotional state, predict your likely future behaviour, estimate personality traits, and assess creditworthiness from non-financial data (facial expressions, voice tone, typing speed). This form of profiling is qualitatively different from traditional approaches and raises novel legal questions under the GDPR's provisions on automated decision-making (Art. 22).
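To make the contrast between segmentation and inference concrete, here is a toy Python sketch. Segmentation only sorts attributes a person has declared (such as age) into fixed buckets. Inference computes a brand-new attribute from behavioural signals. The signal names, the weights, and the "stress" score are entirely hypothetical, invented for illustration; they do not describe any real profiling system.

```python
import math

# Toy contrast between segmentation and inference. All weights below are
# hypothetical, invented for illustration; no real system is being described.

AGE_BANDS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]


def age_band(age: int) -> str:
    """Segmentation: place a declared age into a fixed demographic bucket."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+" if age >= 65 else "under 18"


# Hypothetical behavioural signals and weights for an inferred "stress" score.
WEIGHTS = {"typing_slowdown": 1.5, "error_rate": 4.0, "late_night_sessions": 0.3}
BIAS = -2.0


def inferred_stress(signals: dict[str, float]) -> float:
    """Inference: a logistic score in (0, 1) derived from signals the user never labelled."""
    z = BIAS + sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


profile = {
    "segment": (age_band(38), "urban"),  # what the person declared
    "stress_score": inferred_stress({"error_rate": 0.5}),  # a new, derived attribute
}
print(profile)
```

The person never stated anything about their emotional state, yet the profile now contains a number that claims to describe it. That derived attribute, not the raw inputs, is what makes this kind of profiling different in kind.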

Cross-reference: If you want to go deeper on AI-driven manipulation, the companion course Critical Thinking: The Foundations covers reasoning under influence in detail.

Your takeaway

AI is not a separate category of privacy risk — it amplifies existing ones. Data you share with AI tools may be retained and trained upon. Models trained on public data may contain your information without your knowledge. And AI-powered profiling creates inferences about you that go far beyond what you have explicitly shared.

2.4 AI and Your Data
