There’s a trend among Instagram users of sharing a photo of themselves from 2016 next to one from 2026. Sharing memories across a 10-year gap sounds fun and interactive, but you might be surprised to learn that you could be feeding your face to an AI training pipeline. Before-and-after photos can be harvested into datasets, and by posting them side by side, you make them far easier to correlate.
One challenge for AI models is understanding how faces change over time, including wrinkles, weight changes, hairlines, and skin texture. Companies would normally pay a lot of money to collect longitudinal data like this, but people are handing it over for free.
As a data engineering and AI/ML consulting firm in Richmond, Virginia, MPP Insights sees this from a unique perspective. This topic is multi-layered, from data privacy to how the data is actually processed.
We asked our U.S. Managing Director, Peter Bilzerian, a few questions to understand what’s actually happening behind the curtain and whether this trend is really as scary as it sounds.
How Exactly Can Old Photos Be Used to Train AI Models?
Peter:
"AI models learn by correlating patterns across massive datasets.
When people upload 2016 photos alongside their current photos, they're creating what we call 'temporal paired data' - the same person, location, or object at different points in time. Pairs like this help a model learn how subjects change over time.
Some examples of this are:
- Aging and facial recognition across time - how does the phenotype change? Could this predict health issues?
- Fashion and style evolution - how do trends change over 10 years?
- Environmental changes - have the cities or buildings changed?
From a data engineering perspective, these photos become labeled training examples. The metadata alone, such as dates, locations, and tagged people, provides context to the model. It’s unsettling to think of the many ways models can be trained on a simple transformation post.
With this trend in particular, I’m not concerned about privacy invasion, since users aren’t directly putting the photos into ChatGPT or another LLM.
It’s just another trend for showing your glow-up, unlike recent trends where users upload actual images into ChatGPT or another LLM to generate a Studio Ghibli-style version of themselves."
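The 'temporal paired data' Peter describes can be sketched in a few lines. The post structure, field names, and pairing rule below are illustrative assumptions for this article, not any platform's actual schema or pipeline:

```python
from dataclasses import dataclass

@dataclass
class Post:
    user: str      # hypothetical fields, chosen for illustration
    year: int
    photo: str     # stand-in for the image data itself
    location: str

def temporal_pairs(posts, min_gap_years=5):
    """Group posts by user, then pair each user's earliest and latest
    photos into a (before, after) training example, keeping only pairs
    separated by a large time gap."""
    by_user = {}
    for p in posts:
        by_user.setdefault(p.user, []).append(p)
    pairs = []
    for user_posts in by_user.values():
        user_posts.sort(key=lambda p: p.year)
        before, after = user_posts[0], user_posts[-1]
        if after.year - before.year >= min_gap_years:
            pairs.append((before, after))
    return pairs

posts = [
    Post("alice", 2016, "alice_2016.jpg", "Richmond"),
    Post("alice", 2026, "alice_2026.jpg", "Richmond"),
    Post("bob",   2024, "bob_2024.jpg",   "Oslo"),
]
pairs = temporal_pairs(posts)
# alice's side-by-side post yields one labeled (2016, 2026) pair;
# bob has no long-gap counterpart, so he contributes nothing
```

The point of the sketch is the labeling step: once the same account posts both halves of a before/after pair, the pairing logic is trivial, and the years and locations ride along as free labels.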
Why Might Old Photos Be Particularly Useful for Training AI?
Peter:
“When ChatGPT came out in 2022, it began training on user inputs (mostly recent photos) from that point on. That means nearly 3.5 years of data is very recent, and older photos can help make the models more robust.
Historical photos teach models to recognize and adapt to different visual contexts across time periods.
My favorite optimal outcome is medical diagnostics and health applications of this.
Medical diagnostics can detect early disease progression from facial and body changes. We’re working on a project right now for a start-up that is applying this to conditions such as thyroid disorders (facial puffiness, eye changes), heart disease (high cholesterol can cause visible eye changes), and early-stage dementia (shifts in facial symmetry).
As a two-time cancer survivor myself, including thyroid cancer, I’ve seen firsthand how early detection saves lives.
There is a silver lining to this trend, but only if data privacy regulations such as HIPAA are followed.”
Risk or Opportunity?
Sharing old photos from 2016 gives AI the ability to spot patterns and learn changes over time. It doesn’t stop there: hidden metadata, such as location, adds value too.
This raises an important question: should we worry and stop posting old photos? As Peter explained, not necessarily, because your picture is not being uploaded directly into a model interface. However, many vision models are trained on public data, so whether to post your old photos publicly is ultimately your choice.
But there is a bright side: these long-term facial changes can help models learn patterns that may be useful in medical applications and support early disease diagnosis. Considering all this, technology itself is neutral; what matters is who controls the data and what they do with it.