Originally posted on ZDNET.
With AI continuing its slow rise to prominence, consumers are concerned their personal content is being used to train Google’s generative service.
It began this morning when I saw a tweet from someone I follow stating that Google was using anything created with Google Docs to train artificial intelligence (AI). I immediately became concerned, because I write the first draft of everything I create in Google Docs. All of my novels, my technical writing, various resumes, and everything in between…it’s all written with Google Docs.
I don’t want Google or any AI service using the content I create to train their models. I view that exploitation as plagiarism, plain and simple — and do not want to allow those companies to benefit from my decades of hard work. I realize I have a rather harsh opinion about AI, but I also know that I’m not alone.
Every writer I know personally stands against AI and not one of them is willing to allow a single company to use their words as fuel to feed that particular beast.
Of course, I wanted to verify the claim. That led me to Google’s official documentation on Document AI Security, which includes this entry:
Does Google use customer data to improve the model(s)?
No. Google does not use any of your content (such as documents and predictions) for any purpose except to provide you with the Document AI service.
At Google Cloud, we never use, nor do we intend to use in the future, customer data to train our Document AI models.
According to the Yahoo! News piece, the key word is public, in that Google’s policy says it can use publicly available data to train its AI models. However, Google states that it doesn’t use any of your content. There’s also a link in Google’s documentation that points to a privacy commitment piece. In that document, this paragraph stands out:
In addition to these commitments, for AI/ML development, we don’t use data that you provide us to train our own models without your permission. And if you want to work together to develop a solution using any of our AI/ML products, by default our teams will work only with data that you have provided and that has identifying information removed. We work with your raw data only with your consent and where the model development process requires it.
Google has made it clear that it will only use customer data that it has permission to use. Now, the big question is this: do we trust them? That’s a big and complicated question. On the surface, I want to say, “Yes, we can trust them because they clearly state they will not use customer data without permission.” However (and this is a big however), is it possible that we’ve inadvertently given them permission when we agreed to the EULA for Google Docs/Drive (which they regularly update)?
Personally, I’ve never taken the time to read a complete EULA and I don’t know anyone who has. On top of which, I don’t speak fluent legalese, so much of those agreements reads like gibberish to me. As a result, I find myself in a position of being suspicious. I’m not saying that Google would do anything nefarious to trick us into handing over our content to train their AI models…but I’m also not saying they wouldn’t.
This is a rather sticky wicket we’re all in.
I do not, in any way, want my content to be used to train AI. Period. I’ve worked for decades to develop my specific writer’s “voice”, and I’m very protective of the words I write.
With that in mind, what are people who face this predicament meant to do?
I’m fortunate in that I know technology well enough that I can deploy a cloud service (such as Nextcloud) to my local network, such that I can use it in the same way I use Google Drive. The only difference is that it’s not available to the outside world, so any collaborative content I need to work with would have to be shared via the likes of Google Drive. However, my works of fiction aren’t shared in the same way (I send a document to the publisher because they prefer to avoid cloud services for this very reason).
And although I’ve not resorted to pulling my novels from Google Drive yet, I’m very much leaning in that direction. At the very least, I will most likely start using either a locally installed Nextcloud instance or a shared folder on my network.
In the end, it’s all about the assurance of privacy now and in the future, and there’s absolutely no guarantee that things won’t change at Google (or iCloud or OneDrive or Dropbox) in such a way that they retool their policies so that any content saved to their services is fair game. Because of that uncertainty, your best bet is to be safe rather than sorry.