The Google Jail
I have heard from various people that Google will penalize you for posting duplicate content on multiple sites. As I understand it, Google thinks you're trying to spam them so they knock down one or more of these sites in SEO influence with every infraction.
This presents a challenge for sites like Posterous, which lets you publish to multiple sites all at once. This was a major factor in why I decided to consolidate down to one site rather than maintain two.
A couple of observations here...
1) We kowtow to Google like it's some kind of moral authority. In exchange it would be great to see them be more transparent.
2) As the number of places for producing and distributing content skyrockets, duplicate postings will become the norm rather than the exception. It would be in Google's best interest to guide us rather than leave us to test them.
What's your experience here?


31 Comments
I do think that they will adapt. Google has to figure out how to effectively index the social web. In that world content is shared all over the place. Maybe not in the exact or complete form, but pieces are all over the place.
http://twitter.com/franswaa
The Googlebot spider is smart enough to know where the content was posted first - which isn't necessarily where it found that content first - and to decide whether the authority of the subsequent websites merits that they be penalized or not.
In other words, context is everything.
As it stands, Google's black-box ranking is a necessary evil, like security at the airport.
.. hey... is this textbox growing as I type? ... nice touch.
Now that I have moved to Posterous full-time I actually stopped it from posting to my blog so that it wouldn't dilute the SEO.
I agree with you, though, that with so many new tools like Posterous and Tumblr, duplicate content will become the rule rather than the exception, and Google needs to come up with a way to figure out what is spam and what is legitimate content.
I would love to hear Matt Cutts' thoughts on this.
I think having fifty versions of the same content, regardless of the reason, is definitely a waste of web resources, and it makes it far more difficult for an algorithm to know which copy is the trustworthy one to show the person doing the search. If you want the core information available in multiple locations, link to the main original source rather than duplicating it. By doing it this way, you also build stronger inbound link authority to the main source, so it's a win-win.
I don't agree with Google on several fronts. This is one that I do.
Just sayin.
It would be disappointing to be penalised for posting to a blog via sites such as Posterous, although I could see how people could be penalised if they did that all the time for every single post.
Perhaps the solution would be to consolidate as one site or just post the photos, videos and audio files weekly, rather than every post.
Google wins for making searching the net efficient. Weeding out poor quality and duplicate content (often the same thing) is a key part of that.
If anything they don't do enough. It's still way too easy to do a search and get the same content in multiple results, just slightly refactored.
IMHO Google's obligation is to the searcher, not the webmaster. They are the customer. If webmasters don't want to supply quality content, then someone else will step up and provide it (Wikipedia being a great example). SOMEONE will provide the content users want to see. Google's job is to serve the content that answers questions the best.
I don't get why webmasters think that everyone should adjust to meet their needs. They expect users to adjust to them, search engines to adjust to them.
Would this work in retail? Could Walmart insist people shop blindfolded and check out in a foreign language of the cashier's choice? Would this be a practical business model? Webmasters seem to think so. Many have done this with awful, unusable websites with poor content that are impossible to navigate and covered in ads that mask content.
Eventually webmasters will realize content is king and satisfying the visitor (customer) is key to making a website work. Until then, there will be bubble after bubble.
/I'm a webmaster. Never bothered with SEO... never needed it. If people want the content, it's easily found regardless. If they don't: you can only force a handful to eat it. The rest will just move on.
Instead, Google will be seeking to determine which of the syndicated pieces of content is the 'owner' so it can provide most authority to *that* item. So, it would be best to put a link back to your source material. So, a micropersuasion post might say 'As posted on steverubel.com/the-google-jail'
At worst all you're really doing would be confusing Google, and that's not hard. The algorithm isn't particularly smart. (http://www.blindfiveyearold.com/search-engines-are-like-blind-five-year-olds)
There's actually quite a bit of information out there about duplicate content. I recall Matt Cutts doing a video recently on this subject. Shouldn't be hard to find.
So what? Duplicate content is NOT, by nature, a positive thing from any perspective other than web publishers who want to find lazy ways to reach more people. That's a pretty sad concept from a quality perspective.
It's that exact same mentality that led the vast majority of main stream news media to regurgitate AP wire service articles without doing any intelligent, investigative footwork.
Lazy does not equal positive. The arrogance of the web publishing community really needs to be seen for what it is.
But it won't give them authority either. So I think the best practice is to have the original item posted on the place you want Google to recognize, and then syndicated to the other places.
For that matter, when you think about it, AP routinely reposts the same thing to thousands of newspapers and websites and no one cares. It may be the only way someone at a small-town newspaper sees the story.
Also, blog reposts are generally different from the original anyway. FF and Twitter just post snippets of the blog post with a link, not the full text.
And this is useful. One trend I'm seeing is blog posts that are reposted onto FF or FB often get more comments there than at the blog itself. So clearly, such posts are reaching new people.
My blog is my homebase, FF is my lifestream. They serve different purposes.
I can't tell you how often I see ppl willing to pay $5 for 1,000-word articles, regardless of how poorly written they are or how little value they offer anyone looking for truly relevant info on any number of topics - just for the ability to have those syndicated.
The snippet with a link is really what I believe is the right approach. It offers value all around. The original post gets content-specific inbound links, and the outside sites (FB, LinkedIn, wherever) get new content in a way that doesn't cause duplicate content conflicts. More readers are reached that way as well, but in a non-polluting way.
And the concept of the AP is one of the things I mentioned previously. I think that's totally polluting when it's articles verbatim in their entirety. The whole nature of the web is about linking to outside sites, so those people in a small town are not harmed - all they have to do is click a link.
My blog article snippets show up on FB, on my company web site, on LinkedIn as well. And I get lots of clicks through from those to my blog.
I also try to quote as little as possible from articles because a) there are Fair Use issues and b) if you quote just a little and then add your own thoughts, you've added value and made it yours.
Do you even comprehend how insanely complex that is for an algorithm facing millions of spammers, black hat SEO, and general crap content on billions of pages? Don't get me wrong - I think Google's algorithm is hugely flawed. Yet since the hundreds of small and mid-size business clients I serve have literally millions of people they're trying to reach with their message, Google is, from that perspective, the best solution out there. And like I said in my first comment - from that perspective, duplicate content is not good. Disseminate content, by all means. But do it through snippets with links back.
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
http://www.notojail.com