The Steve Rubel Lifestream

Daily links, insights, photos, videos and more on emerging technology. 

The Google Jail

I have heard from various people that Google will penalize you for posting duplicate content on multiple sites. As I understand it, Google thinks you're trying to spam them so they knock down one or more of these sites in SEO influence with every infraction.
 
This presents a challenge for sites like Posterous, which lets you publish to multiple sites all at once. This was a major factor in why I decided to consolidate down to one site rather than maintain two.
 
A couple of observations here...
 
1) We kowtow to Google like it's some kind of moral authority. In exchange it would be great to see them be more transparent.
 
2) As the number of places for producing and distributing content skyrockets, duplicate postings will become the norm rather than the exception. It would be in Google's best interest to guide us rather than let us just test them.
 
What's your experience here?

Comments (31)

Jun 28, 2009
John Millen said...
I've found this to be true. I created a new site for my communication coaching practice, separate from my PR consulting, and it seemed I was penalized for duplicate articles on the two sites. With new content, over the past few months, I've managed to improve the SEO. I agree with your conclusion: we should have fair direction from Google.
Jun 28, 2009
frank said...
GOOGLE give us guidance? That would really be amazing :)

I do think that they will adapt. Google has to figure out how to effectively index the social web. In that world content is shared all over the place. Maybe not in the exact or complete form, but pieces are all over the place.

http://twitter.com/franswaa

Jun 28, 2009
jacobshare said...
My experience is that the duplicate content penalty has been completely blown out of proportion. What about syndication services? What about spammers who scrape content from sites and reuse it on their own?

The Googlebot spider is smart enough to know where the content was posted first - which isn't necessarily where it found that content first - and to decide whether the authority of the subsequent websites merits that they be penalized or not.

In other words, context is everything.

Jun 28, 2009
John MacIntyre said...
If Google were transparent as you suggest, it would be like publishing a manual for gaming the system, thereby reducing useful search results for us. There are just too many people out there trying to push irrelevant information in our faces.

As it stands, Google's black box ranking is a necessary evil, like security at the airport.

.. hey... is this textbox growing as I type? ... nice touch.

Jun 28, 2009
James Poling said...
I've had to confront these same issues. When I first started using posterous it was simply as a tool to post mobile photos to my actual blog so I didn't care much about the juice of what my posterous was getting.

Now that I have moved to posterous full-time I actually stopped it from posting to my blog so that it wouldn't dilute the SEO.

I agree with you, though, that with so many new tools like posterous and tumblr, duplicate content will become the rule rather than the exception, and Google needs to come up with a way to figure out what is spam and what is legitimate content.

I would love to hear Matt Cutts' thoughts on this.

Jun 28, 2009
Alan Bleiweiss said...
I'm on the side of user experience, and in this situation, that means Google's policy about duplicate content. They want to serve the most authoritative version of it so they have to come up with some method, however flawed, to decide what that is. In that process that means the other versions will suffer.

I think having fifty versions of the same content, regardless of the reason, is definitely a waste of web resources, and makes it algorithmically exponentially more difficult to know when to provide trustworthy content to the person doing the search. If you want the core information available to multiple locations, have a reference to the main original source, don't duplicate it. By doing it this way, you also build stronger inbound link authority to the main source, so it's a win-win.

I don't agree with Google on several fronts. This is one that I do.

Just sayin.

Jun 28, 2009
Links don't matter - they stopped mattering years ago ("miserable failure").

Likewise: Google doesn't matter all that much any more -- it's mostly noobs who think that it matters.

Everyone else only uses it because they're too lazy to type the URL into the location bar.

Search will increasingly happen on other sites (like homes.com, gifts.com, hotels.com, cars.com, movies.com, etc [and also other TLDs]).

Many Americans are very much behind the curve.

Barack Obama was ahead of it (causes.com, change.gov, etc.)

( in this context, see also http://gaggle.info/post/122/what-is-the-govern-algorithm-used-by-the-government-engine ;)

Jun 28, 2009
Sally Church said...
Hmmm, that's an interesting point I hadn't realised. I have just started using posterous as a neat tool for publishing text and photos more easily to my blog from Gmail as I travel a lot and typing from the iPhone is a lot easier and quicker.

It would be disappointing to be penalised for posting to a blog via sites such as posterous, although I could see how people could be penalised if they did that all the time for every single post.

Perhaps the solution would be to consolidate as one site or just post the photos, videos and audio files weekly, rather than every post.

Jun 28, 2009
Alexander Ainslie @Aainslie said...
Sounds like another opportunity for Bing to gain market share by not penalizing the "atomizing" of content.
Jun 28, 2009
@Alexander I've given people @ M$ a lot of advice on how they could acquire LOADS of search share, but I think maybe they were too focused on acquiring some email accounts or something like that -- maybe they've finally given up on that (I don't think I'll ever figure out what they're up to -- maybe they don't even know it themselves? ;)

Bing seems to be a search engine that only does local searches (#FAIL ;)

Jun 28, 2009
Robert said...
I like that Google penalizes duplicate content... if they didn't it would just be a time consuming effort to do research.

Google wins for making searching the net efficient. Weeding out poor quality and duplicate content (often the same thing) is a key part of that.

If anything they don't do enough. It's still way too easy to do a search and get the same content in multiple results, just slightly refactored.

IMHO Google's obligation is to the searcher, not the webmaster. They are the customer. If webmasters don't want to supply quality content, then someone else will step up and provide it (Wikipedia being a great example). SOMEONE will provide the content users want to see. Google's job is to serve the content that answers questions the best.

I don't get why webmasters think that everyone should adjust to meet their needs. They expect users to adjust to them, search engines to adjust to them.

Would this work in retail? Could Walmart insist people shop blindfolded and checkout in a foreign language of the cashier's choice? Would this be a practical business model? Webmasters seem to think so. Many have done this with awful unusable websites with poor content that are impossible to navigate and covered in ads that mask content.

Eventually webmasters will realize content is king and satisfying the visitor (customer) is key to making a website work. Until then, there will be bubble after bubble.

/I'm a webmaster. Never bothered with SEO... never needed it. If people want the content, it's easily found regardless. If they don't: you can only force a handful to eat it. The rest will just move on.

Jun 28, 2009
Mario Sundar said...
Ahh...bummer. have you stopped cross-posting from Posterous to Micropersuasion?
Jun 28, 2009
Steve Rubel said...
@Mario yes. But I merged the feeds under feedburner. 

Jun 28, 2009
Yuri Aksyonov said...
Google won't penalize you. Google penalizes duplicate content (not news, in fact). And duplicate content is not a great thing for search reputation.
Jun 28, 2009
ajkohn said...
The duplicate content 'penalty' is so overblown that it borders on myth. Google is looking for bad actors with duplicate content (e.g. - scraped and stolen content). The type of syndication you refer to does not incur a penalty or a neutralization (which is the new terminology).

Instead, Google will be seeking to determine which of the syndicated pieces of content is the 'owner' so it can provide most authority to *that* item. So, it would be best to put a link back to your source material. So, a micropersuasion post might say 'As posted on steverubel.com/the-google-jail'

At worst all you're really doing would be confusing Google, and that's not hard. The algorithm isn't particularly smart. (http://www.blindfiveyearold.com/search-engines-are-like-blind-five-year-olds)

There's actually quite a bit of information out there about duplicate content. I recall Matt Cutts doing a video recently on this subject. Shouldn't be hard to find.

Jun 28, 2009
Daddy-O said...
Writers write. That being said, although I do sometimes duplicate posts, I never do it in the same place. Is that bad?
Jun 28, 2009
Stuart Foster said...
Agreed. Duplicate content is becoming the norm (not the exception). It would be in Google's best interest to create a "remix" tag or at least open up the information about how they rate duplicate content.
Jun 28, 2009
AlanBleiweiss said...
"Duplicate content is becoming the norm".

So what? Duplicate content is NOT, by nature, a positive thing from any perspective other than web publishers who want to find lazy ways to reach more people. That's a pretty sad concept from a quality perspective.

It's that exact same mentality that led the vast majority of main stream news media to regurgitate AP wire service articles without doing any intelligent, investigative footwork.

Lazy does not equal positive. The arrogance of the web publishing community really needs to be seen for what it is.

Jun 28, 2009
Don Campbell said...
In general Google will not "penalize" duplicate content. @ajkohn nailed it in his previous comment - Google tries to determine the original source so it can give authority to that item but will not penalize the other postings (unless the whole site is obviously stolen content.)

But it won't give them authority either. So I think the best practice is to have the original item posted on the place you want Google to recognize, and then syndicated to the other places.

Jun 28, 2009
Bob Morris said...
Why is it arrogant to re-post from my blog to, say, Facebook? Most of the readers are different. My blog is political and Facebook is for friends and families. I don't think anyone here is thinking about reposting verbatim to 50 other sites, only to a few.

For that matter, when you think about it, AP routinely reposts the same thing to thousands of newspapers and websites and no one cares. It may be the only way someone at a small town newspaper sees the story.

Also, blog reposts are generally different from the original anyway. FF and Twitter just post snippets of the blog post with a link, not the full text.

And this is useful. One trend I'm seeing is blog posts that are reposted onto FF or FB often get more comments there than at the blog itself. So clearly, such posts are reaching new people.

My blog is my homebase, FF is my lifestream. They serve different purposes.

Jun 28, 2009
AlanBleiweiss said...
Actually Bob, you're not doing what thousands of web publishers are - they really are trying to get their content on hundreds or thousands of sites through article syndication. To me that's just pollution. The whole web is now filled to overflowing with article distribution offerings. Most of the articles out there are crap to begin with, because it's supposedly become a great way to get links embedded in those pointing back to some site ppl are promoting, which is a whole different topic.

I can't tell you how often I see ppl willing to pay $5 for 1000 word articles regardless of how poorly written they are, how little value they offer anyone looking for truly relevant info on any number of topics - just for the ability to have those syndicated.

The snippet with a link is really what I believe is the right approach. It offers value all around. The original post gets content specific inbound links to it, the outside sites (FB, LinkedIn, wherever) get new content but in a way that doesn't cause duplicate content conflicts. More readers are reached that way as well, but in a non-polluting way.

And the concept of the AP is one of the things I mentioned previously. I think that's totally polluting when it's articles verbatim in their entirety. The whole nature of the web is about linking to outside sites so those people in a small town are not harmed - all they have to do is just click a link.

My blog article snippets show up on FB, on my company web site, on LinkedIn as well. And I get lots of clicks through from those to my blog.

Jun 28, 2009
Bob Morris said...
Then we agree! I practically never repost entire posts from my blogs, only snippets.

I also try to quote as little as possible from articles because a) of Fair Use issues and b) if you just quote a little then add thoughts, you've added value and made it yours.

Jun 28, 2009
AlanBleiweiss said...
Bingo! the "you've added value" aspect is what I think is truly contributing something worthwhile to the web from the perspective of people searching for information at Google. If all I do is flood the web with thousands of the same article, I'm not adding value, but just repeating the same thing over and over, I'm making it more difficult for Google to determine what to show in the SERPs, so anyone hating on Google is ignoring that issue and that's where some of us see it as being arrogant. But yes, Bob, you and I are on the same page.
Jun 28, 2009
@Robert which instance of http://www.google.com/search?hl=en&num=100&q=%22man+is+born+free%22+%22everywhere+he+is+in+chains%22&aq=f&oq=&aqi= is the one you feel Google should approve as the "validated" (or "approved") copy?

BTW: I LOLed when Andrew Keen said Rousseau probably grabbed his ideas out of some garbage can in Geneva. :D

People who care what Google does only have themselves to blame. The statistical methods Google uses were invalidated decades ago - no one with any expertise would base serious information retrieval on such obsolete technology (as I've indicated above, it's primarily used by people who want to save a couple keystrokes by simply typing "ebay" or "amazon" [or whatever]).

Jun 29, 2009
The whole purpose of syndication is to disseminate content to all the places people are reading what they care about. That is a good thing. It seems that search needs a way to deal with that.
Jun 29, 2009
AlanBleiweiss said...
Richard, that's the whole point. Google HAS found a way. That way happens to be based on the problem with duplicate content in a search environment. Publishers who don't want to accept that don't seem to be willing to accept the bigger issue, instead, disregarding the need for someone using search to come up with the most relevant content for a specific search given the nature and scale of the web.

Do you even comprehend how insanely complex doing that is in an algorithm when facing millions of spammers, black hat SEO, and general crap content on billions of pages? Don't get me wrong - I think Google's algorithm is hugely flawed. Yet since the hundreds of small and mid-size business clients I serve have literally millions of people they're trying to reach with their message, Google is, from that perspective, the best solution out there. And like I said in my first comment - from that perspective, duplicate content is not good. Disseminate content - by all means. But do it through snippets with links back.

Jun 29, 2009
Marc krisjanous said...
Sorry if someone has already mentioned this though I searched for "canonical" and nothing came up at the time of this posting. For duplicate content you should be using the "canonical" link element (rel="canonical"). We use it on our site.
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
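As a rough illustration of the canonical hint described in the linked Google post: it is written as a <link rel="canonical" href="..."> element in the page's head. The sketch below uses only Python's standard library to read that element from a page, so you can check which URL a duplicate copy points back to. The class name, function name, and the example.com URL are made up for illustration, and this is a minimal sketch rather than how any search engine actually processes the hint.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical cross-posted page that points back to the original post.
page = (
    "<html><head>"
    '<link rel="canonical" href="http://example.com/the-google-jail">'
    "</head><body>Cross-posted copy</body></html>"
)
print(find_canonical(page))
```

A cross-posted copy that declares the original's URL this way is telling the search engine which version it would prefer to have indexed.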
Jul 01, 2009
DBL said...
Duplicate content is not considered 'duplicate content' when one of the duplicates links back to the original, which is what most of those above are debating about. This is really a non-issue unless you are trying to *disguise* the fact that you are reposting content from one of your sites to another, by *not linking back*, and that is generally something only someone with a spammer mentality would disguise. None of the 'duplicate content' advice floating out there has anything to do with services that repost with links back to the original. Check into it further if you don't believe it.
Jul 01, 2009
Gregg Hinthorn said...
A Gmail trick for publishing to different blogs is to email yourself plus the name of the blog (name+blog@gmail.com) with a filter set up to forward and post to whatever blog, and then archive the email. Painless and easy to do.
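The address pattern in this trick (name+blog@gmail.com) can be sketched in a few lines of Python. This only shows how the tagged address is built and how the tag can be read back; the helper names are hypothetical, and the forwarding filter itself is configured in Gmail's settings, not in code.

```python
def plus_address(mailbox, tag, domain="gmail.com"):
    """Build a tagged address such as name+blog@gmail.com."""
    return f"{mailbox}+{tag}@{domain}"


def split_plus_address(address):
    """Split a tagged address into (mailbox, tag, domain).

    The tag is an empty string when the address has no '+' part.
    """
    local, _, domain = address.partition("@")
    mailbox, _, tag = local.partition("+")
    return mailbox, tag, domain


# One tagged address per destination blog (the blog names are made up).
for blog in ("posterous", "wordpress"):
    print(plus_address("name", blog))

print(split_plus_address("name+blog@gmail.com"))
```

A filter matching on the "To" address can then route each tagged message to the right blog's posting address.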
Jul 13, 2009
sam said...
The duplicate content penalty is Google's way of scaling back invalid SEO schemes and making way for paid search campaigns. Google makes the most money via paid listings, which is why they are pushing more for paid methods.
Jul 19, 2009
Brent Hopkins said...
Man, you have hit it right on the head about kowtowing to Google. And here we have the seeds of Google's destruction (well, probably more like obsolescence). Kind of feels like the Evil Empire of Microsoft, doesn't it? I'm a nobody, so I don't care what Google thinks. But if you get enough nobodies not caring about what Google thinks, then there is an opportunity for someone to come and eat Google's lunch just like Google has done to Yahoo and is doing to Microsoft.
