cmlabs


https://cmlabs.co/en-id/blog/webspam-threat

How Google Fights Webspam: Strategies & Solutions

Published at Jun 12, 2024 15:06 | Last updated at Jun 12, 2024 15:06 by Selsi Selvia

Webspam has become one of the common issues faced by search engines, including Google.

In general, webspam consists of web pages specifically created to manipulate search engines, primarily to achieve high rankings in the search engine results page (SERP).

This degrades the experience of users who rely on search engines to find the information they need.

So, how does Google address and fight the threat of webspam? Let's find the answer in the following article.

Threat of Webspam on Google Search

Webspam refers to pages designed to manipulate search engines. Such pages do not arise naturally; they are created deliberately.

There are various reasons why webmasters or site owners create spam pages, often related to commercial interests.

For instance, through spam pages, webmasters can sell unsafe products. Alternatively, spam can be used for simpler purposes, such as gaining more clicks.

Additionally, spam pages prevent users from easily finding high-quality and relevant information.

Webspam is not something to be underestimated. In fact, in the Webspam Report 2020, Google revealed that they detect 40 billion spam pages daily, which can disrupt the user experience on search engines.

Figure 1: Spam detection report by Google.

In general, there are several types of spam known to violate Google's policies, including the following:

1. Cloaking

Cloaking is the practice of displaying different content to users and search engines. The goal is to manipulate search rankings and deceive users into visiting spam pages.

One example of this spam practice is displaying a page about investments to search engines while showing users a page about selling pharmaceuticals.

However, it's important to note that webmasters who operate paywalls or content-gating mechanisms are not considered to be engaging in cloaking if Google can view the full content behind the paywall.
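As a rough illustration of the idea, the sketch below compares the visible text of two HTML responses for the same URL: one as it might be served to a search engine crawler, the other as served to a regular visitor. The function names (`visible_text`, `content_similarity`) and the sample pages are hypothetical, and this is not how Google detects cloaking; it only shows that a very low similarity between the two versions is a warning sign worth investigating.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes of an HTML document (simplified: script/style text is not skipped)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the lowercase text content of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).lower()


def content_similarity(html_for_crawler: str, html_for_user: str) -> float:
    """Return a ratio between 0 and 1; low values suggest the two audiences see different content."""
    return SequenceMatcher(
        None, visible_text(html_for_crawler), visible_text(html_for_user)
    ).ratio()


# Hypothetical responses for the same URL, mirroring the investment/pharmaceutical example above.
crawler_page = "<html><body><h1>Investment tips</h1><p>How to build a portfolio.</p></body></html>"
user_page = "<html><body><h1>Cheap pills online</h1><p>Order pharmaceuticals now.</p></body></html>"

print(content_similarity(crawler_page, crawler_page))  # identical versions
print(content_similarity(crawler_page, user_page))     # clearly different versions
```

In practice the two versions would come from requests made with different user agents, and any threshold for "too different" is a judgment call rather than a published rule.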

2. Doorway Pages

Doorway pages are a spam practice involving the creation of pages or sites designed to rank highly for specific, similar search queries.

Through this practice, users are directed to intermediary pages that are less useful compared to the final destination.

For example, this practice might involve creating a site with two near-identical URLs, such as agronomy.com/product/SEO-agency-indonesia and agronomy.com/produk/SEO-services-indonesia, to maximize reach for the query “SEO Services Indonesia.”

3. Expired Domain Abuse

Expired domain abuse involves purchasing and repurposing expired domain names to manipulate search rankings.

In this practice, webmasters host content that offers little or no value to users.

An example of expired domain abuse is buying a domain previously used by a medical site and repurposing it to host low-quality gambling-related content to boost search rankings.

4. Hacked Content

Hacked content refers to content placed on a site without permission due to the site's vulnerability to hacking.

This type of spam is concerning as it can lead to poor search results for users and potentially prompt them to download harmful applications on their devices. Some examples of such hacks include:

Code Injection: When hackers gain access to a site, they may try to insert malicious code into existing pages.
Page Injection: Hackers can add new pages containing spam or harmful content to a site. These pages are often used to manipulate search engines or conduct phishing.
Content Injection: Hackers can manipulate existing pages on a site by adding specific content visible to search engines but harder to find by users or site owners.
Redirects: Hackers can insert malicious code into a site that redirects users to spammy or harmful content pages.

5. Hidden Text and Links

Hidden text and links involve placing content on a page solely to manipulate search engines while making it difficult for users to see.

Examples of this spam practice include using white text on a white background or hiding text behind an image.

However, some web design elements utilize dynamic content display to enhance user experience, which is not considered spam because it does not violate Google's policies.

Examples include slideshows that cycle through images and tooltip-like paragraphs that display additional content when users hover over an element.
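To make the white-on-white case concrete, here is a minimal sketch that scans inline `style` attributes and flags elements whose text color equals their background color. The class name `HiddenTextFinder` is hypothetical, and the check is deliberately narrow: it ignores external stylesheets and compares color values as plain strings (so `#fff` and `white` would not match), which means it can only hint at a problem and cannot prove a page is clean.

```python
from html.parser import HTMLParser


def parse_inline_style(style: str) -> dict:
    """Turn 'color: #fff; background-color: #fff' into a dict of lowercase properties."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


class HiddenTextFinder(HTMLParser):
    """Flags tags whose inline text color is identical to their inline background color."""

    def __init__(self):
        super().__init__()
        self.suspicious = []  # list of (tag, reason) tuples

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        props = parse_inline_style(style)
        color = props.get("color")
        if color and color == props.get("background-color"):
            self.suspicious.append((tag, "text color matches background"))


sample = (
    '<p style="color:#FFF; background-color:#fff">seo jakarta seo bandung</p>'
    '<p style="color:#000">Regular paragraph</p>'
)
finder = HiddenTextFinder()
finder.feed(sample)
print(finder.suspicious)
```

Legitimate dynamic elements such as slideshows and tooltips are not flagged by this check, since it only looks at matching colors.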

6. Keyword Stuffing

Keyword stuffing involves filling a webpage with keywords or numbers in an unnatural and out-of-context manner to manipulate search rankings.

For instance, a webmaster might repeat the same phrase so often that the content sounds unnatural. Alternatively, a website owner might include a list of phone numbers without any substantial added value.
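One way to spot repetition like this in your own drafts is to measure how much of the text a single phrase occupies. The sketch below is only an illustration: `keyword_density` is a hypothetical helper, the sample sentences are made up, and Google does not publish a density threshold, so the number is best used to compare drafts rather than to pass or fail them.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Return the share of words in `text` that belong to occurrences of `phrase` (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = re.findall(r"[a-z0-9']+", phrase.lower())
    if not words or not phrase_words:
        return 0.0
    n = len(phrase_words)
    # Count every position where the phrase appears as a run of consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase_words)
    return hits * n / len(words)


stuffed = "Cheap SEO services. Our cheap SEO services are the best cheap SEO services."
natural = "We help small businesses improve their search visibility with affordable SEO services."

print(round(keyword_density(stuffed, "cheap SEO services"), 2))
print(keyword_density(natural, "cheap SEO services"))
```

A draft where one phrase accounts for most of the words, as in the first sample, will almost certainly read unnaturally to users.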

7. Link Spam

Links are a factor that can determine the relevance of a web page. However, if links are included on a web page to manipulate search rankings, they can be considered link spam.

One example of link spam is excessively exchanging links with other websites or creating dedicated pages for partners to engage in cross-linking.

However, buying and selling links is not considered a violation if certain conditions are met, such as including the attribute values rel="nofollow" or rel="sponsored" in the <a> tag. This practice can be conducted to maximize website revenue for advertising and sponsorship purposes.
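As a small, hedged example of checking this on your own pages, the sketch below lists links whose `rel` attribute contains neither `nofollow` nor `sponsored`. The class name `LinkRelAuditor` and the sample URLs are hypothetical. The script cannot know which links are actually paid, so its output is a list of candidates to review by hand, not a list of violations.

```python
from html.parser import HTMLParser

# rel values mentioned above as qualifying paid or sponsored links.
QUALIFYING_REL = {"nofollow", "sponsored"}


class LinkRelAuditor(HTMLParser):
    """Collects hrefs of <a> tags that carry neither rel="nofollow" nor rel="sponsored"."""

    def __init__(self):
        super().__init__()
        self.unqualified = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. rel="sponsored noopener".
        rel_values = set((attr.get("rel") or "").lower().split())
        if not rel_values & QUALIFYING_REL:
            self.unqualified.append(href)


sample = (
    '<a href="https://example.com/ad" rel="sponsored noopener">Sponsor</a>'
    '<a href="https://example.com/partner">Partner</a>'
    '<a href="https://example.com/forum" rel="nofollow">Forum</a>'
)
auditor = LinkRelAuditor()
auditor.feed(sample)
print(auditor.unqualified)
```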

8. Machine-Generated Traffic

Machine-generated traffic is a spam practice that can interfere with Google's ability to provide the best service to users.

Examples of machine-generated traffic that violates Google's policies include sending automated queries to Google, scraping results for rank checking, and other unauthorized automated access to Google Search.

9. Malware and Malicious Behavior

Malware refers to any software or mobile application specifically designed to harm a computer, a mobile device, the software running on it, or its users.

Malware exhibits malicious behavior, such as installing software without the user’s consent or installing harmful software like viruses.

Unwanted software is any file or mobile application that engages in deceptive or unexpected behavior, or that has a negative impact on users.

Sometimes, site owners may not realize that files available for download on their site are considered malware or unwanted software.

Therefore, it is crucial for webmasters to ensure they do not violate Google’s unwanted software policies and to follow Google's guidelines carefully, such as:

Accurately informing users about the purpose and intent of the software.
Behaving as advertised.
Clearly and explicitly describing to users the system and browser changes that the software will make.
Using product recommendations only with proper consent.
Not scaring users.
Protecting user data.
Not compromising user browsing experience.

10. Misleading Functionality

Webmasters are required to create sites with high-quality, useful, and beneficial content for users.

However, in some cases, site owners may attempt to manipulate search rankings by deliberately creating sites with functions and services that mislead and deceive users.

As a result, users are led to believe they can access content or services that are not actually available. An example of this spam practice is a site with a fake generator that claims to provide app store credits but does not.

11. Scaled Content Abuse

Scaled content abuse involves creating numerous pages to manipulate search rankings rather than to help users.

For instance, a webmaster might create a large amount of non-original (plagiarized) blog content that provides little or no value to users.

Scaled content abuse is often related to the misuse of generative AI (artificial intelligence) to create numerous pages without adding value for users.

12. Scraped Content

Scraped content is the practice of taking content from other sites without adding new, useful content or value for users. This practice can also be considered copyright infringement.

13. Sneaky Redirects

Sneaky redirects send users to a different URL than the one they requested in order to show different content to users and search engines. This practice can also involve displaying unexpected content that does not meet the user's initial needs.

For example, a webmaster might show health-related content to search engines, but when clicked, it redirects users to a gambling page.

14. Site Reputation Abuse

Site reputation abuse involves publishing third-party pages with little or no supervision or involvement from the first party.

These third-party pages may include sponsored, partner, or ad pages that are unrelated to the host site's primary purpose. This practice exploits the ranking signals of the first-party site to manipulate search rankings.

An example of site reputation abuse is an educational site hosting a page of payday loan reviews written by a third party that distributes the same page to other sites across the web to manipulate search rankings.

15. Thin Affiliate Pages

Thin affiliate pages contain affiliate links with product descriptions and reviews copied directly from the original seller without any original content or added value.

A page can be considered a thin affiliate page and violate Google’s policies if it is part of a program that distributes its content across an affiliate network without providing additional information.

16. User-Generated Spam

User-generated spam consists of spam content added to a site by users through channels intended for user content, such as spam comments on blogs, spam posts in forum threads, and so on. This type of spam is often unknown to the site owner.

What Does Google Do to Fight Webspam?

Webspam often disrupts users in finding useful and needed information. In more serious cases, webspam can lead users to scams or even worse situations.

To prevent this, Google employs various efforts to combat the threat of webspam in its search engine.

Google uses two main methods to detect webspam: automated systems (algorithmically) and manual reviews by the Spam Removal Team.

Google’s automated system can detect and remove most spam from search results. This system operates similarly to email systems that prevent spam from cluttering inboxes.

The remaining spam is manually reviewed by the Spam Removal Team, which examines pages on Google and flags them if they violate the Webmaster Guidelines.

In the "Search Off the Record Podcast," Duy Nguyen from Google’s Search Quality Team explained that sites specifically created for spam are immediately removed from search results and do not appear in relevant queries.

However, for sites mistakenly detected as spam but still containing high-quality content that does not violate Google’s policies, the Search Quality Team will first notify the webmaster and assist in addressing the spam issue.

The Presence of SpamBrain to Combat Webspam

Google's efforts to combat webspam are systematic. As part of them, Google has developed an AI-based spam prevention system known as SpamBrain.

SpamBrain is a Google algorithm capable of identifying and minimizing the presence of webspam in search results.

SpamBrain has been in use since 2018 to detect spam. According to the Webspam Report 2021, SpamBrain identified nearly six times more spam sites than in 2020.

This resulted in a 70% reduction in hacked spam, a common type of spam in 2020.

SpamBrain also succeeded in reducing nonsensical (gibberish) spam on hosting platforms by 75%.

Additionally, SpamBrain is designed to be a robust and continuously evolving platform to tackle all types of abuse.

Besides directly identifying spam, SpamBrain can also detect sites that buy links and sites used to pass on outbound links.

Figure 3: Google's statement on the uses of SpamBrain.

Latest Spam Update March 2024

In March 2024, Google once again released new spam policies to address spam practices that could negatively impact the quality of search results.

The update focuses on three new spam policies that target increasingly common practices: expired domain abuse, site reputation abuse, and scaled content abuse.

In Google's official blog, Elizabeth Tucker, Google's Director of Product Management, stated that this update could reduce the presence of low-quality and non-original content in search results by up to 45%.

Figure 4: Impact of the latest spam update in March 2024.

Google also explained that these spam policy updates, especially regarding site reputation abuse, will apply to both spam detection methods: automated systems and manual review by the spam removal team.

Figure 5: Google's comment on the implementation of the spam update in March 2024.

Additionally, one of the focuses of the March 2024 spam update is the use of generative AI in content creation.

In longstanding spam policies, Google explained that the use of automation, including generative AI, would be categorized as spam if its goal is to manipulate search result rankings.

The new policies are based on the same principle as the previous ones. They also account for more sophisticated large-scale content creation methods, where it is not always clear whether low-quality content was created solely through automation.

These new policies are also intended to focus more on the fact that large-scale content production will be considered abuse if the content aims to manipulate search result rankings, regardless of whether it is created automatically or involves human input.

What Should Be Done to Avoid Website Detection as Webspam?

To prevent a website from being flagged as webspam by Google, site owners need to adhere to applicable policies as best as possible.

Ensure that your website optimization strategy aligns with the current policies, and keep up with the policy updates that Google releases periodically.

When creating content, don't overly focus on keyword density within an article. While keyword usage is crucial for SEO content creation, make sure it's not overused.

During the Google Webmaster Central office hours in March 2017, John Mueller stated that there are no specific limits on keyword density.

Overemphasizing keyword density can make content appear unnatural. Instead of focusing on keyword density, you should create content that is easily readable by users.

It's also important to avoid creating excessive, non-original content aimed at manipulating search rankings. However, there's no need to worry about publishing high-quality content in bulk at once.

According to John Mueller, content on a website isn't considered spam solely based on how it's published.

Figure 6: John Mueller's comment on mass article publishing.

No matter how many articles you want to publish, quality content that meets user needs can still help optimize search rankings ethically.

How Does cmlabs Contribute to Addressing Webspam Issues?

As a professional SEO agency, cmlabs is committed to optimizing client websites to comply with applicable policies.

When there are updates to Google's algorithms and policies, including those related to spam, cmlabs assists clients in adjusting their SEO strategies to be compliant and free from abuse or specific violations.

In terms of content strategy, our team consistently produces content naturally and based on user needs, not just for search engines. This way, websites will be protected from harmful spam practices.

Looking to optimize your business safely in search engines? Let’s collaborate and see how SEO Services by cmlabs can help your business rank higher on SERPs. Free consultation now!
