
If Baidu Can't Crawl Dynamic Sites, Do Pages Heavy with Ajax and JS Perform Worse in SEO?

We Thought. We Experimented. We Failed. 

You know how some pages have a “Load More” button at the bottom? Or how, when you scroll about 80% of the way down a page, more text, images or other content appears? That content is generated dynamically, usually by JavaScript, often using a technique called Ajax to fetch it after the page has loaded. Such pages are generically called dynamic pages, and they differ from static pages, which deliver all their text in the HTML as soon as the page loads.

It’s common knowledge among SEOs that Google can read dynamic pages (to a large extent); Baidu, however, cannot. Baidu’s bot (BaiduSpider, as they call it) was made for the Web 1.0 era, when pages were largely static and everything that appeared on a page was present in the HTML itself. Today, much of a page’s readable content is often generated dynamically, and if Baidu can’t read that content, it has less to go on when working out what the page is about.

Thesis: If Baidu can’t read dynamic pages, do sites with dynamic pages rank lower on Baidu?

Let’s find out.


The Setup

We set out to find a correlation between search engine rank and the degree to which a page was dynamic. We generated [ xxx ] queries and analyzed over [ 1800 pages ], looking at:

• Google in English
• Baidu in English
• Google in Chinese
• Baidu in Chinese

We thought it wise to define a few terms along the way:

SNL: Static Number of Lines: the number of lines of useful information displayed on the page before any Ajax runs (note: this isn’t funny)

DNL: Dynamic Number of Lines: the number of lines of useful information displayed on the page after Ajax has loaded its content
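The post does not say how lines were counted. As a minimal sketch, assuming SNL is measured on the raw HTML a non-rendering crawler receives and DNL on the HTML after all scripts have run (for example, a DOM snapshot saved from a headless browser, not shown here), both counts could come from the same function. The 20-character cut-off for a “useful” line and skipping <script> and <style> contents are our assumptions, not the authors' definition.

```python
from html.parser import HTMLParser


class UsefulLineCounter(HTMLParser):
    """Counts text lines outside <script>/<style> that meet a minimum length."""

    def __init__(self, min_chars=20):
        super().__init__()
        self.min_chars = min_chars  # hypothetical cut-off for a "useful" line
        self.skip_depth = 0         # > 0 while inside <script> or <style>
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return  # code and CSS are not readable content
        for line in data.splitlines():
            if len(line.strip()) >= self.min_chars:
                self.count += 1


def count_useful_lines(html, min_chars=20):
    parser = UsefulLineCounter(min_chars)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical snapshots of one page before and after a "Load More" call.
raw_html = "<body><p>This line is in the initial HTML.</p><div id='more'></div></body>"
rendered_html = (
    "<body><p>This line is in the initial HTML.</p>"
    "<div id='more'><p>This line was injected by a Load More call.</p></div></body>"
)
snl = count_useful_lines(raw_html)       # 1
dnl = count_useful_lines(rendered_html)  # 2
```

Running the same counter on both snapshots keeps SNL and DNL comparable: only the input changes, not the definition of a line.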

Next, we need a ratio that captures just how ‘heavy’ a page is, i.e. how much of its content is generated dynamically. Let’s call it PC (Percent Change in Content), where:

PC = (DNL - SNL) / SNL

We also define a related parameter, which divides by the final line count instead:

PC_Ajax = (DNL - SNL) / DNL

We go on to define, more strictly, the percentage of content added versus deleted on each page, but for all intents and purposes, PC and PC_Ajax as generic metrics should point us in the right direction.
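A quick worked example shows how the two ratios relate. The line counts here are made up. Note that, as defined, PC is a fraction rather than a percentage (3.0 means 300% more lines), and that PC_Ajax is simply PC rescaled as PC / (1 + PC), which is why it stays below 1 whenever SNL > 0.

```python
def pc(snl, dnl):
    """PC: change in content relative to the static line count (undefined if snl == 0)."""
    return (dnl - snl) / snl


def pc_ajax(snl, dnl):
    """PC_Ajax: share of the final line count added dynamically (undefined if dnl == 0)."""
    return (dnl - snl) / dnl


snl, dnl = 10, 40            # hypothetical page: 10 lines before Ajax, 40 after
p = pc(snl, dnl)
print(p)                     # 3.0
print(pc_ajax(snl, dnl))     # 0.75
print(p / (1 + p))           # 0.75, i.e. PC_Ajax == PC / (1 + PC)
```

Because one ratio is a monotonic function of the other, they rank pages in the same order; PC_Ajax just compresses very dynamic pages into the range below 1.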


Edge Cases

The process looks sound so far, but we hit a concrete wall almost immediately. After evaluating a set of results, we found ranking pages made up almost entirely of dynamic content, which runs counter to our thesis. Granted, many other factors contribute to a page’s or site’s ranking on Baidu, for example backlinks and meta tags; let’s come back to this (maybe).

Elsewhere, we found pages with neither text nor dynamic content: the pages were images! One such page was http://www.grandhotelbeijing.com/, which suggests that Baidu parses images well, that other factors were at play, or perhaps both.

There were 23 of these ‘all-JS’ pages, making up 2.35% of our total data points. We omit them, along with any results where PC < 0 (the page lost content after Ajax) or where DNL equals 0 (which would leave PC_Ajax undefined).
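The exclusion rules can be written as a single filter. This is a sketch: the record fields and URLs are hypothetical, and treating SNL == 0 as the marker of an ‘all-JS’ page is our assumption, since the post does not say how those pages were identified.

```python
def keep_for_analysis(snl, dnl):
    """Apply the exclusion rules to one search result (SNL, DNL)."""
    if dnl == 0:
        return False  # nothing readable at all; PC_Ajax would divide by zero
    if snl == 0:
        return False  # assumed marker of an "all-JS" page; PC would divide by zero
    if dnl < snl:
        return False  # PC < 0: content was removed rather than added
    return True


results = [
    {"url": "page-a.example", "snl": 12, "dnl": 30},
    {"url": "page-b.example", "snl": 0, "dnl": 25},   # all-JS
    {"url": "page-c.example", "snl": 0, "dnl": 0},    # image-only page
    {"url": "page-d.example", "snl": 20, "dnl": 15},  # PC < 0
]
kept = [r for r in results if keep_for_analysis(r["snl"], r["dnl"])]
print([r["url"] for r in kept])  # ['page-a.example']
```

Pages with PC exactly 0 (no dynamic content at all) pass the filter, since only negative PC is excluded.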

Section 1: Diving into Data

The results indicate a positive relationship between rank and how dynamic pages are. That’s it! We’re done!

Not so fast. While the slope is positive for both Google and Baidu, it’s fairly meaningless given the amount of noise. Looking at the top 10 search results (I wonder at this point why we even went out as far as 50th place), we see an incredible amount of dynamic content, with 20 to 30 times as much dynamic as static content. What does this mean?

Google Slope: 0.2356
Baidu Slope: 0.0098

It could be that higher-ranked sites are more sophisticated: perhaps they inject a number of advertisements dynamically, while the static content is enough for Baidu to understand the nature of the page. However, these explanations can’t be validated with the data at hand.
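The slopes in this section are reported without a measure of fit, which is part of why they are hard to interpret. A minimal sketch of an ordinary least-squares fit that also returns R², assuming rank is the x-axis and PC the y-axis (the post does not say which way round the regression ran); the sample numbers are invented for illustration and are not the study's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept.

    Returns (slope, intercept, r_squared). Assumes at least two distinct
    x values and that the ys are not all identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Illustrative values only: search rank vs. PC for five results.
ranks = [1, 2, 3, 4, 5]
pcs = [0.5, 0.4, 0.9, 0.6, 1.1]
slope, intercept, r2 = fit_line(ranks, pcs)
print(round(slope, 4), round(r2, 3))  # 0.14 0.576
```

An R² close to 0 alongside a small slope is what "meaningless given the amount of noise" looks like numerically: rank would explain almost none of the variation in PC.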

Section 2: The Top 10

Homing in on the top 10, or roughly ‘Page 1’ of search, we find a slightly different trend. Here, Baidu appears to prefer more dynamic pages, while Google still shows a preference for less Ajax.

Google: Slope = 0.1541, R^2 = ??
Baidu: Slope = -0.0276, R^2 = ??

Section 3: PC_Ajax vs Rank for Top 10 Results

Next, we consider the impact that PC_Ajax has on rank. Note the difference between PC_Ajax and PC: PC_Ajax uses the final content (i.e. the Dynamic Number of Lines) in the denominator, so its values now fall between 0 and 1. At this point, it’s hard to make any sense of the data: there is no slope, nor is there any significance.

Google: Slope = 0.0037, R^2 = ???
Baidu: Slope = -0.0036, R^2 = ???

These results, while consistent with those found earlier for PC, still don’t say much. To make more sense of the data, we plot the median PC_Ajax for each rank, and our conclusion finally becomes clearer.
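Taking the median of PC_Ajax at each rank is a simple way to see a trend through noisy data: a median is not pulled around by a few extreme pages the way a mean is. The sketch assumes the data is a list of hypothetical (rank, PC_Ajax) pairs with made-up values.

```python
from collections import defaultdict
from statistics import median


def median_by_rank(rows):
    """Group (rank, pc_ajax) pairs by rank and return {rank: median PC_Ajax}."""
    groups = defaultdict(list)
    for rank, value in rows:
        groups[rank].append(value)
    # Sort by rank so the result can be plotted left to right.
    return {rank: median(values) for rank, values in sorted(groups.items())}


rows = [(1, 0.8), (1, 0.1), (1, 0.5), (2, 0.2), (2, 0.6), (2, 0.9)]
print(median_by_rank(rows))  # {1: 0.5, 2: 0.6}
```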


Section 4: PC_Ajax by Language

For our final set of regressions, we split the data by query language (e.g. Hong Kong Hotels vs 香港酒店) and evaluate whether language affects the amount of Ajax used, and thus ranking. After all, Baidu is a Chinese search engine, and Google is an English search engine (well, largely). Perhaps running English queries through Baidu and Chinese queries through Google had been muddying the data all along, but alas, the answer is ‘No’, and our conclusion becomes even clearer.


Conclusion

We analyzed the relationship between search ranking and the degree to which pages are dynamic. Looking back, if we could better classify the dynamic content (what kind it is, how much of the page it makes up, and where on the page it is generated), we might be able to tell whether it adds to a search engine’s ‘understanding’ of the page or serves other purposes, such as advertising.

At this point, it’s fair to say we found no meaningful correlation between how dynamic a page is and how it ranks on Baidu.
