This is part 4 of a 4-part series on studying how foreign sites load in China. For more information please check out:
For this analysis, we look more closely at the nuances of “Why doesn’t a site work in China?”, using harvard.edu as the illustration.
On the one hand, Harvard is a recognisable name and there are a number of channels through which Chinese students can learn about this institution. On the other hand, Harvard has a website they use to communicate with the world (for multiple purposes), and it doesn't really work in China.
We start by looking back at the summary information to see how the site performed over a week. As much as a few one-off tests may educate or indicate a handful of problems, what’s important is understanding how sites load on a repeated basis.
Time Series Analysis
Given the above, it appears the site is fast at ~3-5am, and slow otherwise.
Loading Time Histogram
% of Page Loaded
At the outset, we compare how the page looked in China (after 30 seconds) with how it looked in the US once loading was complete. Visually, we can see the following:
Images fail to load
The YouTube video does not load
Running repeated tests allows us to understand averages - one of the difficulties of measuring site performance is that pages load differently almost every time.
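As a rough illustration of why repeated tests matter, here is a minimal sketch that summarises a set of load times. The numbers are made up for illustration (they are not the Harvard measurements), and the 30-second cut-off in `summarize` is an assumption borrowed from the 30-second snapshot mentioned above.

```python
from statistics import mean, median

# Hypothetical load times in seconds from repeated tests of one page.
# These values are invented for illustration, not measured results.
load_times_s = [4.2, 31.0, 12.5, 30.0, 8.1, 27.3]


def summarize(samples, timeout_s=30.0):
    """Return the mean, the median, and the share of runs at or past the cut-off."""
    timed_out = sum(1 for s in samples if s >= timeout_s)
    return {
        "mean_s": mean(samples),
        "median_s": median(samples),
        "timeout_share": timed_out / len(samples),
    }


summary = summarize(load_times_s)
print(summary)
```

A single fast or slow run would tell a very different story from the median, which is why one-off tests can mislead.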
The following are three sequential loads of Harvard.edu from China. As you can tell, the waterfall (i.e. the sequence in which files load) looks different in each case - the number of files loaded differs, as does which specific files succeeded. We'll dive deeper into the Resource Waterfall in a future article.
In Trial 1 (far left), files simply took incredibly long to load (long green lines) - this is symptomatic of low bandwidth. The experience harkens back to the 1990s, when images would load ever… so… gradually. That would be this particular user’s experience. More on speed below.
In the second case (middle), the yellow bars show that the initiation or waiting time when connecting to the foreign servers is slow, in addition to the low-bandwidth issue seen in Trial 1.
In the last case (far right), it appears some of the earlier, initiating resources timed out, leading to a cascading failure in which few or no other resources could be retrieved.
One other difficulty in studying site performance is that most views of a page load show what loaded, not what didn’t. Said another way, if the site loads 100 resources in the US and 70 resources in China, you don’t know which 30 resources didn’t load unless you’re able to easily reconcile the two lists.
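Conceptually, that reconciliation is a set difference. The sketch below is only an illustration of the idea, not how any particular tool implements it; the function name `missing_resources` and the `example.com` URLs are hypothetical.

```python
def missing_resources(baseline_urls, observed_urls):
    """Return URLs present in the baseline load but absent from the observed one."""
    return sorted(set(baseline_urls) - set(observed_urls))


# Hypothetical resource lists; a real comparison would use URLs captured from each location.
us_load = [
    "https://www.example.com/",
    "https://www.example.com/app.js",
    "https://cdn.example.com/hero.jpg",
    "https://video.example.net/embed.js",
]
china_load = [
    "https://www.example.com/",
    "https://www.example.com/app.js",
]

print(missing_resources(us_load, china_load))
```

In practice, URLs with cache-busting query strings would need normalising before comparison, which this sketch does not attempt.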
Going to https://www.chinafy.com/en/tools/resource-test we’re able to compare which resources were retrieved from the US, Beijing, Shanghai, and Guangzhou, and reconcile them against one another. At a high level, there are a few issues:
Looking at Trial 1 data, you can see that the images at the top (in green) take 20-40 seconds to load. They aren’t particularly large, but this is emblematic of a distant, slow, or non-existent content delivery network (CDN). Doing a DNS / IP lookup, we can see that Harvard is likely using Google’s CDN. Google’s CDN, while vast and not blocked in China, performs comparably to most foreign CDNs - which is to say, slowly.
When we look at CDN performance in China (care of Cedexis, now Citrix), we see that US networks perform poorly there. In that case, one needs to change or replace the CDN, or set up a multi-CDN configuration, which makes DevOps and infrastructure a significantly more complex undertaking.
The other issue plaguing this site is that files aren’t loading. Files typically load unsuccessfully for three reasons:
They’re explicitly blocked or unavailable
They timed out - that is, the browser attempted to contact the server, and tried to load the file but the signal/response took too long
Indirect: The initiator (i.e. the file that triggers the loading of the file in question) didn’t load, so the subsequent or dependent files could not be loaded.
In this case, Harvard.edu is affected by all three of these issues. Because the sequence, or evolution, of every page load is somewhat path-dependent, we see a lot of variance: pages load (almost) fully sometimes and not at all at others.
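To make the three failure modes concrete, here is a minimal sketch that labels each resource by walking up its initiator chain. Everything in it is hypothetical: the `Resource` record, the outcome labels ("ok", "blocked", "timeout", "missing"), and the file names are assumptions for illustration, not the output of a real test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    url: str
    outcome: str                     # "ok", "blocked", "timeout", or "missing" (never requested)
    initiator: Optional[str] = None  # URL of the file that triggers this one


def failure_reason(url, by_url):
    """Return "ok", "blocked", "timeout", "indirect", or "unknown" for one resource."""
    resource = by_url[url]
    if resource.outcome != "missing":
        return resource.outcome  # loaded, or failed directly
    # The file was never requested: look for a failed ancestor in the initiator chain.
    seen = set()
    parent = resource.initiator
    while parent is not None and parent not in seen:
        seen.add(parent)
        ancestor = by_url.get(parent)
        if ancestor is None:
            break
        if ancestor.outcome != "ok":
            return "indirect"
        parent = ancestor.initiator
    return "unknown"


# Hypothetical page: the HTML loads, one script times out, and its child is never requested.
resources = [
    Resource("index.html", "ok"),
    Resource("app.js", "timeout", initiator="index.html"),
    Resource("widget.js", "missing", initiator="app.js"),
    Resource("font.woff2", "blocked", initiator="index.html"),
]
by_url = {r.url: r for r in resources}
reasons = {r.url: failure_reason(r.url, by_url) for r in resources}
print(reasons)
```

Because one timed-out initiator can take its whole subtree with it, two loads that differ by a single early timeout can end up looking very different.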
When we look at Trial 3, we see that it takes about 38 seconds simply to load the primary HTML file. (Yes - it's quite small to see, you'll have to trust me on this!)
This is broken down as:
~15 seconds to establish a connection (i.e. TCP handshake),
~14 seconds to validate the SSL cert,
5 seconds of waiting, and
4 seconds to download 18KB of data - that’s an incredibly slow, albeit common, 4.5KB/s throughput
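The breakdown above can be checked with a few lines of arithmetic. The phase durations and payload size are the approximate figures quoted in the list; the variable names are only for illustration.

```python
# Approximate phase durations (seconds) for the primary HTML file in Trial 3,
# as quoted in the breakdown above.
phases_s = {
    "connect (TCP handshake)": 15,
    "SSL validation": 14,
    "waiting": 5,
    "download": 4,
}
payload_kb = 18  # size of the HTML file in KB

total_s = sum(phases_s.values())
throughput_kb_per_s = payload_kb / phases_s["download"]
overhead_share = (total_s - phases_s["download"]) / total_s

print(f"total: {total_s} s")                          # total: 38 s
print(f"throughput: {throughput_kb_per_s} KB/s")      # throughput: 4.5 KB/s
print(f"time before download: {overhead_share:.0%}")  # time before download: 89%
```

Roughly nine tenths of the time is spent before a single byte of the page body arrives.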
Those 38 seconds are what it takes to establish a connection and load one file from one domain. Harvard has third-party resources on 18 separate domains:
Given the difficulties in establishing, let alone maintaining, a stable connection in China, it’s critical that the number of domains is reduced. Unless dynamic information is loaded from these sites, static assets such as JS files, fonts, and others should be aggregated and loaded from a single domain.
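As a starting point for that kind of clean-up, one could count how many distinct third-party hosts a page touches. The sketch below assumes you already have a list of resource URLs; the URLs shown are placeholders, not Harvard's actual third-party domains, and `third_party_hosts` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def third_party_hosts(urls, first_party="harvard.edu"):
    """Return the sorted set of hostnames that are not the first-party domain or its subdomains."""
    hosts = {urlsplit(u).hostname for u in urls}
    hosts.discard(None)  # relative or malformed URLs have no hostname
    return sorted(
        h for h in hosts
        if h != first_party and not h.endswith("." + first_party)
    )


# Placeholder URLs for illustration only.
resource_urls = [
    "https://www.harvard.edu/",
    "https://www.harvard.edu/assets/site.css",
    "https://fonts.example.net/font.woff2",
    "https://cdn.example.com/lib.js",
    "https://cdn.example.com/lib.css",
]

external = third_party_hosts(resource_urls)
print(len(external), external)
```

Each host in that list costs its own connection setup, so every one that can be folded into the primary domain removes a potential multi-second delay.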
We’re really just scratching the surface with this analysis, and haven't even touched SEO - there’s far more involved in successfully Chinafying your site. Identifying the problems and applying fixes takes considerable money and time, and the result is complex to set up, optimise, and streamline.
In these uncertain and challenging times, when the world seems to be pulling apart, we're excited to draw the Global Internet and the Chinese Internet closer together. Chinafy is incredibly powerful, and yet super simple. In the time it took you to read this article, we could have already processed your site and turned it 'live'.
There are immense opportunities for foreign companies entering China. We think Chinafy is pretty awesome - whether you're in Marketing, an Engineer, or an Agency, we're pretty sure you will too.