Summary: Chinafy constantly scans your site to keep it updated with your freshest content. When updates are found, they’re typically ‘Chinafied’ and ready to go ‘live’ within seconds. There are also a few manual approaches you can take, although they’re normally not necessary.
This article discusses some of the approaches we take, as well as the different options you have to ensure the China version of your site is as fresh as can be. There are two aspects to keeping the site updated: i) Detection, and ii) Remediation.
Before we dive into the details, note that there is generally nothing needed on your end to stay updated: our system scans your site regularly, and you can trigger a manual update if you’d like.
The following explains, in slightly more technical terms, a few additional options you have, and along the way offers a bit of background on how the whole process works.
There are two layers to the infrastructure that require continually updating:
Layer 1: The CDN(s) - which ultimately deliver the files to end-users. CDNs typically employ a cache control methodology (see more below) to synchronize and stay updated with Chinafy.
Layer 2: Chinafy - which adapts & prepares the files for the CDN. Chinafy similarly employs a cache control methodology; however, there are also automated and manual approaches that work alongside it.
The first step to keeping content fresh is in detecting updates - generally, you can tell us when you’ve updated your site, or we’ll detect it automatically whenever it’s relevant.
There are five methods of detection:
Manual Scan + Entry
Extension of manual processes
The first scan of your site is what we call a ‘Manual Scan’: either we or you (by entering your website) trigger a scan of every page, whether that’s 100, 1,000, or however many pages you have. During this process we load each page, check it for compatibility issues, and then automatically perform the associated remediation where possible.
Within the Dashboard, go to Page Manager, where you can trigger a manual rescan of your entire site. The manual scan captures both new pages and updated pages.
The Manual Scan ensures we capture all pages, but on a large site it can take quite a while: depending on a number of factors, we may scan 100-500 pages per hour, so if your site has 20,000 pages, it may take days before the new content is actually found.
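To put the 20,000-page example into numbers, here is a small back-of-the-envelope calculator. It assumes a constant scan rate, which is a simplification (the real rate varies with the factors mentioned above); the function name is purely illustrative.

```python
def estimate_scan_hours(page_count: int, pages_per_hour: int) -> float:
    """Hours needed to scan page_count pages at a constant (assumed) rate."""
    if pages_per_hour <= 0:
        raise ValueError("pages_per_hour must be positive")
    return page_count / pages_per_hour


if __name__ == "__main__":
    pages = 20_000
    for rate in (500, 100):  # the fast and slow ends of the 100-500 pages/hour range
        hours = estimate_scan_hours(pages, rate)
        print(f"{rate} pages/hour: {hours:.0f} hours (~{hours / 24:.1f} days)")
```

At 500 pages per hour that works out to 40 hours (about 1.7 days); at 100 pages per hour it’s 200 hours (about 8.3 days).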
Did we miss a page? Sometimes a site will have pages that aren’t connected to any other pages. Take, for example, a one-off campaign that relies on a single page. Such ‘temporary’ or campaign-driven pages often aren’t linked from the rest of the site, so they’re missed during the detection process.
In the Page Manager, simply search for a specific page, and if it’s not found, you are then prompted to “Add and Scan this Page”. By doing this, we’ll then immediately find and scan the new page.
Within the Dashboard under Configuration > Advanced, we’ve provided a webhook that you can use. It can be added to your CMS (Content Management System) so that whenever you hit ‘Publish’, your CMS automatically pings us via this webhook and tells us to search for the new page(s). If your CMS supports webhooks, we suggest adding this on your end.
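As an illustration of what a CMS publish hook might do, here is a minimal sketch using Python’s standard library. The webhook URL is a hypothetical placeholder (copy the real one from Configuration > Advanced), and the POST method and JSON body shape (`{"urls": [...]}`) are assumptions for illustration only; check the Dashboard for what the webhook actually expects.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the webhook URL shown in
# Dashboard > Configuration > Advanced.
WEBHOOK_URL = "https://example.com/your-chinafy-webhook"


def build_publish_ping(page_urls: list[str]) -> urllib.request.Request:
    """Build a POST request announcing newly published pages.

    The JSON payload shape is an assumption, not a documented format.
    """
    body = json.dumps({"urls": page_urls}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def on_publish(page_urls: list[str]) -> int:
    """Call this from your CMS's 'Publish' hook; returns the HTTP status code."""
    request = build_publish_ping(page_urls)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping request construction separate from sending makes it easy to inspect (or log) exactly what your CMS will send before wiring `on_publish` into its publish event.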
Whenever you trigger a manual scan, we crawl as many pages as we can detect that are connected to that page, until every linked page has been accounted for. So even though you may only be looking to update a single page, we may also detect new pages not previously accounted for. Again, this happens occasionally where pages have i) never been accessed (at least not since joining Chinafy), or ii) been created without any internal links pointing to them.
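The link-following behavior described above can be sketched as a breadth-first crawl over a site’s internal links. This is an illustrative model, not Chinafy’s crawler: the breadth-first strategy and the example site map are assumptions. It shows both effects mentioned above: an unlinked campaign page is never reached from the homepage, while scanning that one page can lead to the rest of the site.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl: follow internal links until no unseen page remains."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # only queue pages not yet accounted for
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site map: /campaign links out, but nothing links to it.
SITE_LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/"],
    "/blog/post-1": ["/blog"],
    "/campaign": ["/"],
}

print(sorted(crawl("/", SITE_LINKS)))  # ['/', '/about', '/blog', '/blog/post-1']
print(len(crawl("/campaign", SITE_LINKS)))  # 5: scanning one page found every page
```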
The Cache Control approach is the most technical, but also the most commonly used. Every file typically comes with Cache Control settings that define the file’s expiry time. Caching refers to saving (or copying) a file so it can be served again, and Cache Control is the set of rules governing how long those saved copies are kept.
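For web files, these settings usually arrive in the standard HTTP `Cache-Control` response header, e.g. `public, max-age=86400` (keep for 1 day). The sketch below reads the lifetime a shared cache such as a CDN would use; per the HTTP caching rules, `s-maxage` overrides `max-age` for shared caches. It deliberately ignores other directives (such as `no-store`), and the function name is illustrative.

```python
from typing import Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Seconds a shared cache (such as a CDN) may keep a response.

    s-maxage takes precedence over max-age for shared caches.
    Returns None when neither directive is present.
    """
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    for key in ("s-maxage", "max-age"):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return None


print(shared_cache_ttl("public, max-age=86400"))             # 86400 (1 day)
print(shared_cache_ttl("public, max-age=60, s-maxage=300"))  # 300
print(shared_cache_ttl("no-cache"))                          # None
```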
While CDNs follow (or ‘respect’) the Cache Control settings, when Chinafy discovers a file, it keeps a copy of the file regardless of when it expires. If future scans indicate that the page or file no longer exists, we then remove that file from our database. We do this to avoid re-remediating pages, which may involve considerable processing (e.g. a page with 20 videos).
When a CDN caches a file, it does so for as long as the file tells it to. The typical behavior is that if the file tells the CDN to cache it for 1 day, the CDN then stores the file for 1 day. After that time has passed, the file is expired (and deleted), so that when the ‘next’ person looks for the file, it won’t exist. In this case, the CDN will look to Chinafy for that file, and if Chinafy doesn’t have it, it looks to you - the Origin server (or similarly, your current hosting, or ‘global site’) for that file.
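The lookup chain described above (CDN, then Chinafy, then your origin server) can be modeled with a small in-memory class. This is a simplified sketch: the class and method names are hypothetical, the “remediation” is a placeholder string transform, and it omits Chinafy’s own update checks against your server. Following the behavior described earlier, the CDN copy expires after its TTL, while the Chinafy copy is kept.

```python
import time
from typing import Callable, Dict, Tuple


class TieredCache:
    """Illustrative model of the lookup chain: CDN -> Chinafy -> origin."""

    def __init__(self, origin: Dict[str, str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.cdn: Dict[str, Tuple[str, float]] = {}  # path -> (content, expires_at)
        self.chinafy: Dict[str, str] = {}  # path -> processed copy, kept after CDN expiry

    def get(self, path: str) -> Tuple[str, str]:
        """Return (content, name of the tier that served it)."""
        now = self.clock()
        cached = self.cdn.get(path)
        if cached is not None and now < cached[1]:
            return cached[0], "cdn"  # fast path: fresh copy on the CDN
        tier = "chinafy"
        if path not in self.chinafy:
            # Slow path: fetch from the origin and "remediate" (placeholder transform).
            self.chinafy[path] = f"processed({self.origin[path]})"
            tier = "origin"
        content = self.chinafy[path]
        self.cdn[path] = (content, now + self.ttl)  # CDN keeps it for one TTL
        return content, tier


fake_time = [0.0]
site = TieredCache({"/thispage.html": "<html>v1</html>"}, ttl_seconds=86400,
                   clock=lambda: fake_time[0])
print(site.get("/thispage.html")[1])  # origin (first request)
fake_time[0] = 3600
print(site.get("/thispage.html")[1])  # cdn (within 1 day)
fake_time[0] = 90000
print(site.get("/thispage.html")[1])  # chinafy (CDN copy expired)
```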
In this sense, whenever a page (or file) is requested, this process ensures the end-user gets the most up-to-date version of that file. As a result:
1st Time: The first time a user in China tries to access a page, it may be slow, as the CDN doesn’t have a copy, and Chinafy may not be aware of the page or may not have processed it yet. By ‘slow’ we mean around 2-3 seconds, unless significant processing or remediation is needed on the Chinafy side.
2nd, 3rd, 4th, etc. Time: On each subsequent request (so long as the file hasn’t expired), the file will already have been processed and will be ready for speedy delivery on the CDN.
If the file or page has a 5 min Expiry Time, and it’s accessed every 6 minutes, then each user will get a new or fresh copy of the page, but the page will load slowly each time.
If the file or page has a 5 min Expiry Time, and it’s accessed every minute, then it’s slow for the first person, and fast for the next 4 people.
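To see how expiry time and request frequency interact, here is a small simulation of requests for a single cached file. It assumes the cached copy’s lifetime starts when it is fetched and is not extended by later hits (the usual CDN behavior), and that requests arrive at perfectly regular intervals.

```python
def simulate(expiry_s: float, interval_s: float, requests: int) -> list[str]:
    """Label each request 'slow' (cache miss) or 'fast' (fresh cached copy).

    Assumes the copy's lifetime starts at fetch time and is not
    extended by later hits.
    """
    results = []
    expires_at = float("-inf")  # nothing cached yet
    for i in range(requests):
        now = i * interval_s
        if now < expires_at:
            results.append("fast")
        else:
            results.append("slow")
            expires_at = now + expiry_s  # fresh copy cached from this moment
    return results


print(simulate(expiry_s=300, interval_s=360, requests=3))
# ['slow', 'slow', 'slow']
print(simulate(expiry_s=300, interval_s=60, requests=6))
# ['slow', 'fast', 'fast', 'fast', 'fast', 'slow']
```

With a 5 min expiry and a request every 6 minutes, every request misses; with a request every minute, one slow request is followed by four fast ones before the copy expires again.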
This is the natural process around how CDNs stay updated, and Chinafy employs the same approach with your server.
We could pre-fetch every single page whenever it expires so that the freshest version is always ready, but this uses a lot of bandwidth and is a more expensive process. If you require this, please reach out to us.
When we discover pages, we check first to see if they’ve been updated vis-a-vis the versions we have on hand. If there are no updates - then nothing happens. If there are updates, then we scan, and process the page accordingly.
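The article doesn’t specify how updates are detected, so the following is only one plausible approach, sketched under that assumption: fingerprint each page’s content with SHA-256 and reprocess only when the fingerprint changes. (HTTP validators such as `ETag` or `Last-Modified` are a common alternative.) The function names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 digest used as a cheap 'has this changed?' fingerprint."""
    return hashlib.sha256(content).hexdigest()


def needs_reprocessing(path: str, fetched: bytes, known: dict[str, str]) -> bool:
    """True if the page is new or differs from the stored version."""
    new_fp = fingerprint(fetched)
    if known.get(path) == new_fp:
        return False  # no updates: nothing happens
    known[path] = new_fp  # record the new version, then scan and process it
    return True


versions: dict[str, str] = {}
print(needs_reprocessing("/thispage.html", b"<p>v1</p>", versions))  # True (new page)
print(needs_reprocessing("/thispage.html", b"<p>v1</p>", versions))  # False (unchanged)
print(needs_reprocessing("/thispage.html", b"<p>v2</p>", versions))  # True (updated)
```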
If someone in China requests www.yoursite.com/thispage.html and it’s a new page, it may take a few seconds to find the new page on your server and perform the various remediations on our side. Where a page has video, it may take 15-30 minutes for the videos to be processed.
Want to ensure a page is cached, and ready to go on our China CDN? Simply go to the Visual Speed Test, and watch it load in China. When this happens, both the CDN, and our servers access the page, and perform the associated remediations.
Not sure which approach is best for you? As mentioned at the outset, you can use our dashboard to explicitly tell us which pages are updated, and even how to remediate them; otherwise, sit back, relax and we’ll take care of everything automatically.
Here’s a handy dandy graphic to keep track: