This is a guest post by Igor Krestov and Dan Taylor. Igor is a lead software developer at SALT.agency, and Dan is a lead technical SEO consultant who has also been credited with coining the term “edge SEO”. SALT.agency is a technical SEO agency with offices in London, Leeds, and Boston, offering bespoke consultancy to brands around the world. You can reach them both via Twitter.
In this post we highlight the potential applications of Cloudflare Workers in relation to search engine optimization, more commonly referred to as ‘SEO’, drawing on our research and testing over the past year while building Sloth.
This post is aimed both at readers who are proficient in writing performant JavaScript and at complete newcomers and less technical stakeholders who have not written many lines of code before.
Endless practical applications to overcome obstacles
Working with various clients and projects over the years, we have continuously encountered the same problems and obstacles in getting their websites to a point of “technical SEO excellence”. Many of these problems stem from platform restrictions at an enterprise level, legacy tech stacks, incorrect builds, and years of patching together different services and infrastructures.
As a team of technical SEO consultants, we are often left frustrated by these obstacles, which frequently mean that essential fixes and implementations are either impossible or delayed for months (if not years) at a time, and during that time the business is often losing traffic and revenue.
Workers offers us a Hail Mary solution to many common frustrations in getting technical SEO implemented, and we believe that in the long run it can become an integral part of overcoming legacy issues, reducing DevOps costs and speeding up lead times, all while running on a globally distributed serverless platform with blazing fast cold start times.
Creating accessibility at scale
When we first started, we needed to implement simple redirects, which should be easy to create on the majority of platforms but were not supported in this instance.
When the second obstacle arose, we needed to inject Hreflang tags, cross-linking an old multilingual website on a bespoke platform built to an outdated spec. This required experiments to find an efficient way of implementing the tags without increasing latency or adding new code to the server, in a manner befitting search engine crawling.
By this point we had a number of other applications for Workers, with an emerging need for non-developers to be able to modify and deploy new Worker code. This has since become the idea of Worker code generation, via a web UI or the command line.
Having established a number of different use cases for Workers, we identified 3 processing phases:
- Incoming request modification: changing the origin request URL or adding authorization headers.
- Outgoing response modification: adding security headers, Hreflang header injection, logging.
- Response body modification: injecting or changing content, e.g. canonicals, robots and JSON-LD.
We wanted to generate lean Worker code that would keep each piece of functionality contained and independent of the others, so we went with the idea of filter chains, which can be used to compose fairly complex request processing.
A key accessibility problem we identified from a non-technical perspective was the goal of making this serverless technology accessible to everyone in SEO, because with understanding comes buy-in from stakeholders. In order to do this, we had to make Workers:
- Accessible to users who don’t understand how to write JavaScript / performant JavaScript
- Deployable through a process that can complement existing deployment processes
- Deployable through a process that is secure (internally and externally)
- Deployable through a process that is compliant with data and privacy policies
- Verifiable through existing processes and practices (BAU)
Before we dive into the actual filters, here are partial TypeScript interfaces that illustrate the filter APIs:
```typescript
interface FilterExecutor<Type, Context, ReturnType> {
  apply(filterChain: { next: (c: Context, obj: Type) => ReturnType }, context: Context, obj: Type): ReturnType;
}
interface RequestFilterContext {
  waitUntil(promise: Promise<any>): void;
  // Add additional response filter.
  appendResponseFilter(filter: ResponseFilter): void;
  // Add body filter.
  appendBodyFilter(filter: BodyFilter): void;
}
interface RequestFilter extends FilterExecutor<Request, RequestFilterContext, Promise<Response>> {}
interface ResponseFilterContext {
  waitUntil(promise: Promise<any>): void;
  appendBodyFilter(filter: BodyFilter): void;
}
interface ResponseFilter extends FilterExecutor<Response, ResponseFilterContext, Response> {}
interface BodyFilterContext {}
interface ChunkChain {
  next: ChunkChain | null;
  chunk: Uint8Array;
}
interface BodyFilter extends MutableFilterExecutor<ChunkChain, BodyFilterContext, ChunkChain> {}
```
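To make the filter-chain idea concrete, here is a minimal sketch of how `next` can be threaded through a list of filters. It reuses the `FilterExecutor` shape from the interfaces above; the `runChain` helper, the string "requests" and the `Ctx` type are hypothetical stand-ins for Sloth's real request, response and context objects.

```typescript
// Same shape as the FilterExecutor interface: each filter receives the rest of
// the chain and decides whether (and with what) to call `next`.
interface FilterExecutor<Type, Context, ReturnType> {
  apply(filterChain: { next: (c: Context, obj: Type) => ReturnType }, context: Context, obj: Type): ReturnType;
}

// Hypothetical helper: run `filters` in order, ending at `terminal` (e.g. the origin fetch).
function runChain<T, C, R>(filters: FilterExecutor<T, C, R>[], terminal: (c: C, obj: T) => R, context: C, obj: T): R {
  function step(index: number): (c: C, o: T) => R {
    return (c, o) =>
      index < filters.length ? filters[index].apply({ next: step(index + 1) }, c, o) : terminal(c, o);
  }
  return step(0)(context, obj);
}

// Usage with string "requests" and a context that records which filters ran.
type Ctx = { ran: string[] };
const redirectOld: FilterExecutor<string, Ctx, string> = {
  apply(chain, ctx, url) {
    ctx.ran.push("redirect");
    // Short-circuit: redirected URLs never reach later filters or the origin.
    return url.startsWith("/old") ? "301 -> /new" : chain.next(ctx, url);
  },
};
const addToken: FilterExecutor<string, Ctx, string> = {
  apply(chain, ctx, url) {
    ctx.ran.push("token");
    return chain.next(ctx, url + "?token=abc");
  },
};
const origin = (_ctx: Ctx, url: string): string => "origin " + url;
```

Because each filter alone decides whether to call `next`, a redirect filter can answer early and skip everything downstream, while independent filters such as header injection stay unaware of each other.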
Request filter – Simple Redirects
First of all, we want to point out that this is a very niche use case. If your platform supports redirects natively, you should absolutely use that, but there are a number of limited, legacy or bespoke platforms where redirects are not supported, are restricted, or are charged for (per line) by your web host or platform. For example, GitHub Pages only supports redirects through the HTML refresh meta tag.
The most basic redirect filter would look like this:
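```javascript
class RedirectRequestFilter {
  constructor(redirects) {
    this.redirects = redirects; // map of source URL -> target URL
  }
  async apply(filterChain, context, request) {
    const target = this.redirects[request.url];
    if (target) return Response.redirect(target, 301); // matched: answer at the edge
    return filterChain.next(context, request); // no match: continue down the chain
  }
}
const requestFilterHandle = self.slothRequire('./worker.js');
requestFilterHandle.append(new RedirectRequestFilter({ 'https://sloth.cloud/old-page': 'https://sloth.cloud/' })); // example mapping
```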
You can see it live in Cloudflare’s playground.
The version implemented in Sloth supports basic path matching, hostname matching and query string matching, as well as wildcards.
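Sloth's own matcher is not shown in this post. As an illustration only, one simple way to support wildcards across hostname, path and query string is to compile each rule once into an anchored regular expression and test the full URL against it. The `RedirectRule` format and the `compileRules` helper are hypothetical.

```typescript
// Hypothetical rule format: `from` may contain '*' wildcards; `to` is the target URL.
interface RedirectRule {
  from: string;
  to: string;
}

// Escape regex metacharacters (except '*'), then let each '*' match any run of characters.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Compile once, then return the target of the first rule matching the full URL
// (scheme, hostname, path and query string), or null when nothing matches.
function compileRules(rules: RedirectRule[]): (url: string) => string | null {
  const compiled = rules.map((r) => ({ re: wildcardToRegExp(r.from), to: r.to }));
  return (url) => {
    for (const c of compiled) {
      if (c.re.test(url)) return c.to;
    }
    return null;
  };
}

const match = compileRules([
  { from: "https://example.com/blog/*", to: "https://example.com/news/" },
  { from: "https://*.example.com/old?ref=*", to: "https://example.com/" },
]);
```

In this sketch `*` matches any characters, including `/` and `?`, so a host wildcard could also match unexpected URLs; a stricter matcher would confine each wildcard to a single URL component.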
This is all well and good when you don’t have many redirects to manage, but what do you do when the redirects start to take up a significant share of the memory available to a Worker? This is where we faced another scaling problem: going from a small handful of possible redirects to tens of thousands.
Managing Redirects with Workers KV and Cuckoo Filters
Here is one way you can solve it: using Workers KV, a key-value data store.
Instead of hard-coding redirects inside the Worker code, we store them in Workers KV. The naive approach would be to look up a redirect for every URL. However, Workers KV does not reach maximum performance until a key is being read on the order of once per second in any given data center.
A solution could be to use a probabilistic data structure, such as a Cuckoo filter, stored in KV and possibly split between several keys, as a KV value is limited to 64 KB. The flow with such a filter would be:
- Retrieve the frequently read filter key.
- Check whether the full URL (or only the pathname) is in the filter.
- Get the redirect from Workers KV, using the URL as the key.
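The three steps above can be sketched as a single lookup function. This is a minimal sketch under stated assumptions: `KVLike` mirrors only the `get` method of a Workers KV namespace, `MembershipFilter` stands in for the Cuckoo filter, and the `FILTER_KEY` name and the JSON "filter" in the usage lines are hypothetical.

```typescript
// Hypothetical minimal stand-ins for a Workers KV namespace and a probabilistic
// membership filter such as a Cuckoo filter.
interface KVLike {
  get(key: string): Promise<string | null>;
}
interface MembershipFilter {
  mayContain(key: string): boolean;
}

// Hypothetical key under which the serialized filter is stored.
const FILTER_KEY = "__redirect_filter";

async function lookupRedirect(
  url: string,
  kv: KVLike,
  parseFilter: (raw: string) => MembershipFilter
): Promise<string | null> {
  // 1. Read the filter key; every request reads it, so it is read often enough to stay fast.
  const raw = await kv.get(FILTER_KEY);
  if (raw === null) return null;
  // 2. Check the URL against the filter; a miss means there is definitely no redirect.
  if (!parseFilter(raw).mayContain(url)) return null;
  // 3. Probable hit: fetch the target using the URL as the key.
  //    A false positive simply comes back as null here.
  return kv.get(url);
}

// Usage with in-memory fakes; a Set plays the role of the filter.
const store = new Map<string, string>([
  [FILTER_KEY, JSON.stringify(["https://example.com/old", "https://example.com/false-positive"])],
  ["https://example.com/old", "https://example.com/new"],
]);
const kv: KVLike = { get: async (key) => store.get(key) ?? null };
const parseSet = (raw: string): MembershipFilter => {
  const urls = new Set<string>(JSON.parse(raw));
  return { mayContain: (key) => urls.has(key) };
};
```

Most URLs have no redirect, so they stop at step 2 without a per-URL KV read; a false positive costs one extra KV read that returns `null`.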
In our tests, we managed to pack 20 thousand redirects into a Cuckoo filter taking up 128 KB, split between 2 keys, and verified it against 100 thousand active URLs with a false-positive rate of 0.5-1%.
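For readers unfamiliar with Cuckoo filters, here is a minimal one. It is not the implementation used in Sloth: the 4-slot buckets, 8-bit fingerprints, FNV-1a hash with a MurmurHash3 finalizer, and kick limit are all our assumptions. Each key is reduced to a 1-byte fingerprint stored in one of two candidate buckets, and the alternate bucket is computed from the current bucket and the fingerprint alone (partial-key cuckoo hashing), which is what lets entries be relocated without keeping the original URLs.

```typescript
// 32-bit FNV-1a over UTF-16 code units, followed by the MurmurHash3 finalizer
// (fmix32) so that every output bit depends on every input bit.
function hash32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

const SLOTS_PER_BUCKET = 4;
const MAX_KICKS = 500;

class CuckooFilter {
  private readonly slots: Uint8Array; // 8-bit fingerprints; 0 marks an empty slot
  private readonly mask: number;

  constructor(bucketCount: number) {
    // A power-of-two bucket count keeps the XOR-based alternate bucket in range.
    if (bucketCount <= 0 || (bucketCount & (bucketCount - 1)) !== 0) {
      throw new Error("bucketCount must be a power of two");
    }
    this.slots = new Uint8Array(bucketCount * SLOTS_PER_BUCKET);
    this.mask = bucketCount - 1;
  }

  // The alternate bucket depends only on the current bucket and the fingerprint.
  private altBucket(bucket: number, fp: number): number {
    return (bucket ^ hash32("fp" + fp)) & this.mask;
  }

  private locate(key: string): { fp: number; b1: number; b2: number } {
    const h = hash32(key);
    const fp = (h >>> 24) || 1; // top byte as fingerprint, never 0
    const b1 = h & this.mask;
    return { fp, b1, b2: this.altBucket(b1, fp) };
  }

  private insertInto(bucket: number, fp: number): boolean {
    for (let s = bucket * SLOTS_PER_BUCKET; s < (bucket + 1) * SLOTS_PER_BUCKET; s++) {
      if (this.slots[s] === 0) { this.slots[s] = fp; return true; }
    }
    return false;
  }

  private bucketHas(bucket: number, fp: number): boolean {
    for (let s = bucket * SLOTS_PER_BUCKET; s < (bucket + 1) * SLOTS_PER_BUCKET; s++) {
      if (this.slots[s] === fp) return true;
    }
    return false;
  }

  // Returns false when the filter is too full; in this simplified version the last
  // displaced fingerprint is then lost, so size the filter generously.
  add(key: string): boolean {
    const loc = this.locate(key);
    let fp = loc.fp;
    if (this.insertInto(loc.b1, fp) || this.insertInto(loc.b2, fp)) return true;
    let bucket = loc.b2;
    for (let kick = 0; kick < MAX_KICKS; kick++) {
      // Evict a resident fingerprint and move it to its own alternate bucket.
      const slot = bucket * SLOTS_PER_BUCKET + (kick % SLOTS_PER_BUCKET);
      const evicted = this.slots[slot];
      this.slots[slot] = fp;
      fp = evicted;
      bucket = this.altBucket(bucket, fp);
      if (this.insertInto(bucket, fp)) return true;
    }
    return false;
  }

  // May return false positives, but no false negatives for successfully added keys.
  mayContain(key: string): boolean {
    const { fp, b1, b2 } = this.locate(key);
    return this.bucketHas(b1, fp) || this.bucketHas(b2, fp);
  }

  // Raw bytes: what you would serialize and, if too large, split across several KV keys.
  bytes(): Uint8Array {
    return this.slots;
  }
}
```

Fingerprint size is the main trade-off: more bits per fingerprint lower the false-positive rate but make the filter larger, which matters when it has to fit into a few KV values.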
Body filter – Hreflang Injection
Hreflang tags need to be placed inside the HTML <head> element, so before actually injecting them, we need to find either the start or the end of the head HTML tag, which is itself a streaming search problem.
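As a rough illustration of what gets injected, the sketch below builds one `<link rel="alternate" hreflang="...">` tag per language version and splices the tags in front of `</head>` in a fully buffered document. The `Alternates` map and all helper names are hypothetical, the match is case-sensitive, and buffering the whole body is exactly what a streaming Worker wants to avoid: in a stream, `</head>` can be split across two chunks.

```typescript
// Hypothetical input: hreflang value (language or language-region) -> URL of that version.
type Alternates = Record<string, string>;

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// One <link rel="alternate" hreflang="..." href="..."> tag per language version.
function hreflangTags(alternates: Alternates): string {
  return Object.keys(alternates)
    .map((lang) => `<link rel="alternate" hreflang="${escapeAttr(lang)}" href="${escapeAttr(alternates[lang])}">`)
    .join("");
}

// Naive linear scan for `needle` in `haystack`; returns the first index or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array): number {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Splice the tags in front of "</head>" (case-sensitive); unchanged if not found.
function injectBeforeHeadClose(html: Uint8Array, tags: string): Uint8Array {
  const encoder = new TextEncoder();
  const at = indexOfBytes(html, encoder.encode("</head>"));
  if (at < 0) return html;
  const insert = encoder.encode(tags);
  const out = new Uint8Array(html.length + insert.length);
  out.set(html.subarray(0, at), 0);
  out.set(insert, at);
  out.set(html.subarray(at), at + insert.length);
  return out;
}
```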
The caveat here is that the naive approach of decoding UTF-8 into a JavaScript string, performing the search, and re-encoding back into UTF-8 is fairly slow. Instead, we tried a pure JavaScript search on byte strings (Uint8Array), which quickly showed promising results.
For our use case, we chose the Boyer-Moore-Horspool algorithm as the basis of our streaming search, as it is simple, has great average-case performance and only requires pre-processing of the search pattern, with manual prefix/suffix matching at chunk boundaries.
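Below is a minimal TypeScript sketch of Boyer-Moore-Horspool over `Uint8Array`, with the bad-character shift table computed once per pattern and reused for every chunk. For chunk boundaries it does not implement the manual prefix/suffix matching described above; as a simpler substitute, it carries the last m - 1 bytes of each chunk (m being the pattern length) into the next search, at the cost of a small copy per chunk. The class names and the `ascii` helper are hypothetical.

```typescript
// Boyer-Moore-Horspool over bytes. The bad-character shift table is computed
// once in the constructor and reused for every search.
class BmhPattern {
  readonly pattern: Uint8Array;
  private readonly shift: Int32Array;

  constructor(pattern: Uint8Array) {
    if (pattern.length === 0) throw new Error("empty pattern");
    this.pattern = pattern;
    const m = pattern.length;
    this.shift = new Int32Array(256);
    for (let b = 0; b < 256; b++) this.shift[b] = m; // bytes absent from the pattern
    for (let i = 0; i < m - 1; i++) this.shift[pattern[i]] = m - 1 - i;
  }

  // Index of the first match in `text`, or -1.
  search(text: Uint8Array): number {
    const p = this.pattern;
    const m = p.length;
    let pos = 0;
    while (pos + m <= text.length) {
      let j = m - 1;
      while (j >= 0 && text[pos + j] === p[j]) j--; // compare right to left
      if (j < 0) return pos;
      pos += this.shift[text[pos + m - 1]]; // shift by the window's last byte
    }
    return -1;
  }
}

// Streaming wrapper: keeps the last m - 1 bytes of each chunk so that a match
// straddling a chunk boundary is still found.
class StreamingSearch {
  private readonly bmh: BmhPattern;
  private tail: Uint8Array = new Uint8Array(0);
  private consumed = 0; // stream offset of tail[0]

  constructor(bmh: BmhPattern) {
    this.bmh = bmh;
  }

  // Feed the next chunk; returns the stream offset of the first match, or -1 so far.
  push(chunk: Uint8Array): number {
    const buf = new Uint8Array(this.tail.length + chunk.length);
    buf.set(this.tail, 0);
    buf.set(chunk, this.tail.length);
    const at = this.bmh.search(buf);
    if (at >= 0) return this.consumed + at;
    const keep = Math.min(this.bmh.pattern.length - 1, buf.length);
    this.consumed += buf.length - keep;
    this.tail = buf.slice(buf.length - keep);
    return -1;
  }
}

// Hypothetical helper: ASCII string to bytes (enough for tag names).
function ascii(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff;
  return out;
}
```

For example, feeding `"<html><head><title>x</title></he"` and then `"ad><body>"` to a `StreamingSearch` for `"</head>"` finds the tag at stream offset 28, even though it is split across the two chunks.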
Here is a comparison of the approaches we tested on Node v10.15.0:
| Chunk size | Approach | Ops/sec |
| ------------ | -------------------------------------- | --------------------- |
| 1024 bytes | Boyer-Moore-Horspool over byte array | 163,086 |
| 1024 bytes | **precomputed BMH over byte array** | **424,948** |
| 1024 bytes | decode UTF-8 into strings & indexOf() | 91,685 |
| 2048 bytes | Boyer-Moore-Horspool over byte array | 119,634 |
| 2048 bytes | **precomputed BMH over byte array** | **232,192** |
| 2048 bytes | decode UTF-8 into strings & indexOf() | 52,787 |
| 4096 bytes | Boyer-Moore-Horspool over byte array | 78,729 |
| 4096 bytes | **precomputed BMH over byte array** | **117,010** |
| 4096 bytes | decode UTF-8 into strings & indexOf() | 25,835 |
Can we do better?
Having achieved a good performance improvement with a pure JavaScript search over the naive approach, we wanted to see whether we could do better. As Workers support WASM, we used Rust to build a simple WASM module that exposed the standard Rust string search.
| Chunk size | Approach | Ops/sec |
| ------------ | ------------------------------------- | --------------------- |
| 1024 bytes | Rust WASM | 348,197 |
| 1024 bytes | **precomputed BMH over byte array** | **424,948** |
| 2048 bytes | Rust WASM | 225,904 |
| 2048 bytes | **precomputed BMH over byte array** | **232,192** |
| 4096 bytes | **Rust WASM** | **129,144** |
| 4096 bytes | precomputed BMH over byte array | 117,010 |
As the Rust version did not use a precomputed search pattern, it should be significantly faster if we precomputed and cached the search patterns.
In our case, we were searching for a single pattern and stopping as soon as it was found, so the pure JavaScript version was fast enough; but if you need multi-pattern or more advanced search, WASM is the way to go.
We could not record a statistically significant change in latency between a standard Worker and one with a body filter deployed to production, due to unstable network latency, with a mean response latency of 150 ms and a 10% standard deviation at the 90th percentile.
What’s next?
We believe that Workers and serverless applications can open up new opportunities to overcome many of the problems faced by the SEO community when working with legacy tech stacks, platform limitations, and heavily congested development queues.
We are also investigating whether Workers can allow us to build a more efficient Tag Manager, which bundles and pushes only the matching Tags with their code, to minimize the number of external requests caused by trackers and thus reduce the load on the user’s browser.
You can experiment with Cloudflare Workers yourself through Sloth, even if you don’t know how to write JavaScript.