Building and automating a scaled out Fredhopper/SmartTarget environment

While designing and building a Fredhopper/SmartTarget enterprise environment recently, a couple of interesting requirements came up. The first, quite common these days when building infrastructures, was that every Fredhopper component needed to be automatically deployed, configured and run. The second hard requirement was that in a production environment under constant high load and distributed across multiple data centres, the Fredhopper Index servers need to be in sync and highly available in each data centre, with failover mechanisms causing the least amount of disruption. After a lot of headache, trialling, erring and creating a mountain of broken Fredhopper instances in the process, we finally managed to meet the requirements, and this post shows how.

If you have scrolled down already and think “TL;DR” or “enough talk, show me the code”, then jump straight to Creating a deployable SmartTarget/Fredhopper installation package. If you’re interested in the why of things, then please just carry on reading. :o)

In a nutshell, a Fredhopper installation consists of a few main logical components that we will touch on:

  • An Index instance: this is a file system based index server where data is stored
  • A Query instance: this is a separate process instance, which handles all querying into the Fredhopper Index
  • The Deployment Agent: this is the mother of all Fredhopper services. Through it, it is possible to manipulate all kinds of configuration settings as well as run processes and instances in your local Fredhopper installation.
  • The Business Manager: this is the Fredhopper administration interface.

When the SmartTarget component is installed into Fredhopper, essentially three things have happened:

  1. The possibility to push special Tridion data (SmartTarget Promotions) into the Fredhopper Index is enabled.
  2. The SmartTarget GUI in Tridion can now communicate with the SmartTarget web service extension on the Fredhopper Index server to create and configure Promotions.
  3. We can use the SmartTarget API to query the Fredhopper Index.

Altogether, the simplified logical architecture of a Fredhopper/SmartTarget stack then looks like this:

Simplified ST/FAS Logical Architecture

If this stack needs to be scaled out, the most common and fully supported way to do this is by separating the Query instance from the Index instance. This way, one or more Query instances on separate servers can be connected to a single Index instance, enabling traffic to be distributed across the Query instances. In most cases this solves handling heavy load where it matters most: on the front end. Architecturally, this scenario looks like this:

Normal scale out scenario

Things get a little more complicated when multiple data centres are involved and both data centres need to have Index servers serving data from the same Tridion source. The same is true when the requirement is that the Index server itself needs to be made highly available through clustering or load balancing. The main reason this is complicated for Index servers is that they are architected in such a way that it is not possible out of the box to share the actual index and its business configuration, in real time and in an automated fashion, across multiple active Index servers.

A further complication is the fact that the SmartTarget integration with Tridion only allows you to have one Index for one Publication Target; it's only possible to configure one Content Delivery Endpoint URL per Publication Target. In addition, the SmartTarget Deployer extension only allows you to publish promotions to one Index.

All this is perfectly hackable though; we have experimented with extending the SmartTarget deployer and web service code in order to store data in multiple indexes at the same time. This, however, is obviously not ideal: it's not supported and involves customization across multiple components. Even if this were used, there would still be no way to automatically synchronize Fredhopper's own business configuration settings, like setting Attributes live and so forth, across multiple indexes.

All of the above basically means that if you want to scale out Index servers for one logical Publication Target, there's no easy real-time way to update multiple Indexes simultaneously. Luckily, Fredhopper comes to the rescue here: it provides tools to capture the state of an index, including its business configuration, in a zip file, and to import that zip file into another Index instance. This means you can build a scenario where it's possible to replicate all required data from one source Index server to one or more slave Index servers, which subsequently can push newly received data out to their corresponding Query servers:

Scaling out the Index

The big caveat here is that if all Query servers are “active” in the load balancer, there will be a period of time where not all Query servers are in sync. However, given the fact that this is also the case in vanilla Fredhopper out-scaling scenarios, we took this as an acceptable limitation; even more so because using Fredhopper’s capture mechanism allows us to automatically synchronize the entire state of the source Index.

So, how did we build all this? We started out by creating a single Fredhopper deployment package as a zip file, with the SmartTarget integration already preconfigured in the zip. Next, we created a whole bunch of shell scripts which individually express what we needed for automated installs and replication. The following section shows this in detail.

Creating a deployable SmartTarget/Fredhopper installation package

Creating an installation package is straightforward. After downloading or otherwise obtaining the base Fredhopper deployment package, unzip it and then follow the setup guide located at the SDL LiveContent documentation site.

Before installing SmartTarget into Fredhopper, observe the following:

  1. Do not create a query instance. This is not needed, as we will create it on the fly.
  2. For the example scripts listed here, the initial Index instance is named proto-smarttarget-index. Use this name when following the SmartTarget installation procedure or adapt the scripts as needed.

When you arrive at step 11 of the SmartTarget installation procedure, which loads the metadata.xml into the index, ensure this step runs successfully. You can tell it has succeeded when the metadata.xml file has actually disappeared from its install location.
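A quick way to verify this from the shell (the location of metadata.xml below is an assumption; use wherever the installation procedure had you place it):

# METADATA_FILE points at the location where step 11 placed metadata.xml (assumption)
METADATA_FILE=/usr/share/fredhopper/proto-smarttarget-index/metadata.xml
if [ ! -f "$METADATA_FILE" ]; then
  echo "metadata.xml has been loaded into the index"
else
  echo "metadata.xml is still present - the load step did not complete" >&2
fi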

Next, start your Index instance to check whether it's running. If it is, shut everything down, including the Deployment Agent. Then remove all log files and the topology.txt file. Finally, simply zip up everything under the Fredhopper base directory and upload it to a central distribution point like Nexus or BitBucket. Your deployment package is now ready for use.
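A sketch of that packaging step, assuming the Fredhopper base directory is /usr/share/fredhopper and that the distribution point accepts HTTP uploads (for example a Nexus raw repository); the credentials and URL are placeholders:

# Zip up everything under the Fredhopper base directory
cd /usr/share/fredhopper
zip -r /tmp/fredhopper-st-base-1.0.0.zip .

# Upload the package to the central distribution point
curl -u deploy-user:deploy-password --upload-file /tmp/fredhopper-st-base-1.0.0.zip http://distribution-server.domain.net/fredhopper/fredhopper-st-base-1.0.0.zip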

Before jumping ahead to the deployment side of things, a note on the topology.txt file. In order to automatically deploy Fredhopper instances, or to fail over instances in case of malfunctioning Index servers, it is necessary to do either of two things:

  1. Explicitly create all topology.txt files for all your environments and store them separately from the deployment package. Copy the right topology.txt file into the Fredhopper config directory after extracting the deployment package (see the sketch after this list).
  2. Extend the install.sh and createqueryserver.sh scripts below to programmatically write a correct topology.txt for the current Fredhopper installation you are performing.
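A minimal sketch of the first option, assuming the per-environment topology files are kept on the same distribution server as the package and that topology.txt lives in the config directory under the Fredhopper root (both assumptions):

#!/bin/sh
# Hypothetical helper: fetch the topology.txt for the given environment after
# extracting the deployment package. Paths and file naming are assumptions.
ENVIRONMENT=$1                      # e.g. test, acceptance, production
FREDHOPPER_BASE_DIR=$2              # e.g. /usr/share/fredhopper

curl -o $FREDHOPPER_BASE_DIR/config/topology.txt http://distribution-server.domain.net/fredhopper/topology/topology-$ENVIRONMENT.txt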

Automating instance deployments

Now that we have a centrally stored installation package, we can use it to automatically deploy Fredhopper on any server. To do this, we created an install script (install.sh) that automates all the steps; a sketch of it is shown below, after the list of variables.

It does the following:

  1. Download zip file from distribution point
  2. Extract zip file
  3. Start the deployment agent
  4. Setup an Index instance

Before using the install script, first set two variables in the script:

  • BASE_PACKAGE_LOCATION: the HTTP location of the zip, e.g.: http://distribution-server.domain.net/fredhopper/fredhopper-st-base-1.0.0.zip
  • FREDHOPPER_ST_BASE_FILE: the name of the zip, e.g.: fredhopper-st-base-1.0.0.zip
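A minimal sketch of what such an install.sh could look like. The directory layout, the command used to start the Deployment Agent and the argument handling (base / index <name>) are assumptions and should be adapted to your own installation:

#!/bin/sh
# Sketch of an install script: download, extract, start the Deployment Agent
# and optionally set up an Index instance from the preconfigured proto instance.
#
# Usage (hypothetical):
#   ./install.sh base                  install the Fredhopper base package
#   ./install.sh index <name>          set up and start an Index instance
#   ./install.sh base index <name>     do both in one go

BASE_PACKAGE_LOCATION=http://distribution-server.domain.net/fredhopper/fredhopper-st-base-1.0.0.zip
FREDHOPPER_ST_BASE_FILE=fredhopper-st-base-1.0.0.zip
FREDHOPPER_BASE_DIR=/usr/share/fredhopper   # assumption: install target directory

install_base() {
  # 1. Download the deployment package from the distribution point
  curl -o /tmp/$FREDHOPPER_ST_BASE_FILE $BASE_PACKAGE_LOCATION
  # 2. Extract it into the Fredhopper base directory
  mkdir -p $FREDHOPPER_BASE_DIR
  unzip -o /tmp/$FREDHOPPER_ST_BASE_FILE -d $FREDHOPPER_BASE_DIR
  # Copy or generate the correct topology.txt for this environment here
  # 3. Start the Deployment Agent. The command below is a placeholder; use
  # whatever starts the agent in your Fredhopper version.
  $FREDHOPPER_BASE_DIR/bin/deployment-agent start
}

create_index() {
  # 4. Set up an Index instance from the preconfigured proto instance:
  # rename the proto instance and the base index directory in the Fredhopper
  # root to the desired instance name. Both paths below are assumptions;
  # adapt them to where these directories live in your installation.
  mv $FREDHOPPER_BASE_DIR/instances/proto-smarttarget-index $FREDHOPPER_BASE_DIR/instances/$1
  mv $FREDHOPPER_BASE_DIR/proto-smarttarget-index $FREDHOPPER_BASE_DIR/$1
  # Start the renamed instance (bin/instance is used the same way as in the sync script further down)
  $FREDHOPPER_BASE_DIR/bin/instance $1 start
}

while [ $# -gt 0 ]; do
  case $1 in
    base)  install_base ;;
    index) create_index $2; shift ;;
  esac
  shift
done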

The Fredhopper base can then be installed by using any or a combination of the following options:
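For example, using the hypothetical arguments from the sketch above:

# Install the Fredhopper base only and start the Deployment Agent
./install.sh base

# Install the base and set up an Index instance named live-index in one go
./install.sh base index live-index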

Once the base installation is complete and the Deployment Agent is running, it is possible to create actual instances in the base installation. This can be an Index instance, a Query instance, or both. After that, the final step of setting up replication between Index instances completes a fully scaled-out and functionally working environment.

Creating Index Servers

Since we already have a proto-smarttarget-index in our installation package, we can simply use that to:

  1. rename the proto instance to the desired instance name
  2. rename the base index directory in the Fredhopper root directory to the same desired instance name.
  3. start it up once renamed.

We use the same install.sh script to “create” our Index, with the following command:
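Using the hypothetical arguments from the install.sh sketch above, that could look like this:

# Rename proto-smarttarget-index to live-index and start it on an already
# installed Fredhopper base (the argument names are assumptions)
./install.sh index live-index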

Creating Query Servers

Creating Query instances is easier: we create one from scratch using the Fredhopper base installation package and point it to the Index instance. Depending on how you process the topology.txt file, you may need to add some more scripting to also update it automatically. To create query instances, we use the createqueryserver.sh script:
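A minimal sketch of what such a createqueryserver.sh could look like. The create-instance call is a placeholder (the exact Deployment Agent command for creating a Query instance depends on your Fredhopper version), and the topology update is left as a note since its format depends on your environment:

#!/bin/sh
# Sketch only: creates a Query instance on the local Fredhopper installation
# and points it at an Index server.
#
# Usage (hypothetical): ./createqueryserver.sh query-1 index-server-1.domain.net /usr/share/fredhopper

QUERY_INSTANCE_NAME=$1
INDEX_HOSTNAME=$2
FREDHOPPER_BASE_DIR=$3

# Ask the local Deployment Agent to create the Query instance
# (create-instance is a placeholder command, not the documented Fredhopper CLI)
$FREDHOPPER_BASE_DIR/bin/deployment-agent-client --location http://localhost:8177/ create-instance $QUERY_INSTANCE_NAME

# Point the new instance at the Index server, e.g. by writing or copying the
# correct topology.txt for this environment (path and format are assumptions)
echo "Update $FREDHOPPER_BASE_DIR/config/topology.txt so that $QUERY_INSTANCE_NAME uses $INDEX_HOSTNAME"

# Start the Query instance
$FREDHOPPER_BASE_DIR/bin/instance $QUERY_INSTANCE_NAME start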

Synchronization of Index servers

The synchronization script listed below takes care of replicating Index instance data to other indexes. To automate running of this script, the following options are available:

  • Create a cron job to run this script periodically. The interval depends mainly on how long it takes for the index to be synchronized, how often the index changes and how long it takes to push index data to the live Query instances (see the example crontab entry after this list).
  • Trigger the synchronization every time a SmartTarget promotion is (un-)published. It’s for instance possible to create a post-processing Deployer module to trigger the script. Be careful though: if the script is already running, it should not run again until the previous run has ended!
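A possible crontab entry for the cron option, using flock to prevent a new run from starting while the previous one is still going. The interval, lock file and script location are assumptions; the arguments follow the example in the sync script below:

# Run the synchronization every 15 minutes, skipping the run if one is still in progress
*/15 * * * * cd /usr/share/fredhopper && flock -n /tmp/fredhopper-sync.lock ./sync.sh index-server-1.domain.net index-server-2.domain.net index-1 index-2 /tmp/captures/latest-capture.zip /usr/share/fredhopper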

One caveat here is that a target Index instance has to be shut down in order to import data. This will not bring down your Query instances, though; they can keep running even when their Index server is down, so in normal situations they won't notice a thing.

Synchronizing Index instances is pretty straightforward:

#!/bin/sh
##
# Syncs index and business configuration. Run this script from the Fredhopper root!
#
# Command Line parameters:
# SOURCE_LOCATION: the hostname of the source index
# DESTINATION_LOCATION: the hostname of the target index
# SOURCE_INDEX_NAME: the name of the source index
# DESTINATION_INDEX_NAME: the name of the target index
# CAPTURE_FILE_NAME: the file name of the capture file
# FREDHOPPER_BASE_DIR: the base directory of the Fredhopper installation you are on
#
# example:
#
# ./sync.sh index-server-1.domain.net index-server-2.domain.net index-1 index-2 /tmp/captures/20151210-1100-capture.zip /usr/share/fredhopper

SOURCE_LOCATION=$1
DESTINATION_LOCATION=$2
SOURCE_INDEX_NAME=$3
DESTINATION_INDEX_NAME=$4
CAPTURE_FILE_NAME=$5
FREDHOPPER_BASE_DIR=$6

echo "Start exporting"
$FREDHOPPER_BASE_DIR/bin/deployment-agent-client --location http://$SOURCE_LOCATION:8177/ -O $CAPTURE_FILE_NAME export-capture -c -i -x $SOURCE_INDEX_NAME
echo "Finished exporting"
echo "Stopping index: $DESTINATION_INDEX_NAME"
$FREDHOPPER_BASE_DIR/bin/instance $DESTINATION_INDEX_NAME stop
echo "Start importing"
$FREDHOPPER_BASE_DIR/bin/deployment-agent-client --location http://$DESTINATION_LOCATION:8177/ import-capture $DESTINATION_INDEX_NAME $CAPTURE_FILE_NAME
echo "Finished importing"
echo "Starting stuff"
$FREDHOPPER_BASE_DIR/bin/instance $DESTINATION_INDEX_NAME start
echo "Finished starting stuff. If required, perform a fresh-index-to-live to get the new data to the query servers"

Wrap up

If you have made it this far, then you have definitely earned a handful of cookies. I hope this post has shown you ways to take the path to enterprise automation of the SmartTarget/Fredhopper stack. When we started this, we weren't sure whether automating deployments and scaling out Fredhopper environments was even possible in the way we needed it. Fortunately, it turns out that it very much is.

One thought on “Building and automating a scaled out Fredhopper/SmartTarget environment”

  1. Very cool stuff.

    For most customers, the Fredhopper syncserver and syncclient(s) should be your first stop to outscaling. They are made to synchronize indexes and configuration between various instances. And SmartTarget will by default automatically trigger the synchronization whenever you save a Promotion or Experiment.
