Hengshi Sense Engine Upgrade

Compared with the 2.x version, Hengshi Sense 3.0 upgrades the built-in engine from GP5 to GP6.

  • Because a major-version engine upgrade requires manual steps, the default upgrade installation of Hengshi Sense 3.0+ keeps the old version's GP5 engine. To upgrade to GP6, follow the manual process below.
  • A fresh installation of Hengshi Sense 3.0 will directly install the new GP6 query engine.
  • During the upgrade, every GP machine (the Master and all Segments) must have free space equal to twice the current data size of its GP node: one copy for the backup and one for the new cluster's storage. Check a node's current data usage with "du /opt/hengshi/engine-cluster/data -sch".
  • The number of Segments after the upgrade must be the same as the number of Segment nodes before the upgrade.
  • Since backup and restoration depend on ssh, it's best to configure passwordless logins.
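
One way to confirm passwordless login before starting is to attempt a non-interactive ssh to every GP host from the Master. The following is a minimal sketch, assuming the host names are kept one per line in a file that you maintain yourself; the function name check_passwordless_ssh and the file name gp_hosts.txt are illustrative and not part of Hengshi Sense.

```shell
#!/usr/bin/env bash
# Sketch: report GP hosts that cannot be reached without a password.
# Assumption: the host list file (one host name per line) is maintained by you.

check_passwordless_ssh() {
    local hosts_file="$1"
    local failed=0
    local host
    while IFS= read -r host; do
        # skip blank lines
        [ -z "${host}" ] && continue
        # BatchMode=yes makes ssh fail instead of prompting for a password;
        # -n keeps ssh from consuming the rest of the host list on stdin
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "${host}" true 2>/dev/null; then
            echo "OK   ${host}"
        else
            echo "FAIL ${host}"
            failed=1
        fi
    done < "${hosts_file}"
    return "${failed}"
}

# Usage (hypothetical file name):
#   check_passwordless_ssh gp_hosts.txt
```

Any host reported as FAIL would still prompt for a password (or is unreachable) and is worth fixing before gpbackup and gprestore are run.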

Main New Features of GP6

  • Improved query performance (about 20% for aggregate calculations)
  • Online expansion without stopping the system
  • Support for jsonb data type

Disk Preparation

The commands below assume that the installation directory is /opt/hengshi.

  • Check the current size of GP data
HENGSHI_HOME=/opt/hengshi
du ${HENGSHI_HOME}/engine-cluster -sch

The migration needs additional free space equal to twice the data size reported above: one copy for the exported data and one for the data imported into GP6.
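
The space requirement can be checked with a few lines of shell. This is a rough sketch that assumes the backup directory and the new cluster both live on the same filesystem as ${HENGSHI_HOME}; the function names are illustrative, and the check has to be repeated on every GP machine.

```shell
#!/usr/bin/env bash
# Sketch: compare free space with twice the current GP data size.
HENGSHI_HOME="${HENGSHI_HOME:-/opt/hengshi}"

# Print the size of a directory in KB.
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Print the free space (KB) of the filesystem that holds a path.
free_space_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeed when free space covers twice the data size (export + import).
has_room_for_upgrade() {
    local data_kb="$1"
    local free_kb="$2"
    [ "${free_kb}" -ge $((data_kb * 2)) ]
}

# Usage on each GP machine:
#   data_kb=$(dir_size_kb "${HENGSHI_HOME}/engine-cluster")
#   free_kb=$(free_space_kb "${HENGSHI_HOME}")
#   has_room_for_upgrade "${data_kb}" "${free_kb}" \
#       && echo "enough space" || echo "not enough space"
```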

GP5 to GP6 Upgrade Process

The following operations are performed on the GP Master machine. During the gpbackup phase, all Segments write their data in parallel into the local directory given by the --backup-dir parameter on their own machines; likewise, during gprestore each Segment loads the backup data files from its own machine.

1 Stop all services and only start the engine

HENGSHI_HOME=/opt/hengshi
${HENGSHI_HOME}/bin/hengshi-sense-bin stop all
${HENGSHI_HOME}/bin/hengshi-sense-bin start engine

2 Prepare the hengshi-sense installation package and unpack it into the hengshi-[version] directory; see the directory preparation process
3 Prepare migration tools

cd hengshi-[version]
wget http://download.hengshi.io/3rd/pivotal_greenplum_backup_restore-1.15.0-1.tar.gz

4 Update the engine-related configuration

HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
test -f conf/hengshi-sense-env.sh || cp conf/hengshi-sense-env.sh.sample conf/hengshi-sense-env.sh
set_kv_config() {
    local config_file="$1"
    local param="$2"
    local val="$3"
    # append param=val only when param is not set yet; an existing value is left unchanged
    grep -E "^\s*${param}\s*=" "${config_file}" > /dev/null \
                || sed -i "$ a ${param}=${val}" "${config_file}"
}
set_kv_config conf/hengshi-sense-env.sh HS_PG_DB postgres
set_kv_config conf/hengshi-sense-env.sh HS_PG_USR postgres
set_kv_config conf/hengshi-sense-env.sh HS_PG_PWD postgres
set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_DB postgres
set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_USR postgres
set_kv_config conf/hengshi-sense-env.sh HS_ENGINE_PWD postgres

5 Export the GP5 data. Choose a directory for the exported data that has free space at least equal to the current data size, for example ${HENGSHI_HOME}/gpbackup

export HENGSHI_HOME=/opt/hengshi
cd hengshi-[version]
tar -xf pivotal_greenplum_backup_restore-1.15.0-1.tar.gz -C ${HENGSHI_HOME}/lib/gpdb/gpdb/ #must execute, unzip to the current GP5 symlink directory
bash #launch a new bash
source ${HENGSHI_HOME}/engine-cluster/export-cluster.sh
psql postgres -c "drop function if exists public.safe_to_number(text)"
# backup
gpbackup --dbname postgres --backup-dir ${HENGSHI_HOME}/gpbackup --compression-level 9
exit #exit new bash

Notes:

  • If the database is not named postgres, pass the actual database name to --dbname.
  • If there are multiple databases, back up each one separately and give each its own --backup-dir.
  • --compression-level accepts values from 1 to 9; a higher value gives a higher compression ratio but takes longer. In our own tests at level 6, backing up 100G took about 1 hour and produced a backup of nearly 30G (for reference only).
  • For other gpbackup parameters, refer to the gpbackup documentation.
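
Backing up several databases into separate directories can be sketched as a loop. The directory naming scheme (gpbackup-<database>) and the function names below are assumptions made for illustration; only the --dbname, --backup-dir and --compression-level options come from the command shown above.

```shell
#!/usr/bin/env bash
# Sketch: back up several databases, each into its own --backup-dir.
HENGSHI_HOME="${HENGSHI_HOME:-/opt/hengshi}"

# Print the backup directory used for one database (naming is an assumption).
backup_dir_for() {
    echo "${HENGSHI_HOME}/gpbackup-$1"
}

# Run gpbackup once per database name; the first argument is the compression level.
backup_databases() {
    local level="$1"
    shift
    local db
    for db in "$@"; do
        gpbackup --dbname "${db}" \
                 --backup-dir "$(backup_dir_for "${db}")" \
                 --compression-level "${level}" || return 1
    done
}

# Usage, inside the shell that sourced export-cluster.sh
# (database names are hypothetical):
#   backup_databases 6 postgres reports
```

The loop stops at the first failed backup so that a problem with one database is not hidden by later output.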

6 Stop GP5 and start GP6

HENGSHI_HOME=/opt/hengshi
cd hengshi-[version]
cp -r lib/gpdb-6* ${HENGSHI_HOME}/lib
cd ${HENGSHI_HOME}
bin/hengshi-sense-bin stop engine
mv engine-cluster engine-cluster.gp5.bak
gpdb_name=$(ls ${HENGSHI_HOME}/lib/gpdb-* -dvr --color=never| head -n 1)
gpdb_name=${gpdb_name##*/}
rm -f ${HENGSHI_HOME}/lib/gpdb
cd ${HENGSHI_HOME}/lib
ln -sf ${gpdb_name} ${HENGSHI_HOME}/lib/gpdb
cd ${HENGSHI_HOME}
bin/hengshi-sense-bin init engine
bin/hengshi-sense-bin start engine

7 Import data

export HENGSHI_HOME=/opt/hengshi
cd hengshi-[version]
tar -xf pivotal_greenplum_backup_restore-1.15.0-1.tar.gz -C ${HENGSHI_HOME}/lib/gpdb/gpdb/ #must execute, unzip to the current GP6 symlink directory
bash #launch a new bash
source ${HENGSHI_HOME}/engine-cluster/export-cluster.sh
# list all backup timestamps (14 characters each)
find ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/ -maxdepth 2 | sort
# restore with a timestamp
gprestore --backup-dir ${HENGSHI_HOME}/gpbackup --timestamp xxxxxxxxxxxxxx
exit #exit new bash
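
Instead of reading the timestamp from the find output by eye, the newest one can be extracted with a small helper. This sketch assumes the directory layout that appears in the paths of this document (SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS); the function name is illustrative. When several backups exist, confirm that the newest one is the one you intend to restore.

```shell
#!/usr/bin/env bash
# Sketch: print the newest 14-digit backup timestamp under a backup directory.
# Layout assumed from the paths in this document:
#   <backup-dir>/SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS
latest_backup_timestamp() {
    local backup_dir="$1"
    find "${backup_dir}/SegDataDir-1/backups" -mindepth 2 -maxdepth 2 -type d \
        | awk -F/ '{print $NF}' \
        | grep -E '^[0-9]{14}$' \
        | sort \
        | tail -n 1
}

# Usage:
#   ts=$(latest_backup_timestamp "${HENGSHI_HOME}/gpbackup")
#   gprestore --backup-dir "${HENGSHI_HOME}/gpbackup" --timestamp "${ts}"
```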

Notes:

  • If the import fails and you need to import again, re-initialize the engine with the following steps and then start over.
cd ${HENGSHI_HOME}
bin/hengshi-sense-bin stop engine
rm -rf engine-cluster
bin/hengshi-sense-bin init engine
bin/hengshi-sense-bin start engine
  • This import method does not restore global objects: Tablespaces, Databases, Database-wide configuration parameter settings (GUCs), Resource group definitions, Resource queue definitions, Roles, and GRANT assignments of roles to databases (see Parallel Backup with gpbackup and gprestore). As a result, the import may fail because a role or queue does not exist. There are two solutions:
    • Specify the --with-globals option. You may then be told that roles, queues, and so on already exist, so check for them before importing and delete them. Alternatively, specify the --on-error-continue option to skip such errors, but this option skips all errors, so use it with caution.
    • Create them manually. Open the ${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS/gpbackup_YYYYMMDDHHMMSS_metadata.sql file to see which roles, queues, and so on are created (these statements are usually at the beginning of the file), and then execute the creation statements manually. Roles and queues that already exist can be skipped; if the file also contains authorization statements for roles and queues, execute those as well. This method may miss some operations, so review the file carefully.
  • If you are told that safe_to_number does not exist, create it manually:
CREATE OR REPLACE FUNCTION SAFE_TO_NUMBER(text)
RETURNS numeric IMMUTABLE STRICT AS
$$
BEGIN
  RETURN $1::numeric;
EXCEPTION WHEN OTHERS THEN
  RETURN NULL;
END
$$ LANGUAGE plpgsql;
  • If the database does not exist, specify the --create-db option to create it automatically. If it already exists, do not specify the option, or an error is raised.
  • You can specify --metadata-only to only import metadata, including table creation but not data.
  • You can specify --data-only to only import data and not include table creation.
  • In our own tests with a backup made at compression level 6, the restore took about 1.5 times as long as the backup.
  • For other gprestore parameters, refer to the gprestore documentation.
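
For the manual route to recreating roles and queues described in the notes above, a quick filter over the metadata file can help locate the relevant statements. This is only a sketch: it assumes those statements start at the beginning of a line with CREATE or ALTER, the function name is illustrative, and GRANT statements still need to be reviewed by hand.

```shell
#!/usr/bin/env bash
# Sketch: list role and resource queue/group statements in a gpbackup metadata file.
# Assumption: each such statement starts a line with CREATE or ALTER.
list_global_statements() {
    local metadata_file="$1"
    grep -iE '^(CREATE|ALTER) (ROLE|RESOURCE (QUEUE|GROUP)) ' "${metadata_file}"
}

# Usage (replace the placeholders with the real date and timestamp):
#   list_global_statements \
#     "${HENGSHI_HOME}/gpbackup/SegDataDir-1/backups/YYYYMMDD/YYYYMMDDHHMMSS/gpbackup_YYYYMMDDHHMMSS_metadata.sql"
```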

Rolling Back from GP6 to GP5 If the Upgrade Fails

1 Stop all services

HENGSHI_HOME=/opt/hengshi
${HENGSHI_HOME}/bin/hengshi-sense-bin stop all

2 Delete the GP6 data directory

HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
test -d engine-cluster.gp5.bak && rm -rf engine-cluster

3 Restore GP5

HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
mv engine-cluster.gp5.bak engine-cluster
gpdb_name=$(ls ${HENGSHI_HOME}/lib/gpdb-5* -dvr --color=never| head -n 1)
gpdb_name=${gpdb_name##*/}
rm -f ${HENGSHI_HOME}/lib/gpdb
cd ${HENGSHI_HOME}/lib
ln -sf ${gpdb_name} ${HENGSHI_HOME}/lib/gpdb

Clean Up Data After a Successful Upgrade

HENGSHI_HOME=/opt/hengshi
cd ${HENGSHI_HOME}
rm -rf engine-cluster.gp5.bak
rm -rf lib/gpdb-5*