Migrating from previous HBase versions
To migrate data from a previous HBase version, see Upgrading in the Apache HBase Reference Guide.
Migrating to Amazon EMR version 7.4.0 or later
Note
Follow these guidelines if you're migrating from an EMR release earlier than 7.4.0 to release 7.4.0 or later.
If you are running an EMR release with Amazon's Store File Tracking feature enabled (included in releases 6.2.0 through 7.3.0) and you want to upgrade to a release with OSS Store File Tracking (available in EMR releases later than 7.3.0), follow these steps:
In the existing cluster:
Disable the hbase:storefile table.
Drop the hbase:storefile table.
Flush hbase:meta.
Wait for the metadata to be updated.
In the new cluster:
Set the same Amazon S3 directory as the root directory.
Start the cluster with the DefaultStoreFileTracker implementation:
    {
      "Classification": "hbase-site",
      "Properties": {
        "hbase.store.file-tracker.impl": "org.apache.hadoop.hbase.regionserver.storefiletracker.DefaultStoreFileTracker"
      }
    }
At the table or column family level, use the following commands to change the store file tracker:
Change the store file tracker of a table or of a table's column family:
    hbase> change_sft 't1','FILE'
    hbase> change_sft 't2','cf1','FILE'
Change the store file tracker of all tables matching the given regular expression (regex):
    hbase> change_sft_all 't.*','FILE'
    hbase> change_sft_all 'ns:.*','FILE'
    hbase> change_sft_all 'ns:t.*','FILE'
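The steps above can be sketched as a few shell helpers. This is a minimal sketch, not an official procedure: the function names and the default file path are hypothetical, and it assumes the configuration is supplied as a JSON array of classification objects. The helpers only print commands or write a file, so nothing touches a cluster until you pipe the output into the HBase shell yourself. Waiting for the metadata to be updated remains a manual step.

```shell
# Sketch only: function names and file paths are illustrative assumptions.

# Print the HBase shell commands for the existing-cluster steps.
storefile_cleanup_commands() {
  cat <<'EOF'
disable 'hbase:storefile'
drop 'hbase:storefile'
flush 'hbase:meta'
EOF
}

# Write a configurations file that selects DefaultStoreFileTracker.
# $1 is the output path (hypothetical default: ./hbase-sft-config.json).
write_sft_configuration() {
  cat > "${1:-./hbase-sft-config.json}" <<'EOF'
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.store.file-tracker.impl": "org.apache.hadoop.hbase.regionserver.storefiletracker.DefaultStoreFileTracker"
    }
  }
]
EOF
}

# Print one change_sft command per table name passed as an argument.
change_sft_commands() {
  for table in "$@"; do
    printf "change_sft '%s','FILE'\n" "$table"
  done
}
```

For example, on the existing cluster's primary node you might run `storefile_cleanup_commands | sudo -u hbase hbase shell -n`, and on the new cluster `change_sft_commands t1 t2 | hbase shell -n`. Review the printed commands before piping them anywhere.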
Migrating HBase on Amazon S3 clusters to Amazon EMR version 7.12.0 or later using read-replica clusters
Starting with EMR 7.12.0, you can switch a read-replica HBase on Amazon S3 cluster from read-only mode to active mode, enabling both read and write operations. This functionality is provided through two new HBase shell commands.
readonly_state
Retrieves the current read-write operational state of the cluster.
Output:
INACTIVE - Cluster is in read-only mode and writes are inactive.
ACTIVE - Cluster supports both read and write operations.
readonly_switch
Enables or disables read-only mode with configurable options for the switching process.
Syntax:
    readonly_switch <readonly>,<force_flush>,<force_refresh_meta>,<force_refresh_hfile>
Parameters:
readonly (required) - Boolean value to enable (true) or disable (false) read-only mode
force_flush (optional) - Forces data flush before switching from active to read-only mode (default: true)
force_refresh_meta (optional) - Forces meta table refresh when switching from read-only to active mode (default: true)
force_refresh_hfile (optional) - Forces HFile refresh when switching from read-only to active mode (default: true)
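To make the parameter order concrete, here is a small sketch of a helper that assembles a readonly_switch command line. The helper name build_readonly_switch is hypothetical, and the comma-separated positional form is assumed from the Syntax line above rather than confirmed against the shell's help output. Optional values that you omit are left out of the command so that the documented defaults apply.

```shell
# Sketch only: build_readonly_switch is a hypothetical helper, and the
# comma-separated positional form is assumed from the Syntax line above.
# Usage: build_readonly_switch <readonly> [force_flush] [force_refresh_meta] [force_refresh_hfile]
build_readonly_switch() {
  # Every supplied value must be a literal boolean.
  for flag in "$@"; do
    case "$flag" in
      true|false) ;;
      *) echo "expected true or false, got: $flag" >&2; return 1 ;;
    esac
  done
  # readonly is required; at most three optional values may follow it.
  [ "$#" -ge 1 ] && [ "$#" -le 4 ] || { echo "expected 1 to 4 arguments" >&2; return 1; }
  # Join the supplied values with commas; omitted options keep their defaults.
  cmd="readonly_switch $1"
  shift
  for flag in "$@"; do
    cmd="$cmd, $flag"
  done
  printf '%s\n' "$cmd"
}
```

Calling `build_readonly_switch false` prints `readonly_switch false`, the form used to activate a read replica. Calling `build_readonly_switch true true` prints `readonly_switch true, true`, which would request read-only mode with an explicit flush.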
Migration Steps
If you are running an HBase on Amazon S3 cluster on EMR 6.0.0 or later and want to migrate to a cluster on EMR 7.12.0 or later, follow these steps:
Ensure your source cluster is in a stable state: the hbck report should show no inconsistencies, and the HBase master UI should show no stuck procedures.
    sudo -u hbase hbase hbck > hbck_report.txt
Ensure there are no regions in the SPLIT state on the source cluster:
If there are regions in the SPLIT state, run major compactions on the respective tables and wait for them to complete.
    major_compact <table_name>
Run catalogjanitor_run in the HBase shell after the compaction is complete.
Create a new EMR 7.12.0+ cluster configured as a read replica pointing to the same Amazon S3 location as your source cluster. Refer to this blog for more details on how to set up a read-replica cluster. If you want to upgrade to OSS Store File Tracking, launch the new cluster with the DefaultStoreFileTracker configuration described in the steps above. Wait for the master node to initialize completely. Verify data accessibility by reading the tables, and confirm that the new cluster is in read-only mode:
    hbase:001:0> readonly_state
    Took 0.4612 seconds
    => "INACTIVE"
Disable balancing and compactions on the source cluster:
    echo "balance_switch false" | hbase shell
    echo "compaction_switch false" | hbase shell
Ensure that no overlaps or inconsistencies appear in the read-replica cluster UI, and verify that regions show OPEN status and are properly assigned.
If you want to change to FileBasedTracker, convert the store file tracking on the read-replica cluster using the change_sft commands described in the section above.
Stop the jobs pointing to the source cluster, flush all the tables, and shut down the source cluster. Wait for complete termination before proceeding.
    echo "flush 'usertable'" | hbase shell
    echo "flush 'hbase:meta'" | hbase shell
    echo "flush 'hbase:namespace'" | hbase shell
Switch the read-replica cluster to active mode to enable write operations. After completing this step, your new cluster will support both read and write operations, and the migration is complete.
    hbase:010:0> readonly_switch false
    Took 38.1568 seconds
Validate writes on the new cluster and ensure all regions are serving requests.
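The flush step above names a single user table, usertable. For a cluster with several tables, a sketch like the following can generate the flush commands. The helper name flush_commands is hypothetical. It prints one flush command per table you pass in, followed by the two system tables flushed in the step above, and does not contact a cluster on its own.

```shell
# Sketch only: flush_commands is a hypothetical helper.
# Prints a flush command for each user table passed as an argument, then
# for hbase:meta and hbase:namespace as in the flush step above.
flush_commands() {
  for table in "$@" hbase:meta hbase:namespace; do
    printf "flush '%s'\n" "$table"
  done
}
```

On the source cluster you might run `flush_commands usertable orders | hbase shell -n`, where orders stands in for any additional table name. The `-n` flag runs the HBase shell non-interactively.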
Note
There can be only one active cluster pointing to an Amazon S3 location at any point in time. Therefore, switching the read-replica to active should be done only after the source cluster is terminated.
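Because only one cluster may be active at a time, it can help to check the replica's state in a script before and after the switch. The following sketch parses the `=> "STATE"` line shown in the readonly_state example above. The helper names are hypothetical, and it assumes that non-interactive shell output uses the same `=>` format as the interactive session.

```shell
# Sketch only: these helper names are hypothetical.

# Read HBase shell output on stdin and print the last reported state
# (INACTIVE or ACTIVE), based on the => "STATE" line shown above.
parse_readonly_state() {
  sed -n 's/.*=> "\([A-Z]*\)".*/\1/p' | tail -n 1
}

# Succeed only if the given state means the cluster is still read-only.
is_read_only() {
  [ "$1" = "INACTIVE" ]
}
```

A possible use on the read-replica cluster is `state="$(echo readonly_state | hbase shell -n | parse_readonly_state)"`, followed by `is_read_only "$state"` to confirm the replica is still read-only before the source cluster is terminated. This check does not verify that the source cluster is gone; confirm termination separately.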