HBase regions - Split
- Hot Spotting: An uneven key-space distribution can funnel a huge number of requests to a single HBase region, overloading the RegionServer process and causing slow response times.
- Pre-Regions: If the key distribution is known in advance, it is good practice to create pre-regions (pre-split the table) while creating HBase tables. This helps distribute the data uniformly and speeds up access.
- Please note that too many regions can degrade performance too. The number of regions should be balanced against the data volume and the cluster resources available.
- How to create pre-regions: HBase pre-regions can be created in two ways:
o While creating HBase tables
create 'sample_hbase_table','c',{SPLITS=>['a','b']}
o Splitting regions when HBase is online
Regions can be split even after the table is created and data is loaded. (This is typical when the distribution of data is not known beforehand.) There are several ways of doing it.
§ Via HBase Terminal: ‘split’ command
eg: split 'sample_hbase_table'
eg: split 'sample_hbase_table','c'
§ Via HBase GUI: The HBase master page lists all tables (both catalog and user tables), region servers, dead region servers, backup masters, etc. The URL of the UI is http://<HBase-Master-Name>:60010/master-status
The GUI has a link to each HBase table the master knows about.
eg: http://<HBase-Master-Name>:60010/table.jsp?name=sample_hbase_table
On that page there is an option to split the regions.
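To make the pre-region idea concrete, here is a minimal Python sketch of how split points could be computed and how a row key maps to a region. It assumes (this is not from the setup above) that row keys start with a two-character lowercase hex prefix, for example from a hash, so the key space is 00 to ff. The function names are made up for illustration and nothing here calls HBase itself; it only renders the shell statement.

```python
from bisect import bisect_right


def even_split_points(num_regions, key_space=256):
    """Return num_regions - 1 two-digit hex split keys spread evenly over key_space."""
    if not 2 <= num_regions <= key_space:
        raise ValueError("num_regions must be between 2 and key_space")
    step = key_space / num_regions
    return [format(int(i * step), "02x") for i in range(1, num_regions)]


def region_index(row_key, split_points):
    """Index of the region holding row_key (a region's start key is inclusive).

    Plain string comparison matches byte-wise ordering only for ASCII keys,
    which is what this sketch assumes.
    """
    return bisect_right(split_points, row_key)


def create_statement(table, family, split_points):
    """Render an HBase shell 'create' statement using straight quotes."""
    quoted = ",".join("'{}'".format(p) for p in split_points)
    return "create '{}','{}',{{SPLITS=>[{}]}}".format(table, family, quoted)
```

With four regions, even_split_points(4) gives the split keys 40, 80 and c0, and a key such as 7f-user-42 falls into the second region (index 1). If most real keys share one prefix, they will still land in one region, which is exactly the hot spotting case.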
HBase regions - Merge
- There can be scenarios where the number of regions gets out of hand and we need to merge regions. HBase provides a way to merge two regions into one.
- Go to the HBase home directory and issue the following command (eg):
./bin/hbase org.apache.hadoop.hbase.util.Merge sample_hbase_table sample_hbase_table,,1447622532624.5c6b185e5aa64c61f5915a4aa1ed96e4. sample_hbase_table,00,1447685933570.dcb1b5c1dedf69bd7bab08e12427f6ae.
Controlling the requests sent to the HBase server by the client
- Controlling the number of requests sent by the HBase client to the server is a good practice. There can be scenarios where the HBase server is loaded and cannot handle all the requests it receives. The client program should be intelligent enough to delay its requests in such conditions rather than bombarding the region servers with more and more requests. If the client keeps sending requests, they are queued, may wait longer than the normal timeout, and eventually fail with a ‘RetriesExhaustedException’.
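One common way to make a client "delay sending" is exponential backoff. The Python sketch below shows the idea only; it does not use the HBase client API. ServerBusyError and the request callable are hypothetical stand-ins for whatever call and failure your client actually sees, and the default delays are arbitrary.

```python
import random
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for an 'overloaded server' failure."""


def backoff_delays(base=0.1, factor=2.0, cap=5.0, retries=5):
    """Upper bound (seconds) for the wait before each retry: base, base*factor, ... capped at cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(retries)]


def call_with_backoff(request, retries=5, base=0.1, factor=2.0, cap=5.0,
                      sleep=time.sleep, jitter=random.random):
    """Run request(); on ServerBusyError wait longer each time before retrying."""
    delays = backoff_delays(base, factor, cap, retries)
    for attempt in range(retries + 1):
        try:
            return request()
        except ServerBusyError:
            if attempt == retries:
                raise  # retries exhausted: let the caller decide what to do
            # Sleep a random fraction of the delay so that many clients
            # do not all retry at the same instant.
            sleep(delays[attempt] * jitter())
```

The sleep and jitter parameters are injectable only so the function can be exercised without really waiting. The point of the design is that a loaded server sees fewer requests per second from each client, instead of the same request being re-queued immediately.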
HBase Write Path
- The write path is how HBase completes PUT/DELETE operations. The path starts at the client, moves to the region server and eventually ends in the HBase data file known as the HFile. Region servers handle the HBase tables. HBase tables can be large, so they are partitioned into regions, and each region server handles one or more regions. The client contacts the region servers for any request. A write received by a region server cannot be written straight into an HFile, because HFiles are sorted (and also immutable). Instead, writes are kept in the ‘memstore’ until enough data accumulates, and only then is the data written to HDFS.
- The Write Ahead Log (WAL) is there to prevent data loss if the system crashes. The memstore is in memory (volatile), so its data would be lost if the server crashed.
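The order of steps in the write path (log first, then memstore, then HFile) can be sketched as a toy Python model. This is a simplification under stated assumptions: plain lists and dicts stand in for files on HDFS, the flush threshold counts rows rather than bytes, and the log is cleared on flush, which is not how a real region server manages its WAL. The class and method names are invented for the example.

```python
class RegionSketch:
    """Toy model of one region's write path: WAL first, then memstore, then HFile."""

    def __init__(self, flush_size=3):
        self.wal = []        # durable, append-only log (a list stands in for a file on HDFS)
        self.memstore = {}   # volatile, in-memory buffer
        self.hfiles = []     # each HFile is an immutable, sorted tuple of (row, value)
        self.flush_size = flush_size

    def put(self, row, value):
        self.wal.append((row, value))   # 1. log the edit so it survives a crash
        self.memstore[row] = value      # 2. buffer the edit in memory
        if len(self.memstore) >= self.flush_size:
            self.flush()                # 3. enough data accumulated: write an HFile

    def flush(self):
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore.clear()
        self.wal.clear()                # flushed edits no longer need the log

    def crash_and_recover(self):
        self.memstore = {}              # everything in memory is lost
        for row, value in self.wal:     # replay the log to rebuild the memstore
            self.memstore[row] = value
```

Writing rows b and a, then calling crash_and_recover(), leaves both rows in the memstore again, because they were logged before being buffered. Without the log, those two rows would simply be gone.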
HBase Memstore
- When a region server receives a write request, it directs it to the specific region. The data is written to the memstore first. The memstore is kept in the main memory of the region server.
- The main reason for using the memstore is that the data must be stored in HDFS in sorted order, so edits are collected and sorted in memory before being written out.
- When the memstore reaches a size limit, the data is flushed to an HFile.
- Each memstore flush creates a new HFile for each column family.
- While reading, HBase first checks the memstore for the requested data and then looks in the HFiles.
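The flush and read behaviour described in the points above can be sketched in a few lines of Python. This is a toy model of a single column family, not HBase code: the flush limit counts rows instead of bytes, there are no timestamps or block cache, and the names are invented for the example.

```python
class StoreSketch:
    """Toy model of one column family's store: a memstore plus flushed HFiles."""

    def __init__(self, flush_limit=2):
        self.flush_limit = flush_limit
        self.memstore = {}
        self.hfiles = []   # oldest first; each is a sorted, read-only tuple

    def put(self, row, value):
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_limit:
            # Each flush writes one new sorted HFile and empties the memstore.
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row):
        if row in self.memstore:              # the newest data lives in memory
            return self.memstore[row]
        for hfile in reversed(self.hfiles):   # then search newest HFile to oldest
            for key, value in hfile:
                if key == row:
                    return value
        return None
```

For example, after putting r1 and r2 (which triggers a flush) and then putting a new value for r1, get("r1") returns the new value from the memstore while get("r2") is served from the HFile. Because every flush adds another HFile, reads have more files to search over time.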
HBase Table Migration
- The use case involves replicating a big HBase table from one cluster to another cluster.
- There are multiple ways to accomplish this task:
1. Sequence Files: Take a backup of the existing HBase table using the ‘Export’ API provided by HBase, copy the data to the new cluster, and then load the new HBase table using ‘Import’. (‘Export’ generates sequence files, which ‘Import’ then uses to load the data.)
2. HFiles: Generate HFiles from the original HBase table, copy them to the new cluster, and then bulk load them into the new table. (One of the efficient ways.)
3. Copy Command: Directly use the ‘CopyTable’ command to copy the source table to a destination table in another cluster.
4. Snapshots: Take a ‘snapshot’ of the original table and create the new table from it in the destination cluster. (An efficient way.)
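As an illustration of option 1, the three steps could be assembled as command lines like this. It is only a sketch: the Python function is a made-up helper that builds argument lists and runs nothing, the export directory and destination NameNode URI are placeholders, and it assumes the destination table has already been created before the Import step. The Export and Import class names and hadoop distcp are the stock tools; check the exact options against your HBase and Hadoop versions.

```python
def export_import_commands(table, export_dir, dest_namenode):
    """Return the command lines (as argv lists) for an Export / copy / Import migration."""
    dest_dir = dest_namenode.rstrip("/") + export_dir
    return [
        # 1. On the source cluster: dump the table to sequence files on HDFS.
        ["hbase", "org.apache.hadoop.hbase.mapreduce.Export", table, export_dir],
        # 2. Copy the sequence files to the destination cluster.
        ["hadoop", "distcp", export_dir, dest_dir],
        # 3. On the destination cluster: load the files into the existing table.
        ["hbase", "org.apache.hadoop.hbase.mapreduce.Import", table, export_dir],
    ]
```

Each list can be joined with spaces to get the command to type, or handed to subprocess.run on the appropriate cluster. Note that Import writes through the normal write path, so it is subject to the same load concerns as any other heavy write workload.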
- By default, snapshots are turned off in HBase 0.94. Also, the API for generating HFiles is not available in version 0.94.
- Restoring the table was painful, because the normal writes to HBase follow the write path. Under heavy load this can cause the region server to block writes, and the client then fails with ‘org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException’.