[Original] HBase Logical Structure

BIGDATA云 2018-07-13


HBase logical structure

RowKey (comes first)

ColumnFamily

ColumnQualifier

value (versioned by timestamp)

Cell

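Physical structure

HMaster -----> NameNode (plays a comparable role in HDFS)

The management node. It handles structural operations on Tables and Regions in HBase, such as creating, dropping and altering tables.

An HBase cluster can run several HMaster processes, but only one of them is in the Active state at any time. The active HMaster and the standby HMaster processes rely on ZooKeeper for state switching and election.

The whole cluster can be stopped through the HMaster's shutdown method. During shutdown the HMaster first notifies every HRegionServer to stop; only after they have reported back does the HMaster stop itself.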

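HRegionServer -----> DataNode (plays a comparable role in HDFS)

The server that hosts Regions. It must register with the HMaster so that the HMaster can manage it. An HBase cluster can deploy many HRegionServers.

HRegionServer is the core module of HBase: it responds directly to user I/O and reads and writes data on HDFS. One HRegionServer holds one HLog and many HRegions, so all of these HRegions, and therefore several tables, share a single HLog. Every user write is recorded in the HLog and is also written by the HRegion.

Once the HRegion's data has been persisted successfully, the corresponding HLog entries are no longer needed. They are not removed one by one: the HLog is rolled into new files, and an old log file is archived and later deleted once every edit it contains has been persisted. Note that this is log cleanup, not a major compaction; compaction applies to StoreFiles.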

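HRegion

The unit that stores data in HBase. It can be thought of as a partition of a table that holds part of the table's data. When the data in a region grows beyond a threshold, the region splits automatically into two regions. In this sense a Region is a horizontal partition of an HBase table.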
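The split described above can be pictured with a small sketch. It is plain Python, not HBase code: the `Region` class, the `max_rows` threshold and the "split at the middle key" rule are assumptions made for the illustration (HBase decides by store size in bytes, not by row count).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Region:
    """Toy region: holds rows whose keys fall in [start_key, end_key)."""
    start_key: Optional[str]          # None means unbounded on the left
    end_key: Optional[str]            # None means unbounded on the right
    rows: Dict[str, str] = field(default_factory=dict)


def split(region: Region) -> Tuple[Region, Region]:
    """Split a region into two daughters at its middle row key."""
    keys = sorted(region.rows)
    mid = keys[len(keys) // 2]        # the split point becomes the shared boundary
    left = Region(region.start_key, mid,
                  {k: v for k, v in region.rows.items() if k < mid})
    right = Region(mid, region.end_key,
                   {k: v for k, v in region.rows.items() if k >= mid})
    return left, right


def put(region: Region, key: str, value: str,
        max_rows: int = 4) -> Optional[Tuple[Region, Region]]:
    """Write one row; return the two daughters if the region grew past max_rows."""
    region.rows[key] = value
    if len(region.rows) > max_rows:
        return split(region)
    return None
```

The left daughter keeps the parent's startKey and the right daughter keeps its endKey, so together they cover exactly the parent's range.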

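Each HRegion is made up of several HStores. An HRegion contains all columns of a subset of the table's records, and every region has a startKey and an endKey.

Suppose a table holds 100 records spread over 10 regions. Data in HBase is stored in sorted order, which is what makes fast random reads and writes possible: given a rowkey, HBase first locates the region that owns the record and then scans only that region to fetch the data.

To support this, the regions are divided by key range and numbered, which also makes them easier to manage. Each region covers the range [startKey, endKey). Note that the last region has no upper bound, so every key from its startKey onward belongs to it.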

region 0 [null, 10)

region 1 [10, 20)

region 2 [20, 30)

region ... ...

region 9 [90, null]
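Locating the region for a rowkey in the list above amounts to a binary search over the sorted start keys. The following is a minimal sketch, not HBase's actual lookup code; the names `REGIONS` and `locate` are hypothetical, and it assumes two-character, zero-padded string keys so that string order matches the numeric order used in the example.

```python
import bisect
from typing import List, Tuple

# (region name, start key). The first region starts at the empty key,
# which plays the role of "null" in the list above.
REGIONS: List[Tuple[str, str]] = [("region 0", "")] + [
    (f"region {i}", f"{i * 10:02d}") for i in range(1, 10)
]
START_KEYS = [start for _, start in REGIONS]


def locate(row_key: str) -> str:
    """Return the region whose [startKey, endKey) range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][0]
```

Because each startKey is inclusive and each endKey exclusive, key "10" belongs to region 1, not region 0.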

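HStore

Each HRegion consists of one or more HStores. One HStore corresponds to one column family of the HRegion and is made up of one MemStore and a series of StoreFiles.

Locks and transactions are not held at the HStore level; they are held one level up, by the HRegion. The core services of an HStore are flushing the MemStore into StoreFiles on disk and merging several StoreFiles into one. The files written to HDFS are called HFiles.

On the write path, the only part that involves the HLog is the reconstruction of the log. After the HStore has finally written the user's data to HDFS, it reports back to the HLog so that the now redundant log data can be removed.

hbase.hstore.compactionThreshold=3: once the number of StoreFiles in an HStore exceeds 3, a compaction of those StoreFiles is started.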

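Compaction:

minor compaction:

Merges several HFiles into one larger HFile and then cleans up the original HFiles.

It typically happens after data has been deleted or when the hbase.hstore.compactionThreshold condition is met.

Deleting data: the data is not removed immediately. It is only marked as deleted (a tombstone), and the actual cleanup takes place when a compaction runs; the tombstones themselves are dropped by a major compaction.

major compaction:

Merges all HFiles of a column family into a single HFile and then cleans up the original HFiles.

A major compaction is very expensive and time-consuming, so running it casually is not recommended, although it can be triggered directly from the shell.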
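The difference between the two kinds of compaction can be sketched as follows. This is a toy model, not HBase's implementation: a "file" is a plain dict, `None` stands in for a delete marker, and the function name `compact` is made up for the example.

```python
from typing import Dict, List, Optional

# A toy "HFile": row key -> value, where None stands for a delete marker.
ToyFile = Dict[str, Optional[str]]


def compact(files: List[ToyFile], major: bool) -> ToyFile:
    """Merge files (ordered oldest to newest) into one sorted file."""
    merged: ToyFile = {}
    for f in files:                       # newer files overwrite older entries
        merged.update(f)
    if major:
        # A major compaction sees every file of the store, so it can safely
        # drop delete markers together with the data they hide.
        merged = {k: v for k, v in merged.items() if v is not None}
    return dict(sorted(merged.items()))   # keep the output sorted by row key
```

A minor compaction over a subset of files has to keep the marker, because an older file outside the subset may still contain the deleted row.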

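The role the HLog plays in HBase is known as the WAL (write-ahead log) mechanism. The data structure behind this design is the LSM-Tree (Log-Structured Merge-Tree).

This is the fundamental reason HBase can achieve fast random writes while guaranteeing that data is not lost.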
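The write-ahead idea can be shown in a few lines. This is a minimal sketch under simplifying assumptions: a Python list stands in for the durable HLog, a dict for the in-memory store, and `ToyStore` is a hypothetical name, not an HBase class.

```python
from typing import Dict, List, Tuple


class ToyStore:
    """Write-ahead logging in miniature: log first, then update memory."""

    def __init__(self, wal: List[Tuple[str, str]]):
        self.wal = wal                      # stands in for the durable HLog
        self.memstore: Dict[str, str] = {}  # volatile, lost on a crash

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))       # 1. append to the log (sequential write)
        self.memstore[key] = value          # 2. only then apply to the in-memory store

    @classmethod
    def recover(cls, wal: List[Tuple[str, str]]) -> "ToyStore":
        """Rebuild the in-memory state after a crash by replaying the log."""
        store = cls(wal)
        for key, value in wal:
            store.memstore[key] = value
        return store
```

Writes are fast because the log is only ever appended to, and nothing is lost because the in-memory state can always be rebuilt from the log.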

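MemStore

Note: MemStore write methods must not be called from multiple threads in parallel; the HStore calling them must hold the read lock and the write lock.

During a write, data is sorted in the MemStore first, because it is ultimately stored in sorted order. When the amount of data in the MemStore exceeds the threshold, it is flushed to a StoreFile on disk.

hbase.hregion.memstore.flush.size=128M, so a freshly flushed StoreFile is at most about 128 MB by default -----> which matches the default block size of HDFS (128 MB in Hadoop 2.x).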
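The buffer-then-flush behaviour can be sketched like this. It is an illustration only: `ToyMemStore` is a hypothetical class, sizes are counted as key length plus value length, and a flushed "file" is just a sorted list kept in memory.

```python
from typing import Dict, List, Tuple


class ToyMemStore:
    """Buffers writes in memory and flushes them as one sorted file."""

    def __init__(self, flush_size: int):
        # flush_size (bytes) plays the role of hbase.hregion.memstore.flush.size
        self.flush_size = flush_size
        self.data: Dict[bytes, bytes] = {}
        self.size = 0
        self.store_files: List[List[Tuple[bytes, bytes]]] = []

    def put(self, key: bytes, value: bytes) -> None:
        old = self.data.get(key)
        if old is not None:                  # replacing a value: drop its old size
            self.size -= len(key) + len(old)
        self.data[key] = value
        self.size += len(key) + len(value)
        if self.size >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        """Write the buffered cells out in row-key order and start a new buffer."""
        self.store_files.append(sorted(self.data.items()))
        self.data, self.size = {}, 0
```

Each flush produces one new sorted file, which is why StoreFiles accumulate and eventually need compaction.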

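StoreFile

The file that finally holds the data of an HStore. StoreFiles are produced as the MemStore is repeatedly flushed to disk. Once enough of them have accumulated, they are managed together as the store's set of StoreFiles.

This set may also hold StoreFiles that belong to other stores.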

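HFile

The physical file format in which data is stored on HDFS. Data submitted by clients reaches it through the MemStore, and each flush writes an HFile.

Data in an HFile is stored as key-value pairs, and both key and value are byte arrays. Because the data was already sorted in the MemStore, it is sorted in the HFile as well.

An HFile is made up of blocks, and the key-values actually live inside these blocks. The recommended block size is between 8 KB and 1 MB; the default is 65536 bytes, i.e. 64 KB.

Every block has an entry in an index, and the HFile itself holds that index.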
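How a block index narrows a lookup to a single block can be sketched as below. This is a simplified model, not the HFile reader: blocks are Python lists of sorted pairs, the index stores only the first key of each block, and `build_index` and `get` are hypothetical helper names.

```python
import bisect
from typing import List, Optional, Tuple

Block = List[Tuple[str, str]]   # sorted (key, value) pairs


def build_index(blocks: List[Block]) -> List[str]:
    """The index keeps only the first key of every block."""
    return [block[0][0] for block in blocks]


def get(blocks: List[Block], index: List[str], key: str) -> Optional[str]:
    """Pick the single block that may contain key, then scan only that block."""
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None                  # key sorts before the first block
    for k, v in blocks[i]:
        if k == key:
            return v
    return None
```

Smaller blocks mean less data to scan per lookup but more index entries to keep in memory, which is the trade-off behind the block size recommendations.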

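Official recommendations:

Keep blocksize between 8 KB and 1 MB; the default is 64 KB.

For mostly sequential reads, make the blocksize somewhat larger; note that this reduces random-access efficiency.

For mostly random reads, make the blocksize somewhat smaller; the default is usually fine.

When scanning a whole table, always specify a start key and an end key, otherwise the scan can easily cause an OOM exception.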

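HRegionServer
|--- one HLog
|--- multiple HRegions (one table maps to multiple HRegions; one HRegion holds part of the rows of one table; a horizontal partition of an HBase table, scale out)
     |--- multiple HStores (within a region one HStore corresponds to one column family; across regions one column family corresponds to multiple HStores; a column family is a vertical partition of an HBase table)
          |--- one MemStore
          |--- multiple StoreFiles
               HFile
               |--- multiple data blocks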

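How does HBase achieve fast random reads?

rowkey -----> region

Look in the MemStore first; if the data is there, return it directly: <rowkey, <cf, <col, <ts, value>>>>

If it is not there, look in the HFiles: use the index to locate the specific block, then scan that block to find the data.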
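The lookup order above, together with the nested <rowkey, <cf, <col, <ts, value>>>> layout, can be sketched with nested dicts. This follows the simplified description in the text (MemStore first, then files); it is not how the real read path merges versions across sources, and `CellMap`, `lookup` and `read` are names made up for the example.

```python
from typing import Dict, List, Optional

# rowkey -> column family -> column qualifier -> timestamp -> value
CellMap = Dict[str, Dict[str, Dict[str, Dict[int, str]]]]


def lookup(cells: CellMap, row: str, cf: str, col: str) -> Optional[str]:
    """Return the newest version of one cell from a single map, or None."""
    versions = cells.get(row, {}).get(cf, {}).get(col)
    if not versions:
        return None
    return versions[max(versions)]        # highest timestamp wins


def read(memstore: CellMap, hfiles: List[CellMap],
         row: str, cf: str, col: str) -> Optional[str]:
    """Check the MemStore first, then fall back to the files, newest first."""
    for source in [memstore] + hfiles:
        value = lookup(source, row, cf, col)
        if value is not None:
            return value
    return None
```

For example, with a MemStore holding a recent `name` and an HFile holding an older `name` plus two versions of `age`, `read` returns the MemStore's `name` and the newest `age` from the file.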

--------------------------------------------------------------------------

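Writing to the HLog at the same time as writing to the MemStore, and later flushing and merging sorted files, is the approach known as the LSM-Tree (Log-Structured Merge-Tree).

Like a B-Tree it is a structure for keeping data sorted, but it favors sequential writes. HBase takes this design from Google BigTable.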

--------------------------------------------------------------------------

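Rowkey design

The rowkey hotspot problem

Hotspots arise when rowkeys are similar and consecutive and the data volume is large: too much data lands in a single region, which hurts read and write efficiency.

Rowkeys should be as random as possible, and consecutive rowkeys should be avoided.

Common rowkey designs include a reversed phone number + timestamp, or a random prefix + the primary key from the relational database (as in the telecom log example stored through MR).

HBase's built-in query capabilities are very limited: essentially every query has to go through the rowkey. When designing the rowkey, therefore, put as many query conditions as possible into it; the resulting rowkey is called a composite key.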
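The two rowkey designs mentioned above can be sketched as follows. The function names, the `_` separator and the bucket count are assumptions for the example. Note also that the second function swaps in a deterministic hash-derived prefix instead of a truly random one, so that a reader can recompute the key for a point lookup.

```python
import zlib


def reversed_phone_key(phone: str, timestamp_ms: int) -> str:
    """Reversed phone number + timestamp: consecutive numbers no longer share a prefix."""
    return f"{phone[::-1]}_{timestamp_ms}"


def salted_key(primary_key: str, buckets: int = 16) -> str:
    """Prefix derived from a checksum of the key, followed by the original key."""
    prefix = zlib.crc32(primary_key.encode("utf-8")) % buckets
    return f"{prefix:02d}_{primary_key}"
```

Phone numbers usually share their leading digits, so reversing them moves the most variable digits to the front and spreads writes across regions.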

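Column family design:

cf1 -----> "columnFamily"

cf2 -----> "cf"

HBase tables should be tall rather than wide. A wide table has many column families, so one operation spans many files (HFiles) and efficiency suffers accordingly. Prefer tall tables and keep the number of column families small.

When designing a table, keep column and column family names short. HBase caches and indexes this data in memory, so long names consume memory capacity. Short names let more data fit in memory; readability can be taken care of in the project documentation.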
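The cost of long names can be made concrete with a little arithmetic. Every cell stores its full coordinates, so the family and qualifier names are repeated in each cell. The sketch below estimates the serialized size of one cell from the KeyValue layout (length fields, row, family, qualifier, timestamp, type, value), ignoring tags and any block encoding or compression; the sample row, names and value are made up.

```python
def cell_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size in bytes of one cell, without tags."""
    # key length (4) + value length (4) + row length (2) + family length (1)
    # + timestamp (8) + key type (1)
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row) + len(family) + len(qualifier) + len(value)


long_names = cell_size(b"row-0001", b"columnFamily", b"customer_name", b"Tom")
short_names = cell_size(b"row-0001", b"cf", b"n", b"Tom")
print(long_names, short_names)
```

With a 3-byte value the names dominate the cell, so shortening them saves a noticeable share of every cell held in the MemStore and the block cache.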

-------------------------------------------------------------------------

Accessing HBase from Hive

Start hive and enter the hive terminal:

hive --auxpath /opt/hive/lib/hive-hbase-handler-2.1.0.jar,/opt/hive/lib/zookeeper-3.4.6.jar --hiveconf hbase.master=uplooking02:16010 --hiveconf hbase.zookeeper.quorum=uplooking01,uplooking02,uplooking03

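Working with HBase from Hive

Creating a table:

If the table does not yet exist in HBase

In this case the table has to be created in hive as an internal (managed) table; the corresponding table is then created in HBase as well.

e.g.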

create table h2hb_1(
  id int,
  name string,
  age int
) row format delimited
fields terminated by ','
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties (
  "hbase.columns.mapping" = ":key,cf:name,cf:age",
  "hbase.table.name" = "t"
);

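This creates a table h2hb_1 in hive with three columns, id, name and age, mapped to table t in HBase. id corresponds to the rowkey, name corresponds to column name in column family cf, and age likewise.

If the table already exists in HBase

Running the create statement above now fails, because a table named t already exists in HBase. In this case the only option is to create an external table that maps to the existing HBase table.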

create external table h2hb_2(
  id int,
  name string,
  age int
) row format delimited
fields terminated by ','
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties (
  "hbase.columns.mapping" = ":key,cf:name,cf:age",
  "hbase.table.name" = "t"
);

HQL can then be used for common analysis on tables in HBase, which is quite convenient.

-------------------------------------------------------------------------

Integrating HBase with Phoenix

Installing Phoenix

By convention, install under the /opt directory.

Extract:

soft]# tar -zxvf phoenix-4.7.0-HBase-1.1-bin.tar.gz -C ../

Rename:

opt]# mv phoenix-4.7.0-HBase-1.1 phoenix

Copy the jar files to the lib directory ($HBASE_HOME/lib) on each regionserver machine:

phoenix]# scp *.jar root@uplooking02:/opt/hbase/lib/

phoenix]# scp *.jar root@uplooking03:/opt/hbase/lib/

Restart the regionserver:

hbase-daemon.sh stop regionserver

hbase-daemon.sh start regionserver

Copy the Phoenix client jar into HBase's lib directory, then restart the master:

phoenix]# cp phoenix-4.7.0-HBase-1.1-client.jar /opt/hbase/lib/

In addition, to avoid errors, add phoenix-4.7.0-HBase-1.1-client.jar to HBASE_CLASSPATH:

vim /opt/hbase/conf/hbase-env.sh

export HBASE_CLASSPATH=$HBASE_CLASSPATH:/opt/hbase/lib/phoenix-4.7.0-HBase-1.1-client.jar

Test case from the official website:

./psql.py uplooking01:2181 us_population.sql us_population.csv us_population_queries.sql

Appendix: class comment of HFile from the HBase source code (excerpt)

/**

* File format for hbase.

* A file of sorted key/value pairs. Both keys and values are byte arrays.

* <p>

* The memory footprint of a HFile includes the following (below is taken from the

* <a

* href=https://issues./jira/browse/HADOOP-3315>TFile</a> documentation

* but applies also to HFile):

* <ul>

* <li>Some constant overhead of reading or writing a compressed block.

* <ul>

* <li>Each compressed block requires one compression/decompression codec for

* I/O.

* <li>Temporary space to buffer the key.

* <li>Temporary space to buffer the value.

* </ul>

* <li>HFile index, which is proportional to the total number of Data Blocks.

* The total amount of memory needed to hold the index can be estimated as

* (56+AvgKeySize)*NumBlocks.

* </ul>

* Suggestions on performance optimization.

* <ul>

* <li>Minimum block size. We recommend a setting of minimum block size between

* 8KB to 1MB for general usage. Larger block size is preferred if files are

* primarily for sequential access. However, it would lead to inefficient random

* access (because there are more data to decompress). Smaller blocks are good

* for random access, but require more memory to hold the block index, and may

* be slower to create (because we must flush the compressor stream at the

* conclusion of each data block, which leads to an FS I/O flush). Further, due

* to the internal caching in Compression codec, the smallest possible block

* size would be around 20KB-30KB.

* <li>The current implementation does not offer true multi-threading for

* reading. The implementation uses FSDataInputStream seek()+read(), which is

* shown to be much faster than positioned-read call in single thread mode.

* However, it also means that if multiple threads attempt to access the same

* HFile (using multiple scanners) simultaneously, the actual I/O is carried out

* sequentially even if they access different DFS blocks (Reexamine! pread seems

* to be 10% faster than seek+read in my testing -- stack).

* <li>Compression codec. Use "none" if the data is not very compressable (by

* compressable, I mean a compression ratio at least 2:1). Generally, use "lzo"

* as the starting point for experimenting. "gz" overs slightly better

* compression ratio over "lzo" but requires 4x CPU to compress and 2x CPU to

* decompress, comparing to "lzo".

* </ul>

* For more on the background behind HFile, see <a

* href=https://issues./jira/browse/HBASE-61>HBASE-61</a>.

* <p>

* File is made of data blocks followed by meta data blocks (if any), a fileinfo

* block, data block index, meta data block index, and a fixed size trailer

* which records the offsets at which file changes content type.

* <pre><data blocks><meta blocks><fileinfo><data index><meta index><trailer></pre>

* Each block has a bit of magic at its start. Block are comprised of

* key/values. In data blocks, they are both byte arrays. Metadata blocks are

* a String key and a byte array value. An empty file looks like this:

* <pre><fileinfo><trailer></pre>. That is, there are not data nor meta

* blocks present.

* <p>

* TODO: Do scanners need to be able to take a start and end row?

* TODO: Should BlockIndex know the name of its file? Should it have a Path

* that points at its file say for the case where an index lives apart from

* an HFile instance?
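The index memory estimate quoted in the comment above, (56+AvgKeySize)*NumBlocks, is easy to try out with concrete numbers. The helper below is only a worked example of that formula; the function name and the sample file size and key size are assumptions.

```python
def index_memory_bytes(file_size: int, block_size: int, avg_key_size: int) -> int:
    """Estimate block-index memory as (56 + AvgKeySize) * NumBlocks."""
    num_blocks = -(-file_size // block_size)      # ceiling division
    return (56 + avg_key_size) * num_blocks


one_gb = 1 << 30
default_blocks = index_memory_bytes(one_gb, 64 * 1024, avg_key_size=44)
small_blocks = index_memory_bytes(one_gb, 8 * 1024, avg_key_size=44)
print(default_blocks, small_blocks)
```

For a 1 GB file with 44-byte keys, shrinking the block size from 64 KB to 8 KB multiplies the number of blocks, and therefore the estimated index memory, by eight. This is the memory cost of small blocks that the comment warns about.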
