HBase: The Definitive Guide (Chinese Edition), Chapter 3 Translation

闲来看看 2013-08-14


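Chapter 3. Client API: The Basics (Part 2)

The Get Operation

The get() method retrieves data from HBase. Get calls fall into two groups according to how many rows they fetch at once: single gets and lists of gets.

Single Gets

The following method retrieves specific data from an HBase table: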

Result get(Get get) throws IOException

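Just as put() has its Put class, the get() method of HTable has a matching Get class. You must pass a Get instance to the call, and that instance must specify a row key. It has the following two constructors: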

Get(byte[] row)

Get(byte[] row, RowLock rowLock)

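A get() operation is bound to one specific row, but it can retrieve any number of columns in that row. Both constructors require a row; the second also accepts a RowLock, which lets you supply a row lock of your own. As with Put, the Get class provides a number of methods to narrow down what you are looking for, from a whole column family down to a single cell: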

Get addFamily(byte[] family)
Get addColumn(byte[] family, byte[] qualifier)
Get setTimeRange(long minStamp, long maxStamp) throws IOException
Get setTimeStamp(long timestamp)
Get setMaxVersions()
Get setMaxVersions(int maxVersions) throws IOException

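addFamily() restricts the request to a given column family. It can be called multiple times to add more families. The same is true for addColumn(), which narrows the request to a specific column. You can add further restrictions to the Get instance, such as a timestamp range or the number of versions to retrieve.

A get can return multiple versions of a row's cells. If you do not set the number of versions, only the most recent one is returned. If in doubt, call getMaxVersions() to check the current setting. Calling setMaxVersions() without a parameter sets the number of versions to Integer.MAX_VALUE, so that every stored version of the row's cells is returned. The Get class provides some additional methods, which are listed in Table 3-4.

Table 3-4. Additional methods provided by the Get class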

Method	Description
getRow()	Returns the row key as specified when creating the Get instance.
getRowLock()	Returns the row RowLock instance for the current Get instance.
getLockId()	Returns the optional lock ID handed into the constructor using the rowlock parameter. Will be -1L if not set.
getTimeRange()	Retrieves the associated timestamp or time range of the Get instance. Note that there is no getTimeStamp() since the API converts a value assigned with setTimeStamp() into a TimeRange instance internally, setting the minimum and maximum values to the given timestamp.
setFilter()/getFilter()	Special filter instances can be used to select certain columns or cells, based on a wide variety of conditions. You can get and set them with these methods.
setCacheBlocks()/ getCacheBlocks()	Each HBase region server has a block cache that efficiently retains recently accessed data for subsequent reads of contiguous information. In some events it is better to not engage the cache to avoid too much churn when doing completely random gets. These methods give you control over this feature.
numFamilies()	Convenience method to retrieve the size of the family map, containing the families added using the addFamily() or addColumn() calls.
hasFamilies()	Another helper to check if a family—or column—has been added to the current instance of the Get class.
familySet()/ getFamilyMap()	These methods give you access to the column families and specific columns, as added by the addFamily() and/or addColumn() calls. The family map is a map where the key is the family name and the value a list of added column qualifiers for this particular family. The familySet() returns the Set of all stored families, i.e., a set containing only the family names.

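The getters listed in Table 3-4 can only return what you have set on the Get instance beforehand, so they are rarely needed.

As mentioned earlier, HBase provides a helper class named Bytes with many static methods that convert Java types into byte arrays. It also provides the reverse conversions, which parse a byte array back into the corresponding Java type. Some of these methods are listed here: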

static String toString(byte[] b)
static boolean toBoolean(byte[] b)
static long toLong(byte[] bytes)
static float toFloat(byte[] bytes)
static int toInt(byte[] bytes)

Example 3-8 shows how they are used.

Example 3-8. Retrieving data from HBase

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "testtable");
Get get = new Get(Bytes.toBytes("row1"));
get.addColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
Result result = table.get(get);
byte[] val = result.getValue(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
System.out.println("Value: " + Bytes.toString(val));

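The example first creates the HBase configuration and instantiates an HTable. It then creates a Get instance for row1 and adds the column colfam1:qual1 to it. Next it retrieves the row from HBase with those restrictions, and finally it converts the returned value into a string and prints it. Running the example should print: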

Value: val1
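To make the round trip between Java types and byte arrays concrete, here is a minimal sketch that uses only the JDK. The class name BytesSketch and its methods are hypothetical stand-ins, not the HBase Bytes class, and the sketch assumes UTF-8 for strings and big-endian encoding for long values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a few Bytes-style helpers, JDK only.
final class BytesSketch {
    // Encode a string as UTF-8 bytes (assumed encoding).
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a string.
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Encode a long as 8 bytes, most significant byte first.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode 8 big-endian bytes back into a long.
    static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        byte[] raw = toBytes("val1");
        System.out.println("Value: " + toString(raw));
        System.out.println("Long: " + toLong(toBytes(42L)));
    }
}
```

The point of the sketch is that a stored value is only a byte array: the reader must know which type was written in order to decode it with the matching method.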

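The Result Class

When you retrieve data with get(), you receive a Result instance that holds all matching cells. It gives you access to everything the server returned for the given row and search criteria, such as column family, column qualifier, and timestamp.

Example 3-8 asked for a single column only, but you can request more. For example, you can ask the server for all columns of a column family, and then you need a way to access them on the client side. The Result class provides the following methods: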

byte[] getValue(byte[] family, byte[] qualifier)
byte[] value()
byte[] getRow()
int size()
boolean isEmpty()
KeyValue[] raw()
List<KeyValue> list()

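getValue() returns the data of one specific cell. Since you pass neither a timestamp nor a version, you get the newest version. Versions of a cell are stored on the server sorted from newest to oldest, so the first matching entry the server finds is always the latest one.

getRow() was introduced earlier: it returns the row key, as specified when the Get instance was created. size() returns the number of KeyValue instances the server returned. isEmpty() tells you whether the result contains any KeyValue instances at all.

The raw() method returns the array of KeyValue instances backing the current Result instance. The list() method wraps that array in a List, which you can conveniently walk with an iterator.

The array returned by raw() is already sorted. The sort order is column family, then column qualifier, then timestamp, then type.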
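The ordering can be illustrated with a small comparator. This is only a sketch under stated assumptions: the Cell class is hypothetical, strings stand in for the byte[] names, the type component is left out, and timestamps are sorted newest first, which matches the newest-to-oldest version order described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class CellOrderSketch {
    // Hypothetical, simplified cell coordinates.
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;

        Cell(String family, String qualifier, long timestamp) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return family + ":" + qualifier + "/" + timestamp;
        }
    }

    // Family first, then qualifier, then timestamp with the newest first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp);
    };

    // Returns a sorted copy and leaves the input untouched.
    static List<Cell> sorted(List<Cell> cells) {
        List<Cell> copy = new ArrayList<>(cells);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("colfam2", "qual1", 1));
        cells.add(new Cell("colfam1", "qual2", 3));
        cells.add(new Cell("colfam1", "qual1", 1));
        cells.add(new Cell("colfam1", "qual1", 2));
        for (Cell cell : sorted(cells)) {
            System.out.println(cell);
        }
    }
}
```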

Another set of accessors is column-oriented:

List<KeyValue> getColumn(byte[] family, byte[] qualifier)
KeyValue getColumnLatest(byte[] family, byte[] qualifier)
boolean containsColumn(byte[] family, byte[] qualifier)

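getColumn() returns multiple KeyValue instances for a column only if you asked for more than one version with setMaxVersions() beforehand; otherwise the list contains a single KeyValue. getColumnLatest() returns the newest cell of the given column. Unlike getValue(), it does not return the raw byte array of the value but the full KeyValue instance. containsColumn() is a convenient way to check whether the returned cells include the given column.

In all of these methods the qualifier may be null, which matches the column whose qualifier is empty. An empty qualifier means the column has no label. When you look at a table, for example in the shell, you need to know what such a column contains. Empty qualifiers are rare; they are used when a family holds only one column, in which case the column family name effectively serves as the column name.

A third set of methods gives map-oriented access to the retrieved data: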

NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()
NavigableMap<byte[], NavigableMap<byte[], byte[]>> getNoVersionMap()
NavigableMap<byte[], byte[]> getFamilyMap(byte[] family)

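getMap() is the most general call. It returns the entire result set as nested Java Map instances, which you can iterate over to reach every value. getNoVersionMap() does the same but includes only the latest version of each cell. The third method, getFamilyMap(), returns only the entries of the given column family; as its return type shows, it maps each qualifier to a single value.

Which set of accessors you use is a matter of preference. The data has already been transferred over the network from the server to the client, so there is no performance difference between them.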
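The nesting behind the map-oriented accessors can be sketched with plain JDK collections. ResultMapSketch is a hypothetical illustration, not the Result class: strings replace byte arrays, and the innermost map is assumed to sort timestamps newest first so that its first entry is the latest version.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ResultMapSketch {
    // family -> qualifier -> timestamp (newest first) -> value
    final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> map =
        new TreeMap<>();

    void add(String family, String qualifier, long timestamp, String value) {
        map.computeIfAbsent(family,
                f -> new TreeMap<String, NavigableMap<Long, String>>())
           .computeIfAbsent(qualifier,
                q -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
           .put(timestamp, value);
    }

    // Collapses the version level: keeps only the newest value per column.
    NavigableMap<String, NavigableMap<String, String>> noVersionMap() {
        NavigableMap<String, NavigableMap<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<String, NavigableMap<Long, String>>> fam
                : map.entrySet()) {
            NavigableMap<String, String> columns = new TreeMap<>();
            for (Map.Entry<String, NavigableMap<Long, String>> col
                    : fam.getValue().entrySet()) {
                // First entry is the newest because of the reversed ordering.
                columns.put(col.getKey(), col.getValue().firstEntry().getValue());
            }
            out.put(fam.getKey(), columns);
        }
        return out;
    }

    public static void main(String[] args) {
        ResultMapSketch result = new ResultMapSketch();
        result.add("colfam1", "qual1", 1L, "val1");
        result.add("colfam1", "qual1", 2L, "val2");
        result.add("colfam1", "qual2", 3L, "val3");
        System.out.println(result.map);
        System.out.println(result.noVersionMap());
    }
}
```

The first print shows every version per column, while the second keeps one value per column, which is the difference between the full map and the no-version map.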

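List of Gets

put() accepts a list of Put instances to insert in one call. Similarly, get() lets you request a list of Get instances from the server in one call. A batched get is an efficient way to read from HBase, although, as with batched puts, the order in which the individual requests are processed is not guaranteed.

As Figure 3-1 showed earlier, the requests may be sent to more than one server, but to the client it looks like a single call.

The API for batched gets is defined as follows: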

Result[] get(List<Get> gets) throws IOException

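As with batched puts, you first create a list to hold the Get instances that describe what to retrieve, and the server returns the matching Result instances. Example 3-9 retrieves the data and then reads the results in two different ways.

Example 3-9. Retrieving data from HBase using lists of Get instances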

byte[] cf1 = Bytes.toBytes("colfam1");
byte[] qf1 = Bytes.toBytes("qual1");
byte[] qf2 = Bytes.toBytes("qual2");
byte[] row1 = Bytes.toBytes("row1");
byte[] row2 = Bytes.toBytes("row2");

// Collect all Get instances in one list.
List<Get> gets = new ArrayList<Get>();

Get get1 = new Get(row1);
get1.addColumn(cf1, qf1);
gets.add(get1);

Get get2 = new Get(row2);
get2.addColumn(cf1, qf1);
gets.add(get2);

Get get3 = new Get(row2);
get3.addColumn(cf1, qf2);
gets.add(get3);

// One call retrieves the results for all three gets.
Result[] results = table.get(gets);

System.out.println("First iteration...");
for (Result result : results) {
    String row = Bytes.toString(result.getRow());
    System.out.print("Row: " + row + " ");
    byte[] val = null;
    if (result.containsColumn(cf1, qf1)) {
        val = result.getValue(cf1, qf1);
        System.out.println("Value: " + Bytes.toString(val));
    }
    if (result.containsColumn(cf1, qf2)) {
        val = result.getValue(cf1, qf2);
        System.out.println("Value: " + Bytes.toString(val));
    }
}

System.out.println("Second iteration...");
for (Result result : results) {
    for (KeyValue kv : result.raw()) {
        System.out.println("Row: " + Bytes.toString(kv.getRow()) +
            " Value: " + Bytes.toString(kv.getValue()));
    }
}

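The example first prepares byte arrays for the column family name, the column qualifiers, and the row keys. It then creates a list that holds all Get instances, and calls the get() method of HTable to read the data from the server in one batch. The first loop prints only the values of colfam1:qual1 and colfam1:qual2, checking for each column explicitly. The second loop prints every value that was returned.

Assuming you run Example 3-9 right after Example 3-4, you will see the following output: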

First iteration...
Row: row1 Value: val1
Row: row2 Value: val2
Row: row2 Value: val3
Second iteration...
Row: row1 Value: val1
Row: row2 Value: val2
Row: row2 Value: val3

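Both iterations print the same values. The example shows how to access the results of a batched get. What you have not seen yet is how errors are reported. This differs from the put calls discussed earlier: get() either returns an array with exactly as many entries as the list of Get instances, or it throws an exception. Example 3-10 demonstrates this.

Example 3-10. Trying to read from a nonexistent column family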

List<Get> gets = new ArrayList<Get>();

Get get1 = new Get(row1);
get1.addColumn(cf1, qf1);
gets.add(get1);

Get get2 = new Get(row2);
get2.addColumn(cf1, qf1);
gets.add(get2);

Get get3 = new Get(row2);
get3.addColumn(cf1, qf2);
gets.add(get3);

Get get4 = new Get(row2);
get4.addColumn(Bytes.toBytes("BOGUS"), qf2);
gets.add(get4);

Result[] results = table.get(gets);

System.out.println("Result count: " + results.length);

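The code adds the Get instances to a list, and one of them names a column family that does not exist. As a result the call throws an exception and the final print statement is never reached. Running the code produces an exception like this: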

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
  Failed 1 action: NoSuchColumnFamilyException: 1 time,
  servers with issues: 10.0.0.57:51640,

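batch() is an API that gives you finer control and can handle partial failures. It is introduced later, in the section on batch operations.

Related Retrieval Methods

A few more calls can be used to query data on the server. The first is: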

boolean exists(Get get) throws IOException

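You set up a Get instance as usual, but instead of returning the matching data, the server only reports whether a matching entry exists.

The region server handles an exists() request with the same lookup as a regular get, including loading file blocks to determine whether the given column or row key is present. The data itself is not sent over the network, however, so the call is useful when the columns in question are large.

Sometimes you want to look up a specific row, or the row just before it, and retrieve its data. For that you can use the following call: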

Result getRowOrBefore(byte[] row, byte[] family) throws IOException

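You must specify the row you are looking for and a column family. The family is required, as the method signature shows. The call returns the requested row or, if it does not exist, the row that sorts immediately before it. If no such row is found either, the method returns null. Example 3-11 queries the data inserted by the earlier put examples.

Example 3-11. Using a special retrieval method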

Result result1 = table.getRowOrBefore(Bytes.toBytes("row1"),
    Bytes.toBytes("colfam1"));
System.out.println("Found: " + Bytes.toString(result1.getRow()));

Result result2 = table.getRowOrBefore(Bytes.toBytes("row99"),
    Bytes.toBytes("colfam1"));
System.out.println("Found: " + Bytes.toString(result2.getRow()));

for (KeyValue kv : result2.raw()) {
    System.out.println(" Col: " + Bytes.toString(kv.getFamily()) +
        "/" + Bytes.toString(kv.getQualifier()) +
        ", Value: " + Bytes.toString(kv.getValue()));
}

Result result3 = table.getRowOrBefore(Bytes.toBytes("abc"),
    Bytes.toBytes("colfam1"));
System.out.println("Found: " + result3);

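The example first looks up an existing row and prints what was found. It then looks up a row that does not exist, which returns the last row of the table. Finally it looks up "abc", which sorts before row1, and prints null.

The output of the code is: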

Found: row1

Found: row2

 Col: colfam1/qual1, Value: val2
 Col: colfam1/qual2, Value: val3

Found: null

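The first call asks for an existing row and gets exactly that row back. The second call asks for "row99", which sorts after row2, so getRowOrBefore() returns row2. The last call asks for "abc", which sorts before row1 in lexicographic order, so getRowOrBefore() returns null. The loop over the second result shows that all columns of row2 in the given family were fetched, sorted by qualifier.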
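The "this row or the one before it" lookup behaves like a floor lookup in a sorted map. The following sketch reproduces the three cases with a TreeMap. It is an analogy under assumptions, not the server-side implementation: RowOrBeforeSketch is a hypothetical class, and string ordering stands in for the byte-wise ordering of row keys, which is equivalent for these ASCII keys.

```java
import java.util.TreeMap;

final class RowOrBeforeSketch {
    // Row keys kept in sorted order, as in a table.
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String row, String value) {
        rows.put(row, value);
    }

    // Returns the requested key, else the closest key sorting before it,
    // else null when no such key exists.
    String rowOrBefore(String row) {
        return rows.floorKey(row);
    }

    public static void main(String[] args) {
        RowOrBeforeSketch table = new RowOrBeforeSketch();
        table.put("row1", "val1");
        table.put("row2", "val2");
        System.out.println("Found: " + table.rowOrBefore("row1"));
        System.out.println("Found: " + table.rowOrBefore("row99"));
        System.out.println("Found: " + table.rowOrBefore("abc"));
    }
}
```

"row99" sorts after "row2" because the comparison is character by character, not numeric, which is why the second lookup lands on row2.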

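The Delete Operation

You can now create, read, and update data in HBase tables. What remains is deleting it. As you may have guessed, HTable provides a delete() method, accompanied by a class named Delete.

Single Deletes

The single-delete variant of delete() is: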

void delete(Delete delete) throws IOException

As with get() and put(), you first create a Delete instance and then add the details of what to remove. The constructors of Delete are:

Delete(byte[] row)

Delete(byte[] row, long timestamp, RowLock rowLock)

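You must provide the row key of the row to delete, and you may additionally provide a RowLock. This is the same as for single puts and gets. Often you do not want to remove the entire row, but only a particular version, or the data of one column family or one column qualifier. The following methods narrow down the delete: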

Delete deleteFamily(byte[] family)
Delete deleteFamily(byte[] family, long timestamp)
Delete deleteColumns(byte[] family, byte[] qualifier)
Delete deleteColumns(byte[] family, byte[] qualifier, long timestamp)
Delete deleteColumn(byte[] family, byte[] qualifier)
void setTimestamp(long timestamp)

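The first method deletes everything in the given column family, including all of its columns. The second deletes all versions in the given family that are equal to or older than the timestamp. The third deletes all versions of the given column. The fourth does the same as the third, but only for versions equal to or older than the timestamp. The fifth, deleteColumn(), deletes only the latest version of the given column. The last method sets a timestamp on the Delete instance, which removes the versions equal to or older than it. Table 3-5 summarizes the delete calls and their effect in a more readable form.

Table 3-5. Functionality matrix of the delete() calls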

Method	Description	Delete with timestamp
none	Entire row, i.e., all columns, all versions.	All versions of all columns in all column families, whose timestamp is equal to or older than the given timestamp.
deleteColumn()	Only the latest version of the given column; older versions are kept.	Only exactly the specified version of the given column, with the matching timestamp. If nonexistent, nothing is deleted.
deleteColumns()	All versions of the given column.	Versions equal to or older than the given timestamp of the given column.
deleteFamily()	All columns (including all versions) of the given family.	Versions equal to or older than the given timestamp of all columns of the given family.

Table 3-6 lists the other methods provided by the Delete class.

Table 3-6. Other methods provided by the Delete class

Method	Description
getRow()	Returns the row key as specified when creating the Delete instance.
getRowLock()	Returns the row RowLock instance for the current Delete instance.
getLockId()	Returns the optional lock ID handed into the constructor using the rowLock parameter. Will be -1L if not set.
getTimeStamp()	Retrieves the associated timestamp of the Delete instance.
isEmpty()	Checks if the family map contains any entries. In other words, if you specified any column family, or column qualifier, that should be deleted.
getFamilyMap()	Gives you access to the added column families and specific columns, as added by the deleteFamily() and/or deleteColumn()/deleteColumns() calls. The returned map uses the family name as the key, and the value it points to is a list of added column qualifiers for this particular family.

Example 3-12 shows how a client issues a single delete.

Example 3-12. Deleting data from HBase

Delete delete = new Delete(Bytes.toBytes("row1"));
delete.setTimestamp(1);
delete.deleteColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"), 1);
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual1"));
delete.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), 15);
delete.deleteFamily(Bytes.toBytes("colfam3"));
delete.deleteFamily(Bytes.toBytes("colfam3"), 3);
table.delete(delete);
table.close();

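The example creates a Delete instance for row1 and sets its timestamp to 1. It then deletes one specific version of the column colfam1:qual1. Next it deletes all versions of colfam2:qual1, followed by the given version and all older versions of colfam2:qual3. After that it deletes the entire family colfam3 with all its versions, and then the versions of colfam3 that are equal to or older than 3. Finally it executes the delete.

The example lists every call you can use to parameterize a delete. You can enable them one at a time and observe the resulting output.

When a timestamp is given, deleteColumn() deletes only the one cell whose timestamp matches exactly, while deleteColumns() deletes every version whose timestamp is equal to or older than the given one.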
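The difference between the two timestamped calls can be sketched on the versions of a single column, held in a sorted map from timestamp to value. DeleteVersionsSketch and its method names are hypothetical and only mirror the semantics described in Table 3-5; real deletes are recorded as markers on the server rather than removing entries in place.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class DeleteVersionsSketch {
    // Three versions of one hypothetical column, keyed by timestamp.
    static NavigableMap<Long, String> sample() {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(1L, "val1");
        versions.put(2L, "val2");
        versions.put(3L, "val3");
        return versions;
    }

    // Like deleteColumn() with a timestamp: only the exact version goes.
    static void deleteColumn(NavigableMap<Long, String> versions, long ts) {
        versions.remove(Long.valueOf(ts));
    }

    // Like deleteColumns() with a timestamp: every version <= ts goes.
    static void deleteColumns(NavigableMap<Long, String> versions, long ts) {
        // headMap returns a live view, so clearing it edits the backing map.
        versions.headMap(ts, true).clear();
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> a = sample();
        deleteColumn(a, 2L);
        System.out.println("deleteColumn(2):  " + a);

        NavigableMap<Long, String> b = sample();
        deleteColumns(b, 2L);
        System.out.println("deleteColumns(2): " + b);
    }
}
```

With the same starting data, the first call leaves versions 1 and 3 in place, while the second leaves only version 3.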

List of Deletes

Batched deletes work much like batched puts: you create a list of Delete instances and hand it to the delete() method.

void delete(List<Delete> deletes) throws IOException

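Example 3-13 shows a batched delete. When you run it, the state of the table is printed before and after the delete. The raw KeyValue instances are printed with KeyValue.toString(), which was introduced earlier.

As with the other batched operations, you cannot rely on the order in which the server receives the Delete instances. The API may reorder them. If the order of a set of deletes matters to you, split them into smaller lists and execute those in the order you need. In the worst case you issue the deletes one at a time.

Example 3-13. Deleting data using a list of Delete instances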

List<Delete> deletes = new ArrayList<Delete>();

Delete delete1 = new Delete(Bytes.toBytes("row1"));
delete1.setTimestamp(4);
deletes.add(delete1);

Delete delete2 = new Delete(Bytes.toBytes("row2"));
delete2.deleteColumn(Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"));
delete2.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), 5);
deletes.add(delete2);

Delete delete3 = new Delete(Bytes.toBytes("row3"));
delete3.deleteFamily(Bytes.toBytes("colfam1"));
delete3.deleteFamily(Bytes.toBytes("colfam2"), 3);
deletes.add(delete3);

table.delete(deletes);
table.close();

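The example creates a list of Delete instances. delete1 sets a timestamp for a row-level delete on row1. delete2 deletes only the latest version of colfam1:qual1 in row2, and all versions of colfam2:qual3 that are equal to or older than 5. delete3 removes the entire family colfam1 from row3, and the versions of colfam2 that are equal to or older than 3. Finally the whole list is executed through HTable.

The content of the table before and after the deletes is shown here: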

Before delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row1/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam2:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam2:qual3/5/Put/vlen=4, Value: val5
KV: row2/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row2/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row2/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row2/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row2/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row2/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row2/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row2/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row2/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row2/colfam2:qual2/3/Put/vlen=4, Value: val3
KV: row2/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row2/colfam2:qual3/5/Put/vlen=4, Value: val5
KV: row3/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row3/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row3/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row3/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row3/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row3/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row3/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row3/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row3/colfam2:qual2/3/Put/vlen=4, Value: val3
KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5
After delete call...
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row1/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam2:qual3/5/Put/vlen=4, Value: val5
KV: row2/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row2/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row2/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row2/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row2/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row2/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row2/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row2/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row2/colfam2:qual2/3/Put/vlen=4, Value: val3
KV: row2/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5

The output above is printed with the following statement:

System.out.println("KV: " + kv.toString() +
    ", Value: " + Bytes.toString(kv.getValue()));

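By now you should be familiar with the Bytes class, especially for handling KeyValue structures. KeyValue.toString() prints only the key part, not the value, because the value can be very large. In this example we inserted the values ourselves and know they are short, so it is safe to print them to the console. You can use the same approach for debugging in your own applications.

If you look at the complete example in the code repository that accompanies the book, you will see how the data is inserted and read back. Next comes error handling for batched deletes. The list of Delete instances passed to delete() is modified by the call: when it returns or fails, the list contains only the Delete instances that could not be applied on the server.

Example 3-14. Deleting faulty data from HBase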

Delete delete4 = new Delete(Bytes.toBytes("row2"));
delete4.deleteColumn(Bytes.toBytes("BOGUS"), Bytes.toBytes("qual1"));
deletes.add(delete4);

try {
    table.delete(deletes);
} catch (Exception e) {
    System.err.println("Error: " + e);
}
table.close();

System.out.println("Deletes length: " + deletes.size());
for (Delete delete : deletes) {
    System.out.println(delete);
}

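The example adds a Delete instance that refers to a column family that does not exist, so executing the deletes raises an exception. It then checks the size of the list after the call and prints the Delete instances that failed.

Example 3-14 extends Example 3-13 with one faulty Delete instance. The output is similar to that of Example 3-13, with a few additional lines: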

Before delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
...
KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
  Failed 1 action: NoSuchColumnFamilyException: 1 time,
  servers with issues: 10.0.0.43:59057,
Deletes length: 1
row=row2, ts=9223372036854775807, families={(family=BOGUS, keyvalues= \
  (row2/BOGUS:qual1/9223372036854775807/Delete/vlen=0)}
After delete call...
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5
...
KV: row3/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row3/colfam2:qual3/5/Put/vlen=4, Value: val5

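As expected, the list contains one remaining Delete instance, the one that refers to the "BOGUS" column family. Printing it with toString() shows the failed operation, and the family name is the obvious cause of the failure. You can use the same technique in your own code to find out which deletes failed; the reason is often this obvious.

Finally, the exception is similar to those in the earlier examples: this is the second time RetriesExhaustedWithDetailsException appears. It reports how many times the operation was tried. Later chapters cover how to monitor the servers, which is where the server address reported in the exception becomes useful.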
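The "only the failures stay in the list" behavior can be imitated with a few lines of plain Java. This is a deliberately reduced sketch: each delete is represented only by the column family it targets, FailedDeletesSketch and its fixed set of families are hypothetical, and no exception is thrown, unlike the real call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FailedDeletesSketch {
    // Families of a hypothetical table.
    static final Set<String> FAMILIES =
        new HashSet<>(Arrays.asList("colfam1", "colfam2"));

    // Applies every delete whose family exists and removes it from the
    // list. Whatever is left afterwards are the failed deletes.
    // Returns true if all deletes succeeded.
    static boolean applyAll(List<String> deletes) {
        deletes.removeIf(FAMILIES::contains);
        return deletes.isEmpty();
    }

    public static void main(String[] args) {
        List<String> deletes =
            new ArrayList<>(Arrays.asList("colfam1", "colfam2", "BOGUS"));
        boolean ok = applyAll(deletes);
        System.out.println("All applied: " + ok);
        System.out.println("Deletes length: " + deletes.size());
        for (String failed : deletes) {
            System.out.println("Failed delete on family: " + failed);
        }
    }
}
```

Because the caller's list is modified in place, the list must be mutable, and any code that needs the original set of deletes afterwards has to keep its own copy.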

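Atomic compare-and-delete

The earlier section "Atomic compare-and-set" showed how to insert data into an HBase table with an atomic check. A similar call exists for deletes: the check is performed atomically on the server, and the delete is applied only if the check passes: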

boolean checkAndDelete(byte[] row, byte[] family, byte[] qualifier,
    byte[] value, Delete delete) throws IOException

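You specify the row key, column family, qualifier, and value to check. If the check passes, the delete is executed; if it fails, the delete is not applied. The call returns true when the delete was applied. Example 3-15 shows how it is used.

Example 3-15. Deleting values with the atomic compare-and-set operation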

Delete delete1 = new Delete(Bytes.toBytes("row1"));
delete1.deleteColumns(Bytes.toBytes("colfam1"), Bytes.toBytes("qual3"));

boolean res1 = table.checkAndDelete(Bytes.toBytes("row1"),
    Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), null, delete1);
System.out.println("Delete successful: " + res1);

Delete delete2 = new Delete(Bytes.toBytes("row1"));
delete2.deleteColumns(Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"));
table.delete(delete2);

boolean res2 = table.checkAndDelete(Bytes.toBytes("row1"),
    Bytes.toBytes("colfam2"), Bytes.toBytes("qual3"), null, delete1);
System.out.println("Delete successful: " + res2);

Delete delete3 = new Delete(Bytes.toBytes("row2"));
delete3.deleteFamily(Bytes.toBytes("colfam1"));

try {
    boolean res4 = table.checkAndDelete(Bytes.toBytes("row1"),
        Bytes.toBytes("colfam1"), Bytes.toBytes("qual1"),
        Bytes.toBytes("val1"), delete3);
    System.out.println("Delete successful: " + res4);
} catch (Exception e) {
    System.err.println("Error: " + e);
}

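The example first creates a Delete instance for the column colfam1:qual3 of row1. It then checks that the column colfam2:qual3 does not exist in row1 and, if so, applies the delete, printing the result. Because the checked column does exist, the run prints "Delete successful: false". After the checked column is deleted manually, checkAndDelete() is called again; this time it succeeds and prints "Delete successful: true". Finally, the example tries a check and a delete that refer to different row keys, which throws an exception. The output of the complete code is: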

Before delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam1:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam1:qual3/5/Put/vlen=4, Value: val5
KV: row1/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam2:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam2:qual3/6/Put/vlen=4, Value: val6
KV: row1/colfam2:qual3/5/Put/vlen=4, Value: val5
Delete successful: false
Delete successful: true
After delete call...
KV: row1/colfam1:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam1:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam1:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam1:qual2/3/Put/vlen=4, Value: val3
KV: row1/colfam2:qual1/2/Put/vlen=4, Value: val2
KV: row1/colfam2:qual1/1/Put/vlen=4, Value: val1
KV: row1/colfam2:qual2/4/Put/vlen=4, Value: val4
KV: row1/colfam2:qual2/3/Put/vlen=4, Value: val3
Error: org.apache.hadoop.hbase.DoNotRetryIOException:
  org.apache.hadoop.hbase.DoNotRetryIOException:
  Action's getRow must match the passed row
...

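Passing null as the value to check means the check passes only if the column does not exist. The checked column existed before the first call, so the first check returned false and the delete was not applied. After the checked column was deleted manually, the second checkAndDelete() call passed the check and the delete was applied.

This matches the compare-and-set call for puts described earlier: the check and the modification must operate on the same row, and an exception is thrown if the rows differ. Within that row, however, the check and the modification may refer to two different column families.

The example cannot convey how important the check-and-modify operations are. In a distributed system it is hard to perform such operations reliably and atomically without paying the performance cost of external locking, where the client holds an exclusive lock on the entire row while it works. If the client fails while holding the lock, the cluster has to make sure the locked row is unlocked again. Client-side locking also requires additional RPC calls, which makes it slower than a single server-side operation.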
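The core idea, a check and a delete that execute as one indivisible step, can be sketched in memory. CheckAndDeleteSketch is a hypothetical class with one synchronized method per operation; it models a single row as a map from column name to value and says nothing about how the region server implements the real call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class CheckAndDeleteSketch {
    // One row: column name -> latest value.
    private final Map<String, String> cells = new HashMap<>();

    synchronized void put(String column, String value) {
        cells.put(column, value);
    }

    synchronized void delete(String column) {
        cells.remove(column);
    }

    synchronized boolean contains(String column) {
        return cells.containsKey(column);
    }

    // Deletes targetColumn only if checkColumn currently holds expected.
    // expected == null means "checkColumn must not exist".
    // The check and the delete run under the same lock, so no other
    // thread can change the row in between.
    synchronized boolean checkAndDelete(String checkColumn, String expected,
                                        String targetColumn) {
        if (!Objects.equals(cells.get(checkColumn), expected)) {
            return false;
        }
        cells.remove(targetColumn);
        return true;
    }

    public static void main(String[] args) {
        CheckAndDeleteSketch row = new CheckAndDeleteSketch();
        row.put("colfam1:qual3", "val6");
        row.put("colfam2:qual3", "val6");

        // The checked column exists, so the check for "absent" fails.
        System.out.println("Delete successful: "
            + row.checkAndDelete("colfam2:qual3", null, "colfam1:qual3"));

        // Remove the checked column, then the same call succeeds.
        row.delete("colfam2:qual3");
        System.out.println("Delete successful: "
            + row.checkAndDelete("colfam2:qual3", null, "colfam1:qual3"));
    }
}
```

A client that instead read the column, decided locally, and then sent a separate delete would leave a window in which another writer could change the row between the two steps.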
