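Thrift Encoding and Decoding in Detail
A Thrift struct is encoded in a fixed format into a byte string for network transmission. The upper layer usually wraps the encode and decode calls, encoding an object into a string and decoding an object from a string: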
// Decode: string (buffer) -> object (data)
boost::shared_ptr<TMemoryBuffer> mem_buffer(new TMemoryBuffer());
mem_buffer->resetBuffer(
reinterpret_cast<uint8_t*>(const_cast<char*>(buffer.c_str())),
buffer.size());
TBinaryProtocol protocol(mem_buffer);
data.read(&protocol);
// Encode: object (data) -> string (buffer)
boost::shared_ptr<TMemoryBuffer> mem_buffer(new TMemoryBuffer());
TBinaryProtocol protocol(mem_buffer);
data.write(&protocol);
buffer = mem_buffer->getBufferAsString();
Take the following struct as an example:
struct Object
{
1:i64 a;
2:double b;
3:optional binary c;
4:bool d;
5:optional list<i32> e;
}
The class generated by the Thrift compiler has read and write methods, used to decode and encode the object respectively:
uint32_t read(::apache::thrift::protocol::TProtocol* iprot);
uint32_t write(::apache::thrift::protocol::TProtocol* oprot) const;
First, how an object is encoded into a TProtocol:
uint32_t Object::write(::apache::thrift::protocol::TProtocol* oprot) const {
uint32_t xfer = 0;
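oprot->incrementRecursionDepth(); // guards against stack overflow from deeply nested structs
xfer += oprot->writeStructBegin("Object"); // does nothing in TBinaryProtocol
xfer += oprot->writeFieldBegin("a", ::apache::thrift::protocol::T_I64, 1); // the name "a" is unused; see writeFieldBegin below
xfer += oprot->writeI64(this->a); // packs the value in binary form into oprot
xfer += oprot->writeFieldEnd(); // does nothing in TBinaryProtocol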
if (this->__isset.c) {
...
}
if (this->__isset.e) {
... iterate over the list and write each element
}
xfer += oprot->writeFieldStop(); // writes T_STOP = 0
xfer += oprot->writeStructEnd(); // does nothing in TBinaryProtocol
oprot->decrementRecursionDepth(); // decrements the recursion counter
return xfer;
}
template <class Transport_, class ByteOrder_>
uint32_t TBinaryProtocolT<Transport_, ByteOrder_>::writeFieldBegin(const char* name,
const TType fieldType,
const int16_t fieldId) {
(void)name;
uint32_t wsize = 0;
wsize += writeByte((int8_t)fieldType);
wsize += writeI16(fieldId);
return wsize;
}
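Consider an object with the following field values, written in shorthand for fields a through e:
Object(100, 34.1, "0123456789", false, vector<int>(3, 4))
Its encoded form in hexadecimal is shown below. Each field consists of a 1-byte type, a 2-byte field_id, and the value itself (the - separators are added only for readability).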
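0A00010000000000000064-04000240410CCCCCCCCCCD-0B00030000000A30313233343536373839-02000400-0F00050800000003000000040000000400000004-00
- T_I64 has type 10 (0x0A), field_id 1 (0x0001), and value 100 (0x0000000000000064).
- T_DOUBLE has type 4 (0x04), field_id 2 (0x0002), and value 34.1, stored as its 8-byte IEEE 754 double-precision bit pattern (0x40410CCCCCCCCCCD).
- T_STRING has type 11 (0x0B), field_id 3 (0x0003), and value "0123456789": a 4-byte length of 10 (0x0000000A) followed by the 10 bytes of the string.
- T_BOOL has type 2 (0x02), field_id 4 (0x0004), and value false (0x00).
- T_LIST has type 15 (0x0F), field_id 5 (0x0005); the element type is T_I32 (0x08), followed by the element count 3 (0x00000003) and three elements, each 0x00000004.
- The struct ends with 0x00 (T_STOP).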
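The byte layout listed above can be reproduced without the Thrift library. The following is a minimal sketch, assuming big-endian integers and the type codes quoted in the list; the helper names (put_be, put_field, encode_sample, to_hex) are hypothetical and this is not Thrift's own implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Type codes as quoted in the walkthrough above.
enum : uint8_t { kStop = 0, kBool = 2, kDouble = 4, kI32 = 8, kI64 = 10, kString = 11, kList = 15 };

// Append the low n bytes of v in big-endian (network) order.
static void put_be(std::string& out, uint64_t v, int n) {
  assert(n >= 1 && n <= 8);
  for (int shift = 8 * (n - 1); shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Field header: 1-byte type followed by 2-byte field id.
static void put_field(std::string& out, uint8_t type, int16_t id) {
  put_be(out, type, 1);
  put_be(out, static_cast<uint16_t>(id), 2);
}

// Encode a=100, b=34.1, c="0123456789", d=false, e={4,4,4}.
std::string encode_sample() {
  std::string out;
  put_field(out, kI64, 1);
  put_be(out, 100, 8);

  const double b = 34.1;
  uint64_t bits = 0;
  std::memcpy(&bits, &b, sizeof bits);  // take the IEEE 754 bit pattern as-is
  put_field(out, kDouble, 2);
  put_be(out, bits, 8);

  const std::string c = "0123456789";
  put_field(out, kString, 3);
  put_be(out, c.size(), 4);  // 4-byte length prefix
  out += c;

  put_field(out, kBool, 4);
  put_be(out, 0, 1);  // false

  const std::vector<int32_t> e(3, 4);
  put_field(out, kList, 5);
  put_be(out, kI32, 1);      // element type
  put_be(out, e.size(), 4);  // element count
  for (int32_t v : e) put_be(out, static_cast<uint32_t>(v), 4);

  put_be(out, kStop, 1);
  return out;
}

// Hex-dump through unsigned char so bytes >= 0x80 are not sign-extended.
std::string to_hex(const std::string& bytes) {
  std::string hex;
  char buf[3];
  for (unsigned char ch : bytes) {
    std::snprintf(buf, sizeof buf, "%02X", ch);
    hex += buf;
  }
  return hex;
}
```

Printing to_hex(encode_sample()) should give the same 64 bytes as the dump, without the - separators. Dumping through unsigned char matters: formatting a plain signed char with %X would print a byte such as 0xCC as FFFFFFCC.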
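With that in mind, the read method is much easier to follow:
uint32_t Object::read(::apache::thrift::protocol::TProtocol* iprot) {
uint32_t xfer = 0;
std::string fname;
::apache::thrift::protocol::TType ftype;
int16_t fid;
xfer += iprot->readStructBegin(fname); // does nothing in TBinaryProtocol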
while (true)
{
xfer += iprot->readFieldBegin(fname, ftype, fid);
if (ftype == ::apache::thrift::protocol::T_STOP) {
break;
}
switch (fid)
{
case 1:
if (ftype == ::apache::thrift::protocol::T_I64) {
xfer += iprot->readI64(this->a);
this->__isset.a = true;
} else {
xfer += iprot->skip(ftype);
}
break;
... (several cases omitted)
case 5:
if (ftype == ::apache::thrift::protocol::T_LIST) {
{
this->e.clear();
uint32_t _size0;
::apache::thrift::protocol::TType _etype3;
xfer += iprot->readListBegin(_etype3, _size0);
this->e.resize(_size0);
uint32_t _i4;
for (_i4 = 0; _i4 < _size0; ++_i4)
{
xfer += iprot->readI32(this->e[_i4]);
}
xfer += iprot->readListEnd();
}
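this->__isset.e = true;
} else {
xfer += iprot->skip(ftype);
}
break;
default:
xfer += iprot->skip(ftype); // unknown field_id: skip its value
break;
}
xfer += iprot->readFieldEnd();
}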
xfer += iprot->readStructEnd();
return xfer;
}
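The dispatch loop of read can also be imitated without Thrift. The sketch below is hand-rolled under the same layout assumptions as the hex dump; Reader, Decoded, decode and from_hex are hypothetical names. The reader deliberately knows only field_id 1, 4 and 5, so fields 2 and 3 take the skip path, which is the same path an older reader would take for fields it has never heard of.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

enum : uint8_t { kStop = 0, kBool = 2, kDouble = 4, kI32 = 8, kI64 = 10, kString = 11, kList = 15 };

// Turn a hex string such as "0A0001" into raw bytes.
std::string from_hex(const std::string& hex) {
  assert(hex.size() % 2 == 0);
  std::string out;
  for (std::size_t i = 0; i < hex.size(); i += 2)
    out.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
  return out;
}

struct Reader {
  const std::string& buf;
  std::size_t pos = 0;
  explicit Reader(const std::string& b) : buf(b) {}

  // Read n big-endian bytes as an unsigned integer.
  uint64_t be(std::size_t n) {
    if (n > 8 || n > buf.size() - pos) throw std::runtime_error("truncated input");
    uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v = (v << 8) | static_cast<unsigned char>(buf[pos++]);
    return v;
  }

  // Step over a value of the given wire type without interpreting it.
  void skip(uint8_t type) {
    switch (type) {
      case kBool: be(1); break;
      case kI32: be(4); break;
      case kI64:
      case kDouble: be(8); break;
      case kString: {
        const std::size_t len = static_cast<std::size_t>(be(4));
        if (len > buf.size() - pos) throw std::runtime_error("truncated string");
        pos += len;
        break;
      }
      case kList: {
        const uint8_t elem = static_cast<uint8_t>(be(1));
        const uint32_t count = static_cast<uint32_t>(be(4));
        for (uint32_t i = 0; i < count; ++i) skip(elem);
        break;
      }
      default: throw std::runtime_error("type not handled in this sketch");
    }
  }
};

struct Decoded {
  int64_t a = 0;
  bool d = false;
  std::vector<int32_t> e;
  int skipped = 0;  // number of fields that went through skip()
};

// Dispatch on field id; unknown ids and mismatched types are skipped.
Decoded decode(const std::string& bytes) {
  Reader r(bytes);
  Decoded out;
  while (true) {
    const uint8_t type = static_cast<uint8_t>(r.be(1));
    if (type == kStop) break;
    const int16_t id = static_cast<int16_t>(r.be(2));
    if (id == 1 && type == kI64) {
      out.a = static_cast<int64_t>(r.be(8));
    } else if (id == 4 && type == kBool) {
      out.d = r.be(1) != 0;
    } else if (id == 5 && type == kList) {
      const uint8_t elem = static_cast<uint8_t>(r.be(1));
      const uint32_t count = static_cast<uint32_t>(r.be(4));
      if (elem != kI32) throw std::runtime_error("unexpected element type");
      for (uint32_t i = 0; i < count; ++i) out.e.push_back(static_cast<int32_t>(r.be(4)));
    } else {
      r.skip(type);
      ++out.skipped;
    }
  }
  return out;
}
```

Because every value is either fixed-size or carries its own length, the reader can step over a field using only the type byte, without knowing the field's name or meaning.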
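Encoding and decoding are not complicated. The key point is the field_id, which must never be changed. Field names are not written to the wire at all, so even if a name is missing or renamed, or fields are added or removed, parsing is unaffected as long as the field_ids stay compatible; a field whose id is unknown or whose type does not match is simply skipped.
As for optional fields: when adding a field you need, it is enough to make sure its field_id does not conflict with an existing one, and optional is best avoided where possible. For decoding (the read method), it makes no difference whether a field is optional. For encoding (write), however, an optional field is written only when its __isset flag is true, so assigning with a.b = c instead of calling the generated __set_ method silently drops the field, which does more harm than good.
Modifying a thrift file is nothing to be afraid of, and there is no need to mark every newly added field as optional.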
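The pitfall with optional fields can be shown with a hand-written stand-in for one optional string field. This is a sketch only: MiniObject, isset and set_c are hypothetical names that mirror the shape of the generated __isset flag and __set_ setter, and the struct is not produced by the Thrift compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct MiniObject {
  struct Isset { bool c = false; } isset;  // stands in for the generated __isset
  std::string c;                           // stands in for "3:optional binary c"

  // Setter in the style of the generated __set_ methods:
  // assigns the value and marks the field as present.
  void set_c(const std::string& val) {
    c = val;
    isset.c = true;
  }

  // Writes field 3 only when its flag is set, then the stop byte.
  std::string write() const {
    assert(c.size() <= 0x7FFFFFFF);  // the length prefix is a 32-bit signed integer
    std::string out;
    if (isset.c) {
      out.push_back(0x0B);  // T_STRING
      out.push_back(0x00);  // field id 3, high byte
      out.push_back(0x03);  // field id 3, low byte
      const uint32_t len = static_cast<uint32_t>(c.size());
      for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((len >> shift) & 0xFF));
      out += c;
    }
    out.push_back(0x00);  // T_STOP
    return out;
  }
};
```

With this stand-in, assigning obj.c = "hi" directly and then calling write() yields only the stop byte, while calling obj.set_c("hi") first yields the full field followed by the stop byte.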