Libbson

php中文网
发布: 2016-06-07 16:30:56
原创
1143人浏览过

libbson is a new shared library written in c for developers wanting to work with the bson serialization format. Its API will feel natural to C programmers but can also be used as the base of a C extension in higher-level MongoDB drivers. T

libbson is a new shared library written in c for developers wanting to work with the bson serialization format.

Its API will feel natural to C programmers but can also be used as the base of a C extension in higher-level MongoDB drivers.

The library contains everything you would expect from a BSON implementation. It has the ability to work with documents in their serialized form, iterating elements within a document, overwriting fields in place, Object Id generation, JSON conversion, data validation, and more. Some lessons were learned along the way that are beneficial for those choosing to implement BSON themselves.

Improving small document performance

A common use case of BSON is for relatively small documents. This has a profound impact on the memory allocator in userspace, causing what is commonly known as “memory fragmentation”. Memory fragmentation can make it more difficult for your allocator to locate a contiguous region of memory.

In addition to increasing allocation latency, it increases the memory requirements of your application to overcome that fragmentation.

To help with this issue, the bson_t structure contains 120 bytes of inline space that allows BSON documents to be built directly on the stack as opposed to the heap.

When the document size grows past 120 bytes it will automatically migrate to a heap allocation.

Additionally, bson_t will grow it’s buffers in powers of two. This is standard when working with buffers and arrays as it amortizes the overhead of growing the buffer versus calling realloc() every time data is appended. 120 bytes was chosen to align bson_t to the size of two sequential cachelines on x86_64 (each 64 bytes).

This may change based on future research, but not before a stable ABI has been reached.

Single allocation for nested documents

One strength of BSON is it’s ability to nest objects and arrays. Often times when serializing these nested documents, each sub-document is serialized independently and then appended to the parents buffer.

As you might imagine, this takes quite the toll on the allocator. It can generate many small allocations which were only created to have been immediately discarded after appending to the parents buffer. Libbson allows for building sub-documents directly into the parent documents buffer.

Doing so helps avoid this costly fragmentation. The topmost document will grow its underlying buffers in powers of two each time the allocation would overflow.

Parsing BSON documents from network buffers

Another common area for allocator fragmentation is during BSON document parsing. Libbson allows parsing and iteration of BSON documents directly from your incoming network buffer.

This means the only allocations created are those needed for your higher level language such as a PyDict if writing a Python extension.

Developers writing C extensions for their driver may choose to implement a “generator” style parsing of documents to help keep memory fragmentation low.

A technique we’re yet to explore is implementing a hashtable-esque structure backed by BSON, only deserializing the entire buffer after a threshold of keys have been accessed.

Generating BSON documents into network buffers

Much like parsing BSON documents, generating documents and placing them into your network buffers can be hard on your memory allocator. To help keep this fragmentation down, Libbson provides support for serializing your document to BSON directly within a buffer of your choosing.

This is ideal for situations such as writing a sequence of BSON documents into a MongoDB message.

Generating Object Ids without Synchronization

Applications are often doing ObjectId generation, especially in high insert environments. The uniqueness of generated ObjectIds is critical to avoiding duplicate key errors across multiple nodes.

Highly threaded environments create a local contention point slowing the rate of generation. This is because the threads must synchronize on the increment counter of each sequential ObjectId. Failure to do so could cause collisions that would not be detected until after a network round-trip. Most drivers implement the synchronization with an atomic increment or a mutex if atomics are not available.

Libbson will use atomic increments and in some cases avoid synchronization altogether if possible. One such case is a non-threaded environment.

Another is when running on Linux as both threads and processes are in the same namespace.

This allows the use of the thread identifier as the pid within the ObjectId.

You can find Libbson at https://github.com/mongodb/libbson and discuss design choices with its author, Christian Hergert, who can be found on twitter as @hergertme.

最佳 Windows 性能的顶级免费优化软件
最佳 Windows 性能的顶级免费优化软件

每个人都需要一台速度更快、更稳定的 PC。随着时间的推移,垃圾文件、旧注册表数据和不必要的后台进程会占用资源并降低性能。幸运的是,许多工具可以让 Windows 保持平稳运行。

下载
来源:php中文网
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn
最新问题
开源免费商场系统广告
热门教程
更多>
最新下载
更多>
网站特效
网站源码
网站素材
前端模板
关于我们 免责申明 意见反馈 讲师合作 广告合作 最新更新
php中文网:公益在线php培训,帮助PHP学习者快速成长!
关注服务号 技术交流群
PHP中文网订阅号
每天精选资源文章推送
PHP中文网APP
随时随地碎片化学习
PHP中文网抖音号
发现有趣的

Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号