The read latency of random Optane DC memory loads is 305 ns This latency is about 3× slower than local DRAM
Optane DC memory latency is significantly better (2×) when accessed in a sequential pattern. This result indicates that Optane DC PMMs merge adjacent requests into a single 256 byte access
Our six interleaved Optane DC PMMs’ maximum read bandwidth is 39.4 GB/sec, and their maximum write bandwidth is 13.9 GB/sec. This experiment utilizes our six interleaved Optane DC PMMs, so accesses are spread across the devices
Optane DC reads scale with thread count; whereas writes do not. Optane DC memory bandwidth scales with thread count, achieving maximum throughput at 17 threads. However, four threads are enough to saturate Optane DC memory write bandwidth
The application-level Optane DC bandwidth is affected by access size. To fully utilize the Optane DC device bandwidth, 256 byte or larger accesses are preferred
Optane DC is more affected than DRAM by access patterns. Optane DC memory is vulnerable to workloads with mixed reads and writes
Optane DC bandwidth is significantly higher (4×) when accessed in a sequential pattern. This result indicates that Optane DC PMMs contain access to merging logic to merge overlapping memory requests — merged, sequential, accesses do not pay the write amplification cost associated with the NVDIMM’s 256 byte access size
如果你正在开发针对 PMem 的程序,一定要仔细理解。
PMem 的好处当然很多,延迟低、峰值带宽高、按字节访问,这没什么好说的,毕竟是 DIMM 接口,价格也在那里摆着。
这里我想给大家提几个 PMem 的坑,这可能是很多测试报告里面不会提到的,而是我们亲身经历过的,供大家参考:
尽管峰值带宽高,但单线程吞吐率低,甚至远比不上通过 SPDK 访问 NVMe 设备。举例来说,Intel 曾发布过一个报告,利用 SPDK,在 Block Size 为 4KB 的情况下,单线程可以达到 10 Million 的 IOPS。而根据我们测试的结果,在 PMem 上,在 Block Size 为 4KB 时,单线程只能达到 1 Million 的 IOPS。这是由于目前 PMDK 统一采用同步的方式访问数据,所有内存拷贝都由 CPU 来完成,导致 CPU 成为了性能瓶颈。为了解决这个问题,我们采用了 DMA 的方式进行访问,极大的提高了单线程访问的效率。关于这块信息,我们将在未来用单独的文章进行讲解。