fio

发表于 2022-10-02

引子

最近在阅读Kafka的文档时有这么一段话：7200rpm的STAT硬盘的顺序写的带宽可以达到600MB/s，而随机写只有100kB/s，差距为6000倍

The key fact about disk performance is that the throughput of hard drives has been diverging from the latency of a disk seek for the last decade. As a result the performance of linear writes on a JBOD configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec but the performance of random writes is only about 100k/sec—a difference of over 6000X. These linear reads and writes are the most predictable of all usage patterns, and are heavily optimized by the operating system. A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes. A further discussion of this issue can be found in this ACM Queue article; they actually find that sequential disk access can in some cases be faster than random memory access!

FIO数据对比

环境准备

组件	版本
ubuntu	16.04
fio	2.2.10
硬盘	ST1000DM010-2EP102 （7200 rpm）（1TB）

实验数据

1. 同步顺序写(DIRECT)

采用fio作为测试工具，strace观察系统调用。这里采取的是O_DIRECT的方式来屏蔽pagecache，直接测试硬盘的性能。

strace -f -tt -o /tmp/write.log -D /usr/bin/fio --name=write_test --filename=write_test --bs=4k --size=4G --readwrite=write -direct=1

bw=53MB/s, iops=13.3k

write_test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
write_test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/45303KB/0KB /s] [0/11.4K/0 iops] [eta 00m:00s]
write_test: (groupid=0, jobs=1): err= 0: pid=9975: Sun Sep 18 20:35:12 2022
  write: io=4096.0MB, bw=53281KB/s, iops=13320, runt= 78720msec
    clat (usec): min=48, max=234668, avg=73.83, stdev=555.55
     lat (usec): min=48, max=234669, avg=74.11, stdev=555.56
    clat percentiles (usec):
     |  1.00th=[   51],  5.00th=[   52], 10.00th=[   52], 20.00th=[   53],
     | 30.00th=[   53], 40.00th=[   55], 50.00th=[   58], 60.00th=[   59],
     | 70.00th=[   64], 80.00th=[   67], 90.00th=[   83], 95.00th=[  125],
     | 99.00th=[  211], 99.50th=[  290], 99.90th=[  628], 99.95th=[  812],
     | 99.99th=[23936]
    bw (KB  /s): min= 2640, max=72216, per=100.00%, avg=53293.60, stdev=17445.27
    lat (usec) : 50=0.01%, 100=93.13%, 250=6.12%, 500=0.44%, 750=0.25%
    lat (usec) : 1000=0.01%
    lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%
  cpu          : usr=3.60%, sys=20.13%, ctx=3145866, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1


Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=53281KB/s, minb=53281KB/s, maxb=53281KB/s, mint=78720msec, maxt=78720msec

Disk stats (read/write):
  sda: ios=0/1045929, merge=0/681, ticks=0/65092, in_queue=64968, util=52.42%

2. 同步随机写(DIRECT)

strace -f -tt -o /tmp/rand_write.log -D /usr/bin/fio --name=randwrite_test --filename=randwrite_test --bs=4k --size=1G --readwrite=randwrite -direct=1
bw=938KB/s, iops=229
可以看出，随机写的性能直接掉到底了

randwrite_test: (groupid=0, jobs=1): err= 0: pid=28561: Sun Sep 18 16:05:04 2022
  write: io=53028KB, bw=938500B/s, iops=229, runt= 57859msec
    clat (usec): min=146, max=610291, avg=4252.29, stdev=20251.09
     lat (usec): min=146, max=610291, avg=4253.50, stdev=20251.11
    clat percentiles (usec):
     |  1.00th=[  235],  5.00th=[  322], 10.00th=[  394], 20.00th=[  996],
     | 30.00th=[ 1880], 40.00th=[ 2288], 50.00th=[ 2512], 60.00th=[ 2704],
     | 70.00th=[ 2928], 80.00th=[ 3248], 90.00th=[ 4048], 95.00th=[ 9664],
     | 99.00th=[39680], 99.50th=[62720], 99.90th=[419840], 99.95th=[528384],
     | 99.99th=[610304]
    bw (KB  /s): min=   13, max= 1912, per=100.00%, avg=950.38, stdev=400.21
    lat (usec) : 250=1.26%, 500=14.15%, 750=3.65%, 1000=0.97%
    lat (msec) : 2=12.08%, 4=57.65%, 10=5.49%, 20=2.08%, 50=1.87%
    lat (msec) : 100=0.55%, 250=0.08%, 500=0.12%, 750=0.05%
  cpu          : usr=0.31%, sys=2.70%, ctx=66374, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=13257/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=53028KB, aggrb=916KB/s, minb=916KB/s, maxb=916KB/s, mint=57859msec, maxt=57859msec

Disk stats (read/write):
  sda: ios=9/13798, merge=0/2448, ticks=1964/120220, in_queue=122120, util=93.91%

3. 同步顺序写（Buffer）

删除同步写中的--direct=1，则会使用到系统的pagecache。

strace -f -tt -o /tmp/writeb.log -D /usr/bin/fio --name=write_btest --filename=write_btest --bs=4k --size=4G --readwrite=write

bw=210MB/s, iops=52k
因此，性能直接提升了4倍

write_btest: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
write_btest: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/209.9MB/0KB /s] [0/53.8K/0 iops] [eta 00m:00s]
write_btest: (groupid=0, jobs=1): err= 0: pid=3605: Sun Sep 18 17:39:47 2022
  write: io=4096.0MB, bw=210631KB/s, iops=52657, runt= 19913msec
    clat (usec): min=14, max=138101, avg=18.49, stdev=161.31
     lat (usec): min=14, max=138102, avg=18.57, stdev=161.31
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   17], 10.00th=[   17], 20.00th=[   17],
     | 30.00th=[   17], 40.00th=[   17], 50.00th=[   17], 60.00th=[   18],
     | 70.00th=[   18], 80.00th=[   18], 90.00th=[   19], 95.00th=[   20],
     | 99.00th=[   32], 99.50th=[   45], 99.90th=[   94], 99.95th=[  123],
     | 99.99th=[  398]
    bw (KB  /s): min=129976, max=221720, per=99.93%, avg=210473.85, stdev=17392.80
    lat (usec) : 20=92.31%, 50=7.30%, 100=0.29%, 250=0.08%, 500=0.01%
    lat (usec) : 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (msec) : 100=0.01%, 250=0.01%
cpu          : usr=9.32%, sys=28.28%, ctx=2097220, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=210631KB/s, minb=210631KB/s, maxb=210631KB/s, mint=19913msec, maxt=19913msec

Disk stats (read/write):
  sda: ios=0/3023, merge=0/269, ticks=0/1938236, in_queue=1995224, util=74.12%

4. 同步随机写（Buffer）

strace -f -tt -o /tmp/rand_bwrite.log -D /usr/bin/fio --name=randwrite_btest --filename=randwrite_btest --bs=4k --size=1G --readwrite=randwrite

bw=115MB/s, iops=28k，同步随机写+pagecache的性能几乎不受随机写的影响，后面将详细介绍这个的原因。

randwrite_btest: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
randwrite_btest: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/111.6MB/0KB /s] [0/28.6K/0 iops] [eta 00m:00s]
randwrite_btest: (groupid=0, jobs=1): err= 0: pid=6334: Sun Sep 18 17:42:31 2022
  write: io=1024.0MB, bw=115993KB/s, iops=28998, runt=  9040msec
    clat (usec): min=15, max=3232, avg=18.65, stdev=13.70
     lat (usec): min=15, max=3235, avg=18.79, stdev=13.72
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   17], 10.00th=[   17], 20.00th=[   17],
     | 30.00th=[   17], 40.00th=[   18], 50.00th=[   18], 60.00th=[   18],
     | 70.00th=[   18], 80.00th=[   19], 90.00th=[   20], 95.00th=[   21],
     | 99.00th=[   32], 99.50th=[   44], 99.90th=[   89], 99.95th=[  114],
     | 99.99th=[  378]
    bw (KB  /s): min=103496, max=120216, per=99.96%, avg=115941.33, stdev=3889.19
    lat (usec) : 20=88.05%, 50=11.56%, 100=0.31%, 250=0.06%, 500=0.01%
    lat (usec) : 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.01%
  cpu          : usr=6.34%, sys=29.83%, ctx=1047354, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=115992KB/s, minb=115992KB/s, maxb=115992KB/s, mint=9040msec, maxt=9040msec

Disk stats (read/write):
  sda: ios=1/19, merge=0/91, ticks=24/308, in_queue=332, util=3.26%

5. 异步顺序写（DIRECT）

strace -f -tt -o /tmp/awrite.log -D /usr/bin/fio --name=write_atest --filename=write_atest --bs=4k --size=1G --readwrite=write --direct=1 --ioengine=libaio --iodepth=64

bw=91.2MB/s IOPS=22.8k
avg_req_sz=8, avg_queue_sz=3

write_atest: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/98907KB/0KB /s] [0/24.8K/0 iops] [eta 00m:00s]
write_atest: (groupid=0, jobs=1): err= 0: pid=10341: Sun Sep 18 20:58:42 2022
  write: io=1024.0MB, bw=91292KB/s, iops=22822, runt= 11486msec
    slat (usec): min=17, max=12234, avg=23.52, stdev=61.15
    clat (usec): min=89, max=44052, avg=2778.77, stdev=2219.49
     lat (usec): min=114, max=44070, avg=2802.45, stdev=2225.84
    clat percentiles (usec):
     |  1.00th=[ 2288],  5.00th=[ 2352], 10.00th=[ 2352], 20.00th=[ 2384],
     | 30.00th=[ 2416], 40.00th=[ 2416], 50.00th=[ 2448], 60.00th=[ 2480],
     | 70.00th=[ 2576], 80.00th=[ 2640], 90.00th=[ 2832], 95.00th=[ 2992],
     | 99.00th=[12096], 99.50th=[18048], 99.90th=[36608], 99.95th=[42240],
     | 99.99th=[43776]
    bw (KB  /s): min=48376, max=104344, per=100.00%, avg=91593.09, stdev=16068.66
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=97.38%, 10=1.17%, 20=1.00%, 50=0.43%
  cpu          : usr=8.90%, sys=25.21%, ctx=1050271, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0

     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=91291KB/s, minb=91291KB/s, maxb=91291KB/s, mint=11486msec, maxt=11486msec

Disk stats (read/write):
  sda: ios=0/256033, merge=0/2488, ticks=0/22460, in_queue=22824, util=77.94%


Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00   180.00    0.00 24053.00     0.00    94.75     8.07     3.32    0.14    0.00    0.14   0.03
  84.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    61.00    0.00 24510.00     0.00    95.98     8.02     2.14    0.09    0.00    0.09   0.03
  73.60

6. 异步随机写（DIRECT）

strace -f -tt -o /tmp/rand_awrite.log -D /usr/bin/fio --name=randwrite_atest --filename=randwrite_atest --bs=4k --size=1G --readwrite=randwrite --direct=1 --ioengine=libaio --iodepth=64
bw=1MB/s, iops=248

fio-2.2.10
Starting 1 process
randwrite_atest: Laying out IO file(s) (1 file(s) / 1024MB)
^Cbs: 1 (f=1): [w(1)] [4.9% done] [0KB/1258KB/0KB /s] [0/314/0 iops] [eta 16m:48s]s]
fio: terminating on signal 2

randwrite_atest: (groupid=0, jobs=1): err= 0: pid=14846: Sun Sep 18 21:03:13 2022
  write: io=51728KB, bw=995.96KB/s, iops=248, runt= 51983msec
    slat (usec): min=23, max=532231, avg=274.92, stdev=5089.58
    clat (msec): min=1, max=1302, avg=256.93, stdev=194.37
     lat (msec): min=1, max=1303, avg=257.21, stdev=194.42
    clat percentiles (msec):
     |  1.00th=[    9],  5.00th=[   24], 10.00th=[   78], 20.00th=[  123],
     | 30.00th=[  281], 80.00th=[  375], 90.00th=[  537], 95.00th=[  668],
     | 99.00th=[  906], 99.50th=[  963], 99.90th=[ 1106], 99.95th=[ 1172],
        | 99.99th=[ 1303]
    bw (KB  /s): min=   88, max= 2138, per=100.00%, avg=1040.46, stdev=467.51
    lat (msec) : 2=0.02%, 4=0.20%, 10=1.52%, 20=2.66%, 50=4.04%
    lat (msec) : 100=5.58%, 250=50.22%, 500=23.85%, 750=8.47%, 1000=3.05%
    lat (msec) : 2000=0.38%
  cpu          : usr=0.51%, sys=3.17%, ctx=60541, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=12932/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=51728KB, aggrb=995KB/s, minb=995KB/s, maxb=995KB/s, mint=51983msec, maxt=51983msec

Disk stats (read/write):
  sda: ios=1/13257, merge=0/2653, ticks=460/3398116, in_queue=3400912, util=100.00%

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   100.00    1.00  301.00     0.00     1.56    10.60    63.82  271.62  460.00  270.99   3.31 100.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00   225.00    0.00  160.00     0.00     1.81    23.20    63.35  339.27    0.00  339.27   6.25 100.00

7. 异步顺序写（BUFFER）

strace -f -tt -o /tmp/a-bwrite.log -D /usr/bin/fio --name=write_abtest --filename=write_abtest --bs=4k --size=1G --readwrite=write --ioengine=libaio --iodepth=64

bw=110MB/s, iops=27k

Sun Sep 18 21:06:43 CST 2022
write_abtest: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/109.7MB/0KB /s] [0/28.8K/0 iops] [eta 00m:00s]
write_abtest: (groupid=0, jobs=1): err= 0: pid=19904: Sun Sep 18 21:06:53 2022
  write: io=1024.0MB, bw=110353KB/s, iops=27588, runt=  9502msec
    slat (usec): min=16, max=5570, avg=19.10, stdev=15.72
    clat (usec): min=75, max=13502, avg=2273.02, stdev=405.49
     lat (usec): min=93, max=13522, avg=2292.26, stdev=408.06
    clat percentiles (usec):
     |  1.00th=[ 2096],  5.00th=[ 2160], 10.00th=[ 2160], 20.00th=[ 2192],
     | 30.00th=[ 2192], 40.00th=[ 2224], 50.00th=[ 2224], 60.00th=[ 2256],
     | 70.00th=[ 2256], 80.00th=[ 2256], 90.00th=[ 2352], 95.00th=[ 2480],
     | 99.00th=[ 2800], 99.50th=[ 3216], 99.90th=[ 9152], 99.95th=[11456],
     | 99.99th=[12992]
    bw (KB  /s): min=86312, max=114144, per=99.98%, avg=110333.89, stdev=6024.11
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.03%, 4=99.57%, 10=0.31%, 20=0.08%
  cpu          : usr=9.77%, sys=24.76%, ctx=1048469, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=110353KB/s, minb=110353KB/s, maxb=110353KB/s, mint=9502msec, maxt=9502msec

Disk stats (read/write):
  sda: ios=0/41, merge=0/29, ticks=0/172, in_queue=172, util=1.84%
Sun Sep 18 21:06:53 CST 2022

09/18/2022 09:06:50 PM                                                                                    [60/1211]
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00
  0.00

09/18/2022 09:06:51 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00
  0.00


09/18/2022 09:06:54 PM                                                                                    [36/1211]
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    14.00    0.00  128.00     0.00    20.54   328.69    28.26   45.22    0.00   45.22   1.72
 22.00

09/18/2022 09:06:55 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    15.00    0.00  281.00     0.00   136.46   994.59   137.54  389.08    0.00  389.08   3.56
100.00

8. 异步随机写（BUFFER）

strace -f -tt -o /tmp/rand_abwrite.log -D /usr/bin/fio --name=randwrite_abtest --filename=randwrite_abtest --bs=4k --size=1G --readwrite=randwrite --ioengine=libaio --iodepth=64

bw=82MB/s, iops=20k

Sun Sep 18 21:10:22 CST 2022
randwrite_abtest: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
randwrite_abtest: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [92.3% done] [0KB/100.9MB/0KB /s] [0/25.9K/0 iops] [eta 00m:01s]
randwrite_abtest: (groupid=0, jobs=1): err= 0: pid=23631: Sun Sep 18 21:10:35 2022
  write: io=1024.0MB, bw=82964KB/s, iops=20740, runt= 12639msec
    slat (usec): min=17, max=117863, avg=29.43, stdev=389.30
    clat (usec): min=90, max=275442, avg=3054.22, stdev=6057.16
     lat (usec): min=109, max=275818, avg=3083.81, stdev=6120.28
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    3],
     | 30.00th=[    3], 40.00th=[    3], 50.00th=[    3], 60.00th=[    3],
     | 70.00th=[    3], 80.00th=[    3], 90.00th=[    3], 95.00th=[    4],
     | 99.00th=[   24], 99.50th=[   36], 99.90th=[   75], 99.95th=[   82],
     | 99.99th=[  273]
    bw (KB  /s): min= 5859, max=107656, per=100.00%, avg=83332.52, stdev=38980.53
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=97.06%, 10=0.81%, 20=0.61%, 50=1.24%
    cpu          : usr=7.35%, sys=23.67%, ctx=1052914, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=82963KB/s, minb=82963KB/s, maxb=82963KB/s, mint=12639msec, maxt=12639msec

Disk stats (read/write):
  sda: ios=0/6667, merge=0/6056, ticks=0/5484, in_queue=5480, util=22.75%
Sun Sep 18 21:10:35 CST 2022

09/18/2022 09:10:30 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00  1665.00    0.00 1782.00     0.00    33.20    38.16     1.57    0.89    0.00    0.89   0.54
 95.60

09/18/2022 09:10:31 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00  1909.00    0.00 2098.00     0.00    39.14    38.20     0.76    0.36    0.00    0.36   0.32
 66.80

09/18/2022 09:10:32 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00
  0.00

  09/18/2022 09:10:35 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    17.00    0.00    2.00     0.00     0.07    76.00     0.02   12.00    0.00   12.00  12.00
  2.40

09/18/2022 09:10:36 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    10.00    0.00   60.00     0.00     8.87   302.67     2.31    8.87    0.00    8.87   0.60
  3.60

09/18/2022 09:10:37 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    14.00    0.00  358.00     0.00   180.20  1030.84   142.27  317.41    0.00  317.41   2.79
100.00
09/18/2022 09:10:38 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00     7.00    0.00  239.00     0.00   119.79  1026.48   143.82  408.74    0.00  408.74   4.18
100.00

09/18/2022 09:10:39 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    19.00    0.00  311.00     0.00   153.39  1010.11   145.40  595.92    0.00  595.92   3.22
100.00

09/18/2022 09:10:40 PM
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm
 %util
sda               0.00    14.00    0.00  357.00     0.00   179.40  1029.15   143.63  428.44    0.00  428.44   2.80
100.00

数据总结

模式	direct+sync	buffer+sync	direct + libaio	buffer + libaio
write	bw=53MB/s, iops=13k	bw=210MB/s, iops=52k	bw=91MB/s, iops=22k	bw=110MB/s, iops=27k
randwrite	bw=938KB/s, iops=229	bw=115MB/s, iops=28k	bw=1MB/s, iops=248	bw=82MB/s, iops=20k

数据分析：为何randwrite+buffer+sync的iops可以是20k+

原因在于操作系统欺骗了应用，在pagecache下，内核层的write操作只将相关buffer写入到pagecache中，然后就通知应用write成功；然后系统的IO子系统可以根据写入到pagecache的page，进行一定的重排，优化性能，再写入到磁盘中。参考文档
在上面的例子8中，可以看到在fio进程结束后，仍然有大量的io写操作，这就是操作系统在flush pagecache的dirty数据到硬盘中。

参考

kakfa doc

reads vs writes

引子