
I've been digging into the QUIC protocol recently. How can I make my blog support QUIC as well?

Background

First, a sketch of this site's architecture:

client -> nginx(https) -> hexo server(127.0.0.1:4000)
  1. Load balancer: nginx acting as the reverse proxy
  2. Blog service: hexo, listening on 127.0.0.1 (see the sketch below)
  3. SSL certificates: installed to the proper paths with acme.sh
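
For reference, the blog process itself is bound to loopback only. A sketch of how it is typically launched, assuming hexo-server's standard -i/-p flags:

hexo server -i 127.0.0.1 -p 4000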

Implementation

1. Running a QUIC-capable Nginx

Official Nginx has supported QUIC since 1.25.0; it has to be compiled and deployed manually.
My physical host is still on Ubuntu 16.04, so all things considered I decided to run Nginx via Docker. That raises a question: after acme.sh renews a certificate, how does it reload the Nginx service inside the container?

acme.sh with Nginx in Docker

Following the acme.sh wiki, deploy the acme.sh container alongside the Nginx container:

  • Start the Nginx container

    docker run --rm -it -d  --label=sh.acme.autoload.domain=example.com   nginx:latest
  • Start the acme.sh container

    docker run --rm  -itd  \
    -v "$(pwd)/out":/acme.sh \
    --net=host \
    --name=acme.sh \
    -v /var/run/docker.sock:/var/run/docker.sock \
    neilpang/acme.sh daemon
  • Issue the certificate

    docker  exec \
    -e CF_Email=xxx@example.com \
    -e CF_Key=xxxxxxxxxx \
    acme.sh --issue -d example.com --dns dns_cf
  • Deploy the certificate and reload Nginx

    docker  exec \
    -e DEPLOY_DOCKER_CONTAINER_LABEL=sh.acme.autoload.domain=example.com \
    -e DEPLOY_DOCKER_CONTAINER_KEY_FILE=/etc/nginx/certs/example-com.key.pem \
    -e DEPLOY_DOCKER_CONTAINER_CERT_FILE="/etc/nginx/certs/example-com.one.cert.pem" \
    -e DEPLOY_DOCKER_CONTAINER_CA_FILE="/etc/nginx/certs/example-com.ca.pem" \
    -e DEPLOY_DOCKER_CONTAINER_FULLCHAIN_FILE="/etc/nginx/certs/example-com.cert.pem" \
    -e DEPLOY_DOCKER_CONTAINER_RELOAD_CMD="service nginx force-reload" \
    acme.sh --deploy -d example.com --deploy-hook docker

    Can you guess how the acme.sh container manages to reload the service running inside the Nginx container?

  • docker.sock is the key to communication between the containers

  • the target Nginx container is located via DEPLOY_DOCKER_CONTAINER_LABEL

  • the DEPLOY_DOCKER_CONTAINER_RELOAD_CMD command is then executed in that container to reload Nginx (see the sketch after the config below)

  • The overall configuration is as follows:

    version: "2.0"
    services:
    nginx:
    image: nginx:1.25.2
    container_name: docker_nginx
    restart: unless-stopped
    labels:
    - "docker_nginx"
    volumes:
    - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    - ./nginx/conf.d:/etc/nginx/conf.d
    ports:
    - "80:80"
    - "443:443"
    - "443:443/udp"
    extra_hosts:
    - "host.docker.internal:xxx.xxx.xxx.xxx"

    acme-blogs:
    image: neilpang/acme.sh
    container_name: acme-blogs
    command: daemon
    volumes:
    - /data/acmeout:/acme.sh
    - /var/run/docker.sock:/var/run/docker.sock
    environment:
    - DEPLOY_DOCKER_CONTAINER_LABEL=docker_nginx
    - DEPLOY_DOCKER_CONTAINER_KEY_FILE="/etc/nginx/certs/example-com.key.pem"
    - DEPLOY_DOCKER_CONTAINER_CERT_FILE="/etc/nginx/certs/example-com.one.cert.pem"
    - DEPLOY_DOCKER_CONTAINER_CA_FILE="/etc/nginx/certs/example-com.ca.pem"
    - DEPLOY_DOCKER_CONTAINER_FULLCHAIN_FILE="/etc/nginx/certs/example-com.cert.pem"
    - DEPLOY_DOCKER_CONTAINER_RELOAD_CMD="service nginx force-reload"
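
For intuition, the deploy hook's behavior can be approximated by hand. A rough sketch of the flow as I understand it (an illustration, not acme.sh's actual code): resolve the container ID by label through the mounted docker.sock, then exec the reload command inside it.

# Find the container carrying the label, then reload Nginx inside it.
CID=$(docker ps --filter "label=docker_nginx" --format '{{.ID}}' | head -n1)
docker exec "$CID" service nginx force-reload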

2. Configuring Nginx for QUIC

Configure the server block by following the Nginx documentation:

server {
    server_name example.com;
    listen 443 ssl;
    listen 443 quic reuseport;
    listen 80;

    ssl_certificate /etc/nginx/certs/example-com.cert.pem;
    ssl_certificate_key /etc/nginx/certs/example-com.key.pem;
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;
    ssl_session_tickets off;

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    root /usr/local/share/webroot/example.com;
    # resolver 127.0.0.1;

    location ~ ^/.well-known/(.*)$ {
    }

    location ~ ^/ind/.*$ {
    }

    location ~ ^/ok.html$ {
        add_header Alt-Svc 'h3=":443"; ma=86400';

        return 200 "ok";
    }

    location ~ ^/(.*)$ {
        add_header Alt-Svc 'h3=":443"; ma=86400';

        proxy_pass http://hexo-backend/$1;
    }
}
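
To verify, you can check that responses over TCP advertise HTTP/3 via the Alt-Svc header, or issue an HTTP/3 request directly; the latter assumes a curl binary built with HTTP/3 support, which stock distro builds usually lack.

# Look for the Alt-Svc header on a normal HTTPS request.
curl -sI https://example.com/ok.html | grep -i alt-svc
# Talk HTTP/3 directly (requires an HTTP/3-enabled curl build).
curl --http3 -I https://example.com/ok.html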

3. Reaching the host's hexo service from Nginx

Method 1

Docker containers are normally started in bridge mode, which creates a docker0 interface on the host. Pointing the upstream server IP in the Nginx config at that interface's address, 172.17.0.1, lets the container reach the host network:

$ ifconfig
docker0   Link encap:Ethernet  HWaddr 02:42:fd:53:99:02
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

The Nginx upstream configuration:

upstream hexo-backend {
    server 172.17.0.1:4000;
}
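
A quick sanity check from inside the container (assuming curl is available in the image; the container name matches the compose file above):

docker exec docker_nginx curl -s -o /dev/null -w '%{http_code}\n' http://172.17.0.1:4000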

Method 2

Docker v20.10+ offers a supported mechanism: a container can reach the host's IP through the name host.docker.internal. See the docs: Connect from a container to a service on the host.
Configure docker-compose.yaml:

version: '2.0'

services:

  nginx:
    image: nginx:1.25.2
    container_name: docker_nginx
    restart: unless-stopped
    extra_hosts:
      - "host.docker.internal:host-gateway"

With that in place, running the command below inside the Nginx container should reach the host. My Docker environment is v20.4, so I couldn't experiment and can't yet confirm whether this method works:

$ curl http://host.docker.internal:4000

Method 3

Start the Nginx container with host networking, so that Nginx still sits on the host's network.
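
A minimal sketch of this variant (note that port mappings are ignored in host mode; Nginx binds directly on the host, and the upstream can simply point at 127.0.0.1:4000):

docker run -d --network host --name docker_nginx nginx:1.25.2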

Summary

In the end I used Method 1 to connect the container and the host network. With that, the overall architecture becomes:

client -> nginx docker(443:443, 443:443/udp) -> hexo(172.17.0.1:4000)

butex is, one could say, woven tightly into the brpc architecture

Introduction

butex is a synchronization mechanism between brpc coroutines (bthreads), playing the role that mutex plays for pthreads. On Linux, mutex is implemented on top of futex, and butex's design likewise draws on futex.

Example

class Mutex {
public:
    typedef bthread_mutex_t* native_handler_type;
    Mutex() {
        int ec = bthread_mutex_init(&_mutex, NULL);
        if (ec != 0) {
            throw std::system_error(std::error_code(ec, std::system_category()), "Mutex constructor failed");
        }
    }
    ~Mutex() { CHECK_EQ(0, bthread_mutex_destroy(&_mutex)); }
    native_handler_type native_handler() { return &_mutex; }
    void lock() {
        int ec = bthread_mutex_lock(&_mutex);
        if (ec != 0) {
            throw std::system_error(std::error_code(ec, std::system_category()), "Mutex lock failed");
        }
    }
    void unlock() { bthread_mutex_unlock(&_mutex); }
    bool try_lock() { return !bthread_mutex_trylock(&_mutex); }
    // TODO(chenzhangyi01): Complement interfaces for C++11
private:
    DISALLOW_COPY_AND_ASSIGN(Mutex);
    bthread_mutex_t _mutex;
};


// bthread_mutex_t definition
typedef struct {
    unsigned* butex;
    bthread_contention_site_t csite;
} bthread_mutex_t;
  1. Constructing a Mutex calls bthread_mutex_init(&_mutex, NULL) to initialize _mutex, just like pthread_mutex_init in pthreads
  2. Mutex::lock calls bthread_mutex_lock and Mutex::unlock calls bthread_mutex_unlock, essentially matching the lock and unlock methods of a pthread mutex

Prologue

Reading the Kafka documentation recently, I ran into this passage: on a 7200 rpm SATA setup, sequential writes can reach about 600MB/s while random writes manage only about 100KB/s, a gap of over 6000x:

The key fact about disk performance is that the throughput of hard drives has been diverging from the latency of a disk seek for the last decade. As a result the performance of linear writes on a JBOD configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec but the performance of random writes is only about 100k/sec—a difference of over 6000X. These linear reads and writes are the most predictable of all usage patterns, and are heavily optimized by the operating system. A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes. A further discussion of this issue can be found in this ACM Queue article; they actually find that sequential disk access can in some cases be faster than random memory access!

FIO benchmark comparison

Environment

Component   Version
ubuntu      16.04
fio         2.2.10
disk        ST1000DM010-2EP102 (7200 rpm, 1TB)
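
As a side note, a quick way to confirm the disk model and that it really is a spinning disk (the device name sda is an assumption for this host):

lsblk -d -o NAME,MODEL,ROTA               # ROTA=1 means rotational (HDD)
cat /sys/block/sda/queue/rotational       # 1 = HDD, 0 = SSD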

Results

1. Synchronous sequential write (DIRECT)

fio is the benchmark tool, with strace observing the system calls. O_DIRECT is used here to bypass the pagecache and measure the disk's raw performance.

strace -f -tt -o /tmp/write.log -D /usr/bin/fio --name=write_test --filename=write_test --bs=4k --size=4G --readwrite=write --direct=1

bw=53MB/s, iops=13.3k

write_test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
write_test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/45303KB/0KB /s] [0/11.4K/0 iops] [eta 00m:00s]
write_test: (groupid=0, jobs=1): err= 0: pid=9975: Sun Sep 18 20:35:12 2022
write: io=4096.0MB, bw=53281KB/s, iops=13320, runt= 78720msec
clat (usec): min=48, max=234668, avg=73.83, stdev=555.55
lat (usec): min=48, max=234669, avg=74.11, stdev=555.56
clat percentiles (usec):
| 1.00th=[ 51], 5.00th=[ 52], 10.00th=[ 52], 20.00th=[ 53],
| 30.00th=[ 53], 40.00th=[ 55], 50.00th=[ 58], 60.00th=[ 59],
| 70.00th=[ 64], 80.00th=[ 67], 90.00th=[ 83], 95.00th=[ 125],
| 99.00th=[ 211], 99.50th=[ 290], 99.90th=[ 628], 99.95th=[ 812],
| 99.99th=[23936]
bw (KB /s): min= 2640, max=72216, per=100.00%, avg=53293.60, stdev=17445.27
lat (usec) : 50=0.01%, 100=93.13%, 250=6.12%, 500=0.44%, 750=0.25%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=3.60%, sys=20.13%, ctx=3145866, majf=0, minf=12
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1


Run status group 0 (all jobs):
WRITE: io=4096.0MB, aggrb=53281KB/s, minb=53281KB/s, maxb=53281KB/s, mint=78720msec, maxt=78720msec

Disk stats (read/write):
sda: ios=0/1045929, merge=0/681, ticks=0/65092, in_queue=64968, util=52.42%

2. Synchronous random write (DIRECT)

strace -f -tt -o /tmp/rand_write.log -D /usr/bin/fio --name=randwrite_test --filename=randwrite_test --bs=4k --size=1G --readwrite=randwrite -direct=1
bw=938KB/s, iops=229
As you can see, random-write performance falls off a cliff.

randwrite_test: (groupid=0, jobs=1): err= 0: pid=28561: Sun Sep 18 16:05:04 2022
write: io=53028KB, bw=938500B/s, iops=229, runt= 57859msec
clat (usec): min=146, max=610291, avg=4252.29, stdev=20251.09
lat (usec): min=146, max=610291, avg=4253.50, stdev=20251.11
clat percentiles (usec):
| 1.00th=[ 235], 5.00th=[ 322], 10.00th=[ 394], 20.00th=[ 996],
| 30.00th=[ 1880], 40.00th=[ 2288], 50.00th=[ 2512], 60.00th=[ 2704],
| 70.00th=[ 2928], 80.00th=[ 3248], 90.00th=[ 4048], 95.00th=[ 9664],
| 99.00th=[39680], 99.50th=[62720], 99.90th=[419840], 99.95th=[528384],
| 99.99th=[610304]
bw (KB /s): min= 13, max= 1912, per=100.00%, avg=950.38, stdev=400.21
lat (usec) : 250=1.26%, 500=14.15%, 750=3.65%, 1000=0.97%
lat (msec) : 2=12.08%, 4=57.65%, 10=5.49%, 20=2.08%, 50=1.87%
lat (msec) : 100=0.55%, 250=0.08%, 500=0.12%, 750=0.05%
cpu : usr=0.31%, sys=2.70%, ctx=66374, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=13257/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: io=53028KB, aggrb=916KB/s, minb=916KB/s, maxb=916KB/s, mint=57859msec, maxt=57859msec

Disk stats (read/write):
sda: ios=9/13798, merge=0/2448, ticks=1964/120220, in_queue=122120, util=93.91%

3. Synchronous sequential write (Buffer)

Dropping --direct=1 from the synchronous write brings the system pagecache into play.

strace -f -tt -o /tmp/writeb.log -D /usr/bin/fio --name=write_btest --filename=write_btest --bs=4k --size=4G --readwrite=write

bw=210MB/s, iops=52k
Throughput thus jumps roughly 4x (53MB/s → 210MB/s).

write_btest: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
write_btest: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/209.9MB/0KB /s] [0/53.8K/0 iops] [eta 00m:00s]
write_btest: (groupid=0, jobs=1): err= 0: pid=3605: Sun Sep 18 17:39:47 2022
write: io=4096.0MB, bw=210631KB/s, iops=52657, runt= 19913msec
clat (usec): min=14, max=138101, avg=18.49, stdev=161.31
lat (usec): min=14, max=138102, avg=18.57, stdev=161.31
clat percentiles (usec):
| 1.00th=[ 16], 5.00th=[ 17], 10.00th=[ 17], 20.00th=[ 17],
| 30.00th=[ 17], 40.00th=[ 17], 50.00th=[ 17], 60.00th=[ 18],
| 70.00th=[ 18], 80.00th=[ 18], 90.00th=[ 19], 95.00th=[ 20],
| 99.00th=[ 32], 99.50th=[ 45], 99.90th=[ 94], 99.95th=[ 123],
| 99.99th=[ 398]
bw (KB /s): min=129976, max=221720, per=99.93%, avg=210473.85, stdev=17392.80
lat (usec) : 20=92.31%, 50=7.30%, 100=0.29%, 250=0.08%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=9.32%, sys=28.28%, ctx=2097220, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: io=4096.0MB, aggrb=210631KB/s, minb=210631KB/s, maxb=210631KB/s, mint=19913msec, maxt=19913msec

Disk stats (read/write):
sda: ios=0/3023, merge=0/269, ticks=0/1938236, in_queue=1995224, util=74.12%

4. Synchronous random write (Buffer)

strace -f -tt -o /tmp/rand_bwrite.log -D /usr/bin/fio --name=randwrite_btest --filename=randwrite_btest --bs=4k --size=1G --readwrite=randwrite

bw=115MB/s, iops=28k. With the pagecache, synchronous random writes are almost unaffected by the randomness; the reason is explained in detail below.

randwrite_btest: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
fio-2.2.10
Starting 1 process
randwrite_btest: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/111.6MB/0KB /s] [0/28.6K/0 iops] [eta 00m:00s]
randwrite_btest: (groupid=0, jobs=1): err= 0: pid=6334: Sun Sep 18 17:42:31 2022
write: io=1024.0MB, bw=115993KB/s, iops=28998, runt= 9040msec
clat (usec): min=15, max=3232, avg=18.65, stdev=13.70
lat (usec): min=15, max=3235, avg=18.79, stdev=13.72
clat percentiles (usec):
| 1.00th=[ 16], 5.00th=[ 17], 10.00th=[ 17], 20.00th=[ 17],
| 30.00th=[ 17], 40.00th=[ 18], 50.00th=[ 18], 60.00th=[ 18],
| 70.00th=[ 18], 80.00th=[ 19], 90.00th=[ 20], 95.00th=[ 21],
| 99.00th=[ 32], 99.50th=[ 44], 99.90th=[ 89], 99.95th=[ 114],
| 99.99th=[ 378]
bw (KB /s): min=103496, max=120216, per=99.96%, avg=115941.33, stdev=3889.19
lat (usec) : 20=88.05%, 50=11.56%, 100=0.31%, 250=0.06%, 500=0.01%
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=6.34%, sys=29.83%, ctx=1047354, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=115992KB/s, minb=115992KB/s, maxb=115992KB/s, mint=9040msec, maxt=9040msec

Disk stats (read/write):
sda: ios=1/19, merge=0/91, ticks=24/308, in_queue=332, util=3.26%

5. Asynchronous sequential write (DIRECT)

strace -f -tt -o /tmp/awrite.log -D /usr/bin/fio --name=write_atest --filename=write_atest --bs=4k --size=1G --readwrite=write --direct=1 --ioengine=libaio --iodepth=64

bw=91.2MB/s IOPS=22.8k
avg_req_sz=8, avg_queue_sz=3

write_atest: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/98907KB/0KB /s] [0/24.8K/0 iops] [eta 00m:00s]
write_atest: (groupid=0, jobs=1): err= 0: pid=10341: Sun Sep 18 20:58:42 2022
write: io=1024.0MB, bw=91292KB/s, iops=22822, runt= 11486msec
slat (usec): min=17, max=12234, avg=23.52, stdev=61.15
clat (usec): min=89, max=44052, avg=2778.77, stdev=2219.49
lat (usec): min=114, max=44070, avg=2802.45, stdev=2225.84
clat percentiles (usec):
| 1.00th=[ 2288], 5.00th=[ 2352], 10.00th=[ 2352], 20.00th=[ 2384],
| 30.00th=[ 2416], 40.00th=[ 2416], 50.00th=[ 2448], 60.00th=[ 2480],
| 70.00th=[ 2576], 80.00th=[ 2640], 90.00th=[ 2832], 95.00th=[ 2992],
| 99.00th=[12096], 99.50th=[18048], 99.90th=[36608], 99.95th=[42240],
| 99.99th=[43776]
bw (KB /s): min=48376, max=104344, per=100.00%, avg=91593.09, stdev=16068.66
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=97.38%, 10=1.17%, 20=1.00%, 50=0.43%
cpu : usr=8.90%, sys=25.21%, ctx=1050271, majf=0, minf=12
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0

latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=91291KB/s, minb=91291KB/s, maxb=91291KB/s, mint=11486msec, maxt=11486msec

Disk stats (read/write):
sda: ios=0/256033, merge=0/2488, ticks=0/22460, in_queue=22824, util=77.94%


Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 180.00 0.00 24053.00 0.00 94.75 8.07 3.32 0.14 0.00 0.14 0.03 84.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 61.00 0.00 24510.00 0.00 95.98 8.02 2.14 0.09 0.00 0.09 0.03 73.60

6. Asynchronous random write (DIRECT)

strace -f -tt -o /tmp/rand_awrite.log -D /usr/bin/fio --name=randwrite_atest --filename=randwrite_atest --bs=4k --size=1G --readwrite=randwrite --direct=1 --ioengine=libaio --iodepth=64
bw=1MB/s, iops=248

fio-2.2.10
Starting 1 process
randwrite_atest: Laying out IO file(s) (1 file(s) / 1024MB)
^C Jobs: 1 (f=1): [w(1)] [4.9% done] [0KB/1258KB/0KB /s] [0/314/0 iops] [eta 16m:48s]
fio: terminating on signal 2

randwrite_atest: (groupid=0, jobs=1): err= 0: pid=14846: Sun Sep 18 21:03:13 2022
write: io=51728KB, bw=995.96KB/s, iops=248, runt= 51983msec
slat (usec): min=23, max=532231, avg=274.92, stdev=5089.58
clat (msec): min=1, max=1302, avg=256.93, stdev=194.37
lat (msec): min=1, max=1303, avg=257.21, stdev=194.42
clat percentiles (msec):
| 1.00th=[ 9], 5.00th=[ 24], 10.00th=[ 78], 20.00th=[ 123],
| 30.00th=[ 281], 80.00th=[ 375], 90.00th=[ 537], 95.00th=[ 668],
| 99.00th=[ 906], 99.50th=[ 963], 99.90th=[ 1106], 99.95th=[ 1172],
| 99.99th=[ 1303]
bw (KB /s): min= 88, max= 2138, per=100.00%, avg=1040.46, stdev=467.51
lat (msec) : 2=0.02%, 4=0.20%, 10=1.52%, 20=2.66%, 50=4.04%
lat (msec) : 100=5.58%, 250=50.22%, 500=23.85%, 750=8.47%, 1000=3.05%
lat (msec) : 2000=0.38%
cpu : usr=0.51%, sys=3.17%, ctx=60541, majf=0, minf=11
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=12932/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: io=51728KB, aggrb=995KB/s, minb=995KB/s, maxb=995KB/s, mint=51983msec, maxt=51983msec

Disk stats (read/write):
sda: ios=1/13257, merge=0/2653, ticks=460/3398116, in_queue=3400912, util=100.00%
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda 0.00 100.00 1.00 301.00 0.00 1.56 10.60 63.82 271.62 460.00 270.99 3.31 100.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 225.00 0.00 160.00 0.00 1.81 23.20 63.35 339.27 0.00 339.27 6.25 100.00

7. Asynchronous sequential write (Buffer)

strace -f -tt -o /tmp/a-bwrite.log -D /usr/bin/fio --name=write_abtest --filename=write_abtest --bs=4k --size=1G --readwrite=write --ioengine=libaio --iodepth=64

bw=110MB/s, iops=27k

Sun Sep 18 21:06:43 CST 2022
write_abtest: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/109.7MB/0KB /s] [0/28.8K/0 iops] [eta 00m:00s]
write_abtest: (groupid=0, jobs=1): err= 0: pid=19904: Sun Sep 18 21:06:53 2022
write: io=1024.0MB, bw=110353KB/s, iops=27588, runt= 9502msec
slat (usec): min=16, max=5570, avg=19.10, stdev=15.72
clat (usec): min=75, max=13502, avg=2273.02, stdev=405.49
lat (usec): min=93, max=13522, avg=2292.26, stdev=408.06
clat percentiles (usec):
| 1.00th=[ 2096], 5.00th=[ 2160], 10.00th=[ 2160], 20.00th=[ 2192],
| 30.00th=[ 2192], 40.00th=[ 2224], 50.00th=[ 2224], 60.00th=[ 2256],
| 70.00th=[ 2256], 80.00th=[ 2256], 90.00th=[ 2352], 95.00th=[ 2480],
| 99.00th=[ 2800], 99.50th=[ 3216], 99.90th=[ 9152], 99.95th=[11456],
| 99.99th=[12992]
bw (KB /s): min=86312, max=114144, per=99.98%, avg=110333.89, stdev=6024.11
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.03%, 4=99.57%, 10=0.31%, 20=0.08%
cpu : usr=9.77%, sys=24.76%, ctx=1048469, majf=0, minf=11
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=110353KB/s, minb=110353KB/s, maxb=110353KB/s, mint=9502msec, maxt=9502msec

Disk stats (read/write):
sda: ios=0/41, merge=0/29, ticks=0/172, in_queue=172, util=1.84%
Sun Sep 18 21:06:53 CST 2022
09/18/2022 09:06:50 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

09/18/2022 09:06:51 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

09/18/2022 09:06:54 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 14.00 0.00 128.00 0.00 20.54 328.69 28.26 45.22 0.00 45.22 1.72 22.00

09/18/2022 09:06:55 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 15.00 0.00 281.00 0.00 136.46 994.59 137.54 389.08 0.00 389.08 3.56 100.00

8. Asynchronous random write (Buffer)

strace -f -tt -o /tmp/rand_abwrite.log -D /usr/bin/fio --name=randwrite_abtest --filename=randwrite_abtest --bs=4k --size=1G --readwrite=randwrite --ioengine=libaio --iodepth=64

bw=82MB/s, iops=20k

Sun Sep 18 21:10:22 CST 2022
randwrite_abtest: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
randwrite_abtest: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [w(1)] [92.3% done] [0KB/100.9MB/0KB /s] [0/25.9K/0 iops] [eta 00m:01s]
randwrite_abtest: (groupid=0, jobs=1): err= 0: pid=23631: Sun Sep 18 21:10:35 2022
write: io=1024.0MB, bw=82964KB/s, iops=20740, runt= 12639msec
slat (usec): min=17, max=117863, avg=29.43, stdev=389.30
clat (usec): min=90, max=275442, avg=3054.22, stdev=6057.16
lat (usec): min=109, max=275818, avg=3083.81, stdev=6120.28
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 3], 20.00th=[ 3],
| 30.00th=[ 3], 40.00th=[ 3], 50.00th=[ 3], 60.00th=[ 3],
| 70.00th=[ 3], 80.00th=[ 3], 90.00th=[ 3], 95.00th=[ 4],
| 99.00th=[ 24], 99.50th=[ 36], 99.90th=[ 75], 99.95th=[ 82],
| 99.99th=[ 273]
bw (KB /s): min= 5859, max=107656, per=100.00%, avg=83332.52, stdev=38980.53
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=97.06%, 10=0.81%, 20=0.61%, 50=1.24%
cpu : usr=7.35%, sys=23.67%, ctx=1052914, majf=0, minf=11
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
WRITE: io=1024.0MB, aggrb=82963KB/s, minb=82963KB/s, maxb=82963KB/s, mint=12639msec, maxt=12639msec

Disk stats (read/write):
sda: ios=0/6667, merge=0/6056, ticks=0/5484, in_queue=5480, util=22.75%
Sun Sep 18 21:10:35 CST 2022
09/18/2022 09:10:30 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1665.00 0.00 1782.00 0.00 33.20 38.16 1.57 0.89 0.00 0.89 0.54 95.60

09/18/2022 09:10:31 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 1909.00 0.00 2098.00 0.00 39.14 38.20 0.76 0.36 0.00 0.36 0.32 66.80

09/18/2022 09:10:32 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

09/18/2022 09:10:35 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 17.00 0.00 2.00 0.00 0.07 76.00 0.02 12.00 0.00 12.00 12.00 2.40

09/18/2022 09:10:36 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 10.00 0.00 60.00 0.00 8.87 302.67 2.31 8.87 0.00 8.87 0.60 3.60

09/18/2022 09:10:37 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 14.00 0.00 358.00 0.00 180.20 1030.84 142.27 317.41 0.00 317.41 2.79 100.00

09/18/2022 09:10:38 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 7.00 0.00 239.00 0.00 119.79 1026.48 143.82 408.74 0.00 408.74 4.18 100.00

09/18/2022 09:10:39 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 19.00 0.00 311.00 0.00 153.39 1010.11 145.40 595.92 0.00 595.92 3.22 100.00

09/18/2022 09:10:40 PM
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 14.00 0.00 357.00 0.00 179.40 1029.15 143.63 428.44 0.00 428.44 2.80 100.00

Summary of results

Mode        direct+sync            buffer+sync            direct+libaio          buffer+libaio
write       bw=53MB/s, iops=13k    bw=210MB/s, iops=52k   bw=91MB/s, iops=22k    bw=110MB/s, iops=27k
randwrite   bw=938KB/s, iops=229   bw=115MB/s, iops=28k   bw=1MB/s, iops=248     bw=82MB/s, iops=20k

Analysis: why can randwrite+buffer+sync reach 20k+ IOPS?

The reason is that the operating system deceives the application. With the pagecache in play, a kernel-level write merely copies the buffer into the pagecache and then tells the application the write succeeded; the I/O subsystem can later reorder the dirtied pages to optimize the write pattern before committing them to disk. See the references.
In example 8 above, you can see heavy disk write activity continuing after the fio process has exited; that is the operating system flushing dirty pagecache data to the drive.
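
You can watch this write-behind yourself. A minimal sketch, assuming a Linux host (the file name buf_test is arbitrary):

# Dirty some pages with a buffered write, then watch the kernel drain them.
fio --name=buf_test --filename=buf_test --bs=4k --size=512M --readwrite=write
grep -E '^(Dirty|Writeback):' /proc/meminfo   # Dirty is large right after fio exits
sync                                          # force the flush instead of waiting
grep -E '^(Dirty|Writeback):' /proc/meminfo   # Dirty drops back toward zero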

References

Kafka doc

reads vs writes

A while back I needed to split a PDF. The material was fairly private, so I didn't use an online tool.

Most local PDF-splitting tools on the market are paid products, e.g. Foxit and WPS. After scouring half the internet, I found a free, open-source one: PDFsam Basic.

It ships builds for Windows, macOS, and Debian/Ubuntu.