Discussion:
tuning for mixed rate 800 stream write application
Stan Hoeppner
2014-10-04 02:22:52 UTC
Hello fellow bcache users/developers,

A couple of questions.

1. How do I disable write caching?
2. How do I increase the sequential IO tracking window from 128 IOs to, say, 4096 IOs or more?

Our application does small random reads and data is never read twice, so we don't want any read caching. Reads comprise less than 20% of the IO workload. It writes ~800 streams to hundreds of preallocated files in parallel using O_DIRECT and AIO. The stream rates vary from ~50MB/s to less than 2KB/s. bcache currently seems to be writing a lot of data to cache that should be going directly to the two RAID LUNs, bcache0 and bcache1. These each show over 150GB cache used or 300GB total of a 400GB SSD, with only 10GB bypassed. It seems to be writing sequential IO to cache because it's unable to properly classify it due to the small 128 entry tracking window. Thus throughput is actually about 20 lower than direct to LUN. It seems clear bcache isn't doing the right thing with classification due to the large number of mixed sequential/random IOs in flight.

The boxes have 32 cores and 256GB of RAM so we have plenty of horsepower and memory to dedicate to bcache use. These boxes are totally IO bound with little CPU/memory use.

Please advise.

Thanks,
Stan
Stan Hoeppner
2014-10-06 00:10:45 UTC
One would think with a sequential cutoff of only 1024 bytes, 2 sectors, that every sector would bypass the write cache:

# cat /sys/block/bcache0/bcache/sequential_cutoff; cat /sys/block/bcache1/bcache/sequential_cutoff
1.0k
1.0k

But as we can see a massive number of sectors are being cached:

# cat /sys/block/bcache0/bcache/dirty_data; cat /sys/block/bcache1/bcache/dirty_data
101G
101G

Am I misunderstanding the (sparse) documentation and not tuning correctly, or is there another problem? I suspect bcache's 128 entry tracking table is insufficient for our workload. If this is correct, is there any way to increase the size of this table?
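
For reference, a minimal sketch that gathers the settings discussed above in one pass. It assumes the sysfs attribute names given in the kernel's bcache documentation (cache_mode, sequential_cutoff, dirty_data, stats_total/bypassed). The SYSFS_ROOT variable and the function names are hypothetical and exist only so the script can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch only: report bcache attributes related to write caching.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "device attribute value", or "n/a" when the attribute is not readable.
show_attr() {
    _file="$SYSFS_ROOT/block/$1/bcache/$2"
    if [ -r "$_file" ]; then
        printf '%s %s %s\n' "$1" "$2" "$(cat "$_file")"
    else
        printf '%s %s n/a\n' "$1" "$2"
    fi
}

# usage: bcache_report DEVICE...
bcache_report() {
    for _dev in "$@"; do
        for _attr in cache_mode sequential_cutoff dirty_data stats_total/bypassed; do
            show_attr "$_dev" "$_attr"
        done
    done
}

# usage: bcache_set_mode DEVICE MODE   (needs root on a real system)
bcache_set_mode() {
    printf '%s\n' "$2" > "$SYSFS_ROOT/block/$1/bcache/cache_mode"
}

# Read-only report for the two devices named in this thread.
bcache_report bcache0 bcache1
```

The kernel documentation lists writethrough, writeback, writearound and none as the accepted cache_mode values. As far as that documentation goes, something like `bcache_set_mode bcache0 writearound` should send writes straight to the backing device, but that is an assumption to verify on a test box. The sketch does not touch the size of the sequential IO tracking table.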

Thanks,
Stan
Kai Krakow
2014-10-04 19:30:05 UTC
Post by Stan Hoeppner
Hello fellow bcache users/developers,
A couple of questions.
1. How do I disable write caching?
2. How do I increase the sequential IO tracking window from 128 IOs to
say 4096 IOs or more?
I don't know which of the various knobs bcache takes into account, but you can
tune the IO window for the kernel IO scheduler with something like this:

#!/bin/sh

# Raise the request queue depth of each matching SCSI/SATA disk to 4096.
for nr_requests in /sys/bus/scsi/devices/[012]:0:0:0/block/*/queue/nr_requests; do
    echo 4096 > "$nr_requests"
done

I also suggest switching to the deadline IO scheduler if you haven't done so yet.

Please note that my script tunes my SATA devices. I don't use bcache yet.
You may want to adjust the line to work for the bcache devices.
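
One way to adjust it is to pass the device names explicitly instead of globbing the SCSI bus. The following is a rough sketch, not a tested recipe: the function name, the SYSFS_ROOT override and the example device names are all hypothetical, and which devices to pass (the backing LUNs, the SSD) is left as an assumption. A stacked device may not accept these writes, so failures only produce a warning.

```shell
#!/bin/sh
# Sketch only: set the IO scheduler and request queue depth for named devices.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# usage: tune_queues NR_REQUESTS SCHEDULER DEVICE...
tune_queues() {
    _depth=$1
    _sched=$2
    shift 2
    for _dev in "$@"; do
        _queue="$SYSFS_ROOT/block/$_dev/queue"
        if [ ! -d "$_queue" ]; then
            echo "skip $_dev: no queue directory" >&2
            continue
        fi
        # Warn and carry on if the kernel rejects either value.
        echo "$_sched" > "$_queue/scheduler" ||
            echo "warn: could not set scheduler on $_dev" >&2
        echo "$_depth" > "$_queue/nr_requests" ||
            echo "warn: could not set nr_requests on $_dev" >&2
    done
}

# Example call (placeholder device names, requires root):
# tune_queues 4096 deadline sda sdb
```

Reading queue/scheduler back afterwards shows the available schedulers with the active one in brackets, which is a quick check that the change took effect.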
--
Replies to list only preferred.