A few things to know before running Ceph on small clusters

I’ve been using Ceph (and more specifically CephFS) on a 3-node production cluster (4 × 6 TB hard drives per node) for more than a year now, and I have mixed feelings about it. The good news is that I have not experienced any long outage, nor any data loss. The whole cluster worked more or less as expected, but maintenance tasks proved very time consuming: far more so than I had foreseen.

Yesterday, I had to replace a hard drive that had failed a few days earlier. Data recovery had completed; the cluster was calm and quiet. Then I removed the failed OSD from the CRUSH map without adding another one as a replacement. Shouldn’t have done that: removing an OSD from the CRUSH map changes the weight of its host bucket, so CRUSH recomputes the placement of a large share of the placement groups, not just the ones that lived on the failed disk. Instantly, Ceph began moving a third of the cluster data (around 9 TB) to different nodes. Client operations took minutes to complete, and my nightmare began. I think that was the straw that broke the camel’s back.
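In hindsight, a gentler procedure would have been to replace the drive in place, keeping the old OSD’s ID and CRUSH weight so the map’s topology never changes. Here is a minimal sketch of what I mean, assuming a Luminous-or-later release, that the failed OSD is osd.7, and that /dev/sdX stands in for the replacement drive (both are placeholders, not values from my cluster):

```
# Temporarily pause data movement while we work.
ceph osd set norebalance
ceph osd set nobackfill

# Mark the failed OSD out (its data has already been recovered elsewhere).
ceph osd out 7

# Destroy the OSD but keep its ID and CRUSH entry, so the map's
# topology -- and therefore data placement -- does not change.
ceph osd destroy 7 --yes-i-really-mean-it

# Swap the physical drive, then recreate the OSD reusing ID 7.
ceph-volume lvm create --osd-id 7 --data /dev/sdX

# Resume normal recovery; only the new disk gets backfilled.
ceph osd unset nobackfill
ceph osd unset norebalance
```

The key point is that `ceph osd destroy` preserves the OSD’s slot in the CRUSH map, so once the new disk comes up under the same ID, Ceph only backfills the data that belonged to the failed drive instead of reshuffling a third of the cluster.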

Now I’m in the middle of deciding whether to keep using Ceph or to move the dataset to something less fragile. I decided to write this post to warn people like me about the drawbacks of Ceph when it is deployed on small-scale clusters (i.e., fewer than five nodes). Ceph is a great piece of software, but the fact is that it was designed to operate in medium- to large-sized datacenters, with hundreds or thousands of nodes. There may be more suitable solutions for mini-clusters like mine or yours.