moecat


True·NAS: a high-reliability, enterprise-grade solution

Prerequisites:

  • Three or more NAS devices
  • Each NAS has dual network ports
  • Each NAS has similar physical disk sizes
  • NAS BIOS supports automatic power-on (so nodes come back up after a power outage)
  • The system disk is an SSD or NVMe drive and the data disks are HDDs, or the system and data disks are otherwise easy to tell apart, so you don't pull the wrong disk during high-availability testing

A simple home network diagram is shown below:
Simple Home Network Diagram

The optical modem is connected to NAS network port 1 via a gigabit switch.

The network ports 2 of the three NAS devices are connected to a 2.5G switch to form a local network.

Make sure each NAS has both vmbr0 and vmbr1 configured in Proxmox, and that each bridge is bound to the same physical network port on every node.
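
For reference, a minimal /etc/network/interfaces sketch for one node; the physical interface names (enp1s0, enp2s0) and the addresses are assumptions, so substitute your own:

# vmbr0 -> LAN, physical port 2, on the 2.5G switch; gateway is the future ikuai LAN IP
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.1.1
    bridge-ports enp2s0
    bridge-stp off
    bridge-fd 0

# vmbr1 -> WAN, physical port 1, uplink towards the optical modem
auto vmbr1
iface vmbr1 inet manual
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0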

image

vmbr0 is shared with the virtual machines, so it serves as the LAN.
vmbr1 serves as the WAN and is connected to the optical modem.

Make sure the LAN IP of each NAS is in the same subnet, then form the cluster.
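
If you prefer the shell over the web UI, the cluster can be formed with pvecm; the cluster name and the first node's IP below are assumptions:

# on the first node
pvecm create home-cluster
# on each of the other two nodes, pointing at the first node's LAN IP
pvecm add 192.168.1.11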

image

After forming the cluster, pick one node and create an ikuai virtual machine.
Add both a LAN and a WAN interface, with the LAN interface first (that is, on vmbr0).
The virtual machine disk can be small, say 8 GB, and can sit on any of the three NAS devices for now.
Install ikuai, set its LAN segment to match the NAS LAN, and then dial up. Once the connection is established, all three NAS devices should have internet access; update the system on each node and reboot.
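
The same VM can also be created from the shell with qm; the VMID 100, the resource sizes, and the ISO path local:iso/ikuai.iso are assumptions:

# net0 = vmbr0 = LAN first, matching "LAN interface first" above
qm create 100 --name ikuai --memory 2048 --cores 2 --ostype l26 \
  --net0 virtio,bridge=vmbr0 --net1 virtio,bridge=vmbr1 \
  --scsihw virtio-scsi-pci --scsi0 local-lvm:8 \
  --cdrom local:iso/ikuai.iso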

The above steps are only to ensure that there is internet access when installing Ceph.

After forming the cluster, install Ceph by following the steps in the datacenter's Ceph section. The following options are recommended:
osd_pool_default_min_size = 2
osd_pool_default_size = 2
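
If you do this from the shell instead of the wizard, something like the following should be equivalent (the Ceph network below is an assumption); the two pool defaults end up in the [global] section of /etc/pve/ceph.conf:

pveceph install
pveceph init --network 192.168.1.0/24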

image

Create a monitor and a manager on all three nodes.

image
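
From the shell, the equivalent on recent Proxmox versions should roughly be:

# run on every node
pveceph mon create
pveceph mgr create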

After installing large-capacity mechanical disks on each node, initialize them.

image

Add them as OSDs.

image
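
From the shell, wiping and adding one disk might look like this; /dev/sdb is an assumption, so double-check the device name, since this is destructive:

# wipe any old partitions/signatures (destroys all data on the disk!)
ceph-volume lvm zap /dev/sdb --destroy
# create an OSD on it
pveceph osd create /dev/sdb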

Create a CephFS.

image
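
A rough shell equivalent (the filesystem name is an assumption; a metadata server must exist before the filesystem is created):

# run on at least one node
pveceph mds create
pveceph fs create --name cephfs --add-storage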

Under Datacenter → Storage, create an RBD storage.

image
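
From the shell, this could be sketched as follows; the pool name vm-pool and the storage ID vm-ceph are assumptions:

# create a pool for VM disks
pveceph pool create vm-pool
# register it as RBD storage for VM images and container volumes
pvesm add rbd vm-ceph --pool vm-pool --content images,rootdir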

Another important point: the clocks of the Ceph nodes must stay synchronized, otherwise Ceph will complain about clock skew. So install systemd-timesyncd on each node from the system page.

image
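
From the shell this would be roughly:

apt update && apt install -y systemd-timesyncd
timedatectl set-ntp true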

Then log in to each node, run the command crontab -e, and add the following line:

*/10 * * * *  ntpdate ntp.aliyun.com

This will synchronize the time every ten minutes.

Now you can enjoy playing with hyper-convergence. For example, take the ikuai virtual machine we created earlier and move its disk from local storage to the newly created Ceph distributed storage. After the move completes, delete the source disk and restart the VM.

image
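
A rough shell equivalent, assuming VMID 100, disk scsi0, and the storage ID vm-ceph from earlier:

qm move-disk 100 scsi0 vm-ceph --delete 1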

Because the NAS models and network ports are identical, we get a magical effect: the ikuai virtual machine that handles dial-up can switch between physical nodes seamlessly, within seconds (other virtual machines can do the same). Only the in-memory state has to be migrated; since the disks live on the Ceph distributed storage, they do not need to move at all. LXC containers (created from templates) are more restricted and must restart after migration. This effectively achieves seamless live migration.
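
A live migration can also be triggered from the shell; the target node name is an assumption:

qm migrate 100 node2 --online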

Configure High Availability (HA)

image
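
From the shell, putting the ikuai VM under HA might look like this (vm:100 is an assumption):

ha-manager add vm:100 --state started --max_restart 1 --max_relocate 1
ha-manager status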

Next, it's simple. You can freely experiment, such as creating a virtual machine (with virtual machine disks and data disks stored on Ceph RBD) and setting up SMB sharing.

Test High Availability:
Randomly unplug one data disk, and you will find that your SMB share keeps working without any issues: virtual machines keep running, and you can still read and write data over SMB. The Ceph tab will show a warning that one OSD is down. After writing some more data, plug the disk back in; the OSD will come back up and Ceph will automatically rebalance the recently written data.
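
While the disk is unplugged, the cluster state can be watched from any node:

ceph -s
ceph osd tree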

Ultimate Test:
Randomly shut down one NAS to simulate a node failure. If HA is configured, any virtual machines or LXC containers that were running on the failed node will be migrated to the surviving nodes and started automatically after a brief retry. Your SMB share will keep working for both reads and writes. After a while, power the NAS back on, and Ceph will automatically start rebalancing.

Of course, more extreme situations, such as two or all three NAS devices failing at once, are outside our scope; handling those requires a higher level of multi-site active-active (geo-redundant) deployment.

In a home environment, if the NAS stores important data such as code, family photos, blog articles, and keys, relying on a single machine's hardware or software RAID carries real risk. Even when a rebuild succeeds, it takes a long time, especially with today's common 10 TB+ mechanical drives; during that window your storage service is completely unavailable, and whether the rebuild will succeed at all is itself a concern. That is clearly unacceptable.

Therefore, distributed storage systems like Ceph have great potential in the home market. It is expected that manufacturers will try multi-node solutions to meet the needs of geek users. Adopting a multi-node solution can also provide greater scalability, as the disk slots of a single node are limited.

Even for the most hardcore geeks, relying solely on RAID in a home environment is not advisable: RAID is a fault-tolerance mechanism, not a backup. Enterprise-level solutions use distributed storage, and as shown here, even three nodes are enough to form a distributed system.
