This article covers backup strategies for home users, and what a backup strategy is. The strategies discussed here center on RAID and on the types of hardware users should deploy in a home or SoHo environment to provide some recovery options for their data or their business. It looks at both internal RAID and NAS-based RAID systems, breaks down the information you need to know, and points you in a direction when you set out to deploy such a solution.
Hopefully you have found this blog before you really need it. It's not uncommon to lose files to a disk failure, malware, or ransomware like CryptoWall. When that happens, you'll hear the same things every time: "I wish I had a backup," "Did you make a backup?" and of course "I told you, you should have had a backup." Home users don't really need the fancy bells and whistles that others may need (like SoHo, SMB, or even enterprise businesses). You won't read here that you need extensive equipment or extravagant software. What you can expect is some tips to protect your data, and maybe some tips on how to get your information back if a disk does fail.
External Hard Drives
You may read on a few web pages, and in some magazines, that you can purchase an external hard disk for a backup and, of course, to "extend your storage." However, those hard disks are marketed mainly as extra storage, and there are reasons for that. First and foremost, external hard drives are exposed to more stress and shock (from improper handling) because they are portable. Add pets or children, which raise the risk of the disk being dropped or having something spilled on it, and it becomes a SINGLE point of failure. We will come back to single points of failure later.
With the first problem with USB-style hard disks covered, on to the next issue. Many users are not aware that these disks also need to be properly "shut down" (ejected). You shouldn't simply plug one in and then rip out the plug. Doing so can corrupt data that is in use or waiting to be written or, even worse, data that isn't being touched at all. The same goes for USB flash drives, solid state drives and, you guessed it, platter-based hard disks.
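To make the "waiting to be written" point concrete, here is a minimal Python sketch of what a program can do to push its buffered data out to the device. The function name `write_and_sync` is ours, purely for illustration, and this does not replace ejecting the disk through your operating system before unplugging it.

```python
import os
from pathlib import Path


def write_and_sync(path: Path, data: bytes) -> None:
    """Write data to path and ask the OS to push it to the storage device."""
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()             # empty Python's own write buffer
        os.fsync(handle.fileno())  # ask the OS to flush its cache for this file
```

Until both buffers are flushed, the copy on the external disk may be incomplete even though the program believes the write is done, which is why pulling the plug early is risky.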
Internal Based RAID
Internal based RAID (a RAID that is inside the computer you use every day) is a pretty good idea. However, because the RAID lives inside the same machine, it's not a true backup. If the system is struck by malware like CryptoWall and it's not caught in time, the result can still be a loss of data. Internal based RAID is also useless against theft, and the same can be said for any storage device that is not locked away. If it isn't locked away, you have to fall back on encryption. We will get to that later. Another issue with internal based RAID is that identical hard drives may actually raise the odds of losing the whole array and, what is worse, without a true backup your data may be lost forever. If you don't have a say in building your own configuration or hardware, the second disk will usually be the same model as the primary disk, because that is exactly what most manufacturers ship. So if we have 2 Seagate hard disks that are both rated for 100,000 hours and you're at your 99,999th hour of use, the likelihood of both disks failing from mechanical wear and tear at around the same time is high. Dissimilar hard disks will benefit you in the long run. If one is rated for 100,000 hours and the other for 200,000 hours and either fails, you still have a working copy.
Adding a RAID controller to a P.C. also adds another point of failure. Should the controller stop working, or the drivers have problems, this could spell a significant amount of data loss. In many instances performance may be impacted, because "true" hardware RAID is rarely present in consumer computers; it's usually software based. It also will not protect you against a virus, user error or, worse, deletion (intentional or unintentional). The point of this type of RAID, or any RAID, is to help you survive a hardware (disk) failure. RAID 0 is often used for performance, but if you don't have 1 full working backup, the chances of recovering your data after a failure are very low.
Software RAID
Software RAID, and "software RAID" in disguise (meaning most of the "hardware" RAIDs on the systems discussed above), has the same advantages as the previously discussed method, but the problem is that it's slower. If your system is built using a software RAID, performance will be impacted. This is mostly done at the motherboard level and is sometimes called FakeRAID. One way to tell whether you are on a SoftRAID is the price of your computer: hardware RAID will normally cost a few hundred dollars more.
Software RAID is also what you can do with an external hard disk, but that, too, is not a true backup. The problem with these types of backup schemes is that either 1) no one has the time, 2) no one really remembers, 3) it takes too long, or 4) insert proper excuse here.
Encryption
The main benefit of using encryption with RAID is, of course, protection against theft. But encryption will not only impact performance, it will also impact how the data is retrieved (and we say this with a strict eye). Here is where management of a whole-disk encryption scheme is needed and, you probably guessed it, that means keeping track of that secret key you have going on over there. Windows BitLocker does support hardware based RAID. In other instances, the one thing you'd need to do is encrypt the stored hard disks (you know, your grandfather-father-son backups) before they are shelved. It may seem impractical for a home user to store these devices outside the home, but the suggested location would be a safety deposit box. If it's the security of the data that you are worried about, the best bet would be a statically mounted device where the RAID will live.
If you do decide to encrypt your RAID, be mindful that if the RAID set fails, you may need different tactics to retrieve your data. Again, that means remembering the key, passphrase, or other authentication method you've used. We've dealt with a crash (where we had one disk that was still working) and wanted to validate the files on both hard drives; however, Linux was giving us a headache. Figuring that the time spent to decrypt the files, validate them, and then re-install was a huge monetary loss, we scrapped that plan and fell back on one of the images we'd made before the disk failed. If we didn't have the RAID set up with its respective backup, we would have lost everything!
Considering What to Backup?
Some people will say full disk is essential, others will say just the data. There are minor issues with each approach. Should you back up and archive only your data, you'd need to install the operating system all over again, and this can take time and energy (ugh, all those programs!). If you perform a whole-disk RAID, you simply point the RAID to do its thing and you're back up and running within a short time. Many people would figure it's a no-brainer, so where is the real problem? Well... the storage.
So let's say you have a RAID made of 1TB hard disks, and the RAID is mirrored. That makes it a 1TB volume. Cool. So you installed your operating system, you have a few programs, a lot of pictures of the grandkids, videos, etc. In time, all those photos and programs need to go somewhere, and if the computer is not undergoing maintenance even the history can put you over the top. So now what happens? You have 500GB of junk and 500GB of memories, and you can't add any more. Here is where you have to decide whether you want a full disk or a dedicated solution.
What is this dedicated solution you speak of? Well, for an internal solution, you'd have 1 hard disk for the operating system, your programs, settings, etc. Then set up a second and a third hard disk. The second and third hard disks will be your storage locations. Whenever you save a picture, disk 3 will mirror disk 2 and all is well and happy. Now you're free to install all the software you want. The space will not be impacted by anything other than more memories. Of course this will cost a bit more money, because you're now purchasing 4 hard drives (1 for the OS, 2 for the internal mirror and one outside the computer for a backup of the internal RAID).
"Important Data"
One of the things that carries data that is not considered until it's well too late is our smartphones (Android, iPhone). The only time the data on these devices is considered is when the phone is being upgraded, when there is no space left on our cloud solution, or when the phone meets a miserable death. In any of those cases (with the exception of the iPhone being the wonderful device it is), you can back up music, photos, pictures and text messages (again, with the exception of iMessage). The devices we normally push to our customers are either the Synology DiskStation DS723+ or the Synology DiskStation DS923+. (For larger locations like businesses with more than 1 computer we normally suggest the DS1823xs+ or DS3622xs+ and, for larger businesses running server applications, the RS1619xs+.)
Leaving aside business backups (so we can exclude the rackmount devices), we can focus on the 2, 4 and 8/12 bay devices. Setting up backups for your P.C., Mac or Linux endpoints is rather easy, and the same can be said for mobile devices. There are a few packages that you can download and install to help you along with the setup of Synology backups. Please see the chart below:
Endpoint Backups | Synology Cloud, Synology Drive |
Mobile | Synology Drive, Synology Photos, Synology DSFile |
Synology Software | Synology Drive, Synology Active Backup, Synology Cloud Sync, USB Copy, Hyper Backup, Virtual Machine Manager* |
Of the many options to consider when backing up, the major problem that users (and some businesses) have is identifying what information or data should be backed up. There are a few things to consider when making a backup of the important data in our environment.
In larger organizations, is this OneDrive, OneNote or another drive platform? Are there local folders that we should consider? Those local folders may include server configurations and other application settings that we have not considered. Examples include, but are not limited to: SQL databases, documents, photos, diagrams, configuration files (firewall, Windows settings, etc.), down to the personal files on the endpoints that our users, customers or family use. Here are some points for the home office to consider when deploying a NAS.
- Identify Important Data
The first discussion or thought needs to go into what types of data are important to you. Pictures? Videos? Media in general? Documents and spreadsheets? Source code and development? Or is it a combination of things?
Additionally, you need to plan out where "custom" data is stored and how, and whether, it should be processed.
For instance, in our environment we might need to back up e-mail, source code (across multiple operating systems and endpoints) as well as diagrams and documents / contracts. These can be in different locations on different endpoints.
- Where The Data is Stored
After you've identified the important data, there are some general rules about the default locations where it may be stored. In Windows, the two most common directories where a user creates and maintains data are C:\users\[username]\Documents and C:\users\[username]\Desktop. To cover all the locations, we can make a rule in our backups that covers the C:\users\[username]\ folder and grabs everything in one shot. However, this should be approached with caution, as many users download large files which do not need to be backed up.
For the Linux operating system, we can do the same. The base directory to consider is /home/[username], as discussed for Windows, and we can get more granular with the paths /home/[username]/Desktop, /home/[username]/Documents, /home/[username]/Pictures and /home/[username]/Music.
macOS is similar in nature to Linux and Unix: most of your important data will be stored in /Users/[username]. While the base can be backed up and the backup will grab everything, the granularity described for the other locations can also be used here. Please be aware that specific locations for other applications, like video editors and development tools, should also be considered, unless you already have a standard location for your business or preferences.
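As a rough illustration of the granular approach described above, the Python sketch below lists which of the common sub-folders actually exist under a home directory. The names `COMMON_SUBFOLDERS` and `backup_sources` are ours, not part of any backup product, and the folder list is only the set mentioned in this post.

```python
from pathlib import Path

# Sub-folders named in this post; add or remove entries to suit your setup.
COMMON_SUBFOLDERS = ("Desktop", "Documents", "Pictures", "Music")


def backup_sources(home: Path, subfolders=COMMON_SUBFOLDERS) -> list[Path]:
    """Return the listed sub-folders of a home directory that actually exist."""
    return [home / name for name in subfolders if (home / name).is_dir()]


def current_user_sources() -> list[Path]:
    """Same check for the logged-in user's home directory on any OS."""
    return backup_sources(Path.home())
```

Selecting folders this way leaves out things like a Downloads folder full of large installers, which is the caution raised for the "grab everything" rule.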
- Where Should Data be Backed Up
Choosing where your backup files are stored is another consideration. Are you looking to store them only on site? On site and in the cloud? Low bandwidth connections will suffer if you are pushing / pulling data from cloud solutions (e.g. Carbonite). If you are on a slower internet connection, you might want to keep the data on site or push it during off-peak hours. Additionally, if something should go catastrophically wrong, pulling that data down might take a considerable amount of time. Finally, at this point in your planning, you should consider purchasing an additional set of disks (e.g. if the RAID takes two, purchase two more). That way you can keep a backup off the RAID and, if a disk fails, you have one on site to swap in while the original is undergoing an RMA.
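To get a feel for how long a cloud push or pull might take, here is a small back-of-the-envelope sketch in Python. The function name and the 80% efficiency figure are our assumptions for illustration; real throughput depends on your provider and connection.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second, and
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600
```

Under these assumptions, `transfer_hours(1000, 20)` comes out at roughly 139 hours, so restoring 1TB over a 20 Mbps connection would take the better part of six days.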
- Determine Access to The Data
Access to your RAID device can be internal or external. Internal access only allows you to reach the RAID from the internal network. External access is from the public internet. Both come with their own challenges, and you will need to consider what types of access you want to deploy in your environment. While internal access can be protected with passwords, SAML / OAuth or AD authentication, allowing anyone to access all the data may expose you to various risks and vulnerabilities. Ideally, you can also set up some accounts with R/O (read-only) access for things such as software downloads or other data that does not require a user to write to it (also think logging data).
- Type of Backup & Restore
There are a few backup and restore options available. You have full disk backup (which requires more space than you originally planned), and there are the data-only backups discussed in the first two steps (identifying the important data and where it is stored). Full backups enable you to deploy images to your damaged systems so that, once restored, your users can use the system as it was before it became unstable. The simplest and most affordable approach is to back up just the most critical data.
The final consideration is the RAID type (0, 1, 5, 6, 10, F1, SHR, SHR-2). Not only does the RAID type determine failure tolerance, it also determines how much storage you have available. A quick calculator for this is the Synology RAID Calculator. This tool will help you choose the RAID and disk setup that best fits your needs.
- Frequency of Backups
As a general rule, at least one full backup of your data should be made each week. We normally schedule this for our customers on a Friday with some of the tools discussed in this blog post. Additionally, with the other tools such as disk and cloud station, backups are made of the data that changes, and you can specify how many versions you want to keep on file. E.g. 8 versions lets you go back 7 times from the last backup (incremental). You may also want to consider 2 disks on top of the number of disks in your RAID device. E.g. a 2 bay RAID with 2 additional disks: one disk as a spare and the other for the weekly full backups.
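The retention idea above (keep 8 versions, run the full backup on Fridays) can be sketched in a few lines of Python. This is only an illustration of the rule, with function names we made up; it is not how any particular Synology package implements versioning.

```python
from datetime import date


def prune_versions(backup_dates: list[date], keep: int = 8) -> tuple[list[date], list[date]]:
    """Split backup dates into (kept, removed), keeping the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(backup_dates, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


def is_full_backup_day(day: date) -> bool:
    """Weekly full backup on Friday (date.weekday(): Monday is 0, Friday is 4)."""
    return day.weekday() == 4
```

With ten daily backups on file and `keep=8`, the two oldest land in the removed list, which leaves the latest backup plus 7 earlier versions to go back to.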
What Options are Available?
If you are really concerned with space, the only logical option may be to dedicate your efforts to an internal RAID. However, there are trade-offs, as we have discussed. In this case you'd have a RAID, and you can opt for an external hard disk that will back up your RAID just in case anything goes wrong. But you'd have to remember to actually back up your files to the external hard disk. Here is where the setup options before you may become a bit dizzying. The chart below aims to break this down.
Disk Count | Cost |
3 Disk (Mirror + OS + External Backup) | Low |
4 Disk RAID. 1 for OS 2 for data Mirror. 1 for External | High |
NAS (Network Attached Storage)
NAS has a lot of great points to it, and again it also has issues that you'd need to consider when going this route. First, you'd need to purchase a NAS and 3 hard drives (this depends on your NAS; we are assuming a 2 bay RAID + 1 external backup). The NAS will mostly be connected through your switch, firewall, or router. If your files are rather large, look into a gigabit connection for your devices (again, switch, firewall or router). With this type of setup you can create different folders, or "zones," that can help you organize your backups. You can set guest access, access to folders for the kids, and access for yourself if you so wish. There are two ways to run the disks in this type of setup: on-demand, and when needed. With on-demand, your disks are always powered up and always spinning, 24/7/365. Although this can put strain on the mechanical parts of your disks, they can and will perform faster. By faster we mean not waiting for the disks to spin up before writing the data. Since this may age the disks faster, we don't use this type of setup in our environments. The other option is disk sleep. When the disks are not in use, they are spun down and wait for a connection to wake them up.
You can use this type of setup for just your data and your system settings. If you require more storage space, you can redirect programs to use the NAS. By programs that have access to the NAS we mean, for example, Microsoft Office, or redirecting your OS Pictures folder to the NAS so that imported images are sent there directly, along with other such important pieces of information. But of course you will need that secondary hard disk as the backup for the RAID. You will still need to make an independent backup of the files on the RAID in case something goes wrong. This way you avoid a single point of failure.
RAID Levels
Of the many things you need to consider with a RAID, one is the RAID level. There are multiple RAID levels to choose from, including levels 0, 1, 5, 6, 10 and F1. You also have SHR and SHR-2 (with Synology products). The benefit of SHR is that you can mix and match disks of nearly any size and the RAID will adjust. With the classic RAID levels, the size of the disks matters. For example, if you begin with 2TB disks in a RAID configuration, a replacement disk should also be 2TB. If you go higher, only 2TB will be used and the remainder will be wasted / unused space.
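For a quick feel of how the level and the disk sizes affect usable space, here is a rough Python sketch for the classic levels only. It assumes every member is sized by the smallest disk in the set, as described above, and it does not model SHR, SHR-2 or RAID F1. The function name is ours; for real planning, use the vendor's calculator.

```python
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def usable_capacity_tb(level: str, disks_tb: list[float]) -> float:
    """Approximate usable capacity (TB) for classic RAID levels."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    count = len(disks_tb)
    if count < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "10" and count % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    smallest = min(disks_tb)          # larger disks are truncated to this size
    if level == "0":
        return count * smallest       # striping, no redundancy
    if level == "1":
        return smallest               # every disk holds the same copy
    if level == "5":
        return (count - 1) * smallest  # one disk's worth of parity
    if level == "6":
        return (count - 2) * smallest  # two disks' worth of parity
    return (count // 2) * smallest     # RAID 10: mirrored pairs, then striped
```

For example, `usable_capacity_tb("1", [2, 4])` returns 2, showing how the extra 2TB on the larger disk goes unused in a classic mirror.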
RAID 0
RAID 0 can be thought of as a collection of disks that gives you their full combined volume. E.g. 1TB disk + 1TB disk = 2TB total storage. The way this works is that files are split between the two disks. If any drive in the array fails, the data stored in this configuration is lost. The graphic below shows how this works:
Healthy and Functioning RAID 0.
In the example above we see that the RAID is in fact working. Part of the RAID's function is to split the file between the two disks (Drive 1 and Drive 2) and store that single file across the hard drives in the array. If one of those disks fails, it would look something like this:
Unhealthy and Degraded RAID 0.
The second example shows the RAID 0 volume failing. The drive on the right side of the configuration has checked out. What this translates to is that half of every striped file (in this case the PDF) is now gone, which leaves the file unusable. The only way to recover this volume is from a full backup.
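The striping behaviour described above can be simulated in a few lines of Python. This is a toy model with two in-memory "disks" and a made-up 4-byte chunk size; real controllers work on much larger stripes, but the failure mode is the same.

```python
from typing import Optional


def stripe(data: bytes, chunk: int = 4) -> tuple[list[bytes], list[bytes]]:
    """Deal fixed-size chunks of data alternately onto two simulated disks."""
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    return chunks[0::2], chunks[1::2]


def read_back(disk0: Optional[list[bytes]], disk1: Optional[list[bytes]]) -> bytes:
    """Re-interleave the chunks; both disks are required to rebuild the file."""
    if disk0 is None or disk1 is None:
        raise OSError("RAID 0 volume lost: a member disk is missing")
    pieces = []
    for index, piece in enumerate(disk0):
        pieces.append(piece)
        if index < len(disk1):
            pieces.append(disk1[index])
    return b"".join(pieces)
```

Passing `None` for either disk models the failed drive: there is nothing on the surviving disk that can stand in for the missing chunks, so the read fails.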
The Cons of RAID 0: RAID 0 will in fact cost you more money, for a few reasons. The more data you accumulate, the larger the disks you need. In the most EXTREME case, say we have a 20TB volume across two 10TB disks (10TB + 10TB in RAID 0 = 20TB storage volume). The cost (today) is approximately 240.00 USD per disk, so two disks come to 480.00 USD before taxes. To make matters worse, a 20TB disk for a full backup of all that data would cost 379.99 USD before taxes. So the total in disks alone would be 859.99 USD. If you decide to keep a spare hard drive to swap in after a failure, the price goes up. Please note that these prices DO NOT include the cost of the RAID device / other hardware. Lastly, in a RAID 0 configuration there is no fault tolerance or redundancy. You are at the mercy of full backups if something fails.
The Pros of RAID 0: The main benefit of RAID 0 is its read/write speed.
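The disk cost arithmetic for the RAID 0 example above can be reproduced with a short Python sketch. The prices are the ones quoted in this post; the constant and function names are ours, and the sketch assumes a spare costs the same as an array disk.

```python
from decimal import Decimal

PRICE_10TB = Decimal("240.00")  # per-disk price quoted in this post
PRICE_20TB = Decimal("379.99")  # price of the 20TB full-backup disk


def raid0_disk_total(array_disks: int = 2, spares: int = 0) -> Decimal:
    """Disks-only total: array members, optional spares, one 20TB backup disk."""
    return PRICE_10TB * (array_disks + spares) + PRICE_20TB
```

`raid0_disk_total()` gives 859.99, matching the figure above, and adding one spare with `raid0_disk_total(spares=1)` raises it to 1099.99 before taxes and before the RAID device itself.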
RAID 1
A step up from RAID 0, RAID 1 has a few enhancements that provide you some data "security." Please be aware that in this context data security means protecting you from data loss. It does not mean encryption, or security in the 3 common states of data (at rest, in use, in transit). Now back to RAID 1. The RAID 1 setup keeps an exact copy of the data you write to one disk on the other. This means that if one drive fails, the other drive will in fact have an exact copy of the data.
To understand RAID 1, we first need to understand how it is set up. Again, we are using disks of the same size, in this case two 10TB hard drives. You would think that 10TB + 10TB gives us 20TB total. Not with RAID 1. With RAID 1 we can only use the capacity of one of the disks, because the second disk holds an exact copy of everything written. For a better idea of how RAID 1 works, please see the image below:
Healthy and Functioning RAID 1
As you can see from the image above, RAID 1 copies the file to both disks when it is written. Therefore, if disk 0 or disk 1 fails, the data is still safe, as it is mirrored across both drives. The image below, which demonstrates a disk degradation or crash, shows that the disk on the left still has a whole file on it, unlike the partial file from RAID 0.
Unhealthy and Degraded RAID 1
Although RAID 1 can tolerate the loss of one disk, there are other points to consider when selecting this type of RAID.
The Cons of RAID 1: RAID 1 requires double the drive capacity for the usable space you get. As discussed previously, two 10TB hard drives result in just 10TB of usable space. If we require 20TB of usable space, we need two 20TB drives because, as pointed out, one drive is used to keep the data safe.
The Pros of RAID 1: Read operations are fast. Fault tolerance is high for a system with 2 disks. In the event of a drive failure, the data can be copied to the new drive.
RAID 5
While trying not to be biased toward a given RAID preference, the next step up is RAID 5. It has a lot of benefits in the read / write category and in affordability, and it also provides fault tolerance and data redundancy. A RAID 5 configuration needs a minimum of 3 disks. A RAID 5 can withstand the loss of any one of its disks; however, it is not a silver bullet. RAID 5 introduces parity. Parity is a bit intricate to understand, but we will do our best to explain how RAID 5 with parity works.
Similar to how RAID 0 splits files, RAID 5 splits them across all the drives we have and adds parity. So, let's visualize how this works.
RAID 5 Storage Example
In the above example we are using a RAID 5 setup with 4 hard drives, configured with, let's say, 2TB drives each. Because of the way RAID 5 operates, one drive's worth of space (2TB, shown here as the final drive) is used for parity. So in total we have 6TB of storage space (3 disks * 2TB = 6TB). In this example, even with 4 drives, we still have a fault tolerance of only 1 drive failure! Meaning that if more than one drive fails, we lose all our data.
To visualize how a RAID 5 setup can withstand a drive failure, we can use the graphic below to simulate a single drive failure (1 drive fault tolerance) while mitigating the total loss of data:
RAID 5 Single Drive Failure
Given the example above, even though we've lost one drive, a replacement drive for Disk 4 would allow us to rebuild and still have our data. Now, if we lose two of the drives, in any position, we lose all our data. The image below shows what a complete data loss on a RAID 5 would look like:
RAID 5 Multiple Drive Failure
In the last example, without a full backup of the data that was stored on your RAID, this would be considered a complete loss for RAID 5.
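Parity is easier to trust once you see it rebuild something. The Python sketch below uses XOR parity, which is the usual way RAID 5 parity is explained, on three tiny made-up data blocks. It is a toy model of a single stripe, not a description of how any particular NAS lays out its disks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across a 4 disk set: three data blocks plus one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk holding the second block, then rebuild it
# from the two surviving data blocks and the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Here `rebuilt` equals the lost block `b"BBBB"`. If a second block were lost as well, the survivors would no longer hold enough information to recover either one, which is the complete loss described above.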
Points to Consider
- A disk RAID requires the hard disks used for the RAID + 1 for an additional backup. (E.g. 2 bay RAID + 1 disk for external backup via USB)
- Should your RAID be out in the open (on a desk; not in a locked location), ACL and Encryption should be utilized.
- SoftRAID / FakeRAID impacts performance and may only work with a specific OS (Microsoft Windows).
- External backup of your RAID should be encrypted. Ideally rotated between two additional disks and one kept off-site.
- RAID does not protect against viruses, malware, user error, or deletion. (Use an external / remote backup / additional hard disk)
- External hard drives, flash drives, CF/SD cards, etc. are not real backup strategies. (Unless a rotation is introduced with additional disks)
- Some laptop / desktop computers come with 2, or 3 disk RAID. (This may not protect you in all cases; simultaneous disk failure, malware, theft)
- Consider backing up the full disk, or just data.
- The RAID level you choose determines how many disk losses you can survive, and it affects write/read speed as well as how much storage is available.
- Future proofing your RAID or NAS should be another priority. Maximum space requirements and external devices should be considered.
- New users should consider SHR with Synology, as it is more user friendly with space and expansion.
- There is no one size fits all!