By Michael Burke | November 25, 2019
While some industry jargon might make it seem otherwise, there is no data stored on the internet. All electronic data resides on a computer that lives somewhere. Typically, this computer is called a “server”, a term that comes from the ‘client-server’ architecture model, in which one computer–the ‘client’–makes requests of another computer–the ‘server’–which fulfills those requests. A server could actually be any computer. For instance, you could host your company’s website on a PC sitting in your garage–it would be a very bad idea, but you could do it.
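To make the client-server idea concrete, here is a minimal sketch in Python in which a single machine plays both roles. The function names (serve_once, request, demo) and the stored "homepage" entry are hypothetical and exist only for illustration; a real web server does far more than this.

```python
import socket
import threading


def serve_once(listener: socket.socket, data: dict) -> None:
    """Server side: accept one client, read a key, reply with the stored value."""
    conn, _addr = listener.accept()
    with conn:
        key = conn.recv(1024).decode("utf-8").strip()
        conn.sendall(data.get(key, b"NOT FOUND"))


def request(host: str, port: int, key: str) -> bytes:
    """Client side: connect, send a key, and return whatever the server sends back."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(key.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
        return b"".join(chunks)


def demo() -> bytes:
    # The "data" lives on whichever computer runs the server, here this one.
    stored = {"homepage": b"<h1>Hello from the garage</h1>"}
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener, stored))
    worker.start()
    try:
        return request("127.0.0.1", port, "homepage")
    finally:
        worker.join()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that "server" describes a role rather than a special kind of machine: any computer that listens for requests and answers them is acting as one.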
More often, data that isn’t associated with a specific computer lives in a data center–a facility that has not only servers, but also all the equipment needed to make them usable, like routers and switches, as well as everything that keeps them safe, like firewalls, backup systems and physical security systems. Most large organizations have their own data centers (described as ‘on-premise’), but increasingly organizations rent data storage from third-party vendors (the biggest of which are AWS, Google and Microsoft). The latter is typically referred to as ‘cloud storage’, which is essentially a fancy name for using someone else’s computers.
Data centers come in all shapes and sizes. If you’re (unwisely) hosting your website from a PC in your garage, your garage is a data center. Alternatively, the data center might reside in a military-guarded facility underneath a mountain.
Where to store the data? That is the question…
For all organizations, the question of where to store the data is an important one that balances various forms of risk against performance. On the risk side, companies grapple with whether their data is safer from external and internal threats in their own private facilities, or better entrusted to AWS, Google or Microsoft. As these companies have more resources than just about any other organization, many argue that data is safer in the cloud than it is on-premise.
Additionally, the cloud storage approach allows you to sidestep the cost of building and maintaining your own data centers. The cost of running your own data center can be immense–the electric bill for keeping the servers cool can be astronomical on its own, not to mention all the people you have to hire. On top of this, companies often don’t know how much computing power they’ll actually need, and end up overspending by purchasing too many servers.
Cloud storage, on the other hand, offers the opportunity to rent only the amount of storage you need from one of the big cloud vendors. If you find that you need more than you originally thought, getting more is often as easy as a few mouse clicks (vs. having to go out and buy/configure new servers to meet unexpected demand). This ability to quickly increase your data storage capacity is often referred to as ‘scaling’ (a concept which is taken to its extreme with hyperscale data centers). As a result, many small, medium and large organizations find that the cloud offers better economics. For a refresher on the underlying technologies that make this possible, see our post on virtualization.
An additional storage scheme that’s worth mentioning is co-location, in which multiple organizations share the same data center in order to better absorb the costs. And finally, it should be mentioned that in real life, companies mix and match all of these models. For example, you’ll frequently hear the term “hybrid IT” used to describe an organization that mixes cloud and on-premise resources.
Einstein’s universal speed limit affects data center strategy
There are also issues of availability and performance. For instance, some applications require incredibly low ‘latency’–in other words, the time lag between when an action is initiated and when it is carried out has to be almost non-existent. You’d think that the time geographic distance adds to computing actions would be negligible, as signals travel through cables at a large fraction of the speed of light. But computer operations can be composed of thousands, even millions of actions, each one of which has its own latency, and those delays add up. As a result, using a data center that’s located in a different state or country isn’t an option for some applications.
For example, some Wall Street trading platforms need to have their data centers in very close proximity to the exchanges they trade on, or their software will run too slowly to be competitive with other trading platforms. And since it’s impossible to transmit data faster than the speed of light, this remains for the time being an unsolvable challenge. On top of that, cloud storage, or any sort of remote data architecture that relies on the internet to transmit data, raises the spectre of an internet outage making it impossible to function.
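The latency arithmetic described above can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a network model: it assumes signals in optical fiber travel at roughly two-thirds of the speed of light, it ignores routing, queuing and processing delays (so real numbers are worse), and the distances and round-trip count are hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum, km per second
FIBER_FRACTION = 2 / 3             # assumption: signal speed in fiber is about 2/3 of that


def round_trip_ms(distance_km: float) -> float:
    """Best-case time for a signal to travel to a data center and back, in milliseconds."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)
    return 2 * one_way_seconds * 1000


def total_delay_s(distance_km: float, sequential_round_trips: int) -> float:
    """Total waiting time, in seconds, when each round trip must finish before the next starts."""
    return round_trip_ms(distance_km) * sequential_round_trips / 1000


if __name__ == "__main__":
    # Hypothetical comparison: a data center 1 km away vs. one 4,000 km away,
    # for an operation that needs 1,000 sequential round trips.
    for km in (1, 4_000):
        print(f"{km:>5} km: {round_trip_ms(km):.3f} ms per round trip, "
              f"{total_delay_s(km, 1_000):.3f} s for 1,000 round trips")
```

Under these assumptions a single round trip over 4,000 km costs about 40 milliseconds, which sounds harmless until an operation needs a thousand of them in a row. At that point the distance alone accounts for about 40 seconds of waiting, which is why physical proximity matters so much for latency-sensitive software.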
We’re all our own personal CIOs now
Actually, most of us make data center decisions; we just don’t call them ‘data center decisions’ per se. If you use Dropbox to store your files, you’ve probably decided that the Dropbox corporation, with its billion-dollar resources, is better equipped to keep your files safe than you are with your limited resources. On the other hand, if your internet goes out, you may temporarily lose access to the latest version of those files. This is similar to the tradeoffs that large organizations must weigh in determining their storage schemes.