Edited by: Shaun Mouton (@sdmouton)
Hermione casting a spell.
Illustration by Frida Lundqvist
Companies have had and needed Database Administrators for years. Data is one of a business’s most important assets. That means many businesses, once they grow to the point where they must be able to rapidly scale, need someone to make sure that asset is well managed, performant for the product needs, and available to restore in case of disasters.
In a traditional sense, the job of the DBA means she is the only person with access to the servers that host the data, the person to go to for creating new database clusters for new features, the person to design new schemas, and the only person to contact when anything database-related breaks in a production environment.
Because DBAs traditionally have such unique roles, their time is at a premium, and it becomes harder to think about the big picture when day-to-day tasks overwhelm. It is typical to resort to brittle tools like bash for all sorts of operational tasks in DBA land. Need a new DB set up from a clean OS install? Need to take, validate, or restore backups? Rotate partitions or stale data? When your most commonly used tool is bash scripting, everything looks like a nail. I am sure many readers are preparing tweets to tell me how powerful bash is, but please hold your commentary until after evaluating my reasoning.
Does all this sound like your job description as a DBA? Does the job description talk in detail about upgrading servers, creating and testing backups, and monitoring? Most typical DBA job postings will make sure to say that you have to configure and set up ‘multiple’ database servers (because the expectation is that DBAs handcraft them), and automate database management tasks with (handcrafted) scripts.
Is that really a scalable approach for what is often a team of one in a growing, fast paced organization?
I am here to argue that your job is not to perform and manage backups, create and manage databases, or optimize queries. You will do all these things in the course of your job, but the primary goal is to make your business’s data accessible and scalable. This is not just so the business can run the current product but also so it can build new features and provide value to customers.
Many tech organizations nowadays do one or more of the following:
- They are formed of many smaller teams
- They provide features by creating many micro-services in place of one or a few larger services
- They adopt agile methodologies to speed the delivery of features
- They combine operations and engineering under one leadership
- They embed operations engineers with developers as early as possible in the design process
So what can be done to bust that silo and make it easier for other folks to debug, help scale the database layer, and empower engineers to design services that can scale? Most up-and-coming shops have at most one in-house DBA. Can the one DBA be ‘present’ in all design meetings, approve every schema change, and be on call for a sprawling, ever-growing database footprint?
DBAs can no longer be gatekeepers or magicians. A DBA can and should be a source of knowledge and expertise for engineers in an organization. She should help the delivery teams not just to deliver features but to deliver products that scale, and empower them to not fear the database. But how can a DBA achieve that while doing the daily job of managing the data layer? There are a number of ways you, the DBA, can set yourself up for excellence.
foo. You become the ‘blocker’ for completing work. Getting familiar with the configuration management at your company is also a two-way benefit. As you get familiar with how the infrastructure is managed, you get to know the team’s standards, get more familiar with the stack, and are able to collaborate on changes that ultimately affect how the product scales. A DBA who is tuned into the engineering organization’s product and infrastructure as a whole is invaluable.
If your operations team is like mine, where you are the only DBA, it probably means someone else on the team is the first line of defense when a DB-related event pages. Some simple documentation on how to do initial debugging and data collection can go a long way in making the rest of the operations team comfortable with the database layer and more familiar with how we monitor and debug it. Even if that event still results in paging the DBA, slowly but surely the runbook becomes a place for everyone to add acquired knowledge.
Additionally, I add a link to the related runbook section (use anchors!) to the page descriptions that go to the pager. This is incredibly helpful for someone being paged by a database host at 3 AM who needs a place to start. These things may seem small, but in my experience they have gone a long way toward breaking down mental barriers for my operations team when they need to work on the database layer.
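To make the anchor idea concrete, here is a minimal sketch of building a page description that deep-links into a runbook section. The base URL, the alert name, and the helper names are all hypothetical, and the anchor rule (lowercase, punctuation dropped, spaces turned into hyphens) is only an approximation of how many markdown renderers slug their headings, so check what your own renderer produces.

```python
import re

# Hypothetical location of a markdown runbook kept next to the cookbook.
RUNBOOK_BASE_URL = "https://example.com/cookbooks/mysql/RUNBOOK.md"


def heading_anchor(heading: str) -> str:
    """Approximate a markdown heading anchor: lowercase, drop punctuation,
    turn each whitespace character into a hyphen (assumed convention)."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)
    return re.sub(r"\s", "-", slug)


def page_description(alert_name: str, summary: str, runbook_heading: str) -> str:
    """Build the text sent to the pager, ending with a runbook deep link."""
    link = f"{RUNBOOK_BASE_URL}#{heading_anchor(runbook_heading)}"
    return f"[{alert_name}] {summary} | Runbook: {link}"


if __name__ == "__main__":
    print(page_description(
        "mysql_replication_lag",
        "Replica db02 is more than 300s behind",
        "Replication Lag: First Steps",
    ))
```

Keeping the heading text and the alert definition side by side in the same repository makes it easier to notice when a renamed heading would break the link.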
As a personal preference, I write these as markdown docs inside my Chef cookbook repositories. This falls seamlessly into a pull request, review and merge pattern, and it becomes an integral part of the databases’ cookbooks pattern. As engineering teams start creating their own, the runbooks become a familiar template as new database clusters spring out all over the place.
Orchestrator is a great tool in that regard: it puts a visualization of your clusters and their health just a browser window away.
Graphite is in wide use for ingesting metrics in modern infrastructure teams, and Grafana is a widely used dashboarding front-end for metrics and analytics.
If VividCortex isn’t a possibility (although, seriously, they are awesome!), there are other products and open source tools that can capture even just the slow log and put it in an easy-to-read web page for non-DBAs to inspect and see the effect of their code. The important point here is that if you provide the means to see the data, engineers will use that data and do their best to keep things efficient. But it is part of your job to make that access available and not a special DBA trick.
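For a sense of how approachable the slow log is, here is a minimal sketch that pulls the slowest statements out of slow-log text in the standard MySQL format, where each entry carries a `# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...` header. It is a toy, not a replacement for a real digest tool: the function name and the returned fields are assumptions made for the example.

```python
import re

# Header line that MySQL writes before each slow-log entry's statement.
HEADER = re.compile(
    r"^# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def slowest_queries(log_text: str, top: int = 5) -> list[dict]:
    """Return the `top` slowest entries as dicts, slowest first."""
    entries = []
    current = None
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match["query_time"]),
                "rows_examined": int(match["rows_examined"]),
                "sql": [],
            }
            entries.append(current)
        elif current is not None and line.strip() and not line.startswith("#"):
            # Skip the bookkeeping statements the server adds to each entry.
            if line.startswith("SET timestamp=") or line.lower().startswith("use "):
                continue
            current["sql"].append(line.strip())
    for entry in entries:
        entry["sql"] = " ".join(entry["sql"])
    return sorted(entries, key=lambda e: e["query_time"], reverse=True)[:top]
```

Feeding the result into a simple HTML table or a chat message is usually enough for an engineer to recognize their own query and its cost.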
It takes a lot of work to change the application layer of any product to protect the infrastructure. In the interim, allowing spurious database activity to cause pager fatigue is a big danger to both you and the rest of the operations organization. Get familiar with tools like pt-kill that can be used in a pinch to keep a database host from having major downtime due to unplanned volume. Make the use of that tool known, and communicate the action and its effect to the stakeholder engineering team. It is unhealthy to try to absorb the pain from something you cannot directly change, and doing so ultimately does not help the engineering teams learn how to deal with growing pains.
There are a lot of ways a DBA’s work is unique to her role in comparison to the rest of the operations team, but that doesn’t mean it has to be a magical priesthood no one can approach. These steps go a long way in making your work transparent, but most important is approaching your work not as a gatekeeper to a golden garden of database hosts but as a subject matter expert who can provide advice, help grow the engineers you work with, and provide more value to the business than backups and query tuning (but those are fun too!).
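To show the kind of decision a query-killing tool automates, here is a small sketch that is not pt-kill itself and does not use its options. It takes rows shaped like the output of MySQL's `SHOW PROCESSLIST` (with `Id`, `User`, `Command`, and `Time` columns), picks the queries that have been running longer than a threshold, and produces the `KILL QUERY` statements you would review before running. The threshold, the protected user list, and the function names are assumptions for illustration.

```python
def find_kill_candidates(processlist,
                         busy_seconds=60,
                         protected_users=("root", "repl", "system user")):
    """Pick running queries older than `busy_seconds`, skipping idle
    connections and accounts we never want to touch (hypothetical list)."""
    candidates = []
    for row in processlist:
        if row["Command"] != "Query":
            continue  # sleeping or otherwise idle connections are left alone
        if row["User"] in protected_users:
            continue
        if row["Time"] is None or row["Time"] < busy_seconds:
            continue
        candidates.append(row)
    return candidates


def kill_statements(candidates):
    """Render the statements for review; nothing is executed here."""
    return [f"KILL QUERY {int(row['Id'])};" for row in candidates]


if __name__ == "__main__":
    rows = [
        {"Id": 1, "User": "app", "Command": "Query", "Time": 120},
        {"Id": 2, "User": "app", "Command": "Sleep", "Time": 500},
        {"Id": 3, "User": "repl", "Command": "Query", "Time": 900},
    ]
    for statement in kill_statements(find_kill_candidates(rows)):
        print(statement)
```

Printing the statements instead of executing them keeps a human in the loop, and the same list doubles as the record you share with the engineering team whose queries were affected.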
Special thanks to the wonderful operations team at Sendgrid who continue to teach me many things, and to Charity Majors for coining the title of this post.