Nix-ify your environment

Over some vacation I put a bunch of effort into rebuilding my dotfiles and other environment configuration using home-manager, Nix, and the wealth of packages available in Nixpkgs. Previously, all of my system’s configuration was bespoke, unversioned files and random commands run to bring it to its current state. This has worked fine for a number of years, but it has drawbacks, such as not being easily reproducible or portable to other systems.

Our developer acceleration team at Shopify is exploring the feasibility of Nix for solving a number of problems that come with supporting development environments for hundreds of software developers. Burke Libbey, who is spearheading a lot of the Nix exploration on that team, has produced a number of excellent resources, two of which are public and inspired me to look into Nix on my own and write this article: a series of Nix-related YouTube videos, and an article on the Shopify Engineering blog diving into what Nix is. I won’t go into detail about what Nix is in this article, as those resources cover it well. Instead, I’ll focus on some of the things I learned while switching to home-manager, the Nix language, and the Nix package manager.

home-manager

Home-manager is a program built on top of Nix which makes it simple for a user to manage their environment. It natively supports configuring a long list of applications, and has the flexibility to configure programs it doesn’t yet manage. Configuring home-manager is generally as simple as setting a number of key-value pairs in a file. Home-manager then deals with installing, uninstalling, and configuring everything for you from a few simple commands. For example, here’s a simplified version of the home-manager config which installs and configures a few packages and plugins:

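A minimal sketch along these lines (the package and plugin names below are illustrative, not my exact setup) gives the idea:

    { config, pkgs, ... }:

    {
      # Let home-manager install and manage itself.
      programs.home-manager.enable = true;

      # Packages to install into the user environment.
      home.packages = with pkgs; [
        ripgrep
        jq
      ];

      # fzf, with shell integration enabled.
      programs.fzf = {
        enable = true;
        enableZshIntegration = true;
      };

      # zsh, with extra lines pulled in from a plain zshrc as an escape hatch.
      programs.zsh = {
        enable = true;
        initExtra = builtins.readFile ./zshrc;
      };

      # neovim, with plugins installed just by naming them.
      programs.neovim = {
        enable = true;
        plugins = with pkgs.vimPlugins; [ vim-surround fzf-vim ];
        extraConfig = builtins.readFile ./init.vim;
      };
    }
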
Here is my full home-manager config for reference.

Some of the biggest factors that sold home-manager to me were being able to easily version my environment’s configuration, installing neovim and all the plugins I use by only specifying each plugin’s name, integrating fzf into my shell with only two config options, getting zsh installed and configured with plugins, and lastly having an escape hatch to specify custom options in my zshrc and neovim config.

All of this configuration is now versioned, and any edits I make to my home-manager config or associated config files just require one home-manager switch to be run to update my entire environment. If I want to try out some new vim plugins, a different shell, or someone else’s home-manager configuration, I can safely modify my configuration and know that I can revert back to the version I have stored in Git.

home-manager tips

I found the manpages for home-manager extremely useful for seeing which configuration options exist, what they do, what types they take, and so on. They can be accessed via man home-configuration.nix. I would always have them open when modifying my home-manager configuration.

By default home-manager stores its configuration in ~/.config/nixpkgs/home.nix. Home-manager provides an easy way to jump right into editing this file: home-manager edit. Since this configuration file isn’t in the nicest of places, we can change its location and still have home-manager pick it up. The best way to configure this is to use home-manager to configure itself by setting programs.home-manager.path = "~/src/github.com/jonniesweb/dotfiles/home-manager/home.nix";, or wherever your configuration file lives. If needed, the HOME_MANAGER_CONFIG environment variable can be set to the same value to tell the home-manager command where the config exists if something goes wrong.

In the switchover I challenged myself to change from vim to neovim. This didn’t involve too much effort, as my vim config only needed a few updates to be compatible with neovim. A large amount of time was saved by the automatic installation of the various vim plugins I use.

In the process I also moved away from oh-my-zsh to plain old zsh. A fair amount of time was spent understanding the different zsh shell options and which ones oh-my-zsh had been providing me with. More time was spent configuring my shell’s prompt to use a plugin offering git information and its own theme. Oh-my-zsh does a fair amount of magic in the background when loading plugins and themes, but when you look at its source code, it’s actually incredibly simple.

A lot of languages, tools, and other dependencies were left out of my home-manager config since Shopify’s internal dev tools handle the majority of this for us on a per-project basis.

If you’re having home-manager manage your shell, don’t forget to set the xdg.enable = true; option in your config. Some programs depend on the XDG_*_HOME environment variables to be present. I can see why this option isn’t enabled by default as many operating systems may have values that differ from the ones defaulted to by home-manager.

My main development environment is on OS X and therefore differs from Linux in some areas. One of the projects I’m going to keep my eye on is nix-darwin which appears to be solving the problem that NixOS solves for Linux: complete system configuration.

Conclusion

Across the Docker, Canonical Snap, and Nix ecosystems, we’re going to see a steady increase in the number of companies and individuals using these technologies to explicitly define what software runs on their systems. Docker has already gained critical mass throughout enterprises, Canonical’s Snap packages are slowly picking up steam on Ubuntu-based systems, and Nix appears to be breaking onto the scene. I’m rooting for Nix, as it has a leg up on the others with its complete and explicit control over all of the components that go into making up a program or even a complete system. I’m excited to see how much it catches on.

Replacing the engine while driving: planning safety measures into the project

When a project is created to replace an existing dependency with another one, there are many factors to consider. If something going wrong carries high risk, more caution needs to be taken. If adding the new dependency involves a lot of effort, it may make sense to incrementally ship the new dependency alongside the old one. Lastly, depending on the complexity, will a lot of work be done up front to integrate the new dependency, or will it be hacked in and then cleaned up after the launch?

These are some of the questions that can be asked to help guide the planning for a project which changes out one dependency for another. The same pattern can be used for similar types of work besides dependencies such as service calls, database queries, integrations, and components, to name a few.

Generally, a project like this can have the following steps:

  • Investigate
  • Refactor
  • Add the new dependency
  • Launch
  • Remove the old dependency
  • Refactor
  • Celebrate

Not all of these steps need to be performed in this order, or at all. For example, adding the new dependency, launching, and removing the old dependency can all be completed at once if the risk and simplicity allow for it.

Refactoring may be redundant to mention, given developers should be refactoring as they normally make changes to the codebase. I am being explicit about it here so that the team can use deliberate refactoring to their advantage: to make the new dependency easy to add, and to remove any unnecessary abstractions or code left over from the old dependency.

A special shoutout goes to the celebrate step, since I certainly know that teams can be eager to move on to the next project and forget to appreciate the hard work put into achieving its success.

Now let’s get into the different concerns that can apply to projects like this, and the practices that can help them succeed.

Concerns and Practices

The investigation step is the most important step every project should have; it informs how the rest of the project should work. During this step, gather as much information as needed to form a confident enough plan for the rest of the project to proceed. Some of the actions taken during this step are understanding how the current system works, how the dependency to be replaced is integrated, how critical this dependency is to the business’s operations, how the new dependency should be integrated, and any cleanup and refactorings that should be made before, during, and after the new dependency is added and launched.

A big topic to explore is what the system would look like if it had originally been built using the new dependency. This mental model forces the envisioning of a clean integration with the new dependency, ignoring any sort of legacy code and free of any constraints within the system. It aligns the team on the ultimate design of the system that uses the new dependency. The team should strive to reach this state by the end of the project, since it results in the cleanest and most maintainable code. If this is not part of the end goal for the project, the system may carry forward unnecessary code and bad abstractions that pile up as tech debt for future developers.

Another big consideration is what the system will look like when it is launched. Will the new and old dependency have to coexist in the system so that there can be a gradual transition? Or is the change small enough that the old dependency can be removed and the new one added in a single change, with no user interruption? If there is a period where the two dependencies need to coexist, how can this behaviour be implemented to fit the ultimate design discussed earlier? This may involve a lot of refactoring up front to get the system into a state that fits that design. The alternative is that the new dependency can be hastily integrated into the system alongside the old one, with all of the hacks undone in a sufficient cleanup and refactoring after the launch. Both methods have different tradeoffs: with the up-front refactoring, the new dependency is integrated the right way into the system, but it may require a lot of refactoring of the surrounding code and the old dependency’s code. On the other hand, hacking the new dependency into the same pattern the old dependency uses gets the project to launch faster, but can lead to bugs and integration troubles if the interfaces or behaviour differ. Regardless, at the end of the project, the system should look as if the old dependency never existed.

How much risk is involved in switching over to the new dependency? If it is very risky, more precautions should be taken to reduce the risk. If there is very little risk, few or no precautions are needed and the project can move much faster. One method I have used to reduce risk is collecting metrics on the different types of requests and responses between the two dependencies; these metrics act as an early warning system for the new dependency behaving incorrectly. Being able to roll back to the old dependency via a deploy or a feature flag provides the flexibility to quickly switch back in case things go wrong. Dark launching the new dependency to production is a practice I often encourage my team to use, since it allows for testing the new code in the production environment without affecting real users. Lastly, beta testing with a percentage of users can also reduce the impact, since if something goes wrong with the new dependency, only a fraction of the users are affected. Many of these practices are complementary and can be used together to achieve the best mix of handling risk.
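
As a rough illustration of how a feature flag, a comparison between the two dependencies, and a dark launch can fit together, here is a sketch of a wrapper that keeps both dependencies behind one interface. The class, method, and flag names are hypothetical, not from any real project:

    # Hypothetical wrapper that lets the old and new providers coexist
    # behind one interface while the rollout is in progress.
    class PaymentGateway
      def initialize(old_provider:, new_provider:, flags:, metrics:)
        @old = old_provider
        @new = new_provider
        @flags = flags      # e.g. a feature flag client
        @metrics = metrics  # e.g. a StatsD-style client
      end

      def charge(order)
        if @flags.enabled?(:use_new_payment_provider, order.customer_id)
          @metrics.increment("payments.new_provider")
          @new.charge(order)
        else
          # Dark launch: exercise the new provider in a dry-run mode and compare
          # results, but return the old provider's result so users are unaffected.
          dark_launch_compare(order) if @flags.enabled?(:dark_launch_payments)
          @metrics.increment("payments.old_provider")
          @old.charge(order)
        end
      end

      private

      def dark_launch_compare(order)
        new_result = @new.charge(order, dry_run: true)
        old_result = @old.preview_charge(order)
        @metrics.increment("payments.mismatch") if new_result.amount != old_result.amount
      rescue
        # Never let the dark-launched path affect the real one.
        @metrics.increment("payments.dark_launch_error")
      end
    end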

How much effort is involved in adding the new dependency? Effort could mean the amount of code to change or the time involved. If there is a significant amount of effort, it absolutely makes sense to incrementally ship small changes to production. Even if the changes are not live code paths, the team can at least review each change, provide feedback, and course correct if needed. I have seen small-effort projects where all of the work was deployed to production in one pull request. On large-effort projects the work was split across many pull requests written and deployed over time by a team. The latter case enabled the team to dark launch the new dependency to production and gave the added safety of being able to switch back to the old dependency if needed.

Given the considerations and practices discussed throughout this article, it is best to validate that they will actually work when it comes time to execute. If a team is experienced with these kinds of projects and practices, the investigation period can be quicker; if the team is less experienced, the investigation should be more substantial. Building a prototype can help build confidence in the assumptions made about running the project and guide the team going forward. A good prototype proves or disproves assumptions in as little time as possible. Once the team is confident in its plan, start the project, and do not be afraid to reevaluate the choices already made as the project goes on.

RubyConf 2019 Talks – Day 2

Here’s a continuation of the previous post covering day 1, this one on the talks I attended on day 2 of RubyConf 2019! Headings are linked to a video of the talk.

Injecting Dependencies For Fun and Profit

Chris Hoffman discussed the basics and the benefits of dependency injection, mentioning that it’s an alternative to mocking when testing. The benefit of dependency injection is that every class using it lists its dependencies explicitly in its initializer. This benefits people who read the code later, especially devs new to the team, since all of a class’s dependencies are centralized in one location. It also makes testing classes in isolation easier, since test doubles of the dependencies can be passed into the class’s initializer, compared to the implicit approach of mocking objects, which can lead to dependencies being forgotten deep in the class.
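
A minimal sketch of the idea (the class names are my own, not from the talk):

    class InvoiceMailer
      # Dependencies are listed explicitly in the initializer.
      def initialize(mail_client:, logger:)
        @mail_client = mail_client
        @logger = logger
      end

      def deliver(invoice)
        @logger.info("Sending invoice #{invoice.id}")
        @mail_client.send_mail(to: invoice.customer_email, body: invoice.to_text)
      end
    end

    # In tests, lightweight doubles can be passed in instead of mocking globals:
    #   mailer = InvoiceMailer.new(mail_client: FakeMailClient.new, logger: Logger.new(IO::NULL))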

One of the interesting patterns that Chris’ company adopted (and that I don’t necessarily agree with) to manage dependencies with dependency injection in their codebase is a dependency god object. This object is initialized at the start of the program and contains a reference to each dependency in the system. The dependency object is then passed by reference into each class’s initializer, and when a class needs a dependency it looks it up on the god object. This appears to be a purely functional way of doing dependency injection compared to the more popular approach of globally accessible dependency objects. dry-rb’s auto_inject is a common dependency injection library which uses the globally accessible dependency pattern.

Overall, dependency injection is a great pattern for scaling medium to large codebases and making testing components simpler.

The Fewer the Concepts, the Better the Code

David Copeland presented the idea of programming with fewer concepts for better code comprehension across many developers. This talk was a bit of a shock since it goes against many of my ideals, but I fully enjoyed having my beliefs on the subject challenged. For context, David’s team was just himself and a lead with a non-computer-science background at a small company. When David’s code was reviewed by his lead, it was critiqued as needing to be simpler and easier to understand. Over time David figured out that his lead was pushing him towards more generic programming language concepts, such as for, if, and return, which are common to most procedural programming languages.

The talk then went through an example: code that a Rubyist would write with each, map, implicit returns, and so on involves many more concepts a developer has to know about than the same code written with far fewer concepts. One benefit of writing code with these generic programming language concepts is that learning new programming languages becomes much simpler, since they generally share the same concepts. Onboarding new developers onto the team can also be much faster if they only have to understand a small subset of programming language features. The Go programming language was held up as an example of this practice, as it has a smaller number of concepts than other programming languages.
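
A contrived illustration of the contrast (my own example, not David’s):

    # Idiomatic Ruby: blocks, Enumerable, implicit return.
    def doubled_evens(numbers)
      numbers.select(&:even?).map { |n| n * 2 }
    end

    # "Fewer concepts" style: only for, if, and explicit return.
    def doubled_evens_basic(numbers)
      result = []
      for n in numbers
        if n % 2 == 0
          result.push(n * 2)
        end
      end
      return result
    end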

At the end of the talk I asked whether the drawbacks of this style of programming, such as making it easier to introduce bugs, might outweigh its benefits. Using functional programming features such as Ruby’s Enumerable collection of methods can make code much easier to reason about. David agreed that more bugs are definitely a possibility, but he doesn’t have even anecdotal evidence of it from his team.

Disk is Fast, Memory is Slow. Forget all you Think you Know

Another controversial talk I wanted to challenge my beliefs with, this time challenging the assumption that memory is always faster than disk. Daniel Magliola presented this conundrum in the form of an improvement he was attempting to make: making metrics available for his cluster of forking Unicorn processes. When Prometheus collects metrics from apps, it queries each app at a specific endpoint to read the metrics and their values. The problem with forking web servers is that when the request to return all of the metrics comes in, it is dispatched to one of the Unicorn processes, so only that process’s metrics are returned rather than those of the whole group of forked Unicorn processes, as they should be.

Daniel went down the rabbit hole on this GitHub issue looking for performant ways to support metrics collection for forking webservers. With the goal of keeping the recording of a metric as close to 1 microsecond as possible, some of the solutions investigated involved storing metrics in Redis, the Ruby class PStore, which transactionally stores a hash to a file, and the tenderlove/mmap library to share a memory-mapped hash between processes. Unfortunately none of the potential solutions could beat 1 microsecond.

The solution Daniel discovered, and expertly walked through in his talk, was using plain old files and file locks. This solution ended up taking only ~6 microseconds per metric write and was simpler and more reliable than dealing with mmap’ed memory or additional running infrastructure. The title of the talk was misleading, something touched on near the end: the file-based solution benefits from optimizations operating systems make to cache writes in main memory and disk caches. As far as the program is concerned, the file was successfully written to disk, with proper locking to prevent multiple writers tripping over each other, but this was all made possible by the performant abstraction our modern operating systems provide.
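
The shape of the idea, as I understand it (a simplified sketch, not the actual client library code): each process appends its metric updates to a file under an exclusive lock, and the scrape endpoint later reads and aggregates all of the files.

    require "json"

    def record_metric(path, name, value)
      File.open(path, File::RDWR | File::CREAT) do |f|
        f.flock(File::LOCK_EX)    # exclusive lock so concurrent writers don't interleave
        f.seek(0, IO::SEEK_END)   # append to the end of the file
        f.write(JSON.dump(name: name, value: value) + "\n")
      end                         # the lock is released when the file is closed
    end

    record_metric("/tmp/metrics-#{Process.pid}.json", "http_requests_total", 1)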

Digging up Code Graves in Ruby

Noah Matisoff went into how code graves, aka dead code, come about. Oftentimes developers modify existing parts of the code and stop calling other pieces of it, either in the same or a different file. That code may still have tests, so test coverage metrics can’t really help here. Feature flags where 100% of users go through one code path and not the other are also prime candidates for code that doesn’t need to exist.

Code coverage tools can be run in production, or in development, to help give a good idea of which parts of the code are never reached. Static code analysis tools can also help determine whether code isn’t referenced anywhere, but that is a hard problem to solve in Ruby since the language isn’t statically typed and is quite dynamic. Another way to help keep dead code out of codebases is to add todos to the codebase. Todos can be set up to remind developers to remove bits of code or perform other actions, and some automations were shown to make todos more actionable.

RubyConf 2019 Talks – Day 1

I attended RubyConf this year in Nashville, Tennessee with a few of my teammates from Shopify. What a great city and a great first time attending RubyConf!

I took notes on many of the talks I attended and here are the summaries for the first of the three days. Day 2 is available here. Headings that have links go to a video of the talk.

Matz Keynote – Ruby Progress Report

Matz started off the conference with his talk on the upcoming Ruby 3, covering some of its features and the timeline. Ruby 3 will absolutely be released at the end of 2020, removing half-baked features if necessary to keep it on track. This probably also means that if the 3×3 performance goals aren’t fully met, it’ll still be shipped. Since the majority of attendees were new to RubyConf, he spent some time talking about being a Rubyist, encouraging people to have discussions and contribute to the future of Ruby.

Matz went into some of the new features going into Ruby 2.7 and Ruby 3, and some of the features and experiments being removed. Some of the biggest hype was around the addition of pattern matching, the just-in-time compiler (JIT), emojis (though Matz didn’t think so), type checking, static analysis, and an improved concurrency model via guilds (think JavaScript workers) and fibers. Among the features and experiments removed were the .: syntax (shorthand for Object#method), the pipeline operator, and the deprecation of automatic conversion of hashes to keyword arguments. Some attendees were vocal about wanting more rationale for removing these features, and Matz was more than accommodating in explaining in more detail.

No Return: Beyond Transactions in Code and Life

Avdi Grimm’s talk focused on the unlifelike constraints that are imposed on users when doing things online. For example, filling out a survey or form online may result in the user losing their progress if they exit their browser. In real life this doesn’t happen, so why should we constrain these transactions so much? Avdi recommends that when building out these processes, these transactions, we should instead think of them as a narrative: one stream of information sharing that only requires the user to complete a step when it’s really necessary. Avdi related this to our code by suggesting a few concepts that can make our programs more narrative-like, such as embracing the state and history of data by utilizing event sourcing/storming and temporal modelling, failing forwards in code by treating exceptions as data and expecting failures, and interdependence in code by using back pressure and circuit breakers.

Investigative Metaprogramming

Betsy Haibel talked about an effective way of tracking down a bug during a potentially painful upgrade of their Rails app to 6.0. Through the use of metaprogramming, she was able to fix a frozen hash modification bug that would otherwise have been quite difficult to debug. She accomplished this by monkey patching the Hash#freeze method to save a backtrace whenever it is called, and then, in the Hash#[]= method, rescuing any runtime exceptions that occur and starting a debugger session. This helped her narrow down exactly where the hash was frozen earlier on in the code.
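
A rough sketch of the technique (my own reconstruction, not Betsy’s exact code):

    module FreezeTracker
      def freeze
        @_frozen_at = caller   # remember where this hash was frozen
        super
      end

      def []=(key, value)
        super
      rescue FrozenError
        warn "Hash was frozen at:\n#{Array(@_frozen_at).join("\n")}"
        binding.irb            # pause here to poke around
        raise
      end
    end

    Hash.prepend(FreezeTracker)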

Betsy then went into detail on what metaprogramming is and how it differs from language to language. Java, for example, has distinct load-time and runtime phases when an application is starting up. Ruby, on the other hand, loads classes and executes code at the same time, since it’s all performed together at runtime.

Lastly, the talk provided a pattern for using metaprogramming to investigate bugs or other problems in code: reflection, recording, and reviewing, which can be applied to help debug even the most complex code. The reflection step is about determining which part of the code, earlier on, leads to the program failing; the moment it occurs can be found by inspecting the backtrace at that point in time. Next is the recording step, where we patch the code identified in the reflection step to save the backtrace. This can be done by saving the caller to an instance variable or class variable, or by logging it, and the patching itself can be accomplished using Module#prepend or even the TracePoint library to get a foothold into the code. Lastly, reviewing is the step in which we observe an event in the system (e.g. an exception) and either pause the world or log some info for further reading. An example of this would be to put in a breakpoint or debugger statement, optionally making it conditional to help filter through the many occurrences.

Ruby Ate My DSL!

Daniel Azuma presented what DSLs (Domain Specific Languages) are, the benefits of them, and how they work. One of the biggest takeaways from this talk was that DSLs are more like Domain Specific Ruby: we’re not building our own language, so the user of a DSL should fully expect to be able to use Ruby while using it.

Daniel also went on to mention how to build your own DSL, pointing out a few gotchas along the way. One was that since instance_eval is used throughout DSL implementations, we should be aware of users clobbering existing instance variables and methods. One solution is to have a naming convention for the DSL’s internal instance variables and private methods (e.g. prefixing them with underscore characters). Alternatively, another way of preventing this clobbering is to separate the DSL objects from the implementation which operates on those objects. This gives the user of the DSL the minimum surface area needed to set the DSL up, removing the possibility of overwriting instance variables or methods the internal DSL needs to run.

Design DSLs which look and behave like classes. Specifically, whenever blocks are used, have them map to an instance of a class. RSpec is a great example of this, where describe and it calls take blocks which create instances of classes, and the it call creates instances that belong to the describe instance. Where things get more interesting and lifelike is when helper methods and instance variables defined higher up in a DSL can be used further down in the DSL, which is the concept of lexical scoping.
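
A toy illustration of blocks mapping to class instances (my own example, not Daniel’s), with internal state prefixed by underscores to avoid clobbering the user’s instance variables:

    class TaskDSL
      def initialize
        @_tasks = {}
      end

      def task(name, &block)
        @_tasks[name] = block        # each task block belongs to this DSL instance
      end

      def _run_all
        @_tasks.each do |name, block|
          puts "running #{name}"
          block.call
        end
      end
    end

    def pipeline(&block)
      dsl = TaskDSL.new
      dsl.instance_eval(&block)      # evaluate the user's block against the DSL instance
      dsl._run_all
    end

    pipeline do
      task(:build) { puts "compiling" }
      task(:test)  { puts "running specs" }
    end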

Lastly, constants are a pain to work with in Ruby. They don’t behave as expected when using blocks and evals. Some DSLs provide alternatives to constants, for example RSpec’s let.

mruby/c: Running on Less Than 64KB RAM Microcontroller

Hitoshi HASUMI presented mruby/c, an mruby implementation focused on very resource-constrained devices. Where mruby targets devices with around 400KB of memory, mruby/c is for devices with around 40KB. Devices with this little memory are typically microcontrollers, which are cheap to run and offer benefits over devices that run full operating systems, such as instantaneous startup and being more secure.

Hitoshi focused his talk on the work he did building IoT devices to monitor ingredient temperatures at a sake brewery in Japan. These devices gave workers a way to measure temperatures, displayed the reading, and sent it back to a server for further processing. Hitoshi made it clear that there are many different things that can go wrong in the intense environment of a brewery: high temperatures, hardware failure, resource constraints, and so on.

The latter half of the talk focused on how mruby/c works and how to use it. mruby/c uses the same bytecode as mruby, but removes a few features that regular Ruby developers are used to having, namely modules and the stdlib. mruby/c compiles down to C files and provides its own realtime operating system. Hitoshi finished the talk by plugging a number of libraries and tools he has developed to help with debugging, testing, and generating code: mrubyc-debugger, mrubyc-test, and mrubyc-utils, respectively.

Statistically Optimal API Timeouts

Daniel Ackerman discussed the widespread use of APIs and how timeouts for those remote requests are often not configured effectively. He framed the problem as optimizing timeouts for the best user experience – the fastest response. Given a slow-responding API request, we should time out if we have high confidence that the request is taking too long. He prefaced the rest of his talk by explaining that setting the timeout to the 95th percentile of response times is a quick but accurate estimate.

Since APIs are all different, Daniel presents a mathematical proof of determining statistically optimal API request timeouts. By analyzing a histogram of the API response times, we can determine the optimal timeout that balances user experience with timing out requests. Slow API requests often mean that the service is under heavy load or not responding.
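
As a back-of-the-envelope version of the idea (my own sketch, not Daniel’s method), the 95th percentile of a sample of observed response times can be computed directly:

    # Nearest-rank percentile over a sample of observed response times (in ms).
    def percentile(times, pct)
      sorted = times.sort
      index = ((pct / 100.0) * (sorted.length - 1)).round
      sorted[index]
    end

    response_times_ms = [80, 85, 90, 95, 100, 105, 110, 115, 120, 130,
                         140, 150, 165, 180, 200, 230, 280, 350, 600, 2500]
    timeout_ms = percentile(response_times_ms, 95)
    puts "Set the client timeout to roughly #{timeout_ms}ms"   # => 600ms for this sample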

The Ultimate Guide to Ruby Timeouts was mentioned as a go-to source for configuring timeouts and knowing which exceptions are raised by many commonly used libraries. Definitely a useful resource. Daniel finished his talk with a plug for his gem rb_maxima, a library which makes it easy to use the Maxima computer algebra system from Ruby.

Collective Problem Solving in Software

Jessica Kerr talked about the idea of cameratas – groups of people who discuss and influence the trends of a certain area. More formally, the term comes from the Florentine Camerata, a group of Renaissance musicians and artists in Florence, Italy who helped develop the genre of opera. Their work was revolutionary at the time.

Jessica then related this to the great ideas that have come out of ThoughtWorks, a London-based consulting company. Their contributions over the years, including the concepts of Agile, CI, CD, and DevOps to name a few, have influenced the entire software industry and set the bar higher.

In general, great teams make great people. Software teams are special in that they consist of the connections between the people on the team as well as the tools the team uses. Jessica relates this to a big socio-technical system, introducing the term symmathesy to capture the idea that teams and their tools learn from each other. No one person has a full understanding of the systems they work on; therefore the more symmathesy going on in the team, the better the team and the system are. This is similar to the idea of senior developers being able to understand the bigger picture of teams, tools, and people, compared to new developers, who are usually concerned with their own small bit of code.

The talk closed by encouraging dev teams to incentivize putting the team ahead of the individual, and to grow teams by increasing the flow of information sharing and the back and forth with their tools. Lastly, great developers are symmathesized.


Summaries of Day 2’s talks are available here.

Twenty Five

My friends half-jokingly call it my quarter-century birthday. This year I’m a day late and, according to previous posts, surprisingly sick again at this same time of year. It may have been from the surprise birthday party my friends threw. Oh well, let’s get on with the show.

It’s always hard to remember back to what October of last year was like, especially after a year like this. Around November of 2018 I had a wonderful surprise promotion: becoming my own team’s manager. With that came a whole new challenge of understanding people instead of just software. I was thrown in the deep end one day and had to figure out what managing people was all about. From talking with colleagues to reading books, I performed a lot of research over the past year to understand what it means to manage people, especially as a manager of a development team.

Besides career-related achievements, I also had the great opportunity to vacation in Mexico and New York with a bunch of my close buddies, and to take a few extra ski trips over the winter to Mont Tremblant and Camp Fortune. I surprised myself with my skiing skills – I must not have gone in a while. Over the summer a number of visits to the cottage were made, one of them with a bunch of good friends.

Something I wasn’t expecting at all was getting my SSI Open Water Diving certification, which allows me to scuba dive to a maximum depth of 60 ft. A friend of mine suggested the idea to a few of us and we all had nothing to lose, so why not! Five pool dive sessions, three classroom sessions, and four dives at Morrison’s Quarry over a weekend gave the three of us the ability to travel anywhere in the world and go diving. It was a great learning experience, as we had excellent instructors and, as luck would have it, one-on-one training. The three of us will be planning a diving trip in the new year!

This was the year when more weddings came my way. One was for my longtime cottage neighbour, who is around my age, and another was for my aunt. They were vastly different weddings, but both were very enjoyable.

In late fall and early spring there’s a large stretch of time where bad weather holds me back from the running and cycling I do in the summertime, or the skating on the canal I do in the wintertime. To keep myself active I bought an indoor trainer for my road bike and was able to train indoors a few times a week pretty consistently over the winter. Keeping this training up through all of the cold months allowed me to jump on my bike in the spring and not miss a beat.

Here are a few interesting stats of mine from the past year:

  • 57 hours of running/cycling/training, 1220 km total
  • 14 articles for this blog written, 7 published
  • 7 books read – the best being either In The Plex or the Elon Musk biography
  • 1932 GitHub contributions from work and personal projects

🍻 to another year!