Josh Stella

Co-founder & Chief Technology Officer

A Brief History

Josh Stella is Co-founder and CTO of Fugue, the cloud infrastructure automation and security company. Fugue identifies security and compliance violations in cloud infrastructure and ensures they are never repeated.

Previously, Josh was a Principal Solutions Architect at Amazon Web Services, where he supported customers in the area of national security. He has served as CTO for a technology startup and in numerous other IT leadership and technical roles over the past 25 years.

From Our Blog

Articles By Josh Stella

  • Two Years With Emacs as a CEO (and now CTO)

    Two years ago, I wrote a blog post that got some notice, which surprised me. It was a piece about going back to Emacs as my primary content creation tool, first as a CEO, and now as a CTO. A brief recap is that I spent most of my career as a programmer and a software architect, and preferred Emacs as my code editor for much of that time. Reconsidering Emacs was an experiment that I was excited about, but wasn't sure how it would work out. On the Internet, the post was met with roughly equal parts disdain and appreciation, but tens of thousands of people read it, so it seems that I touched on something interesting. Some of the more challenging and funny posts on Reddit and Hacker News predicted that I'd have hands shaped like claws or that I'd have lost my eyesight because I use white backgrounds. I'm pleased to report that no dire consequences resulted, and in fact my wrists are thanking me for the decision. Some folks worried that using Emacs would be a cognitive drain on a CEO. Having taken Fugue from an idea in my backyard to a powerful product with great and enthusiastic customers, I find Emacs to be a solace from things that are actually difficult. I still use white backgrounds.

    Recently, the post was rediscovered and posted to Hacker News. I got a number of requests to follow up with a post on how things have gone since, so this is that report. In this post, I will also focus on why Emacs and functional programming are highly relevant now, and how Emacs works with our product, Fugue, which uses functional programming to automate cloud computing. I received a lot of feedback that the level of detail and color commentary were useful, so this post is also fairly verbose and I do spend some effort on explaining my thinking. I've recently moved from the CEO to CTO role here at Fugue, but the content of this post reflects the work I have been doing as CEO. I expect to do more work in code with Emacs going forward, so I have some yak shaving ahead.
    As always, YMMV, caveat emptor, etc.

    It worked out better than I suspected it would

    My time is filled with nearly constant communication outside of and within the company. Communication is how things get done in the world, but it is also the enemy of reflection and real contemplation of difficult or complex problems. The rarest commodity for me as a startup CEO is time to focus without distraction. Emacs is particularly good for this, once you've invested the time to learn a handful of commands. Other applications call out to be noticed, but a well-configured Emacs gets out of the way both visually and mentally. It doesn't change unless you want it to, and there is no cleaner interface than a blank screen and beautiful typography. In my world of constant interruption, this simplicity allows me to focus solely on what I am thinking rather than the computer. The best programs provide access to the computer without demanding attention.

    A few folks pointed out that the original post was as much a criticism of modern computer interfaces as a recommendation of Emacs. I agree and disagree. Modern interfaces, and particularly their application-centric approach (vs. content-centric), are not user focused or task oriented. Emacs avoids this fallacy, which is part of why I like it so much, but it brings other merits to the table as well. Emacs is a portal into the power of the computer itself, and that is a rabbit hole worth descending. Its idioms are paths to discovering and creating your own, and that for me is the definition of creativity. One of the sad things about modern computing is that it is largely made up of black boxes with shiny interfaces that provide momentary gratification rather than real satisfaction. This makes us into consumers rather than creators/makers of technology. I don't care who you are or what your background is; you can understand your computer, and you can make things with it. It's fun, satisfying, and not as hard as you think to get started!

    We often underappreciate the effects of our environments on our psychology. Emacs imparts a feeling of calm and freedom, rather than of urgency, annoyance, or excitement - the latter of which are enemies of thought and contemplation. I like things that last, get out of the way, and provide insight when I do take the time to pay attention to them. Emacs meets all these criteria for me. I use Emacs every day for content creation, and I'm very pleased with how little I think about it. Emacs does have a learning curve, but it's no steeper than a bicycle, and has a similar payoff in that once you are through it, you don't have to think about it anymore, and it imparts a feeling of freedom that other tools don't. It's an elegant tool, from a more civilized age. I'm happy that we seem to be entering another civilized age in computing, and so Emacs is gaining in popularity.

    I gave up on using Org-mode for schedules and to-do lists

    I spent some words in the original post on using Org-mode for schedules. I gave up on using Org-mode for to dos and the like, as I have to coordinate many meetings and calls every day with dozens of people, and I cannot ask the rest of the world to adapt to my choice of tools, nor do I have the time to transcribe or automate moving things to Org. We are primarily a Mac shop, use Google Calendar etc., and the native Mac OS/iOS tools do a good job for collaboration. I also use a plain old pen for note-taking during meetings, as I find laptop/keyboard use in meetings to be rude and limiting to my ability to listen and think. Therefore I've largely abandoned the idea that Emacs/org can help me with my schedule or organizing my life. Org-mode is great for lots of other things too though, and is my go-to for writing documents, including this one. In other words, I use it largely in ways the author didn't intend, and it's great at them. I hope someone says the same of our work at Fugue someday.

    Emacs use has spread at Fugue

    I started the original post with an admonition that you may love Emacs, but will probably hate it. I was therefore a little concerned when the documentation team at Fugue picked it as their standard tool, as I thought perhaps they were influenced by my appreciation for it. A couple years later, I'm pretty sure that it was a good call for them. The leader of the team at the time was a very bright programmer, but the two writers we hired to make the Fugue documentation had less technical backgrounds. I figured that if it was a case of a manager imposing the wrong tool, I'd hear about it and it would resolve itself, as Fugue has an anti-authoritarian culture where people are unafraid to call bullshit on anything or anyone, including me. The original manager left Fugue last year, but the docs team now has a slick, integrated CI/CD toolchain for docs.fugue.co, and they've become enthusiastic Emacs users. There is a learning curve for Emacs, but it's not that tall even if it is steep, and climbing it has real benefits in productivity and general happiness. It was also a reminder that liberal arts focused people are every bit as smart and capable with technology as programmers, and perhaps less prone to technology religions and tribalism.

    My wrists are thanking me

    I've been spending 12 hours a day or so at a computer since the mid-eighties, and it has taken a toll on my wrists (as well as my back, for which I unreservedly recommend the HÅG Capisco chair). The combination of Emacs and an ergonomic keyboard has made the RSI wrist issues go away to the point that I haven't thought about it in over a year. Prior to that, I was having daily pain, particularly in my right wrist, and if you've had this issue, you know it can be very distracting and worrying. A few folks asked about keyboards and mice, so if you're interested I'm currently using a keyboard.io though I've mainly used a Truly Ergonomic keyboard over the last couple years.
    I'm a few weeks into using the keyboard.io, and I absolutely love it. The shaped key caps are amazing for knowing where you are without looking, and the thumb keys seem obvious in retrospect, particularly for Emacs, where Control and Meta are your constant companions. No more using the pinkie for highly repetitive tasks!

    The amount of mousing I do is much lower than when using Office and IDEs, and that has helped a lot, but I do still need a mouse. I've been using the rather dated looking but highly functional and ergonomic Clearly Superior trackball, which lives up to its name.

    Specific tools aside, the main point is that a great keyboard combined with mouse avoidance has proved very effective at reducing wear and tear on my body. Emacs is central to this because I don't have to mouse around menus to get things done, and the navigation keys are right under my fingers. I'm pretty convinced now that hand movement away from the standard typing position causes a lot of tendon stress for me. YMMV, I'm not a doctor, etc.

    I haven't done much to my config...

    Some predicted that I'd spend a lot of time yak shaving my configuration. I wondered if they were right, so I paid attention. Not only have I left my config largely alone, paying attention to the issue has made me realize just how much the other tools I use demand my attention and time. Emacs is easily the lowest maintenance piece of software I use. Mac OS and Windows are constantly demanding that I update them, but that's far less intrusive than Adobe Suite and Office's update intrusions in my world. I do occasionally update my Emacs, but it still works the same way, so it's largely a near zero cost operation for me, and one I can choose to do when I please.

    I'm sorry to disappoint, as a number of folks wanted to know what I've done to keep up with a renewed Emacs community and its output, but I've only added a few things to my config over the last two years.
    I consider this a success, as Emacs is a tool, not a hobby for me. That said, I'd love to hear about new things if you want to share.

    ...Except for controlling the cloud

    We have a lot of Emacs fans at Fugue, so we've had a Ludwig-mode for a while now. Ludwig is our declarative, functional DSL for automating cloud infrastructure and services. Recently, Alex Schoof took some flight and evening hours to build fugue-mode, which acts as an Emacs console over the Fugue CLI. If you aren't familiar with Fugue, we make a cloud automation and governance tool that leverages functional programming to give users a great experience of interacting with cloud APIs. Well, it does a lot more than that, but it does that too. Fugue-mode is cool for a number of reasons. It allows me to have a buffer that is constantly reporting on the status of my cloud infrastructure, and since I often modify that infrastructure, I can quickly see the effects of my coding. Fugue organizes cloud workloads into processes, and Fugue-mode is a lot like top for cloud workloads. It also allows me to perform operations like creating new infrastructure or deleting stuff that isn't needed anymore, without much typing. Fugue-mode is a prototype, but it's pretty handy and I now use it regularly.

    Modes and monitors

    I have added a few modes and integrations, but not really for work/CEO functions. I've been hacking around in Haskell and Scheme on the weekends for fun, so I've added haskell-mode and geiser. Emacs is great for languages that have a REPL, as you can divide up your screen into different "windows" that are running different modes, including REPLs or shells. Geiser is great for Scheme, and if you've not done so, working through SICP is a joy and possibly a revelation in an age that has lots of examples of cargo cult programming. Install MIT Scheme and geiser and you've got something that feels a bit like the Symbolics environments of lore.

    This brings up another topic I didn't cover in the 2015 post: screen management. I like to use a single portrait mode monitor for writing, and I have this configuration at my home and at my primary office. For programming or mixed use, I like the new ultra-wide monitors that we provide to all Fuguers. For these, I prefer to divide my screen into three columns, with the center having my main editing buffer, the left side having a shell and a fugue-mode buffer divided horizontally, and the right having either a documentation buffer or another editing buffer or two. This is easily done by first using 'Ctl-x 3' twice, then 'Ctl-x +' to make the windows equal in width. This will give you three equal columns that you can further subdivide as you like with 'Ctl-x 2' for horizontal divisions. Here's a screenshot of what this looks like.

    This will be my last CEO/Emacs post...

    The first reason for this is that I'm now the CTO of Fugue; the second is that there are so many topics I'm looking forward to blogging about, and now I should have time to do so. I'm planning on doing some deeper dive posts on topics like functional programming, type safety for infrastructure-as-code, and as we roll out some awesome new Fugue capabilities, some posts on what is achievable on the cloud using Fugue.

  • Fugue Welcomes Phillip Merrick, Our New CEO

    In late November of 2017, I informed Fugue's Board that I intended to lead a search for a new CEO. We had a substantial amount of money on the balance sheet, some really impressive customers, a solid product, and a highly motivated team - many of the things needed to attract a world class CEO. My passion has always been for technology and team building, and it's been an amazing 4 years at the helm through the R&D and engineering phases of the company and well into the go-to-market execution phase, but I've known since founding Fugue that someday I'd look for a partner to fulfill Fugue's potential, and the time is right. Growing Fugue is now about execution in the market, building out great sales and marketing functions, and scaling the business. We've put together great teams to execute, but we now need an experienced and passionate CEO to lead the business into what it can and will be. I will remain Chairman of the board, and CTO.

    We interviewed many candidates for the position, but Phillip Merrick really stood out to me and the others on the search committee for many reasons. I think you need a Founder mentality in the CEO, whether they were there for the initial creation of the company or not. Phillip founded one of the most successful startups in enterprise infrastructure software in WebMethods and took it all the way to IPO. It was clear in our conversations that he and I could work together as a partnership to really change how things are done on the cloud and realize the huge opportunity we have.

    I've never been more excited about the future for Fugue!

    Josh Stella, Founder & CTO

  • The Next Cloud Outage is Coming—What You Can Do To Survive

    This article was first published in DZone's Cloud Zone on April 3, 2017.

    The repercussions of recent cloud outages—AWS’s S3 crash and Azure’s Active Directory cascading failure—linger in IT departments and manifest in revenue loss. But, the bigger story is that the next outage is around the corner—unpredictable, coming to get us on a random Tuesday. Whether businesses are using cloud providers, on-premise data centers, or hybrid setups to host web services and backends, infrastructure failures are a fact of life and have to be on our radars as a matter of routine. This makes architecting for failure and for the future, from the start, among the most pressing imperatives for business IT departments.

    The next five years will see the rise and democratization of centralized control systems for cloud ops with fault tolerance architected into the very fabric of those systems. Configuration management is being reinvented and taken to entirely new levels of automated action, where the machines take responsibility for failure and do the right thing as part of their continuous tasking. The cloud's scalability, elasticity, distributed resources, and potential cost savings increasingly make it the wiser, preferred choice for enterprises. Unlike on-premise data centers, with cloud, the pieces are all there to help us withstand the storm of outages and their fallout. The challenge is to figure out how to stack, manage, and tune those pieces to automate resilience—and to do that as the pieces change over time.

    Be Honest About the Weakest Links

    Your architecting analysis begins with ruthless honesty about what causes systems fragility. Knowing where the potential to get burned is and not taking shortcuts requires a disciplined, cynical, self-critical, transparent architecting and engineering ethic. Ask: have I really solved for this?
    If your transaction boundaries are fuzzy, if you’re not clear as to whether an interface is really idempotent, if folks try to sneak a little bit of state into what need to be stateless operations, those all constitute fundamental cracks in the infrastructure that can manifest in an outage. They have to be accounted for upfront, not as afterthoughts. There has to be honest reconciliation of vulnerabilities.

    In fact, whenever you’re building a system out of services, the reliability of your system can be no greater than the combined reliability of all the services it depends on. Come again? In order to work well, maybe your business application requires ten services, some of which are cloud provider services, some of which are homegrown or adapted by your team. Think about the measured percentage of uptime each service has and the percentage of failure for each. It isn’t just the weakest individual link in the chain of services you have to worry about. It’s the combined weaknesses of all the links in the chain. This constitutes the overall fragility of the system.

    Is Disaster Recovery an Antiquated Notion?

    Not yet. But, maybe it should be. It’s tough to stop thinking in terms of catastrophic recovery plans and reactionary behavior. But a core best practice is to design cloud systems with preemptive and pro-active “built-in” mechanisms that expect failure, while at the same time designing flexible components, which anticipate the only real certainty about the future—change. Having adaptive, automated fail-safes that are a fundamental part of the way you manage dynamic infrastructure is a very different notion than accepting that inevitable tech failures have to mean inevitable business catastrophes or revenue losses. The trend with next generation automated infrastructure is that a single command resuming processes will replace multiple, manual recovery steps potentially fraught with complications.
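
To put rough numbers on the earlier point about combined weaknesses, here is a minimal sketch in Python. It assumes that every service must be up for the application to work and that failures are independent, which real outages often are not. The ten services at 99.9% uptime are made-up figures, and `combined_availability` is a hypothetical helper, not part of any tool mentioned in this article.

```python
from math import prod


def combined_availability(availabilities):
    """Availability of a system that needs every listed service to be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(availabilities)


# Hypothetical figures: ten services, each up 99.9% of the time.
services = [0.999] * 10
system = combined_availability(services)

hours_per_year = 24 * 365
print(f"weakest single service: {min(services):.3%}")
print(f"combined system:        {system:.3%}")
print(f"expected downtime:      {(1 - system) * hours_per_year:.1f} hours/year")
```

Under these assumptions the combined figure comes out near 99.0%, or roughly 87 hours of expected downtime a year, against under 9 hours for any one of the services on its own.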

    Separate Concerns and Determine Priorities

    After honestly reflecting on system weaknesses and thinking anew about preventative disaster medicine, so to speak, you roll up your sleeves. Separating your application into the appropriate pieces and understanding the requirements for each piece are your architecting meat and potatoes in building a resilient system. Requirements boil down to prioritizing certain characteristics like reliability, durability, availability, accessibility, speed, security, and scalability for each of your components.

    For example, persistent data need to be kept in a place featuring very high durability—durability level refers to whether or not your data get lost. Despite S3’s downtime, persistent data for thousands of businesses were never lost. The data weren’t available for a time and that’s bad. But, had S3’s durability, as opposed to availability, not been exceptional, the outcome would have been infinitely worse.

    On the other hand, the durability of data in the transactional part of the system that’s facing the customer, perhaps governed by a different service, is not of critical importance. Availability matters much more; keeping user interaction open is paramount. If you’re doing a few searches, say, and the system goes down and you just lose some session data, that’s inconvenient, but not terrible. Your persistent data is still safe in another part of your system that’s shaped by its own, most appropriate, highest priority requirements.

    Maybe there’s another component of your system where speed yielding cost efficiency is the highest priority consideration. You architect accordingly. Your prioritizations, sometimes in a cyclical manner, help you decide exactly which concerns to separate from others and to implement as different services or microservices—where exactly you draw the line.

    Do No Harm: Architect a Responsive “Circuit Breaker”

    Making assertions about architecting fault tolerance from the get-go, in the very fabric of a system, and demonstrating what that specifically looks like are two different things. It helps to examine part of a blueprint that shows abstract principles in action.

    Fugue is an example of a dynamic cloud infrastructure orchestration and enforcement system that provides a concise, accurate view of an application’s cloud footprint at any given time and automatically returns infrastructure straying for any reason to desired declarations. It centralizes cloud management and handles failure elegantly with “built-in” preemptive mechanisms that use the cloud’s native advantages to keep applications safe. Its engine is the Fugue Conductor that builds out infrastructure, checks it every 30 seconds, and is empowered to make the right decisions about processing work. Human intervention during an outage almost always invites adrenaline-driven mistakes, which can make things worse. A Conductor’s programmed, automated, Hippocratic mantra is ‘first, do no harm.’

    When the S3 outage hit and AWS API error responses in the 500s indicated that something aberrant was underway, that unsafe behavior was unfolding on the other side of a cloud API, Fugue Conductors, following their core design, automatically erred on the side of caution and stopped work immediately—popping the metaphorical circuit breaker. Each Conductor, comparing actual infrastructure state for an application against declarations in a single, concise file serving as the source of truth, could not verify consistency. When a Conductor consumes messaging that indicates its view of the world is inaccurate, it halts all change. As the core service disruption, like S3’s, calms and a Conductor’s view of the world is restored, it’s designed to take a single resume command and continue work without missing a beat.
    Since all change was halted, since its async messages were not marked as read, and since the design is idempotent and stateless, a Conductor can start operating on a message again with nothing corrupted.

    Beyond that, a Fugue Conductor is further architected so that, at any moment, it can go offline or be completely destroyed—its instance terminated—and another one will come up, picking up where the other left off with no reliance on the previous Conductor. A regular part of Fugue testing includes killing Conductors out of band and making sure new ones come up correctly, that weird, inconsistent states can be handled safely. Testing like this, really pounding away at the Conductors, is an extension of architecting principles and helps ensure reliability and availability.

    If you try to bolt on ad hoc features in a system to accomplish these things, it’ll be a mess. Either a system is blueprinted to handle eventual consistency and API problems or it isn’t.

    Humans Forget, Computers Remember

    Cloud crashes, big and small, are as inevitable as human fallibility. A vast majority of outages come from human operator error and that’s okay. We know that. It’s expected. People get tired. Even the savviest developers make the occasional mistake. As noted by AWS, the S3 cascade resulted from an incorrectly typed command. So, the rational and effective response isn’t to lose faith in humanity, but to give it a break. Build or use a system that checks for correctness before it does anything else at all. That’s a hard thing to layer back on top if you don’t have it in place from the start. Make it the responsibility of the machine to tell the operator whether something is right or wrong, to the extent possible.

    A computer can say decisively whether something is possible to do. It can also respond to boundaries specified ahead of time around whitelisted, correct behaviors. Here’s the list of stuff you may do within these constraints, computer!
    Here’s a finite set to process—as opposed to a blacklist of what not to do, which might be an infinite set. Any operation performed against the system can first be checked to make sure it’s allowed. So, even if you fat-fingered something, you can’t destroy 3000 servers because that’s not allowed. This is critically important because humans can be bad at remembering the details, especially in the heat of the moment, whereas computers don’t break a sweat.

    Making the machine responsible in its DNA for managing failure quickly, gracefully, and safely is how a cloud crash becomes a non-event for businesses.
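
The "check before you act" idea can be sketched in a few lines of Python. This is an illustration only, not how Fugue implements it: `Operation`, `ALLOWED`, `check`, and `execute` are hypothetical names, the limits are invented, and appending to a list stands in for a real cloud API call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    action: str    # e.g. "terminate"
    resource: str  # e.g. "server"
    count: int     # how many resources the operation touches


# Hypothetical allowlist: (action, resource) -> maximum count per operation.
# Anything not listed here is refused by default.
ALLOWED = {
    ("create", "server"): 50,
    ("terminate", "server"): 5,
}


def check(op: Operation) -> None:
    """Raise PermissionError unless the operation is explicitly allowed."""
    limit = ALLOWED.get((op.action, op.resource))
    if limit is None:
        raise PermissionError(f"{op.action} {op.resource} is not on the allowlist")
    if op.count > limit:
        raise PermissionError(
            f"{op.action} {op.resource} x{op.count} exceeds the limit of {limit}"
        )


def execute(op: Operation, log: list) -> None:
    """Check first, then act. Nothing is logged if the check fails."""
    check(op)
    log.append(op)  # stand-in for the real cloud API call


log = []
execute(Operation("terminate", "server", 3), log)  # within the limit
try:
    execute(Operation("terminate", "server", 3000), log)  # fat-fingered
except PermissionError as err:
    print(f"refused: {err}")
```

Because the allowlist is a finite set, the check is a dictionary lookup and a comparison; the mistyped request for 3000 servers is refused before anything is touched.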
