About Us

We're Working On Some Of Cloud Computing's Toughest Challenges

Fugue automates cloud operations, eases public and private sector compliance, and simplifies lifecycle management of the AWS infrastructure service stack. To ensure customers can innovate faster, Fugue validates systems before they're built and continuously enforces them after. Fugue holds eight granted patents, with 16 more pending. Gartner named Fugue a Cool Vendor in Cloud Computing in 2017.

Our founders started Fugue in 2013 to take on one of cloud computing's most vexing challenges: the nitty-gritty, secure management of cloud infrastructure operations for businesses large and small. Since then, we've built a new kind of system for operating the cloud, one that automates infrastructure-as-code and policy-as-code from a single source of truth. Our team of more than 70 skilled engineers and talented creative professionals combines decades of experience with fresh vision and relentless quality control to deliver an approach that works, while adhering to our shared cultural principles.

Our primary offices are in Frederick, MD; Washington, DC; and Silicon Valley, CA. Our team spans the globe, with members in Atlanta, Miami, and Seattle and as far away as Zurich and Tokyo, all collaborating daily. We're venture-backed, having raised $74 million to date, with prescient, supportive partners at New Enterprise Associates, Future Fund, The Maryland Venture Fund, and Core Capital. Meet our executive team below!

Our Leadership

Phillip Merrick
Chief Executive Officer
Josh Stella
Co-founder & Chief Technology Officer
Mary Alexander
Vice President of Sales
Gus Bessalel
Chief Financial Officer
Nathan McCourtney
Vice President of Engineering
Richard Park
Vice President of Product
Tim Webb
Chief Strategy & Security Officer
Heather Wiley
Chief People Officer
Andrew Wright
Co-founder & Vice President of Communications

Our Advisors

Frank Slootman
ServiceNow
Ben Fathi
Cloudflare & VMware
Dave Merkel
Expel & FireEye
Chad Fowler
Microsoft & Wunderlist
Joe Payne
Code42 & Eloqua
Amena Ali
VividCortex & Earth Networks

Our Investors

OUR OFFICES

Where Are We Located?

  • San Jose, CA
  • Washington, DC
  • Miami, FL
  • Atlanta, GA
  • Baldwin City, KS
  • Frederick, MD (Headquarters)
  • Manalapan, NJ
  • Eugene, OR
  • Seattle, WA

Career Opportunities

Join The Fugue Team

From Our Blog

Featured Articles

  • Python Mocking 101: Fake It Before You Make It

    Welcome to a guide to the basics of mocking in Python. It was borne out of my need to test some code that used a lot of network services and my experience with GoMock, which showed me how powerful mocking can be when done correctly (thanks, Tyler). I'll begin with a philosophical discussion about mocking because good mocking requires a different mindset than good development. Development is about making things, while mocking is about faking things. This may seem obvious, but the "faking it" aspect of mocking tests runs deep, and understanding this completely changes how one looks at testing. After that, we'll look into the mocking tools that Python provides, and then we'll finish up with a full example.

    Mocking can be difficult to understand. When I'm testing code that I've written, I want to see whether the code does what it's supposed to do from end-to-end. I usually start thinking about a functional, integrated test, where I enter realistic input and get realistic output. I access every real system that my code uses to make sure the interactions between those systems are working properly, using real objects and real API calls. While these kinds of tests are essential to verify that complex systems are interworking well, they are not what we want from unit tests.

    Unit tests are about testing the outermost layer of the code. Integration tests are necessary, but the automated unit tests we run should not reach that depth of systems interaction. This means that any API calls in the function we're testing can and should be mocked out. We should replace any nontrivial API call or object creation with a mock call or object. This allows us to avoid unnecessary resource usage, simplify the instantiation of our tests, and reduce their running time. Think of testing a function that accesses an external HTTP API. Rather than ensuring that a test server is available to send the correct responses, we can mock the HTTP library and replace all the HTTP calls with mock calls. This reduces test complexity and dependencies, and gives us precise control over what the HTTP library returns, which may be difficult to accomplish otherwise.

    What do we mean by mocking?

    The term mocking is thrown around a lot, but this document uses the following definition: "the replacement of one or more function calls or objects with mock calls or objects".

    A mock function call returns a predefined value immediately, without doing any work. A mock object's attributes and methods are similarly defined entirely in the test, without creating the real object or doing any work. The fact that the writer of the test can define the return values of each function call gives them a tremendous amount of power when testing, but it also means that they need to do some foundational work to get everything set up properly.

    In Python, mocking is accomplished through the unittest.mock module. The module contains a number of useful classes and functions, the most important of which are the patch function (as decorator and context manager) and the MagicMock class. Mocking in Python is largely accomplished through the use of these two powerful components.

    What do we NOT mean by mocking?

    Developers use a lot of "mock" objects or modules, which are fully functional local replacements for networked services and APIs. For example, the moto library is a mock boto library that captures all boto API calls and processes them locally.
    While these mocks allow developers to test external APIs locally, they still require the creation of real objects. This is not the kind of mocking covered in this document. This document is specifically about using MagicMock objects to fully manage the control flow of the function under test, which allows for easy testing of failures and exception handling.

    How do we mock in Python?

    Mocking in Python is done by using patch to hijack an API function or object creation call. When patch intercepts a call, it returns a MagicMock object by default. By setting properties on the MagicMock object, you can mock the API call to return any value you want or raise an Exception.

    The overall procedure is as follows:

    1. Write the test as if you were using real external APIs.
    2. In the function under test, determine which API calls need to be mocked out; this should be a small number.
    3. In the test function, patch the API calls.
    4. Set up the MagicMock object responses.
    5. Run your test.

    If your test passes, you're done. If not, you might have an error in the function under test, or you might have set up your MagicMock response incorrectly. Next, we'll go into more detail about the tools that you use to create and configure mocks.

    patch

        import unittest
        from unittest.mock import patch

    patch can be used as a decorator to the test function, taking a string naming the function that will be patched as an argument. In order for patch to locate the function to be patched, it must be specified using its fully qualified name, which may not be what you expect. If a class is imported using a from module import ClassA statement, ClassA becomes part of the namespace of the module into which it is imported.

    For example, if a class is imported in the module my_module.py as follows:

        # in my_module.py
        from module import ClassA

    it must be patched as @patch('my_module.ClassA'), rather than @patch('module.ClassA'), due to the semantics of the from ... import ... statement, which imports classes and functions into the current namespace.

    Typically patch is used to patch an external API call or any other time- or resource-intensive function call or object creation. You should only be patching a few callables per test. If you find yourself trying to patch more than a handful of times, consider refactoring your test or the function you're testing.

    Using the patch decorator will automatically send a positional argument to the function you're decorating (i.e., your test function). When patching multiple functions, the decorator closest to the function being decorated is called first, so it will create the first positional argument.

        @patch('module.ClassB')
        @patch('module.functionA')
        def test_some_func(self, mock_A, mock_B):
            ...

    By default, these arguments are instances of MagicMock, which is unittest.mock's default mocking object. You can define the behavior of the patched function by setting attributes on the returned MagicMock instance.

    MagicMock

    MagicMock objects provide a simple mocking interface that allows you to set the return value or other behavior of the function or object creation call that you patched. This allows you to fully define the behavior of the call and avoid creating real objects, which can be onerous. For example, if we're patching a call to requests.get, an HTTP library call, we can define a response to that call that will be returned when the API call is made in the function under test, rather than ensuring that a test server is available to return the desired response.
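    To make the requests.get case concrete, here is a minimal, self-contained sketch of that pattern. It assumes the requests package is installed; get_user and the example URL are hypothetical names invented for illustration, not code from the original post.

        import json
        import unittest
        from unittest.mock import patch, MagicMock

        import requests

        def get_user(user_id):
            # Hypothetical function under test: calls an external HTTP API.
            response = requests.get("https://api.example.com/users/{}".format(user_id))
            return json.loads(response.text)["name"]

        class TestGetUser(unittest.TestCase):
            @patch('requests.get')
            def test_get_user(self, mock_get):
                # The patched requests.get returns a MagicMock standing in for a Response.
                mock_get.return_value = MagicMock(status_code=200, text=json.dumps({"name": "Alice"}))
                self.assertEqual(get_user(42), "Alice")
                mock_get.assert_called_with("https://api.example.com/users/42")

        if __name__ == '__main__':
            unittest.main()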
    The two most important attributes of a MagicMock instance are return_value and side_effect, both of which allow us to define the return behavior of the patched call.

    return_value

    The return_value attribute on the MagicMock instance passed into your test function allows you to choose what the patched callable returns. In most cases, you'll want to return a mock version of what the callable would normally return. This can be JSON, an iterable, a value, an instance of the real response object, a MagicMock pretending to be the response object, or just about anything else. When patching objects, the patched call is the object creation call, so the return_value of the MagicMock should be a mock object, which could be another MagicMock. If the code you're testing is Pythonic and does duck typing rather than explicit typing, using a MagicMock as a response object can be convenient. Rather than going through the trouble of creating a real instance of a class, you can define arbitrary attribute key-value pairs in the MagicMock constructor and they will be automatically applied to the instance.

        # in test_my_module
        @patch('external_module.api_call')
        def test_some_func(self, mock_api_call):
            mock_api_call.return_value = MagicMock(status_code=200, response=json.dumps({'key': 'value'}))
            my_module.some_func()

        # in my_module
        import external_module

        def some_func():
            response = external_module.api_call()
            # normally returns a Response object, but now returns a MagicMock
            # response == mock_api_call.return_value == MagicMock(status_code=200, response=json.dumps({'key': 'value'}))

    Note that the argument passed to test_some_func, i.e., mock_api_call, is a MagicMock and we are setting return_value to another MagicMock. When mocking, everything is a MagicMock.

    Speccing a MagicMock

    While a MagicMock's flexibility is convenient for quickly mocking classes with complex requirements, it can also be a downside. By default, MagicMocks act like they have any attribute, even attributes that you don't want them to have. In the example above, we return a MagicMock object instead of a Response object. However, say we had made a mistake in the patch call and patched a function that was supposed to return a Request object instead of a Response object. The MagicMock we return will still act like it has all of the attributes of the Request object, even though we meant for it to model a Response object. This can lead to confusing testing errors and incorrect test behavior.

    The solution to this is to spec the MagicMock when creating it, using the spec keyword argument: MagicMock(spec=Response). This creates a MagicMock that will only allow access to attributes and methods that are in the class from which the MagicMock is specced. Attempting to access an attribute not in the originating object will raise an AttributeError, just like the real object would. A simple example is:

        m = MagicMock()
        m.foo()  # no error raised

        # Response objects have a status_code attribute
        m = MagicMock(spec=Response, status_code=200, response=json.dumps({'key': 'value'}))
        m.foo()        # raises AttributeError
        m.status_code  # no error raised

    side_effect

    Sometimes you'll want to test that your function correctly handles an exception, or that multiple calls of the function you're patching are handled correctly. You can do that using side_effect. Setting side_effect to an exception raises that exception immediately when the patched function is called.
    Setting side_effect to an iterable will return the next item from the iterable each time the patched function is called. Setting side_effect to any other value will return that value.

        # in test_my_module
        @patch('external_module.api_call')
        def test_some_func(self, mock_api_call):
            mock_api_call.side_effect = SomeException()
            my_module.some_func()

        # in my_module
        def some_func():
            try:
                external_module.api_call()
            except SomeException:
                print("SomeException caught!")  # this code is executed
            except SomeOtherException:
                print("SomeOtherException caught!")  # not executed

        # in test_my_module
        @patch('external_module.api_call')
        def test_some_func(self, mock_api_call):
            mock_api_call.side_effect = [0, 1]
            my_module.some_func()

        # in my_module
        def some_func():
            rv0 = external_module.api_call()  # rv0 == 0
            rv1 = external_module.api_call()  # rv1 == 1

    assert_called_with

    assert_called_with asserts that the patched function was called with the arguments specified as arguments to assert_called_with.

        # inside some_func
        someAPI.API_call(foo, bar='baz')

        # inside test_some_func
        some_func()
        mock_api_call.assert_called_with(foo, bar='baz')

    A full example

    In this example, I'm testing a retry function on Client.update. This means that the API calls in update will be made twice, which is a great time to use MagicMock.side_effect.

    The full code of the example is here:

        import unittest
        from unittest.mock import patch

        class TestClient(unittest.TestCase):
            def setUp(self):
                self.vars_client = VarsClient()

            @patch('pyvars.vars_client.VarsClient.get')
            @patch('requests.post')
            def test_update_retry_works_eventually(self, mock_post, mock_get):
                mock_get.side_effect = [
                    VarsResponse(),
                    VarsResponse()]
                mock_post.side_effect = [
                    requests.ConnectionError('Test error'),
                    MagicMock(status_code=200,
                              headers={'content-type': "application/json"},
                              text=json.dumps({'status': True}))
                ]
                response = self.vars_client.update('test', '0')
                self.assertEqual(response, response)

        @patch('pyvars.vars_client.VarsClient.get')
        @patch('requests.post')
        def test_update_retry_works_eventually(self, mock_post, mock_get):

    I'm patching two calls in the function under test (pyvars.vars_client.VarsClient.update), one to VarsClient.get and one to requests.post. Since I'm patching two calls, I get two arguments to my test function, which I've called mock_post and mock_get. These are both MagicMock objects. In their default state, they don't do much. We need to assign some response behaviors to them.

        mock_get.side_effect = [
            VarsResponse(),
            VarsResponse()]
        mock_post.side_effect = [
            requests.ConnectionError('Test error'),
            MagicMock(status_code=200,
                      headers={'content-type': "application/json"},
                      text=json.dumps({'status': True}))
        ]

    This tests to make sure a retry facility works eventually, so I'll be calling update multiple times, and making multiple calls to VarsClient.get and requests.post.

    Here I set up the side_effects that I want. I want all the calls to VarsClient.get to work (returning an empty VarsResponse is fine for this test), the first call to requests.post to fail with an exception, and the second call to requests.post to work. This kind of fine-grained control over behavior is only possible through mocking.

        response = self.vars_client.update('test', '0')
        self.assertEqual(response, response)

    Once I've set up the side_effects, the rest of the test is straightforward. The behavior is: the first call to requests.post fails, so the retry facility wrapping VarsClient.update should catch the error, and everything should work the second time.
    This behavior can be further verified by checking the call history of mock_get and mock_post.

    Conclusion

    Using mock objects correctly goes against our intuition to make tests as real and thorough as possible, but doing so gives us the ability to write self-contained tests that run quickly, with no dependencies. It gives us the power to test exception handling and edge cases that would otherwise be impossible to test. Most importantly, it gives us the freedom to focus our test efforts on the functionality of our code, rather than our ability to set up a test environment. By concentrating on testing what's important, we can improve test coverage and increase the reliability of our code, which is why we test in the first place.

    Documentation: https://docs.python.org/3/library/unittest.mock.html

    And, check out fugue.co.
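    As a self-contained recap of the techniques in this post, here is a small runnable sketch (not from the original article) that exercises patch, side_effect, and assert_called_with against a hypothetical fetch_with_retry function; it assumes the requests package is installed, and all other names are invented for illustration.

        import json
        import unittest
        from unittest.mock import patch, MagicMock

        import requests

        def fetch_with_retry(url):
            # Hypothetical function under test: retries one failed HTTP POST.
            for attempt in range(2):
                try:
                    response = requests.post(url, data={})
                    return json.loads(response.text)['status']
                except requests.ConnectionError:
                    if attempt == 1:
                        raise

        class TestFetchWithRetry(unittest.TestCase):
            @patch('requests.post')
            def test_retry_succeeds_on_second_call(self, mock_post):
                # First call raises, second call returns a fake response object.
                mock_post.side_effect = [
                    requests.ConnectionError('Test error'),
                    MagicMock(status_code=200, text=json.dumps({'status': True})),
                ]
                self.assertTrue(fetch_with_retry('http://example.com/api'))
                mock_post.assert_called_with('http://example.com/api', data={})

        if __name__ == '__main__':
            unittest.main()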

  • Diagnosing and Fixing Memory Leaks in Python

    Fugue uses Python extensively throughout the Conductor and in our support tools, due to its ease of use, extensive package library, and powerful language tools. One thing we've learned from building complex software for the cloud is that a language is only as good as its debugging and profiling tools. Logic errors, CPU spikes, and memory leaks are inevitable, but a good debugger, CPU profiler, and memory profiler can make finding these errors significantly easier and faster, letting our developers get back to creating Fugue's dynamic cloud orchestration and enforcement system. Let's look at a case in point.

    In the fall, our metrics reported that a Python component of Fugue called the reflector was experiencing random restarts and instability after a few days of uptime. Looking at memory usage showed that the reflector's memory footprint increased monotonically and continuously, indicating a memory leak. tracemalloc, a powerful memory tracking tool in the Python standard library, made it possible to quickly diagnose and fix the leak. We discovered that the memory leak was related to our use of requests, a popular third-party Python HTTP library. Rewriting the component to use urllib from the Python standard library eliminated the memory leak. In this blog, we'll explore the details.

    (Chart: metrics show the problem. Percentage of total system memory used by the reflector, using the requests library.)

    Memory Allocation in Python

    In most scenarios, there's no need to understand memory management in Python beyond knowing that the interpreter manages memory for you. However, when writing large, complex Python programs with high stability requirements, it's useful to peek behind the curtain to understand how to write code that interacts well with Python's memory management algorithms. Notably, Python uses reference counting and garbage collection to free memory blocks, and only frees memory to the system when certain internal requirements are met. A pure Python script will never have direct control over memory allocation in the interpreter. If direct control over memory allocation is desired, the interpreter's memory allocation can be bypassed by writing or using an extension. For example, numpy manages memory for large data arrays using its own memory allocator.

    Fundamentally, Python is a garbage-collected language that uses reference counting. The interpreter automatically allocates memory for objects as they are created and tracks the number of references to those objects in a data structure associated with the object itself. This memory will be freed when the reference count for those objects reaches zero. In addition, garbage collection will detect cycles and remove objects that are only referenced in cycles. Between these two mechanisms, every byte of memory allocated within the Python interpreter can be freed, but no claims can be made about memory allocated in extensions.

    Python manages its own heap, separate from the system heap. Memory is allocated in the Python interpreter by different methods according to the type of the object to be created. Scalar types, such as integers and floats, use different memory allocation methods than composite types, such as lists, tuples, and dictionaries. In general, memory is allocated on the Python heap in fixed-size blocks, depending on the type. These blocks are organized into pools, which are further organized into arenas.
    Memory is pre-allocated using arenas, pools, and blocks, which are then used to store data as needed over the course of the program's execution. Since these blocks, pools, and arenas are kept in Python's own heap, freeing a memory block merely marks it as available for future use in the interpreter. Freeing memory in Python does not immediately free the memory at the system level. When an entire arena is marked as free, its memory is released by the Python interpreter and returned to the system. However, this may occur infrequently due to memory fragmentation.

    Due to these abstractions, memory usage in Python often exhibits high-water-mark behavior, where peak memory usage determines the memory usage for the remainder of execution, regardless of whether that memory is actively being used. Furthermore, the relationship between memory being "freed" in code and being returned to the system is vague and difficult to predict. These behaviors make completely understanding the memory usage of complex Python programs notoriously difficult.

    Memory Profiling Using tracemalloc

    tracemalloc is a package included in the Python standard library (as of version 3.4). It provides detailed, block-level traces of memory allocation, including the full traceback to the line where the memory allocation occurred, and statistics for the overall memory behavior of a program. The official documentation provides a good introduction to its capabilities, and the original Python Enhancement Proposal introducing it (PEP 454) also has some insight on its design.

    tracemalloc can be used to locate high-memory-usage areas of code in two ways: looking at cumulative statistics on memory use to identify which object allocations are using the most memory, and tracing execution frames to identify where those objects are allocated in the code.

    Module-level Memory Usage

    We start by tracing the memory usage of the entire program, so we can identify, at a high level, which objects are using the most memory. This will hopefully provide us with enough insight to know where and how to look more deeply. The following wrapper starts tracing and prints statistics when Ctrl-C is hit:

        import tracemalloc

        tracemalloc.start(10)

        try:
            run_reflector()
        except:
            snapshot = tracemalloc.take_snapshot()
            top_n(25, snapshot, trace_type='filename')

    tracemalloc.start(10) starts memory tracing, while saving 10 frames of traceback for each entry. The default is 1, but saving more traceback frames is useful if you plan on using tracebacks to locate memory leaks, which will be discussed later. tracemalloc.take_snapshot() takes a snapshot of currently allocated memory in the Python heap. It stores the number of allocated blocks, their size, and tracebacks to identify which lines of code allocated which blocks of memory. Once a snapshot is created, we can compute statistics on memory use, compare snapshots, or save them to analyze later. top_n is a helper function I wrote to pretty-print the output from tracemalloc. Here, I ask for the top 25 memory allocations in the snapshot, grouped by filename.
    After running for a few minutes, the output looks like this:

        [ Top 25 with filename tracebacks ]
        197618 blocks 17.02311134338379 MB
        /Users/mike/.pyenv/versions/3.4.2/lib/python3.4/collections/__init__.py:0: size=17.0 MiB, count=197618, average=90 B
        105364 blocks 11.34091567993164 MB
        <frozen importlib._bootstrap>:0: size=11.3 MiB, count=105364, average=113 B
        60339 blocks 9.233230590820312 MB
        /Users/mike/.pyenv/versions/3.4.2/lib/python3.4/json/decoder.py:0: size=9455 KiB, count=60339, average=160 B
        ...

    This shows the cumulative amount of memory allocated by the component over the entire runtime, grouped by filename. At this level of granularity, it's hard to make sense of the results. For instance, the first line shows us that 17 MB of collections objects are created, but this view doesn't provide enough detail for us to know which objects, or where they're being used. A different approach is needed to isolate the problem.

    Understanding tracemalloc Output

    tracemalloc shows the net memory usage at the time a memory snapshot is taken. When comparing two snapshots, it shows the net memory usage between the two snapshots. If memory is allocated and freed between snapshots, it won't be shown in the output. Therefore, if snapshots are created at the same point in a loop, any memory allocations visible in the differences between two snapshots are contributing to the long-term total amount of memory used, rather than being a temporary allocation made in the course of execution.

    In the case of reference cycles that require garbage collection, uncollected cycles are recorded in the output, while collected cycles are not. Any blocks freed by the garbage collector in the time covered by a snapshot will be recorded as freed memory. Therefore, forcing garbage collection with gc.collect() before taking a snapshot will reduce noise in the output.

    Per-Iteration Memory Usage

    Since we're looking for a memory leak, it's useful to understand how the memory usage of our program changes over time. We can instrument the main loop of the component, to see how much memory is allocated in each iteration, by calling the following method from the main loop:

        def collect_stats(self):
            self.snapshots.append(tracemalloc.take_snapshot())
            if len(self.snapshots) > 1:
                stats = self.snapshots[-1].filter_traces(filters).compare_to(self.snapshots[-2], 'filename')
                for stat in stats[:10]:
                    print("{} new KiB {} total KiB {} new {} total memory blocks: ".format(stat.size_diff / 1024, stat.size / 1024, stat.count_diff, stat.count))
                    for line in stat.traceback.format():
                        print(line)

    This code takes a memory snapshot and saves it, then uses snapshot.compare_to(other_snapshot, group_by='filename') to compare the newest snapshot with the previous snapshot, with results grouped by filename.
    After a few iterations to warm up memory, the output looks like this:

        [ Top 5 with filename tracebacks ]
        190.7421875 new KiB 1356.5634765625 total KiB 1930 new 13574 total memory blocks: (1)
        File "/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/linecache.py", line 0
        2.1328125 new KiB 12.375 total KiB 32 new 86 total memory blocks: (2)
        File "/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/tracemalloc.py", line 0
        1.859375 new KiB 18.7001953125 total KiB 3 new 53 total memory blocks: (3)
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 0
        -1.71875 new KiB 34.5224609375 total KiB -2 new 91 total memory blocks:
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 0
        1.66015625 new KiB 61.662109375 total KiB 18 new 260 total memory blocks:
        File "/Users/mike/.pyenv/versions/3.4.2/lib/python3.4/urllib/parse.py", line 0

    The linecache (1) and tracemalloc (2) allocations are part of the instrumentation, but we can also see some memory allocations made by the requests HTTP package (3) that warrant further investigation. Recall that tracemalloc tracks net memory usage, so these memory allocations are accumulating on each iteration. Although the individual allocations are small and don't jump out as problematic, the memory leak only becomes apparent over the course of a few days, so it's likely to be a case of small losses adding up.

    Filtering Snapshots

    Now that we have an idea of where to look, we can use tracemalloc's filtering capabilities to show only memory allocations related to the requests package:

        from tracemalloc import Filter

        filters = [Filter(inclusive=True, filename_pattern="*requests*")]
        filtered_stats = snapshot.filter_traces(filters).compare_to(old_snapshot.filter_traces(filters), 'traceback')
        for stat in filtered_stats[:10]:
            print("{} new KiB {} total KiB {} new {} total memory blocks: ".format(stat.size_diff / 1024, stat.size / 1024, stat.count_diff, stat.count))
            for line in stat.traceback.format():
                print(line)

    snapshot.filter_traces() takes a list of Filters to apply to the snapshot. Here, we create a Filter in inclusive mode, so it includes only traces that match the filename_pattern. When inclusive is False, the filter excludes traces that match the filename_pattern. The filename_pattern uses UNIX-style wildcards to match filenames in the traceback. In this example, the wildcards in "*requests*" match occurrences of "requests" in the middle of a path, such as "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py".

    We then use compare_to() to compare the results to the previous snapshot.
    The filtered output is below:

        48.7890625 new KiB 373.974609375 total KiB 4 new 1440 total memory blocks: (4)
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/structures.py", line 0
        1.46875 new KiB 16.2939453125 total KiB 2 new 49 total memory blocks:
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests_unixsocket/__init__.py", line 0
        -1.4453125 new KiB 34.2802734375 total KiB -2 new 96 total memory blocks: (5)
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py", line 0
        -0.859375 new KiB 31.8505859375 total KiB -1 new 85 total memory blocks:
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py", line 0
        0.6484375 new KiB 20.8330078125 total KiB 1 new 56 total memory blocks:
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/packages/urllib3/connection.py", line 0

    With the Filter in place, we can clearly see how requests is using memory. Line (4) shows that roughly 50 KiB of memory is lost in requests on each iteration of the main loop. Note that negative memory allocations, such as (5), are visible in this output. These allocations are freeing memory allocated in previous loop iterations.

    Tracking Down Memory Allocations

    To determine which uses of requests are leaking memory, we can take a detailed look at where problematic memory allocations occur by calling compare_to() with traceback instead of filename, while using a Filter to narrow down the output:

        stats = snapshot.filter_traces(filters).compare_to(old_snapshot.filter_traces(filters), 'traceback')

    This prints 10 frames of traceback (since we started tracing with tracemalloc.start(10)) for each entry in the output, a truncated example of which is below:

        5 memory blocks: 4.4921875 KiB
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py", line 585
            r = adapter.send(request, **kwargs)
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests/sessions.py", line 475
            resp = self.send(prep, **send_kwargs)
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests_unixsocket/__init__.py", line 46
            return session.request(method=method, url=url, **kwargs)
        File "/Users/mike/.pyenv/versions/venv/lib/python3.4/site-packages/requests_unixsocket/__init__.py", line 60
            return request('post', url, data=data, json=json, **kwargs)

    The full traceback gives us the ability to trace backwards from memory allocations to the lines in our project code that generate them. In the case of this component, our uses of requests came from an internal storage library that used an HTTP API. Rewriting the library to use urllib directly eliminated the memory leak. Metrics indicate the problem is solved:

    (Chart: percentage of total system memory used by the reflector, after removing requests and switching to urllib.)

    Memory Profiling: Art or Science?

    tracemalloc is a powerful tool for understanding the memory usage of Python programs. It helped us understand module-level memory usage, find out which objects are being allocated the most, and it demonstrated how the reflector's memory usage changed on a per-iteration basis. It comes with useful filtering tools and gives us the ability to see the full traceback for any memory allocation. Despite all of its features, however, finding memory leaks in Python can still feel like more of an art than a science.
    Memory profilers give us the ability to see how memory is being used, but oftentimes it's difficult to find the exact memory allocation that is causing problems. It's up to us to synthesize the information we get from our tools into a conclusion about the memory behavior of the program, then make a decision about what actions to take from there.

    We use virtually every available Python tool (test frameworks, cProfile, etc.) to make Fugue's system reliable, performant, and easy to maintain. The broker and reflector both take advantage of Python's introspection to make judgments about dynamic calls to the AWS API, which allows us to focus on logic rather than coding exhaustive cases. Fugue leverages the strengths of Python where it makes sense in the system, which ultimately means more product stability and extensibility for end-users.
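    For readers who want to try the snapshot-and-compare pattern described above, here is a minimal, self-contained sketch; the leaky_work function and the _cache list are invented stand-ins for the reflector's main loop, not Fugue code.

        import gc
        import tracemalloc

        _cache = []  # simulated leak: grows on every iteration

        def leaky_work():
            # Hypothetical stand-in for one iteration of a long-running loop.
            _cache.append("x" * 1024)

        def main():
            tracemalloc.start(10)          # keep 10 frames of traceback per allocation
            previous = None
            for _ in range(5):
                leaky_work()
                gc.collect()               # reduce noise from collectable cycles
                snapshot = tracemalloc.take_snapshot()
                if previous is not None:
                    stats = snapshot.compare_to(previous, 'traceback')
                    for stat in stats[:3]:
                        print("{:.2f} new KiB, {} new blocks".format(stat.size_diff / 1024, stat.count_diff))
                        for line in stat.traceback.format():
                            print("   ", line)
                previous = snapshot

        if __name__ == '__main__':
            main()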

  • Using AWS KMS to manage secrets in your Infrastructure

    At re:Invent 2014, AWS launched their new Key Management Service, or KMS. As its name implies, KMS is an AWS service that helps securely manage encryption keys in the cloud. Traditionally, keys have been managed in haphazard ways, from SCP-ing keys around your instances to baking them into machine images. The safe way to manage high-value keys has been to employ dedicated Hardware Security Modules (HSMs), either on-premises or with the AWS CloudHSM service. In either case, HSMs are expensive and hard to use.

    The new KMS service provides HSM-style key management that is both inexpensive and easy to use via a web service API. First, we'll look at what KMS is and how you can use it to manage encryption keys. Then, we'll look at credstash, a simple system that uses KMS and DynamoDB to safely store, distribute, and manage credentials in the cloud.

    What is KMS?

    Basic functionality

    Other than the excellent GenerateRandom API call (which you should check out for seeding your PRNGs), KMS is composed of a set of API operations for creating, managing, and using a relatively small set of encryption keys, called Customer Master Keys (here, "master keys"). There are a bunch of operations for managing grants and policy around who can use which keys for what operations, but the fundamental operations in KMS are CreateKey, Encrypt, and Decrypt. CreateKey will generate a key in the KMS service that will never leave the KMS service. Once you create a key in KMS, you can disable it, you can set permissions on who can use it, you can alias it, but you cannot export it. In order to use the keys for cryptography, you use the Encrypt and Decrypt API calls. This is the core security value proposition in KMS: no one can run off with the keys.

    In fact, this is the same model that is used by expensive Hardware Security Modules (HSMs): you generate a key in the device; once it's generated, it never leaves. Instead, you send the data to encrypt or decrypt to the device and say "encrypt this blob with key foo," and the HSM returns the resulting ciphertext or plaintext.

    In the case of KMS, this is done using the Encrypt API operation. You pass the service the handle of the KMS master key that you want to use for encryption, along with up to 4KB of data to encrypt. You get back a blob containing ciphertext and a key reference, which can later be passed to KMS's Decrypt operation, which will return the plaintext. Again, there are many useful operations for managing key audit, policy, and grants. But the service really boils down to creating a key, then using it to encrypt and decrypt 4KB blobs of data.

    4KB?! How am I supposed to use KMS for encryption?

    There are lots of things that we might want to encrypt that are larger than 4KB. In fact, the new Relational Database Service (RDS) Encryption uses KMS to manage the keys used to encrypt entire databases! In order to encrypt arbitrary data and still keep our keys safe in KMS, we use a technique called Envelope Encryption. Here's how it works:

    1. Locally generate a random encryption key (or use the excellent GenerateDataKey operation). We will call this your data key.
    2. Use the data key to encrypt your data.
    3. Use KMS to encrypt your data key with one of your master keys. This is called key wrapping. The encrypted data key is now a "wrapped key."
    4. Discard the plaintext data key.

    You can now safely store the encrypted data and the wrapped key. You can even store them next to each other in a database, on your filesystem, etc. A minimal sketch of these steps in code appears below.
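    Here is a minimal sketch of the envelope encryption steps above, assuming boto3 and the third-party cryptography package, and a hypothetical KMS key alias (alias/my-master-key) and file names; it is an illustration of the technique, not the post's original tooling.

        import base64
        import boto3
        from cryptography.fernet import Fernet

        kms = boto3.client('kms')

        # Steps 1 and 3 together: ask KMS for a data key; we get the plaintext key
        # and a wrapped (encrypted) copy in one call.
        data_key = kms.generate_data_key(KeyId='alias/my-master-key', KeySpec='AES_256')
        plaintext_key = base64.urlsafe_b64encode(data_key['Plaintext'])  # Fernet expects a base64 key
        wrapped_key = data_key['CiphertextBlob']

        # Step 2: use the data key to encrypt the data locally.
        ciphertext = Fernet(plaintext_key).encrypt(b'my secret payload')

        # Step 4: discard the plaintext data key; store only the ciphertext and the wrapped key.
        del plaintext_key
        with open('secret.bin', 'wb') as f:
            f.write(ciphertext)
        with open('secret.key.wrapped', 'wb') as f:
            f.write(wrapped_key)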
    Note: this is not "encraption" (the practice of storing a key next to the data that it protects), because without access to the master key that wraps the data key, the data key is useless. It is an opaque blob.

    To decrypt data, you simply:

    1. Fetch the wrapped data key and the encrypted data.
    2. Use KMS to decrypt the wrapped data key.
    3. Use the decrypted data key to decrypt the encrypted data.

    It should now be obvious why KMS refers to its keys as "master keys": they are not used to encrypt data, but are instead used to encrypt the keys that encrypt data. A single master key can protect many keys, and, in fact, every independent datum in your system can have its own unique data key.

    Now that we understand how to use KMS to encrypt things, let's look at a practical example using KMS and DynamoDB to manage and distribute credentials to systems.

    Credstash: using KMS and DynamoDB to manage credentials

    Software systems often need access to some shared credential. For example, your web application needs access to a database password or an API key for a third-party service. CredStash is a very simple, easy-to-use credential management and distribution system that uses AWS Key Management Service (KMS) for key wrapping and master key storage, and DynamoDB for credential storage and sharing.

    Check out the code at https://github.com/fugue/credstash and follow the directions to set up credstash. You will end up with a master key in KMS and a DynamoDB table to hold encrypted credentials and wrapped data keys.

    Whenever you want to store or share a credential, such as a database password, you simply run:

        $ credstash put [credential-name] [credential-value]

    For example, credstash put myapp.db.prod supersecretpassword1234. Credstash will:

    1. go to KMS and generate a unique data encryption key, which is wrapped by the master key;
    2. use the data encryption key to encrypt the credential value; and
    3. store the encrypted credential, along with the wrapped (encrypted) data encryption key, in the credential store in DynamoDB.

    When you want to fetch a credential, perhaps as part of the bootstrap process on your web server, you simply do:

        $ credstash get [credential-name]

    For example, export DBPASSWORD=$(credstash get myapp.db.prod). When you run get, credstash will:

    1. fetch the encrypted credential and the wrapped encryption key from the credential store (DynamoDB);
    2. send the wrapped encryption key to KMS, where it is decrypted with the master key;
    3. use the decrypted data encryption key to decrypt the credential; and
    4. print the credential to stdout, so you can use it in scripts or assign environment variables to it.

    The README file in the credstash repo has lots of additional information and some notes about the actual operational security of this setup, so you should check it out to learn more.

    Wrapping up and Learning More

    KMS is a great new service that makes it easier than ever to safely store and manage secrets across your infrastructure. When used with other AWS services, like DynamoDB, you can add more security and control than before to high-churn immutable infrastructure (i.e., what Fugue creates).

    You can learn lots more about KMS and access detailed API docs at:

    http://aws.amazon.com/kms/
    http://docs.aws.amazon.com/kms/latest/developerguide/overview.html
    http://docs.aws.amazon.com/kms/latest/APIReference/Welcome.html

    And check out credstash at https://github.com/fugue/credstash.
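    A matching decryption sketch for the decrypt steps above, again assuming boto3 and the cryptography package, and the hypothetical files written in the earlier snippet (secret.bin and secret.key.wrapped):

        import base64
        import boto3
        from cryptography.fernet import Fernet

        kms = boto3.client('kms')

        # 1. Fetch the wrapped data key and the encrypted data.
        with open('secret.key.wrapped', 'rb') as f:
            wrapped_key = f.read()
        with open('secret.bin', 'rb') as f:
            ciphertext = f.read()

        # 2. Ask KMS to unwrap (decrypt) the data key with the master key.
        plaintext_key = base64.urlsafe_b64encode(kms.decrypt(CiphertextBlob=wrapped_key)['Plaintext'])

        # 3. Use the decrypted data key to decrypt the data locally.
        print(Fernet(plaintext_key).decrypt(ciphertext))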
