Over the past few weeks, I have been following the scandal involving Volkswagen. Most of us have learned that VW installed so-called “cheating software” in their diesel cars: software that, working in conjunction with the anti-lock brakes and traction control system, detected when a car was being tested for emissions and enabled a cheat mode. Current estimates place emissions under normal driving somewhere in the magnitude of 40 times higher than under cheat mode. Although investigations are ongoing, Volkswagen has already earmarked a massive amount of cash, to the tune of €6.5 billion, for ‘handling’ the scandal alone. Additionally, millions of cars across the world will be recalled for maintenance, and VW has recently halted all sales of affected models in the EU.
One of the core issues in this scandal is that, somewhere along the line, VW appears to have intentionally chosen to include the cheating system in its cars. The jury is still out on where exactly this decision was made; the CEO of VW USA stated before Congress that the inclusion was the work of a few key engineers. That claim is of course debatable, and my personal stance is that bad decisions like this cannot be made without a system of management that supports and incentivises such actions.
In this article I will consider the unfolding scandal from a software production perspective, exploring how more automated governance of security systems and more transparent reporting might have helped VW as an organisation make better decisions by incentivising the right kind of behaviour through transparency and supply chain management. Perhaps such considerations might have helped avoid this crisis altogether, or at least cast light on the issue earlier on.
Finally, we’ll talk about why it’s important to heed the lessons this scandal teaches us, because developers unknowingly make similar decisions every day, owing to a similar lack of visibility and transparency about what goes into the software they produce. Both are elective risks, but VW’s was a known risk, while the developer’s is an unknown yet avoidable one. Unfortunately, as avoidable risks go, this one is very hard to fully understand.
Lesson 1: Supply Chain Principles are key
I am not the first to recognize the parallels between the automotive and software industries. Software factory principles and the lean movement, for example, stemmed from the industrial revolution in car manufacturing, most importantly Toyota’s lean way. There is no doubt software professionals can still learn much from the traditional manufacturing world, using techniques such as limiting the number of suppliers we use, selecting high-quality components from those suppliers, and monitoring, tracking and auditing those components once they are in production. Collectively, this method of managing your suppliers is known as supply chain management. It can be applied to software as well as manufacturing; in this case the supply chain consists of code originating outside your organisation.
It cuts the other way, too; the software-rich automotive industry can take on lessons which the software industry at large has already learned through its own trial-and-error period. As the two industries converge more and more, it is important that lessons learnt on both sides of the coin are not discarded – or reinvented.
The Cost of a Poor Supply Chain
To fully appreciate why this supply chain needs managing, let us first look at what happens when it is not considered:
[Chart: VOW.DE share prices falling sharply. I bet VW’s investors are not happy about this graph. Source: Yahoo Finance]
Independent tests are showing that the cheat mode noticeably reduces the performance of the car. This not only reduces the resale value of the cars, but also has caused permanent damage to the reputation of Volkswagen as a manufacturer, and by implication, the entire car industry. The legal, commercial and reputational ramifications are huge.
As we can see, what might have been a quick and dirty decision on the manufacturing floor has had wide-ranging implications, causing damage to consumers and producers alike. Of course, not every case of this kind of neglect ends in such drastic results, but small decisions made early on can snowball.
It’s not yet known whether VW’s board of directors was aware of this software being in place; nevertheless, it has become evident that VW was warned, not only by some of its own engineers but even by Bosch, the supplier of the component that enables the cheat mode.
Why, then, with opinions voiced so loudly, was the warning not heeded? What made the message get lost along the way? Without taking any stance on where exactly the guilt lies (that will eventually be decided through due legal process), it is certainly evident that there was a fundamental lack of understanding about the components shipped in the final VW engine.
This is a complex problem to tackle even for the best in the business, as software today is increasingly composed of dozens upon dozens of external components that were not produced by internal development teams. Software design is more akin to urban planning than building a house: complex interacting components work in unison to produce a desired outcome, whether in an individual application or in a series of microservices.
Moreover, the average car nowadays contains more software than a run-of-the-mill smartphone might. As software gets increasingly complex to write, developers rely on external producers for the lion’s share of functionality. It is conventional wisdom to avoid reinventing the wheel, and to instead reuse code where possible.
Third party software vendors, internal providers and teams, and even open source projects provide these components for everyone to use. This phenomenon has been a key development in the maturing process of the software industry and it is a very positive development for the most part. When code exists in reusable components, we can produce more software, faster, with far better quality.
However, there is a darker side to this too: Developers place a great deal of trust in these external suppliers.
In traditional industries these external suppliers would be audited and regulated, so that trust is established between supplier and consumer before a component can be included in a final product. In the software industry this has not been the case. As developers, we tend to automatically place trust in outside teams and their ability to fix issues as they arise. Most of the time, the choice of components is made with a casual googling of what seems trustworthy and what does not. Even when the decision to use or not use a component is an informed choice, developers rarely prioritise re-evaluating that choice regularly in the light of new developments.
This is not to say that they do not care about or realise that this issue exists, but rather that it is not a top priority to address, either because it is perceived to be within the realm of acceptable risk, or because they feel the projects themselves keep them informed of issues.
This perception, however, is incorrect, because there is no good mechanism for that information to flow in a timely manner. When new issues are discovered and new releases issued, teams should become aware of them as fast as possible. Every moment spent not understanding an issue, and not applying available fixes, is a moment spent potentially vulnerable to it.
Lesson 2: The Communication Gap Between Producers and Consumers needs to be closed
There is another problem that stems from a distinct communication gap between producers and consumers of these components. As an example, if a vulnerability is discovered and the provider issues a patch on day one, unless it is a highly publicised issue (such as Heartbleed, Shellshock or VENOM), the fix can easily slip by unnoticed. A recent analysis we did at Sonatype revealed that about 7.5% of all downloads from the Central Repository contain known vulnerabilities, and over half of those vulnerabilities were more than two years old.
Who outside of the Java development ecosystem knows about the hugely exploitable Struts vulnerabilities reported a few years back? How many companies still have old “certified components” or “approved versions” running in production because of an “if it ain’t broke, don’t fix it” mentality?
It is unwise and damaging to solve this communication gap by restricting developers and technologists from using these external components. Creating unnecessarily strict controls and enforcing blacklists can end up causing more harm than good for several reasons, ranging from developer frustration to wasted management time. Moreover, however diligently a list is compiled, change is so fast-paced that the list can be invalidated overnight when new vulnerabilities are disclosed.
Personally, I’ve seen a case where an organisation maintained a list of 300 approved components spanning about 800 versions in total. Now, I’m no master of maths, but even the sheer effort of manually maintaining such a list with a casual google every once in a while adds up to a seriously long time. Because these efforts have traditionally been manual, they have been relegated to paper exercises that happen maybe once per development cycle – far too infrequently to be effective.
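A quick back-of-the-envelope calculation shows how the effort piles up. The time-per-check figure below is my own assumption for illustration, not a number from any study:

```python
# Rough estimate of the manual effort behind an "approved versions" list.
# Assumption: a casual check of one version (google the project, skim the
# release notes and advisories) takes about 10 minutes.
approved_versions = 800
minutes_per_check = 10

total_hours = approved_versions * minutes_per_check / 60
print(f"One full review sweep: about {total_hours:.0f} hours")
```

Even under that generous assumption, a single full sweep of the list is weeks of someone's working time, which is exactly why these reviews degrade into once-a-cycle paper exercises.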
Another byproduct has been that these processes have been very opaque, the closely guarded territory of a select, elite few. I tend to refer to this as the ‘scan and scold’ approach to enforcing governance. This method sees governance imposed as an audit function rather than as a fully integrated part of the product development process. The usual playbook in this scenario is that security and governance checks are performed at the very end of a delivery cycle, with a built-in assumption that remedying the issues will be a ‘drag’ or a pain. Any developer worth their money knows it is a necessary evil, but when deadlines and pressure to deliver start to pile up, it may be easier to succumb to temptation.
[Photo: an approximation of how developers see this approach to enforcement. Source: https://www.flickr.com/photos/mllerustad/]
Again, it is not a matter of people in product development roles acting out of pure laziness, or of security personnel being purely evil; rather, the lack of feedback loops from producers, combined with ill-aligned incentives and business expectations, can force this to occur. To some extent, these practices have also been a natural part of the evolution of the software industry, with security implemented using the best methods and tools available at the time. Even scanning and scolding is better than no oversight or self-auditing at all.
Lesson 3: Governance and Security have to move at the pace of development
How, then, can we improve this part of the process? We’ve established that it is risky for governance and security to lag behind the pace of development, partly because new vulnerabilities can be unearthed every day. If these checks become a slowing factor in a development cycle, they risk being dropped in priority or skipped completely. In that situation, the stereotype of the ignorant, pushy project manager holds true.
Instead of manually working out what should happen, we can and should aim to improve and automate the communication between suppliers and consumers. The same applies to transparency: we should not hide our deficiencies, but rather learn to accept them as a means of bettering ourselves. Making mistakes is not bad; not attempting to fix the issues thus discovered is.
By making this information not only automated but also available for all eyes to see, we can enforce a better form of governance: one where individuals and teams are intrinsically motivated to better themselves rather than being driven by fear of that annoying, nagging security guy.
An easy way to start is to create a bill of materials from every build of your application and cross-check it against public lists of known vulnerabilities, such as the CVE databases.
The Verizon Data Breach Report for 2015 claims that 97% of attacks observed in 2014 exploited just 10 different vulnerabilities, the earliest of which was first reported in 2001. By weeding out these CVEs you’ve taken the first steps and are already much better off.
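The cross-check described above can be sketched in a few lines. Everything here is invented for illustration: the component names, versions and advisory IDs are placeholders, and a real implementation would pull the vulnerability feed from a live source rather than a hardcoded dict:

```python
# Sketch: cross-check a build's bill of materials against a feed of
# known-vulnerable component versions. All names and IDs are fictional.

# Bill of materials produced by a build: (component, version) pairs.
bill_of_materials = [
    ("example-web-framework", "2.3.1"),
    ("example-xml-parser", "1.0.4"),
    ("example-logging-lib", "3.2.0"),
]

# A vulnerability feed, reduced here to a mapping from
# (component, version) to a list of advisory identifiers.
known_vulnerabilities = {
    ("example-web-framework", "2.3.1"): ["CVE-2014-0001"],
    ("example-xml-parser", "0.9.9"): ["CVE-2013-0002"],
}

def audit(bom, vuln_feed):
    """Return every (component, version, advisories) hit found in the BOM."""
    return [
        (name, version, vuln_feed[(name, version)])
        for name, version in bom
        if (name, version) in vuln_feed
    ]

for name, version, advisories in audit(bill_of_materials, known_vulnerabilities):
    print(f"{name} {version}: {', '.join(advisories)}")
```

The point of the sketch is that once the bill of materials exists as data, the cross-check is trivial to automate; the hard part is producing the BOM on every build in the first place.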
Once you have taken care of basic hygiene and done away with the most critical issues, it might be worth your while to make scrutiny of the bill of materials a key automated test. Just as unit tests should exist for code, governance tests should exist for the bill of materials. This set of tests should be as robust as any test suite you adopt, covering every build and requiring that violations are fixed before a product is allowed to ship.
[Pictured: what such tests could, of course, look like. See more: Nexus Lifecycle]
Summa Summarum: Communication, Transparency and Empathy are important
Whatever the truth in the Volkswagen case may be, it illustrates a gap in communication, transparency and proper feedback loops between the team that put the original test software in place and the people in management who chose to ignore the warnings, whether by neglect or by intent.
In the absence of further information, I doubt that any individual engineer acted in malice, knowing their code would be misused. I suspect that if VW had managed software risk as well as they manage physical components, the “cheat system” might have raised more concern at an earlier point. Transparent and available flows of information would have aided management decision-making and, eventually, accountability. Perhaps if this issue had been openly available for all eyes to see, somewhere along the line the voices raised would have been much, much louder.
We have a dream.
Ilkka Turunen, October 28, 2015