Strategies for building reliable user interfaces
https://blog.berniesumption.com/software/building-reliable-user-interfaces/ – 27 Sep 2017

Or, “unit testing of UI code is overrated”

This article is about how to make UI code reliable. By reliable I mean free from bugs. This implies that the software must also be maintainable, which is to say that you can easily add new features without also adding new bugs. By UI code I mean code that accepts input from users and displays results. In a modern JavaScript-powered web application, a lot of business logic is moved into the browser. For example, an online chess game may have a virtual opponent that can play against you, choosing the best move to make on each turn. Silly as it may sound to refer to chess as business, the code for this virtual opponent is “business logic” rather than UI, and you should probably unit test the shit out of it. UI code covers things like providing feedback as a player drags their chess piece across the board.

There’s something of a testing religion in software development. Mostly this is a Good Thing – unit tests are probably the most important tool for building reliable software. In this article I argue that in the context of UI development, there are better techniques for achieving reliability.

Part 1: why unit tests don’t work on UI code

This is an article about building reliable UIs, but I’m going to spend the first half of the article on my claim that unit tests are not a good tool for this purpose. This says something about the position of unit testing in software development – to many people, unit testing is synonymous with reliability. In fact my motivation for writing this article is that I’ve worked on a couple of projects where the business considered reliability of the product to be even more important than usual, and this was translated directly into a requirement to unit test more than usual, without much apparent consideration of the alternatives. If you’re more interested in positive advice then you may want to skip to part 2.

The warm fuzzy feeling of comprehensive tests

First off, let’s acknowledge that there is a good reason for unit testing being popular: in some contexts, such tests are necessary and sufficient for building a bug free program. Necessary because they are the most effective tool for getting rid of bugs; sufficient because tests alone are pretty much all you need.

Some programs, especially those that run server-side in a web application, can be specified entirely in terms of input and output. A specification for a REST API server might consist of sentences like this: When a POST request is made to /api/smurfs with a valid JSON-encoded Smurf in the request body, the Smurf should be persisted to the data store and a 200 OK response returned with the new Smurf’s database ID as the response body. This translates *very* well into a unit test:

function api_should_create_a_new_smurf(api, backend) {
  let smurf = {
    name: "Papa Smurf",
    age: 546,
    biography: "https://en.wikipedia.org/wiki/Papa_Smurf"
  };
  let response = api.smurfs.handlePost(smurf);
  expect(response.status).to.be(200);
  expect(response.body).to.be.a.number;
  expect(backend.getSmurf(response.body)).to.equal(smurf);
}

Further test cases may stipulate what is expected when an invalid Smurf is provided, or when the data store is unavailable. When you have a test for every important aspect of the system, you get the warm fuzzy feeling of comprehensive tests. This is the confidence that you can work freely on any part of the software, knowing that if you accidentally break something then the tests will alert you.

This is important stuff – it’s the kind of thing that makes the difference between success and failure for the project, or between happiness and irritation for your developers.

Why UIs can’t be comprehensively tested

Servers are interfaces designed to be used by machines, and so it makes sense that machines are good at testing them. The Smurf API above can be specified in terms of data in and data out. If the input reliably produces the expected output, the server is working, end of story. UIs on the other hand are used by humans, which makes them literally impossible for a machine to test properly. The input to a UI is human intention and the output is human understanding. The only way to test this, at least with current technology, is to have a human do it.

Let me illustrate this with an example. Say we’ve got this UI:

Clicking the + icon expands the biography and changes the plus sign to a minus:

The test for whether this feature is working is that when a human attempts to click the biography button, the human can read the biography. The closest that a unit test can get to this is something like this:

function smurf_biography_should_expand(browser) {
  // assume that browser is a selenium object that
  // remote-controls a real web browser
  let page = browser.navigateTo(SMURF_VIEW_PAGE);
  let bio = page.findElement("#biography");
  let bioButton = page.findElement("#biography-toggle-btn");
  expect(bioButton).to.containText("Show biography");
  expect(bio.style.visibility).to.be("hidden");
  bioButton.click();
  expect(bioButton).to.containText("Hide biography");
  expect(bio.style.visibility).to.be("visible");
}

This test is not very useful for two reasons.

1. If a UI test passes, it doesn’t mean that the feature works. In principle this is true of all tests – even in well tested server-side code there could always be edge cases where a feature sometimes fails even though all of its tests pass. But UIs are special in that the tests can pass and the feature still be totally, 100% broken all of the time.

This can happen at either the input or output end, when something goes wrong in the gap between what humans and machines are capable of testing. At the input end, the line bioButton.click(); will tend to work even if a real human could not click the button. For example the Smurf photo may be wrapped in a full width container with a transparent background that extends over the button, so that when you try to click the button with a mouse you click the photo container instead. The browser automation tool however is just looking up elements by ID and issuing virtual clicks on them, so isn’t tripped up by this. At the output end, the test verifies that the CSS visibility property has the correct value, not that the eye can see all the right pixels on the screen. Perhaps the biography element has a fixed height so even though the content is shown, it is cropped off and you can only read the first couple of lines.

The end result is that no matter how many unit tests you have, you will never achieve the warm fuzzy feeling of comprehensive tests.

2. The stuff in a UI that can be tested usually isn’t worth testing. Behind a requirement like “clicking the ‘Show biography’ button expands the biography text” there are tens of tiny requirements, some visible in the design mockup, others implicit: the text should be entirely visible; the icon and button label should be different in the expanded state; if the text is too long it should probably scroll rather than extending off the bottom of the page; the padding around the text should be consistent with the rest of the design; the animation used to expand the text should be consistent with the animations elsewhere; etc etc. In general these sub-features are trivial to implement (anywhere from a single HTML attribute to a few lines of code), easy to test manually, and isolated (adding or changing one feature is unlikely to break others).

Certain sub-features such as the changing of the button label are easy to test automatically. Others such as the smoothness of the animation are difficult or impossible to test.

Some people advocate testing what can be tested, on the grounds that some tests are better than none. I take the view that this gives a false sense of security, and that it’s better to be consistent: test none of these sub-features, and focus on other techniques for making reliable UIs.

Part 2: How to build reliable UIs

Fear not! The good news is that it is possible to write reliable UI programs. Automated testing has a role to play, but it isn’t nearly as important as in server-side development.

These are the techniques that I’ve found useful for building more reliable UIs, in no particular order. The list may grow over time and is biased towards web applications because that’s what I’ve been working on recently.

Have a reliability strategy

Explicitly recognise reliability as a non-functional requirement of your software, like performance. Decide what techniques you’re going to use to encourage reliability, then ensure that the whole team is up to speed with the decision.

The list of techniques in the remainder of this article is a good place to start. For each technique, analyse the cost and benefit for your specific project, then decide whether and how to adopt it.

Good old fashioned software craftsmanship

Well written software is less likely to contain hidden bugs, and easier to change later without introducing new bugs. This is, after all, how software was created back in the 20th century before Kent Beck made unit testing popular. Whole books have been written on this topic so I can’t even scratch the surface here, but all of the classic rules of thumb for how to write good software are even more important in UI code: write short methods that do one thing well; use a consistent coding style; use carefully chosen, descriptive method and variable names; avoid code duplication; prefer simple self-explanatory code to comments, but use comments when this isn’t possible; split features into independent modules with a clearly defined interface between them; avoid shared mutable state; etc etc. This kind of stuff may be best learned over a career of learning from mistakes, but a Google search for “software engineering best practices” brings up plenty of lists.

When a sufficiently experienced developer is making a product on their own, they can simply decide to employ good software craftsmanship. In a team, good craftsmanship does not simply happen as a result of hiring good developers – it requires the right leadership and culture. Again, many books have been written on this topic, but some of the important aspects are: designing and communicating coding standards; reading each other’s code (preferably in a more peer-to-peer manner than the term “code review” implies); sharing expertise; collaborative design; and pair programming.

Automated unit tests

Having spent all that time bashing unit tests, I’ll back-pedal a bit.

Sometimes you will find that it is simply easier to develop a bit of code using test-driven development than with manual testing. In this case, it’s a no brainer: writing unit tests saves you time, and gets you the benefit of unit testing for free.

Other times you may feel that a bit of code you have to write is particularly fiddly or error prone. Your first thought here should be to change your coding patterns so that you don’t have to write fiddly or error prone code. But sometimes you’re using a framework that has great benefits, but requires a bit of fiddly coding (Redux reducers I’m looking at you) so you appreciate the warm fuzzy feeling of comprehensively testing that little bit of the code base.

Component development harnesses

One of the main benefits of unit tests is that they force your code to be modular by making dependencies clear. Each unit test is an isolated environment which creates a new instance of the class and provides it with any dependencies that it needs. If a class has many dependencies then it will be hard to write tests for it, and this encourages the developer to organise the system to minimise dependencies between classes, making the code base more maintainable.

The equivalent for visual components is a development harness. Say you’re developing a drop-down menu component. The development harness for this will be a screen populated with multiple instances of the menu component with different settings, styles and data. As you develop the component and it grows in complexity, you add more instances to the development harness just as you would add tests to the test suite of a non-visual component. When you’re doing any major work on the component, you do so in the development harness rather than in the application. This makes you more productive because it’s faster to make code changes and see their effect. It also enforces modularity – because your component must be capable of running in both the application and the development harness, it must be a standalone component.
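To make this concrete, here’s a rough sketch of what a harness file might look like. DropDown and its (items, placeholder, disabled) options are hypothetical, and the details will vary with your framework:

// dropdown-harness.js – a standalone page for exercising the DropDown component
import { DropDown } from "./dropdown";

const cases = [
  { label: "Empty", options: { items: [] } },
  { label: "Two items", options: { items: ["Papa Smurf", "Smurfette"] } },
  { label: "Long item", options: { items: ["A very long item name that should truncate gracefully"] } },
  { label: "Many items", options: { items: Array.from({ length: 200 }, (_, i) => "Item " + i) } },
  { label: "Disabled", options: { items: ["One"], disabled: true } },
  { label: "Placeholder", options: { items: ["One"], placeholder: "Choose a Smurf…" } }
];

for (const c of cases) {
  const section = document.createElement("section");
  section.innerHTML = "<h3>" + c.label + "</h3>";
  document.body.appendChild(section);
  new DropDown(section, c.options); // one live instance per test case
}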

Strongly typed JavaScript

JavaScript is a dynamic language, so certain errors aren’t discovered until the program runs. A bit of code that looks right – alert("Hello " + user.name) – may be broken because the user object has no name field, only firstname and lastname. This error could appear in previously working code if the user object is refactored but not all of the code using it is updated. One argument for extensive unit testing is that tests will find these errors, but static type checking is a better way of achieving the same goal.

TypeScript and Flow are both excellent static type checkers for JavaScript. Personally I use TypeScript. They eliminate a whole class of errors, and make editing much more productive and enjoyable by underlining errors and autocompleting names as you type.
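For example, here’s a minimal sketch of the user.name scenario above – the User type is invented for illustration, but the compile-time error is exactly what TypeScript gives you:

interface User {
  firstname: string;
  lastname: string;
}

function greet(user: User) {
  // alert("Hello " + user.name);
  // ^ compile error: Property 'name' does not exist on type 'User'
  alert("Hello " + user.firstname + " " + user.lastname); // OK
}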

Use architecture that encourages modularity

There are several different techniques relevant here that generally come under Inversion of Control, also known as the Hollywood Principle: “don’t call us, we’ll call you”. Components should not reach outside their bounds and interact with the rest of the system. Instead they should have dependencies provided through dependency injection and signal actions by dispatching events.

For example, when designing a UI it may be the case that when the user clicks “save” then the model saves itself to the server. Neither model nor button should know about each other. Instead the button should dispatch a “request save” event onto a central event bus which is picked up by the model and triggers it to save. This gives you flexibility to add new features – such as auto-save every few minutes or a loading spinner that notifies the user when the document is saving – without having to change the existing components.
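As a sketch of that pattern – the bus here is a toy implementation, and saveButton, model and spinner are assumed to exist elsewhere in the app:

// A trivial event bus: components know about the bus, not about each other
const bus = {
  handlers: {},
  on(event, handler) { (this.handlers[event] = this.handlers[event] || []).push(handler); },
  emit(event, payload) { (this.handlers[event] || []).forEach(h => h(payload)); }
};

// The save button only announces intent...
saveButton.addEventListener("click", () => bus.emit("request-save"));

// ...the model reacts to it...
bus.on("request-save", () => model.saveToServer());

// ...and later features hook in without touching either of the above
bus.on("request-save", () => spinner.show());
setInterval(() => bus.emit("request-save"), 5 * 60 * 1000); // auto-save every 5 minutes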

Split code into declarations and engines

When you can’t comprehensively test part of a program, it’s particularly important that the code be simple, readable, and its function self-evident. A really powerful tool for achieving this is to split the code into declarations and engines.

The declaration is a data structure that specifies exactly what you’re trying to achieve, with no implementation details. The engine is code that interprets this data structure and makes it so.

A good example of this is the Django forms system. A form is defined in one place as a class:

class NameForm(forms.Form):
  name = forms.CharField(label='Your name', max_length=100)
  bmu = forms.BooleanField(label='Beam me up?', required=False)

This data structure is used by engine code in several places: on the server to generate the form HTML that is sent to the browser; in the browser by JavaScript to provide immediate validation; and back on the server to securely validate submitted responses.

The Declaration/engine pattern works best when the engine is provided by a well designed and documented framework like Django, in which case you don’t have to write any of the engine code. But if there’s no framework available for your use case, it’s easy enough to write your own. Be aware however that there is a cost: new developers need to learn the declaration format before they can use it, so it’s only worth doing if you’re going to make significant use of it.
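If you do roll your own, the shape of the pattern is simple. Here’s a sketch in JavaScript – not Django, just an illustration of the declaration/engine split:

// Declaration: what the form is, with no implementation details
const nameForm = [
  { name: "name", type: "text", label: "Your name", maxLength: 100, required: true },
  { name: "bmu", type: "checkbox", label: "Beam me up?", required: false }
];

// Engine: interprets the declaration. Other engines could validate
// submissions or generate documentation from the same data structure.
function renderForm(fields) {
  return fields.map(f =>
    '<label>' + f.label +
    ' <input name="' + f.name + '" type="' + f.type + '"' +
    (f.maxLength ? ' maxlength="' + f.maxLength + '"' : '') +
    (f.required ? ' required' : '') + '>' +
    '</label>'
  ).join("\n");
}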

Use best-of-breed frameworks that smooth over browser inconsistencies

Frameworks like React, Angular, Bootstrap and jQuery are indispensable for building modern web apps. They’re most often thought of as tools to increase productivity or encourage modular architecture, but much of their value lies in the way that they present a consistent API over browser features.

If you add a change event handler to a <select> (dropdown) element using JavaScript, the results will be inconsistent. Some browsers will dispatch the change event when a new item is selected, others only when the user defocusses the select control. In some browsers the change event will bubble, in others it will not. And there are probably more inconsistencies that I don’t know about. The popular frameworks mentioned above are full of special cases for specific browsers, and they’ve been tested and refined by thousands of contributions to make sure that some obscure DOM bug in the latest WiFi-enabled fridge from Samsung isn’t going to cause a bug in your application.
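With a framework in the middle, you write the handler once and let the library paper over those differences – a sketch with jQuery, where updateShippingOptions is a made-up function:

// jQuery registers the handler consistently across browsers and patches
// known quirks in how the change event fires and bubbles
$("#country-select").on("change", function () {
  updateShippingOptions($(this).val());
});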

Automated smoke tests

Smoke tests are high-level tests on the full running software that check that most of the important features seem to be working. For a browser-based UI application they’re typically written using a browser remote-control tool like Selenium. The difference between smoke tests and unit, integration or acceptance tests is that a smoke test doesn’t aim to prove that the system is working exactly as specified, just that it is working at all.

The great thing about smoke tests is that a little goes a really long way. You describe a few simple paths through your application, for example: log in, add a new Smurf, find the new Smurf on the search page, view the new Smurf’s biography, remove the smurf, log out. This test would only take a couple of hours to write if you’re already familiar with Selenium, but if it passes it tells you that all major parts of the application are working – the JavaScript doesn’t have any fatal errors, the application server has started up, the database connection is working, etc etc.
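Here’s a sketch of the start of such a path using selenium-webdriver for Node.js – the URLs and selectors are placeholders, not a real app:

const { Builder, By, until } = require("selenium-webdriver");

async function smokeTest() {
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    // log in
    await driver.get("https://staging.example.com/login");
    await driver.findElement(By.name("email")).sendKeys("smoke@example.com");
    await driver.findElement(By.name("password")).sendKeys("correct horse battery staple");
    await driver.findElement(By.css("button[type=submit]")).click();

    // add a new Smurf and check it appears
    await driver.get("https://staging.example.com/smurfs/new");
    await driver.findElement(By.name("name")).sendKeys("Smokey Smurf");
    await driver.findElement(By.css("button[type=submit]")).click();
    await driver.wait(until.elementLocated(By.linkText("Smokey Smurf")), 5000);

    // ...search for the Smurf, view the biography, remove it, log out...
  } finally {
    await driver.quit();
  }
}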

A smoke test is a useful thing to run straight after a build to check that the build works. But it’s also something that you can point at staging or production systems. You could even run it on a schedule every hour, as a more advanced version of a “ping” to check server uptime.

Client-side error collection

Client-side error collection involves putting code in your app to detect errors that happen on the user’s computer and alert you to them. You can do it yourself by listening to the window.onerror event, or subscribe to a service like Raygun, Sentry or Errorception.
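Doing it yourself is only a few lines, although the hosted services add grouping, alerting and browser metadata on top. A sketch – /client-errors is a placeholder for whatever endpoint you provide:

// Report uncaught errors to your own endpoint
window.onerror = function (message, source, line, column, error) {
  navigator.sendBeacon("/client-errors", JSON.stringify({
    message, source, line, column,
    stack: error && error.stack,
    url: location.href,
    userAgent: navigator.userAgent
  }));
};

// Unhandled promise rejections don't trigger window.onerror, so catch them separately
window.addEventListener("unhandledrejection", e => {
  navigator.sendBeacon("/client-errors", JSON.stringify({ message: String(e.reason), url: location.href }));
});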

These work so well because they change the economics of bugs that get past QA. Say there’s a bug that only happens on Firefox Mobile, and that isn’t one of the browsers you test on. Without client-side error collection, Firefox Mobile users just know that your site doesn’t work, and either go elsewhere or get a little annoyed and switch to another browser. With client-side error collection, you get notified of the bug and can fix it after only having irritated a few users.

These services only catch JavaScript errors in web applications, not other kinds of errors like images that fail to load. But JavaScript errors are often the most damaging as they will entirely break a feature rather than just make it look a bit off.

Manual testing

Some degree of manual testing is usually a good idea for most software projects.

Traditionally, “manual testing” means that either you, or another team member, or a 3rd party company, goes through every possible permutation of every feature trying to find bugs. Because the person doing the testing doesn’t have specialist knowledge required to recognise intuitively if a feature is broken, they need a document telling them how to test. This is called a Testing Procedure Specification, and was immortalised in the film Office Space as the emblem of tedious drudge-work in a software company.

A much more pleasant version of manual testing is Dogfooding: the practice of using pre-release versions of your own software internally. Obviously this only works if your company makes a product that it has need of itself. You can’t dogfood nuclear power plant control software. But if you’re making a project management app, most of your staff can manually test your software just by going about their day-to-day work. This also has the benefit of making sure that your staff are aware of the latest developments in your software.

User testing

User testing is normally thought of as a tool for polishing UX design, not for finding bugs. But as far as users are concerned there is no dividing line between a minor technical bug and a UX issue – both get in the way of the task they’re trying to perform. And because user testing is best done on early prototype software, the process of user testing sometimes uncovers new technical bugs.

The most effective kind of user testing involves inviting test subjects to go through a list of tasks in the software, talking out loud about their thought process as they go.

User feedback

Finally, when all of the above techniques fail to stop a bug or UX issue from getting into production software, make sure that users have an easy way to report bugs and give suggestions. At a minimum, give an email address that reports can be sent to. Much better is to use a feedback tool like Usersnap which can take a screenshot of the page and automatically record information like the browser and operating system in use.

The end…

Have I missed anything out? What are your favourite techniques for making reliable UIs? Feel free to leave a comment with your own preferred techniques for making reliable UIs, or email me on bernie@berniecode.com.

Making DNArtwork #8: Generating realistic painted shapes on-demand
https://blog.berniesumption.com/software/dnartwork/dnartwork-generating-realistic-painted-shapes/ – 17 Sep 2017

DNArtworks are supposed to look like paintings, or at least like prints of paintings – with realistic brush strokes and no unnaturally straight lines:

Now a human may find it easier and faster to draw a sloppy picture than a neat one, but computers are the other way round. Drawing perfect geometric shapes is easy, but it takes much longer to make authentic-looking imperfections, both in terms of development effort and rendering time.

The largest downloadable size of DNArtwork is over 160 megapixels, and generating it from scratch takes several minutes of CPU time and hundreds of megabytes of memory. In principle it would be possible to do this work on a virtual machine in the cloud, by queueing artworks and generating one at a time, but if several people ask for an artwork at the same time, the wait could get pretty long.

I’ve developed a system that generates artwork images almost instantly, using less than 1% of the memory required to hold the full artwork image. It achieves this by doing most of the work before deployment, and using an online rendering engine custom-built in C++.

Step 1: Plain shape rendering

When your favourite tool is a hammer, every problem starts to look like a nail. I decided to use JavaScript canvas to generate the shapes, even though this part of the project doesn’t need to run in a web browser, just because I’m used to it. There are 6 kinds of shape:

Crosslines
Geocluster
Geototem
Moustache
Snakerod
Spirograph

A different JavaScript program generates around 800 variations of each shape.

The trick here is controlled sloppiness. The art of Wassily Kandinsky upon which the graphical style is based may be chaotic, but it is not random. Here’s an example of how the moustache shape above is built up.

I start with a checkerboard grid. The number of rows and columns varies between shapes, various lines are shortened and removed, and colours are chosen semi-randomly – the first colour is truly random, then subsequent colours are chosen to go well with it using colour complements, triads or tetrads. This produces substantial variation between shapes, but it looks very regular:

Applying a random rotation and perspective transformation to the shape adds more variation:

A little more randomness applied to the angle of each line:

Varying the width of each line as it is drawn produces an effect a little like calligraphy – but the lines are still a little straight:

Finally, the shape is fed through a WebGL-powered 2D simplex noise distortion shader that bends all those straight lines:

The shape is now looking handsomely messy, as if drawn by hand, but the edges still have the razor-sharp, noise-free look that is the hallmark of computer generated graphics.
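To give one concrete example of that controlled sloppiness, the colour-harmony step might look something like this sketch (my reconstruction, not the actual DNArtwork code): pick one hue at random, then derive the others by rotating around the colour wheel.

// Pick a random base hue, then derive companions as complements, triads or tetrads
function pickPalette(count) {
  const baseHue = Math.random() * 360;
  const schemes = [[180], [120, 240], [90, 180, 270]]; // complement, triad, tetrad offsets
  const offsets = [0].concat(schemes[Math.floor(Math.random() * schemes.length)]);
  const colours = [];
  for (let i = 0; i < count; i++) {
    const hue = (baseHue + offsets[i % offsets.length]) % 360;
    colours.push("hsl(" + Math.round(hue) + ", 70%, 55%)");
  }
  return colours;
}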

Step 2: Applying the paint effect

The shapes you’ve seen so far are all rendered in a web browser. JavaScript running in a web browser can’t save files, but it can make HTTP requests, so each shape is encoded as a PNG file and posted to a little Node.js script that saves it to disk. The images are then processed with the Photoshop plugin Snap Art, which produces the paint effect:
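The browser-to-disk handoff is simple enough to sketch – the /save-shape endpoint, port and file naming here are made up for illustration:

// In the browser: encode the canvas as a PNG and POST it to the save script
canvas.toBlob(blob => {
  fetch("http://localhost:3000/save-shape?name=moustache-042", { method: "POST", body: blob });
}, "image/png");

// The Node.js save script (using Express)
const express = require("express");
const fs = require("fs");
const app = express();
app.post("/save-shape", express.raw({ type: "image/png", limit: "50mb" }), (req, res) => {
  fs.writeFileSync("shapes/" + req.query.name + ".png", req.body);
  res.sendStatus(200);
});
app.listen(3000);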

Step 3: Online rendering

Each artwork has 23 shapes, and each shape takes the best part of a minute to generate, so a full artwork should take several minutes to generate. However while there are 8×10^52 different possible artworks, there are only a few thousand different shapes which are repeated between artworks. This is all part of the design – the more closely related two people are, the more identical shapes will appear in the same positions in each of their artworks. So it’s possible to render a painted image of each shape in advance, and then generate a full artwork image by combining many smaller images into a large one.

Rendering a copy of each shape, in several different sizes, took around 3 days with my poor little laptop overheating and running its fans full power. The resulting images were uploaded to a virtual machine in the Microsoft Azure cloud.

The first version of the server program to generate the artwork collage took the obvious route:

  1. open and decompress the JPEG file containing the painted background, which produces a full size image in memory
  2. open and decompress each shape file in turn and superimpose it onto the background
  3. compress the background again and serve it as the HTTP response
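In Node.js terms (an assumption – the post doesn’t say what the first version was written in), that route looks something like this sketch using the sharp image library:

const sharp = require("sharp");

// Decode the painted background, superimpose each pre-rendered shape,
// then re-encode the whole thing as one large JPEG held entirely in memory
async function renderArtwork(backgroundPath, shapes) {
  return sharp(backgroundPath)
    .composite(shapes.map(s => ({ input: s.path, left: s.x, top: s.y })))
    .jpeg({ quality: 90 })
    .toBuffer();
}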

This is acceptably fast – about a second for small images and 20 seconds for very large images. And while it uses a lot of memory – around 300MB for a very large image – by using Azure Functions you only pay for the memory usage for the few seconds that the function is running, so it’s affordable.

Still, I figured I could do better, and so I wrote a custom rendering engine in C++. This uses libjpeg-turbo to process JPEG images line by line, so that only one full row of the image pixels needs to be held in memory at any moment. The new rendering engine is about 3 times faster – it takes 7 seconds instead of 20 to generate a large image. But the improvement in waiting time is better than 300% because of streaming. The old engine had to generate the whole image before sending any data, which caused a message to hang at the bottom of the browser window for 20 seconds saying “waiting for dnartwork.com…”. The new rendering engine can start serving the first rows of image data after half a second, then generate the rest of the image while the first bit is being downloaded. So in terms of responsiveness, the new engine is more like 40 times faster.

The result

Here’s a 23000 x 8000 image (184 megapixels, i.e. don’t blame me if it crashes your browser!) that took 7 seconds and 300KB of memory to generate.


For more posts in this series, check out the DNArtwork category on this blog.

Project: DNArtwork
https://blog.berniesumption.com/software/dnartwork/ – 16 Sep 2017

DNArtwork is a personal project to generate artwork from DNA. It took me 3 years, and is currently available at dnartwork.com. If you already have a DNA test from 23andme, National Geographic or Ancestry.com, you can get your own artwork for free. If not, download and unzip this dummy DNA test result file so that you can see how it all works.

I’ve written a series of blog posts on various parts of the project – check out the DNArtwork category on this blog.

Making DNArtwork #7: how does it work?
https://blog.berniesumption.com/software/dnartwork/dnartwork-how-it-works/ – 14 Dec 2016

This article is available in Romanian thanks to Irina Vasilescu and in Czech thanks to Barbora Lebedova.

This article is part of a series documenting my project to make artwork from DNA. In the last article I showed off what the artwork looks like. In this post I’ll explain how I analyse an individual’s DNA to extract the information needed for an artwork.

It’s going to get quite technical, but I’ve tried to include enough information that the interested layman can understand it too, perhaps with a little googling.

A brief recap. Right back at the start of this project I decided that my genetic artworks will consist of a number of distinct shapes. The more related two people are, the more objects will appear in both their artworks. Have a look at these two artworks:


Each of the 12 shapes has a unique personality and can easily be distinguished from the others. A few seconds looking at both artworks above and you can see that some shapes appear in the same position in both artworks (e.g. the top left shape) and some are unique to each (the bottom left shape). This is what you’d expect the artworks of two siblings to look like.

Let’s say we have a collection of ten thousand unique shapes that each have a sufficiently distinct personality that it’s easy to recognise when the same shape appears in two paintings. Exactly how we create those shapes is a subject for another post. The task at hand is to take your DNA and boil it down into a list of numbers between 1 and 10,000, so that you share more of the numbers with a close relative than with an unrelated person.

I call this list of numbers the DNArtwork signature. It is safe to share publicly since it contains no sensitive information, except of course that if two people both share their DNArtwork signatures then you can tell how related they are. Here you go, here’s mine:

9083, 2302, 1083, 1735, 5474, 1728, 9925, 1231, 95, 7831, 1526, 1505, 729, 4866, 3778, 2161, 20, 8178, 3972, 3103, 9332, 9859, 9757

My algorithm for creating the DNArtwork signature is, as far as I’m aware, the only really original idea I’ve contributed to this project, and it’s what sets this project apart from other artwork created from DNA.

Part 1: a quick primer on SNP genetics

In order to understand the description of the DNArtwork signature algorithm later in this article, you need to know some basics about DNA. If you think that SNP stands for Scottish National Party then you should read this section.

DNA is a massive molecule made from a string of smaller molecules called nucleotides. There are 4 nucleotides: Adenine, Cytosine, Guanine and Thymine, referred to as A, C, G and T. Each nucleotide is about 13 atoms in size – think of them as the letters that spell out sentences in a coded genetic language.

Your genome is about 3 billion letters long and split up into 23 lengths of DNA called chromosomes. Chromosomes are large enough that you can see them under a microscope, and they look like this:


Because DNA molecules are continuous strings of letters, it’s possible to identify any position on a chromosome by counting from one end, so for example at position 8,907,307 on chromosome 3 you have an A. This position is called a locus, or loci in plural. Each chromosome is roughly symmetrical, having two copies of your DNA. One copy comes from your mother and the other from your father, and they’re almost identical. In fact, your DNA sequence is almost identical to every other human’s – only about one in 300 loci have been found to vary between individuals.

Earlier I told you that at position 8,907,307 on chromosome 3 you have an A, and I could do that because position 8,907,307 on chromosome 3 is one of those boring locations that always has the same letter in humans. Everyone has an A, and for that reason nobody has thought to call this locus anything other than “position 8,907,307 on chromosome 3”. The locus next door however is a different beast. Position 8,907,308 on chromosome 3 has a “Single Nucleotide Polymorphism” or SNP for short, meaning that it has been observed to be different between some people. This particular locus can have either an A or a G, which are referred to as the two possible alleles. Since you have two copies of your DNA one from each parent, you can have either two As, two Gs, or one of each. This makes the locus interesting enough that scientists have given it the pithy name rs180498, to save them from having to say “position 8,907,308 on chromosome 3” all the time.

Like most SNPs, we have no idea what rs180498 does, or indeed if it does anything at all, so it has an extremely boring entry in SNPedia. Some SNPs are more interesting, like rs1815739 which can make you a better sprinter, or rs7495174 which can change your eye colour.

It’s rare that both alleles of an SNP will be equally common. The more common allele is called the major allele and the less common one the minor allele. In the case of our boring SNP rs180498, the minor allele is A with a frequency of 0.167, meaning that 16.7% of DNA strands will have an A and the remaining 83.3% will have a G. These frequency figures are averages, and may vary between populations. In fact, our boring allele rs180498/A has a frequency of 13% among western Europeans and 41% among Japanese people, according to the 1000 Genomes project.

The DNArtwork signature algorithm relies on looking for relatively rare alleles – ones with a minor allele frequency of around 2%. I call these “marker alleles” because having one is a distinctive feature of your genome that can be used to distinguish you from others.

OK, now you know enough about DNA to follow how the DNArtwork signature algorithm works.

Part 2: the DNArtwork Signature Algorithm

The algorithm consists of a preparation phase that is done once before anybody is tested, then an analysis stage that is done on a person’s DNA to generate their DNArtwork signature.

Preparation phase

Generate a list of “marker alleles” – SNPs with low minor allele frequencies across all ethnicities. I use these steps:

  1. Let N be the length of the list of numbers that is the DNArtwork signature and M be the maximum value of each number. These will be chosen based on the creative requirements of the project, and for my project N=23 and M=4600 because each artwork has 23 shapes and there are 4600 different possible shapes.
  2. Start with the set of SNPs that are tested by all of the genetic testing companies that the project must support. In my case it’s 23andme, ancestry.com and National Genographic. All of these companies test a slightly different set of SNPs but there is a large overlap.
  3. Using population frequency data from the 1000 Genomes project via HapMap, discard any SNPs with a minor allele frequency less than 1% in any ethnicity. This is because DNA tests are not perfectly accurate, so the rarer an allele is the more likely that its appearance in your results is a testing error not a real result. This is Bayes’s Theorem and is an important consideration when testing for rare medical conditions.
  4. If N=23 then use actual chromosomes for this process. Otherwise divide the genome into N equal length sections and consider these to be “chromosomes” for the purposes of the rest of this algorithm.
  5. Sort the alleles in each chromosome by the highest minor allele frequencies in any HapMap populations, with rarer alleles first. Using the highest frequency is important – if we used the average frequency, we may end up with markers that are rare on average but very common among Koreans for example, so that every Korean person will predictably share a number in their DNArtwork signature.
  6. Take the first M÷N alleles from each chromosome.
  7. Calculate the probability that an individual will fail to have any marker allele for at least one chromosome. * If this probability is unacceptably high, either increase M or return to step 6 but take every other allele, or every third allele, or every nth allele in order to incorporate more common markers.
  8. Randomly assign an identifier number between 1 and M to each allele, so that each number is used for exactly one allele. This randomisation step means that, even if all numbers are not equally common, the numbers are at least approximately evenly distributed between 1 and M.

The use of distinct M and N values and the random assignment of the identifier number means that DNArtwork signatures are not comparable between creative projects that use this algorithm, unless they have cooperated to share the preparation phase of the algorithm.

* When combining independent probabilities, if the chance of an event happening at an opportunity is P, the chance of it happening twice in two consecutive opportunities is P^2. The chance of any individual not having a specific marker allele is 1-frequency, so the chance of an individual not having any marker is the product of 1-frequency for each marker on the chromosome. For example, if you have 200 markers each with frequency 0.02, then the chance of failing to have one marker is 1-0.02 = 0.98 = 98%, and the chance of failing to have all 200 markers is 0.98^200 ≈ 0.018 = 1.8%.
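In code, the footnoted calculation is a one-liner:

// Probability that a person has none of a chromosome's marker alleles,
// given the minor allele frequencies of the chosen markers
function probabilityOfNoMarker(frequencies) {
  return frequencies.reduce((p, f) => p * (1 - f), 1);
}

probabilityOfNoMarker(Array(200).fill(0.02)); // ≈ 0.018, i.e. 1.8%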

Analysis phase

Given a subject’s DNA test results:

  1. Perform validation on the results file to make sure that it contains sensible data. In my case I check that the file has a valid result for at least 50% of the marker allele SNPs. Lower values may indicate a corrupted input file, contaminated or non-human DNA used for the testing process, laboratory error, or any of the other fun fun causes of failure that DNA testing companies have to deal with on a daily basis.
  2. Take the results for chromosome 1 only
  3. Proceed through the list of marker alleles until you find a SNP for which the subject has at least one copy of the minor allele.
  4. If you get to the end of the list and there are no matching alleles, choose the last allele on the list. This should happen very rarely*.
  5. Append the identifier of that allele to the DNArtwork signature list.
  6. Do the same for the rest of the chromosomes until you have a list of N numbers
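Here’s a sketch of those steps in JavaScript – the data structures are my own invention, with markerAlleles coming from the preparation phase and the validation step omitted:

// markerAlleles: for each of the N chromosomes, an ordered list of
//   { snp: "rs180498", minorAllele: "A", id: 1234 }, rarest first
// results: the subject's genotype per SNP, e.g. { rs180498: "AG", ... }
function dnartworkSignature(markerAlleles, results) {
  return markerAlleles.map(chromosome => {
    const match = chromosome.find(m => (results[m.snp] || "").includes(m.minorAllele));
    // If no marker matches (rare by construction), fall back to the last allele on the list
    return (match || chromosome[chromosome.length - 1]).id;
  });
}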

Real-world performance

The Personal Genome Project is a collection of DNA testing results kindly released by members of the public for the benefit of researchers. It contains hundreds of individuals, and a few full families.

Firstly, I downloaded the DNA test results of 30 people and generated genetic signatures for each of them. I looked for marker alleles that appeared more frequently than I’d expect by chance. If every person had a unique set of 23 numbers, I’d have 23 × 30 = 690 distinct numbers. In fact I had 263, indicating that each number appears about 2.5 times among the 30 subjects. The most common number appeared in 30% of subjects, but there are relatively few of these common numbers: 90% of the numbers appear in fewer than 10% of subjects. I was happy with these figures. Some excessively common numbers are to be expected since my minor allele frequency data is based on the 1000 Genomes project, which only provides an estimate of actual global frequencies. The important thing here is that there are no numbers that reliably appear in most subjects.

Secondly, I downloaded two sample families, one parent with two children and one grandparent, parent, child trio. On average, these family members shared 40% of their signature numbers with close members of family, and as expected the grandparent/grandchild pair shared slightly less – 30%. Between families, subjects shared 15% of their numbers. Since both families were western ancestry Americans, some relatedness is not unexpected. Again, I’m pretty happy with these figures.

Patent / copyright declaration

As far as I can tell, under UK and EU law, algorithms are not subject to copyright and not eligible for patent protection. I therefore have no issue with anyone creating a software implementation of the algorithm based on the description in this article.

One purpose of publishing this algorithm is to define some prior art, should anyone attempt to assert ownership of it in future. This algorithm was developed independently by me (Bernie Sumption) in August 2013, first implemented in software in August 2016, and published (in this article) on 14th December 2016.

Making DNArtwork #6: the first art print, hot off the press
https://blog.berniesumption.com/software/dnartwork/dnartwork-first-art-print/ – 28 Sep 2016

It took a while (to wit: 3 years of idle sketching in notebooks, 2 months of coding and 100 hours of image rendering from my poor little laptop with its cooling fans constantly blowing like so many tiny hairdryers) but the visual part of the DNArtwork project is complete.

Obviously as soon as I’d produced the first full size digital artwork, I had to head round to my friend Michael’s house to borrow his printing skills (and ma-hoo-ssive printer).

Inspecting the first ever DNArtwork print
Cutting to size
Close-up showing the paint effect

Of course I’m not finished with the project yet – I still need to build the site where people can get their own artwork. But this feels like a milestone worth celebrating, and since it’s too early in the day to open champagne, we can celebrate with a photo of me having a staring match with a falcon instead:

Although it looks more like I’m proposing a kiss than a staring match

For more posts in this series, check out the DNArtwork category on this blog.

Making DNArtwork #5: drinking the Microsoft Kool-Aid
https://blog.berniesumption.com/software/dnartwork-drinking-the-microsoft-kool-aid/ – 5 Sep 2016

This post is part of a series in which I explore the building of DNArtwork.com. It’s about how I’ve decided to use Microsoft .NET and Azure as the infrastructure that supports the website, because they’re cheaper and better than other technologies. There you go, that’s the conclusion of this article. If you’re reading this series for the DNA and the artwork, you can move along now.

If you’re interested in why this is surprising, then I’ll tell you a story about the computer industry. Not all that many years ago I was personally committed to fighting against Microsoft, which I and many others saw as a corrupting force in that industry. Now I’m actually quite excited about using their new software. This is quite surprising to me, and this article is my way of figuring out how this happened.

By the way, this is one of my occasional long rambling posts. You have been warned :o)

Part 1: the empire and the rebels

When I was a university student teaching myself computer programming, every tech enthusiast I knew believed Micro$oft to be evil. We might have got a little carried away with this belief, egged on by a Star Wars-inspired predilection towards plucky rebels (in the form of free software like Linux) taking on Big Mean Evil Empires. But we did have some good reasons for disliking the company.

Scary eyes mean bad, OK? Hey, it worked for the conservative party.

Microsoft grew up in a world where IT was purchased by big companies with large budgets, purchasing decisions were made by men in suits who had never written a line of code in their lives, and professional developers lived with those decisions because they were paid to do so. Microsoft wasn’t interested in selling individual products to customers who’d pick and choose the best product for each of their requirements. Their strategy was to create a family of products that covered all common IT requirements and worked well with each other but not with the competition. Software teams would design their creations on Windows (~$300) using products like Office (~$500) & Project (~$600), communicate using Exchange (~$600), write code using Visual Studio (~$1000), share it with the team using Visual Source Safe (~$500), deploy it on Windows Server (~$1000) using an SQL Server database (~$6000 wait what?!?), and… well you get the idea. The pitch was “buy Microsoft for everything and all your software will just work together”.

Software developers tended to adopt the Microsoft product family wholesale and make software that only worked on Windows, or avoid Microsoft as much as possible. I was in the latter camp, and me and my young idealistic techie friends would refer to Microsoft as The Borg (after a race of parasitic aliens in Star Trek) and to developers who worked only with Microsoft technologies as having drunk the Kool-Aid (explanation for non-American readers).

You will be assimilated

It wasn’t the price of Microsoft software that bothered the young techies since they’d just pirate the software anyway*, it was the feeling that Microsoft put the minimum necessary effort into their software and relied on marketing to middle managers and various kinds of unfair competition to achieve high market share.

* I’m told. «cough»

For example, Microsoft internally referred to their process for bringing new product categories into their ecosystem as embrace, extend and extinguish. They’d enter new markets where the existing vendors had agreed on a standard that made their software interoperable. Microsoft would extend the standard with their own modifications. The effect would be that if you used the Microsoft product then you could read files from the other vendors, but users of the other vendors couldn’t read your files. Moving to Microsoft became a one-way street. New customers would buy the Microsoft product even if it was technically inferior and more expensive (and it often was) as that was the only way of being able to guarantee that they could read files from both MS and non-MS users. The open market for that technology would wither and die.

All this isn’t to say that Microsoft didn’t have some excellent technology. Some of the best developers in the world work for Microsoft and they produce some amazing software. Microsoft are particularly good at making developer tools: software for making software. Windows, .NET and Visual Studio are all excellent works of software engineering. But Microsoft weren’t interested in making them easy to use on their own – they’re supposed to be the carrots that lure developers into the Microsoft product family. Unhappy with this “all or nothing” proposition and the dirty feeling that came with supporting a company that seemed to be on track to turn the whole software industry into a monopoly, the more idealistic developers tended to ignore the whole Microsoft realm and focus their attention on open source software like Linux. It was often messy, unpolished and frustrating to use, but it was free in both senses of the word.

Then, the web happened.

Microsoft was able to dominate the desktop software world because it was there right from the start of the personal computer revolution. Microsoft’s executives knew where the value of PCs lay, and could spot new product categories as they became important (word processors, spreadsheets, project management tools, etc) and move in to dominate the new markets.

But the web began by stealth in 1991 at CERN (the European Organisation for Nuclear Research) as a platform for sharing academic documents and gradually morphed into a platform for building and delivering software.

CERN, smashing atoms and shit.

Early development of the tools to create web pages was done by a community of open source developers, using a philosophy for developing software that is in many ways the antithesis of Microsoft’s. This is:

  • Software should be developed by, or at least in close collaboration with, its community of users.
  • Make small tools that do one thing well and integrate well with other tools.
  • Release your experimental software early, even if it doesn’t yet work properly and seek help in finishing it. Later, maintain a stable version for most users and an experimental version for enthusiasts and early adopters.
  • Friendly competition with other tools is healthy, and should be based on the merits of the software and its community.
  • Developer experience matters just as much as features and benefits.

By the time anyone in a suit realised that there was money to be made in this Internet thing, open source software had an unassailable lead in the tools to create websites. Microsoft tried to create compelling tools like ASP.NET and IIS to lure web developers inside the Microsoft walled garden. They tried the old Embrace Extend and Extinguish trick with Internet Explorer, adding IE-only features to HTML and encouraging developers to make websites that only ran on Windows computers. But it was too little too late. Every exciting development in web technology – Ruby on Rails, Node.js, NoSQL and containers to name a few – has happened in Linux land. Microsoft’s once effective strategy of requiring people to buy into their product family wholesale is now their biggest weakness. Developers aren’t prepared to forgo the new and exciting technologies, so they tend to ignore all Microsoft products.

One of the hallmarks of open source web tools is that they’re focussed on providing a good experience for developers. Since the early 2000’s, large companies like Google and Amazon and smaller startups have been competing fiercely for engineering talent. In addition to paying well, one way that they compete is to adopt the open source tools that developers want to use. Requirements for Microsoft technologies like C#, .NET and IIS all but vanished off job adverts and were replaced with things like Python, Node.js and ${fashionable framework du jour}.

By 2010, nobody liked MS any more

But Microsoft isn’t stupid. They saw what was happening and responded with what appears to be a wholesale cultural change in the way that they deliver web tools. The new Microsoft tactic is softer than the old: make great products, let people combine your products freely with your competitors’ ones, and try and win the battle for market share based on quality, not lock-in.

To this end, the new .NET framework is a series of small open source modules that work on Windows, Mac and Linux. All development is done in the open – the Microsoft employees working on .NET all have GitHub accounts and commit their code to publicly available Git repositories like any other open source developer. You can go to, for example, the .NET Core repo and see all the design conversations and early attempts at implementation, and even contribute if you’re so inclined. This goes beyond .NET – Microsoft’s cloud hosting service Azure supports non-Microsoft frameworks like Node.js, and lets you choose between hosting your site on Windows or Linux. Visual Studio on a PC is now free for individual developers and they’ve released free code editors for Mac and Linux too. Quite a change from 2001 when Microsoft CEO Steve Ballmer said that “Linux is a cancer”.

The result is that Microsoft’s tools for making websites are now available to non-Windows users like me, and I feel they’re making a concerted effort to demonstrate to me that they’re worth adopting. I’m aware of course that Microsoft hasn’t become a charity – the idea of all these free tools they’re giving me is still to lure me into the Microsoft product family so that I pay them lots of money. But the stick is gone, and only the carrot is left. It’s actually quite inspirational to see them fight back in such a positive manner, and I think I know why. Star Wars may be the most popular example of the “plucky rebels take on the evil empire” narrative, but it also has one of the most famous instances of another common movie trope: “bad guy sees the error of his ways at the last moment and turns around to fight for Good”. Jaws is a better example of this trope but I wanted to keep the Star Wars theme. I’m talking about Jaws the James Bond character here; the shark was a dick right up to the end.

Part 2: hosting web applications

OK, so Microsoft was Evil and now they’re not. Well they also used to be overpriced, let’s see if that’s changed.

Traditionally, the most common model for web hosting is shared accounts on a web server.

The “web server” bit means that you just upload your files to the server, which is as simple as dragging a folder from your computer desktop, and they immediately become available over the Internet. If I connect to berniesumption.com and upload a file “somefile.txt”, that file becomes available at http://berniesumption.com/somefile.txt.

I had a picture of a web server here, but to be honest it didn’t carry the article as much as this IT-related kitten.

Web applications can be created by uploading files containing code, so if I upload a file called “hello.php” then when someone goes to http://berniesumption.com/hello.php the file is interpreted as a computer program written in the PHP language, and the result of running that program is shown to the user. It’s really simple – if you can arrange files on a computer, you can manage a website.

The “shared accounts” bit means that each project takes up space on the same server, and shares a common set of resources. You buy some resources from a web host – some amount of space and processing power. These resources are shared between all the sites that you upload to your account. In theory, if all the sites on your account experience a spike in traffic at the same time, they’ll all compete for the same resources and will slow down noticeably. But this is rare – a well written website should consume very few resources most of the time, and only very occasionally go into a brief flurry of activity when something changes, like adding a new page or being linked to from a popular site. Sharing resources for multiple sites is a great way to save money, and you always have the choice of putting important sites in their own account to guarantee that they won’t be affected by other sites.

Some resource sharing, yesterday

This model worked really well for two reasons. Firstly it was super cheap, especially for hobbyists with many small sites, web design companies that host many client sites, and companies with one big site and a number of small microsites. Secondly, it was really easy to manage – it’s amazing what you can get up and running just by copying some files onto a server. For example, this blog you’re reading now runs on WordPress, and took me all of 5 minutes to set up.

Since around 2005, a number of new frameworks like Ruby on Rails and Django have become popular that break out of the traditional file-based web server model. These frameworks emphasise developer productivity, allowing you to throw together a prototype site in hours rather than days. It was a very exciting time to be a developer – how often do you become 2-5 times more productive practically overnight? But apps created using these new frameworks were much harder to deploy onto the Internet than traditional websites. Instead of copying files onto a web server you have to install, configure and run a program on your server, then set up all kinds of fiddly things like load balancing and scaling. In the early days this was a serious barrier to people adopting these new frameworks, so a new breed of web host product has arisen called Platform as a Service, or PaaS for short.

PaaS providers restore the ease of deployment that we had with file-based web servers. You upload some files to the PaaS provider and it figures out what kind of app you’ve created, installs and configures it, sets up fiddly things like load balancers, and makes it available on the Internet. They’re wonderful products, but in the transition from web servers to PaaS, the economies of shared hosting have been lost. PaaS providers seem to be targeting startup web companies who have a single valuable app. If you have 5 employees, then $50 per month for hosting is nothing if it saves you a day or two of server administration per month. But if you’re a hobbyist with a collection of small projects, you’re still going to pay $50 for each of them, and it quickly becomes crazy expensive.

As far as I can tell, there’s only one provider that uses the old shared hosting model of “buy some capacity and share it between all of your apps”. And that provider is Microsoft, with their Azure App Service.


Again, I’m surprised. When it came to products like application hosting, The Microsoft Of Old was famous for pricing that focussed on the needs of big businesses and left hobbyists either pirating software or using alternatives.

Part 3: throwing together an app

So, Microsoft aren’t evil any more, and they’re cheap. But is their stuff good? As I said earlier, most of the exciting developments in web technology happened in Linux land, and Microsoft were stuck playing catch-up. Well it turns out that there’s an advantage to being the last one to arrive at the party.

Developers often use the phrase “throw together” when referring to building a prototype app, and the phrase is pretty descriptive of what happens. We cobble together a site by taking code from wherever we can find it and using it to build the repetitive parts of the app, because if we didn’t then we’d never get any app started, let alone finished.

A cobbled together website. Not actual size.

Imagine you want to build a web app; it’s called Frobulator. Users log in and upload their widgets, and other users can frob those widgets. It’s a brilliant idea, timed just right to capitalise on this frobbing craze that the kids are so wild about these days. But you need to get it done fast while it’s still relevant. You write down some requirements and there are only 5 key features that are essential for the first version. Great, you can get this done in a weekend. You start coding.

Day 1. Item 1: “users can log into the website”.

Here’s where you meet a pattern that happens over and over in software development: the requirement that looks really simple but turns out to be a lot of work. To log in with a username and password you’ll need a login form. Then you’ll need a signup form for new users. And you probably want to validate the email address that people enter by requiring that they click a link that you send to them in an email. You’ll need to store the passwords in a cryptographically secure way so that even if you get hacked, the hacker can’t get the original passwords back. And send password reset emails if people forget their login. And if you’re in the EU you’ll be legally required to provide a way for an administrator to delete a user account and associated data if requested to. Then there are a lot of optional but really nice-to-have features like logging in with Facebook. Behind the requirement “users can log into the site” there are dozens of features that must be implemented and hundreds of ways to mess up the user experience or security. That’s why good developers on tight budgets don’t build all this themselves: they use existing frameworks that do it for them.

I’ve used login as an example, but there are loads of other examples of similar issues that look simple, turn out to be complex, and are only tangentially related to the actual user experience you’re trying to build.

Again, drawing a blank for relevant images here, have a pot bellied piglet.

One popular way of getting user login and similar features “for free” is to build your site as a plugin to a content management system that provides all of these features. WordPress and Drupal are great for this purpose if you like PHP (perhaps you’re nostalgic for what software development was like in the ’90s). Other languages have good CMSes too, though it’s surprising how far ahead the PHP ones are given that PHP is arguably the worst-designed language ever to achieve popularity. Building an app on top of a CMS is a great way to get started quickly, but more often than not you end up fighting the CMS because it thinks you want a content-managed website, when really all you wanted was a login box that worked properly.

So what you want is a language or framework that lets you pick and choose useful modules to add to your site so that you can throw together a prototype. One that lets you use, for example, just the user login system and the database access layer but not the whole kitchen sink. It needs to provide common features like login as part of its core feature set, and also have a good community of developers creating extensions to cover the more esoteric needs. It has to be well documented, and have a helpful community of users to give you a hand if you get stuck on a detail.

I can think of exactly three such frameworks: Django, Meteor and ASP.NET (and arguably Ruby on Rails). Of these, ASP.NET has recently been completely rewritten as part of the effort to support Mac and Linux. It’s incredibly well designed (it ought to be – as the newest arrival it could learn from the mistakes its competitors had already made). It’s well documented. And it’s the fastest too.

The bottom lines

They say that every cell in the body is replaced in ten years, so we’re effectively new people every decade. It’s not true, but they say it. The point is that it makes me wonder: what’s the staff turnover rate at a company like Microsoft? I suspect a company like that turns over most of its staff every 10-15 years or so, which means that the Microsoft of today really is a different company to the Microsoft of 2000.

It’s early days, but I figure The Borg deserves another chance. It’s hot, there’s an iced pitcher of Kool-Aid on the Microsoft table, and it’s looking tasty.


Making DNArtwork #4: the pre-launch page https://blog.berniesumption.com/software/dnartwork/making-dnartwork-4-the-pre-launch-page/ Tue, 05 Jul 2016 19:28:12 +0000 http://blog.berniesumption.com/?p=905 Since my last post where I decided that my DNArtwork prints should look somewhat like Kandinsky paintings, I’ve been busily coding away, making a computer program that can generate the individual shapes from which a composition can be built up. I’m still near the start of a long road that will hopefully end with a fully functional product later in 2016. But one modern trend in startups that I fully agree with is that it’s never too early to start selling, and to that end I’ve put up a pre-launch page at dnartwork.com to introduce the product and collect email addresses.

If all goes to plan there’ll be a real website there in a few months, so I’d best take a screenshot for posterity:

DNArtwork pre launch website screenshot

The main element of this page is a mockup of an artwork made by manually stitching together some shapes produced by my DNArtwork program and printing them on photo canvas.

This is not what the final artworks will look like, and it’s not generated from anyone’s DNA, but it’s a good enough indication of the direction I’m heading in that it can be the central piece of the pre-launch page.

I was dead chuffed with how nice the brushstroke effect looks when printed on real canvas, so I added a magnifier effect to the pre-launch page to show off the texture.
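For the curious, here’s roughly how that kind of magnifier can be built – a minimal sketch in plain JavaScript rather than the actual code on the pre-launch page, with the ".artwork" element and the "artwork-large.jpg" image URL made up for illustration. A round “lens” follows the mouse and shows a zoomed-in crop of a higher-resolution copy of the artwork:

// ".artwork" and "artwork-large.jpg" are placeholder names for this sketch.
const image = document.querySelector(".artwork");
const lens = document.createElement("div");
const SIZE = 150; // lens diameter in pixels
const ZOOM = 3;   // how much to enlarge the canvas texture
lens.style.cssText =
  `position:absolute; width:${SIZE}px; height:${SIZE}px; border-radius:50%;` +
  `pointer-events:none; background:url(artwork-large.jpg) no-repeat;`;
document.body.appendChild(lens);

image.addEventListener("mousemove", e => {
  const rect = image.getBoundingClientRect();
  const x = (e.clientX - rect.left) / rect.width;  // 0..1 across the artwork
  const y = (e.clientY - rect.top) / rect.height;
  lens.style.left = `${e.pageX - SIZE / 2}px`;
  lens.style.top = `${e.pageY - SIZE / 2}px`;
  // Scale the big image up and shift it so the point under the cursor
  // sits at the centre of the lens.
  lens.style.backgroundSize = `${rect.width * ZOOM}px ${rect.height * ZOOM}px`;
  lens.style.backgroundPosition =
    `${-(x * rect.width * ZOOM - SIZE / 2)}px ${-(y * rect.height * ZOOM - SIZE / 2)}px`;
});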


For more posts in this series, check out the DNArtwork category on this blog.

Making DNArtwork #3: what will it look like? https://blog.berniesumption.com/software/dnartwork/dnartwork-visual-style/ Fri, 17 Jun 2016 15:37:15 +0000 http://blog.berniesumption.com/?p=884 Today I’m going to write about the challenges in selecting an artistic style for genetic artwork, and at the end of the post give a sneak preview of my early work.

In the last post in this series I looked at the existing offerings in the personalised genetic artwork market and decided that there’s a gap in the market for something that:

  1. Looks like art, not forensic science
  2. Shows relatedness between people – more related people should have more similar artworks

I’ll go into more detail about how item 2 will work in the next post, so for the moment please take two facts for granted:

Firstly, it’s possible to define how related two people are as a percentage: you’re 100% related to yourself or an identical twin, 50% related to your siblings, 25% to grandparents and half siblings, and the number gets lower for more distant relatives.

Secondly, most genetic art is currently based on “genetic fingerprint” images like the one to the left, processed on a computer to make them look more attractive. The reason that this image tells you how related two people are is that each column represents a gene and can have one of three possible flavours, with each flavour producing a different pattern of lines. People who are more related will tend to have the same flavour for a greater proportion of their genes, so more of the columns on their image will look the same.
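To make that concrete, here’s a minimal sketch – not any company’s real code – of how relatedness could be measured once each person’s test results have been boiled down to an array of gene flavours, with the flavours represented as the numbers 0, 1 and 2:

function relatedness(flavoursA, flavoursB) {
  // flavoursA and flavoursB are arrays of equal length, one entry per gene.
  let matching = 0;
  for (let i = 0; i < flavoursA.length; i++) {
    if (flavoursA[i] === flavoursB[i]) {
      matching++;
    }
  }
  // Identical twins score 1.0. Unrelated people still match on some genes
  // by chance, so a real measure would rescale this raw fraction.
  return matching / flavoursA.length;
}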

What I want is a style of artwork that works like a genetic fingerprint, but looks like art.

Art: where to start?

Creating something that looks like art isn’t hard. The trick is that you start with art, then modify it to contain genetic information. This is the opposite of the approach taken by most genetic art available today, which is to start with the genetic information and modify it to look like art – a feat that in my opinion only Genetic Ink has successfully achieved.

I’m not a natural artist myself, more of a craftsman, so I need an artistic collaborator who can define a distinctive graphic style that can be reproduced by a computer program and modified to contain genetic information. One day I’d love to partner with an artist and create something truly original, but for this summer project I’m going to pick an existing artist and “borrow” their style.

Not all art is suitable. In order to indicate relatedness I need an art style that naturally breaks down into many distinct sections or objects that can be independently changed, like the columns on the genetic fingerprint image shown earlier. The more related you are to someone, the more of the sections in your artwork will be identical.
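In code terms the idea is something like the following sketch, where the shape names are invented purely for illustration: each gene controls the object that appears in one section of the artwork, so two people who share a gene flavour get that section in common.

const SHAPE_VARIANTS = ["grid", "spiral", "wedge"]; // made-up variant names

function chooseSections(flavours) {
  // One artwork section per gene; the gene's flavour decides which variant
  // of the shape appears there. Relatives share more flavours, so more of
  // their sections come out identical.
  return flavours.map((flavour, sectionIndex) => ({
    section: sectionIndex,
    shape: SHAPE_VARIANTS[flavour % SHAPE_VARIANTS.length],
  }));
}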

First, here are some styles that wouldn’t be suitable:

Left: Georgia O’Keeffe, “Blue Morning Glories”; Right: Willem de Kooning, “Excavation”

O’Keeffe’s painting depicts an object and so doesn’t split apart into individual components. In fact, I can rule out pretty much all figurative art for this reason. Not all abstract art is suitable either. De Kooning’s painting above is an improvement over O’Keeffe’s in that it consists of many discrete shapes, but they run into one another, and they’re all fairly similar.

After spending a few lovely afternoons trawling around London’s art galleries, I settled on the art of Wassily Kandinsky:

A selection of Kandinsky’s works.

There are a few great things about Kandinsky that make his style suitable:

  1. Each painting is composed of discrete objects
  2. Each object has a distinct personality and is quite recognisable – much more so than the lines in a genetic fingerprint.
  3. The objects are made from simple lines, geometric shapes and flat colours, which are much easier for a computer program to reproduce than, for example, de Kooning’s fuzzy sketching.
  4. Kandinsky died in 1944, meaning that the copyright on his works expired in 2014 in the UK. It’s a grey area whether what I’m doing counts as “copying”, but I really don’t want to get on the wrong side of a living artist or (probably worse) a deceased artist’s estate.

A sneak preview

So I set about identifying some of the objects in Kandinsky’s paintings. Kandinsky developed a distinct visual language consisting of recognisable shapes that appear regularly in different variations. One such shape, which I call the checkermoustache, appears in many of his paintings and twice in his most famous work, Composition VIII:

Composition VIII
The checkermoustache is the black/white grid with two arms extending away from each other at 90 degrees and certain squares filled in with complementary colours. There’s one in the centre-left and one in the upper-right.

Two days into the ten that I’ve allocated to the program that generates artwork, and here are some early results. I think that the oil painting effect is important to make the end result look less like a computer graphic.


For more posts in this series, check out the DNArtwork category on this blog.

Making DNArtwork #2: competitor analysis https://blog.berniesumption.com/software/dnartwork/dnartwork-competitor-analysis/ https://blog.berniesumption.com/software/dnartwork/dnartwork-competitor-analysis/#comments Wed, 15 Jun 2016 16:51:46 +0000 http://blog.berniesumption.com/?p=861 There are quite a few companies offering artwork generated from DNA. This is a good thing – unique ideas are overrated, and if you come up with an idea that nobody else seems to have tried yet, it’s less likely that you’re a special genius that had the idea first and more likely that the idea doesn’t work!

Here’s a tour of the currently available offerings, during which I’ll analyse the pros and cons of each and build up a wish list of features for the perfect DNA-based artwork.

Genetic fingerprints

This is the classic image of DNA testing you may have seen illustrating stories about forensic analysis and paternity testing:

Gel electrophoresis image
People call this “genetic fingerprinting” because that’s easier to say than “restriction fragment length polymorphism genotyping by gel electrophoresis”.

These images are produced in a lab using gel electrophoresis, photographed, coloured on a computer and printed. The end result looks like this:

PlayDNA’s Classic Portrait

This is the most common kind of portrait, offered by PlayDNA, Genetic Photos, DNA 11, Easy DNA and DNA effect, to name a few.

Pros: the great thing about this style of artwork is its ability to show relatedness. If a mother, father and child all get an artwork, the child’s artwork will visibly be a combination of the two parents’ artworks*. This is Requirement 1: genetic art should illustrate relatedness.

Cons: there’s very little flexibility in presentation, because the layout of the image is determined by a lab process that was designed to yield information, not an aesthetically pleasing layout. Therefore while it certainly looks interesting, I don’t think it’s beautiful. In fact, to my eyes it looks medical/clinical. In order to appreciate its value you need an explanation of what it is. Hence Requirement 2: genetic art should look like art – something you might want to hang on your wall even if you didn’t know about its genetic meaning.

* If it’s not, then mummy got some ‘splaining to do.

Textual base sequences

Your genetic code is essentially a giant length of ticker tape, 3 billion characters long, written in an alphabet of 4 letters: A, T, C and G. One popular way of visualising your DNA is to simply print out these letters. The company Genetic Photos has several products that work like this:

Base sequence
Base sequence artworks available from Genetic Photos – from pop art to letters etched in crystal.

Pros: this is substantially more flexible than the genetic fingerprints, allowing a variety of graphic treatments and the possibility to create something beautiful.

Cons: humans are over 99.5% genetically identical, and this style only looks at a small section of DNA, so if two people order the same style of artwork then their artworks will look either very similar or absolutely identical. Siblings are especially likely to have identical sequences. Requirement 3 is therefore: genetic artwork must emphasise genetic differences and gloss over the similarities.
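In other words, the raw sequence needs filtering before it gets anywhere near the artwork. Here’s a rough sketch of what requirement 3 implies, assuming you have a list of positions that are known to vary between people (SNPs):

function informativeBases(sequence, snpPositions) {
  // sequence: a person's DNA as a string of A/T/C/G characters.
  // snpPositions: positions known to differ between people - the only
  // positions worth drawing, since the rest are the same for everyone.
  return snpPositions.map(position => sequence[position]).join("");
}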

Graphical base sequences

Check out this beautiful offering from Genetic Ink. It is in my opinion the best genetic artwork currently available, because it looks like art.


Technically this is very similar to textual base sequences in that it is a representation of the individual letters of your genetic code. But it shows what you can achieve if you abstract away the raw data and allow a designer or artist free rein to create something that you’d want to hang on your wall.

Pros: first you see the beauty, then the genetic meaning is accessible if you know how to analyse the image. This is the Right Way Round for something that I’m going to hang on my wall and have around for many years, and it’s really just another way of expressing the second requirement: that genetic art should look like normal art.

Cons: Doesn’t fix the big issue with textual base sequences: siblings are likely to have identical portraits.

A note on costs

Finally, a note on costs. Most of the artworks described above start at around £200 including shipping, even for small artworks. Larger artworks cost much more. This is way beyond impulse-buy thresholds, and therefore most of the above companies make a big deal about how this is a special and unique object to treasure forever. I’ll wager that for every person who’d be prepared to spend £200 on a unique object to treasure forever, there are twenty who’d spend £50 on a fun little present for that cousin who you can never think of a good present for.

The reason for this expense is simple – all these companies sell a DNA test as part of the artwork package, and that test is expensive, requiring lab time and return postage. The solution is to sell the artwork separately from the test.

23andMe is a personal genetic testing company that has already tested over a million people, and ancestry.com has given DNA tests to 1.4 million people. All these people should be able to get an artwork without paying for another test.

As far as I can see, only one company – DNA 11 – offers the ability to import your genetic testing results from other testing companies, thus bringing the cost down into impulse buy territory. This is the 4th requirement: don’t make people buy a new test.
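Importing results isn’t technically hard, either: 23andMe and similar services let you download your raw data as a plain text file with roughly one tab-separated line per tested position. A hedged sketch of a parser – the exact column layout varies between providers, so treat this as illustrative:

function parseRawData(fileContents) {
  // Data lines look roughly like: rsid <tab> chromosome <tab> position <tab> genotype
  // Lines starting with "#" are comments.
  const results = {};
  for (const line of fileContents.split("\n")) {
    if (line.startsWith("#") || line.trim() === "") continue;
    const [rsid, chromosome, position, genotype] = line.split("\t");
    results[rsid] = { chromosome, position: Number(position), genotype };
  }
  return results;
}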

Summary

So there you have it, based on the current offerings, the perfect genetic art will:

  1. Illustrate relationships. My artwork should look totally different to a stranger’s, but quite similar to my sister’s.
  2. Look like a work of art. It can’t rely on its genetic meaning to make it worth taking up space in my living room; it should be something I want to put on my wall in its own right.
  3. Only show differences between people, and ignore the bits of DNA that are the same among all humans.
  4. Be priced within impulse-buy thresholds, at least for people who already have a DNA test.

Now here’s the cool thing about the current state of the genetic art market: as far as I can tell, there’s nothing out there that is doing all of this. I spot an opportunity!


For more posts in this series, check out the DNArtwork category on this blog.

Summer Project: Generating Artwork From DNA https://blog.berniesumption.com/software/dnartwork/generating-artwork-from-dna/ Wed, 15 Jun 2016 13:06:26 +0000 http://blog.berniesumption.com/?p=860 For the last few years, I’ve had an idea floating around at the back of my mind. One day, I tell myself, I’m going to create a website selling artwork generated from DNA. Well this summer, I’m going to make it happen!

Here’s how it would work: each customer that purchases an artwork is sent a DNA testing kit, and the results from the DNA test are used to generate an (almost) totally unique artwork suitable for framing. I say “almost” unique, because if one person were to sneakily test the system by purchasing two artworks using different names and payment details then they should get back two identical artworks. Same if two identical twins both purchase artworks. Because that’s how DNA works yo.
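That property – same DNA in, same artwork out – just means the generator has to be a pure function of the test results, with no randomness and no input from the customer’s name or order. A sketch of the idea, where generateArtwork stands in for the program I haven’t written yet:

function seedFromGenotype(genotype) {
  // Derive a number from the DNA results alone, so two orders with the same
  // DNA always produce the same seed - and therefore the same artwork.
  let seed = 0;
  for (const base of genotype) {
    seed = (seed * 31 + base.charCodeAt(0)) >>> 0; // simple deterministic hash
  }
  return seed;
}

// generateArtwork(seedFromGenotype(genotype)) would always draw the same
// composition for the same person's DNA.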

This idea is hardly unique – a Google search for artwork from DNA yields many companies already doing this, and in a future post I’ll review the companies already operating in this space and define how my offering will be different.

In the meantime, this post kicks off a summer project to build the site. I’m keen to actually release a working product instead of just some interesting tech demos, and to that end here’s my plan:

Build in public

I’m going to be quite open about the design and build process, blogging every step of the way. I believe that ideas are worth very little and good execution is everything, so I have nothing to fear from not keeping my ideas and techniques secret. Being open about how this service is built will help potential customers trust that it really is doing what it claims to do. And making a public declaration that I’ve started the project is a great way to pressure me to finish it :o)

“Perfect” is the enemy of “done”

I have a perfectionism problem. I’m the kind of guy who’ll re-read an email twice before sending, making minor edits and wasting half an hour on it when ten minutes would have sufficed. This product is a relatively complex system and in order to work at all, several systems need to be in place:

  1. Website: sales pitch, customer registration, account management, support forum.
  2. Genetic analysis: obtaining DNA results from customers and processing this into a form suitable for generating artwork.
  3. Artwork: creating a computer program that generates artistic images from the results of genetic analysis.
  4. Print purchasing: integrating with a payment provider and print-on-demand supplier to print and post artworks to customers.

To prevent me from spending too much time trying to perfect any one component I’m going to timebox each of these systems to two weeks, meaning that the whole site should take 2 months.

Use lean principles

The core tenet of The Lean Startup is that the function of any startup is to discover whether an idea can form the basis of a sustainable business, as quickly and cheaply as possible. As such, I’ve already failed since this project is totally a “because I can” enterprise that I don’t expect to make much money from. Still, there are a few important principles to nab from the Lean movement:

  1. Focus on a Minimum Viable Product – don’t let “nice to have” features delay the launch of your core offering.
  2. Start selling immediately – even before your product is ready, put up a sales page and collect email addresses of potential future customers.
  3. Improve your product based on feedback, not intuition – right from the start, make it easy for people to come to you with suggestions and support requests.

For more posts in this series, check out the DNArtwork category on this blog.
