Planet Debian


Chris Lamb: Free software activities in October 2016

1 November, 2016 - 03:48

Here is my monthly update covering what I have been doing in the free software world (previously):

  • Made a large number of improvements to my hosted service for projects that host their Debian packaging on GitHub to use the Travis CI continuous integration platform to test builds on every code change:
    • Enabled the use of Git submodules. Thanks to @unera & @hosiet. (#30)
    • Managed a contribution from @xhaakon to allow adding an extra repository for custom dependencies. (#17)
    • Fixed an issue where builds did not work under Debian Wheezy or Ubuntu Trusty due to a call to dpkg-buildpackage --show-field. (#28)
    • Fixed an issue where TRAVIS_DEBIAN_EXTRA_REPOSITORY was accidentally required. (#27)
    • Made a number of miscellaneous cosmetic improvements. (f7e5b080 & 037de91cc, etc.)
  • Submitted a pull request to Alabaster, the default theme for the Python Sphinx documentation system, to ensure that "extra navigation links" are rendered reproducibly. (#90)
  • Improved my Chrome extension for the FastMail web interface:
    • Managed a pull request from @jlerner to add an optional confirmation dialogue before sending any message. (#10)
    • Added an optional Ctrl+Enter alias for Alt+Enter to limit searches to the current folder; the latter shortcut is already mapped by my window manager. (d691b07)
    • Various cosmetic changes to the options page. (7b95e887 & 833ff0fe)
  • Submitted two pull requests to mypy, an experimental static type checker for Python:
    • Ensure that the output of --usage is reproducible. (#2234)
    • Update the --usage output to match the — now-reproducible — output. (#2235)
  • Updated django-slack, my library to easily post messages to the Slack group-messaging utility:
    • Merged a feature from @lvpython to add an option to post the message as the authenticated user rather than the specified one. (#59)
    • Merged a documentation update from @ataylor32 regarding the new method of generating access tokens. (#58)
  • Made a number of cosmetic improvements to AptFs, my FUSE-based filesystem that provides a view on unpacked Debian source packages as regular folders.
  • Updated the SSL certificate for a hosted version of the diffoscope in-depth and content-aware diff utility. Continued thanks to Bytemark for sponsoring the hardware.

Debian & Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most GNU/Linux distributions provide binary (or "compiled") packages to end users. The motivation behind the Reproducible Builds effort is to allow verification that no flaws have been introduced, either maliciously or accidentally, during this compilation process by promising that identical binary packages are always generated from a given source.
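In concrete terms, that verification amounts to rebuilding a package independently and comparing checksums of the resulting artifacts. A minimal sketch in Python (the file paths are hypothetical, not part of any Debian tooling):

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def builds_match(path_a, path_b):
    """Two independent rebuilds are reproducible iff their digests agree."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A rebuilder that obtains a different digest from the published one knows that either the build environment differed or something was injected along the way.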

  • Presented a talk entitled "Reproducible Builds" at Software Freedom Kosova, in Prishtina, Republic of Kosovo.

  • I filed my 2,500th bug in the Debian BTS: #840972: golang-google-appengine: accesses the internet during build.

  • In order to build packages reproducibly, one not only needs identical sources but also some external and sharable definition of the environment used for a particular build, stipulating such things as the version numbers of the required build-dependencies.

    It is not currently clear how to handle these .buildinfo files after the archive software has processed them, nor how to make them available to the world, so I started development on a proof-of-concept server to see what issues arise in practice. It is available at

  • Chaired an IRC meeting and ran a poll to determine a regular time.

  • Submitted two design proposals to our wiki page.

  • Improvements to our testing framework:

    • Move regular "Scheduled in..." messages to the #debian-reproducible-changes IRC channel.
    • Use our log_info method instead of manual echo calls.
    • Correct an "all sources packages" → "all source packages" typo.
    • Submit .buildinfo files to
    • Create GPG keys on nodes at deploy time, not "lazily".
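To make the idea concrete, a .buildinfo file captures roughly the following kind of data. This is an invented example following the general shape of the format, not a real file; the package versions and placeholders are made up:

```
Format: 1.0
Source: hello
Binary: hello
Architecture: amd64
Version: 2.10-1
Checksums-Sha256:
 <digest> <size> hello_2.10-1_amd64.deb
Installed-Build-Depends:
 gcc (= 4:6.1.1-1),
 libc6-dev (= 2.24-3)
```

Given such a file, anyone can reconstruct the build environment (the exact versions of every installed build-dependency) and attempt to reproduce the listed checksums.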

My work in the Reproducible Builds project was also covered in our weekly reports. (#75, #76, #77 & #78).

I also submitted 14 patches to fix specific reproducibility issues in bio-eagle, cf-python, fastx-toolkit, fpga-icestorm, http-icons, lambda-align, mypy, playitslowly, seabios, stumpwm, sympa, tj3, wims-help & xotcl.

Debian LTS

This month I have been paid to work 13 hours on Debian Long Term Support (LTS). In that time I did the following:

  • Seven days of "frontdesk" duties, triaging CVEs, etc.
  • Issued DLA 647-1 for freeimage correcting an out-of-bounds write vulnerability in the XMP image handling functionality.
  • Issued DLA 649-1 for python-django fixing a possible CSRF protection bypass on sites that use Google Analytics.
  • Issued DLA 654-1 for libxfixes preventing an integer overflow when a malicious client sent INT_MAX as a "length".
  • Issued DLA 662-1 for quagga correcting a programming error where two constants were confused that could cause stack overrun in IPv6 routing code.
  • Issued DLA 688-1 for cairo to prevent a DoS attack where a malicious SVG could generate invalid pointers.
Patches contributed
  • gunicorn:
    • 19.6.0-7 — Set supplementary groups when changing uid, add an example systemd .service file to gunicorn-examples, and expand README.Debian to make it clearer what to do now that /etc/gunicorn.d has been removed.
    • 19.6.0-8 — Correct previous supplementary groups patch to be compatible with Python 3.
  • redis:
    • 3:3.2.4-2 — Ensure that sentinel's configuration actually writes to a pidfile location so that systemd can detect that the daemon has started.
    • 3:3.2.5-1 — New upstream release.
  • libfiu:
    • 0.94-8 — Fix FTBFS under Bash due to lack of && in debian/rules.
    • 0.94-9 — Ensure the build is reproducible by sorting injected modules.
  • aptfs (2:0.8-2) — Minor cosmetic changes.
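The libfiu fix above (sorting injected modules) is an instance of a common reproducibility pattern: whenever a build enumerates files from the filesystem, the order must be forced, because directory-listing order varies between machines and filesystems. A small illustrative sketch in Python (the directory layout and ".mod" suffix are invented, not libfiu's actual scheme):

```python
import os

def injected_modules(srcdir):
    """Collect modules to inject, in a stable order regardless of
    the order in which the filesystem happens to list them."""
    mods = [f for f in os.listdir(srcdir) if f.endswith(".mod")]
    # sorted() makes the result deterministic across machines and runs
    return sorted(mods)
```

Without the final sort, two otherwise identical builds can embed the modules in different orders and produce differing binaries.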

Sponsored uploads
  • libxml-dumper-perl (0.81-1.2) — Move away from an unsupported debhelper compat level 4.
  • netatalk (2.2.5-1.1) — Drop build-dependency on hardening-includes.

QA uploads
  • anon-proxy (00.05.38+20081230-4) — Move to a supported debhelper compatibility level 9.
  • ara (1.0.32) — Make the build reproducible.
  • binutils-m68hc1x (1:2.18-8) — Make the build reproducible & move to a supported debhelper compatibility level.
  • fracplanet (0.4.0-5) — Make the build reproducible.
  • libnss-ldap (265-5) — Make the build reproducible.
  • python-uniconvertor (1.1.5-3) — Fix an "option release requires an argument" FTBFS. (#839375)
  • ripole (0.2.0+20081101.0215-3) — Actually include the ripole binary in the package (#839919) & enable hardening flags.
  • twitter-bootstrap (2.0.2+dfsg-10) — Fix incorrect copyright formatting when building under Bash. (#824592)
  • zpaq (1.10-3) — Make the build reproducible.
Bugs filed (without patches)

I additionally filed 7 bugs for packages that access the internet during build against berkshelf, golang-google-appengine, node-redis, python-eventlet, python-keystoneclient, python-senlinclient & tornado-pyvows.

RC bugs

I also filed 65 FTBFS bugs against android-platform-external-jsilver, auto-multiple-choice, awscli, bgpdump, cacti-spine, cucumber, check, debci, eximdoc4, freetennis, freezegun, gatos, git/gnuit, gnucash, grads, haskell-debian, haskell-hsopenssl-x509-system, homesick, ice-builder-gradle, kscreen, latex-cjk-japanese-wadalab, libdbd-firebird-perl, libgit2, libp11, libzypp, mozart-stdlib, mqtt-client, mtasc, musicbrainzngs, network-manager-openvpn, network-manager-vpnc, nim, node-lodash, node-once, npgsql, ocamlbuild, ocamldsort, ohai, partclone, plaso, polyglot-maven, projectreactor, python-launchpadlib, python-pygraphviz, python-pygraphviz, python-pygraphviz, python-textile, qbittorrent, qbrew, qconf, qjoypad, rdp-alignment, reel, ruby-foreman, ruby-gettext, ruby-gruff, ruby-rspec-rails, samtools, sbsigntool, spock, sugar, taglib-extras, tornado-pyvows, unifdef, virt-top, vmware-nsx & zshdb.

Debian FTP Team

As a Debian FTP assistant I ACCEPTed 147 packages: ace-link, amazon-s2n, avy, basez, bootstrap-vz, bucklespring, camitk, carettah, cf-python, debian-reference, dfcgen-gtk, efivar, entropybroker, fakesleep, gall, game-data-packager, gitano, glare, gnome-panel, gnome-shell-extension-dashtodock, gnome-shell-extension-refreshwifi, gnome-shell-extension-remove-dropdown-arrows, golang-github-gogits-go-gogs-client, golang-github-gucumber-gucumber, golang-github-hlandau-buildinfo, golang-github-hlandau-dexlogconfig, golang-github-hlandau-goutils, golang-github-influxdata-toml, golang-github-jacobsa-crypto, golang-github-kjk-lzma, golang-github-miekg-dns, golang-github-minio-sha256-simd, golang-github-nfnt-resize, golang-github-nicksnyder-go-i18n, golang-github-pointlander-compress, golang-github-pointlander-jetset, golang-github-pointlander-peg, golang-github-rfjakob-eme, golang-github-thecreeper-go-notify, golang-github-twstrike-gotk3adapter, golang-github-unknwon-goconfig, golang-gopkg-dancannon-gorethink.v1, golang-petname, haskell-argon2, haskell-binary-parsers, haskell-bindings-dsl, haskell-deriving-compat, haskell-hackage-security, haskell-hcwiid, haskell-hsopenssl-x509-system, haskell-megaparsec, haskell-mono-traversable-instances, haskell-prim-uniq, haskell-raaz, haskell-readable, haskell-readline, haskell-relational-record, haskell-safe-exceptions, haskell-servant-client, haskell-token-bucket, haskell-zxcvbn-c, irclog2html, ironic-ui, lace, ledger, libdancer2-plugin-passphrase-perl, libdatetime-calendar-julian-perl, libdbix-class-optimisticlocking-perl, libdbix-class-schema-config-perl, libgeo-constants-perl, libgeo-ellipsoids-perl, libgeo-functions-perl, libgeo-inverse-perl, libio-async-loop-mojo-perl, libmojolicious-plugin-assetpack-perl, libmojolicious-plugin-renderfile-perl, libparams-validationcompiler-perl, libspecio-perl, libtest-time-perl, libtest2-plugin-nowarnings-perl, linux, lua-scrypt, mono, mutt-vc-query, neutron, node-ansi-font, node-buffer-equal, 
node-defaults, node-formatio, node-fs-exists-sync, node-fs.realpath, node-is-buffer, node-jison-lex, node-jju, node-jsonstream, node-kind-of, node-lex-parser, node-lolex, node-loud-rejection, node-random-bytes, node-randombytes, node-regex-not, node-repeat-string, node-samsam, node-set-value, node-source-map-support, node-spdx-correct, node-static-extend, node-test, node-to-object-path, node-type-check, node-typescript, node-unset-value, nutsqlite, opencv, openssl1.0, panoramisk, perl6, pg-rage-terminator, pg8000, plv8, puppet-module-oslo, pymoc, pyramid-jinja2, python-bitbucket-api, python-ceilometermiddleware, python-configshell-fb, python-ewmh, python-gimmik, python-jsbeautifier, python-opcua, python-pyldap, python-s3transfer, python-testing.common.database, python-testing.mysqld, python-testing.postgresql, python-wheezy.template, qspeakers, r-cran-nleqslv, recommonmark, rolo, shim, swift-im, tendermint-go-clist, tongue, uftrace & zaqar-ui.

Antoine Beaupré: My free software activities, October 2016

1 November, 2016 - 03:15
Debian Long Term Support (LTS)

This is my 7th month working on Debian LTS, a project started by Raphael Hertzog at Freexian, and my first after a long pause during the summer.

I have worked on the following packages and CVEs:

I have also helped review work on the following packages:

  • imagemagick: reviewed BenH's work to figure out what was done. Unfortunately, I forgot to officially take on the package, and Roberto started working on it in the meantime. I nevertheless took time to review Roberto's work and outline possible issues with the originally suggested patchset.
  • tiff: reviewed Raphael's work on the hairy TIFFTAG_* issues; all the gory details are in this email.

The work on ImageMagick and GraphicsMagick was particularly intriguing. Looking at the source of those programs makes me wonder why we are still using them at all: it's a tangled mess of C code that is bound to bring up more and more vulnerabilities, time after time. It seems there's always a "Magick" vulnerability waiting to be fixed out there... I somehow hoped that the fork would bring more stability and reliability, but it seems to suffer from similar issues because, fundamentally, it hasn't rewritten the ImageMagick code...

It looks like this is something that affects all image programs. The review I have done of the tiff suite gives me the same shivering sensation as reviewing the "Magick" code. It feels like all image libraries are poorly implemented and thus bound to be exploited somehow... Nevertheless, if I had to use a library of the sort in my software, I would stay away from the "Magick" forks and try something like imlib2 first...

Finally, I also did some minor work on the user and developer LTS documentation and some triage work on samba, xen and libass. I also looked at the dreaded CVE-2016-7117 vulnerability in the Linux kernel to verify its impact on wheezy users. I also looked at implementing a --lts flag for dch (see bug #762715).

It was difficult to get back to work after such a long pause, but I am happy I was able to contribute a significant number of hours. It's a bit difficult to find work sometimes in LTS-land, even though there's actually always a lot of work to be done. For example, I used to be one of the people doing frontdesk work, but those duties are now assigned until the end of the year, so it's unlikely I will be doing any of that for the foreseeable future. Similarly, a lot of packages were already assigned when I started looking at the available packages. There was an interesting discussion on the internal mailing list regarding unlocking package ownership, because some people had packages locked for weeks, sometimes months, without significant activity. Hopefully that situation will improve after that discussion.

Another interesting discussion I participated in is the question of whether the LTS team should wait for unstable to be fixed before publishing fixes in oldstable. It seems the consensus right now is that it shouldn't be mandatory to fix issues in unstable before we fix security issues in oldstable and stable. After all, security support for testing and unstable is limited. But I was happy to learn that working on brand new patches is part of our mandate in LTS work. I did work on such a patch for tar, which ended up being adopted by the original reporter, although upstream ultimately implemented our recommendation in a better way.

It's coincidentally the first time since I started working on LTS that I didn't get the number of requested hours, which means that there are more people working on LTS. That is a good thing, but I worry it may also mean people are more spread out and less able to focus for longer periods of time on more difficult problems. It also means that the team is growing faster than the funding, which is unfortunate: now is as good a time as any to remind you to see if you can make your company fund the LTS project if you are still running Debian wheezy.

Other free software work

It seems like forever since I last wrote such a report, and a lot has happened while I was on vacation.


I have done extensive work on Monkeysign, trying to bring it kicking and screaming into the new world of GnuPG 2.1. This was the objective of the 2.1 release, which collected about two years of work and patches, including arbitrary MUA support (e.g. Thunderbird), config file support, and a release on PyPI. I have had to make about 4 more releases to try to fix the build chain, ship the test suite with the program, and add a primitive preferences panel. The 2.2 release also finally features Tor support!

I am also happy to have moved more documentation to Read the Docs, part of which I mentioned in a previous article. The git repositories and issues were also moved to a GitLab instance, which will hopefully improve the collaboration workflow, although we still have issues in streamlining the merge request workflow.

All in all, I am happy to be working on Monkeysign, but it has been a frustrating experience. In recent years, I have been maintaining the project largely on my own: although there are about 20 contributors to Monkeysign, I have made over 90% of the commits. New contributors recently showed up, and I hope this will relieve some of the pressure of being the sole maintainer, but I am not sure how viable the project is.

Funding free software work

More and more, I wonder how to sustain my contributions to free software. As a previous article has shown, I work a lot on the computer, even when I am not in a full-time job. Monkeysign has been a significant time drain in the last months, and I have done this work on a completely volunteer basis. I wouldn't mind so much, except that a lot of the work I do is on a volunteer basis. This means that I sometimes must prioritize paid consulting work at the expense of those volunteer projects. While most of my paid work usually revolves around free software, the benefits of paid work are not always immediately obvious, as the primary objective is to deliver to the customer, and benefit to the community as a whole is somewhat of a side-effect.

I have watched with interest joeyh's adventures into crowdfunding, which seem to be working pretty well for him. Unfortunately, I cannot claim the incredible (and well-deserved) reputation Joey has, and even if I could, I can't live on $500 a month.

I would love to hear if people would be interested in funding my work in such a way. I am hesitant to launch a crowdfunding campaign because it is difficult to identify what exactly I am working on from one month to the next. Looking back at ?tag/debian-lts shows that I am all over the place: one month I'll work on a Perl wiki (Ikiwiki), the next one I'll be hacking at a multimedia home cinema (Kodi). I can hardly think of how to fund those things short of "just give me money to work on anything I feel like", which I can hardly ask of anyone. Even worse, it feels like the audience here is either friends or colleagues. It would make little sense for me to seek funding from those people: colleagues have the same funding problems I do, and I don't want to impoverish my friends...

So far I have taken the approach of trying to get funding for the work I am doing, bit by bit. For example, I have recently been told that LWN actually pays for contributed articles and have started running articles by them before publishing them here. So far this is looking good: they will publish an article I wrote about the Omnia router I recently received. I give them exclusive rights to the article for two weeks, but I otherwise retain full ownership and will publish it here after the exclusive period.

Hopefully, I will be able to find more such projects that pay for the work I do on a day-to-day basis.

Open Street Map editing

I have ramped up my OpenStreetMap contributions, having (temporarily) moved to a different location. There are lots of things to map here: trails, gas stations and lots of other things are missing from the map. Sometimes the effort looks a bit ridiculous, reminding me of my early days of editing OSM. I have registered with OSM Live, a project to fund OSM editors that, I must admit, doesn't help much with funding my work: for the hundreds of edits I did in October, I received the equivalent of CAD$1.80 in Bitcoin. This may be the lowest hourly wage I have ever received, probably working out to around 10¢ per hour!

Still, it's interesting to be able to point people to the project if someone wants to contribute to OSM mappers. But mappers should have no illusions about getting a decent salary from this effort, I am sorry to say.


I feel this is similar to the "bounty" model used by the Borg project: I claimed around USD$80 in that project for what probably amounts to tens of hours of work; yet another wage that would qualify as "poor".

Another example is a feature I would like to implement in Borg: support for protocols other than SSH. There is currently no bounty on this, but a similar feature, S3 support, has one of the largest bounties Borg has ever seen: USD$225. And the claimant for the bounty hasn't actually implemented the feature: instead of backing up to S3, the patch (to a third-party tool) actually enables support for Amazon Cloud Drive, a completely different API.

Even at $225, I wouldn't be able to complete any of those features and earn a decent wage. As well explained by the Snowdrift reviews, bounties just don't work at all... The ludicrous 10% fee charged by Bountysource made sure I would never do business with them again anyway.

Other work

There are probably more things I did recently, but I am having difficulty keeping track of the last 5 months of on and off work, so you will forgive that I am not as exhaustive as I usually am.

James Bromberger: The Debian Cloud Sprint 2016

31 October, 2016 - 21:27

I’m at an airport, about to board the first of three flights across the world, from timezone +8 to timezone -8. I’ll be in transit 27 hours to get to Seattle, Washington state. I’m leaving my wife and two young children behind.

My work has given me a day's worth of leave under the Corporate Social Responsibility program, and I'm taking three days' annual leave to do this. 27 hours each way in transit, for 3 days on the ground.



I started playing in technology as a kid in the 1980s; my first PC was a clone (as they were called) 286 running MS-DOS. It was clunky, and the most I could do to extend it was to write batch scripts. As a child I had no funds for commercial compilers, no network connections (this was pre Internet in Australia), no access to documentation, and no idea where to start programming properly.

It was a closed world.

I hit university in the summer of 1994 to study Computer Science and French. I'd heard of Linux, and soon found myself installing the Linux distributions of the day. The freedom of the licensing, and the encouragement to use, modify, and share, was in stark contrast to the world of consumer PCs of the late 1980s.

It was there at the UCC at UWA I discovered Debian. Some of the kind network/system admins at the University maintained a Debian mirror on the campus LAN, updated regularly and always online. It was fast, and more importantly, free for me to access. Back in the 1990s, bandwidth in Australia was incredibly expensive. The vast distances of the country meant that bandwidth was scarce. Telcos were in races to put fiber between Perth and the Eastern States, and without that in place, IP connectivity was constrained, and thus costly.

Over many long days and nights I huddled down, learning window managers, protocols, programming and scripting languages. I became… a system/network administrator, web developer, dev ops engineer, etc. My official degree workload – algorithmic complexity, protocol stacks – was interesting, but fiddling with Linux-based implementations was practical.


After years of consuming the output of Debian – and running many services with it – I decided to put my hand up and volunteer as a Debian Developer: it was time to give back. I had benefited from Debian, and I saw others benefit from it as well.

As the 2000’s started, I had my PGP key in the Debian key ring. I had adopted a package and was maintaining it – load balancing Apache web servers. The web was yet to expand to the traffic levels you see today; most web sites were served from one physical web server. Site Reliability Engineering was a term not yet dreamed of.

What became more apparent was the applicability of Linux, Open Source, and in my line of sight Debian to a wider community beyond myself and my university peers. Debian was being used to revive recycled computers that were being donated to charities, in some cases because commercial software licenses could not be transferred with hardware no longer required by organisations that had upgraded. It appeared that Debian was being used as a baseline above which society in general had access to fundamental computing and network services.

The removal of subscriptions, registrations, and the encouragement of distribution meant this occurred at rates that could never be tracked, and more importantly, the consensus was that it should not be automatically tracked. The privacy of the user is paramount – more important than some statistics for the Developer to ponder.

When the Bosnia-Herzegovina war ended in 1995, I recall an email from academics there, having found some connectivity, writing to ask if they would be able to use Debian as part of their re-deployment of services for the tertiary institutions in the region. This was an unnecessary request, as Debian GNU/Linux is freely available, but it was a reminder that it would have been difficult for the country to procure commercial solutions at that time. Instead, those who could do the task just got on with it.

There have been many similar projects where grass-roots organisations – non-profits, NGOs, and even just loose collectives of individuals – have turned to Linux, Open Source, and sometimes Debian to solve their problems. Many fine projects have been established to make technology accessible to all, regardless of race, gender, nationality, class, or any other label society has used to divide humans. Big hat tip to Humanitarian Open Street Map and the Serval Project.

I’ve always loved Debian’s position as the Universal operating system. Its vast range of packages and wide range of supported computing architectures mean that quite often the litmus test of “is project X a good project?” was met with “is it packaged for Debian?”. That wide range of architectures has meant that system administrators had fewer surprises and a faster adoption cycle when changing platforms, such as the switch from 32-bit to 64-bit x86.

Enter the Cloud

I first laid eyes on the AWS Cloud in 2008. It was nothing like the rich environment you see today. The first thing I looked for was my favourite operating system, so that what I already knew and was familiar with would be available in this environment, minimising the learning curve. However, there were no official images, which was disconcerting.

In 2012 I joined AWS as an employee. As I was living in Australia, they hired me into the field sales team as a Solution Architect – a sort of pre-sales tech – with a customer-focused depth in security. It was a wonderful opportunity, and I learnt a great deal. It also made sense (to me, at least) to do something about getting Debian’s images blessed.

It turned out that I had to almost define what that meant: images endorsed by a Debian Developer, handed to the AWS Marketplace team. And so since 2013 I have done just that, keeping track of Debian’s releases across the AWS regions, and collaborating with other Debian folk on other cloud platforms to attempt a unified approach to generating and maintaining these images. This included (for a stint) generating them for the AWS GovCloud Region, and still does for the AWS China (Beijing) Region – the other side of the so-called Great Firewall of China.

So why the trip?

We’ve had focus groups at DebConf (the Debian conference) around the world, but it’s often difficult to get the right group of people in the same room at the same time. So the proposal was to hold a focused Debian Cloud Sprint. Google was good enough to host this for all the volunteers across all the cloud providers. Furthermore, donated funds were found to cover travel for a set of people who otherwise could not attend.

I was lucky enough to be given a flight.

So here I am, in the terminal in Australia: my kids are tucked up in bed, dreaming of the candy they just collected for Halloween. It will be a draining week, I am sure, but if it helps set and improve the state of Debian then it’s worth it.

Enrico Zini: Bremen Freimarkt Parade

31 October, 2016 - 01:10

Look! They are having a parade with people dressed like Debian releases!

Dirk Eddelbuettel: RProtoBuf 0.4.7: Mostly harmless

30 October, 2016 - 09:00

CRAN requested a release updating any URLs for Omegahat to the (actually working) URL. The RProtoBuf package had this in one code comment (errr...) and one bibfile entry. Oh well -- so that caused this 0.4.7 release, which arrived on CRAN today. It contains the requested change, and pretty much nothing else.

RProtoBuf provides R bindings for the Google Protocol Buffers ("Protobuf") data encoding and serialization library used and released by Google, and deployed as a language and operating-system agnostic protocol by numerous projects.
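Part of why Protocol Buffers is so compact is its variable-length integer ("varint") wire encoding: small numbers take one byte, larger ones grow as needed. A sketch of that scheme in Python, purely for illustration of the wire format (this is not the RProtoBuf API):

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf-style varint:
    7 bits per byte, least-significant group first, with the high
    bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # final byte, high bit clear
            return bytes(out)

def decode_varint(data):
    """Inverse of encode_varint; returns the decoded integer."""
    n, shift = 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return n
        shift += 7
    raise ValueError("truncated varint")
```

For example, 300 encodes to the two bytes 0xAC 0x02, whereas a fixed 32-bit encoding would always spend four bytes.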

The NEWS file summarises the release as follows:

Changes in RProtoBuf version 0.4.7 (2016-10-27)
  • At the request of CRAN, two documentation instances referring to the Omegahat repository were updated to

CRANberries also provides a diff to the previous release. The RProtoBuf page has an older package vignette, a 'quick' overview vignette, a unit test summary vignette, and the pre-print for the JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Dirk Eddelbuettel: drat 0.1.2: Mostly harmless

30 October, 2016 - 08:53

CRAN requested a release updating any URLs for Omegahat to the (actually working) URL. So that caused this 0.1.2 release which arrived on CRAN yesterday. It contains the requested change along with one or two other mostly minor changes which accumulated since the last release.

drat stands for drat R Archive Template, and helps with easy-to-create and easy-to-use repositories for R packages. Since its inception in early 2015 it has found reasonably widespread adoption among R users because repositories are what we use. In other words, friends don't let friends use install_github(). Just kidding. Maybe. Or not.

The NEWS file summarises the release as follows:

Changes in drat version 0.1.2 (2016-10-28)
  • Changes in drat documentation

    • The FAQ vignette added a new question Why use drat

    • URLs were made canonical, was updated from .org

    • Several files (, Description, help pages) were edited

Courtesy of CRANberries, there is a comparison to the previous release. More detailed information is on the drat page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Iain R. Learmonth: Powers to Investigate

30 October, 2016 - 07:17

The Communication Data Bill was draft legislation first introduced in May 2012. It sought to compel ISPs to store details of communications usage so that they could later be used for law enforcement purposes. In 2013 the bill's passage into law was blocked and the bill was dead.

In 2014 we saw the Data Retention and Investigatory Powers Act 2014 appear. This seemed to be in response to the Data Retention Directive being successfully challenged at the European Court of Justice by Digital Rights Ireland on human rights grounds, with a judgment given in 2014. It essentially reimplemented the Data Retention Directive along with a whole load of other nasty things.

The Data Retention and Investigatory Powers Act contained a sunset clause with a date set for 2016. This brings us to the Investigatory Powers Bill, which it looks like will pass into law shortly.

Among a range of nasty powers, this legislation will be able to force ISPs to record metadata about every website you visit and every connection you make to a server on the Internet. This is sub-optimal for the privacy-minded, my primary concern being that this is a treasure trove of data that is going to be abused by someone. It’s going to be too much for someone to resist.

The existence of this power in the bill seemed to confuse the House of Lords:

It is not for me to explain why the Government want in the Bill a power that currently does not exist, because internet connection records do not exist, and which the security services say they do not want but which the noble and learned Lord says might be needed in the future. It is not for me to justify this power; I am saying to the House why I do not believe it is justified. The noble and learned Lord and the noble Lord, Lord Rosser, made the point that this is an existing power, but how can you have an existing power to acquire something that will not exist until the Bill is enacted?

– Lord Paddick (link)

Of course, internet connection records are meaningless when your traffic is routed via a proxy or VPN, and there is a Kickstarter in progress that I would love to see succeed: OnionDSL.

The premise of OnionDSL is that instead of having an IPv4/IPv6 connection to the Internet, you join a private network that does not provide any routing to the global Internet and instead provides only a Tor bridge. I cannot think of anything that I do from home that I cannot do via Tor, and I have been considering switching to Qubes OS as the operating system on my day-to-day laptop to allow me to direct basically everything through Tor.

The idea of provisioning a non-IP service via DSL is not new to me; I’ve come across it before with cjdns, which provides an encrypted IPv6 network using public key cryptography for network address allocation and a distributed hash table for routing. Peering between cjdns nodes can be performed over Ethernet, and cjdns over Ethernet could be provisioned in place of traditional PPP over Ethernet (PPPoE) to provide access directly to cjdns without any routing to the global Internet.

If OnionDSL is funded, I think it’s very likely I would consider becoming a customer (assuming the government doesn’t also attempt to outlaw Tor).

Lars Wirzenius: Obnam 1.20 released

29 October, 2016 - 17:30

I have just released version 1.20 of Obnam, my backup program. It's been nine months since the previous release, and that's a long time: I've had an exciting year, and not entirely in a good way. Unfortunately that's eaten up a lot of my free time and enthusiasm for my hobby projects.

See below for a snippet of NEWS, with a summary of the user-visible changes. A lot of the effort has gone into improving FORMAT GREEN ALBATROSS, but that isn't documented in the NEWS file.

I've received patches and actionable bug reports from a number of people, and I'm grateful for those. I try to credit them by name in the NEWS file.

Obnam NEWS

This file summarizes changes between releases of Obnam.

NOTE: Obnam has an EXPERIMENTAL repository format under development, called green-albatross-20160813. It is NOT meant for real use. It is likely to change in incompatible ways without warning. DO NOT USE it unless you're willing to lose your backup.

Version 1.20, released 2016-10-29
  • The format name for green-albatross is renamed to green-albatross-20160813 and will henceforth be renamed every time there's a change, to avoid confusing Lars because of backwards incompatibilities. When it reaches stability and the on-disk format is frozen, it'll be renamed back to a date-less version.

  • Those using the experimental green-albatross repository format will have to start over with fresh repositories. This release contains backwards incompatible changes that mean existing repositories no longer work. Sorry, but that's what experimental means.

  • A green-albatross change is that the "chunk index" data structure is no longer a single blob, and instead it's broken down into smaller objects. This avoids keeping all of the chunk indexes in memory at once, which should reduce memory use.

  • Remi Rampin started updating and continuing the French translation of the Obnam manual.

  • Lars Wirzenius changed the default so that Obnam reads random data from /dev/urandom instead of /dev/random when creating encryption keys. The goal is to make it less likely that Obnam stops at the key generation stage on machines with little entropy. Set weak-random = no in your configuration to override this.

Minor changes:

  • Lars Wirzenius changed obnam forget so that if there is nothing to do, it doesn't even try to connect to the repository.

  • Lars Wirzenius added a chapter on participating in the Obnam project to the manual.

  • Lars Wirzenius changed --one-file-system to work for bind mounts. It only works for bind mounts that exist at the time when Obnam starts, however. Also, /proc/mounts must be an accurate list of mount points.

  • Lars Wirzenius added the gpg command line to the error message about gpg failing.

Bug fixes:

  • The manual and manual page used to claim you could break the locks for only one client. This was not true. The manuals have been fixed.

  • A whole bunch of typo fixes, from Andrea Gelmini.

  • Michel Alexandre Salim fixed a bug in the FUSE (obnam mount) plugin, which was a typo in a method name (get_clientgeneration_ids).

  • Lars Wirzenius fixed obnam restore to require a target set with --to. Jonathan Dowland reported the problem.

  • Lars Wirzenius fixed obnam list-errors so that it doesn't crash on error classes that only exist to make the exception hierarchy neater, such as EncryptionError. Bug reported by Rik Theys.

  • Ian Campbell fixed a bug in obnam kdirstat and its handling of FIFO sockets.

Iain R. Learmonth: live-wrapper 0.4 released!

29 October, 2016 - 10:21

Last week saw the quiet upload of live-wrapper 0.4 to unstable. I would have blogged at the time, but there is another announcement coming later in this blog post that I wanted to make at the same time.

live-wrapper is a wrapper around vmdebootstrap for producing bootable live images using Debian GNU/Linux. Accompanied by the live-tasks package in Debian, this provides the toolchain and configuration necessary for building live images using Cinnamon, GNOME, KDE, LXDE, MATE and XFCE. There is also work ongoing to add a GNUstep image to this.

Building a live image with live-wrapper is easy:

sudo apt install live-wrapper
sudo lwr

This will build you a file named output.iso in the current directory containing a minimal live image. You can then test this in QEMU:

qemu-system-x86_64 -m 2G -cdrom output.iso

You can find the latest documentation for live-wrapper here, and any feedback you have is appreciated. So far it looks like booting from CD and USB with both ISOLINUX (BIOS) and GRUB (EFI) is working as expected on real hardware.

The second announcement that I wanted to accompany this announcement is that we will be running a vmdebootstrap sprint where we will be working on live-wrapper at the MiniDebConf in Cambridge. I will be working on installer integration while Ana Custura will be investigating bootloaders and their customisation. I’d like to thank the Debian Project and those who have given donations to it for supporting our travel and accommodation costs for this sprint.

Jaldhar Vyas: Dawkins Weasel

29 October, 2016 - 09:59

It's already Dhan Terash, so I had better pick up the pace if I want to finish my blogging challenge before Diwali. In this post I'll discuss a program I wrote earlier this year.

I dread looking up anything on Wikipedia because I always end up going down a rabbit hole and surfacing hours later on a totally unrelated topic. Case in point: some months ago I ended up on the page this post is titled after. It describes an interesting little experiment illustrating how random selection can result in the evolution of a specific form. The algorithm is:

  1. Start with a random string of 28 characters.
  2. Make 100 copies of this string, with a 5% chance per character of that character being replaced with a random character.
  3. Compare each new string with "METHINKS IT IS LIKE A WEASEL", and give each a score (the number of letters in the string that are correct and in the correct position).
  4. If any of the new strings has a perfect score (== 28), halt.
  5. Otherwise, take the highest scoring string, and go to step 2.
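The five steps above translate almost line for line into code. Here is a minimal Python sketch (the identifiers and constants here are illustrative, not taken from the C++ program discussed below):

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "
COPIES = 100          # step 2: number of copies per generation
MUTATION_RATE = 0.05  # step 2: 5% chance per character

def score(candidate: str) -> int:
    # Step 3: count characters that are correct and in the correct position.
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(parent: str) -> str:
    # Step 2: each character independently has a 5% chance of being replaced.
    return "".join(
        random.choice(ALPHABET) if random.random() < MUTATION_RATE else c
        for c in parent
    )

def weasel() -> int:
    # Step 1: start with a random string of 28 characters.
    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    generations = 0
    # Steps 4-5: keep the highest-scoring copy until the target is matched.
    while score(parent) < len(TARGET):
        parent = max((mutate(parent) for _ in range(COPIES)), key=score)
        generations += 1
    return generations
```

The selection step is what makes this converge so quickly compared with blind random search: only the best copy of each generation survives.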

I had to try this myself so I wrote a little implementation in C++. A sample run looks like this:

$ ./weasel

My program lets you adjust the input string, the number of copies, and the mutation threshold. I also thought it might be interesting to implement the Generator design pattern. In C++ this is done by making a class which implements begin() and end() methods and at least a forward iterator.
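For comparison, the same pattern is much lighter-weight in Python, where a class whose __iter__ method is a generator function stands in for the C++ begin()/end() pair and iterator machinery (a generic illustration, not code from the program):

```python
import random

class Mutations:
    # Iterating over this object lazily yields successive mutations of a
    # seed string, one per step, without materializing them all up front.
    def __init__(self, seed: str, steps: int):
        self.seed = seed
        self.steps = steps

    def __iter__(self):
        s = self.seed
        for _ in range(self.steps):
            i = random.randrange(len(s))
            s = s[:i] + random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ ") + s[i + 1:]
            yield s

for variant in Mutations("WEASEL", 3):
    print(variant)
```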

You can find the source code on GitHub.

Russ Allbery: Term::ANSIColor 4.06

29 October, 2016 - 06:01

A small maintenance release to my Perl module for generating and manipulating ANSI color escape sequences.

In 4.00, I added 256-color support, using ansi0 through ansi15 for the colors that match the normal 16 colors and then rgbNNN and greyN names for the RGB and greyscale colors. One module user requested the ability to address all of the colors via ansi0 through ansi255 so that there's a consistent naming scheme for all the colors. This release adds that. (The more specific names are still returned by uncolor() when reversing escape sequences.)
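The escape sequences behind those names are just standard indexed-color SGR codes. A quick sketch of the mapping (in Python, not Perl, and not the module's own API):

```python
# ANSI 256-color SGR escape sequences: "\x1b[38;5;Nm" sets foreground color N.
# Indices 0-15 are the classic colors, 16-231 form a 6x6x6 RGB cube, and
# 232-255 are a greyscale ramp -- the same space ansi0..ansi255 addresses.
def ansi256_fg(n: int) -> str:
    if not 0 <= n <= 255:
        raise ValueError("color index must be 0-255")
    return f"\x1b[38;5;{n}m"

RESET = "\x1b[0m"

# e.g.: print(ansi256_fg(46) + "green-ish text" + RESET)
```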

This module tends to get included all over the place, so I did spend a bit of time when preparing this release trying to determine whether adding a bunch more entries to various internal hash tables and arrays would noticeably increase memory usage, since Perl is notoriously bad about memory consumption. It does cause a small increase, but it's on the order of about 100KB, and a minimum Perl program to load the module requires about 5.5MB of memory (aie), so it wasn't enough for me to do anything about.

It does look like if I lazily added entries to the built-in hash tables and instead added some more code to calculate escape sequences on the fly, I could save about 300KB of memory usage in the module. Not sure if it's worth it given how small the memory usage is compared to Perl itself, but maybe I'll look at that later when I'm feeling like fiddling with the module again.

(Oh, and all the documentation was regenerated by DocKnot, since I'm still having fun with that. It needed a few new features, which will be in an upcoming 1.01 release.)

You can get the latest version from the Term::ANSIColor distribution page.

Matthew Garrett: Of course smart homes are targets for hackers

29 October, 2016 - 00:23
The Wirecutter, an in-depth comparative review site for various electrical and electronic devices, just published an opinion piece on whether users should be worried about security issues in IoT devices. The summary: avoid devices that don't require passwords (or don't force you to change a default) and devices that want you to disable security, follow general network security best practices, but otherwise don't worry: criminals aren't likely to target you.

This is terrible, irresponsible advice. It's true that most users aren't likely to be individually targeted by random criminals, but that's a poor threat model. As I've mentioned before, you need to worry about people with an interest in you. Making purchasing decisions based on the assumption that you'll never end up dating someone with enough knowledge to compromise a cheap IoT device (or even meeting an especially creepy one in a bar) is not safe, and giving advice that doesn't take that into account is a huge disservice to many potentially vulnerable users.

Of course, there's also the larger question raised by last week's problems. Insecure IoT devices still pose a threat to the wider internet, even if the owner's data isn't at risk. I may not be optimistic about the ease of fixing this problem, but that doesn't mean we should just give up. It is important that we improve the security of devices, and many vendors are just bad at that.

So, here's a few things that should be a minimum when considering an IoT device:
  • Does the vendor publish a security contact? (If not, they don't care about security)
  • Does the vendor provide frequent software updates, even for devices that are several years old? (If not, they don't care about security)
  • Has the vendor ever denied a security issue that turned out to be real? (If so, they care more about PR than security)
  • Is the vendor able to provide the source code to any open source components they use? (If not, they don't know which software is in their own product and so don't care about security, and also they're probably infringing my copyright)
  • Do they mark updates as fixing security bugs? (If not, they care more about hiding security issues than fixing them)
  • Has the vendor ever threatened to prosecute a security researcher? (If so, again, they care more about PR than security)
  • Does the vendor provide a public minimum support period for the device? (If not, they don't care about security or their users)

I've worked with big name vendors who did a brilliant job here. I've also worked with big name vendors who responded with hostility when I pointed out that they were selling a device with arbitrary remote code execution. Going with brand names is probably a good proxy for many of these requirements, but it's insufficient.

So here are my recommendations to The Wirecutter: talk to a wide range of security experts about the issues that users should be concerned about, and figure out how to test these things yourself. Don't just ask vendors whether they care about security; ask them what their processes and procedures look like. Look at their history. And don't assume that just because nobody's interested in you, everybody else's level of risk is equal.

Alessio Treglia: The logical contradictions of the Universe

28 October, 2016 - 19:16

Is Erwin Schrödinger’s wave function – which did for the atomic and subatomic world something altogether similar to what Newton did for the macroscopic world – an objective reality or just subjective knowledge? Physicists, philosophers and epistemologists have long debated this matter. In 1960, theoretical physicist Eugene Wigner proposed that the observer’s consciousness is the dividing line that triggers the collapse of the wave function[1], and this theory was later taken up and developed in recent years. “The rules of quantum mechanics are correct but there is only one system which may be treated with quantum mechanics, namely the entire material world. There exist external observers which cannot be treated within quantum mechanics, namely human (and perhaps animal) minds, which perform measurements on the brain causing wave function collapse”[2].

The English mathematical physicist and philosopher of science Roger Penrose developed the hypothesis called Orch-OR (Orchestrated objective reduction), according to which consciousness originates from processes within neurons, rather than from the connections between neurons (the conventional view). The mechanism is believed to be a quantum physical process called objective reduction which is orchestrated by the molecular structures of the microtubules of brain cells (which constitute the cytoskeleton of the cells themselves). Together with the physician Stuart Hameroff, Penrose has suggested a direct relationship between the quantum vibrations of microtubules and the formation of consciousness.

Read More… [by Fabio Marzocca]

Jaldhar Vyas: Get Ready For Bikini Season With These n Weird Tricks

28 October, 2016 - 15:04

It all started last June when my son had his Janoi (Yagnopavita) ceremony -- the ritual by which a Brahmana boy becomes "twice-born" and eligible to study the Vedas. As well as a profound religious experience, it is also an important social occasion with a reception for as many friends and family as can attend. (I think our final guest total was ~250.) This meant new outfits for everyone, which might be exciting for some people but not me. I still don't know why I couldn't just keep wearing the khes and pitambar from the puja but no, apparently that's a faux pas. So I relented and agreed to wear my "darbari" suit from my wedding. And it didn't fit. I knew I had gained some weight in the intervening 17 years but the thing was sitcom levels of too small. I ended up having to purchase a new one, a snazzy (and shiny!) maroon number with gold stripes (or were they black stripes?). Problem having been solved, much was eaten, more weight was gained, and then I forgot about the whole thing.

Tip 1: Actually Do Something.

I have over the years tried to improve my physical condition but it has never gotten very far. For instance, I have a treadmill/coatrack, and a couple of years ago I began using it in earnest. I got to the point where I actually ran a 10K race without dying. But I did not train systematically and I ended up in some pain, which caused me to stop working out for a while, and then I never got around to restarting. Diets have also failed because I don't have a clear idea of what and how much I am eating. All I know is that women go into the kitchen and when they come out they have food. By what eldritch process this occurs is a mystery; I just eat what's given to me, thankful that the magic happens. Once I was moved to try and help but quickly fell afoul of the lack of well-defined algorithms in Gujarati home cooking.

"How much saffron should I add?" "This much." "How much is this much in SI units?" "You're annoying me. Get out."

Fast forward to March of this year. For my birthday, my wife got me a Fitbit fitness tracker. This is what I had needed all this time. It measures heart rate, distance travelled, time slept and several other pieces of info you can use to really plan a fitness regimen rationally. For example, I was chagrined to learn that sometimes when I'm at the computer, I am so immobile that the Fitbit thought I was asleep. So I started planning to take more frequent breaks. (A recent firmware upgrade has added the ability to nudge you to walk at least 250 paces each daytime hour, which is handy for this.) Also, by checking my heart rate I discovered that when I went on the treadmill I ran too fast, thereby stressing my body for little gain, and then ended up going too slow to get much aerobic effect. Now I can pace myself appropriately for maximum cardiac efficiency without ending up injuring myself and giving up. I also get a little more activity each day by simple changes such as taking the stairs instead of the lift, and instead of getting off at the 14th street PATH I go all the way to 34th street and walk down.

Tip 2: You must have data in order to see what you did right or wrong and to plan what you need to do moving forward.

One caveat about these fitness trackers: they are not anywhere near as accurate as a proper checkup from a doctor who specializes in such things. If you want to do any kind of pro or amateur athletics you probably should not rely on them, but for the average shlub who just wants to avoid appearing on the news being winched off his sofa by the fire brigade they are good enough.

Another practice I began was keeping a food diary. It can be a real eye-opener to see how much you are actually eating. It is probably much more than you thought. I am fortunate that my diet is pretty good to begin with: vegetarian (not vegan; Hindus eat dairy products), mostly home-cooked with fresh ingredients, not fried or processed, and I don't drink alcohol. However there were a few optimizations I could make. I drink a lot of soda; at least two cans a day. I really ought to stop altogether, but in lieu of that I have at least switched from Coke to Coke Zero, thereby saving a lot of empty calories. I now eat four rotlis with my dinner instead of six. We as a family eat more green vegetables instead of potatoes, skim milk instead of whole fat, canola oil instead of corn oil, and less rice, and we don't slather ghee on everything quite so much.

One entirely new practice I've adopted that may seem faddish but works for me is intermittent fasting. The idea is to steadily train your body to need less food by eating all your day's allowed amount of calories during a 6-8 hour window and not eating at all during the remaining time. It's hard for many people to get used to, but I fast at least 2-3 times a month for religious reasons anyway, so I adapted pretty quickly.

The Fitbit tells me how many calories I am expending and how many I can eat to maintain a healthy level of weight loss, but other than that I don't bother with "food groups" or specific diets such as paleo or low-carb etc. As long as what you eat is reasonably balanced and you are burning more calories than you are adding, it should be enough for weight loss. Indeed, from the end of March to now, I've lost 3 stone (20 kg) even with the occasional "cheat" day.

Tip 3: All published diets are bullshit without scientifically proven efficacy. Don't bother with them. Experiment instead and see what works for you and your metabolism. As long as you are getting all the proper nutrients (you shouldn't need a supplement unless you have an actual medical condition) and you have a net calorie deficit, it's all good. If you eat food you enjoy, you are more likely to stick to your diet.

The proper amount of sleep is one area of a healthy lifestyle I am still doing poorly in, and the reasons are not all raven-related. I have always had problems with insomnia and was once actually diagnosed with sleep apnea. Losing weight has helped a lot, but the Fitbit is still reporting that I toss and turn a lot during the night. And that's when I'm in bed in the first place. I stay up much too late, which can also lead to subsidiary bad behaviours such as midnight snacking. It's something I need to work on.

Tip 4: Stop blogging at all hours of the night. It's not doing you any good.

So that's what I'm doing. Moving forward, I need to deal with the sleep thing and I would also like to start some program of strength training. I'm doing OK in terms of aerobic exercise but from what I've read, you also have to build up muscles to keep weight loss permanent. The difficulty is that it would involve joining a gym and then actually going to that gym, so I've put it off for now. The immediate threat is Diwali (and Thanksgiving and Christmas...). My wife bought 4 lbs of sweets today and I can feel their presence in the fridge calling to me.

Russ Allbery: DocKnot 1.00

27 October, 2016 - 12:07

I'm a bit of a perfectionist about package documentation, and I'm also a huge fan of consistency. As I've slowly accumulated more open source software packages (alas, fewer new ones these days since I have less day-job time to work on them), I've developed a standard format for package documentation files, particularly the README in the package and the web pages I publish. I've iterated on these, tweaking them and messing with them, trying to incorporate all my accumulated wisdom about what information people need.

Unfortunately, this gets very tedious, since I have dozens of packages at this point and rarely want to go through the effort of changing every one of them every time I come up with a better way of phrasing something or change some aspect of my standard package build configuration. I also have the same information duplicated in multiple locations (the README and the web page for the package). And there's a lot of boilerplate that's common for all of my packages that I don't want to keep copying (or changing when I do things like change all URLs to HTTPS).

About three years ago, I started seriously brainstorming ways of automating this process. I made a start on it during one self-directed day at my old job at Stanford, but only got it far enough to generate a few basic files. Since then, I keep thinking about it, keep wishing I had it, and keep not putting the work into finishing it.

During this vacation, after a week and a half of relaxing and reading, I finally felt like doing a larger development project and finally started working on this for long enough to build up some momentum. Two days later, and this is finally ready for an initial release.

DocKnot uses metadata (which I'm putting in docs/metadata) that's mostly JSON plus some documentation fragments and generates README, the web page for the package (in thread, the macro language I use for all my web pages), and (the other thing I've wanted to do and didn't want to tackle without this tool), a Markdown version of README that will look nice on GitHub.

The templates that come with the package are all rather specific to me, particularly the thread template which would be unusable by anyone else. I have no idea if anyone else will want to use this package (and right now the metadata format is entirely undocumented). But since it's a shame to not release things as free software, and since I suspect I may need to upload it to Debian since, technically, this tool is required to "build" the README file distributed with my packages, here it is. I've also uploaded it to CPAN (it's my first experiment with the App::* namespace for things that aren't really meant to be used as a standalone Perl module).

You can find the latest version from the DocKnot distribution page (which is indeed generated with DocKnot). Also now generated with DocKnot are the rra-c-util and C TAP Harness distribution pages. Let me know if you see anything weird; there are doubtless still a few bugs.

Christoph Egger: Installing a python systemd service?

26 October, 2016 - 18:16

As web search engines and IRC seem to be of no help, maybe someone here has a helpful idea. I have a service written in python that comes with a .service file for systemd. I now want to build & install a working service file from the software's setup.py. I can override the build/build_py commands of setuptools, however that way I still lack knowledge of the bindir/prefix where my service script will be installed. Ideas?

Steinar H. Gunderson: Why does software development take so long?

26 October, 2016 - 14:30

Nageru 1.4.0 is out (and on its way through the Debian upload process right now), so now you can do live video mixing with multichannel audio to your heart's content. I've already blogged about most of the interesting new features, so instead, I'm trying to answer a question: What took so long?

To be clear, I'm not saying 1.4.0 took more time than I really anticipated (on the contrary, I pretty much understood the scope from the beginning, and there was a reason why I didn't go for building this stuff into 1.0.0); but if you just look at the changelog from the outside, it's not immediately obvious why “multichannel audio support” should take the better part of three months of development. What I'm going to say is of course going to be obvious to most software developers, but not everyone is one, and perhaps my experiences will be illuminating.

Let's first look at some obvious things that aren't the case: First of all, development is not primarily limited by typing speed. There are about 9,000 lines of new code in 1.4.0 (depending a bit on how you count), and if it were just about typing them in, I would be done in a day or two. On a good keyboard, I can type plain text at more than 800 characters per minute—but you hardly ever write code for even a single minute at that speed. Just as when writing a novel, most time is spent thinking, not typing.

I also didn't spend a lot of time backtracking; most code I wrote actually ended up in the finished product, as opposed to being thrown away. (I'm not as lucky in all of my projects.) It's pretty common to do so if you're in an exploratory phase, but in this case, I had a pretty good idea of what I wanted to do right from the start, and that plan seemed to work. This wasn't a difficult project per se; it just needed to be done (which, in a sense, just increases the mystery).

However, even if this isn't at the forefront of science in any way (most code in the world is pretty pedestrian, after all), there are still a lot of decisions to make, on several levels of abstraction. And a lot of those decisions depend on information gathering beforehand. Let's take a look at an example from late in the development cycle, namely support for using MIDI controllers instead of the mouse to control the various widgets.

I've kept a pretty meticulous TODO list; it's just a text file on my laptop, but it serves the purpose of a ghetto bugtracker. For 1.4.0, it contains 83 work items (a single-digit number of them are not ticked off, mostly because I decided not to do those things), which corresponds roughly 1:2 to the number of commits. So let's have a look at what went into the ~20 MIDI controller items.

First of all, to allow MIDI controllers to influence the UI, we need a way of getting to it. Since Nageru is single-platform on Linux, ALSA is the obvious choice (if not, I'd probably have to look for a library to put in-between), but seemingly, ALSA has two interfaces (raw MIDI and sequencer). Which one do you want? It sounds like raw MIDI is what we want, but actually, it's the sequencer interface (it does more of the MIDI parsing for you, and generally is friendlier).

The first question is where to start picking events from. I went the simplest path and just said I wanted all events—anything else would necessitate a UI, a command-line flag, figuring out if we wanted to distinguish between different devices with the same name (and not all devices potentially even have names), and so on. But how do you enumerate devices? (Relatively simple, thankfully.) What do you do if the user inserts a new one while Nageru is running? (Turns out there's a special device you can subscribe to that will tell you about new devices.) What if you get an error on subscription? (Just print a warning and ignore it; it's legitimate not to have access to all devices on the system. By the way, for PCM devices, all of these answers are different.)

So now we have a sequencer device, how do we get events from it? Can we do it in the main loop? Turns out it probably doesn't integrate too well with Qt, but it's easy enough to put it in a thread. The class dealing with the MIDI handling now needs locking; what mutex granularity do we want? (Experience will tell you that you nearly always just want one mutex. Two mutexes give you all sorts of headaches with ordering them, and nearly never give any gain.) ALSA expects us to poll() a given set of descriptors for data, but on shutdown, how do you break out of that poll to tell the thread to go away? (The simplest way on Linux is using an eventfd.)
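The break-out-of-poll() trick generalizes well beyond ALSA. A rough Python illustration (not Nageru code) of the portable self-pipe variant, where an eventfd fills the same role on Linux:

```python
import os
import select
import threading

class EventLoop:
    """Illustrative self-pipe pattern: poll() waits on the real work
    descriptors plus a wake-up fd; writing one byte to the wake-up fd
    breaks the thread out of poll() so it can shut down cleanly
    (Linux's eventfd serves the same purpose with a single fd)."""

    def __init__(self):
        self.wake_r, self.wake_w = os.pipe()
        self.stopped = threading.Event()

    def run(self):
        poller = select.poll()
        poller.register(self.wake_r, select.POLLIN)
        # Real code would also register the ALSA sequencer's descriptors here.
        while not self.stopped.is_set():
            for fd, _ in poller.poll():
                if fd == self.wake_r:
                    os.read(self.wake_r, 1)  # drain the wake-up byte
                    self.stopped.set()

    def stop(self):
        os.write(self.wake_w, b"x")

loop = EventLoop()
t = threading.Thread(target=loop.run)
t.start()
loop.stop()
t.join(timeout=5)
```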

    There's a quirk where if you get two or more MIDI messages right after each other and only read one, poll() won't trigger to alert you there are more left. Did you know that? (I didn't. I also can't find it documented. Perhaps it's a bug?) It took me some looking into sample code to find it. Oh, and ALSA uses POSIX error codes to signal errors (like “nothing more is available”), but it doesn't use errno.

    OK, so you have events (like “controller 3 was set to value 47”); what do you do about them? The meaning of the controller numbers is different from device to device, and there's no open format for describing them. So I had to make a format describing the mapping; I used protobuf (I have lots of experience with it) to make a simple text-based format, but it's obviously a nightmare to set up 50+ controllers by hand in a text file, so I had to make a UI for this. My initial thought was making a grid of spinners (similar to how the input mapping dialog already worked), but then I realized that there isn't an easy way to make headlines in Qt's grid. (You can substitute a label widget for a single cell, but not for an entire row. Who knew?) So after some searching, I found out that it would be better to have a tree view (Qt Creator does this), and then you can treat that more-or-less as a table for the rows that should be editable.

    Of course, guessing controller numbers is impossible even in an editor, so I wanted it to respond to MIDI events. This means the editor needs to take over the role as MIDI receiver from the main UI. How do you do that in a thread-safe way? (Reuse the existing mutex; you don't generally want to use atomics for complicated things.) Thinking about it, shouldn't the MIDI mapper just support multiple receivers at a time? (Doubtful; you don't want your random controller fiddling during setup to actually influence the audio on a running stream. And would you use the old or the new mapping?)

    And do you really need to set up every single controller for each bus, given that the mapping is pretty much guaranteed to be similar for them? Making a “guess bus” button doesn't seem too difficult, where if you have one correctly set up controller on the bus, it can guess from a neighboring bus (assuming a static offset). But what if there's conflicting information? OK; then you should disable the button. So now the enable/disable status of that button depends on which cell in your grid has the focus; how do you get at those events? (Install an event filter, or subclass the spinner.) And so on, and so on, and so on.

    You could argue that most of these questions go away with experience; if you're an expert in a given API, you can answer most of them in a minute or two even if you haven't heard the exact question before. But you can't expect even experienced developers to be experts in all possible libraries; if you know everything there is to know about Qt, ALSA, x264, ffmpeg, OpenGL, VA-API, libusb, microhttpd and Lua (in addition to C++11, of course), I'm sure you'd be a great fit for Nageru, but I'd wager that very few developers fit that bill. I've written C++ for almost 20 years now (almost ten of them professionally), and that experience certainly helps boost productivity, but I can't say I expect a 10x reduction in my own development time at any point.

    You could also argue, of course, that spending so much time on the editor is wasted, since most users will only ever see it once. But here's the point: it's not actually a lot of time. The only reason why it seems like so much is that I bothered to write two paragraphs about it; it's not a particular pain point, it just adds to the total. Also, the first impression matters a lot—if the user can't get the editor to work, they also can't get the MIDI controller to work, and are likely to just go do something else.

    A common misconception is that just switching languages or using libraries will help you a lot. (Witness the never-ending stream of software that advertises “written in Foo” or “uses Bar” as if it were a feature.) For the former, note that nothing I've said so far is specific to my choice of language (C++), and I've certainly avoided a bunch of battles by making that specific choice over, say, Python. For the latter, note that most of these problems are actually related to library use—libraries are great, and they solve a bunch of problems I'm really glad I didn't have to worry about (how should each button look?), but they still give their own interaction problems. And even when you're a master of your chosen programming environment, things still take time, because you have all those decisions to make on top of your libraries.

    Of course, there are cases where libraries really solve your entire problem and your code gets reduced to 100 trivial lines, but that's really only when you're solving a problem that's been solved a million times before. Congrats on making that blog in Rails; I'm sure you're advancing the world. (To make things worse, usually this breaks down when you want to stray ever so slightly from what was intended by the library or framework author. What seems like a perfect match can suddenly become a development trap where you spend more of your time trying to become an expert in working around the given library than actually doing any development.)

    The entire thing reminds me of the famous essay No Silver Bullet by Fred Brooks, but perhaps even more so, this quote from John Carmack's .plan has stuck with me (incidentally about mobile game development in 2006, but the basic story still rings true):

    To some degree this is already the case on high end BREW phones today. I have a pretty clear idea what a maxed out software renderer would look like for that class of phones, and it wouldn't be the PlayStation-esq 3D graphics that seems to be the standard direction. When I was doing the graphics engine upgrades for BREW, I started along those lines, but after putting in a couple days at it I realized that I just couldn't afford to spend the time to finish the work. "A clear vision" doesn't mean I can necessarily implement it in a very small integral number of days.

    In a sense, programming is all about what your program should do in the first place. The “how” question is just the “what”, moved down the chain of abstractions until it ends up where a computer can understand it, and at that point, the three words “multichannel audio support” have become those 9,000 lines that describe in perfect detail what's going on.

    Daniel Pocock: FOSDEM 2017 Real-Time Communications Call for Participation

    26 October, 2016 - 13:39

    FOSDEM is one of the world's premier meetings of free software developers, with over five thousand people attending each year. FOSDEM 2017 takes place 4-5 February 2017 in Brussels, Belgium.

    This email contains information about:

    • Real-Time communications dev-room and lounge,
    • speaking opportunities,
    • volunteering in the dev-room and lounge,
    • related events around FOSDEM, including the XMPP summit,
    • social events (the legendary FOSDEM Beer Night and Saturday night dinners provide endless networking opportunities),
    • the Planet aggregation sites for RTC blogs
    Call for participation - Real Time Communications (RTC)

    The Real-Time dev-room and Real-Time lounge is about all things involving real-time communication, including: XMPP, SIP, WebRTC, telephony, mobile VoIP, codecs, peer-to-peer, privacy and encryption. The dev-room is a successor to the previous XMPP and telephony dev-rooms. We are looking for speakers for the dev-room and volunteers and participants for the tables in the Real-Time lounge.

    The dev-room is only on Saturday, 4 February 2017. The lounge will be present for both days.

    To discuss the dev-room and lounge, please join the FSFE-sponsored Free RTC mailing list.

    To be kept aware of major developments in Free RTC, without being on the discussion list, please join the Free-RTC Announce list.

    Speaking opportunities

    Note: if you used FOSDEM Pentabarf before, please use the same account/username

    Real-Time Communications dev-room: deadline 23:59 UTC on 17 November. Please use the Pentabarf system to submit a talk proposal for the dev-room. On the "General" tab, please look for the "Track" option and choose "Real-Time devroom". Link to talk submission.

    Other dev-rooms and lightning talks: some speakers may find their topic is in the scope of more than one dev-room. You are encouraged to apply to more than one dev-room and also to consider proposing a lightning talk, but please be kind enough to tell us if you do this by filling out the notes in the form.

    You can find the full list of dev-rooms on this page and apply for a lightning talk at

    Main track: the deadline for main track presentations is 23:59 UTC 31 October. Leading developers in the Real-Time Communications field are encouraged to consider submitting a presentation to the main track.

    First-time speaking?

    FOSDEM dev-rooms are a welcoming environment for people who have never given a talk before. Please feel free to contact the dev-room administrators personally if you would like to ask any questions about it.

    Submission guidelines

    The Pentabarf system will ask for many of the essential details. Please remember to re-use your account from previous years if you have one.

    In the "Submission notes", please tell us about:

    • the purpose of your talk
    • any other talk applications (dev-rooms, lightning talks, main track)
    • availability constraints and special needs

    You can use HTML and links in your bio, abstract and description.

    If you maintain a blog, please consider providing us with the URL of a feed with posts tagged for your RTC-related work.

    We will be looking for relevance to the conference and dev-room themes: presentations aimed at developers of free and open source software, on RTC-related topics.

    Please feel free to suggest a duration between 20 minutes and 55 minutes but note that the final decision on talk durations will be made by the dev-room administrators. As the two previous dev-rooms have been combined into one, we may decide to give shorter slots than in previous years so that more speakers can participate.

    Please note FOSDEM aims to record and live-stream all talks. The CC-BY license is used.

    Volunteers needed

    To make the dev-room and lounge run successfully, we are looking for volunteers:

    • FOSDEM provides video recording equipment and live streaming; volunteers are needed to assist with this
    • organizing one or more restaurant bookings (depending upon the number of participants) for the evening of Saturday, 4 February
    • participation in the Real-Time lounge
    • helping attract sponsorship funds for the dev-room to pay for the Saturday night dinner and any other expenses
    • circulating this Call for Participation (text version) to other mailing lists

    See the mailing list discussion for more details about volunteering.

    Related events - XMPP and RTC summits

    The XMPP Standards Foundation (XSF) has traditionally held a summit in the days before FOSDEM. There is discussion about a similar summit taking place on 2 and 3 February 2017. XMPP Summit web site - please join the mailing list for details.

    We are also considering a more general RTC or telephony summit, potentially in collaboration with the XMPP summit. Please join the Free-RTC mailing list and send an email if you would be interested in participating, sponsoring or hosting such an event.

    Social events and dinners

    The traditional FOSDEM beer night occurs on Friday, 3 February.

    On Saturday night, there are usually dinners associated with each of the dev-rooms. Most restaurants in Brussels are not very large, so these dinners have space constraints and reservations are essential. Please subscribe to the Free-RTC mailing list for further details about the Saturday night dinner options and how you can register for a seat.

    Spread the word and discuss

    If you know of any mailing lists where this CfP would be relevant, please forward this email (text version). If this dev-room excites you, please blog or microblog about it, especially if you are submitting a talk.

    If you regularly blog about RTC topics, please send details about your blog to the planet site administrators:

    • All projects: Free-RTC Planet (contact)
    • XMPP: Planet Jabber (contact)
    • SIP: Planet SIP (contact)
    • SIP (Español): Planet SIP-es (contact)

    Please also link to the Planet sites from your own blog or web site as this helps everybody in the free real-time communications community.


    For any private queries, contact us directly using the address and for any other queries please ask on the Free-RTC mailing list.

    The dev-room administration team:

    Joachim Breitner: Showcasing Applicative

    26 October, 2016 - 10:45

    My plan for this week’s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concept of Functor, Applicative and Monad, and to highlight the value of the Applicative abstraction.

    I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1


    In case you want to follow along, start with these imports:

    import Data.Char
    import Data.Maybe
    import Data.List
    import System.Environment
    import System.IO
    import System.Exit
    The parser

    The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the students' homework from last week:

    newtype Parser a = P (String -> Maybe (a, String))
    runParser :: Parser t -> String -> Maybe (t, String)
    runParser (P p) = p
    parse :: Parser a -> String -> Maybe a
    parse p input = case runParser p input of
        Just (result, "") -> Just result
        _ -> Nothing -- handles both no result and leftover input
    noParserP :: Parser a
    noParserP = P (\_ -> Nothing)
    pureParserP :: a -> Parser a
    pureParserP x = P (\input -> Just (x,input))
    instance Functor Parser where
        fmap f p = P $ \input -> do
            (x, rest) <- runParser p input
            return (f x, rest)
    instance Applicative Parser where
        pure = pureParserP
        p1 <*> p2 = P $ \input -> do
            (f, rest1) <- runParser p1 input
            (x, rest2) <- runParser p2 rest1
            return (f x, rest2)
    instance Monad Parser where
        return = pure
        p1 >>= k = P $ \input -> do
            (x, rest1) <- runParser p1 input
            runParser (k x) rest1
    anyCharP :: Parser Char
    anyCharP = P $ \input -> case input of
        (c:rest) -> Just (c, rest)
        []       -> Nothing
    charP :: Char -> Parser ()
    charP c = do
        c' <- anyCharP
        if c == c' then return ()
                   else noParserP
    anyCharButP :: Char -> Parser Char
    anyCharButP c = do
        c' <- anyCharP
        if c /= c' then return c'
                   else noParserP
    letterOrDigitP :: Parser Char
    letterOrDigitP = do
        c <- anyCharP
        if isAlphaNum c then return c else noParserP
    orElseP :: Parser a -> Parser a -> Parser a
    orElseP p1 p2 = P $ \input -> case runParser p1 input of
        Just r -> Just r
        Nothing -> runParser p2 input
    manyP :: Parser a -> Parser [a]
    manyP p = (pure (:) <*> p <*> manyP p) `orElseP` pure []
    many1P :: Parser a -> Parser [a]
    many1P p = pure (:) <*> p <*> manyP p
    sepByP :: Parser a -> Parser () -> Parser [a]
    sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1))) `orElseP` pure []

    A parser written using this library, for example for CSV files, could take this form:

    parseCSVP :: Parser [[String]]
    parseCSVP = manyP parseLine
      where
        parseLine = parseCell `sepByP` charP ',' <* charP '\n'
        parseCell = do
            charP '"'
            content <- manyP (anyCharButP '"')
            charP '"'
            return content
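    As a quick sanity check of how these combinators thread the remaining input, here is a self-contained sketch (not part of the original post) that inlines just the pieces it needs from above:

```haskell
newtype Parser a = P (String -> Maybe (a, String))

runParser :: Parser a -> String -> Maybe (a, String)
runParser (P p) = p

-- a parse only succeeds if the whole input is consumed
parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
    Just (result, "") -> Just result
    _                 -> Nothing

instance Functor Parser where
    fmap f p = P $ \input -> do
        (x, rest) <- runParser p input
        return (f x, rest)

instance Applicative Parser where
    pure x = P (\input -> Just (x, input))
    p1 <*> p2 = P $ \input -> do
        (f, rest1) <- runParser p1 input
        (x, rest2) <- runParser p2 rest1
        return (f x, rest2)

anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
    (c:rest) -> Just (c, rest)
    []       -> Nothing

main :: IO ()
main = do
    print (parse anyCharP "a")   -- Just 'a'
    print (parse anyCharP "ab")  -- Nothing: leftover input
    -- Applicative composition passes the leftover input left to right
    print (parse (pure (,) <*> anyCharP <*> anyCharP) "ab")
```

The last line illustrates the pattern `pure f <*> p1 <*> p2` that manyP, many1P and sepByP build on.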
    We want EBNF

    Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:

    cell = '"', {not-quote}, '"';
    line = (cell, {',', cell} | ''), newline;
    csv  = {line};

    It is straight-forward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:

    data RHS
      = Terminal String
      | NonTerminal String
      | Choice RHS RHS
      | Sequence RHS RHS
      | Optional RHS
      | Repetition RHS
      deriving (Show, Eq)
    ppRHS :: RHS -> String
    ppRHS = go 0
      where
        go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
        go _ (NonTerminal s)  = s
        go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
        go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
        go _ (Optional x)     = surround "[" "]" $ go 0 x
        go _ (Repetition x)   = surround "{" "}" $ go 0 x
        surround c1 c2 x = c1 ++ x ++ c2
        p a n | a > n     = surround "(" ")"
              | otherwise = id
        quote '\'' = "\\'"
        quote '\\' = "\\\\"
        quote c    = [c]
    type Production = (String, RHS)
    type BNF = [Production]
    ppBNF :: BNF -> String
    ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")
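    To see the pretty-printer in action before wiring it up to anything, the cell production of the CSV grammar above can be written out by hand. This is a self-contained sketch (not part of the original post) that repeats the definitions just given:

```haskell
data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x
    surround c1 c2 x = c1 ++ x ++ c2
    p a n | a > n     = surround "(" ")"
          | otherwise = id
    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]

ppBNF :: [(String, RHS)] -> String
ppBNF = unlines . map (\(i, rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")

-- the "cell" production from the EBNF example, as a value
cellProd :: (String, RHS)
cellProd = ("cell", Sequence (Terminal "\"")
                   (Sequence (Repetition (NonTerminal "not-quote"))
                             (Terminal "\"")))

main :: IO ()
main = putStr (ppBNF [cellProd])
-- prints: cell = '"', {not-quote}, '"';
```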
    Code to produce EBNF

    We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS type directly, but we can do something more nifty: we create a data type that keeps track, via a phantom type parameter, of what Haskell type the given EBNF syntax is the specification of:

    newtype Grammar a = G RHS
    ppGrammar :: Grammar a -> String
    ppGrammar (G rhs) = ppRHS rhs

    So a value of type Grammar t is a description of the textual representation of the Haskell type t.

    Here is one simple example:

    anyCharG :: Grammar Char
    anyCharG = G (NonTerminal "char")

    Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:

    charG :: Char -> Grammar ()
    charG c = G (Terminal [c])

    A combinator that creates new grammars from two existing grammars:

    orElseG :: Grammar a -> Grammar a -> Grammar a
    orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)

    We want the convenience of our well-known type classes in order to combine these values some more:

    instance Functor Grammar where
        fmap _ (G rhs) = G rhs
    instance Applicative Grammar where
        pure x = G (Terminal "")
        (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

    Note how the Functor instance does not actually use the function. How should it? There are no values inside a Grammar!

    We cannot define a Monad instance for Grammar: We would start with (G rhs1) >>= k = …, but there is simply no way of getting a value of type a that we can feed to k. So we will do without a Monad instance. This is interesting, and we will come back to that later.
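    This is the same situation as with the standard Const functor: Grammar is essentially Const RHS with a smarter way of combining the descriptions, and Const likewise has Functor and Applicative instances but no Monad. A small self-contained illustration (my example, not from the post), using Const String for simplicity:

```haskell
import Data.Functor.Const (Const (..))

-- Const String is Grammar in miniature: it stores only a description,
-- and the phantom parameter records which type the description is for.
pairDesc :: Const String (Char -> (Char, Char))
pairDesc = Const "char, "

charDesc :: Const String Char
charDesc = Const "char"

main :: IO ()
main = putStrLn (getConst (pairDesc <*> charDesc))
-- prints: char, char
```

Here <*> simply concatenates the descriptions (via the Monoid on String), and fmap ignores the function entirely, just like the Grammar instances above. A Monad instance is impossible for the same reason: (>>=) would need an actual value of the phantom type to feed to the continuation.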

    Like with the parser, we can now begin to build on the primitive example to build more complicated combinators:

    manyG :: Grammar a -> Grammar [a]
    manyG p = (pure (:) <*> p <*> manyG p) `orElseG` pure []
    many1G :: Grammar a -> Grammar [a]
    many1G p = pure (:) <*> p <*> manyG p
    sepByG :: Grammar a -> Grammar () -> Grammar [a]
    sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) `orElseG` pure []

    Let us run a small example:

    dottedWordsG :: Grammar [String]
    dottedWordsG = many1G (manyG anyCharG <* charG '.')
    *Main> putStrLn $ ppGrammar dottedWordsG
    '', ('', char, ('', char, ('', char, ('', char, ('', char, ('', …

    Oh my, that is not good. Looks like the recursion in manyG does not work well, so we need to avoid that. But anyway, we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many a primitive:

    manyG :: Grammar a -> Grammar [a]
    manyG (G rhs) = G (Repetition rhs)

    With this definition, we already get a simple grammar for dottedWordsG:

    *Main> putStrLn $ ppGrammar dottedWordsG
    '', {char}, '.', {{char}, '.'}

    This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string ('') in a sequence (…,…). We do not want that.

    Why is it there in the first place? Because our Applicative instance is not lawful! Remember that pure id <*> g == g should hold. One way to achieve that is to improve the Applicative instance to optimize this case away:

    instance Applicative Grammar where
        pure x = G (Terminal "")
        G (Terminal "") <*> G rhs2 = G rhs2
        G rhs1 <*> G (Terminal "") = G rhs1
        (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)
    Now we get what we want:
    *Main> putStrLn $ ppGrammar dottedWordsG
    {char}, '.', {{char}, '.'}

    Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative combinators, i.e. avoiding (>>=), (>>), return and do-notation:

    parseCSVP :: Parser [[String]]
    parseCSVP = manyP parseLine
      where
        parseLine = parseCell `sepByP` charP ',' <* charP '\n'
        parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'

    And now we try to rewrite the code to produce Grammar instead of Parser. This is straight-forward with the exception of anyCharButP. The parser code for that is inherently monadic, and we just do not have a monad instance. So we work around the issue by making that a “primitive” grammar, i.e. introducing a non-terminal in the EBNF without a production rule – pretty much like we did for anyCharG:

    primitiveG :: String -> Grammar a
    primitiveG s = G (NonTerminal s)

    parseCSVG :: Grammar [[String]]
    parseCSVG = manyG parseLine
      where
        parseLine = parseCell `sepByG` charG ',' <* charG '\n'
        parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

    Of course the names parse… are not quite right any more, but let us just leave that for now.

    Here is the result:

    *Main> putStrLn $ ppGrammar parseCSVG
    {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), '
    '}

    The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n' with newlineG:

    newlineG :: Grammar ()
    newlineG = primitiveG "newline"

    Now we get

    *Main> putStrLn $ ppGrammar parseCSVG
    {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline}

    which is nice and correct, but still not quite the easily readable EBNF that we saw further up.

    Code to produce EBNF, with productions

    We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce an RHS that may refer to other productions. So let us change the type accordingly:

    newtype Grammar a = G (BNF, RHS)
    runGrammer :: String -> Grammar a -> BNF
    runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]
    ppGrammar :: String -> Grammar a -> String
    ppGrammar main g = ppBNF $ runGrammer main g

    Now we have to adjust all our primitive combinators (but not the derived ones!):

    charG :: Char -> Grammar ()
    charG c = G ([], Terminal [c])
    anyCharG :: Grammar Char
    anyCharG = G ([], NonTerminal "char")
    manyG :: Grammar a -> Grammar [a]
    manyG (G (prods, rhs)) = G (prods, Repetition rhs)
    mergeProds :: [Production] -> [Production] -> [Production]
    mergeProds prods1 prods2 = nub $ prods1 ++ prods2
    orElseG :: Grammar a -> Grammar a -> Grammar a
    orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
        = G (mergeProds prods1 prods2, Choice rhs1 rhs2)
    instance Functor Grammar where
        fmap _ (G bnf) = G bnf
    instance Applicative Grammar where
        pure x = G ([], Terminal "")
        G (prods1, Terminal "") <*> G (prods2, rhs2)
            = G (mergeProds prods1 prods2, rhs2)
        G (prods1, rhs1) <*> G (prods2, Terminal "")
            = G (mergeProds prods1 prods2, rhs1)
        G (prods1, rhs1) <*> G (prods2, rhs2)
            = G (mergeProds prods1 prods2, Sequence rhs1 rhs2)
    primitiveG :: String -> Grammar a
    primitiveG s = G ([], NonTerminal s)

    The use of nub when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.
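    As a reminder of what nub from Data.List buys us here: duplicates are dropped, keeping the first occurrence of each production. A trivial self-contained illustration (not from the post):

```haskell
import Data.List (nub)

-- duplicate productions arise when e.g. "cell" is referenced
-- from two different places in the grammar
main :: IO ()
main = print (nub [("cell", "..."), ("line", "..."), ("cell", "...")])
-- prints: [("cell","..."),("line","...")]
```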

    Did we gain anything? Not yet:

    *Main> putStr $ ppGrammar "csv" (parseCSVG)
    csv = {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline};

    But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:

    nonTerminal :: String -> Grammar a -> Grammar a
    nonTerminal name (G (prods, rhs))
      = G (prods ++ [(name, rhs)], NonTerminal name)

    Ample use of this in parseCSVG yields the desired result:

    parseCSVG :: Grammar [[String]]
    parseCSVG = manyG parseLine
      where
        parseLine = nonTerminal "line" $
            parseCell `sepByG` charG ',' <* newlineG
        parseCell = nonTerminal "cell" $
            charG '"' *> manyG (primitiveG "not-quote") <* charG '"'
    *Main> putStr $ ppGrammar "csv" (parseCSVG)
    cell = '"', {not-quote}, '"';
    line = (cell, {',', cell} | ''), newline;
    csv = {line};

    This is great!

    Unifying parsing and grammar-generating

    Note how similar parseCSVG and parseCSVP are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!

    And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser and Grammar differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char and orElse. It includes many, too: although manyP is not primitive, manyG is. It also includes nonTerminal, which does not exist in the world of parsers (yet), but we need it for the grammars.

    The primitiveG function is tricky. We use it in grammars when the code that we use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: a String, used as a descriptive non-terminal in a grammar, and a Parser a, used in the parsing code.

    Finally, the type class that we expect, Applicative (and thus Functor), is added as a constraint on our type class:

    class Applicative f => Descr f where
        char :: Char -> f ()
        many :: f a -> f [a]
        orElse :: f a -> f a -> f a
        primitive :: String -> Parser a -> f a
        nonTerminal :: String -> f a -> f a

    The instances are easily written:

    instance Descr Parser where
        char = charP
        many = manyP
        orElse = orElseP
        primitive _ p = p
        nonTerminal _ p = p
    instance Descr Grammar where
        char = charG
        many = manyG
        orElse = orElseG
        primitive s _ = primitiveG s
        nonTerminal name (G (prods, rhs))
            = G (prods ++ [(name, rhs)], NonTerminal name)

    And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:

    many1 :: Descr f => f a -> f [a]
    many1 p = pure (:) <*> p <*> many p
    anyChar :: Descr f => f Char
    anyChar = primitive "char" anyCharP
    dottedWords :: Descr f => f [String]
    dottedWords = many1 (many anyChar <* char '.')
    sepBy :: Descr f => f a -> f () -> f [a]
    sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) `orElse` pure []
    newline :: Descr f => f ()
    newline = primitive "newline" (charP '\n')

    And thus we now have our CSV parser/grammar generator:

    parseCSV :: Descr f => f [[String]]
    parseCSV = many parseLine
      where
        parseLine = nonTerminal "line" $
            parseCell `sepBy` char ',' <* newline
        parseCell = nonTerminal "cell" $
            char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'

    We can now use this definition both to parse and to generate grammars:

    *Main> putStr $ ppGrammar2 "csv" (parseCSV)
    cell = '"', {not-quote}, '"';
    line = (cell, {',', cell} | ''), newline;
    csv = {line};
    *Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
    Just [["ab","cd"],["","de"],[]]
    The INI file parser and grammar

    As a final exercise, let us transform the INI file parser into a combined thing. Here is the parser (another artifact of last week’s homework) again using applicative style2:

    -- INIFile itself is not repeated in this post; this shape is
    -- reconstructed from how the parser below uses it
    type INIFile = [(String, [(String, String)])]

    parseINIP :: Parser INIFile
    parseINIP = many1P parseSection
      where
        parseSection =
            (,) <$  charP '['
                <*> parseIdent
                <*  charP ']'
                <*  charP '\n'
                <*> (catMaybes <$> manyP parseLine)
        parseIdent = many1P letterOrDigitP
        parseLine = parseDecl `orElseP` parseComment `orElseP` parseEmpty
        parseDecl = Just <$> (
            (,) <$> parseIdent
                <*  manyP (charP ' ')
                <*  charP '='
                <*  manyP (charP ' ')
                <*> many1P (anyCharButP '\n')
                <*  charP '\n')
        parseComment =
            Nothing <$ charP '#'
                    <* many1P (anyCharButP '\n')
                    <* charP '\n'
        parseEmpty = Nothing <$ charP '\n'

    Transforming that to a generic description is quite straight-forward. We use primitive again to wrap letterOrDigitP:

    descrINI :: Descr f => f INIFile
    descrINI = many1 parseSection
      where
        parseSection =
            (,) <$  char '['
                <*> parseIdent
                <*  char ']'
                <*  newline
                <*> (catMaybes <$> many parseLine)
        parseIdent = many1 (primitive "alphanum" letterOrDigitP)
        parseLine = parseDecl `orElse` parseComment `orElse` parseEmpty
        parseDecl = Just <$> (
            (,) <$> parseIdent
                <*  many (char ' ')
                <*  char '='
                <*  many (char ' ')
                <*> many1 (primitive "non-newline" (anyCharButP '\n'))
                <*  newline)
        parseComment =
            Nothing <$ char '#'
                    <* many1 (primitive "non-newline" (anyCharButP '\n'))
                    <* newline
        parseEmpty = Nothing <$ newline

    This yields this not very helpful grammar (abbreviated here):

    *Main> putStr $ ppGrammar2 "ini" descrINI
    ini = '[', alphanum, {alphanum}, ']', newline, {alphanum, {alphanum}, {' '}…

    But with a few uses of nonTerminal, we get something really nice:

    descrINI :: Descr f => f INIFile
    descrINI = many1 parseSection
      where
        parseSection = nonTerminal "section" $
            (,) <$  char '['
                <*> parseIdent
                <*  char ']'
                <*  newline
                <*> (catMaybes <$> many parseLine)
        parseIdent = nonTerminal "identifier" $
            many1 (primitive "alphanum" letterOrDigitP)
        parseLine = nonTerminal "line" $
            parseDecl `orElse` parseComment `orElse` parseEmpty
        parseDecl = nonTerminal "declaration" $ Just <$> (
            (,) <$> parseIdent
                <*  spaces
                <*  char '='
                <*  spaces
                <*> remainder)
        parseComment = nonTerminal "comment" $
            Nothing <$ char '#' <* remainder
        remainder = nonTerminal "line-remainder" $
            many1 (primitive "non-newline" (anyCharButP '\n')) <* newline
        parseEmpty = Nothing <$ newline
        spaces = nonTerminal "spaces" $ many (char ' ')
    *Main> putStr $ ppGrammar "ini" descrINI
    identifier = alphanum, {alphanum};
    spaces = {' '};
    line-remainder = non-newline, {non-newline}, newline;
    declaration = identifier, spaces, '=', spaces, line-remainder;
    comment = '#', line-remainder;
    line = declaration | comment | newline;
    section = '[', identifier, ']', newline, {line};
    ini = section, {section};
    Recursion (variant 1)

    What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:

    const = digit, {digit};
    spaces = {' ' | newline};
    atom = const | '(', spaces, expr, spaces, ')', spaces;
    mult = atom, {spaces, '*', spaces, atom}, spaces;
    plus = mult, {spaces, '+', spaces, mult}, spaces;
    expr = plus;

    The production of expr is recursive (via plus, mult, atom). We have seen above that simply defining a Grammar a recursively does not go well.

    One solution is to add a new combinator for explicit recursion, which replaces nonTerminal in the method:

    class Applicative f => Descr f where
        recNonTerminal :: String -> (f a -> f a) -> f a
    instance Descr Parser where
        recNonTerminal _ p = let r = p r in r
    instance Descr Grammar where
        recNonTerminal = recNonTerminalG
    recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
    recNonTerminalG name f =
        let G (prods, rhs) = f (G ([], NonTerminal name))
        in G (prods ++ [(name, rhs)], NonTerminal name)
    nonTerminal :: Descr f => String -> f a -> f a
    nonTerminal name p = recNonTerminal name (const p)
    runGrammer :: String -> Grammar a -> BNF
    runGrammer main (G (prods, NonTerminal nt)) | main == nt = prods
    runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

    The change in runGrammer avoids adding a pointless expr = expr production to the output.
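    To see that change in isolation, here is a self-contained sketch (with the RHS type stubbed down to just the constructors needed, an assumption for illustration only) of recNonTerminalG and runGrammer, applied to a toy one-production grammar:

    ```haskell
    -- Simplified stand-ins for the types from the text.
    data RHS = NonTerminal String | Terminal String
        deriving (Eq, Show)

    type Production = (String, RHS)
    type BNF = [Production]

    newtype Grammar a = G ([Production], RHS)

    -- As in the text: run the body with a bare reference to the
    -- non-terminal, then record the resulting production.
    recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
    recNonTerminalG name f =
        let G (prods, rhs) = f (G ([], NonTerminal name))
        in G (prods ++ [(name, rhs)], NonTerminal name)

    -- If the top-level right-hand side is just the main non-terminal,
    -- we already have its production and skip the redundant "expr = expr".
    runGrammer :: String -> Grammar a -> BNF
    runGrammer main (G (prods, NonTerminal nt)) | main == nt = prods
    runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

    -- A toy grammar whose only production is expr = 'x'.
    toy :: Grammar ()
    toy = recNonTerminalG "expr" (const (G ([], Terminal "'x'")))
    ```

    Running runGrammer "expr" toy yields only the single production; without the guarded first equation we would additionally emit the pointless expr = expr.
    
    
    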

    This lets us define a parser/grammar-generator for the arithmetic expressions given above:

    data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer
        deriving Show
    mkPlus :: Expr -> [Expr] -> Expr
    mkPlus = foldl Plus
    mkMult :: Expr -> [Expr] -> Expr
    mkMult = foldl Mult
    parseExpr :: Descr f => f Expr
    parseExpr = recNonTerminal "expr" $ \ exp ->
        ePlus exp
    ePlus :: Descr f => f Expr -> f Expr
    ePlus exp = nonTerminal "plus" $
        mkPlus <$> eMult exp
               <*> many (spaces *> char '+' *> spaces *> eMult exp)
               <*  spaces
    eMult :: Descr f => f Expr -> f Expr
    eMult exp = nonTerminal "mult" $
        mkMult <$> eAtom exp
               <*> many (spaces *> char '*' *> spaces *> eAtom exp)
               <*  spaces
    eAtom :: Descr f => f Expr -> f Expr
    eAtom exp = nonTerminal "atom" $
        aConst `orElse` eParens exp
    aConst :: Descr f => f Expr
    aConst = nonTerminal "const" $ Const . read <$> many1 digit
    eParens :: Descr f => f a -> f a
    eParens inner =
        id <$  char '('
           <*  spaces
           <*> inner
           <*  spaces
           <*  char ')'
           <*  spaces

    And indeed, this works:

    *Main> putStr $ ppGrammar "expr" parseExpr
    const = digit, {digit};
    spaces = {' ' | newline};
    atom = const | '(', spaces, expr, spaces, ')', spaces;
    mult = atom, {spaces, '*', spaces, atom}, spaces;
    plus = mult, {spaces, '+', spaces, mult}, spaces;
    expr = plus;
    Recursion (variant 2)

    Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal and explicitly passing around the recursive call (i.e. the exp in the example). To implement that we have to adjust our Grammar type as follows:

    newtype Grammar a = G ([String] -> (BNF, RHS))

    The idea is that the list of strings is those non-terminals that we are currently defining. So in nonTerminal, we check if the non-terminal to be introduced is currently in the process of being defined, and then simply ignore the body. This way, the recursion is stopped automatically:

    nonTerminalG :: String -> (Grammar a) -> Grammar a
    nonTerminalG name (G g) = G $ \seen ->
        if name `elem` seen
        then ([], NonTerminal name)
        else let (prods, rhs) = g (name : seen)
             in (prods ++ [(name, rhs)], NonTerminal name)

    After adjusting the other primitives of Grammar (including the Functor and Applicative instances, which now have to thread through the list of seen non-terminals) to type-check again, we observe that this parser/grammar generator for expressions, with genuine recursion, works now:

    parseExp :: Descr f => f Expr
    parseExp = nonTerminal "expr" ePlus
    ePlus :: Descr f => f Expr
    ePlus = nonTerminal "plus" $
        mkPlus <$> eMult
               <*> many (spaces *> char '+' *> spaces *> eMult)
               <*  spaces
    eMult :: Descr f => f Expr
    eMult = nonTerminal "mult" $
        mkMult <$> eAtom
               <*> many (spaces *> char '*' *> spaces *> eAtom)
               <*  spaces
    eAtom :: Descr f => f Expr
    eAtom = nonTerminal "atom" $
        aConst `orElse` eParens parseExp

    Note that the recursion is only going to work if there is at least one call to nonTerminal somewhere around the recursive calls. We still cannot implement many as naively as above.


    If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:

    identifier = letter, {letter | digit | '-'};
    spaces = {' ' | newline};
    quoted-char = non-quote-or-backslash | '\\', '\\' | '\\', '\'';
    terminal = '\'', {quoted-char}, '\'', spaces;
    non-terminal = identifier, spaces;
    option = '[', spaces, rhs, spaces, ']', spaces;
    repetition = '{', spaces, rhs, spaces, '}', spaces;
    group = '(', spaces, rhs, spaces, ')', spaces;
    atom = terminal | non-terminal | option | repetition | group;
    sequence = atom, {spaces, ',', spaces, atom}, spaces;
    choice = sequence, {spaces, '|', spaces, sequence}, spaces;
    rhs = choice;
    production = identifier, spaces, '=', spaces, rhs, ';', spaces;
    bnf = production, {production};

    This grammar is set up so that the precedence of , and | is correctly implemented: a , b | c will parse as (a, b) | c.

    In this syntax for BNF, terminal characters are quoted, i.e. inside '…', a ' is replaced by \' and a \ is replaced by \\ – this is done by the function quote in ppRHS.
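    The exact definition of quote is not shown in this section, so the following standalone version is an assumption about its behavior, reconstructed from the escaping rules just described:

    ```haskell
    -- Hypothetical reconstruction of the quote helper from ppRHS:
    -- inside '…', escape backslash and single quote, pass everything
    -- else through unchanged.
    quote :: String -> String
    quote = concatMap q
      where
        q '\'' = "\\'"   -- ' becomes \'
        q '\\' = "\\\\"  -- \ becomes \\
        q c    = [c]
    ```
    
    
    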

    If you do this, you should be able to round-trip with the pretty-printer, i.e. parse back what it wrote:

    *Main> let bnf1 = runGrammer "expr" parseExpr
    *Main> let bnf2 = runGrammer "expr" parseBNF
    *Main> let f = Data.Maybe.fromJust . parse parseBNF . ppBNF
    *Main> f bnf1 == bnf1
    *Main> f bnf2 == bnf2

    The last line is quite meta: We are using parseBNF as a parser on the pretty-printed grammar produced from interpreting parseBNF as a grammar.


    We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.

    Note that it was crucial that our combined parser/grammers are only able to use the methods of Applicative, and not Monad. Applicative is less powerful, so by giving less power to the user of our Descr interface, the other side, i.e. the implementation, can be more powerful.

    The reason why Applicative is ok, but Monad is not, is that in Applicative, the results do not affect the shape of the computation, whereas in Monad, the whole point of the bind operator (>>=) is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!
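    The contrast can be made concrete. In the following sketch (illustrative only, instantiated with Maybe rather than the Parser/Grammar pair from the text), the applicative combination has a shape that is fixed before anything runs, while the monadic one needs an actual value to decide what comes next -- a value a grammar interpreter can never supply:

    ```haskell
    -- Applicative: the two sub-descriptions are known statically, so a
    -- grammar could be read off without ever producing a value.
    staticPair :: Applicative f => f a -> f b -> f (a, b)
    staticPair pa pb = (,) <$> pa <*> pb

    -- Monad: the continuation inspects the parsed Bool to pick the next
    -- description. Fine for a parser, impossible for a grammar generator.
    dynamicChoice :: Monad f => f Bool -> f b -> f b -> f b
    dynamicChoice cond ifTrue ifFalse =
        cond >>= \b -> if b then ifTrue else ifFalse
    ```
    
    
    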

    We have also seen that a phantom type, namely the parameter of Grammar, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG ensures that both grammars that are combined here indeed describe something of the same type.

    1. It seems to be the week of applicative-appraising blog posts: Brent has posted a nice piece about enumerations using Applicative yesterday.

    2. I like how in this alignment of <*> and <* the > characters point out where the arguments are that are being passed to the function on the left.

