Planet Debian

Planet Debian - http://planet.debian.org/

Ben Hutchings: Debian LTS work, February 2017

16 March, 2017 - 11:44

I was assigned 13 hours of work by Freexian's Debian LTS initiative and carried over 15.25 hours from January. I worked 19 hours and have returned the remaining 9.25 hours to the general pool.

I prepared a security update for the Linux kernel and issued DLA-833-1. However, I spent most of my time catching up with a backlog of fixes for the Linux 3.2 longterm stable branch. I issued two stable updates (3.2.85, 3.2.86).

Dirk Eddelbuettel: RcppEigen 0.3.2.9.1

15 March, 2017 - 18:40

A new maintenance release, 0.3.2.9.1, of RcppEigen, still based on Eigen 3.2.9, is now on CRAN and will make its way into Debian soon.

This update ensures that RcppEigen and the Matrix package agree on their #define statements for the CholMod / SuiteSparse library. Thanks to Martin Maechler for the pull request. I also added a file src/init.c as now suggested (soon: requested) by the R CMD check package validation.
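
For reference, here is a minimal sketch of what such an init.c generally looks like (my illustration rather than the exact file shipped in the package, which would normally also declare and register the individual .Call entry points):

#include <R.h>
#include <Rinternals.h>
#include <R_ext/Rdynload.h>

/* Register native routines with R; paired with .registration=TRUE
   in useDynLib() in the NAMESPACE file. */
void R_init_RcppEigen(DllInfo *dll)
{
    R_registerRoutines(dll, NULL, NULL, NULL, NULL);
    R_useDynamicSymbols(dll, FALSE);
}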

The complete NEWS file entry follows.

Changes in RcppEigen version 0.3.2.9.1 (2017-03-14)
  • Synchronize CholMod header file with Matrix package to ensure binary compatibility on all platforms (Martin Maechler in #42)

  • Added file init.c with calls to R_registerRoutines() and R_useDynamicSymbols(); also use .registration=TRUE in useDynLib in NAMESPACE

Courtesy of CRANberries, there is also a diffstat report for the most recent release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Michal Čihař: Life of free software project

15 March, 2017 - 18:00

During the last week I've noticed several interesting posts about the challenges of being a free software maintainer. After sixteen years of being active in open source, I share many of the feelings I've read about, and I can also describe how I deal with them.

First of all let me link some of the other posts on the topic:

I guess everybody involved in a popular free software project knows it: there is much more work to be done than the people behind the project can handle. It really doesn't matter whether it's bug reports, support requests, new features or technical debt; it's simply too much. If you are the only one behind the project, the pressure can feel even greater.

There are several approaches to dealing with this, but you have to choose what you prefer and what is going to work for you and your project. I've used all of the approaches below on some of my projects, but I don't think there is a silver bullet.

Finding more people

Obviously, if you cannot cope with the work, you can try to find more people to do it. Unfortunately it's not that easy. People come by and contribute a few patches, but turning them into regular contributors is hard. You should encourage them to stay and to care about the part of the project they have touched.

You can try to attract completely new contributors through programs such as Google Summer of Code (GSoC) or Outreachy, but that has its own challenges as well.

With phpMyAdmin we participate regularly in GSoC (we only missed last year, when we were not chosen by Google), and it indeed helps bring new people on board. Many of them even stay around the project (currently 3 of the 5 phpMyAdmin team members are former GSoC students). But I think this approach really only works for bigger organizations.

You can also motivate people with money. This approach is not used much in free software projects, partly because of a lack of funding (I'll get to that later) and partly because it doesn't necessarily bring long-term contributors, just cash hunters. I've been using Bountysource for some of my projects (Weblate and Gammu), and so far it mostly works the other way around: if somebody posts a bounty on an issue, it means getting it fixed is quite important to them, so I use that as an indication for myself. For attracting new developers it never really worked well, even when I posted bounties on some easy-to-fix issues where newbies could learn our code base and get paid for it. Those issues stayed open for months, and in the end I fixed them myself because they annoyed me.

Don't care too much

I think this is the most important aspect: you simply can never fix all the problems. Face that, and work accordingly. There can be various levels of not caring. I find it always better to try to encourage people to fix their own problems, but you can't expect a big success rate there, so you might find it not worth the time.

What I currently do:

  • I often ignore direct emails asking me to fix something. The project has a public issue tracker on purpose: once an issue is solved there, others have a chance to find it when they face a similar problem. Solving things privately in mail means you will probably look at the same problems again and again.
  • I try to batch-process things. It is much easier to stay focused when you work on one project and do not switch contexts. This means people have to wait until I get to their request, but it also means I can deal with requests much more effectively. This is why Free hosting requests for Hosted Weblate get processed once a month.
  • I don't care about the number of unread mails, notifications or whatever; actually, I try not to receive many of these at all. This is related to the above: I might do some things once a month (or even less often) and that's still okay. Maybe you're just getting notifications for things you don't need to be notified about? Do you really need a notification for every new issue? Isn't it better to look at the issue tracker from time to time than to feel the constant pressure of unread notifications?
  • I don't have to fix every problem. When it seems like something the reporter could just as well fix, I try to give them guidance on how to dig deeper into the issue. Obviously this can't work in all cases, but getting more people on board always helps.
  • I try to focus on things which will save time in the future. Many issues turn out to be mere misunderstandings; once you figure that out, spend a few more minutes improving your documentation to cover the point. It's quite likely this will save you time in the future.

If you still can't handle it, you should also consider abandoning the project. Does it bring you anything other than the frustration of unfinished work? I know it can be a hard decision (in the end, it is your child), but sometimes it's the best thing you can do.

Get paid to do the work

Are you working a full-time job and then doing free software at night or on weekends? That can work for some time, but unless you find a way to make the two fit together, you will lack free time to relax and to spend with friends or family. There are several options for making them work together.

You can find a job where working on free software is a natural part of it. This worked pretty well for me at SUSE, and I'm sure there are more companies where it would work. The job might not cover all your free software activities, but it still helps.

You can also make your project become your employer. It can be challenging to have volunteers and paid contractors working on one project, but I think it can be handled. Such a setup currently works quite well for phpMyAdmin (we will announce a second contractor soon), and it works quite well for me with Weblate too.

Funding free software projects

Once your project is well funded, you can fix many problems with money. You can pay yourself to do the work, hire additional developers, get better infrastructure or travel to conferences to spread the word about it. But the question is how to get to the point of being well funded.

There are several crowdfunding platforms which can help you with that (Liberapay, Bountysource Salt, Gratipay or Snowdrift, to mention some). You can also administer the funding yourself or go through a legal entity such as Software Freedom Conservancy, which handles this for phpMyAdmin.

But the most important thing is to persuade people and companies to give back. You know there are lots of companies relying on your project, but how do you make them fund it? I really don't know; I still struggle with this, as I don't want to be too pushy in asking for money, but I'd really like to see them give back.

One thing that does kind of work is giving your sponsors logo or link placement on your website. If your website is well ranked, you can expect quite a lot of SEO sponsors, and the question is where to draw the line on what you still find acceptable. The companies most willing to pay will have nothing to do with what you do; they just want the link. The industries you can expect are porn, gambling, binary options and various MFA sites. You will get some legitimate sponsors related to your project as well. We felt we had gone too far with phpMyAdmin last year and recently tightened the rules, but the outcome is not yet visible on our website (we have only limited new sponsors; existing contracts will be honored).

Another option is to monetize your project more directly. You can offer consulting services, or provide it as a service (this is what I currently do with Weblate). Whether you can build a customer base on that really depends on the product, and it certainly would not work well for all projects.

Thanks for reading, and I hope it's not too chaotic; I moved parts back and forth while writing, and I'm afraid it got too long in the end.

Filed under: Debian English Gammu phpMyAdmin SUSE Weblate

Bits from Debian: Build Android apps with Debian: apt install android-sdk

15 March, 2017 - 18:00

In Debian stretch, the upcoming new release, it is now possible to build Android apps using only packages from Debian. This will provide all of the tools needed to build an Android app targeting the "platform" android-23 using the SDK build-tools 24.0.0. Those two are the only versions of "platform" and "build-tools" currently in Debian, but it is possible to use the Google binaries by installing them into /usr/lib/android-sdk.

This doesn't yet cover all of the libraries used in apps, like the Android Support libraries, or the myriad other libraries usually fetched from jCenter or Maven Central. One big question for us is whether and how libraries should be included in Debian. All the Java libraries in Debian can be used in an Android app, but including something like Android Support in Debian would be strange, since it is only useful in an Android app, never for a Debian app.

Building apps with these packages

Here are the steps for building Android apps using Debian's Android SDK on Stretch.

  1. sudo apt install android-sdk android-sdk-platform-23
  2. export ANDROID_HOME=/usr/lib/android-sdk
  3. In build.gradle, set compileSdkVersion to 23 and buildToolsVersion to 24.0.0
  4. run gradle build

The Gradle Android Plugin is also packaged. Using the Debian package instead of the one from online Maven repositories requires a little configuration before running gradle. In the buildscript block:

  • add maven { url 'file:///usr/share/maven-repo' } to repositories
  • use compile 'com.android.tools.build:gradle:debian' to load the plugin

Currently only the API Level 23 platform is packaged, so only apps targeting android-23 can be built purely from Debian packages. There are plans to add more API platform packages via backports. Only build-tools 24.0.0 is available, so build scripts need to be modified accordingly. Beware that Lint in this version of the Gradle Android Plugin is still problematic, so the :lint tasks might not work; they can be turned off with lintOptions.abortOnError in build.gradle. Google binaries can be combined with the Debian packages, for example to use a different version of the platform or build-tools.
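
Putting the pieces above together, a build.gradle for such a Debian-only build might look roughly like this (an illustrative sketch, not an official example; the application-plugin line and overall file layout are standard Gradle boilerplate assumed here):

buildscript {
    repositories {
        // Debian's local Maven repository
        maven { url 'file:///usr/share/maven-repo' }
    }
    dependencies {
        // the Debian-packaged Gradle Android Plugin
        classpath 'com.android.tools.build:gradle:debian'
    }
}

apply plugin: 'com.android.application'

android {
    compileSdkVersion 23         // only the android-23 platform is packaged
    buildToolsVersion '24.0.0'   // only build-tools 24.0.0 is packaged

    lintOptions {
        abortOnError false       // Lint is problematic in this plugin version
    }
}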

Why include the Android SDK in Debian?

While Android developers could develop and ship apps right now using these Debian packages, this is not very flexible, since only build-tools 24.0.0 and the android-23 platform are available. Currently, the Debian Android Tools Team is not aiming to cover the most common use cases. Those are pretty well covered by Google's binaries (their proprietary license aside), and would be the most work for the team to cover. The current focus is on use cases that are poorly served by the Google binaries, for example where only specific parts of the whole SDK are used. Here are some examples:

  • tools for security researchers, forensics, reverse engineering, etc. which can then be included in live CDs and distros like Kali Linux
  • a hardened APK signing server using apksigner that uses a standard, audited, public configuration of all reproducibly built packages
  • Replicant is a 100% free software Android distribution, so of course they want to have a 100% free software SDK
  • high security apps need a build environment that matches their level of security, the Debian Android Tools packages are reproducibly built only from publicly available sources
  • support architectures besides i386 and amd64; for example, the Linaro LAVA setup for testing ARM devices of all kinds uses the adb packages on ARM servers to keep their whole testing setup on the ARM architecture
  • dead simple install with strong trust path with mirrors all over the world

In the long run, the Android Tools Team aims to cover more use cases well, and also to build the Android NDK. This will all happen more quickly if there are more contributors on the Android Tools team! Android is the most popular mobile OS, and it can be 100% free software like Debian. Debian and its derivatives are among the most popular platforms for Android development. This is an important combination that should only grow more integrated.

Last but not least, the Android Tools Team wants feedback on how this should all work, for example, ideas for how to nicely integrate Debian's Java libraries into the Android gradle workflow. And ideally, the Android Support libraries would also be reproducibly built and packaged somewhere that enforces only free software. Come find us on IRC and/or email! https://wiki.debian.org/AndroidTools#Communication_Channels

Keith Packard: Valve

15 March, 2017 - 02:10
Consulting for Valve in my spare time

Valve Software has asked me to help work on a couple of Linux graphics issues, so I'll be doing a bit of consulting for them in my spare time. It should be an interesting diversion from my day job working for Hewlett Packard Enterprise on Memory Driven Computing and other fun things.

First thing on my plate is helping support head-mounted displays better by getting the window system out of the way. I spent some time talking with Dave Airlie and Eric Anholt about how this might work and have started on the kernel side of that. A brief synopsis is that we'll split off some of the output resources from the window system and hand them to the HMD compositor to perform mode setting and page flips.

After that, I'll be working out how to improve frame timing reporting back to games from a composited desktop under X. Right now, a game running on X with a compositing manager can't tell when each frame was shown, nor accurately predict when a new frame will be shown. This makes smooth animation rather difficult.

John Goerzen: Parsing the GOP’s Health Insurance Statistics

14 March, 2017 - 22:35

There has been a lot of noise lately about the GOP health care plan (AHCA) and how it differs from the current law (ACA, or Obamacare). A lot of statistics are being misinterpreted.

The New York Times has an excellent analysis of some of this. But to pick it apart, I want to highlight a few things:

Many Republicans are touting the CBO’s estimate that, some years out, premiums will be 10% lower under their plan than under the ACA. However, this carries with it a lot of misleading information.

First of all, many are spinning this as if costs would go down. That's not the case. Premiums would still rise; they would just have risen less by the end of the period than under the ACA. That also ignores the immediate premium spike, and the millions thrown out of the insurance marketplace altogether.

Now then, where does this 10% number come from? First of all, you have to understand that older people are substantially more expensive to the health system, and therefore more expensive to insure. The ACA limited the price differential from the youngest to the oldest, which meant that in effect some young people were subsidizing older ones on the individual market. The GOP plan removes that limit. Combined with other changes in subsidies and tax credits, this dramatically increases the cost to older people. For instance, the New York Times article cites a CBO estimate that “the price an average 64-year-old earning $26,500 would need to pay after using a subsidy would increase from $1,700 under Obamacare to $14,600 under the Republican plan.”

They further conclude that these exceptionally high rates would be so unaffordable that older people would simply stop buying insurance on the individual market. This leaves the overall risk pool in that market healthier, and therefore the average price lower.

So, to sum up: the reason that insurance premiums under the GOP plan will rise at a slightly slower rate long-term is that the higher-risk people will be unable to afford insurance in the first place, leaving only the cheaper people to buy in.

Reproducible builds folks: Reproducible Builds: week 98 in Stretch cycle

14 March, 2017 - 13:41

Here's what happened in the Reproducible Builds effort between Sunday March 5 and Saturday March 11 2017:

Upcoming events

Packages filed

Chris Lamb:

Toolchain development
  • Guillem Jover uploaded dpkg 1.18.23 to unstable, declaring .buildinfo format 1.0 as "stable".

  • James McCoy uploaded devscripts 2.17.2 to unstable, adding support for .buildinfo files to the debsign utility via patches from Ximin Luo and Guillem Jover.

  • Hans-Christoph Steiner noted that the first reproducibility-related patch in the Android SDK was marked as confirmed.

Reviews of unreproducible packages

39 package reviews have been added, 7 have been updated and 9 have been removed this week, adding to our knowledge about identified issues.

2 issue types have been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (2)
buildinfo.debian.net development

reproducible-website development

tests.reproducible-builds.org

Hans-Christoph Steiner gave a progress report on testing F-Droid: we now have a complete vagrant workflow working in nested KVM! So we can provision a new KVM guest, then package it using vagrant box, all inside of a KVM guest (which is a ProfitBricks build node). So we finally have a working setup on jenkins.debian.net. Next up is fixing bugs in our libvirt snapshotting support.

Misc.

This week's edition was written by Chris Lamb, Holger Levsen, Vagrant Cascadian & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Sean Whitton: Initial views of 5th edition DnD

14 March, 2017 - 06:37

I’ve been playing in a 5e campaign for around two months now. In the past ten days or so I’ve been reading various source books and Internet threads regarding the design of 5th edition. I’d like to draw some comparisons and contrasts between 5th edition and the 3rd edition family of games (DnD 3.5e and Paizo’s Pathfinder, which may be thought of as 3.75e).

The first thing I’d like to discuss is that wizards and clerics are no longer Vancian spellcasters. In rules terms, this is the idea that individual spells are pieces of ammunition. Spellcasters have a list of individual spells stored in their heads, and as they cast spells from that list, they cross off each item. Barring special rules for clerics about spontaneously converting prepared spells to healing spells, the only way to add items back to the list is to take a night’s rest. Contrast this with spending points from a pool of energy in order to cast a fireball: there the limiting factor on using spells is having enough points in your mana pool, not having further castings of the spell waiting in memory.

One of the design goals of 5th edition was to reduce the dominance of spellcasters at higher levels of play. The article to which I linked in the previous paragraph argues that this rebalancing requires the removal of Vancian magic. The idea, to the extent that I’ve understood it, is that Vancian magic is not an effective restriction on spellcaster power levels, so it is to be replaced with other restrictions—adding new restrictions while retaining the restrictions inherent in Vancian magic would leave spellcasters crippled.

A further reason for removing Vancian magic was to defeat the so-called “five minute adventuring day”. The combat ability of a party that contains higher-level Vancian spellcasters drops significantly once they’ve fired off their most powerful combat spells. So adventuring groups would find themselves getting into a fight, and then immediately retreating to rest up fully in order to get their spells back. This removes interesting strategic and roleplaying possibilities involving the careful allocation of resources, and continuing to fight as hit points run low.

There are some other related changes. Spell components are no longer used up when casting a spell. So you can use one piece of bat guano for every fireball your character ever casts, instead of each casting requiring a new piece. Correspondingly, you can use a spell focus, such as a cool wand, instead of a pouch full of material components—since the pouch never runs out, there’s no mechanical change if a wizard uses an arcane focus instead. 0th level spells may now be cast at will (although Pathfinder had this too). And there are decent 0th level attack spells, so a spellcaster need not carry a crossbow or shortbow in order to have something to do on rounds when it would not be optimal to fire off one of their precious spells.

I am very much in favour of these design goals. The five minute adventuring day gets old fast, and I want it to be possible for the party to rely on the cool abilities of non-spellcasters to deal with the challenges they face. However, I am concerned about the flavour changes that result from the removal of Vancian magic. These affect wizards and clerics differently, so I’ll take each case in turn.

Firstly, consider wizards. In third edition, a wizard had to prepare and cast Read Magic (the only spell they could prepare without a spellbook), and then set about working through their spellbook. This involved casting the spells they wanted to prepare, up until the last few triggering words or gestures that would cause the effect of the spell to manifest. They would commit these final parts of the spell to memory. When it came to casting the spell, the wizard would say the final few words and make the required gestures, and bring out relevant material components from their component pouch. The completed spell would be ripped out of their mind, to manifest its effect in the world. We see that the casting of a spell is a highly mentally-draining activity—it rips the spell out of the caster’s memory!—not to be undertaken lightly. Thus it is natural that a wizard would learn to use a crossbow for basic damage-dealing. Magic is not something that comes very naturally to the wizard, to be deployed in combat as readily as the fighter swings their sword. They are not a superhero or video game character, “pew pew”ing their way to victory. This is a very cool starting point upon which to roleplay an academic spellcaster, not really available outside of tabletop games. I see it as a distinction between magical abilities and real magic.

Secondly, consider clerics. Most of the remarks in the previous paragraph apply, suitably reworked to be in terms of requesting certain abilities from the deity to whom the cleric is devoted. Additionally, there is the downgrading of the importance of the cleric’s healing magic in 5th edition. Characters can heal themselves by taking short and long rests. Previously, natural healing was very slow, so a cleric would need to convert all their remaining magic to healing spells at the end of the day, and hope that it was enough to bring the party up to fighting shape. Again, this made the party of adventurers seem less like superheroes or video game characters. Magic had a special, important and unique role, that couldn’t be replaced by the abilities of other classes.

There are some rules in the back of the DMG—“Slow Natural Healing”, “Healing Kit Dependency”, “Lingering Wounds”—which can be used to make healing magic more important. I’m not sure how well they would work without changes to the cleric class.

I would like to find ways to restore the feel and flavour of Vancian clerics and wizards to 5th edition, without sacrificing the improvements that have been made that let other party members do cool stuff too. I hope it is possible to keep magic cool and unique without making it dominate the game. It would be easy to forbid the use of arcane foci, and say that material component pouches run out if the party do not visit a suitable marketplace often enough. This would not have a significant mechanical effect, and could enhance roleplaying possibilities. I am not sure how I could deal with the other issues I’ve discussed without breaking the game.

The second thing I would like to discuss is bounded accuracy. Under this design principle, the modifiers to dice rolls grow much more slowly, while the gain of hit points remains unbounded. Under third edition, it was mechanically impossible for a low-level monster to land a hit on a higher-level adventurer, rendering such monsters totally useless even in overwhelming numbers. With bounded accuracy, it’s always possible for a low-level monster to hit a PC, even if it does insignificant damage. That means that multiple low-level monsters pose a threat.

This change opens up many roleplaying opportunities by keeping low-level character abilities relevant, and lets monster types remain involved in stories without being given implausible new abilities so they don’t fall too far behind the PCs. However, I’m a little worried that it might make high-level player characters feel a lot less powerful to play. I want to cease to be a fragile adventurer and become a world-changing hero at later levels, rather than forever remain vulnerable to the things that I was vulnerable to at the start of the game. This desire might just be the result of the video games I played growing up. In the JRPGs I played, and in Diablo II, enemies in earlier areas of the map were no threat at all once you’d levelled up by conquering higher-level areas. My concerns about bounded accuracy might just be that it clashes with my own expectations of how fantasy heroes work. A good DM might be able to avoid these worries entirely.

The final thing I’d like to discuss is the various simplifications to the rules of 5th edition, when it is compared with 3rd edition and Pathfinder. Attacks of opportunity are only provoked when leaving a threatened square; you can go ahead and cast a spell when in melee with someone. There is a very short list of skills, and party members are much closer to each other in skills, now that you can’t pump more and more ranks into one or two abilities. Feats as a whole are an optional rule.

At first I was worried about these simplifications. I thought that they might make character building and tactics in combat a lot less fun. However, I am now broadly in favour of all of these changes, for two reasons. Firstly, they make the game so much more accessible, and make it far more viable to play without relying on a computer program to fill in the boxes on your character sheet. In my 5th edition group, two of us have played 3rd edition games, and the other four have never played any tabletop games before. But nobody has any problems figuring out their modifiers because it is always simply your ability bonus or penalty, plus your proficiency bonus if relevant. And advantage and disadvantage is so much more fun than getting an additional plus or minus two. Secondly, these simplifications downplay the importance of the maths, which means it is far less likely to be broken. It is easier to ensure that a smaller core of rules is balanced than it is to keep in check a larger mass of rules, constantly being supplemented by more and more addon books containing more and more feats and prestige classes. That means that players make their characters cool by roleplaying them in interesting ways, not making them cool by coming up with ability combos and synergies in advance of actually sitting down to play. Similarly, DMs can focus on flavouring monsters, rather than writing up longer stat blocks.
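
(A worked aside of my own, not from the original post, on why the advantage mechanic mentioned above differs from a flat bonus:)

P(success with advantage)    = 1 - (1 - p)^2
P(success with disadvantage) = p^2

E[single d20]     = 10.5
E[higher of 2d20] = 13.825

So advantage is worth about +3.3 on average, a full +5 when you need an 11 or better, and much less at the extremes, which is part of why it feels so different at the table from a fixed modifier.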

I think that last point about the maths reflects what I find most worthwhile about tabletop RPGs. I like characters to encounter cool NPCs and cool situations, and then react in cool ways. I don’t care that much about character creation. (I used to care more about this, but I think it was mainly because of interesting options for magic items, which hasn’t gone away.) The most important thing is exercising group creativity while actually playing the game, rather than players and DMs having to spend a lot of time preparing the maths in advance of playing. Fifth edition enables this by stopping the rules from getting in the way, whether through brokenness or complexity. I think this is why I love Exalted: stunting is vital, and there is social combat. I hope to be able to work out a way to restore Vancian magic, but even without that, on balance, fifth edition seems like a better way to do group storytelling about fantasy heroes. Hopefully I will have an opportunity to DM a 5th edition campaign. I am considering disallowing all homebrew, and all classes and races from supplemental books: stick to the well-balanced core rules, and do everything else by means of roleplaying and flavour. This is far less gimmicky, if more work for unimaginative players (such as myself!).

Some further interesting reading:

Ross Gammon: February 2017 – My Free Software activities summary

14 March, 2017 - 04:06

When I sat down to write this blog post, I thought I hadn’t got much done in February. But as it took me quite a while to write up, there must actually have been a little bit of progress. With my wife starting a new job, there have been some adjustments in family life, and I have struggled just to keep up with all the Debian and Ubuntu emails. Anyway…

Debian

Ubuntu
  • Tested the Ubuntu Studio 16.04.2 point release, marked it as ready, and updated the Release Notes.
  • Started updating my previous Gramps backport in Ubuntu to Gramps 4.2.5. The package builds fine, and I have tested that it installs and works. I just need to update the bug.
  • Prepared updates to the ubuntustudio-default-settings & ubuntustudio-meta packages. There were some deferred changes from before Yakkety was released, including moving the final bit of configuration left in the ubuntustudio-lightdm-theme package to ubuntustudio-default-settings. Jeremy Bicha sponsored the uploads after suggesting moving away from some transitional ttf font packages in ubuntustudio-meta.
  • Tested the Ubuntu Studio 17.04 First Beta release, marked as ready, and prepared the Release Notes.
  • Upgraded my music studio Ubuntu Studio computer to Yakkety 16.10.
  • Got accepted as an Ubuntu Contributing Developer by the Developer Membership Board.
Other
  • After a merge of my Family Tree with the Family Tree of my wife in Gramps a long way back, I finally started working through the database merging duplicates and correcting import errors.
  • Worked some more on the model railway, connecting up the other end of the tunnel section with the rest of the railway.
Plan status from last month & update for next month

Debian

For the Debian Stretch release:

  • Keep an eye on the Release Critical bugs list, and see if I can help fix any. – In Progress

Generally:

  • Finish the Gramps 4.2.5 backport for Jessie. – Done
  • Package all the latest upstream versions of my Debian packages, and upload them to Experimental to keep them out of the way of the Stretch release.
  • Begin working again on all the new stuff I want packaged in Debian.
Ubuntu
  • Finish the ubuntustudio-lightdm-theme, ubuntustudio-default-settings transition including an update to the ubuntustudio-meta packages. – Done
  • Reapply to become a Contributing Developer. – Done
  • Start working on an Ubuntu Studio package tracker website so that we can keep an eye on the status of the packages we are interested in. – Started
  • Start testing & bug triaging Ubuntu Studio packages. – In progress
  • Test Len’s work on ubuntustudio-controls – In progress
  • Do the Ubuntu Studio Zesty 17.04 Final Beta release.
Other
  • Give JMRI a good try out and look at what it would take to package it. – In progress
  • Also look at OpenPLC for simulating the relay logic of real railway interlockings (i.e. a little bit of the day job at home involving free software – fun!). – In progress

Michal Čihař: Weblate users survey

13 March, 2017 - 18:00

Weblate has been growing quite well in recent months, but its development is sometimes driven by the people who complain rather than by a roadmap with higher goals. I think it's time to change that, at least a little. In order to get broader feedback, I sent out a short survey to active project owners on Hosted Weblate a week ago.

I decided to target a smaller audience for now, though a publicly open survey might follow later (it's always harder to evaluate feedback across different user groups).

Overall feelings were really positive; most people find Weblate better than other similar services they have used. That is really something I like to hear :-).

But the most important part for me was where users want to see improvements. This somehow matches my expectation that we really should improve the user interface.

We have quite a lot of features which are effectively hidden in the user interface, and the interface for some of them is far from intuitive. This probably all comes from the fact that we don't currently have anybody experienced in creating user interfaces. It's time to find somebody to help us. If you are able to help, or know somebody who might be interested, please get in touch. Weblate is free software, but this can still be a paid job.

The last part of the survey focused on some particular features, but the outcome was not as clear as I had hoped: almost all feature groups attracted about the same attention, with one exception, extending the API, which most users didn't really want.

Overall I think doing a survey like this is useful, and I will certainly repeat it (probably yearly or so) to see where we're moving and what our users want. Having feedback from users is important for every project, and this seems to have worked quite well. Anyway, if you have further feedback, don't hesitate to use our issue tracker on GitHub or to contact me directly.

Filed under: Debian English phpMyAdmin SUSE Weblate

Iustin Pop: A recipe for success

13 March, 2017 - 05:38

It is said that with age comes wisdom. I would be happy for that to be true, because in that case today I must have been very, very young.

For example, if you want to do a long bike ride in order to hit some milestone, like your first metric century, it is not advisable to follow ANY of the following points:

  • instead of doing this in season, when you're fit, wait out the winter, during which you indulge in food and drink with only the occasional short bike ride, so that most of your fitness is gone and replaced by a few extra kilograms;
  • instead of choosing a flat route that you've done before, extended a bit to hit the target distance, take the route from one of the people you follow on Strava (and I mean real cyclists here); bonus points if you choose one they described as training rather than a freeride and gave a meaningful name like "The ride of 3 peaks", something with 1'500m+ altitude gain…
  • in order not to get bogged down too much by extra weight (those winter kilograms are enough!), skimp on breakfast (just a very very light one); together with the energy bar you eat, something like 400 calories…
  • take the same amount of food you take for much shorter and flatter rides; bonus points if you don't check the actual calories in the food, and instead of the presumed 700+ calories you think you're carrying (which might be enough, if you space them correctly, given how much you can absorb per hour), take at most 300 calories with you, because hey, your body is definitely used to long efforts in which you convert fat to energy on the fly, right? especially after said winter pause!
  • since water is scarce in the Swiss outdoors (not!), especially on a road bike ride, carry lots of water with you (full hydro-pack, 3l) instead of an extra banana or energy bar, or a sandwich, or nuts, or a steak… mmmm, steak!
  • and finally and most importantly, don't do the ride indoors on the trainer, even though it can simulate the effort pretty realistically, but instead do it for real outside, where you can't simply stop when you've had enough, because you have to get back home…

For bonus points, if you somehow manage to reach the third peak in the above ride, and have mostly only flat/down to the destination, do the following: be so glad you're done with climbing, that you don't pay attention to the map and start a wrong descent, on a busy narrow road, so that you can't stop immediately as you realise you've lost the track; it will cost you only an extra ~80 meters of height towards the end of the ride. Which are pretty cheap, since all the food is gone and the water almost as well, so the backpack is light. Right.

However, if you do follow all the above, you're rewarded with a most wonderful thing for the second half of the ride: you will receive a +5 boost to your concentration skill. You will be able to focus on, and think about, a single thing for hours at a time, examining it (well, its contents) in minute detail.

Plus, when you get home and open that thing—I mean, of course, the FRIDGE with all the wonderful FOOD it contains—everything will taste MAGICAL! You can now recoup the roughly 1500 calories deficit on the ride, and finally no longer feel SO HUNGRY.

That's all. Strava said "EXTREME" suffer score, albeit with less than 20% of points in the red, which means I was just slogging through the ride (the total time confirms it), like a very very very old man. But definitely not a wise one.

Mike Hommey: When the memory allocator works against you

12 March, 2017 - 08:47

Cloning mozilla-central with git-cinnabar requires a lot of memory. Actually, too much memory to fit in a 32-bit address space.

I hadn’t optimized for memory use in the first place. For instance, git-cinnabar keeps sha-1s in memory as hex values (40 bytes) rather than raw values (20 bytes). When I wrote the initial prototype, it didn’t matter that much, and while close(ish) to the tipping point, it didn’t require more than 2GB of memory at the time.

Time passed, and mozilla-central grew. I suspect the recent addition of several thousands of commits and files has made things worse.

In order to come up with a plan to make things better (short or longer term), I needed data. So I added some basic memory resource tracking, and collected data while cloning mozilla-central.

I must admit, I was not ready for what I witnessed. Follow me for a tale of frustrations (plural).

I was expecting things to have gotten worse on the master branch (which I used for the data collection) because I am in the middle of some refactoring and did many changes that I was suspecting might have affected memory usage. I wasn’t, however, expecting to see the clone command using 10GB(!) memory at peak usage across all processes.

(Note, those memory sizes are RSS, minus “shared”)

It also was taking an unexpected long time, but then, I hadn’t cloned a large repository like mozilla-central from scratch in a while, so I wasn’t sure if it was just related to its recent growth in size or otherwise. So I collected data on 0.4.0 as well.

Less time spent, less memory usage… ok. There’s definitely something wrong on master. But wait a minute, that slope from ~2GB to ~4GB on the git-remote-hg process doesn’t actually make any kind of sense. I mean, I’d understand it if it were starting and finishing with the “Import manifest” phase, but it starts in the middle of it, and ends long before it finishes. WTH?

First things first, since RSS can be a variety of things, I checked /proc/$pid/smaps and confirmed that most of it was, indeed, the heap.

That’s the point where you reach for Google, type something like “python memory profile” and find various tools. One from the results that I remembered having used in the past is guppy’s heapy.

Armed with pdb, I broke execution in the middle of the slope, and tried to get memory stats with heapy. SIGSEGV. Ouch.

Let’s try something else. I reached out to objgraph and pympler. SIGSEGV. Ouch again.

Tried working around the crashes for a while (too long a while, retrospectively; hindsight is 20/20), and was somehow successful at avoiding them by peeking at a smaller set of objects. But whatever I did, despite being attached to a process that had 2.6GB RSS, I wasn’t able to find more than 1.3GB of data. This wasn’t adding up.

It surely didn’t help that getting to that point took close to an hour each time. Retrospectively, I wish I had investigated using something like Checkpoint/Restore in Userspace.

Anyways, after a while, I decided that I really wanted to try to see the whole picture, not smaller peaks here and there that might be missing something. So I resolved myself to look at the SIGSEGV I was getting when using pympler, collecting a core dump when it happened.

Guess what? The Debian python-dbg package does not contain the debug symbols for the python package. The core dump was useless.

Since I was expecting I’d have to fix something in python, I just downloaded its source and built it. Ran the command again, waited, and finally got a backtrace. First Google hit for the crashing function? The exact (unfixed) crash reported on the python bug tracker. No patch.

Crashing code is doing:

((f)->f_builtins != (f)->f_tstate->interp->builtins)

And (f)->f_tstate is NULL. Classic NULL deref.

Added a guard (assessing it wouldn’t break anything). Ran the command again. Waited. Again. SIGSEGV.

Facedesk. Another crash on the same line. Did I really use the patched python? Yes. But this time (f)->f_tstate->interp is NULL. Sigh.

Same player, shoot again.

Finally, no crash… but still stuck on only 1.3GB accounted for. Ok, I know not all python memory profiling tools are entirely reliable, let’s try heapy again. SIGSEGV. Sigh. No debug info on the heapy module, where the crash happens. Sigh. Rebuild the module with debug info, try again. The backtrace looks like heapy is recursing a lot. Look at %rsp, compare with the address space from /proc/$pid/maps. Confirmed. A stack overflow. Let’s do ugly things and increase the stack size in brutal ways.

Woohoo! Now heapy tells me there’s even less memory used than the 1.3GB I found so far. Like, half less. Yeah, right.

I’m not clear on how I got there, but that’s when I found gdb-heap, a tool from Red Hat’s David Malcolm, and the associated talk “Dude, where’s my RAM?” A deep dive into how Python uses memory (slides).

With a gdb attached, I would finally be able to rip python’s guts out and find where all the memory went. Or so I thought. The gdb-heap tool only found about 600MB. About as much as heapy did, for that matter, but it could be coincidental. Oh. Kay.

I don’t remember exactly what went through my mind then, but, since I was attached to a running process with gdb, I typed the following on the gdb prompt:

gdb> call malloc_stats()

And that’s when the truth was finally unveiled: the memory allocator was just acting up the whole time. The output was something like:

Arena 0:
system bytes    =  some number above (but close to) 2GB
in use bytes    =  some number above (but close to) 600MB

Yes, the glibc allocator was telling me it had allocated 600MB of memory but was holding onto 2GB. I must have hit a really bad allocation pattern that causes massive fragmentation.
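
To illustrate the kind of behaviour described here, a small standalone C sketch of my own (not code from git-cinnabar): it allocates and frees increasingly large buffers with small long-lived allocations interleaved, then asks glibc for its statistics. Whether “system bytes” ends up far above “in use bytes” depends on the glibc version and its mmap-threshold heuristics.

#include <malloc.h>   /* malloc_stats() is a GNU extension */
#include <stdlib.h>
#include <string.h>

int main(void)
{
    for (size_t sz = 1 << 16; sz <= (size_t)1 << 26; sz <<= 1) {
        char *big   = malloc(sz);   /* growing, short-lived buffer */
        char *small = malloc(64);   /* small, long-lived; can pin the heap */
        if (big == NULL || small == NULL)
            return 1;
        memset(big, 0, sz);
        free(big);                  /* freed, but not necessarily returned to the OS */
    }
    malloc_stats();                 /* prints "system bytes" vs "in use bytes" per arena */
    return 0;
}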

One thing that David Malcolm’s talk taught me, though, is that python uses its own allocator for small sizes, so the glibc allocator doesn’t know about them. And, roughly, adding the difference between RSS and what glibc said it was holding, to the in-use bytes it reported, somehow matches the 1.3GB I had found so far.

So it was time to see how those things evolved in time, during the entire clone process. I grabbed some new data, tracking the evolution of “system bytes” and “in use bytes”.

There are two things of note on this data:

  • There is a relatively large gap between what the glibc allocator says it has gotten from the system, and the RSS (minus “shared”) size, that I’m expecting corresponds to the small allocations that python handles itself.
  • Actual memory use is going down during the “Import manifests” phase, contrary to what the evolution of RSS suggests.

In fact, the latter is exactly how git-cinnabar is supposed to work: It reads changesets and manifests chunks, and holds onto them while importing files. Then it throws away those manifests and changesets chunks one by one while it imports them. There is, however, some extra bookkeeping that requires some additional memory, but it’s expected to be less memory consuming than keeping all the changesets and manifests chunks in memory.

At this point, I thought a possible explanation is that since both python and glibc are mmap()ing their own arenas, they might be intertwined in a way that makes things not go well with the allocation pattern happening during the “Import manifest” phase (which, in fact, allocates and frees increasingly large buffers for each manifest, as manifests grow in size in the mozilla-central history).

To put the theory at work, I patched the python interpreter again, making it use malloc() instead of mmap() for its arenas.

“Aha!” I thought. That definitely looks much better. Less gap between what glibc says it requested from the system and the RSS size. And, more importantly, no runaway increase of memory usage in the middle of nowhere.

I was preparing myself to write a post about how mixing allocators could have unintended consequences. As a comparison point, I went ahead and ran another test, with the python allocator entirely disabled, this time.

Heh. It turns out glibc was acting up all alone. So much for my (plausible) theory. (I still think mixing allocators can have unintended consequences.)

(Note, however, that the reason why the python allocator exists is valid: without it, the overall clone took almost 10 more minutes)

And since I had been getting all this data with 0.4.0, I gathered new data without the python allocator with the master branch.

This paints a rather different picture than the original data on that branch, with much less memory use regression than one would think. In fact, there isn’t much difference, except for the spike at the end, which got worse, and some of the noise during the “Import manifests” phase that got bigger, implying larger amounts of temporary memory used. The latter may contribute to the allocation patterns that throw glibc’s memory allocator off.

It turns out tracking memory usage in python 2.7 is rather painful, and not all the tools paint a complete picture of it. I hear python 3.x is somewhat better in that regard, and I hope it’s true, but at the moment, I’m stuck with 2.7. The most reliable tool I’ve used here, it turns out, is pympler. Or rebuilding the python interpreter without its allocator, and asking the system allocator what is allocated.

With all this data, I now have some defined problems to tackle, some easy (the spike at the end of the clone), and some less easy (working around glibc allocator’s behavior). I have a few hunches as to what kind of allocations are causing the runaway increase of RSS. Coincidentally, I’m half-way through a refactor of the code dealing with manifests, and it should help dealing with the issue.

But that will be the subject of a subsequent post.

Steve Kemp: How I started programming

12 March, 2017 - 05:00

I've written parts of this story in the past, but never in one place and never in much detail. So why not now?

In 1982 my family moved house, so one morning I went to school and at lunch-time I had to walk home to a completely different house.

We moved sometime towards the end of the year, and ended up spending lots of money replacing the windows of the new place. For people in York: I was born in Farrar Street, YO10 3BY, and we moved to a place on Thief Lane, YO1 3HS. Being named as it was, I "ironically" stole at least two street-signs and hung them on my bedroom wall. I suspect my parents were disappointed.

Anyway the net result of this relocation, and the extra repairs meant that my sisters and I had a joint Christmas present that year, a ZX Spectrum 48k.

I tried to find pictures of what we received but unfortunately the web doesn't remember the precise bundle. All together though we received:

I know we also received Horace and the Spiders, and I have vague memories of some other things being included, including a Space Invaders clone. No doubt my parents bought them separately.

Highlights of my Spectrum-gaming memories include R-Type, Strider, and the various "Dizzy" games. Some of the latter I remember very fondly.

Unfortunately this Christmas was pretty underwhelming. We unpacked the machine, we cabled it up to the family TV-set - we only had the one, after all - and then proceeded to be very disappointed when nothing we did resulted in a successful game! It turned out our cassette-deck was not good enough. This being the 80s, the shops were closed over Christmas, and my memory is that it was around January before we received a working tape-player/recorder, such that we could load games.

Happily the computer came with manuals. I read one, skipping words and terms I didn't understand. I then read the other, which was the spiral-bound orange book. It contained enough examples and decent wording that I learned to write code in BASIC. Not bad for an 11/12 year old.

Later I discovered that my local library contained "computer books". These were colourful books that promised "The Mystery of Silver Mountain", or "Write your own ADVENTURE PROGRAMS", but were largely dry books that contained nothing but multi-page listings of BASIC programs to type in, often with adjustments that had to be made for your own flavour of computer (BASIC varying between different systems).

If you want to recapture the magic, scroll to the foot of this Usborne page, where you can download them!

Later I taught myself Z80 Assembly Language, partly via the Spectrum manual and partly via such books as these two (which I still own 30ish years later):

  • Understanding your Spectrum, Basic & Machine Code Programming.
    • by Dr Ian Logan
  • An introduction to Z80 Machine Code.
    • by R.A. & J.W. Penfold

Pretty much the only reason I continued down this path is because I wanted infinite/extra lives in the few games I owned. (Which were largely pirated via the schoolboy network of parents with cassette-copiers.)

Eventually I got some of my l33t POKEs printed in magazines, and received free badges from the magazines of the day such as Your Sinclair & Sinclair User. For example, I was "Hacker of the Month" in Your Sinclair issue 67, page 32, apparently because I "asked so nicely in my letter".

Terrible scan is terrible:

Anyway that takes me from 1980ish to 1984. The only computer I ever touched was a Spectrum. Friends had other things, and there were Sega consoles, but I have no memories of them. Suffice it to say that later, when I first saw a PC (complete with Hercules graphics, hard drives, and similar sorcery, running GEM IIRC), I was pleased that Intel assembly was "similar" to Z80 assembly - and now I know the reason why.

Some time in the future I might document how I got my first computer job. It is hilarious. As was my naivete.

John Goerzen: Silent Data Corruption Is Real

12 March, 2017 - 04:34

Here’s something you never want to see:

ZFS has detected a checksum error:

   eid: 138
 class: checksum
  host: alexandria
  time: 2017-01-29 18:08:10-0600
 vtype: disk

This means there was a data error on the drive. But it’s worse than a typical data error — this is an error that was not detected by the hardware. Unlike most filesystems, ZFS and btrfs write a checksum with every block of data (both data and metadata) written to the drive, and the checksum is verified at read time. Most filesystems don’t do this, because theoretically the hardware should detect all errors. But in practice, it doesn’t always, which can lead to silent data corruption. That’s why I use ZFS wherever I possibly can.

As I looked into this issue, I saw that ZFS repaired about 400KB of data. I thought, “well, that was unlucky” and just ignored it.

Then a week later, it happened again. Pretty soon, I noticed it happened every Sunday, and always to the same drive in my pool. It so happens that the highest I/O load on the machine happens on Sundays, because I have a cron job that runs zpool scrub on Sundays. This operation forces ZFS to read and verify the checksums on every block of data on the drive, and is a nice way to guard against unreadable sectors in rarely-used data.
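
(As an illustrative sketch of my own, not John's actual crontab, with the pool name tank as a placeholder: such a weekly scrub is typically a single cron entry.)

# /etc/cron.d/zfs-scrub: scrub the pool early every Sunday morning
# fields: minute hour day-of-month month day-of-week user command
30 2 * * 0   root   /sbin/zpool scrub tank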

I finally swapped out the drive, but to my frustration, the new drive now exhibited the same issue. The SATA protocol does include a CRC32 checksum, so it seemed (to me, at least) that the problem was unlikely to be a cable or chassis issue. I suspected motherboard.

It so happened I had a 9211-8i SAS card. I had purchased it off eBay a while back when I built the server, but could never get it to see the drives. I wound up not filling it up with as many drives as planned, so the on-board SATA did the trick. Until now.

As I poked at the 9211-8i, noticing that even its configuration utility didn’t see any devices, I finally started wondering if the SAS/SATA breakout cables were a problem. And sure enough – I realized I had a “reverse” cable and needed a “forward” one. $14 later, I had the correct cable and things are working properly now.

One other note: RAM errors can sometimes cause issues like this, but this system uses ECC DRAM and the errors would be unlikely to always manifest themselves on a particular drive.

So over the course of this, had I not been using ZFS, I would have had several megabytes of reads with undetected errors. Thanks to using ZFS, I know my data integrity is still good.

Enrico Zini: On the meaning of "we"

11 March, 2017 - 20:11

Rather than as a word of endearment, I'm starting to see "we" as a word of entitlement.

In some moments of insecurity, I catch myself "wee"-ing over other people, to claim them as mine.

Jonathan Dowland: Nintendo NES Classic Mini

10 March, 2017 - 18:45

After months of trying, I've finally got my hands on a Nintendo NES Classic Mini. It's everything I wish retropie was: simple, reliable, plug-and-play gaming. I didn't have a NES at the time, so the games are all mostly new to me (although I'm familiar with things like Super Mario Brothers).

NES classic and 8bitdo peripherals

The two main complaints about the NES classic are the very short controller cable and the need to press the "reset" button on the main unit to dip in and out of games. Both are addressed by the excellent 8bitdo Retro Receiver for NES Classic bundle. You get a bluetooth dongle that plugs into the classic and a separate wireless controller. The controller is a replica of the original NES controller. However, they've added another two buttons on the right-hand side alongside the original "A" and "B", and two discrete shoulder buttons which serve as turbo-repeat versions of "A" and "B". The extra red buttons make it look less authentic which is a bit of a shame, and are not immediately useful on the NES classic (but more on that in a minute).

With the 8bitdo controller, you can remotely activate the Reset button by pressing "Down" and "Select" at the same time. Therefore the whole thing can be played from the comfort of my sofa.

That's basically enough for me, for now, but in the future if I want to expand the functionality of the classic, it's possible to mod it. A hack called "Hakchi2" lets you install additional NES ROMs; install retroarch-based emulator cores and thus play SNES, Megadrive, N64 (etc. etc.) games; as well as other hacks like adding "down+select" Reset support to the wired controller. If you were playing non-NES games on the classic, then the extra buttons on the 8bitdo become useful.

Reproducible builds folks: Reproducible Builds: week 97 in Stretch cycle

10 March, 2017 - 15:41

Here's what happened in the Reproducible Builds effort between Sunday February 26 and Saturday March 4 2017:

Upcoming Events

Ed Maste will present Reproducible Builds in FreeBSD at AsiaBSDCon 2017.

Ximin Luo will present Reproducible builds, its uses and the future at Open Source Days in Copenhagen on March 18.

Holger Levsen will give a talk at the German Unix User Group's "Frühjahrsfachgespräch" in Darmstadt, Germany, about Reproducible Builds everywhere on March 23.

Verifying Software Freedom with Reproducible Builds will be presented by Vagrant Cascadian at Libreplanet2017 in Boston, March 25th-26th.

Media coverage

Aspiration Tech published a very detailed report on our Reproducible Builds World Summit 2016 in Berlin.

Reproducible work in other projects

Duncan published a very thorough post on the Rust Programming Language Forum about reproducible builds in the Rust compiler and toolchain.

In particular, he produced a table recording the reproducibility of different build products under different individual variations, totalling 187 build+variation combinations.

Packages reviewed and fixed, and bugs filed

Chris Lamb:

Dhole:

Reviews of unreproducible packages

60 package reviews have been added, 8 have been updated and 13 have been removed this week, adding to our knowledge about identified issues.

1 issue type has been added:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (3)

diffoscope development

diffoscope 78 was uploaded to unstable and jessie-backports by Mattia Rizzolo. It included contributions from:

  • Chris Lamb:
    • Make tests that call xxd work on jessie again. (Closes: #855239)
    • tests: Move normalize_zeros to more generic utils.data module.
  • Brett Smith:
    • comparators.json: Catch bad JSON errors on Python pre-3.5. (Closes: #855233)
  • Ed Maste:
    • Use BSD-style stat(1) on FreeBSD. (Closes: #855169)

In addition, the following changes were made on the experimental branch:

  • Chris Lamb (4):
    • Tidy cbfs tests.
    • Correct "exercice" -> "exercise" typo.
    • Support newer versions of cbfstool to avoid test failure. (Closes: #856446)
    • Skip icc test that varies on endian if the (Debian-specific) patch is not present. (Closes: #856447)

reproducible-website development
  • anonmos1:
    • Replace root with 0 when giving UIDs/GIDs to GNU tar.
  • Holger Levsen and Chris Lamb:
    • Publish report by Aspiration Tech about RWS Berlin 2016.

tests.reproducible-builds.org
  • Ed Maste continued his work on testing FreeBSD for reproducibility but hasn't reached the magical 100% mark yet.
  • Holger Levsen adjusted the Debian builders scheduling frequency, mostly to adapt to armhf having become faster due to the two new nodes.

Misc.

This week's edition was written by Ximin Luo, Chris Lamb, Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Martín Ferrari: SunCamp happening again this May!

10 March, 2017 - 14:36

As I announced in mailing lists a few days ago, the Debian SunCamp (DSC2017) is happening again this May.

SunCamp is different from most other Debian events. Instead of a busy schedule of talks, SunCamp focuses on the hacking and socialising aspects, without making it just a Debian party/vacation.

DSC2016 - Hacking and discussing

The idea is to have 4 very productive days, staying in a relaxing and comfy environment, working on your own projects, meeting with your team, or presenting your most recent pet project to fellow Debianites.

DSC2016 - Tincho talking about Prometheus

We have tried to make this the simplest event possible, both for organisers and attendees. There will be no schedule, except for the meal times at the hotel. But even these can be ignored: there is a lovely bar that serves snacks all day long, and plenty of restaurants and cafés around the village.

The SunCamp is an event to get work done, but there will be time for relaxing and socialising too.

DSC2016 - Well deserved siesta DSC2016 - Playing Pétanque

Do you fancy a hack-camp in a place like this?

One of the things that makes the event simple is that we have negotiated a flat price for accommodation that includes use of all the facilities in the hotel, and optionally food. We will give you a booking code, and then you arrange your accommodation as you please; you can even stay longer if you feel like it!

The rooms are simple but pretty, and everything has been renovated very recently.

We are not preparing a talks programme, but we will provide the space and resources for talks if you feel inclined to prepare one.

You will have a huge meeting room, divided in 4 areas to reduce noise, where you can hack, have team discussions, or present talks.

Do you want to see more pictures? Check the full gallery

Debian SunCamp 2017

Hotel Anabel, Lloret de Mar, Province of Girona, Catalonia, Spain

May 18-21, 2017

Tempted already? Head to the wiki page and register now; it is only two months away!

Please try to reserve your room before the end of March. The hotel has reserved a number of rooms for us until that time. You can reserve a room after March, but we can't guarantee the hotel will still have free rooms.


Steinar H. Gunderson: Tired

10 March, 2017 - 01:28

To be honest, at this stage I'd actually prefer ads in Wikipedia to having ever more intrusive begging for donations. Please go away soon.

Petter Reinholdtsen: Detecting NFS hangs on Linux without hanging yourself...

9 March, 2017 - 21:20

Over the years, administrating thousands of NFS-mounting Linux computers at a time, I have often needed a way to detect whether a machine is experiencing an NFS hang. If you try to use df or look at a file or directory affected by the hang, the process (and possibly the shell) will hang too, so you want to be able to detect this without risking the detection process getting stuck as well. It has not been obvious how to do this. When the hang has lasted a while, it is possible to find messages like these in dmesg:

nfs: server nfsserver not responding, still trying
nfs: server nfsserver OK

From those messages alone it is hard to know whether the hang is still going on, and it is hard to be sure that looking in dmesg will work at all: if there are lots of other messages, the relevant lines might have rotated out of the kernel ring buffer before they are noticed.

While reading through the NFS client implementation in the Linux kernel code, I came across some statistics that seem to give a way to detect it. The om_timeouts sunrpc value in the kernel increases every time the above log entry is inserted into dmesg, and after digging a bit further, I discovered that this value shows up in /proc/self/mountstats on Linux.

The mountstats content seems to be shared between processes using the same file system context, so it is enough to check one of the mountstats files to get the state of the mount points for the machine. I assume this will not show lazily umounted NFS mount points, nor NFS mount points in a different process context (i.e. with a different filesystem view), but that does not worry me.

The content for an NFS mount point looks similar to this:

[...]
device /dev/mapper/Debian-var mounted on /var with fstype ext3
device nfsserver:/mnt/nfsserver/home0 mounted on /mnt/nfsserver/home0 with fstype nfs statvers=1.1
        opts:   rw,vers=3,rsize=65536,wsize=65536,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=129.240.3.145,mountvers=3,mountport=4048,mountproto=udp,local_lock=all
        age:    7863311
        caps:   caps=0x3fe7,wtmult=4096,dtsize=8192,bsize=0,namlen=255
        sec:    flavor=1,pseudoflavor=1
        events: 61063112 732346265 1028140 35486205 16220064 8162542 761447191 71714012 37189 3891185 45561809 110486139 4850138 420353 15449177 296502 52736725 13523379 0 52182 9016896 1231 0 0 0 0 0 
        bytes:  166253035039 219519120027 0 0 40783504807 185466229638 11677877 45561809 
        RPC iostats version: 1.0  p/v: 100003/3 (nfs)
        xprt:   tcp 925 1 6810 0 0 111505412 111480497 109 2672418560317 0 248 53869103 22481820
        per-op statistics
                NULL: 0 0 0 0 0 0 0 0
             GETATTR: 61063106 61063108 0 9621383060 6839064400 453650 77291321 78926132
             SETATTR: 463469 463470 0 92005440 66739536 63787 603235 687943
              LOOKUP: 17021657 17021657 0 3354097764 4013442928 57216 35125459 35566511
              ACCESS: 14281703 14290009 5 2318400592 1713803640 1709282 4865144 7130140
            READLINK: 125 125 0 20472 18620 0 1112 1118
                READ: 4214236 4214237 0 715608524 41328653212 89884 22622768 22806693
               WRITE: 8479010 8494376 22 187695798568 1356087148 178264904 51506907 231671771
              CREATE: 171708 171708 0 38084748 46702272 873 1041833 1050398
               MKDIR: 3680 3680 0 773980 993920 26 23990 24245
             SYMLINK: 903 903 0 233428 245488 6 5865 5917
               MKNOD: 80 80 0 20148 21760 0 299 304
              REMOVE: 429921 429921 0 79796004 61908192 3313 2710416 2741636
               RMDIR: 3367 3367 0 645112 484848 22 5782 6002
              RENAME: 466201 466201 0 130026184 121212260 7075 5935207 5961288
                LINK: 289155 289155 0 72775556 67083960 2199 2565060 2585579
             READDIR: 2933237 2933237 0 516506204 13973833412 10385 3190199 3297917
         READDIRPLUS: 1652839 1652839 0 298640972 6895997744 84735 14307895 14448937
              FSSTAT: 6144 6144 0 1010516 1032192 51 9654 10022
              FSINFO: 2 2 0 232 328 0 1 1
            PATHCONF: 1 1 0 116 140 0 0 0
              COMMIT: 0 0 0 0 0 0 0 0

device binfmt_misc mounted on /proc/sys/fs/binfmt_misc with fstype binfmt_misc
[...]

The key number to look at is the third number in the per-op list. It is the number of NFS timeouts experienced per file system operation: here, 22 write timeouts and 5 access timeouts. If these numbers are increasing, I believe the machine is experiencing an NFS hang. Unfortunately the timeout counters do not start to increase right away; the NFS operations need to time out first, and this can take a while. The exact timeout value depends on the setup. For example the defaults for TCP and UDP mount points are quite different, and the timeout value is affected by the soft, hard, timeo and retrans NFS mount options.
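Parsing these counters is easy to script. Here is a minimal Python sketch (my illustration, not a tool from the post) that assumes the layout shown above, where the third number on each per-op line is the timeout count:

#!/usr/bin/env python3
# Minimal sketch: report per-operation NFS timeout counters from
# /proc/self/mountstats. Assumes the third number on each per-op
# line is the timeout count, as in the example output above.

def nfs_timeouts(path="/proc/self/mountstats"):
    timeouts = {}      # mount point -> {operation: timeout count}
    mountpoint = None  # current NFS mount point, if any
    in_per_op = False  # inside a "per-op statistics" block?
    with open(path) as f:
        for line in f:
            words = line.split()
            if words and words[0] == "device":
                # e.g. "device server:/export mounted on /mnt with fstype nfs ..."
                in_per_op = False
                is_nfs = len(words) > 7 and words[7].startswith("nfs")
                mountpoint = words[4] if is_nfs else None
            elif mountpoint and line.strip() == "per-op statistics":
                in_per_op = True
            elif mountpoint and in_per_op and ":" in line:
                op, _, rest = line.partition(":")
                fields = rest.split()
                if len(fields) >= 3 and int(fields[2]) > 0:
                    timeouts.setdefault(mountpoint, {})[op.strip()] = int(fields[2])
    return timeouts

if __name__ == "__main__":
    for mnt, ops in sorted(nfs_timeouts().items()):
        for op, count in sorted(ops.items()):
            print("%s: %s timeouts=%d" % (mnt, op, count))

Run from cron and compared against the previous run's output, growing counters would indicate an ongoing hang, and the check itself never touches any NFS mount point.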

The only way I have found to get the timeout count on Debian and Red Hat Enterprise Linux is to peek in /proc/. According to the Solaris 10 System Administration Guide: Network Services, the 'nfsstat -c' command can be used to get these timeout values, but this does not work on Linux as far as I can tell. I asked Debian about this, but have not seen any replies yet.

Is there a better way to figure out if a Linux NFS client is experiencing NFS hangs? Is there a way to detect which processes are affected? Is there a way to get the NFS mount going quickly once the network problem causing the NFS hang has been cleared? I would very much welcome some clues, as we regularly run into NFS hangs.
