2014-12-16

The 2038 problem

I was inspired - perhaps that's not quite the right word - by this article on the Year 2038 bug in the Daily Mail:

Will computers be wiped out on 19 January 2038? Outdated PC systems will not be able to cope with time and date, experts warn
Psy's Gangnam Style was recently viewed so many times on YouTube that the site had to upgrade the way figures are shown on the site.
  1. The site 'broke' because it runs on a 32-bit system, which uses four-bytes
  2. These systems can only handle a finite number of binary digits
  3. A four-byte format assumes time began on 1 January, 1970, at 12:00:00
  4. At 03:14:07 UTC on Tuesday, 19 January 2038, the maximum number of seconds that a 32-bit system can handle will have passed since this date
  5. This will cause computers to run negative numbers, and dates [sic]
  6. Anomaly could cause software to crash and computers to be wiped out
I've numbered the points for ease of reference. Let's explain to author Victoria Woollaston (Deputy Science and Technology editor) where she went wrong. The starting axiom is that you can represent 4,294,967,296 distinct numbers with 32 binary digits of information.

1. YouTube didn't (as far as I can see) "break".

Here's the original YouTube post on the event on Dec 1st:

We never thought a video would be watched in numbers greater than a 32-bit integer (=2,147,483,647 views), but that was before we met PSY. "Gangnam Style" has been viewed so many times we had to upgrade to a 64-bit integer (9,223,372,036,854,775,808)!
When they say "integer" they mean it in the correct mathematical sense: a whole number which may be negative, 0 or positive. Although 32 bits can represent 4bn+ numbers as noted above, if you need to represent negative numbers as well as positive then you need to reserve one of those bits to represent that information (all readers about to comment about two's complement representation can save themselves the effort, the difference isn't material.) That leaves you just over 2bn positive and 2bn negative numbers. It's a little bit surprising that they chose to use integers rather than unsigned (natural) numbers as negative view counts don't make sense but hey, whatever.
Presumably they saw Gangnam Style reach 2 billion views and decided to pre-emptively upgrade their views field from signed 32 bit to signed 64 bit. This is likely not a trivial change - if you're using a regular database, you'd do it via a schema change that requires reprocessing the entire database, and I'd guess that YouTube's database is quite big - but it seemed to be in place by the time we hit the signed 32 bit integer limit.
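For anyone who wants to check the limits rather than take my word for it, here's a minimal Python sketch. The helper names are mine, purely for illustration. Note that the largest signed 64 bit value comes out one lower than the figure in YouTube's post.

```python
# Ranges of fixed-width integers, computed rather than quoted.
# signed_range and unsigned_max are illustrative helper names.

def signed_range(bits):
    """Smallest and largest values of a signed integer with this many bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_max(bits):
    """Largest value of an unsigned integer with this many bits."""
    return 2 ** bits - 1


print(2 ** 32)              # 4294967296 distinct values in 32 bits
print(signed_range(32))     # (-2147483648, 2147483647)
print(unsigned_max(32))     # 4294967295
print(signed_range(64)[1])  # 9223372036854775807
```

An unsigned 32 bit counter would only have bought YouTube one more doubling of views, so going straight to 64 bit looks like the sensible call.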

2. All systems can only handle a finite number of binary digits.

For fuck's sake. We don't have infinite storage anywhere in the world. The problem is that the finite number of binary digits (32) in 4-byte representation is too small. 8 byte representation has twice the number of binary digits (64, which is still finite) and so can represent many more numbers.

3. The number of bytes has no relationship to the information it represents.

Unix computers (Linux, BSD, OS X etc.) represent time as seconds since the epoch. The epoch is defined as 00:00:00 Coordinated Universal Time (UTC - for most purposes, the same as GMT), Thursday, 1 January 1970. The Unix standard was to count those seconds in a 32 bit signed integer. Now it's clear that 03:14:08 UTC on 19 January 2038 will see that number of seconds exceed what can be stored in a 32 bit signed integer, and the counter will wrap around to a negative number. What happens then is anyone's guess and very application dependent, but it's probably not good.
There is a move towards 64-bit computing in the Unix world, which will include migration of these time representations to 64 bit. Because this move is happening now, we have 23 years to complete it before we reach our Armageddon date. I don't expect there to be many 32 bit systems left operating by then - their memory will be rotted, their disk drives stuck. Only emulated systems will be still working, and everyone knows about the 2038 problem.
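The rollover arithmetic is easy to reproduce. This is a sketch only: wrap_int32 is my own helper that mimics what a signed 32 bit counter would hold, and real systems may behave differently at the boundary.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2 ** 31 - 1


def wrap_int32(n):
    """Reduce n to the value a signed 32-bit counter would hold."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31


def epoch_to_utc(seconds):
    """Convert seconds since the epoch (possibly negative) to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_good = epoch_to_utc(INT32_MAX)
after_wrap = epoch_to_utc(wrap_int32(INT32_MAX + 1))

print(last_good)   # 2038-01-19 03:14:07+00:00
print(after_wrap)  # 1901-12-13 20:45:52+00:00
```

So one second after the last representable moment, a naive 32 bit clock believes it is December 1901.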

4. Basically correct, if grammatically poor

5. Who taught you English, headline writer?

As noted above, what will actually happen on the date in question is heavily dependent on how each program using the information behaves. The most likely result is a crash of some form, but you might see corruption of data before that happens. It won't be good. Luckily it's easy to test programs by just advancing the clock forwards and seeing what happens when the time ticks over. Don't try this on a live system, however.

6. Software crash, sure. Computer being "wiped out"? Unlikely

I can see certain circumstances where a negative date could cause a hard drive to be wiped, but I'd expect it to be more common for hard drives to be filled up - if a janitor process is cleaning up old files, it'll look for files with modification time below a certain value (say, all files older than 5 minutes ago). Files created before the positive-to-negative date point won't be cleaned up by janitors running after that point. So we leave those stale files lying around, but files created after that will still be eligible for clean-up - they have a negative time which is less than the janitor's negative measurement point.
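To make the janitor scenario concrete, here's a toy simulation. The janitor logic, the 5 minute threshold and the file times are all made up for illustration, and I'm assuming the cutoff subtraction itself doesn't wrap.

```python
INT32_MAX = 2 ** 31 - 1
MAX_AGE = 300  # seconds; the "older than 5 minutes" rule


def wrap_int32(n):
    """Reduce n to the value a signed 32-bit counter would hold."""
    return (n + 2 ** 31) % 2 ** 32 - 2 ** 31


def is_stale(mtime, now, max_age=MAX_AGE):
    """A naive janitor check: delete anything modified before the cutoff."""
    return mtime < now - max_age


# The janitor runs 1000 seconds after the counter has wrapped.
now = wrap_int32(INT32_MAX + 1000)

written_before_wrap = INT32_MAX - 3600              # an hour before the wrap
written_after_wrap = wrap_int32(INT32_MAX + 10)     # 10 seconds after the wrap
written_just_now = wrap_int32(INT32_MAX + 900)      # 100 seconds before the run

print(is_stale(written_before_wrap, now))  # False: genuinely old, never cleaned
print(is_stale(written_after_wrap, now))   # True: cleaned up as normal
print(is_stale(written_just_now, now))     # False: correctly kept
```

The pre-wrap file has a huge positive timestamp, so it always looks newer than the negative cutoff and sits on the disk forever.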

I'm sure there will be date-related breakage as we approach 2038 - if a bank system manages 10 year bonds, then it will break as soon as their expiry time goes past January 2038, so the bank will see breakage in 2028. But hey, companies are already selling 50 year bonds so bank systems have had to deal with this problem already.

Thank goodness that I can rely on the Daily Mail journalists' expertise in all the articles that I don't actually know anything about.

2014-12-08

2014-12-05

Whoda thunk? An actual piece of journalism on the University of Virginia "frat house gang rape" story

It seems as if the wheels are coming off Sabrina Rubin Erdely's story in Rolling Stone of gang rape on the University of Virginia's campus.

In the face of new information, there now appear to be discrepancies in Jackie's account, and we have come to the conclusion that our trust in her [my italics] was misplaced. [...] We are taking this seriously and apologize to anyone who was affected by the story.
That's certainly a novel way of writing "our unquestioning acceptance of her decidedly dodgy tale" and "had their reputations dragged through the dirt in the national media".

My favourite wonk, Megan McArdle, has a must-read piece on how this happened and how the crazy rush to publish a decidedly dodgy and unverified story has been one of the worst things to happen to real campus rape victims in a long time:

So now the next time a rape victim tells her story to a journalist, they will both be trying to reach an audience that remembers the problems with this article, and the Duke lacrosse case, and wonders if any of these stories are ever true. That inference will be grotesquely false, but it is the predictable result of accepting sensational stories without carefully checking. The greatest damage this article has done is not to journalism, or even to Rolling Stone. It is to the righteous fight for rape victims everywhere.
Go read the whole thing, and despair at the media environment that splashed Erdely's story over the national news but will fail to discuss the points in McArdle's article in anything but the most oblique terms.

2014-11-26

Unexpected consequences of Obamacare and immigration amnesties

I'm not sure why this hasn't generated more outrage yet: the Washington Times has spotted that President Obama's plan to legalize employment for illegal immigrants might screw over American workers even more than initially suspected:

President Obama's temporary amnesty, which lasts three years, declares up to 5 million illegal immigrants to be lawfully in the country and eligible for work permits, but it still deems them ineligible for public benefits such as buying insurance on Obamacare's health exchanges.
Seems sensible enough, although the amnesty beneficiaries might well be eligible for the Earned Income Tax Credit if they have kids. But there's a consequence for the lack of health exchange rights:
Under the Affordable Care Act, that means businesses who hire them won't have to pay a penalty for not providing them health coverage [my emphasis] — making them $3,000 more attractive than a similar native-born worker, whom the business by law would have to cover.
Oopsie. Since the immigrants will tend to participate in the lower-paid end of the employment spectrum, that means the $3000 delta will be a huge fraction of the wage. That's quite the competitive advantage. Sure, it means in practice that they won't have ACA-compliant health care - and in fact I'd expect many employers to pay their amnestied workers a higher headline wage to compensate for this lack of employer-supported healthcare. Nevertheless, once it's legal to employ these workers openly, the wage differential makes them look very attractive.

This won't affect unionized jobs where wages can't easily be varied, but in the private sector the medium-sized businesses who have more than 50 employees will start sucking up all the amnestied labor they can and will stop hiring the locals. Small businesses which have pushed workers into part-time slots to avoid the ACA can now replace two part-time workers with a full-time amnestied worker.

This is what happens when you create a baroque, complicated legal framework for employment and health insurance. When you subsequently make changes, you will find that they have unexpected effects.

2014-11-23

Anatomy of a timeshare sale

Dear readers, the things I do on your behalf. Herewith my notes from participating in a recent timeshare sales session which was the condition of a fairly well discounted holiday which my partner and I recently enjoyed.

The vacation property itself was very pretty - manicured lawns, artfully trimmed flowering bushes and a background of blue skies and the sound of crashing waves. The sales office itself was tucked away in a corner of the imposing main clubhouse, presumably because once you’re an owner you don’t like to be reminded of how and where they got you. It was a reasonably high traffic operation, several other couples there waiting or coming through - note that there were no singles, only couples. I'd guess they’re maximising their chances of finding a weak spot and then leveraging it to pressure the other party. Divide and conquer FTW!

The waiting room had the usual free beverages to enjoy for the few minutes we were waiting. Coffee was from a press-top urn and was awful. Normally I'm OK with urn coffee in a pinch, but my goodness this stuff was dreadful; I had to fall back to Lipton tea. This was scheduled to be a 2 hour session so my tolerance for coffee absence would be tested to its limit.

I'll call our sales rep "Nick", who was audibly a New Yorker. He led us down to his office and the presentation started after a few minutes of soft soap "how was your vacation so far? what have you enjoyed?" which was fairly obviously an intelligence-gathering exercise.

Nick started the sell emphasising that this was not a high pressure sales session. He then described the "price integrity" of his company, that they never discounted or negotiated on price (yeah, sure, you betcha snookums) and referenced back to how much we'd enjoyed the holiday so far to stimulate the guilt gland. He then noted the extra financial incentives if we bought right now, today, with a yes/no decision at the end of the session. What was that about "no high pressure sales", Nick? He outlined our aim today which was to decide whether our future vacations would be better with or without TIMESHARECO ownership, which was studiously neutral so far. At the end of the session we would be meeting with the company inventory manager for details on prices, incentives etc.

About 10-15 minutes in and Nick took a break to "get some water". Presumably this was to check with his boss on his boss's read on the situation so far. I didn't think to check for a video or audio monitor in the office; nothing was obvious, and I'm guessing that there wasn't any eavesdropping going on. Certainly nothing subsequently made me suspect that.

Nick started the next session reviewing our past holidays and latched on to our holiday last year as similar to the kind of thing he was selling. He asked us to name our "dream" 3-5 money-no-object vacations which we did. He picked out quality as a factor in our holidays and started talking numbers on room prices, picking a $200/night base price.

We learned after casual conversation from me that he had retired from a job as a retirement plan sales manager, but had come back into the timeshare sales game after a couple of years. In light of the later discussions, this made a lot of sense. He likened the scheme he was selling to a "401(k)" (money purchase pension scheme) for holidays - invest money and get a steady yield of vacations.

During the meeting he took very short but effective notes on a single sheet of paper, only a few words per concept; around now he read back to us a summary of what he'd noted, and pretty much nailed everything. I was very impressed at his technical skill. I also approved of the strategic placing of his office with a genuinely lovely garden and waterfall view - he sat with his back to it, so it clearly wasn't intended for his benefit. I bet the room views aren't like that (except for the show rooms.)

Now we come on to the numbers. He was trying to sell on the basis of a 7 day stay at $200/night over 20 years - that if we did this with his company then it would be cheaper than renting a hotel room each year. He presented a table showing cost of hotel rooms in brackets - but quoting in non-constant dollars. The chart spanned 40 years - so starting from the mid-1970s when 11% annual inflation was the average - but actually only 7% over the past 10 years (I did the math). Later, checking the US inflation calculator, it's clear that 1974-1984 is by far the steepest inflation decade of the past 40 years - 110% compared to 42% (1984-1994), 27% (1994-2004) or 25% (2004-2014).
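Those decade figures are cumulative, so to compare them with a quoted annual rate you need to annualise them. A rough sketch, taking the four cumulative percentages above as given and assuming simple compounding (the function name is mine):

```python
def annualised(cumulative_pct, years):
    """Constant yearly rate (in %) that compounds to cumulative_pct over years."""
    return ((1 + cumulative_pct / 100) ** (1 / years) - 1) * 100


# Cumulative US inflation per decade, as quoted in the post.
decades = {
    "1974-1984": 110,
    "1984-1994": 42,
    "1994-2004": 27,
    "2004-2014": 25,
}

for label, pct in decades.items():
    print(f"{label}: {annualised(pct, 10):.1f}% a year")
```

On those inputs the 1974-1984 decade works out somewhere under 8% a year and the most recent decade a little over 2% a year, which is why starting the chart in the mid-1970s flatters the sales pitch so much.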

I innocently asked him "but aren't wages inflating too, so shouldn't this be expressed in constant dollars or at least expressed in terms of purchasing power? And aren't hotel prices determined by supply/demand - what you can persuade people to pay, not what your costs are, so heavily influenced by wages?" at which point he pretended confusion. I also asked why he was looking at a 40 year basis when we were talking about a future span of 20 years, which met with a similar response.

Now it makes sense why he used to sell retirement plans... he's essentially selling a financial plan. He's saying that if we give TIMESHARECO about 20 grand then they can invest it in property and meet the cost of our stays over the next 20 years while presumably turning a small profit including his commission. And yet, they can't persuade the major financial establishments to make the same investments and profit directly. I wonder why?

Now the "here's all the places you can stay!" list. About 70 locations in 10 countries - not a massive amount, but they have "affiliates" in 100 countries with over 5000 resorts you can stay at. Minimum of 3 nights per stay, no max, which seems reasonable. With your purchase of the plan you get X points per year to spend on properties, and can transfer points between years. It costs $100 to carry forward non-spent points, but $0 to borrow them from future years - cheaper to take a loan than save up. What's wrong with this picture? It means that they want the additional money they get from you actually staying, of which more later.

We toured through photo sets of properties in countries we might visit, though only TIMESHARECO properties not affiliates - which was a nice sleight of hand. Apparently TIMESHARECO "reviews" the quality of the affiliates to ensure they're up to scratch. I'm sure you're as reassured as I was. It's a first-come first-served model for all properties. Nick claimed that there was a low probability of all affiliate properties being full in an area even in busy times e.g. spring break but didn't address TIMESHARECO numbers directly. So they almost certainly have a problem with availability during these times. Affiliates charge $200 per booking which is a nice little earner and pushes you towards fewer, longer holidays in affiliates.

He gave us a brochure for the affiliate program: RCI. According to their SSL cert information they are Wyndham Worldwide Corporation based in Parsippany, New Jersey. Their stock is up about 15% y/y so clearly the timeshare business is doing well out of the boom.

Nick took another break, this time more extended than the previous one, presumably to allow replanning of his sales approach. I couldn't help but notice that he didn't offer us a refill of our beverages.

He mentioned in passing that there was also a maintenance fee which covers insurance for the property, in response to an earlier question I had about "what if the property we buy rights to burns down?" We fenced for a few minutes, then 70 mins after the start of the discussion he gave up, said that we didn't have to tour the property if we didn't want to - we didn't - and handed over the bonus gifts that we were due to receive at the end of the session. He did try a last gasp attempt with a vacation offer similar to what we had already enjoyed, with another timeshare presentation linked in. I'm sure that if we'd taken this up then we'd have been lined up with their Top Gun negotiator. But we said no thanks, and left.

Overall a fascinating view into the world of timeshare sales. I didn't feel in any danger of buying at any point, but I give Nick his due that he tried very hard and used most of the tricks in the book without resorting to what I'd regard as "high pressure" sales. Perhaps the fact that I was taking notes alarmed him a little; he emphasised at the start that he'd give us all the items discussed in writing, but of course with us leaving before closing this didn't happen (if it would have happened). Credit to him that he recognised when he was beaten and didn't waste our time or his beyond that point. It also turned out to be remarkably easy to elicit information about him and divert him off course for a few minutes. Presumably this was because he thought that he was making a social connection and common ground.

The offer itself of course was completely overpriced - I checked out the secondary market in TIMESHARECO properties and they were a) heavily discounted, around 60% of face value and b) not selling, though of course these are related and just give you a ballpark idea of the market clearing price. The annual maintenance cost was around $1300 - i.e. the same as 6 nights of hotel stays at $200/night. If you buy in the primary market, you are a total mug or you have lots of money, the holiday model fits you and you don't mind paying a healthy excess for the convenience.
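A back-of-envelope comparison makes the point. This is a sketch using the round figures from this post (about $20,000 up front, $1,300 a year maintenance, 7 nights a year at $200, a 20 year horizon) and it ignores inflation, discounting, points carry-over and affiliate booking fees, so treat it as an illustration and not a valuation.

```python
# Round figures from the post; everything here is flat, undiscounted dollars.
UPFRONT = 20_000              # "about 20 grand" primary-market price
MAINTENANCE_PER_YEAR = 1_300
NIGHTS_PER_YEAR = 7
HOTEL_RATE = 200              # dollars per night
YEARS = 20
RESALE_PERCENT = 60           # secondary market asking price, % of face value


def hotel_total(years=YEARS):
    """Cost of simply paying for a hotel room every year."""
    return NIGHTS_PER_YEAR * HOTEL_RATE * years


def timeshare_total(purchase_price, years=YEARS):
    """Purchase price plus maintenance fees over the period."""
    return purchase_price + MAINTENANCE_PER_YEAR * years


print(hotel_total())                                      # 28000
print(timeshare_total(UPFRONT))                           # 46000
print(timeshare_total(UPFRONT * RESALE_PERCENT // 100))   # 38000
print(MAINTENANCE_PER_YEAR / HOTEL_RATE)                  # 6.5 hotel nights a year
```

Even at the secondary market asking price the maintenance fee alone eats nearly all of the hotel budget.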

2014-11-12

Lipstick on a postal pig

I can't help but share this lunacy with you. The (American) Center For Economic and Policy Research thinks that the problem with the US Postal Service isn't the lackadaisical, contemptuous, inefficient distribution of mail which it perpetrates. It's just not properly utilized. Instead, we should allow it to run banking services at the same efficiency with which it delivers mail:

[...] the Postal Service could improve its finances by expanding rather than contracting. Specifically, it can return to providing basic banking services, as it did in the past and many other postal systems still do. This course has been suggested by the Postal Service's Inspector General.
This route takes advantage of the fact that the Postal Service has buildings in nearly every neighborhood in the country. These offices can be used to provide basic services to a large unbanked population that often can't afford fees associated with low balance accounts. As a result they often end up paying exorbitant fees to check cashing services, pay day lenders and other non-bank providers of financial services.
Of course, the reason that banks have run a mile from providing banking services to clients with low income or dubious immigration status, running away from a steady (albeit low) income stream, is due to... government regulatory pressure. Who'd have thought that the government would have caused these problems?

Now the CEPR is proposing that a government agency can step in and fix the very real problems in banking access that other government agencies have created. I don't know whether to laugh or cry.

Incidentally, my personal experience with sending mail through the USPS - a monthly mail to a residential address within the same state, dropped in a regular post box - is that the failure rate is about 1 in 13. This is corroborated by the experience of The Advice Goddess (Los Angeles resident Amy Alkon, if you're not reading her blog or buying her books then you really should):

There is no way that the USPS could comply with the existing banking regulations in the USA without having the same order of overhead as the major US banks. I suspect their savings in property costs are insignificant; even if they could train existing post office counter staff to be bank tellers as well without any major salary inflation, all the backend systems and personnel required would kill their cost advantage. Check out the USPS compensation and benefits: "regular salary increases" means you're paid by length of service, not productivity; they get federal health benefits which are a step or three above Obamacare coverage; and they get a defined benefit retirement plan. Believe me, if you're staff at a major bank, you would sell your mother on the streets to get these benefits.

All the CEPR is doing in this article is lobbying for an increase in (unionized) federal government employees. The government, and therefore the taxpayer, is going to pick up the tab, but that's Just Fine with them. The only way I can see this working is if the USPS is exempted from most of the existing banking regulations - and if that's the problem, why not just repeal them for everyone else as well?

2014-11-04

A caricature of Civil Service placement and rhetoric

The new director of GCHQ was announced earlier this year as Robert Hannigan, CMG (Companion of the Order of St Michael and St George, aka "Call Me God") replacing the incumbent Sir Iain Lobban, KCMG (Knight Commander of the Order of St Michael and St George, aka "Kindly Call Me God"). Whereas Sir Iain was a 30 year veteran of GCHQ, working his way up from a language specialist post, Hannigan was an Oxford classicist - ironically at Wadham, one of the few socialist bastions of the university - and worked his way around various government communications and political director posts before landing a security/intelligence billet at the Cabinet Office. Hannigan is almost a cliché of the professional civil servant.

Hannigan decided to write in the FT about why Facebook, Twitter and Google increasing user security was a Bad Thing:

The extremists of Isis use messaging and social media services such as Twitter, Facebook and WhatsApp, and a language their peers understand. The videos they post of themselves attacking towns, firing weapons or detonating explosives have a self-conscious online gaming quality. [...] There is no need for today’s would-be jihadis to seek out restricted websites with secret passwords: they can follow other young people posting their adventures in Syria as they would anywhere else.
Right - but the UK or US governments can already submit requests to gain access to specific information stored by Facebook, Google, Twitter et al. What Hannigan leaves out is: why is this not sufficient? The answer, of course, is that it's hard to know where to look. Far easier to cast a dragnet through Internet traffic, identify likely sources of extremism, and use intelligence based on their details to ask for specific data from Facebook, Google, Twitter et al. But in the first half of 2014 the UK issued over 2000 individual requests for data, covering an average of 1.3 people per request. How many terrorism-related arrests (never mind convictions) correspond to this - single digits? That's a pretty broad net for a very small number of actual offenders.

Hannigan subsequently received a bitchslap in Comment is Free from Libdem Julian Huppert:

Take the invention of the radio or the telephone. These transformed the nature of communication, allowing people to speak with one another across long distances far more quickly than could have ever been imagined. However, they also meant that those wishing to do us harm, whether petty criminals or terrorists, could communicate with each other much more quickly too. But you wouldn’t blame radio or phone manufacturers for allowing criminals to speak to each other any more than you would hold Royal Mail responsible for a letter being posted from one criminal to another.
Good Lord, I'm agreeing with a Libdem MP writing in CiF. I need to have a lie down.

Hannigan is so dangerous in his new role because he's never really had to be accountable to voters (since he's not a politician), nor influenced by the experience and caution of the senior technical staff in GCHQ (since he never worked there). He can view GCHQ as a factory for producing intelligence to be consumed by the civil service, not as a dangerous-but-necessary-in-limited-circumstances intrusion into the private lives of UK citizens. After all, he knows that no-one is going to tap his phone or read his email.

Personally, I'd like to see a set of 10 MPs, selected by public lottery (much like the National Lottery draw, to enforce fairness) read in on GCHQ and similar agency information requests. They'd get to see a monthly summary of the requests made and information produced, and would be obliged to give an annual public report (restricted to generalities, and maybe conducted 6 months in arrears of the requests to give time for data to firm up) on their perception of the width of the requests vs information retrieved. That's about 40 Facebook personal data trawls per MP, which is a reasonably broad view of data without excessive work. Incidentally, I'd also be interested in a breakdown of the immigration status of the people under surveillance.

Mazzucato and her State-behind-the-iPhone claims

This caught my eye in the Twitter feed of Mariana "everything comes from the State" Mazzucato:

The box claiming that "microprocessor" came from DARPA didn't sound right to me, so I did some digging.

Sure enough, DARPA appears to have had squat all to do with the development of the first microprocessors:

Three projects delivered a microprocessor at about the same time: Garrett AiResearch's Central Air Data Computer (CADC), Texas Instruments (TI) TMS 1000 (1971 September), and Intel's 4004 (1971 November).
I don't know about the CADC, but Tim Jackson's excellent book "Inside Intel" is very clear that the 4004 was a joint Intel-Busicom innovation; DARPA wasn't anywhere to be seen. TI's TMS 1000 was similarly an internal evolutionary development targeted at a range of industry products.

Looking at a preview of Mazzucato's book via Amazon, it seems that her claims about state money being behind the microprocessor are because the US government funded the SEMATECH semiconductor technology consortium with $100 million per year. Note that SEMATECH was founded in 1986 by which point we already had the early 68000 microprocessors, and the first ARM designs (from the UK!) appeared in 1985. Both of these were recognisable predecessors of the various CPUs that have appeared in the iPhone - indeed up to the late iPhone 4 models they used an ARM design.

I'm now curious about the other boxes in that diagram. The NAVSTAR/GPS and HTML/HTTP claims seem right to me, but I wonder about DARPA's association with "DRAM cache" - I'd expect that to come from Intel and friends - and "Signal compression" (Army Research Office) is so mind-meltingly vague a topic that you could claim nearly anyone is associated with it - the Moving Picture Experts Group who oversee the MPEG standards have hundreds of commercial and academic members. If Mazzucato's premise is that "without state support these developments would never have happened" then it's laughably refutable.

At this point I'm very tempted to order Mazzucato's book The Entrepreneurial State for the sole purpose of finding out just how misleading it is on this subject that I happen to know about, and thus a measure of how reliable it is for the other parts I know less about.

Update: it seems that associating the DoE (US Department of Energy) with the lithium-ion battery is also something of a stretch. The first commercial lithium-ion battery was released by Sony and Asahi Kasei in Japan. The academic work leading up to it started with an Exxon-funded researcher in the early 70s. The only DoE link I can find is on their Vehicle Technologies Office: Batteries page and states:

This research builds upon decades of work that the Department of Energy has conducted in batteries and energy storage. Research supported by the Vehicle Technologies Office led to today's modern nickel metal hydride batteries, which nearly all first generation hybrid electric vehicles used. Similarly, the Office's research also helped develop the lithium-ion battery technology used in the Chevrolet Volt, the first commercially available plug-in hybrid electric vehicle.
That's a pretty loose connection. I suspect, since they specifically quote the Volt, that the DoE provided money to Chevrolet for research into the development of batteries for their cars, but the connection between the Volt and the iPhone battery is... tenuous.

For fuck's sake, Mariana. You could have had a reasonably good point by illustrating the parts of the iPhone that were fairly definitively state-funded in origin, but you had to go the whole hog and make wild, spurious and refutable claims just to bolster the argument, relying on most reviewers not challenging you because of your political viewpoint and on most readers not knowing better. That's pretty despicable.

2014-10-22

State-endorsed web browsers turn out to be bad news

Making the headlines in the tech world this week has been evidence of someone trying to man-in-the-middle Chinese iCloud users:

Unlike the recent attack on Google, this attack is nationwide and coincides with the launch today in China of the newest iPhone. While the attacks on Google and Yahoo enabled the authorities to snoop on what information Chinese were accessing on those two platforms, the Apple attack is different. If users ignored the security warning and clicked through to the Apple site and entered their username and password, this information has now been compromised by the Chinese authorities. Many Apple customers use iCloud to store their personal information, including iMessages, photos and contacts. This may also somehow be related again to images and videos of the Hong Kong protests being shared on the mainland.
MITM attacks are not a new phenomenon in China but this one is widespread, and clearly needs substantial resources and access to be effective. As such, it would require at least government complicity to organise and implement.

Of course, modern browsers are designed to avoid exactly this problem. This is why the Western world devotes so much effort to implementing and preserving the integrity of the "certificate chain" in SSL - you know you're connecting to your bank because the site presents a certificate issued to your bank, that certificate is signed by a certificate authority, and your browser already knows what the certificate authority's signature looks like. But it seems that in China a lot of people use the Qihoo 360 web browser. It claims to provide anti-virus and malware protection, but for the past 18 months questions have been asked about its SSL implementation:

If your browser is either 360 Safe Browser or Internet Explorer 6, which together make up for about half of all browsers used in China, all you need to do is to click continue once. You will see no subsequent warnings. 360's so-called "Safe Browser" even shows a green check suggesting that the website is safe, once you’ve approved the initial warning message.

I should note, for the sake of clarity, that both the 2013 and the current MITM reports come from greatfire.org, whose owners leave little doubt that they have concerns about the current regime in China. A proper assessment of Qihoo's 360 browser would require it to be downloaded on a sacrificial PC and used to check out websites with known problems in their SSL certificates (e.g. self-signed, out of date, being MITM'd). For extra points you'd download it from a Chinese IP. I don't have the time or spare machine to test this thoroughly, but if anyone does then I'd be interested in the results.
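For comparison, here is a rough sketch of what a strictly behaving client does with a bad certificate, using Python's standard ssl module. The function names strict_context and probe are made up for illustration, and the hosts you feed it (sites with self-signed or expired certificates, say) are up to you; this is a baseline to compare a suspect browser against, not a test of the 360 browser itself.

```python
import socket
import ssl


def strict_context():
    """A client context that requires a valid chain and a matching hostname."""
    return ssl.create_default_context()


def probe(host, port=443, timeout=5.0):
    """Attempt a TLS handshake with host and report what a strict client sees.

    Hypothetical helper: returns a short human-readable verdict string.
    """
    ctx = strict_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake happens here; a bad chain raises before any data flows.
            with ctx.wrap_socket(sock, server_hostname=host):
                return "certificate chain verified"
    except ssl.SSLCertVerificationError as exc:
        return f"rejected: {exc.verify_message}"
    except OSError as exc:
        # Covers DNS failures, refused connections, timeouts and other TLS errors.
        return f"could not connect: {exc}"
```

Calling probe("example.com") from a machine with network access should report a verified chain; a host with a self-signed or expired certificate should come back as rejected, every time, with no "click continue once" escape hatch.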

Anyway, if the browser compromise checks out then I'm really not surprised at this development. In fact I'm surprised it hasn't happened earlier, and wonder if there have been parallel efforts at compromising IE/Firefox/Opera/Chrome downloads in China: it would take substantial resources to modify a browser installer to download and apply a binary patch to the downloaded binary which allowed an additional fake certificate authority (e.g. the Chinese government could pretend to be Apple), and more resources to keep up to date with browser releases so that you could auto-build the patch shortly after each new browser version release, but it's at least conceivable. But if you have lots of users of a browser developed by a firm within China, compromising that browser and its users is almost as good and much, much easier.

2014-10-13

Corporate welfare from Steelie Neelie and the EU

I used to be the starry-eyed person who thought that governments pouring money into a new concept for "research" was a good thing. That didn't last long. Now I read The Reg on the EU's plan to chuck 2.5 billion euros at "Big Data" "research" and wonder why, in an age of austerity, the EU thinks that pissing away the entire annual defence budget of Austria is a good idea.

First, a primer for anyone unfamiliar with "Big Data". It's a horrendously vague term, as you'd expect. The EU defines the term thus:

Big data is often defined as any data set that cannot be handled using today’s widely available mainstream solutions, techniques, and technologies.
Ah, "mainstream". What does this actually mean? It's a reasonable lower bound to start with what's feasible on a local area network. If you have a data set with low hundreds of terabytes of storage, you can store and process this on some tens of regular PCs; if you go up to about 1PB (petabyte == 1024 terabytes, 1 terabyte is the storage of a regular PC hard drive) then you're starting to go beyond what you can store and process locally, and need to think about someone else hosting your storage and compute facility.

Here's an example. Suppose you have a collection of overhead imagery of the United Kingdom, in the infra-red spectrum, sampled at 1m resolution. Given that the UK land area is just under 250 thousand square kilometers, if you represent this in an image with 256 levels of intensity (1 byte per pixel) you'll need 250,000 x (1000 x 1000) = 250 000 000 000 pixels or 250 gigabytes of storage. This will comfortably fit on a single hard drive. If you sharpen this to 10cm resolution - so that at maximum resolution your laptop screen of 1200 pixel width will show 120m of land - then you have 100 times as many pixels and you're looking at 25 TB of data, so you'll need a network of tens of PCs to store and process it. If, instead of a single infra-red channel, you have 40 channels of different electromagnetic frequencies, from low infra-red up to ultra violet, you're at 1PB and need Big Data to solve the problem of processing the data.
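To sanity-check the arithmetic above, here's a quick sketch in Python. The function name imagery_bytes is made up for illustration, the land area is rounded up to 250,000 square km, and sizes are counted in decimal units (1 TB = 10^12 bytes), which is an assumption on my part.

```python
UK_LAND_AREA_KM2 = 250_000  # assumption: "just under 250 thousand", rounded up
GB, TB, PB = 10**9, 10**12, 10**15  # decimal units


def imagery_bytes(area_km2, resolution_cm, channels=1, bytes_per_sample=1):
    """Raw storage for gridded imagery: one sample per channel per ground cell."""
    cells_per_km = 100_000 // resolution_cm  # 1 km = 100,000 cm
    cells = area_km2 * cells_per_km ** 2
    return cells * channels * bytes_per_sample


print(imagery_bytes(UK_LAND_AREA_KM2, 100) / GB)              # 1m, 1 channel: 250.0
print(imagery_bytes(UK_LAND_AREA_KM2, 10) / TB)               # 10cm, 1 channel: 25.0
print(imagery_bytes(UK_LAND_AREA_KM2, 10, channels=40) / PB)  # 10cm, 40 channels: 1.0
```

Each tenfold improvement in resolution costs a factor of 100 in storage, which is why the jump from one hard drive to a rack of PCs happens so quickly.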

Another example, more privacy-concerning: if you have 1KB of data about each of the 7bn people in the world (say, their daily physical location over 1 year inferred from their mobile phone logs), you'll have 7 terabytes of information. If you have 120 KB of data (say, their physical location every 10 minutes) then this is around 1PB and approaches the Big Data limits.
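The same back-of-envelope sum for the per-person case, again as a sketch: dataset_bytes is a hypothetical helper, the 7bn population is the round figure used above, and units are decimal (1 KB = 1000 bytes), which is my assumption.

```python
WORLD_POPULATION = 7_000_000_000  # round figure from the text
KB, TB, PB = 10**3, 10**12, 10**15  # decimal units


def dataset_bytes(bytes_per_person, population=WORLD_POPULATION):
    """Total size of a data set holding a fixed-size record per person."""
    return bytes_per_person * population


print(dataset_bytes(1 * KB) / TB)    # 7.0
print(dataset_bytes(120 * KB) / PB)  # 0.84
```

So 120 KB per head comes out at 840 TB, which is close enough to 1PB for the purposes of this argument.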

Here's the press release:

Mastering big data could mean:
  • up to 30% of the global data market for European suppliers;
  • 100,000 new data-related jobs in Europe by 2020;
  • 10% lower energy consumption, better health-care outcomes and more productive industrial machinery.
My arse, but let's look at each claim in turn.
  • How is this project going to make it more likely for European suppliers to take over more of the market? Won't all the results of the research be public? How, then, will a European company be better placed to take advantage of them than a US company? Unless, that is, one or more US-based international companies have promised to attribute a good chunk of their future Big Data work to their European operations as an informal quid-pro-quo for funding from this pot.
  • As Tim Worstall is fond of saying, jobs are a cost not a benefit. These need to be new jobs that are a prerequisite for larger Big Data economic gains to be realized, not busywork to meet artificial Big Data goals.
  • [citation required] to quote Wikipedia. I'll believe it when I see it measured by someone without financial interest in the Big Data project.

The EU even has a website devoted to the topic: Big Data Value. Some idea of the boondoggle level of this project can be gleaned from the stated commitment:

... to build a data-driven economy across Europe, mastering the generation of value from Big Data and creating a significant competitive advantage for European industry, boosting economic growth and jobs. The BDV PPP will commence in 2015[,] start with first projects in 2016 and will run until 2020. Covering the multidimensional character of Big Data, the PPP activities will address technology and applications development, business model discovery, ecosystem validation, skills profiling, regulatory and IPR environment and social aspects.
So how will we know if these 2.5bn Euros have been well spent? Um. Well. Ah. There are no deliverables specified, no ways that we can check back in 2020 to see if the project was successful. We can't even check in 2017 whether we're making the required progress, other than verifying that the budget is being spent at the appropriate velocity - and believe me, it will be.

The fundamental problem with widespread adoption of Big Data is that you need to accumulate the data before you can start to process it. It's surprisingly hard to do this - there really isn't that much new data generated in most fields and you can do an awful lot if you have reasonably-specced PCs on a high-speed LAN. Give each PC a few TB in storage, stripe your data over PCs for redundancy (not vulnerable to failure of a single drive or PC) and speed, and you're good to go. Even if you have a huge pile of storage, if you don't have the corresponding processing power then you're screwed and you'll have to figure out a way of copying all the data into Amazon/Google/Azure to allow them to process it.
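A minimal sketch of the striping idea, assuming the simplest possible scheme: chunks are dealt out round-robin and each chunk is also copied to the next PC along. The names place_chunks and survives_loss_of are invented for illustration, and real distributed file systems are considerably smarter about placement than this.

```python
def place_chunks(num_chunks, nodes, replicas=2):
    """Map each chunk index to the list of nodes holding a copy of it."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    return {
        # Consecutive nodes (wrapping round) so the copies land on distinct PCs.
        chunk: [nodes[(chunk + r) % len(nodes)] for r in range(replicas)]
        for chunk in range(num_chunks)
    }


def survives_loss_of(placement, failed_node):
    """True if every chunk still has a copy somewhere other than failed_node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


pcs = ["pc0", "pc1", "pc2", "pc3"]
layout = place_chunks(8, pcs, replicas=2)
print(layout[0])                                         # ['pc0', 'pc1']
print(all(survives_loss_of(layout, pc) for pc in pcs))   # True
```

With two copies of everything, any single PC can die without losing data, and reads of neighbouring chunks hit different machines, which is where the speed comes from.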

Images and video are probably the most ripe field for Big Data, but still you can't avoid the storage/processing problem. If you already have the data in a cloud storage provider like Amazon/Google/Azure, they likely already have the processing models for your data needs; if you don't, where are all the CPUs you need for your processing? It's likely that the major limitation in processing Big Data in most companies is the appropriate reduction of the data to a relatively small secondary data set (e.g. processing raw images into vectors via edge detection) before sending it somewhere for processing.

The EU is about to hand a couple billion euros to favoured European companies and university research departments, and it's going to get nine tenths of squat all out of it. Mark my words, and check back in 2020 to see what this project has produced to benefit anyone other than its participants.