Debashis Basu: Data science for Smart Cities

Wed, 2016-04-27 10:42 -- SCC India Staff

On March 28 this year, Prime Minister Narendra Modi made an interesting speech at the Bloomberg India Economic Forum 2016. It was praised even by his critics for laying out an impressive array of facts and strategies. Since it was a Bloomberg event, the PM made it a point to mention that "I am grateful for the valuable advice that we have received from Mr Michael Bloomberg in the design of our Smart Cities programme. As mayor of one of the world's great cities, Mr Bloomberg has personal insight into what makes a city tick." We don't know what advice Mr Bloomberg gave Mr Modi, but it is worth recalling how he pioneered a statistical approach to governing cities through predictive analytics, as narrated by Gillian Tett in her book The Silo Effect. Readers can judge whether Mr Bloomberg's strategy can be applied anywhere in India, despite Mr Modi's best intentions.

When Michael Bloomberg was the mayor of New York City, Michael Flowers was offered a job investigating financial fraud. Mr Flowers offered instead to work with the data that New York City collects to solve some of its big problems. While serving in Iraq, he had seen how intelligent use of data could predict bomb attacks. At New York City Hall, one of the first things Mr Flowers did was to place an advertisement on Craigslist for young data crunchers. This was the first time anyone had been recruited that way for City Hall. He put the recruits in a downtown warehouse and asked them to crunch the data being collected about fire risk. The idea was to hit upon anything that might predict when fires would break out next. The Fire Department data was of no use. Mr Flowers then asked his "kids", as he called them, to ride along with inspectors from the different departments (sheriff's office, police, fire, housing, and building), instructing them to be humble and to keep an open mind about what might help predict fires. They listened to firemen, policemen and various inspectors, and a pattern emerged.

The majority of fire-prone buildings seemed to have been built before 1938. Why 1938? Because that was the year building codes were tightened in New York. These buildings were also usually in poor localities and had generated complaints about one issue or another. Armed with these initial clues, the "kids" now hunted for targeted data. But while 40-odd agencies in New York City had been collecting data for decades, "the data was held in dozens of different databases, since not only were the agencies separated from each other, but there were subdivisions within the agencies. The numbers as crazily fragmented as the people," writes Ms Tett in her book. The data on vulnerable buildings was scattered across multiple departments.

Statistical techniques, now being glamourised as "data science", solved this problem. The geeky "kids" under Mr Flowers first created a subset of 640,000 houses in the New York area that were registered to hold one to three families. Second, they trawled through the fire department's data on house fires and illegal conversion complaints. Third, they collected the data on investigations for tax and mortgage defaults from the department of finance. Fourth, they secured the list of properties built before 1938 from the building department. Finally, they combined all this data on vulnerable buildings, assembled from multiple departments, and put it through a statistical model. The results were striking. Whenever a building showed the second (fires and illegal conversions), third (investigations) and fourth (built before 1938) sets of factors, there was a higher incidence of fires.
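
The combining step can be pictured with a small Python sketch. It is a toy illustration only: the building IDs, the records and the fire outcomes are all invented, and a simple "shows all three factors" rule stands in for the statistical model, whose details the account does not give.

```python
# Toy illustration only: every record below is invented, not New York City data.

# Hypothetical extracts from three departments, keyed by a shared building ID.
fire_dept_complaints = {"B1", "B2", "B5"}      # fire or illegal-conversion complaints
finance_investigations = {"B1", "B2", "B3"}    # tax or mortgage default investigations
built_before_1938 = {"B1", "B2", "B4", "B5"}   # list from the building department

# The subset of one-to-three-family houses, with whether a fire later occurred.
houses = {
    "B1": True,
    "B2": True,
    "B3": False,
    "B4": False,
    "B5": True,
    "B6": False,
}


def risk_factors(building_id):
    """Count how many of the three risk factors a building shows."""
    sources = (fire_dept_complaints, finance_investigations, built_before_1938)
    return sum(building_id in source for source in sources)


def fire_rate(building_ids):
    """Share of the given buildings that had a fire (0.0 if none are given)."""
    ids = list(building_ids)
    if not ids:
        return 0.0
    return sum(houses[b] for b in ids) / len(ids)


# Assumed rule for this sketch: flag a building only if it shows all three factors.
flagged = sorted(b for b in houses if risk_factors(b) == 3)
unflagged = sorted(b for b in houses if risk_factors(b) < 3)

print("flagged:", flagged, "fire rate:", fire_rate(flagged))
print("unflagged:", unflagged, "fire rate:", fire_rate(unflagged))
```

The point of the sketch is the join: once records from separate departments share a building identifier, the overlap between them can be tested against actual fires.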

Mr Flowers took the data to the building department inspectors. The inspectors' hit rate in uncovering buildings with problems had been 13 per cent. After applying the new method, the hit rate rose to 70 per cent, making inspections more than five times as effective. For larger buildings, the "kids" discovered a statistical correlation with another factor: brickwork and brick deliveries across New York! The team had similar success in tracking down other problems in New York City, such as tobacco smuggling (cigarettes cost twice as much in New York as in Virginia) and illegal sales of OxyContin, a prescription drug. The "kids" zeroed in on just one per cent of pharmacies that accounted for 24 per cent of illegal sales of OxyContin. Another application of statistical techniques made it easier to identify restaurants that dumped their grease and fat down the drain late at night instead of turning it over to a waste disposal company, as was mandatory.
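
The arithmetic behind these figures is easy to check. The short Python sketch below uses only the percentages quoted above; the function names are made up for illustration.

```python
def improvement_ratio(before_pct, after_pct):
    """How many times higher the new hit rate is than the old one."""
    return after_pct / before_pct


def concentration(share_of_sales_pct, share_of_outlets_pct):
    """How many times their proportional share a group of outlets accounts for."""
    return share_of_sales_pct / share_of_outlets_pct


# Inspectors' hit rate: 13 per cent before, 70 per cent after.
hit_rate_gain = improvement_ratio(13, 70)

# One per cent of pharmacies accounted for 24 per cent of illegal OxyContin sales.
pharmacy_concentration = concentration(24, 1)

print(round(hit_rate_gain, 1))   # about 5.4 times
print(pharmacy_concentration)    # 24 times their proportional share
```

In other words, the jump from 13 to 70 per cent is a little over a fivefold improvement, and the flagged pharmacies sold 24 times what their numbers alone would suggest.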

Did Mr Bloomberg tell Mr Modi to apply data science to running cities? Can statistics open up dramatic new solutions to intractable problems? Of course they can. But the Mumbai municipal corporation, which is Asia's largest and richest, does not even have accurate data about all its civic infrastructure, nor is it moving fast to gather information or to adopt geotagging. Using data science to govern cities better requires a strong commitment from the state governments to collect and digitise accurate data and then put a team in place to analyse it. More importantly, it requires a strong political commitment to support city officials in their actions when conclusive data leads them to specific conclusions. Can Prime Minister Modi make a beginning with a few corporations in the many states that the BJP rules?

Source: Business Standard