We've written about BIG data before, and while some reckon it's sexy, you'd better roll up your sleeves because you'll invariably need to do a lot of 'janitorial' (a.k.a. shit) work first!
Ron Sandland recently wrote about the new phenomenon of 'big data', weighing up the benefits and concerns. Terry Speed reflected on the same issue in a talk earlier this year in Gothenburg, Sweden, noting that this is nothing new to statisticians. So what's all the fuss about? Here's another take on the 'big data' bandwagon.
Opposition leader Ted Baillieu wants more transparency in Bay dredging monitoring results.
Opposition leader Ted Baillieu plans to introduce a private member's bill into State Parliament next month seeking greater accountability and transparency in the environmental monitoring of Port Phillip Bay during dredging operations. At the same time, Professor Mick Keough, a member of the government's independent expert review group, has called on the Port of Melbourne to make the Environmental Monitoring Plan (EMP) publicly available sooner rather than later.
With less than two weeks before the commencement of dredging operations, there is insufficient time for full and comprehensive scrutiny of this important document by all interested parties, including the general public.
Ted Baillieu is correct in demanding a high level of transparency. Significant amounts of data will be collected before, during and after dredging. However, releasing the data in its 'raw' form is unlikely to be either illuminating or necessary for the majority of users. For example, turbidity loggers can record NTU (a measure of turbidity) on a second-by-second basis, and these data will exhibit a high degree of 'noise' that, if not correctly processed, will distort the true picture. It is quite common for individual turbidity readings to peak in excess of 100 NTU (compared to a typical background of about 3-5 NTU). This is usually due to a transient phenomenon such as seaweed drifting past the sensor or a small, localised patch of more turbid water. Clearly, from an environmental management perspective we are not interested in what turbidity is doing on sub-minute (or even sub-hourly) scales. What we are interested in is the daily, monthly, and annual time-scales. The statistical challenge lies in the 'correct' processing and interpretation of very large data sets to provide: 1) accurate; 2) timely; and 3) relevant information. It is this information (as distinct from data) that Ted should be pushing to have available (preferably via the web) in close to real-time.
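To make the point about noise concrete, here is a minimal sketch of one way second-by-second readings could be boiled down to a daily figure. It is an illustration only, not the method used in the Bay monitoring program: the function name `daily_median_ntu`, the choice of the median as the summary, and the made-up readings (a 4 NTU background with a 20-second spike to 120 NTU) are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def daily_median_ntu(readings):
    """Collapse (timestamp, NTU) pairs into one median NTU value per calendar day."""
    by_day = defaultdict(list)
    for timestamp, ntu in readings:
        by_day[timestamp.date()].append(ntu)
    # The median is barely moved by a handful of extreme values, unlike the mean.
    return {day: median(values) for day, values in sorted(by_day.items())}


# Hypothetical data: one reading per second for ten minutes at a 4 NTU background,
# with a 20-second spike to 120 NTU (say, seaweed drifting past the sensor).
start = datetime(2008, 1, 1, 9, 0, 0)  # arbitrary illustrative date
readings = [(start + timedelta(seconds=i), 4.0) for i in range(600)]
for i in range(100, 120):
    readings[i] = (readings[i][0], 120.0)

summary = daily_median_ntu(readings)
raw_mean = sum(ntu for _, ntu in readings) / len(readings)

print("daily median NTU:", summary[start.date()])
print("raw mean NTU:", round(raw_mean, 2))
```

In this toy case the brief spike nearly doubles the raw mean while the daily median stays at the background level, which is the sense in which unprocessed data can distort the picture. A real pipeline would also need sensor quality checks, handling of gaps, and a defensible choice of summary statistic.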