An Algorithm By Any Other Name

An algorithm is quite simply a set of steps to be taken in order to accomplish a specific task. Algorithms can be as simple as the steps you take to prepare breakfast in the morning, but we are more familiar with them in more complex situations, for example computer programming. More and more we rely on algorithms in our daily life, from video transmissions across the internet, made possible by audio and video compression algorithms, to Google Maps, which uses route-finding algorithms.

The word algorithm comes from the Latin Algorithmi, a Latinisation of the name al-Khwarizmi, the man who is very much credited with the foundation of algebra. Mohammed ibn-Musa al-Khwarizmi was a Persian mathematician, astronomer and geographer who wrote “The Compendious Book on Calculation by Completion and Balancing” in the early ninth century. It was translated into Latin around the mid-12th century and presented the world with the first systematic approach to solving linear and quadratic equations. It includes an approach called al-jabr, from which the term algebra is derived. By the way, I have checked and unfortunately this book is currently out of stock on Amazon with no restocking date.

Good algorithms solve problems correctly and efficiently, but not always exactly. Let’s take data mining, for example, where we use a set of heuristics and calculations to create a model. The algorithms analyse the input data for patterns and trends and use the results to find the optimal parameters for the data mining model.

Further on we will look at two algorithms, Lift and Chi-Squared, which respectively measure and test for correlation between two events. Before we get into the examples though, a word of warning with respect to all statistical analysis.

Correlation not Causation – it is important to remember that correlation between two variables does not necessarily mean that one causes the other.
Intellectual laziness can lead to Post Hoc, Ergo Propter Hoc (after this, therefore because of this), the false cause fallacy. So, you are driving down the road and a black cat crosses your path. 15 minutes later, you drive over a chunk of metal that punctures your tyre. Would you really succumb to the idea that this was bad luck as a result of having encountered the black cat?

There are so many other fallacies that you need to consider when analysing data and looking for cause and effect. My favourite, which we have all encountered and which some people are definitely more prone to using than others, is Circular Reasoning.

[Diagram: circular reasoning]
The above diagram demonstrates that this argument goes in a never ending circle as there is no final proof, just the subject of the debate being put forward as evidence.

So, algorithms will certainly help to establish relationships such as correlation between entities, but human intelligence, subject-matter knowledge, experience and good old-fashioned common sense are still essential ingredients when using this analysis to select intelligently from a vast number of possible decisions. That said, take care not to over-complicate things with hypotheticals: think of Occam’s Razor.

We do not need to worry too much about Occam’s Razor with our Lift and Chi-Squared algorithms, but we will see where both could give us very misleading results, for example where there is a very high level of null transactions (i.e. transactions that contain neither A nor B, the variables that we are testing for correlation).

Let’s take the example of a supermarket that wants to understand the likelihood of a customer buying coffee if they buy biscuits, and vice versa.

Using Lift to analyse the relationship, we get the following results.

(For the purposes of the following calculation, B represents biscuits and C represents coffee).

[Table 1: Lift calculation for biscuits (B) and coffee (C)]

Using Lift, the correlation is said to be negative if the result is less than 1, independent if it is equal to 1 and positive if it is greater than 1.

So, we can see that there is a positive correlation between the purchase of biscuits and the purchase of coffee. The question for the retailer is whether these products should be positioned close together to maximise the revenue from this correlation, or far away from each other with a view to maximising spontaneous spending.
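To make the calculation concrete, here is a minimal R sketch of Lift. The transaction counts are my own illustrative assumption (chosen so that the result matches the 1.05 figure discussed further down), not necessarily the exact figures from Table 1.

```r
# Lift(B, C) = P(B and C) / (P(B) * P(C))
# Counts below are assumed for illustration:
#   both    = transactions containing biscuits AND coffee
#   b_only  = biscuits but no coffee
#   c_only  = coffee but no biscuits
#   neither = null transactions (neither biscuits nor coffee)
lift <- function(both, b_only, c_only, neither) {
  n    <- both + b_only + c_only + neither  # total transactions
  p_bc <- both / n                          # P(B and C)
  p_b  <- (both + b_only) / n               # P(B)
  p_c  <- (both + c_only) / n               # P(C)
  p_bc / (p_b * p_c)
}

lift(both = 600, b_only = 200, c_only = 400, neither = 200)
# [1] 1.05  -> greater than 1, so B and C are positively correlated
```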

As promised, we can use chi-squared to test for correlation in a similar type of dataset.

[Table 2: Chi-squared calculation for biscuits (B) and coffee (C)]

Using chi-squared, a result greater than zero indicates that there is a correlation. You can see from the relationship between the observed (or actual) and the expected values whether the correlation is positive or negative: where the observed is greater than the expected, the correlation is positive, and where the expected is greater than the observed, the correlation is negative.

In this case we get a result of 12.5, and this is a positive correlation because the observed value for biscuits and coffee (900) is greater than the expected value (800).
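As a quick sanity check, here is the chi-squared contribution of the biscuits-and-coffee cell in R, using only the observed and expected values quoted above. (The full statistic sums this quantity over all four cells of the contingency table; base R’s chisq.test(..., correct = FALSE) will compute that for you if you give it the complete 2×2 table.)

```r
# Contribution of one cell to the chi-squared statistic: (observed - expected)^2 / expected
observed <- 900  # transactions containing both biscuits and coffee
expected <- 800  # count expected if the two purchases were independent

(observed - expected)^2 / expected
# [1] 12.5  -> non-zero, and observed > expected, so the correlation is positive
```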

Let’s now look at a situation where, although the relationship between the biscuit and coffee events remains the same, the Lift result might just leave us a little confused.

Let’s take the example of Question 1, where we saw a positive correlation for Lift(B, C) with a result of 1.05. Here though, we change the number of ¬B ¬C (null) transactions from 200 to 10,000. We still see a positive correlation for Lift(B, C), which you would expect, as the ratio of the number of times one item appears with the other to the number of times it appears without the other is unchanged. However, the result of 8.40 would appear to indicate a much stronger correlation, which of course is not the case.

[Table 3: Lift calculation with the number of null transactions increased from 200 to 10,000]

So, you can see why it is important to understand the circumstances in which the Lift result is reliable enough to support business decisions, and those in which you might want to do a bit more analysis. The result calculated above is distorted by the number of null transactions (i.e. the number of transactions that include neither B nor C).
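Reusing the lift() sketch from earlier, with the same assumed biscuit and coffee counts and only the null-transaction count changed, shows the inflation directly:

```r
# Identical B and C counts; only the number of null transactions changes
lift(both = 600, b_only = 200, c_only = 400, neither = 200)
# [1] 1.05
lift(both = 600, b_only = 200, c_only = 400, neither = 10000)
# [1] 8.4   -> looks like a much stronger correlation, purely because of the nulls
```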

For such a scenario, a null-invariant measure of interestingness is recommended. As the name suggests, these measures are not affected by the number of null transactions.

So, let’s revisit both of the dataset examples used for Lift above, only this time using the Kulczynski algorithm.

We will set a threshold, epsilon, so that we are confident that anything below this value is of no interest to us. My understanding is that if the Kulczynski value is near 0 or near 1, the events are negatively or positively associated respectively.

[Table 4: Kulczynski calculation for the first dataset]

[Table 5: Kulczynski calculation for the second dataset, with the null transactions increased to 10,000]

The null-invariant Kulczynski measure produced exactly the same result for both datasets, and it will continue to give the same result regardless of the value entered for ¬B ¬C.
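A minimal R sketch of the Kulczynski measure (again with my assumed counts) makes the null-invariance obvious: the formula only ever uses the supports of B, C and of B-and-C, so the ¬B ¬C count never enters the calculation.

```r
# Kulczynski(B, C) = 0.5 * ( P(B|C) + P(C|B) )
kulczynski <- function(both, b_only, c_only, neither = 0) {
  supp_b <- both + b_only  # transactions containing B
  supp_c <- both + c_only  # transactions containing C
  0.5 * (both / supp_c + both / supp_b)  # 'neither' is deliberately unused
}

kulczynski(both = 600, b_only = 200, c_only = 400, neither = 200)
# [1] 0.675
kulczynski(both = 600, b_only = 200, c_only = 400, neither = 10000)
# [1] 0.675  -> identical result, whatever the number of null transactions
```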

There are other null-invariant measures that can also be tested out, such as Cosine (the null-invariant version of Lift, as some would say), All-Confidence, Max-Confidence and Jaccard.

Being new to data-mining-specific algorithms, I have found it very interesting to look at the results of the few that I have worked with. I would like to spend much more time understanding more of these algorithms, but most importantly getting an understanding of which algorithm is best suited to analysing various types of dataset, what the potential pitfalls of each are, and which ones should be used in conjunction with each other to give me the best possible understanding of the relationships between the events in my dataset.

Google Analytics at a Glance

Developing new insights in conjunction with measurements of past performance is the key to business analytics. Highly detailed and reliable statistical analysis allows for a better predictive modelling process, and this in turn allows for more fact-based decision-making. Analytics is used in all elements of business, from finance to supply chain to marketing. Digital marketing relies heavily on the ability to anticipate business and consumer needs and demands. Not only is it essential to be able to capture data, but we also need the right tools to help us understand it.
In today’s digital world, the need to continually improve your customers’ online experience is critical to the success of your business. But how do you get ahead of the herd and ensure that you have the most up-to-date, real-time quantitative and qualitative data, not only from your website but also from your competition?
Let’s consider the fundamental KPIs that are important to measure, which include the following:

• Total number of visitors to your website
• Average amount of time visitors spend on your website
• Bounce rate of those visitors
• Campaign Return on Investment
• Campaign Conversion Rate
• Cost per Lead
• Visitor-to-Buyer Ratio
• Sales by Lead Source (knowing your most profitable sources of leads, such as referrals, search engine results and social media, can reduce future ad spend)
• Social Media engagement (Facebook likes, Google +1s, retweets, etc.)
• Tracking Keywords (which search engine campaigns are working best)

SEO professionals are understandably excited about one newer reporting feature, real-time analytics, which tracks live traffic, top active pages and top keywords. It also shows top locations and pageviews per minute. This type of data provides great insights for SEOs who want an instant view of traffic driven by new content, events or campaigns. Company executives and marketing managers also want to understand the real-time impact of a current marketing campaign.

It pretty much goes without saying that your internet marketing campaign needs to be tracked to make it successful. Tracking tools are necessary to calculate the ROI on your campaign. For small and mid-sized companies, there are many free or relatively cheap tools available to help you understand this information. Google Analytics (google.com/analytics) is a free tool that, according to its own site’s usage statistics, is used by over 50% of the top 10,000 websites in the world. It is simple, easy to use and robust, providing detailed statistics about your website visitors. Google Analytics will allow you to find out a great deal about your visitors: how they located your site, what keywords they typed in to find it, and which pages and links were clicked most often. You can segment your customers by geography and referral source, breaking out new from returning visitors. All this information will allow you to make the changes necessary to optimise your site so as to entice more quality prospects and achieve better conversion rates.

It is important to note that although Google Analytics is a very powerful and useful tool, most data analysts and SEO experts will agree that it is important to use more than one tool for a comprehensive understanding of visitor behaviour and needs. For example, Google Analytics does not provide detail on visitor interaction (such as mouse movement) in the way that Clicktale does. And YouEye’s attempt at measuring visitor sentiment (did the visitor find the site easy to navigate and get their answers quickly, or was there a level of frustration involved?) by recording the facial expressions of selected visitors in response to questions is quite extraordinary.

[Image: facial expression capture]
Image from FaceWarehouse: a 3D Facial Expression Database for Visual Computing http://www.kunzhou.net

For now though, we are going to take a closer look at Google Analytics and the key features that form the essential toolkit for any online marketer.

Google Analytics 5 gives you the ability to create custom dashboards that include the metrics you choose, displayed the way you want.
You might want your default to be a general analytics dashboard showing unique visitors (including visitors from social media or from SEO), top keywords, top viewed pages and so on.
[Screenshot: general analytics dashboard]

You may want to create a geography dashboard (visits by country, region etc.) or a mobile dashboard (% of mobile visitors, which devices etc.).
You will most likely find a social media specific dashboard very useful.
[Screenshot: social media dashboard]

 

Some other key features of Google Analytics that I would recommend using are:

  • Set up Goals based on your business objectives.
  • Enable Site Speed to see the load times of your pages.
  • Enable Site Search so that you can report on the keywords that visitors are using on your site to find the products they are looking for.
  • Definitely use the real-time reporting functions.

I suggest that, if you have not done so already, you sign up to Google Analytics, play around with it and see what insights you can gain. However, don’t limit yourself to just one digital analytics tool.  Read the reviews, sign up for the free trials where the product is not free and find the best combination of tools to ensure that your website is giving your target audience exactly what is needed and wanted to maximise your conversion rate.

The G2Crowd website gives quite a detailed digital analytics tools comparison. The top level results are copied below but the website allows you to drill down into the detail and create your own comparisons.

[Table: digital analytics tool comparison from G2Crowd]

The R Statistical Programming Language

Which statistical programming language should I use? R, SAS, Python, Julia, SPSS, the list goes on. I am new to all of these languages but have had a very close relationship with MS Excel for many years now. Some of my main uses of Excel have included operational performance analysis, predicting future workload, resource planning and, of course, budgeting. While few would dispute that Excel has many very useful features and a good-looking GUI, it seems that for cleaning and analysing bigger data sets a more powerful and flexible scripting language is recommended. So, while I don’t see myself ever breaking up with Excel completely, I am open to reconsidering the exclusivity of our relationship, on my part at least.

R is an open-source statistical computing and graphics environment which is widely used by statisticians and data miners. It is reported to have a very high standard of quality and numerical accuracy, and I believe the latest techniques are usually available in R first.

I have read very opposing views on just how easy it is to learn R but hey, it’s free, let’s give it a go!!

CodeSchool offers a free Try R course, so that was where I started my journey. I completed the course (see certificate below) but I have to admit that I struggled to see how I could apply what I had covered in the course in the live R environment.

[Image: CodeSchool Try R completion certificate]

I downloaded R and set about finding a data set that I could use for my first ever real R experience. While browsing the Eurostat website, I became interested in a comparison across countries of the percentage of the active population employed in the science and technology sector. I looked at how this percentage had changed through the most recent global recession and whether there was a correlation with GDP per capita. You will see a summary of my findings below under the heading “Tech In A Recession – Who Bucked the Trend?”

Back to the R experiment though, after all this was the real reason for seeking the data set in the first place.  I knew that I would not have the knowledge to carry out any structuring / cleansing of the data within R so back to my trusted Excel for that one.  I had access to the R Cookbook and was reasonably confident that I could create simple graphs. I didn’t expect that I could make them look pretty or present them exactly as I would have liked at this stage but the objective here was to get the coding correct to the point that the R graphs presented the data points accurately.

I wanted 3 graphs to show the following:

  1. the relationship between the % of the population employed in the science and technology sector and GDP per Capita in 2007
  2. the relationship between the % of the population employed in the science and technology sector and GDP per Capita in 2012
  3. a comparison between the change in the % of the population employed in the science and technology sector and the change in GDP per capita from 2007 to 2012

I installed the ggplot2 package as advised in the R Cookbook, installed the xlsx package, loaded my file, and finally persuaded the program to plot a single data series from my data file. After a lot of trying and a huge amount of research, I came to the conclusion that ggplot2 would not let me put a second, independent Y axis on the chart. Each of my data sets included two data series, and my preference was to plot these against a primary and a secondary Y axis so that I could define different maximum and interval values for each.
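For what it is worth, the code at this stage looked roughly like the sketch below; the file name, sheet index and column names are placeholders standing in for my own data rather than something to copy verbatim.

```r
library(xlsx)     # read the Excel file prepared earlier
library(ggplot2)  # plotting

# Hypothetical file and column names
eurostat <- read.xlsx("eurostat_tech_employment.xlsx", sheetIndex = 1)

# A single data series: % of the active population employed in science and technology
ggplot(eurostat, aes(x = Country, y = TechEmploymentPct)) +
  geom_point() +
  labs(title = "Employment in science and technology, 2007",
       y = "% of active population")
```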

I then installed the plotrix package, split my data into separate CSV files (one for each graph), loaded them and finally put enough code together to come up with the following graphs:

[Graph 1: % employed in science and technology vs GDP per capita, 2007]

[Graph 2: % employed in science and technology vs GDP per capita, 2012]

[Graph 3: change in % employed in science and technology vs change in GDP per capita, 2007 to 2012]
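For anyone curious, each graph came from code along the lines of the sketch below (file and column names are again placeholders). plotrix’s twoord.plot() draws one series against the left-hand Y axis and a second against the right-hand one, which is exactly what I could not get out of ggplot2.

```r
library(plotrix)

# Hypothetical CSV with one row per country
dat <- read.csv("tech_vs_gdp_2007.csv")

jpeg("graph1_2007.jpg", width = 900, height = 600)  # save the graph as a JPG
twoord.plot(lx = seq_along(dat$Country), ly = dat$TechEmploymentPct,
            rx = seq_along(dat$Country), ry = dat$GdpPerCapita,
            xlab  = "Country",
            ylab  = "% employed in science and technology",
            rylab = "GDP per capita",
            main  = "Science and technology employment vs GDP per capita, 2007")
dev.off()
```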

While I was very excited to be able to create graphs with a second Y axis that plot my data points correctly, to insert a title and data labels, to define the maximum values for each Y axis, and to save the graphs to my hard drive as JPG files, there is so much more I would do to the above graphs if I had the time. Among other things, I would change the orientation of the X-axis labels so that all country names would be visible, change the font size and colour, change the line thickness and so on. In fact, I would like to be proficient enough in R to find more interesting ways to both analyse and present the data. Time was not on my side though, and getting to this level was not without its challenges.

So is R easy to learn? Maybe if your background is in coding. I, on the other hand, am putting it right up there on a par with quantum physics! I’ll keep on experimenting though, as it seems to be the way to go if I want to be more involved in big data analysis.

Tech In A Recession – Who Bucked the Trend?

Arguably the most interesting conclusion to be drawn from the data was the lack of correlation between the countries which suffered most during the recession, and those where tech sector jobs were most heavily affected.  Certainly the countries which have seen proportionally the largest decline in employment within the sector have also been affected by the downturn, including Italy and Greece but also the UK which showed a surprisingly significant decline in both GDP and employment.  However, despite the length and depth of the recession across Europe only a small proportion of the countries surveyed actually showed a decline in the percentage of those employed in technology (although nine of the thirty countries were broadly “flat” with either an increase or a decrease of 2% or less).  One general trend is that larger countries appear to have done less well than smaller countries in attracting employment in the sector, although France did show a 4.6% increase (albeit from a low base).

At the positive end of the spectrum, the Nordic countries continue to perform strongly, as do two of the three Baltic States and two of the three Benelux countries. This “fringe” of successful countries around the core of the EU is complemented by more developed Eastern European countries like Hungary, Croatia and Austria, and by the two Mediterranean island nations of Malta and Cyprus. Interestingly, three of the top six countries for technology growth are those worst hit by the financial crisis: Ireland and Iceland both suffered devastating collapses in their banking sectors but recorded double-digit growth through the worst of the recession, and Portugal recorded over 25% growth despite a drop in GDP.

Clearly employment in the technology sector is not a simple matter of prosperity.  Three of the stereotypical PIIGS (Portugal, Ireland, Italy, Greece and Spain) are in the lower half of the growth table, but the other two are right at the top.  Neither is corporation tax the answer – Ireland and the Netherlands both have low rates of tax and did well, but Luxembourg is one of the worst performers despite a low tax rate.  The trend, such as it is, suggests that small countries on the European fringe have been best placed to grow jobs in the technology sector, at the expense of large players like the UK and Germany.

Google Fusion Tables – Ireland Census 1861 to 2011

Google provides us with some really useful tools and one such tool is Fusion Tables, “an experimental data visualisation web application to gather, visualize and share data tables”.

To try out this tool for the first time, I downloaded the Irish County 2011 Population Table  and the Irish County KMZ File

(A KMZ file is a compressed version of a KML file; KML stands for Keyhole Markup Language and describes the geographic data we will use. I decompressed it using a standard decompression utility and saved the KML file.)

Having saved both the population table and the geographic file to my hard drive, I then downloaded the Google Fusion Tables API.

I have been reliably informed that Google Fusion Tables only works using Chrome, the web browser developed by Google.

So, with everything that I needed for my experiment, I opened up the Google Fusion Tables data visualisation tool. On opening the tool, you are presented with the following screen:

[Screenshot: Fusion Tables initial page]

 

I used the first option “From this Computer” and “Choose File” to upload the population table that I had previously saved to my hard drive and then just followed the default options until the file itself opened in the tool.  So far, rather unimpressive looking.

I then uploaded the county KML file that I had also saved previously using exactly the same steps.  These files are automatically saved on Google Drive.

I then chose to merge the two files together. Once I was clear on how to rectify a problem with the incorrect naming of a couple of counties in the KML file (thanks to my classmate Eoin for his help on this one!), it was all pretty straightforward.

I selected the “Map of Location” tab at the top of the page and there was my data, looking quite meaningless, like pins on a map of Ireland.

I selected “Change map feature styles”, changed the “Map marker icons” option to “Buckets”, and entered what I thought were the most meaningful ranges using 7 buckets.

Now, the visualisation starts to look really meaningful and I do believe that I will be using this tool going forward!!

Below is the result of my experimentation with the Google Fusion Table API and the census and KML data that I used:

Population Density 1861 to 2011

Conventional wisdom suggests that the history of Ireland’s population has been based on two major trends: depopulation (largely due to emigration) from the time of the famine until the emergence of the Celtic Tiger in the 1990’s; and a shift from rural communities centred on agriculture to the cities, particularly Dublin.

I used census data from 1861 to 2011 to test this theory. Because the counties are of unequal size, the most useful indicator available to me was population density, obtained by dividing the population of each county by its size in square kilometres.

Because the theories I wanted to test are based on change over time, I then looked at the size of the population change over time in percentage terms to produce a graph covering the period 1861 to 2011. The graph clearly shows one interesting feature of the raw data: the bottom of the “trough” in terms of depopulation occurs around the time of the 1961 census, and there is a consistent increase in population density from that point onwards. That suggests that the population was increasing across the country from a much earlier stage than people generally think. There was a significant rate of growth from 1961 through to about 1981, with a dramatic decrease during the recession of the 1980s. Growth then increased again during the years of the Celtic Tiger, before beginning to tail off from about 2006.
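Both the density and the percentage-change calculations are one-liners in R; here is a small sketch with toy figures (the column names and numbers are purely illustrative, not the real census data):

```r
# Toy data, purely illustrative
census <- data.frame(
  County   = c("CountyA", "CountyB"),
  Pop1861  = c(400000, 200000),
  Pop2011  = c(1200000, 180000),
  AreaSqKm = c(900, 5000)
)

# Population density: people per square kilometre, one column per census year
census$Density1861 <- census$Pop1861 / census$AreaSqKm
census$Density2011 <- census$Pop2011 / census$AreaSqKm

# Percentage change in density between 1861 and 2011
census$PctChange <- 100 * (census$Density2011 - census$Density1861) / census$Density1861
```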
[Graph: rate of population change, 1861 to 2011]

Looking at the cities, Dublin has always had a significantly denser population than Cork, Galway and Limerick (the next three cities in the country by size), although the fact that I was reliant on county-based rather than city-based data for the latter three distorts the comparison to some extent. All four saw the downward trend from 1861, with limited differences in the rate of that decline. However, Dublin “bounces back” much earlier than the rest of the country (from the late 1800s) and at a massively different rate to the others, to the extent that Dublin’s population density in 2011 would be unrecognisable compared to the 1861 census, whereas the other cities still lag behind their nineteenth-century levels.

The emergence of Dublin as massively dominant compared to other cities is made clear by the data.  In 1861 it was in fact Cork which had the densest population in the country by some distance (608 individuals per square kilometre), with Dublin second (458) and Galway (303) not far behind.  By 2011 Dublin’s population density (1,421) exceeded Cork’s (579) by a factor of over 2.5.

The shift of population to Dublin is so clear that I also wanted to understand whether the city has dragged the counties around it along with it, so to speak. Kildare, Wicklow and Meath, the main “commuter counties” have certainly seen a significant rise in population density; to the extent that they now form a clear ring of density growth around Dublin.  Working out from there in concentric circles, Louth and Wexford have seen small rises in density with the rest of the neighbouring counties showing densities close-to but just below 1861 levels.  The same is true for the “city” counties of Cork, Limerick and Galway.

Outside these areas, particularly in the West and the Border Counties other than Louth, population density still remains well below 1861 levels, a trend which is emphasised on the above map by the huge movement of the population eastwards.

One other interesting theory which the data did not support was that emigration as a result of the financial crisis had any major effect between 2006 and 2011. Densities continued to rise during this period, despite the onset of the recession in 2008/09, although the rate of growth had begun to slow by 2006. It will be interesting to see what the census data for 2016 tells us about this.

DELIVER US FROM (DON’T BE) EVIL? BIG BAD DATA. BIG BAD CORPORATIONS. BIG BROTHER SOLUTIONS?

I am being followed around by a rather nice villa in Spain.

Before you get ideas of some white-painted house on wheels trundling down the streets of West Dublin I should point out that the villa is virtually following me. It pops up on RTE.ie when I check the weather and on various other sites that I visit during the course of the day. Books increasingly do the same thing.

This is, of course, targeted marketing. I looked at the villa on a particular website (before deciding to go on holiday somewhere completely different). I browsed books on Amazon to read around the pool. Somewhere in the background two corporations agreed that one could market to me on the other’s website, using information based on my search.

Using data of this type to segment customers and personalise advertising gets marketing people extremely excited and not just in the online world. The screen I see on the ATM whilst it figures out whether I have money in my account may well now be based on what my bank knows about me and what it might be able to sell me – a mortgage, a personal loan or a savings product. The next big thing is locational services – the message that flashes up on my phone as I walk past a store where I have a loyalty card, inviting me to come in and avail of the great discounts that are available today.

The technical ability to do this is facilitated by two things: the fact that the devices we carry (and now wear) know more-and-more about us as processing power increases; and the fact that cloud services permit increasingly complex data analytics cost-effectively and fast as compared to traditional servers.

Interestingly, Amazon are a major provider of cloud computing services to business as well as a handy source of reading material, which brings us to the big, nasty, obvious truth about the world of data: corporations are in it to turn a profit. Leaving Amazon alone for a minute, Google is a sexy brand. They tell us they won’t be evil and we largely believe them (with a lingering concern about what the US Government might be hoovering up as we browse and network our way around the web), but their business model is built squarely on generating revenue from capturing and selling information about people, almost exclusively for the purpose of other companies selling stuff to us, as Derek Scally’s interesting article in the Irish Times recently reminded us:

Derek Scally’s Irish Times Article

You could see Google’s approach to making money as ultimately less transparent than Amazon’s. The latter is recognisably a shop, which means you expect to pay money for the things that it sells you. The former provides free email, a free search engine and all sorts of other useful things. The price of that is not directly financial but something more personal: it’s information about you – what you like, what you look at, who you communicate with, where you go; in fact the minutiae of pretty much every aspect of your life.

And it’s not just Google, of course. Have you noticed the freaky way in which Facebook and LinkedIn seem to dig up potential contacts with whom you’re pretty sure you’ve never exchanged more than a few words over a coffee, let alone an email?

By the way, it works. Google’s revenues for the first part of 2015 are ahead of target, whilst poster boys of the last twenty years like Apple and Microsoft increasingly struggle to make money out of hardware and operating systems, with Microsoft posting a record loss this week.

Making money isn’t evil (at least outside North Korea!) but legislators are increasingly lining up against “data exploiters” like Google. One line of attack is the anti-trust lawsuits with which the European Commission tortured Microsoft for years. The other is data protection.

Back in May, European legislators finally put aside years of wrangling to agree on the fundamentals of a new Data Protection Directive to replace the existing rules, whose roots date back as far as 1981 (i.e. the Neolithic Era in data terms). Much of the media coverage has focussed on the struggle between the perceived laissez-faire approach of the Anglo-Irish fringe (driven, it is widely perceived, by the level of FDI in the Irish economy from US tech companies) and the more restrictive approach of continental Europe, exemplified by the famously savage rules on personal privacy under German and French law.

The essence of the new Directive will be informed consent. Data controllers and processors must ensure that we have agreed to the uses to which our data will be put, or bad things will happen to them in the form of fines based on percentages of turnover (i.e. potentially huge). The agenda is clearly to protect the individual from the inappropriate activities of big, mean (and largely American) corporations.

All of that is very laudable, no doubt, particularly when you consider that when the first convention on data protection was signed in 1981, a huge swath of Europe was under the control of regimes which believed that your whole life, including every shred of information about you, was the sole property of the State. No sane person would wish to return to that. On the other side of the fence, the antics of intelligence agencies protecting the democratic West haven’t helped, regardless of the threats they perceived themselves as protecting us against. Finally, the “data exploiters” have been their own worst enemies, by appearing to be as opaque as possible about what data they collect, what they do with it and how you can stop them, as anyone who has ever tried to navigate the privacy settings of a social media site will know. The level of time, effort and money you would have to spend to “de-Google” your life is considerable (as the Irish Times article demonstrates), and frankly more than most people are willing or able to expend. Despite efforts by regulators, including the Data Protection Commissioner, to improve transparency, there is still a feeling that you need degrees in both computer science and legalese to ensure that the actual level of privacy your data is enjoying matches what you thought you were getting when you unchecked those boxes in the “Settings” tab.

[Cartoon 1: data protection]

There’s a lot of comfort in relying on the idea that the State (or the EU, if that’s not the same thing) will protect us from potential exploitation, and wield a very big stick against transgressors. Some of that level of comfort is false, for two reasons: First, laws are open to interpretation (and manipulation) by the very, very expensive lawyers kept by tech companies purely to ensure that pesky regulations don’t interfere with their business model; and that’s before we think about the “regulatory and government” liaison people whose job it is to remind politicians ever so gently of the need not to kill the golden goose of employment (which equates to votes, of course), by tying it up in yards of red tape. Secondly, like criminals, good tech companies are always one step ahead of the law because they have to innovate to survive. The fact that it has taken us until 2015 to even agree on a framework to replace a data protection regime which dates from the days of typewriters, when even fax machines were rare, demonstrates that the legislative process is not by any means agile and certainly no match for the speed at which data collection and analytics moves. Relying on the law to cure all the potential problems in this field is like chasing a Ferrari with a bicycle (and discovering that the guy driving the Ferrari just let your tyres down).

[Cartoon 2: data protection]

Perhaps it’s time to think about playing Google at its own game, as some of the more forward thinking workers in this area are suggesting:

Gartner Datatopia Whitepaper

Information about me has value, even if it’s just my propensity for daydreaming about holidays in the sun. I know that because Google and their ilk can sell it to people and make a profit. On the basis that I own “me”, and by extension all that information, I am entitled to value in exchange for it, either in money or in services. If I “sell” my information, then I lose control of it, and if the buyer can make a profit by selling it on to someone else then so be it. That’s pretty basic supply chain economics. But if we looked at the issue of privacy in that light, it would be the start of a far more transparent discussion with tech companies about how they use our data, and it would also allow each of us to make our own fine-grained decisions about what is done with our information, rather than relying on a monolithic legislative framework to protect us. Of course, the “price” of my information is unlikely to be cash. It is more likely to be access to free email, or cloud storage, or whatever. That means that privacy would come at a price, but at least the bargain is clear and both sides know its terms.

There are two major hurdles before that conversation can start. First, privacy settings and privacy policies have to be easy to find, simple, instinctive and clear. Taking that step means that more people will use them, which will have a direct impact on the provider’s bottom line; and on the basis that turkeys have never been known to vote for Christmas, that is where the law and an intrusive regulatory regime come in. Second, education is key. Even teenagers, who grew up in an online, connected world, still don’t grasp that their online presence and everything they post is effectively in the public domain, which suggests there is limited hope for those of us over 30. Knowledge is power and it is everyone’s responsibility to impart that knowledge.

Think different, as Apple used to say in the olden days. It might just be better to play those evil tech companies at their own game than try to push the river back uphill.