Project Botticelli

SQL Server 2012 SP1, Hekaton, xVelocity, PolyBase, Data Mining, Mobility, Office & SharePoint 2013

7 November 2012 · 2 comments · 7905 views

Many Essential Upgrades to the Microsoft BI Stack

Microsoft have just announced the immediate availability of SQL Server 2012 SP1, which you can download from here, and of the SQL Server 2012 SP1 Feature Pack, which is available here. Let me explain why these are important to anyone involved in Microsoft Business Intelligence, and let me briefly summarise the remaining announcements, regarding Project Hekaton and PolyBase.

SP1 and Data Mining Add-Ins for Office 2013

First of all, as a data miner, I am delighted to see the immediate availability of the fifth generation of Data Mining Add-ins, now for Office 2013. Because Excel 2013 comes with PowerPivot technology built right into it (xVelocity is its internal data storage mechanism) and because it is just so easy to use it to modify, concatenate, calculate, reformat, or just filter data in PowerPivot and Excel, it makes an excellent tool for that most arduous, repetitive step in any data mining process: data preparation. You just drop your data in, prep it using Excel 2013, and surface it as a flattened pivot table, ready for mining! Then, it is those add-ins that let you engage the power of the data mining engine, part of SQL Server Analysis Services (SSAS), to look for correlations, and to perform predictive analysis. If you have never seen this in action, have a look at my short, 3-minute video introducing this technology, or the 24-minute introductory module of my data mining course.

Above all, those add-ins open the world of data mining to new users, who might otherwise be a bit scared of this supposedly complex technology. In my experience, those add-ins have done more to popularise data mining than most other tools put together… You can download the brand-new Data Mining Add-Ins for Office 2013 from here. You may also want to have a look at the entire, just-launched, feature pack, it contains plenty of other stuff useful in corporate BI.

Hekaton

Hekaton (no release dates announced) aims to repeat Microsoft analytical in-memory success in the transactional world of SQL Server OLTP. xVelocity has made columnar analysis possible in-memory, yielding 10–15 times speed improvements for aggregations, and other calculations that use the DAX language, and, soon, for MDX queries, too. It also enabled everyone to work on large data sets, without having to leave Excel. Now Microsoft is taking the same step, but in the OLTP, transactional processing world. In my opinion, this is long overdue, and I was surprised that analytics got their in-memory technologies before the transactional applications, but I have not been an OLTP guy for more than a decade, so perhaps I oversimplify. At last, we are entering the era that I talked about in 2003, in my then Next Decade in IT presentation: everything will be done in-memory, one day. Actually, the semantics of in-memory is a bit strange, since SSD is a (rather slow) memory, after all, too. Perhaps we should be referring to in-memory in the sense of performance (speed and scalability) improvements, as it will eventually become a moot point whether something is in-memory, or not, once everything is in the fast-connected-cloud. Give it another 10 years, or a bit longer in some parts of less-connected world.

In the meantime, back to Hekaton: Microsoft is processing OLTP transactions and queries, with full ACID support, in fast, CPU-accessible memory, they are not just using memory as a cache for an obvious speed-up, nor simply preloading all tables into memory. That is where, I expect, we will find the most interesting aspects of this technology, delivering a performance and a scalability improvement by being cleverer as to where the data is, and how it is structured, through cleverer query execution plans, unique to Hekaton, and/or additional API. The core design of Hekaton revolves around four principles, according to Dave Campbell, Microsoft Technical Fellow looking after this technology:

  1. Optimize for main memory data access,
  2. Accelerate business logic processing,
  3. Provide frictionless scale-up,
  4. Built-in to SQL Server.

Microsoft are suggesting up to 10 times better performance for existing applications and up to 50 for new ones—the new ones could take advantage of the additional API, while the existing ones would just rely on better query execution planning, aware of the in-memory storage. In a first glimpse of this technology, Dave Campbell describes some of this new API in his blog post. One interesting aspect of Hekaton, should use use that feature, is the ability to recompile stored procedures to native machine code, to further improve speed, while applying hardware-specific optimisations in the process. I cannot wait to see more announcements about this technology.

xVelocity does the speed-up in a wonderfully simple way for analytics: internally, it stores data not as traditional rows or columns, but as a tree of relationships representing columnar references to identical values. That is why it can aggregate so fast, but on the flip-side, that is why it works well for certain data sets only—those that exhibit the type of sparseness and regularity that can be abstracted into a relatively small tree, like a set of billions of sales records of just thousands of product lines. I am looking forward to the upcoming announcements from Microsoft sharing more detail about the inner workings of Hekaton, which, in Greek, ἑκατόν, means 100. It would be great to get even 10–20 times better performance, by doing just a database upgrade! That would make a process taking an hour run in just 3 minutes, something we are seeing with xVelocity Columnstore Index already today, for analytics. Incidentally, xVelocity Columnstore Indexes will become writeable in the next major release of SQL Server, not SP1, according to today’s PASS announcements.

PolyBase

Just two weeks ago, Microsoft announced HDInsight, their Hortonworks-built Hadoop distro, for both Windows Server, and for Azure (more about the preview versions here). HDInsight let’s you crunch big data stored in a Hadoop Distributed File System (HDFS). PolyBase (to be released by June 2013) is a nice, and a simple idea: let’s unify the way we query a traditional data warehouse with the way you would query a HDFS. You may be aware of the Apache Hive, technology that enabled data warehouse-style queries over data stored as plain HDFS files, which usually consist of plain key-value pairs. The usual problem with Hive is that you have to either query it directly, using the HiveQL language (SQL-like), or using specialised tools, such as the Microsoft Excel Hive Add-In, or the ODBC Drivers for Hive, or using the SQL Server Parallel Data Warehouse connectors, which use the Apache Sqoop tool to load data in and out of HDFS. PolyBase will let you write plain, normal T-SQL queries, and it will know how to translate them, and how to federate the underlying data, so your applications can transparently access a traditional star-schema warehouse stored in SQL Server 2012 PDW, also announced today (shipping by June 2013), or read results from your MapReduce big data processing, stored in HDFS, or even access the big input data itself. Effectively, this is a layer of indirection and federation that simplifies hybrid scenarios that use traditional and Hadoop approaches for analytics. This neatly extends Microsoft approach to big data, of letting you make this powerful technology work as part of your existing application platform. I talk about this in more detail in my article on Microsoft and Big Data, and in the video introducing SQL Server BI—if you are new to Big Data, I recommend you have a look at those.

Self-Service, SharePoint 2013, Mobility

Self-service analytics means Excel to most users. PowerPivot enabled budding analysts to do amazing things with their data, large or small, internal, or mashed-up from external sources, a few years ago, when SQL Server 2008 R2 shipped. PowerPivot is now part of Excel 2013, built-in, there is no need to download it, but you may want to enable the full user interface (just turn on the option). Power View, running on SharePoint Server 2010, on the other hand, enabled everyone, not just analysts, to explore and visualise data in pretty, informative, or even amazing ways, like with animated bubble charts. Power View is now also part of Excel 2013 (including the version for Surface Pro, not RT), which removes the dependency on SharePoint. However, you should seriously consider using SharePoint 2013 if you are involved in any type of corporate, enterprise BI. Why? It makes collaboration, sharing, and supervision, including risk management, of user-created BI easy. For example, it enables one user to create a model, using PowerPivot and Excel, so that another user can query and visualise it using Power View. For that, the model needs to be loaded somewhere, so that Power View can access it, when the analyst has gone home and shut down his laptop. There are two ways to achieve that:

  1. Put it in the cloud, using the brand-new Office 365 SharePoint Online, launching very soon, or,
  2. Put it on-premise, using SharePoint 2013.

In both cases, you need to have a version of SQL Server 2012 that is compatible with SharePoint 2013 and Excel 2013, and that is precisely what SQL 2012 SP1 provides. Further, SharePoint 2013 makes Microsoft BI pretty on mobile devices. PerformancePoint Services work on iPads (and other iOS devices), and Windows 8, and SharePoint itself is nice to use on those modern devices.

The full mobility of Microsoft BI, however, will not happen thanks to HTML5 quite yet, as it is just not fast enough for the kind of rich interaction that our fingers demand, for example when playing with a Power View bubble chart. Today, Microsoft announced that they will ship, in 2013, native iOS and Windows 8 apps, which will let you consume SQL Server BI in a more intimate way than even the much-improved SharePoint 2013 can allow. There was no announcement of an Android app.

The new Power View has plenty of new goodies, too, like interactive zooming maps, and the new ability to deal with categorised data. It was also announced today, that you will be able to use Power View to visualise multidimensional (cube) data, something that is possible today, provided you take the additional step of flattening the cube while you import data into a tabular model. It will be nice not to have to flatten cubes, one day, but this functionality is not part of the just-released SP1. 

As I am speaking at the upcoming SharePoint Conference I will blog again, with more about the new Power View, PowerPivot and the entire Microsoft BI stack in the Azure cloud. Above all, there will be new videos, for our Full Access Members, coming regularly over the next weeks and months, covering all of the new technologies.

In the meantime, if you want to stay on top of it all, please become our member. There is a lot of BI knowledge here for free, but if you buy a membership, you can also start learning, online, from these excellent tutorials:

And to celebrate the shipping of SQL Server 2012 SP1 we are offering a 10% discount on all memberships—use code SQL2012SP1, which expires this Sunday, Nov 11th, so hurry.

If you would like to see the original announcements, have a look at the SQL team’s TechNet blog. And finally, my congratulations to the entire team on an interesting release, and on a cool choice of names for all those new technobits—I can already hear someone saying: I’d apply a hekaton to that query… 

Rafal

Comments

Testbug · 2 August 2013

Rafal, you are the best, the best, the best, multiplied by two!
ohh and btw, 3rd paragrapth ends with "You may alsop" please delete this message after fixing it!
Cheers,
Testbug.

Project Bottice... · 13 August 2013

Testbug, you are very kind, indeed. Thank you for spotting the typo, now corrected.