CDS for Apps VS Power Query – is there a difference? If so, what is that difference? Are they mutually exclusive or complement each other? Later I briefly outline
Common Data Services for Analytics no, Power BI Datapools no again, actually Power BI Data Flows. It is difficult to keep up with the correct naming.
What I have put together is my impression based on my experience and research from the perspective of a backend data person with a developer background. I try not to draw too direct conclusions so please treat it as what it is at this point: a technical opinion (and not an expert’s testimonial).
In short: my understanding is that CDS for Apps is a subset of Power Query functionality packaged together, something similar to how a SQL Server stored procedure is an abstraction layer of a bunch of T-SQL statements. The same applies to CDS for Apps, you don’t need to understand all the subtleties behind CDS for Apps, it’s like an M stored procedure (that includes connectors in my view though it is not the case in the official documentation).
So even without CDS for Apps, we can use all that it offers and even more on the backend side, assuming at least one of the dev team can code M queries (Microsoft plans to introduce M into SSIS/ADF, too so it can be used in enterprise level data load processes, too!).
For some more details, please read on!
Recently I had a chance (thanks to a colleague who came to me with a question) to experiment a little bit with Power Query.
With all things, Power (true to this one, too) lines become blurry regarding the target audience or the targeted userbase. You hear about a Power [Something], labeled as “self-service” and initially, you (certainly true for me) don’t really know how the pieces fit together and how it is integrated into Microsoft’s enterprise level product roadmap.
My purpose is to share what I have learned in a brief period of time to help you avoid the confusion and misunderstandings that I had about these things. Given my background, I mostly see things from the BI developer’s (if there is still such a thing, please put your own designation here: data engineer, analytical specialist, etc) perspective, so when I write I or we, it is always from that viewpoint. However, being a consultant means I try to consider technical solutions from a wider perspective from two angles:
- How would management/stakeholders understand the capabilities of a new tool (or a new version of a tool)? – this often drives future tech-investments and has a significant impact on projects
- How could end users (on any level) deal with the new tool/version? – this obviously impacts support and its structure
I openly admit that I don’t have a clear picture when it comes to Microsoft’s Power [Something] strategy and roadmap. Bits of news fly around all the time and the scene changes very often. What I write below is my impression and I try not to draw too direct conclusions so please treat it what it is at this point: a technical opinion (and not an expert’s testimonial).
CDS for Apps
I keep hearing about CDS for Apps and how awesome it is, and that we just need to wait for a little as it’s not ready yet or not released with all its features.
Apps? What Apps?
My first unspoken question was: What the heck are these Apps? Only Power Apps or a more general application concept? I’m still not entirely sure as it seems to be Power Apps but for some reason, Microsoft doesn’t want to explicitly say that so at the same time it’s wider than that, maybe we need to include all the other apps that are part of e.g. D365.
Then what is CDS for Apps?
My own brief definition for CDS for Apps is the data management layer of Apps.
On their official site, Microsoft doesn’t define it but highlights what it can be used for. I am not sure if it’s a fully fleshed out product but could still be in the finalizing phase which will be done after releasing it as a preview kind of feature.
Officially “Common Data Service (CDS) for Apps lets you securely store and manage data that are used by business applications. Data within CDS for Apps is stored within a set of entities. An entity is a set of records used to store data, similar to how a table stores data within a database.” – I find it a bit vague and not really clear. My quick definition also doesn’t say more but it is less cloudy.
Shall we use it?
After (not) figuring it out the answer to the first question comes the next one: Shall we use it?
How does it fit into the BI/analytics space? My take on that: it doesn’t.
Why do I say this? Those Apps (whatever they are) are on the other side of data: when it’s captured. Of course, there is always a “getting data” component to present the user e.g. available values, customized product list, … but the main focus is on capturing data in a form / using an ERP system / … We use the data on the other side when it is already part of the system and we extract it from there usually en masse.
Where do the two worlds come together? Does this mean that CDS for Apps (whatever it does/will include as it evolves) is so cool, but in the backend realm we lose out on it because we want to use it but can’t?
Be patient, I’ll answer this at the end of a next point ?
We all know what this age-old concept of Extract-Transform-Load (or nowadays often flavored as Extract-Load-Transform) means. But why do I mention it here? How is it related to CDS for Apps? Where is ETL in the current Microsoft universe?
The whole data loading landscape is under serious changes (seemingly fast but things don’t die out that quickly and new technologies (in a production-ready state!) don’t rise that quickly either) highlighted by the followings (this is purely just based on my experience and readings) – please note that I exclude the big data world on purpose for now:
- in the Microsoft world on premise versions are losing space against Azure
- shifting towards platform as a service without the justified overhead of the virtual machines (or real servers)
- SSIS is way past its heydays and, my opinion is that it will go away in a few years, it doesn’t really get any new components/features (just those that are used for other software products and get ported, so they can be used in SSIS)
- ADF (Azure Data Factory) is the cloud based ETL platform for Microsoft (or will be soon) because even though you can get SSIS in the cloud but ADF is designed with the cloud in mind from day 1.
A few days later after writing this paragraph, I skimmed through an official Microsoft entry about the new features of SQL Server 2019 and – sadly or not – this part seems to support my opinion:
- old-school T-SQL doesn’t go out of fashion in the near future
Power Query – MMM, yummy…
How does Power Query come into the picture? It is (again!) a completely different language (or frontend for the M language which is the informal /short/ name for Power Query Formula Language) though if you are lucky you can use the drag & drop interface embedded in Excel or Power BI and at other places.
However, the graphical UI has its limitations as not every functionality of M is accessible. The gap is getting narrower at a steady pace but for the very tricky tasks, handcrafting M code could be your only option. And no matter what the marketing guys at Microsoft say: it is not easy or straightforward to use, it is not that similar to Excel’s expression language. Just see this reference website if you need convincing: https://www.powerquery.io/
Coding – life of a BI developer in the modern cloud age
How many languages can a BI developer/data engineer/[future naming of such positions] have the need to use depending on source/target systems? No, despite how it starts, it’s not a joke 🙂
The list is not exhaustive (I know some of the items are not actually coding languages but specifications or protocols but to deal with them can be equally complicated)
MDX(R.I.P.) => DAX
- Excel expression language
- Power Query / M
- Azure CLI
- HTML / XML
- free text – just laugh, free text fields and files without a proper structure can be the most untameable beasts when it comes to ETL development!
- PowerApps syntax
All the above ones have a different syntax no matter how similar some of them are meant to be. I think the similarity is more often confusing than helpful.
And regardless of what techie celebrities say in a Coding is not difficult YouTube video, when you spend hours (if not days) trying to make a programming language do what you want because according to the specifications (or their marketing department produced such a documentation that stakeholders consider it a silver bullet) it’s hard to agree with this statement.
I’m sure some of you would think that such a problem can come for my lack of expertise in this or that language. I agree, it could come of that but when together with my workmates we also spend a long time not being able to come to a solution (I’m not talking about prototypes but production-ready code) that is mostly due to the complexity behind real-life scenarios when the source dataset is not a nicely prepared .csv file.
Power Query and CDS for Apps
Finally, after all this long prologue…
So, can we use the Sheer awesomeness embedded into CDS for Apps even if we don’t use power Apps? Yes, and even more!
And the very existence of this next page shows that we don’t need CDS for Apps as why would we want to limit ourselves?
Executive developer summary – once more
In short: my understanding is that CDS for Apps is a subset of Power Query functionality packaged together, something similar to how a SQL Server stored procedure is an abstraction layer of a bunch of T-SQL statements. You don’t need to know how to write those SQL commands, but only how to call the stored procedure, what parameters to feed and what to expect as the outcome. Same here: you don’t need to understand all the subtleties behind CDS for Apps which is like an M stored procedure (that includes connectors in my view though not in the above screenshot).
So even without CDS for Apps, we can use all that and more, assuming at least one of the dev team can code M queries (Microsoft plans to introduce M into SSIS / ADF, too so it can be used in enterprise level data load processes, too!)
A little nugget at the end. It is important as I think integrating self-service components and enterprise level data artifacts is often very tricky. Especially in terms of understanding what we should support and what not. Finance teams are good examples how self-service approaches can easily go out of control with their monster Excel spreadsheets/Power BI desktop files/[anything else] as they can be very creative which is good up to a point, and only up to that point. Where that point is… I think we all would like to know where that is!
Often the maintenance of the useful result of user creativity becomes a key question. Or the way of how to give them building blocks that are nicely tuned/maintained/documented to avoid them needing to reinvent the wheel or – what’s worse – creating their own mess.
We are treading down that path anyway by making models in Power BI and give those to users as a centralized, supported, maintained baseline. Data Catalog is the same thing for Power Query objects.
Let’s say we have an external data source (website) that doesn’t need to be fully integrated into our system, so it does not become part of the ETL. Power Query can connect to it and we prefer applying some best practice data cleansing steps (e.g. setting date values as date types to avoid April becoming the first month of the year…). Then instead of distributing the Excel file containing that query the way is to send it to the data catalog which is the centralized store of the Power Query queries.
Power BI Data Flows and “self-service data warehousing”
Data Flows in Power BI is a new feature, announced only 7 November 2018 so no one really has a lot of hands-on experience with it, but based on the announcement my impression is that this is the next step on the way Data Catalog was the previous station. It is really getting enterprise level in terms of infrastructure and as a canonical layer.
I am often cautious when Microsoft comes up with a new significant functionality – more because often their marketing team advertises it in a way that senior leaders see it as the “tool that can solve everything” – and I am somewhat confused about how Azure Data Factory and Power BI Data Flows will live side-by-side. Previously Power BI and the other Power [Somethings] targeted self-service power users and not the enterprise level ETL. Now it’s creeping into that area and if it’s conquered by it, I wonder what’s left for ADF? Maybe they consciously throw two competing projects in the mix to see what comes out of it?
I would reserve my judgment for now about such concepts as “self-service data preparation” and “self-service data warehousing” but I have to admit that these can have very significant implications to an organization and in both ways. I’m more anxious than enthused but that might come from my position as I am not on the self-service side.
Time will tell, but until then let me quickly recap my understanding of Data Flows. It’s everything that was in Data Catalog: a power user creates a nice data prep and load logic using Power Query and then publishes it, so it becomes available for others in the same organization (so far not a lot of difference…) but there are two major improvements:
- It’s a PaaS that stores data in Azure Data Lake Storage Gen2, so it will perform quite well, and a lot of other Azure services can be plugged into it (Databricks, Machine Learning components, …)
- There will be Common Data Models available that are industry standard and – hopefully – customizable data flows covering areas that appear at many companies in the same industry. Common entities (e.g. Account, Customer) are provided too as usually, companies don’t need to reinvent the wheel but smartly use what is already available!
The licensing is not totally clear for me, but I assume that, and the actual technology will gradually evolve starting in the near future.
Some very low level details are already available in Amir Netz’s whitepaper if you want to dive really deep.
Thanks for reading if you’ve managed to get to this point! Feel free to comment and/or correct!