Azure Databricks + MagicOrange
MagicOrange is a cloud-first, multi-tenant SaaS offering, listed on the Azure Marketplace. Since its inception, MagicOrange has used Azure’s native services to build and scale the MagicOrange Platform.
When we first started implementing a data platform for MagicOrange back in 2015, we were not anticipating huge data volumes. Existing Azure services such as Azure SQL DB, App Services, Storage Accounts, Power BI, and Analysis Services were more than sufficient to run and grow our platform, since all of them offered scalability. But after a few years, as more customers onboarded to the platform and data volumes increased significantly, we started to look for scalable, durable, and cost-effective solutions in Azure to build out our MagicOrange Data and Analytics Platform.
We went through an exercise of evaluating which tools we needed to build our Data and Analytics Platform. We quickly realized that some tools were good at ETL, others specialized in warehousing, and some were a better fit for analytics. If we chose the path of stitching together various tools and technologies, we would spend more money across the various services and take on additional management overhead.
After evaluating various tools and platforms, such as Snowflake, we chose to go with Azure Databricks as it is more cost-effective, there is less management overhead, and it meets all of our requirements for a next gen Cloud Data and Analytics Platform.
What we primarily liked about Azure Databricks:
- Lakehouse Architecture and single platform for Data Engineering, Data Science, Data Ingestion, Machine Learning, Data Warehouse/Lakehouse, Data Analytics.
- Data Security and Integration with Azure AD.
- Integration with Power BI using Azure Databricks Connector.
- Scalable architecture with clusters and the Databricks Runtime, giving us the power of Apache Spark while taking away the complexity of managing Spark configuration.
- Interactive development experience with the Databricks Workspace and Notebooks, with the added benefit of support for multiple languages such as Python, R, SQL, Scala, and Java (.jars).
- Orchestration with Jobs/Workflows and, more recently, Delta Live Tables.
- Most important, cost-effectiveness: Databricks enabled us to build and run a cloud Data and Analytics Platform at scale while keeping our costs well under budget. For example, after migrating our ETL workload from a cloud-native ETL tool to Azure Databricks, we saw savings of up to 400% per month on ETL jobs alone. We were able to start small and scale based on our needs, as we only pay for what we use.
Storage and compute are separate, which saves on storage cost: data is kept in Delta Lake format, stored as Parquet files in Azure Data Lake Storage containers.
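To make the storage side concrete, here is a minimal sketch of how a notebook might address a Delta table on Azure Data Lake Storage. The container, storage account, and folder names are hypothetical and not taken from our platform; `append_as_delta` assumes it is handed a Spark DataFrame on a cluster where Delta Lake is available (as it is on Databricks).

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def append_as_delta(df, target_uri: str) -> None:
    """Append a Spark DataFrame to the Delta table located at target_uri.

    `df` is assumed to be a pyspark.sql.DataFrame. Delta keeps the rows as
    Parquet files plus a transaction log under the target location.
    """
    df.write.format("delta").mode("append").save(target_uri)


# Hypothetical names, for illustration only.
uri = abfss_uri("lakehouse", "examplestorageacct", "/bronze/usage")
print(uri)
```

Because the table lives in the storage account rather than on the cluster, the cluster can be shut down after the write without affecting the data.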
MagicOrange Lakehouse Architecture
Here is the MagicOrange architecture and the key areas we’ve found beneficial as we scale the business:
Data Engineering
Azure Databricks integrates seamlessly with a wide range of data sources, which has helped us build and scale our solutions quickly.
The Data Engineering workspace UI is developer-friendly, with native features like notebooks, a pipeline environment with Jobs, Workflows, and Delta Live Tables, scheduling/orchestration, and failure notifications. This eliminates the need to maintain different tools for the same tasks and has enabled the Data Engineering team to keep their focus on solving ETL tasks.
Before the Databricks Lakehouse, complex ETL pipelines were developed using cloud-native ETL tools. Migrating to the Databricks Lakehouse was relatively easy using PySpark and Spark SQL. The support for multiple languages enabled our Data Engineering teams to deliver complex ETL requirements quickly.
Since migrating to the Databricks Lakehouse, using scalable clusters and notebooks, ETL tasks are completing faster and are less expensive.
Data Governance and Security
MagicOrange is a multi-tenant SaaS offering. Data security and customer data isolation are top priorities, and since Azure Databricks is compliant with several industry and regulatory standards, including ISO 27001, SOC 2, and HIPAA, it helps MagicOrange build secure solutions.
Azure Databricks has strong integration with Azure AD, which eliminates many security concerns and helps to leverage RBAC (Role Based Access Control) to control access to Databricks Workspace and other resources.
Implementing Unity Catalog helped us make the overall data landscape more secure. It removed prior limitations and gave us the ability to enforce our dev and production data isolation policy.
Unity Catalog features such as external storage locations and support for SQL GRANT statements helped us implement better access control for each customer catalog.
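As an illustration of per-customer access control, the sketch below generates the kind of Unity Catalog GRANT statements a read-only group might need on one customer catalog. The catalog, schema, and group names are hypothetical, and the exact set of privileges is an assumption that will differ between setups; in a notebook, each statement could be run with `spark.sql(...)`.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def customer_grants(catalog: str, schema: str, group: str) -> list[str]:
    """Return read-only GRANT statements for one group on one customer catalog."""
    cat = quote_ident(catalog)
    sch = f"{cat}.{quote_ident(schema)}"
    grp = quote_ident(group)
    return [
        # The group must be able to see the catalog and the schema...
        f"GRANT USE CATALOG ON CATALOG {cat} TO {grp}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {grp}",
        # ...and read the tables inside the schema.
        f"GRANT SELECT ON SCHEMA {sch} TO {grp}",
    ]


# Hypothetical names, for illustration only.
for statement in customer_grants("customer_a", "reporting", "customer_a_readers"):
    print(statement)
```

Keeping one catalog per customer means a grant like this never reaches another tenant's data.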
Out-of-the-box security features such as network isolation and data encryption also helped us protect our data and meet our security requirements.
Data Analytics with Databricks SQL
Databricks SQL Warehouses/Endpoints can be easily integrated with Power BI using the Azure Databricks Connector and support Direct Query mode to Delta Lake Data, which enabled us to build customer-facing Power BI Reports and Dashboards.
Serverless SQL Warehouses with Photon are immensely powerful and help us to visualize large datasets (100 million+ rows) in Power BI.
Databricks SQL Dashboards have helped our Data Analysts and Customer Success team to quickly analyze very large data sets, by writing simple SQL queries and building dashboards inside Databricks.
Data Sharing with Delta Sharing open protocol
Delta Sharing is an open standard that we use to securely share data with external and internal consumers from its original source.
Delta Sharing has helped us to democratize data and share data externally and securely with MagicOrange Customers. As part of customer on-boarding, each customer gets a dedicated share and recipient link.
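The onboarding step described above could be scripted roughly as follows. This is a sketch under stated assumptions, not our actual onboarding code: the `_share` and `_recipient` naming convention and the table names are hypothetical, and the statements follow Databricks SQL syntax for Delta Sharing as I understand it.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def onboarding_statements(customer: str, tables: list[str]) -> list[str]:
    """Return SQL that creates a dedicated share and recipient for one customer.

    `tables` holds fully qualified table names (catalog.schema.table).
    """
    share = quote_ident(f"{customer}_share")          # hypothetical naming convention
    recipient = quote_ident(f"{customer}_recipient")  # hypothetical naming convention
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Add each customer table to the share.
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    # Create the recipient and let it read the share.
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


# Hypothetical names, for illustration only.
for statement in onboarding_statements("customer_a", ["customer_a.reporting.costs"]):
    print(statement)
```

Generating the statements from the customer name keeps every tenant's share and recipient separate by construction.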
Delta Sharing connectors are supported in popular BI tools, which has eliminated the need to build something in house to share data securely.
Future of MagicOrange with ML/AI/LLM
MagicOrange is a data driven company, always trying to create innovative solutions to help our customers draw insights into their complex data. As part of the MagicOrange product roadmap, there are plans to build ML/AI based data products which can enable customers to easily draw more insights from complex data. Using the Databricks Lakehouse Platform will help MagicOrange build and scale our ML/AI practice.
We plan to leverage Databricks Lakehouse Architecture and Dolly 2.0 to build ML/AI based data products, which can bring more value to MagicOrange Customers.
In this blog, I have shared some insights into how implementing the Lakehouse Architecture has helped MagicOrange build a scalable Data and Analytics Platform. Working as a Cloud Architect and Data Architect, I find Azure Databricks very cost-effective, as it has allowed us to start on a small budget and scale with many features. From my perspective, Databricks can help fulfill most organizations’ data, analytics, and AI requirements on a single unified platform, which we could not achieve on other cloud data warehouses.
Over the last few years, I have seen Databricks evolve, adding new features and concepts that make it unique in this space. There is also a continuous effort from the Databricks team to improve. Databricks Solution Architects provide a great support system, bringing expertise and best practices that speed up implementation of Databricks on your preferred cloud provider.
There is a ton of documentation available from Databricks and Microsoft for every feature mentioned in this blog. I recommend checking out these docs if you are keen to understand and implement Databricks at your organization.