Databricks works well as a data source for Power BI, but performance depends on how the connection is set up. You need to select the right connection mode (DirectQuery or Import), optimize queries, and tune the connection settings so that data is handled efficiently. The goal is to minimize latency and maximize performance on larger datasets.
Best Practices for Connection Settings
Use the Databricks Connector
Connect Power BI to Databricks using the Azure Databricks connector or the ODBC driver for the best compatibility.
Use Spark SQL queries to pull only the data the report needs, so that Power BI does not load excess rows and columns.
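As a rough illustration of pulling only what is needed, here is a minimal Python sketch that builds a Spark SQL statement with an explicit column list and a filter instead of SELECT *. The helper name build_select, the table sales.orders, and its columns are all hypothetical and not part of any Databricks or Power BI API.

```python
def build_select(table, columns, where=None):
    """Build a Spark SQL SELECT that names its columns explicitly.

    Hypothetical helper for illustration only; it just formats a string.
    """
    if not columns:
        raise ValueError("list the columns the report actually needs")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        # Filter early so fewer rows ever leave Databricks.
        query += f" WHERE {where}"
    return query


# Assumed example table and columns.
query = build_select(
    "sales.orders",
    ["order_date", "region", "amount"],
    where="order_date >= '2024-01-01'",
)
print(query)
```

The resulting statement could then be used wherever your connection accepts a SQL query, so that the column pruning and filtering happen on the Databricks side.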
Optimizing Queries for Performance
Push calculations down to Databricks instead of processing them in Power BI. Data can be pre-processed in Databricks using materialized views or Delta tables.
Partition large datasets and use Databricks' index-style optimizations, such as Z-ordering on Delta tables, for faster query execution.
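The pre-aggregation and partitioning advice above can be sketched as two Databricks SQL statements generated from Python. This is only a sketch under assumptions: the helper name preaggregate_statements and every table and column name are hypothetical, and the right partition and Z-order columns depend on how your reports filter the data.

```python
def preaggregate_statements(source, target, partition_col, zorder_col, measure):
    """Return SQL that builds a partitioned, pre-aggregated Delta table.

    Hypothetical helper for illustration; it only formats strings.
    """
    create = (
        f"CREATE OR REPLACE TABLE {target} "
        f"USING DELTA PARTITIONED BY ({partition_col}) AS "
        f"SELECT {partition_col}, {zorder_col}, "
        f"SUM({measure}) AS total_{measure} "
        f"FROM {source} GROUP BY {partition_col}, {zorder_col}"
    )
    # Z-ordering co-locates related rows; the column must not be
    # the partition column.
    optimize = f"OPTIMIZE {target} ZORDER BY ({zorder_col})"
    return [create, optimize]


# Assumed example names.
statements = preaggregate_statements(
    source="sales.orders",
    target="sales.orders_summary",
    partition_col="order_year",
    zorder_col="region",
    measure="amount",
)
for stmt in statements:
    print(stmt)
    # In a Databricks notebook you could run each one with spark.sql(stmt).
```

Power BI would then query the smaller summary table rather than aggregating the raw table on every refresh or visual interaction.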
Select the Ideal Connection Mode
Import Mode: The best option for performance, and refreshes can be scheduled. The dataset has to fit within Power BI's memory limits.
DirectQuery Mode: The best option for real-time reporting, although it can be slow. It performs best when complex DAX calculations are avoided and data is aggregated in Databricks.
Hybrid Model: Combines Import for frequently used data with DirectQuery for real-time insight (Power BI calls this a composite model).
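The trade-offs between the three modes can be summarized as a small decision helper. This is a simplified sketch of the guidance above, not an official rule: the function name and its three yes/no inputs are assumptions made for illustration, and falling back to DirectQuery when the data does not fit in memory is an inference rather than something stated above.

```python
def choose_connection_mode(needs_realtime, fits_in_memory, has_hot_subset=False):
    """Suggest a Power BI connection mode from three yes/no questions.

    Hypothetical heuristic for illustration only.
    """
    if needs_realtime and has_hot_subset:
        # Import the frequently used data, query the rest live.
        return "Hybrid"
    if needs_realtime or not fits_in_memory:
        # Live queries, or data too large to import.
        return "DirectQuery"
    # Fits in memory and a scheduled refresh is fresh enough.
    return "Import"


print(choose_connection_mode(needs_realtime=False, fits_in_memory=True))
print(choose_connection_mode(needs_realtime=True, fits_in_memory=False))
print(choose_connection_mode(True, False, has_hot_subset=True))
```

Real decisions usually also weigh refresh windows, concurrency, and cluster cost, so treat the output as a starting point.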