SQL is a standardized query language for requesting information from a database. [1]
On a scale of 1 to 10, where a 1 knows only select * from table and a 10 can fluently build stored procedures and views, a data scientist should be at least a 7.
Why?
SQL is THE language for working in a database environment. It’s not the language for performing “science” on the data, but it is the language for pulling and manipulating it. A DATA scientist needs to be fluent in DATA, and being fluent in data means having a proper understanding of the final stage of data governance.
Data governance is the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data. [2] The final stage of data governance is querying the data.
If a data scientist fully relied on a data engineer or an ETL developer to get all of the data they needed, they would have a tough time finding an employer who wants them.
Are you going to develop a statistical approach on a table that contains 2 billion rows? What’s your plan? Store all of that in R or Python memory? Come on…
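To make the point concrete, here is a minimal sketch of letting the database do the heavy lifting and pulling back only a small summary. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a real warehouse; the orders table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and data, standing in for a much larger production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0), ("west", 15.0), ("west", 40.0)],
)

# The database scans and aggregates the rows; only one row per region
# is returned to Python, no matter how large the table is.
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS n, AVG(amount) AS mean_amount
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, n, mean_amount in summary:
    print(region, n, mean_amount)
```

The same idea applies whatever the database is: filter, join and aggregate in SQL first, then bring the reduced result into R or Python for the actual modeling.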
All else aside, SQL is an easy language to learn. Its core statements honestly read a lot like English sentences.
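As a rough illustration of how closely a query can track plain English, here is a small sketch, again using Python's built-in sqlite3 module. The employees table and its rows are hypothetical, chosen only so the query has something to run against.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
    [("Ana", "analytics", 95), ("Bo", "sales", 70), ("Cy", "analytics", 80)],
)

# In English: "select the name and salary from employees where the
# department is analytics, ordered by salary from highest to lowest".
query = """
    SELECT name, salary
    FROM employees
    WHERE department = 'analytics'
    ORDER BY salary DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Read the query aloud and it is nearly the sentence in the comment, which is a big part of why the basics can be picked up quickly.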
A data scientist, who is typically expected to be fluent in one of R, Python or SAS, could and should be able to learn SQL and become proficient in it in a relatively short amount of time.