I will drop the answer in the form of steps for better understanding.
1. The degree of parallelism is controlled by -m <n> or --num-mappers <n>. By default value of --num-mappers is 4.
2. --split-by <column-name>, will split your task on the basis of column-name.
3. $CONDITIONS, it is used internally by sqoop to achieve this splitting task.
Example, Your query:
sqoop import --connect jdbc:mysql://myserver:1202/ --username u1 --password p1 --query 'select * from emp where $CONDITIONS' --split-by empId --target-dir /temp/emp -m 4
Say, my empId is uniformly distributed from 1- 100.
Now, sqoop will take --split-by column and find its max and min value using the query:
SELECT MIN(empId), MAX(empId) FROM (Select * From emp WHERE (1 = 1) ) t1
See it replaced $CONDITIONS with (1 = 1).
In our case, min, max values are 1 and 100.
As number of mappers are 4, sqoop will divide my query in 4 parts.
Creating input split with lower bound 'empId >= 1' and upper bound 'empId < 25'
Creating input split with lower bound 'empId >= 25' and upper bound 'empId < 50'
Creating input split with lower bound 'empId >= 50' and upper bound 'empId < 75'
Creating input split with lower bound 'empId >= 75' and upper bound 'empId <= 100'
Now $CONDITIONS will again come into the picture. It is replaced by above range queries.
First mapper will fire query like this:
Select * From emp WHERE empId >= 25' AND 'empId < 50
and so on for other 3 mappers.
Results from all the mappers is aggregated and written to a final HDFS directory.
Regarding your next query :
select deptid, avg(salary) from emp group by deptid
you will specify
--query 'select deptid, avg(salary) from emp group by deptid where $CONDITIONS'
It will be first converted to
select deptid, avg(salary) from emp group by deptid where (1 = 0)
to fetch column metadata.
I believe this query won't run in RDBMS. Try above query(having Where (1 = 0)) directly in Mysql.
So you will not be able to use this query to fetch data using Sqoop.
Hope this helps.