Big Data Spark "Mushroom Cloud" Campaign, Lesson 47: Spark 2.0 in Practice with Dataset: collect_list, collect_set, avg, sum, countDistinct, and more
Dataset API:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
people.json
{"name":"Michael", "age":16}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
{"name":"Justin", "age":29}
{"name":"Michael", "age":46}
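The query that the lesson ran against this data is not included in this section. As a minimal sketch, assuming a local SparkSession and a grouping by name, the aggregate functions named in the title could be applied as follows. The object name DatasetAggregations, the app name, and the choice of aggregate columns are illustrative assumptions and are not taken from the lesson's own code; the rows are built inline so the sketch runs without the file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical object name; the lesson's original source is not shown.
object DatasetAggregations {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with all available cores.
    val spark = SparkSession.builder()
      .appName("DatasetAggregations")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same rows as people.json, built inline.
    // Reading the file instead would be: spark.read.json("people.json")
    val people = Seq(
      ("Michael", 16),
      ("Andy", 30),
      ("Justin", 19),
      ("Justin", 29),
      ("Michael", 46)
    ).toDF("name", "age")

    // Group by name and apply the aggregate functions from the title.
    people.groupBy($"name")
      .agg(
        sum($"age"),            // total age per name
        avg($"age"),            // mean age per name
        countDistinct($"age"),  // number of distinct ages per name
        collect_list($"age"),   // all ages per name, duplicates kept
        collect_set($"age")     // distinct ages per name
      )
      .show()

    spark.stop()
  }
}
```

collect_list keeps duplicate values while collect_set removes them, so the two differ only when a name has repeated ages. The columns produced by this sketch will not necessarily match the columns in the lesson's logged output.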
Output
16/09/17 22:22:15 INFO CodeGenerator: Code generated in 20.317672 ms
+-------+--------+--------+--------+--------+-------------------+--------+--------------+
|   name|sum(age)|avg(age