== Dataset API -- Actions

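Actions are the part of the Dataset API that trigger execution of a structured query. Transformations are lazy and only describe a query. An action runs it and then does one of three things: returns the result to the driver (e.g. <<collect, collect>> or <<count, count>>), prints it (<<show, show>>), or applies a function to the records for its side effects (e.g. <<foreach, foreach>>).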

NOTE: Actions are the methods in the `Dataset` Scala class that are grouped under the `action` group name, i.e. `@group action`.

[[methods]]
.Dataset API's Actions
[cols="1,2",options="header",width="100%"]
|===
| Action
| Description

| <<collect, collect>>
a|

[source, scala]
----
collect(): Array[T]
----

| <<count, count>>
a|

[source, scala]
----
count(): Long
----

| <<describe, describe>>
a|

[source, scala]
----
describe(cols: String*): DataFrame
----

| <<first, first>>
a|

[source, scala]
----
first(): T
----

| <<foreach, foreach>>
a|

[source, scala]
----
foreach(f: T => Unit): Unit
----

| <<foreachPartition, foreachPartition>>
a|

[source, scala]
----
foreachPartition(f: Iterator[T] => Unit): Unit
----

| <<head, head>>
a|

[source, scala]
----
head(): T
head(n: Int): Array[T]
----

| <<reduce, reduce>>
a|

[source, scala]
----
reduce(func: (T, T) => T): T
----

| <<show, show>>
a|

[source, scala]
----
show(): Unit
show(truncate: Boolean): Unit
show(numRows: Int): Unit
show(numRows: Int, truncate: Boolean): Unit
show(numRows: Int, truncate: Int): Unit
show(numRows: Int, truncate: Int, vertical: Boolean): Unit
----

| <<summary, summary>>
a| Computes specified statistics for numeric and string columns. The default statistics are: `count`, `mean`, `stddev`, `min`, `max` and `25%`, `50%`, `75%` percentiles.

[source, scala]
----
summary(statistics: String*): DataFrame
----

NOTE: `summary` is an extended version of the <<describe, describe>> action that simply calculates `count`, `mean`, `stddev`, `min` and `max` statistics.

| <<take, take>>
a|

[source, scala]
----
take(n: Int): Array[T]
----

| <<toLocalIterator, toLocalIterator>>
a|

[source, scala]
----
toLocalIterator(): java.util.Iterator[T]
----

|===

=== [[collect]] collect Action

[source, scala]
----
collect(): Array[T]
----

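`collect` returns all the records of the Dataset to the driver as an `Array[T]`. The entire result has to fit in the driver's memory, so use `collect` only when the result is known to be small.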
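A minimal sketch of `collect` follows. It assumes `spark-sql` is on the classpath and runs Spark in local mode; `CollectExample` and `run` are hypothetical names, not part of Spark.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object CollectExample {
  // Hypothetical helper: builds a tiny Dataset[Int] and brings all its records to the driver
  def run(): Array[Int] = {
    val spark = SparkSession.builder().master("local[1]").appName("collect-example").getOrCreate()
    try {
      import spark.implicits._
      val ds = Seq(1, 2, 3).toDS()
      // collect is an action: it executes the query and returns every record as an Array[Int]
      ds.collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run().mkString(", "))
}
----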

=== [[count]] count Action

[source, scala]
----
count(): Long
----

`count` returns the number of records in the Dataset.
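The following sketch assumes `spark-sql` on the classpath, a local-mode `SparkSession` and a hypothetical `CountExample` object. It counts all the records of a Dataset and those left after a lazy `filter`. Only `count` triggers execution.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object CountExample {
  // Hypothetical helper returning (count of all records, count after a filter)
  def run(): (Long, Long) = {
    val spark = SparkSession.builder().master("local[1]").appName("count-example").getOrCreate()
    try {
      import spark.implicits._
      // spark.range(100) is a Dataset[java.lang.Long] with the values 0 to 99 in column "id"
      val all = spark.range(100)
      // filter is a lazy transformation; nothing runs until count is called
      val evens = all.filter($"id" % 2 === 0)
      (all.count(), evens.count())
    } finally {
      spark.stop()
    }
  }
}
----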

=== [[describe]] Calculating Basic Statistics -- describe Action

[source, scala]
----
describe(cols: String*): DataFrame
----

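`describe` computes the `count`, `mean`, `stddev`, `min` and `max` statistics for the given numeric and string columns, or for all numeric and string columns when no `cols` are given. The result is a DataFrame with a `summary` column that names the statistic, followed by one string column per described column.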
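To see the shape of the result, the sketch below runs `describe` on a single-column DataFrame. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`; `DescribeExample` and the column name `n` are hypothetical.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object DescribeExample {
  // Hypothetical helper returning describe's rows as (statistic, value) pairs
  def run(): Seq[(String, String)] = {
    val spark = SparkSession.builder().master("local[1]").appName("describe-example").getOrCreate()
    try {
      import spark.implicits._
      val df = Seq(1, 2, 3).toDF("n")
      // Column 0 is "summary" (the statistic's name); column 1 holds the value for "n" as a string
      df.describe("n").collect().toSeq.map(r => (r.getString(0), r.getString(1)))
    } finally {
      spark.stop()
    }
  }
}
----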

=== [[first]] first Action

[source, scala]
----
first(): T
----

`first` returns the first record of the Dataset. It is an alias for <<head, head()>>.
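The sketch below calls `first` on a non-empty Dataset and on an empty one. It assumes that `first()` on an empty Dataset throws `NoSuchElementException`, since it has no record to return. `spark-sql` on the classpath is assumed, and `FirstExample` is a hypothetical name.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object FirstExample {
  // Hypothetical helper: first() of a non-empty Dataset and whether first() failed on an empty one
  def run(): (String, Boolean) = {
    val spark = SparkSession.builder().master("local[1]").appName("first-example").getOrCreate()
    try {
      import spark.implicits._
      val firstLetter = Seq("a", "b", "c").toDS().first()
      // Assumption: an empty Dataset has no first record, so first() throws
      val failedOnEmpty =
        try { Seq.empty[String].toDS().first(); false }
        catch { case _: NoSuchElementException => true }
      (firstLetter, failedOnEmpty)
    } finally {
      spark.stop()
    }
  }
}
----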

=== [[foreach]] foreach Action

[source, scala]
----
foreach(f: T => Unit): Unit
----

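`foreach` applies the function `f` to every record of the Dataset. `f` runs on executors (not on the driver), so `foreach` is meant for side effects, e.g. updating an accumulator or writing to an external system.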
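Since `foreach` returns `Unit`, the sketch below makes its effect visible through a `LongAccumulator`. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`; `ForeachExample` and `addToSum` are hypothetical names.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object ForeachExample {
  // Hypothetical helper: sums a Dataset[Int] through a side effect on an accumulator
  def run(): Long = {
    val spark = SparkSession.builder().master("local[2]").appName("foreach-example").getOrCreate()
    try {
      import spark.implicits._
      val sum = spark.sparkContext.longAccumulator("sum")
      // An explicitly typed function value (not a lambda literal) leaves only the
      // Scala foreach(f: T => Unit) overload applicable, not foreach(ForeachFunction[T])
      val addToSum: Int => Unit = n => sum.add(n.toLong)
      Seq(1, 2, 3, 4).toDS().foreach(addToSum)
      sum.sum
    } finally {
      spark.stop()
    }
  }
}
----

On a cluster, a `println` inside `f` would end up in executor logs rather than on the driver's console, which is why the sketch reads the result back through an accumulator.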

=== [[foreachPartition]] foreachPartition Action

[source, scala]
----
foreachPartition(f: Iterator[T] => Unit): Unit
----

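`foreachPartition` applies the function `f` to every partition of the Dataset, passing the records of a partition as an `Iterator[T]`. Use it when a costly resource, e.g. a database connection, should be created once per partition rather than once per record.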
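The sketch below shows that `f` is called once per partition. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`. `ForeachPartitionExample` and `perPartition` are hypothetical, and the accumulators stand in for real per-partition work.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object ForeachPartitionExample {
  // Hypothetical helper returning (number of partitions visited, number of records seen)
  def run(): (Long, Long) = {
    val spark = SparkSession.builder().master("local[2]").appName("foreachPartition-example").getOrCreate()
    try {
      // 10 records (0 to 9) split into 4 partitions
      val ds = spark.range(0, 10, 1, 4)
      val partitions = spark.sparkContext.longAccumulator("partitions")
      val records = spark.sparkContext.longAccumulator("records")
      // Explicitly typed so that the Scala Iterator[T] => Unit overload is the one chosen
      val perPartition: Iterator[java.lang.Long] => Unit = it => {
        partitions.add(1L) // per-partition setup (e.g. opening a connection) would go here
        records.add(it.size.toLong)
      }
      ds.foreachPartition(perPartition)
      (partitions.sum, records.sum)
    } finally {
      spark.stop()
    }
  }
}
----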

=== [[head]] head Action

[source, scala]
----
head(): T // <1>
head(n: Int): Array[T]
----
<1> Calls the other `head` with `n` as `1` and takes the first element

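`head(n)` returns the first `n` records of the Dataset as an `Array[T]`. Internally, it collects the result of `limit(n)`, so the records are loaded into the driver's memory and `n` should be kept small.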
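Both variants of `head` are exercised in the sketch below. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`; `HeadExample` is a hypothetical name.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object HeadExample {
  // Hypothetical helper returning head() and head(2) of a small Dataset
  def run(): (Int, Seq[Int]) = {
    val spark = SparkSession.builder().master("local[1]").appName("head-example").getOrCreate()
    try {
      import spark.implicits._
      val ds = Seq(10, 20, 30).toDS()
      // head() returns a single record; head(n) returns an Array[T] of at most n records
      (ds.head(), ds.head(2).toSeq)
    } finally {
      spark.stop()
    }
  }
}
----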

=== [[reduce]] reduce Action

[source, scala]
----
reduce(func: (T, T) => T): T
----

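`reduce` reduces the records of the Dataset to a single value using the binary function `func`. `func` should be commutative and associative, because partial results computed per partition are combined in no particular order.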
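The following sketch sums a repartitioned Dataset with `reduce`. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`; `ReduceExample` and `add` are hypothetical names.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object ReduceExample {
  // Hypothetical helper: reduces a Dataset[Int] with an associative and commutative function
  def run(): Int = {
    val spark = SparkSession.builder().master("local[2]").appName("reduce-example").getOrCreate()
    try {
      import spark.implicits._
      // An explicitly typed function value selects the Scala (T, T) => T overload
      // rather than reduce(ReduceFunction[T])
      val add: (Int, Int) => Int = _ + _
      // Records are spread over 3 partitions; addition gives the same result however they are split
      Seq(1, 2, 3, 4, 5).toDS().repartition(3).reduce(add)
    } finally {
      spark.stop()
    }
  }
}
----

Passing a non-associative function such as subtraction instead would make the result depend on how the records are split across partitions.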

=== [[show]] show Action

[source, scala]
----
show(): Unit
show(truncate: Boolean): Unit
show(numRows: Int): Unit
show(numRows: Int, truncate: Boolean): Unit
show(numRows: Int, truncate: Int): Unit
show(numRows: Int, truncate: Int, vertical: Boolean): Unit
----

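`show` displays the records of the Dataset in a tabular form on the standard output of the driver. `show()` displays the first 20 rows and truncates strings longer than 20 characters. `numRows` sets the number of rows to display. `truncate` set to `false` (or `0`) turns truncation off, while a positive `Int` sets the maximum number of characters per cell. `vertical` prints every row vertically, one line per column.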
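Since `show` prints rather than returns, the sketch below captures the driver's standard output with `Console.withOut`. It assumes `spark-sql` on the classpath, a local-mode `SparkSession`, and that `show` prints through Scala's `println`. `ShowExample` is a hypothetical name.

[source, scala]
----
import java.io.ByteArrayOutputStream
import org.apache.spark.sql.SparkSession

object ShowExample {
  // Hypothetical helper: captures what show prints to the standard output
  def run(): String = {
    val spark = SparkSession.builder().master("local[1]").appName("show-example").getOrCreate()
    try {
      import spark.implicits._
      val ds = Seq("a short string", "a string longer than twenty characters").toDS()
      val out = new ByteArrayOutputStream()
      Console.withOut(out) {
        ds.show(1)        // only the first row, default truncation (20 characters)
        ds.show(2, false) // both rows, no truncation
      }
      out.toString("UTF-8")
    } finally {
      spark.stop()
    }
  }
}
----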

=== [[summary]] Calculating Statistics -- summary Action

[source, scala]
----
summary(statistics: String*): DataFrame
----

summary calculates specified statistics for numeric and string columns.

The default statistics are: count, mean, stddev, min, max and 25%, 50%, 75% percentiles.

NOTE: summary accepts arbitrary approximate percentiles specified as a percentage (e.g. 10%).

Internally, `summary` uses `StatFunctions` to calculate the requested statistics for the Dataset.
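The sketch below requests a subset of statistics, including an arbitrary approximate percentile (`10%`). It assumes `spark-sql` on the classpath and a local-mode `SparkSession`, and that the rows come back in the order the statistics were requested. `SummaryExample` and the column name `n` are hypothetical.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object SummaryExample {
  // Hypothetical helper returning summary's rows as (statistic, value) pairs
  def run(): Seq[(String, String)] = {
    val spark = SparkSession.builder().master("local[1]").appName("summary-example").getOrCreate()
    try {
      import spark.implicits._
      val df = Seq(1, 2, 3, 4, 5).toDF("n")
      // Only the requested statistics are computed; "10%" is an approximate percentile
      df.summary("count", "min", "10%", "max")
        .collect()
        .toSeq
        .map(r => (r.getString(0), r.getString(1)))
    } finally {
      spark.stop()
    }
  }
}
----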

=== [[take]] Taking First Records -- take Action

[source, scala]
----
take(n: Int): Array[T]
----

`take` is an action on a Dataset that returns the first `n` records as an `Array[T]`, or all the records if there are fewer than `n`.

WARNING: `take` loads the `n` records into the memory of the Spark application's driver process, so a large `n` could result in an `OutOfMemoryError`.

Internally, `take` creates a new Dataset with a `Limit` logical operator (with a `Literal` expression for `n`) over the current `LogicalPlan`. It then runs the SparkPlan.md[SparkPlan] that produces an `Array[InternalRow]`, which is in turn decoded to `Array[T]` using a bound encoder.
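The sketch below takes fewer and more records than the Dataset holds. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`; `TakeExample` is a hypothetical name.

[source, scala]
----
import org.apache.spark.sql.SparkSession

object TakeExample {
  // Hypothetical helper: take(2) on 5 records, and take(10) when only 5 records exist
  def run(): (Seq[Int], Seq[Int]) = {
    val spark = SparkSession.builder().master("local[1]").appName("take-example").getOrCreate()
    try {
      import spark.implicits._
      val ds = Seq(1, 2, 3, 4, 5).toDS()
      // Only the first n records reach the driver; asking for more than exist returns all of them
      (ds.take(2).toSeq, ds.take(10).toSeq)
    } finally {
      spark.stop()
    }
  }
}
----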

=== [[toLocalIterator]] toLocalIterator Action

[source, scala]
----
toLocalIterator(): java.util.Iterator[T]
----

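`toLocalIterator` returns a `java.util.Iterator[T]` over all the records of the Dataset. Unlike <<collect, collect>>, the driver does not need to hold the whole result at once: the iterator consumes as much memory as the largest partition of the Dataset.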
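The sketch below drains `toLocalIterator` with plain `hasNext`/`next` calls, so no Java-to-Scala collection converters are needed. It assumes `spark-sql` on the classpath and a local-mode `SparkSession`; `ToLocalIteratorExample` is a hypothetical name.

[source, scala]
----
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ArrayBuffer

object ToLocalIteratorExample {
  // Hypothetical helper: drains toLocalIterator into a local buffer
  def run(): Seq[Long] = {
    val spark = SparkSession.builder().master("local[2]").appName("toLocalIterator-example").getOrCreate()
    try {
      // 0 to 9 in 3 partitions; the driver needs memory for about the largest partition only
      val it: java.util.Iterator[java.lang.Long] = spark.range(0, 10, 1, 3).toLocalIterator()
      val buffer = ArrayBuffer.empty[Long]
      while (it.hasNext) buffer += it.next().longValue
      buffer.toList
    } finally {
      spark.stop()
    }
  }
}
----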