
Dataset API -- Actions

Actions are part of the <> for...FIXME

NOTE: Actions are the methods of the Dataset Scala class that belong to the action group in the Scaladoc, i.e. they are annotated with @group action.

[[methods]]
.Dataset API's Actions
[cols="1,2",options="header",width="100%"]
|===
| Action
| Description

| <<collect, collect>> a|

[source, scala]

collect(): Array[T]

| <<count, count>> a|

[source, scala]

count(): Long

| <<describe, describe>> a|

[source, scala]

describe(cols: String*): DataFrame

| <<first, first>> a|

[source, scala]

first(): T

| <<foreach, foreach>> a|

[source, scala]

foreach(f: T => Unit): Unit

| <<foreachPartition, foreachPartition>> a|

[source, scala]

foreachPartition(f: Iterator[T] => Unit): Unit

| <<head, head>> a|

[source, scala]

head(): T
head(n: Int): Array[T]


| <<reduce, reduce>> a|

[source, scala]

reduce(func: (T, T) => T): T

| <<show, show>> a|

[source, scala]

show(): Unit
show(truncate: Boolean): Unit
show(numRows: Int): Unit
show(numRows: Int, truncate: Boolean): Unit
show(numRows: Int, truncate: Int): Unit
show(numRows: Int, truncate: Int, vertical: Boolean): Unit


| <<summary, summary>> a| Computes specified statistics for numeric and string columns. The default statistics are: count, mean, stddev, min, max and 25%, 50%, 75% percentiles.

[source, scala]

summary(statistics: String*): DataFrame

NOTE: summary is an extended version of the <<describe, describe>> action, which calculates only the count, mean, stddev, min and max statistics.

| <<take, take>> a|

[source, scala]

take(n: Int): Array[T]

| <<toLocalIterator, toLocalIterator>> a|

[source, scala]

toLocalIterator(): java.util.Iterator[T]

|===

=== [[collect]] collect Action

[source, scala]

collect(): Array[T]

collect...FIXME

=== [[count]] count Action

[source, scala]

count(): Long

count...FIXME

=== [[describe]] Calculating Basic Statistics -- describe Action

[source, scala]

describe(cols: String*): DataFrame

describe...FIXME

=== [[first]] first Action

[source, scala]

first(): T

first...FIXME

=== [[foreach]] foreach Action

[source, scala]

foreach(f: T => Unit): Unit

foreach...FIXME

=== [[foreachPartition]] foreachPartition Action

[source, scala]

foreachPartition(f: Iterator[T] => Unit): Unit

foreachPartition...FIXME

=== [[head]] head Action

[source, scala]

head(): T // <1>
head(n: Int): Array[T]


<1> Calls the other head with n as 1 and takes the first element

head...FIXME
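The relationship described in the callout above can be checked with a minimal sketch. It assumes Spark SQL is on the classpath and uses a local session; the three-element dataset and the value names (`single`, `firstViaHeadN`) are made up for illustration. In spark-shell, `spark` already exists and the session line can be skipped.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: a local session is enough for the demo
val spark = SparkSession.builder().master("local[*]").appName("head-demo").getOrCreate()

// Illustrative dataset holding the values 0, 1, 2
val ds: Dataset[java.lang.Long] = spark.range(3)

// head() with no arguments returns a single record of type T
val single: java.lang.Long = ds.head()

// head(n) returns an Array[T] with at most n records
val firstViaHeadN: java.lang.Long = ds.head(1).head

// Both forms should yield the same record
println(single == firstViaHeadN)
```

Note that `head()` on an empty Dataset has no first element to return, so callers that cannot rule out an empty result may prefer `head(1)` and inspect the array.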

=== [[reduce]] reduce Action

[source, scala]

reduce(func: (T, T) => T): T

reduce...FIXME

=== [[show]] show Action

[source, scala]

show(): Unit
show(truncate: Boolean): Unit
show(numRows: Int): Unit
show(numRows: Int, truncate: Boolean): Unit
show(numRows: Int, truncate: Int): Unit
show(numRows: Int, truncate: Int, vertical: Boolean): Unit


show...FIXME

=== [[summary]] Calculating Statistics -- summary Action

[source, scala]

summary(statistics: String*): DataFrame

summary calculates specified statistics for numeric and string columns.

The default statistics are: count, mean, stddev, min, max and 25%, 50%, 75% percentiles.

NOTE: summary accepts arbitrary approximate percentiles specified as a percentage (e.g. 10%).

Internally, summary uses StatFunctions to calculate the requested statistics for the Dataset.
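The following is a small usage sketch of summary, not a description of its internals. It assumes Spark SQL 2.3 or later on the classpath and a local session; the single-column DataFrame `df` with the values 1 to 5 and the helper map `byName` are hypothetical names introduced for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session is enough for the demo
val spark = SparkSession.builder().master("local[*]").appName("summary-demo").getOrCreate()

// Illustrative DataFrame with one numeric column "n" holding 1, 2, 3, 4, 5
val df: DataFrame = spark.range(1, 6).toDF("n")

// Request an explicit subset of statistics
val stats: DataFrame = df.summary("count", "min", "max")

// The result has a "summary" column naming the statistic, followed by one
// column per input column; all values are rendered as strings
val byName: Map[String, String] =
  stats.collect().map(row => row.getString(0) -> row.getString(1)).toMap
println(byName)

// Percentiles are given as percentages and are approximate
df.summary("10%", "50%").show()

// With no arguments, the default statistics are calculated
df.summary().show()
```

Because the statistics come back as a regular DataFrame, they can be filtered, joined or written out like any other query result.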

=== [[take]] Taking First Records -- take Action

[source, scala]

take(n: Int): Array[T]

take is an action on a Dataset that returns an array of the first n records.

WARNING: take loads all n records into the memory of the Spark application's driver process, so a large n could result in an OutOfMemoryError.

Internally, take creates a new Dataset with a Limit logical plan built from a Literal expression (for n) and the current LogicalPlan. It then runs the SparkPlan.md[SparkPlan] that produces an Array[InternalRow] that is in turn decoded to Array[T] using a bounded encoder.
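A minimal sketch of take in use follows, assuming Spark SQL on the classpath and a local session. The ten-element dataset and the name `taken` are illustrative only. Printing the logical plan of `limit(n)` is just one way to look at the limit operator mentioned above; the exact plan output differs between Spark versions.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: a local session is enough for the demo
val spark = SparkSession.builder().master("local[*]").appName("take-demo").getOrCreate()

// Illustrative dataset holding the values 0 to 9
val ds: Dataset[java.lang.Long] = spark.range(10)

// take(n) is an action: it runs a job and brings at most n records
// to the driver as a local array
val taken: Array[java.lang.Long] = ds.take(3)
println(taken.length)

// limit(n) is a transformation: it only builds a plan with a limit operator,
// which can be inspected without running a job
println(ds.limit(3).queryExecution.logical)
```

Keeping n small, or using `limit(n)` and continuing with distributed operators, avoids collecting a large array on the driver.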

=== [[toLocalIterator]] toLocalIterator Action

[source, scala]

toLocalIterator(): java.util.Iterator[T]

toLocalIterator...FIXME