Class DataFrameGroupBy (1.35.0)

DataFrameGroupBy(
    block: bigframes.core.blocks.Block,
    by_col_ids: typing.Sequence[str],
    *,
    selected_cols: typing.Optional[typing.Sequence[str]] = None,
    dropna: bool = True,
    as_index: bool = True
)

Class for grouping and aggregating relational data.

Methods

agg

agg(
    func=None, **kwargs
) -> typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]

Aggregate using one or more operations.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> data = {"A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860]}
>>> df = bpd.DataFrame(data)

The aggregation is for each column.

>>> df.groupby('A').agg('min')
    B         C
A
1  1  0.227877
2  3  -0.56286
<BLANKLINE>
[2 rows x 2 columns]

Multiple aggregations

>>> df.groupby('A').agg(['min', 'max'])
    B             C
       min max       min       max
A
1        1   2  0.227877  0.362838
2        3   4  -0.56286  1.267767
<BLANKLINE>
[2 rows x 4 columns]

Parameter

Name  Description
func  function, str, list, dict or None. Function to use for aggregating the data. Accepted combinations are: a string function name; a list of function names, e.g. ['sum', 'mean']; a dict of axis labels -> function names or list of such; or None, in which case kwargs are used with Named Aggregation. Here the output has one column for each element in kwargs. The name of the column is the keyword, whereas the value determines the aggregation used to compute the values in the column.

Returns

Type                        Description
bigframes.pandas.DataFrame  A BigQuery DataFrame.
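The named-aggregation form (func=None with keyword arguments) is described in the parameter text but not shown in the examples. The following is a minimal plain-Python sketch of what that form computes, not the bigframes implementation: the helper named_agg and the (column, function) pair it accepts are assumptions made for illustration, and the rows reuse the data from the examples.

```python
from collections import defaultdict

# Rows of the example frame, with columns A, B and C.
rows = [
    {"A": 1, "B": 1, "C": 0.362838},
    {"A": 1, "B": 2, "C": 0.227877},
    {"A": 2, "B": 3, "C": 1.267767},
    {"A": 2, "B": 4, "C": -0.562860},
]


def named_agg(rows, by, **spec):
    """Hypothetical helper: one output column per keyword in `spec`.

    Each keyword maps to a (source column, aggregation function) pair.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return {
        key: {
            out: func([member[col] for member in members])
            for out, (col, func) in spec.items()
        }
        for key, members in sorted(groups.items())
    }


# The keyword becomes the output column name.
result = named_agg(rows, "A", b_min=("B", min), c_max=("C", max))
print(result)
```

Each keyword produces exactly one output column per group, which is why this form gives flat column names rather than the two-level header of the list form.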

aggregate

aggregate(
    func=None, **kwargs
) -> typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]

Aggregate using one or more operations.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> data = {"A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860]}
>>> df = bpd.DataFrame(data)

The aggregation is for each column.

>>> df.groupby('A').aggregate('min')
    B         C
A
1  1  0.227877
2  3  -0.56286
<BLANKLINE>
[2 rows x 2 columns]

Multiple aggregations

>>> df.groupby('A').agg(['min', 'max'])
    B             C
       min max       min       max
A
1        1   2  0.227877  0.362838
2        3   4  -0.56286  1.267767
<BLANKLINE>
[2 rows x 4 columns]

Parameter

Name  Description
func  function, str, list, dict or None. Function to use for aggregating the data. Accepted combinations are: a string function name; a list of function names, e.g. ['sum', 'mean']; a dict of axis labels -> function names or list of such; or None, in which case kwargs are used with Named Aggregation. Here the output has one column for each element in kwargs. The name of the column is the keyword, whereas the value determines the aggregation used to compute the values in the column.

Returns

Type                        Description
bigframes.pandas.DataFrame  A BigQuery DataFrame.

all

all() -> bigframes.dataframe.DataFrame

Return True if all values in the group are true, else False.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, 0], index=lst)
>>> ser.groupby(level=0).all()
a     True
b    False
dtype: boolean

For DataFrameGroupBy:

>>> data = [[1, 0, 3], [1, 5, 6], [7, 8, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["ostrich", "penguin", "parrot"])
>>> df.groupby(by=["a"]).all()
        b       c
a
1   False    True
7   True    True
<BLANKLINE>
[2 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  DataFrame or Series of boolean values, where a value is True if all elements are True within its respective group; otherwise False.

any

any() -> bigframes.dataframe.DataFrame

Return True if any value in the group is true, else False.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, 0], index=lst)
>>> ser.groupby(level=0).any()
a     True
b    False
dtype: boolean

For DataFrameGroupBy:

>>> data = [[1, 0, 3], [1, 0, 6], [7, 1, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["ostrich", "penguin", "parrot"])
>>> df.groupby(by=["a"]).any()
        b       c
a
1   False    True
7   True    True
<BLANKLINE>
[2 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  DataFrame or Series of boolean values, where a value is True if any element is True within its respective group; otherwise False.

count

count() -> bigframes.dataframe.DataFrame

Compute count of group, excluding missing values.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, np.nan], index=lst)
>>> ser.groupby(level=0).count()
a     2
b     0
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, np.nan, 3], [1, np.nan, 6], [7, 8, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["cow", "horse", "bull"])
>>> df.groupby(by=["a"]).count()
   b  c
a
1  0  2
7  1  1
<BLANKLINE>
[2 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Count of values within each group.
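To make "excluding missing values" concrete, here is a small plain-Python sketch of the counting rule. It is illustrative only; the helpers is_missing and group_count are hypothetical names, and treating None and NaN as missing is an assumption of the sketch.

```python
import math
from collections import defaultdict


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_count(keys, values):
    """Count the non-missing values for each group key."""
    counts = defaultdict(int)
    for key, value in zip(keys, values):
        # Adding 0 still registers the key, so all-missing groups report 0.
        counts[key] += 0 if is_missing(value) else 1
    return dict(counts)


counts = group_count(["a", "a", "b"], [1, 2, float("nan")])
print(counts)
```

Group "b" still appears in the result with a count of 0, matching the SeriesGroupBy example.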

cumcount

cumcount(ascending: bool = True)

Number each item in each group from 0 to the length of that group - 1. (DataFrameGroupBy functionality is not yet available.)

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b', 'b', 'c']
>>> ser = bpd.Series([5, 1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).cumcount()
a    0
a    1
b    0
b    1
c    0
dtype: Int64
>>> ser.groupby(level=0).cumcount(ascending=False)
a    0
a    1
b    0
b    1
c    0
dtype: Int64

Parameter

Name       Description
ascending  bool, default True. If False, number in reverse, from length of group - 1 to 0.

Returns

Type                     Description
bigframes.pandas.Series  Sequence number of each element within each group.

cummax

cummax(
    *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame

Cumulative max for each group.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cummax()
a    6
a    6
b    0
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df.groupby("a").cummax()
         b  c
fox      8  2
gorilla  8  5
lion     6  9
<BLANKLINE>
[3 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Cumulative max for each group.

cummin

cummin(
    *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame

Cumulative min for each group.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cummin()
a    6
a    2
b    0
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df.groupby("a").cummin()
         b  c
fox      8  2
gorilla  2  2
lion     6  9
<BLANKLINE>
[3 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Cumulative min for each group.

cumprod

cumprod(*args, **kwargs) -> bigframes.dataframe.DataFrame

Cumulative product for each group.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cumprod()
a     6.0
a    12.0
b     0.0
dtype: Float64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["cow", "horse", "bull"])
>>> df.groupby("a").cumprod()
          b     c
cow     8.0   2.0
horse  16.0  10.0
bull    6.0   9.0
<BLANKLINE>
[3 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Cumulative product for each group.

cumsum

cumsum(
    *args, numeric_only: bool = False, **kwargs
) -> bigframes.dataframe.DataFrame

Cumulative sum for each group.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([6, 2, 0], index=lst)
>>> ser.groupby(level=0).cumsum()
a    6
a    8
b    0
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["fox", "gorilla", "lion"])
>>> df.groupby("a").cumsum()
          b  c
fox       8  2
gorilla  10  7
lion      6  9
<BLANKLINE>
[3 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Cumulative sum for each group.
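The cumulative methods (cumsum, cummax, cummin, cumprod) share one pattern: a running reduction that restarts for every group while the rows keep their original order. A plain-Python sketch of that pattern follows; group_cumulative is a hypothetical helper for illustration, not part of bigframes.

```python
import operator


def group_cumulative(keys, values, op):
    """Running reduction of `values` with `op`, restarted for each key.

    The output has one entry per input row, in the original row order.
    """
    running = {}
    out = []
    for key, value in zip(keys, values):
        running[key] = value if key not in running else op(running[key], value)
        out.append(running[key])
    return out


# Columns "a" (group key) and "b" from the DataFrameGroupBy example.
a = [1, 1, 2]
b = [8, 2, 6]
cumsum_b = group_cumulative(a, b, operator.add)
cummax_b = group_cumulative(a, b, max)
print(cumsum_b, cummax_b)
```

The results for column b agree with the cumsum and cummax example outputs; the group key column itself is not part of the output, as in those examples.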

diff

diff(periods=1) -> bigframes.series.Series

First discrete difference of element. Calculates the difference of each element compared with another element in the group (default is element in previous row).

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).diff()
a    <NA>
a      -5
a       6
b    <NA>
b      -1
b       0
dtype: Int64

For DataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                   'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).diff()
          a     b
dog    <NA>  <NA>
dog       2     3
dog       2     4
mouse  <NA>  <NA>
mouse     0     0
mouse     1    -2
mouse    -5    -1
<BLANKLINE>
[7 rows x 2 columns]

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  First differences.
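As an illustration of how periods selects the element to subtract, the sketch below computes a per-group difference in plain Python. The helper group_diff is hypothetical, it only handles positive periods, and it uses None where the examples show <NA>.

```python
def group_diff(keys, values, periods=1):
    """Difference between each value and the one `periods` rows earlier
    in the same group. Assumes periods >= 1."""
    history = {}
    out = []
    for key, value in zip(keys, values):
        seen = history.setdefault(key, [])
        # Not enough earlier rows in this group yet: no difference defined.
        out.append(value - seen[-periods] if len(seen) >= periods else None)
        seen.append(value)
    return out


labels = ["a", "a", "a", "b", "b", "b"]
data = [7, 2, 8, 4, 3, 3]
first_diff = group_diff(labels, data)
second_diff = group_diff(labels, data, periods=2)
print(first_diff)
print(second_diff)
```

With the default periods=1 the result matches the SeriesGroupBy example; with periods=2 the first two rows of every group have no earlier partner.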

expanding

expanding(min_periods: int = 1) -> bigframes.core.window.Window

Provides expanding functionality.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'c', 'c', 'e']
>>> ser = bpd.Series([1, 0, -2, -1, 2], index=lst)
>>> ser.groupby(level=0).expanding().min()
index  index
a      a         1
       a         0
c      c        -2
       c        -2
e      e         2
dtype: Int64

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  An expanding grouper, providing expanding functionality per group.
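An expanding window covers every row of the group seen so far, and min_periods sets how many rows are needed before a value is produced. A minimal plain-Python sketch of an expanding minimum, using a hypothetical helper group_expanding_min and None in place of <NA>:

```python
def group_expanding_min(keys, values, min_periods=1):
    """Minimum over all rows seen so far in each group."""
    seen = {}
    out = []
    for key, value in zip(keys, values):
        window = seen.setdefault(key, [])
        window.append(value)
        # Emit a value only once the group has at least min_periods rows.
        out.append(min(window) if len(window) >= min_periods else None)
    return out


labels = ["a", "a", "c", "c", "e"]
data = [1, 0, -2, -1, 2]
expanding_min = group_expanding_min(labels, data)
expanding_min_2 = group_expanding_min(labels, data, min_periods=2)
print(expanding_min)
print(expanding_min_2)
```

The default call reproduces the values in the example; raising min_periods to 2 blanks the first row of each group.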

head

head(n: int = 5) -> bigframes.dataframe.DataFrame

Return the first n rows of each group.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> df = bpd.DataFrame([[1, 2], [1, 4], [5, 6]],
...                   columns=['A', 'B'])
>>> df.groupby('A').head(1)
   A  B
0  1  2
2  5  6
[2 rows x 2 columns]

Parameter

Name  Description
n     int. If positive: number of entries to include from start of each group. If negative: number of entries to exclude from end of each group.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  First n rows of the original DataFrame or Series
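The positive and negative meanings of n line up with Python slice semantics, which the following plain-Python sketch uses to illustrate the rule. The helper group_head is hypothetical and returns (position, row) pairs so the retained index labels are visible.

```python
from collections import defaultdict


def group_head(rows, key_index, n=5):
    """Return (position, row) pairs for the first n rows of each group.

    A negative n drops the last |n| rows of each group instead.
    """
    positions = defaultdict(list)
    for pos, row in enumerate(rows):
        positions[row[key_index]].append(pos)
    # members[:n] covers both cases: first n rows, or all but the last |n|.
    keep = sorted(pos for members in positions.values() for pos in members[:n])
    return [(pos, rows[pos]) for pos in keep]


rows = [[1, 2], [1, 4], [5, 6]]
first_one = group_head(rows, key_index=0, n=1)
all_but_last = group_head(rows, key_index=0, n=-1)
print(first_one)
print(all_but_last)
```

With n=1 the rows at positions 0 and 2 are kept, as in the example. With n=-1 the single-row group contributes nothing, because its only row is also its last.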

kurt

kurt(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']
>>> ser = bpd.Series([0, 1, 1, 0, 0, 1, 2, 4, 5], index=lst)
>>> ser.groupby(level=0).kurt()
a        -6.0
b   -1.963223
dtype: Float64

Parameter

Name          Description
numeric_only  bool, default False. Include only float, int or boolean data.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Unbiased kurtosis of values within each group.
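For readers who want to check the numbers, the sketch below evaluates the standard bias-corrected sample excess kurtosis in plain Python. It assumes that this textbook formula is what "unbiased" and "Fisher's definition" refer to here; it is not the bigframes implementation, and sample_kurtosis is a hypothetical helper. Under that assumption it reproduces both values in the example.

```python
def sample_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0.0 for a normal distribution).

    Returns None when fewer than four values are given or the variance
    is zero, since the statistic is undefined in those cases.
    """
    n = len(values)
    if n < 4:
        return None
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_fourth = sum((x - mean) ** 4 for x in values)
    variance = sum_sq / (n - 1)  # sample variance, normalized by N - 1
    if variance == 0:
        return None
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_fourth / variance**2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scaled - correction


kurt_a = sample_kurtosis([0, 1, 1, 0])
kurt_b = sample_kurtosis([0, 1, 2, 4, 5])
print(kurt_a, kurt_b)
```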

kurtosis

kurtosis(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher's definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']
>>> ser = bpd.Series([0, 1, 1, 0, 0, 1, 2, 4, 5], index=lst)
>>> ser.groupby(level=0).kurtosis()
a        -6.0
b   -1.963223
dtype: Float64

Parameter

Name          Description
numeric_only  bool, default False. Include only float, int or boolean data.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Unbiased kurtosis of values within each group.

max

max(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute max of group values.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).max()
a     2
b     4
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["tiger", "leopard", "cheetah", "lion"])
>>> df.groupby(by=["a"]).max()
   b  c
a
1  8  5
2  6  9
<BLANKLINE>
[2 rows x 2 columns]

Parameters

Name          Description
numeric_only  bool, default False. Include only float, int, boolean columns.
min_count     int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Computed max of values within each group.

mean

mean(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute mean of groups, excluding missing values.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5],
...                    'C': [1, 2, 1, 1, 2]}, columns=['A', 'B', 'C'])

Groupby one column and return the mean of the remaining columns in each group.

>>> df.groupby('A').mean()
    B         C
A
1  3.0  1.333333
2  4.0       1.5
<BLANKLINE>
[2 rows x 2 columns]

Groupby two columns and return the mean of the remaining column.

>>> df.groupby(['A', 'B']).mean()
         C
A B
1 2.0  2.0
  4.0  1.0
2 3.0  1.0
  5.0  2.0
<BLANKLINE>
[4 rows x 1 columns]

Groupby one column and return the mean of only particular column in the group.

>>> df.groupby('A')['B'].mean()
A
1    3.0
2    4.0
Name: B, dtype: Float64

Parameter

Name          Description
numeric_only  bool, default False. Include only float, int, boolean columns.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Mean of groups.

median

median(
    numeric_only: bool = False, *, exact: bool = True
) -> bigframes.dataframe.DataFrame

Compute median of groups, excluding missing values.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).median()
a    7.0
b    3.0
dtype: Float64

For DataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                    'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).median()
        a    b
dog    3.0  4.0
mouse  7.0  3.0
<BLANKLINE>
[2 rows x 2 columns]

Parameters

Name          Description
numeric_only  bool, default False. Include only float, int, boolean columns.
exact         bool, default True. Calculate the exact median instead of an approximation.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Median of groups.

min

min(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute min of group values.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).min()
a     1
b     3
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                    index=["tiger", "leopard", "cheetah", "lion"])
>>> df.groupby(by=["a"]).min()
   b  c
a
1  2  2
2  5  8
<BLANKLINE>
[2 rows x 2 columns]

Parameters

Name          Description
numeric_only  bool, default False. Include only float, int, boolean columns.
min_count     int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Computed min of values within each group.

nunique

nunique() -> bigframes.dataframe.DataFrame

Return DataFrame with counts of unique elements in each position.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> df = bpd.DataFrame({'id': ['spam', 'egg', 'egg', 'spam',
...                           'ham', 'ham'],
...                    'value1': [1, 5, 5, 2, 5, 5],
...                    'value2': list('abbaxy')})
>>> df.groupby('id').nunique()
      value1  value2
id
egg        1       1
ham        1       2
spam       2       1
<BLANKLINE>
[3 rows x 2 columns]

Returns

Type                        Description
bigframes.pandas.DataFrame  Number of unique values within a BigQuery DataFrame.

prod

prod(numeric_only: bool = False, min_count: int = 0)

Compute prod of group values. (DataFrameGroupBy functionality is not yet available.)

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).prod()
a     2.0
b    12.0
dtype: Float64

Parameters

Name          Description
numeric_only  bool, default False. Include only float, int, boolean columns.
min_count     int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Computed prod of values within each group.
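The min_count rule can be read as a guard applied to each group before the reduction runs. The sketch below illustrates that reading in plain Python for a single group; prod_with_min_count is a hypothetical helper, and None stands in for both missing inputs and an NA result.

```python
import math


def prod_with_min_count(values, min_count=0):
    """Product of the non-missing values of one group.

    Returns None when fewer than min_count non-missing values are present.
    """
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return None
    return math.prod(valid)


full_group = prod_with_min_count([3, 4])
too_few = prod_with_min_count([3, None], min_count=2)
just_enough = prod_with_min_count([3, None], min_count=1)
print(full_group, too_few, just_enough)
```

Only the non-missing values count toward min_count, so a group of two rows with one missing value fails a min_count of 2.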

quantile

quantile(
    q: typing.Union[float, typing.Sequence[float]] = 0.5, *, numeric_only: bool = False
) -> bigframes.dataframe.DataFrame

Return group values at the given quantile, a la numpy.percentile.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([
...     ['a', 1], ['a', 2], ['a', 3],
...     ['b', 1], ['b', 3], ['b', 5]
... ], columns=['key', 'val'])
>>> df.groupby('key').quantile()
     val
key
a    2.0
b    3.0
<BLANKLINE>
[2 rows x 1 columns]

Parameters

Name          Description
q             float or array-like, default 0.5 (50% quantile). Value(s) between 0 and 1 providing the quantile(s) to compute.
numeric_only  bool, default False. Include only float, int or boolean data.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Return type determined by caller of GroupBy object.
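The phrase "a la numpy.percentile" suggests linear interpolation between the two nearest ordered values, which is numpy's default. The plain-Python sketch below assumes that interpolation rule (this page does not state which one bigframes applies) and evaluates it for one group; linear_quantile is a hypothetical helper.

```python
import math


def linear_quantile(values, q=0.5):
    """Quantile of one group using linear interpolation between neighbours."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


median_a = linear_quantile([1, 2, 3])
median_b = linear_quantile([1, 3, 5])
lower_quartile_b = linear_quantile([1, 3, 5], q=0.25)
print(median_a, median_b, lower_quartile_b)
```

For q=0.5 this gives the same values as the example for keys a and b. The q=0.25 call shows a case where the result falls between two data points.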

rolling

rolling(window: int, min_periods=None) -> bigframes.core.window.Window

Returns a rolling grouper, providing rolling functionality per group.

Examples:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'a', 'e']
>>> ser = bpd.Series([1, 0, -2, -1, 2], index=lst)
>>> ser.groupby(level=0).rolling(2).min()
index  index
a      a        <NA>
       a           0
       a          -2
       a          -2
e      e        <NA>
dtype: Int64

Parameter

Name         Description
min_periods  int, default None. Minimum number of observations in window required to have a value; otherwise, result is np.nan. For a window that is specified by an offset, min_periods will default to 1. For a window that is specified by an integer, min_periods will default to the size of the window.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Return a new grouper with our rolling appended.
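The min_periods default for an integer window explains why the first row of each group is <NA> in the example. A plain-Python sketch of a per-group rolling minimum, with a hypothetical helper group_rolling_min and None in place of <NA>:

```python
def group_rolling_min(keys, values, window, min_periods=None):
    """Minimum over the last `window` rows of each group."""
    if min_periods is None:
        # For an integer window, min_periods defaults to the window size.
        min_periods = window
    seen = {}
    out = []
    for key, value in zip(keys, values):
        history = seen.setdefault(key, [])
        history.append(value)
        recent = history[-window:]
        out.append(min(recent) if len(recent) >= min_periods else None)
    return out


labels = ["a", "a", "a", "a", "e"]
data = [1, 0, -2, -1, 2]
rolling_default = group_rolling_min(labels, data, window=2)
rolling_relaxed = group_rolling_min(labels, data, window=2, min_periods=1)
print(rolling_default)
print(rolling_relaxed)
```

The default call matches the example values; passing min_periods=1 fills in the first row of every group.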

shift

shift(periods=1) -> bigframes.series.Series

Shift each group by periods observations.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).shift(1)
a    <NA>
a       1
b    <NA>
b       3
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["tuna", "salmon", "catfish", "goldfish"])
>>> df.groupby("a").shift(1)
             b     c
tuna      <NA>  <NA>
salmon       2     3
catfish   <NA>  <NA>
goldfish     5     8
<BLANKLINE>
[4 rows x 2 columns]

Parameter

Name     Description
periods  int, default 1. Number of periods to shift.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Object shifted within each group.

size

size() -> typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]

Compute group sizes.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b']
>>> ser = bpd.Series([1, 2, 3], index=lst)
>>> ser
a     1
a     2
b     3
dtype: Int64
>>> ser.groupby(level=0).size()
a    2
b    1
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 2, 3], [1, 5, 6], [7, 8, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["owl", "toucan", "eagle"])
>>> df
        a  b  c
owl     1  2  3
toucan  1  5  6
eagle   7  8  9
[3 rows x 3 columns]
>>> df.groupby("a").size()
a
1    2
7    1
dtype: Int64

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Number of rows in each group as a Series if as_index is True or a DataFrame if as_index is False.

skew

skew(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Return unbiased skew within groups.

Normalized by N-1.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> ser = bpd.Series([390., 350., 357., np.nan, 22., 20., 30.],
...                  index=['Falcon', 'Falcon', 'Falcon', 'Falcon',
...                         'Parrot', 'Parrot', 'Parrot'],
...                  name="Max Speed")
>>> ser.groupby(level=0).skew()
Falcon    1.525174
Parrot    1.457863
Name: Max Speed, dtype: Float64

Parameter

Name          Description
numeric_only  bool, default False. Include only float, int or boolean data.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Unbiased skew of values within each group.
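The example values can be checked against the standard bias-corrected sample skewness, sketched below in plain Python. That "unbiased" refers to this textbook formula is an assumption, sample_skew is a hypothetical helper, and NaN inputs are dropped to mirror the missing value in the Falcon group.

```python
import math


def sample_skew(values):
    """Bias-corrected sample skewness of one group, ignoring NaN values.

    Returns None when fewer than three values remain or the spread is
    zero, since the statistic is undefined in those cases.
    """
    data = [x for x in values if not math.isnan(x)]
    n = len(data)
    if n < 3:
        return None
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)
    sum_cube = sum((x - mean) ** 3 for x in data)
    std = math.sqrt(sum_sq / (n - 1))  # sample standard deviation
    if std == 0:
        return None
    return n / ((n - 1) * (n - 2)) * sum_cube / std**3


falcon = sample_skew([390.0, 350.0, 357.0, float("nan")])
parrot = sample_skew([22.0, 20.0, 30.0])
print(falcon, parrot)
```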

std

std(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Compute standard deviation of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).std()
a     3.21455
b     0.57735
dtype: Float64

For DataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                    'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).std()
              a         b
dog         2.0  3.511885
mouse  2.217356       1.5
<BLANKLINE>
[2 rows x 2 columns]

Parameter

Name          Description
numeric_only  bool, default False. Include only float, int or boolean data.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Standard deviation of values within each group.

sum

sum(numeric_only: bool = False, *args) -> bigframes.dataframe.DataFrame

Compute sum of group values.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'b', 'b']
>>> ser = bpd.Series([1, 2, 3, 4], index=lst)
>>> ser.groupby(level=0).sum()
a     3
b     7
dtype: Int64

For DataFrameGroupBy:

>>> data = [[1, 8, 2], [1, 2, 5], [2, 5, 8], [2, 6, 9]]
>>> df = bpd.DataFrame(data, columns=["a", "b", "c"],
...                   index=["tiger", "leopard", "cheetah", "lion"])
>>> df.groupby("a").sum()
    b   c
a
1  10   7
2  11  17
<BLANKLINE>
[2 rows x 2 columns]

Parameters

Name          Description
numeric_only  bool, default False. Include only float, int, boolean columns.
min_count     int, default 0. The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Computed sum of values within each group.

var

var(*, numeric_only: bool = False) -> bigframes.dataframe.DataFrame

Compute variance of groups, excluding missing values.

For multiple groupings, the result index will be a MultiIndex.

Examples:

For SeriesGroupBy:

>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> lst = ['a', 'a', 'a', 'b', 'b', 'b']
>>> ser = bpd.Series([7, 2, 8, 4, 3, 3], index=lst)
>>> ser.groupby(level=0).var()
a   10.333333
b    0.333333
dtype: Float64

For DataFrameGroupBy:

>>> data = {'a': [1, 3, 5, 7, 7, 8, 3], 'b': [1, 4, 8, 4, 4, 2, 1]}
>>> df = bpd.DataFrame(data, index=['dog', 'dog', 'dog',
...                    'mouse', 'mouse', 'mouse', 'mouse'])
>>> df.groupby(level=0).var()
              a          b
dog         4.0  12.333333
mouse  4.916667       2.25
<BLANKLINE>
[2 rows x 2 columns]

Parameter

Name          Description
numeric_only  bool, default False. Include only float, int or boolean data.

Returns

Type                                                   Description
bigframes.pandas.DataFrame or bigframes.pandas.Series  Variance of values within each group.
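This page does not state the normalization used by var and std, but the example values are consistent with the sample estimator that divides by N - 1. The sketch below checks this with Python's statistics module on the two groups from the SeriesGroupBy example; it is a verification aid only, not the bigframes implementation.

```python
import statistics

# Values of groups "a" and "b" from the SeriesGroupBy example.
group_a = [7, 2, 8]
group_b = [4, 3, 3]

# Sample estimators: divide the squared deviations by N - 1.
var_a = statistics.variance(group_a)
var_b = statistics.variance(group_b)
std_a = statistics.stdev(group_a)
std_b = statistics.stdev(group_b)

# Population estimator for comparison: divides by N instead.
pvar_a = statistics.pvariance(group_a)

print(var_a, var_b, std_a, std_b, pvar_a)
```

The sample variances and standard deviations agree with the example outputs, while the population variance of group "a" (about 6.89) does not.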