Pandas DataFrame.sum()
The DataFrame.sum() method in Pandas computes the sum of values along a specified axis. It can add the values down each column to give one total per column (axis=0, the default) or across each row to give one total per row (axis=1), and it lets you control how missing (NaN) values are handled. Example:
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
# Sum each column, adding down the rows (default: axis=0)
col_sum = df.sum()
# Sum each row, adding across the columns (axis=1)
row_sum = df.sum(axis=1)
print("Column-wise sum:\n", col_sum)
print("\nRow-wise sum:\n", row_sum)
Output
Column-wise sum:
 A     6
B    15
C    24
dtype: int64

Row-wise sum:
 0    12
1    15
2    18
dtype: int64
Explanation: This code creates a DataFrame from a dictionary and calculates sums for both columns and rows. By default (axis=0), sum() adds the values in each column separately and returns one total per column. With axis=1, it adds the values in each row and returns one total per row. Missing values, if present, would be ignored because skipna defaults to True.
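The first example contains no missing values, so the handling of NaN is not visible in its output. The following is a minimal sketch, using a small made-up DataFrame, of how the result changes when skipna is switched off:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A contains one missing value
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

skipped = df.sum()              # default skipna=True: the NaN in A is ignored
strict = df.sum(skipna=False)   # the NaN propagates, so the total for A is NaN

print(skipped)
print(strict)
```

With the default, column A sums to 4.0. With skipna=False, its total is NaN, while column B is unaffected in both cases.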
Syntax
DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)
Parameters:
- axis ({0 or 'index', 1 or 'columns'}, default 0): The axis to sum along. axis=0 adds the values down each column and returns one total per column; axis=1 adds the values across each row and returns one total per row.
- skipna (bool, default True): If True, NaN/null values are excluded when computing the sum. If False, any NaN in a column or row makes that result NaN.
- level (int or level name): Available only in older versions. It was deprecated in pandas 1.3 and removed in pandas 2.0. To sum within one level of a MultiIndex, use df.groupby(level=...).sum() instead.
- numeric_only (bool, default False): If True, only float, int, and boolean columns are included. Before pandas 2.0 the default was None.
- min_count (int, default 0): The minimum number of non-NA values required to perform the sum. If fewer than min_count non-NA values are present, the result is NaN.
Returns: A Series or scalar containing the summed values along the specified axis.
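As a rough illustration of two of these parameters, the sketch below uses a small made-up DataFrame (the column names team, points, and name are assumptions for illustration only). It shows numeric_only=True restricting the sum to numeric columns, and groupby(level=...) used to sum within an index level, which is the usual substitute for the older level argument:

```python
import pandas as pd

# Hypothetical data: 'team' becomes the index, 'name' is a text column
df = pd.DataFrame(
    {"team": ["x", "x", "y"], "points": [10, 20, 30], "name": ["a", "b", "c"]}
).set_index("team")

# Only numeric columns take part in the sum; 'name' is left out
numeric_total = df.sum(numeric_only=True)

# Sum 'points' separately for each value of the 'team' index level
per_team = df.groupby(level="team")["points"].sum()

print(numeric_total)
print(per_team)
```

Here numeric_total holds a single entry for points, and per_team holds one total for each team.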
Examples of Using DataFrame.sum()
Example 1. Summing Each Column (default behavior)
Let's use the Pandas sum() function to total every column of a DataFrame, that is, to sum over the index axis (axis=0). In this example, NaN values are excluded while calculating the sum. Dataset Link: nba.csv
import pandas as pd
df = pd.read_csv("nba.csv")
print(df.dtypes)
# Convert all columns to numeric, coercing errors to NaN
df_numeric = df.apply(pd.to_numeric, errors='coerce')
# Sum the numeric columns along the index (rows), skipping NaN values
df_sum = df_numeric.sum(axis=0, skipna=True)
print("\nFinding Sum over Index Axis:")
print(df_sum)
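Explanation: sum() adds up the values in each column, ignoring missing values (NaN). Note that sum() itself does not convert text to numbers. The conversion is done beforehand by pd.to_numeric with errors='coerce', which turns any value that cannot be parsed as a number into NaN. A column made up entirely of text therefore becomes all NaN and sums to 0.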
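The effect of coercion is easier to see on a tiny DataFrame than on the full dataset. The sketch below uses made-up data (the columns Name, Age, and Salary are assumptions and are not taken from nba.csv):

```python
import pandas as pd

# Hypothetical data: 'Name' is text, 'Age' holds numbers stored as strings,
# and 'Salary' has one missing value
df = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Age": ["25", "31"], "Salary": [1000.0, None]}
)

# Values that cannot be parsed as numbers become NaN
df_numeric = df.apply(pd.to_numeric, errors="coerce")

# Column totals; NaN values are skipped by default
totals = df_numeric.sum()

print(df_numeric)
print(totals)
```

The Name column is entirely NaN after coercion, so its total is 0 rather than an error. Age is parsed into numbers and sums to 56, and Salary sums to 1000.0 because its missing value is skipped.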
Example 2. Row-wise Summation with axis=1
Now, let's sum the values in each row, that is, over the column axis (axis=1). Again, NaN values are excluded from the sum.
import pandas as pd
df = pd.read_csv("nba.csv")
print(df.dtypes)
# Select only numeric columns for summing
df_numeric = df.select_dtypes(include=['number'])
# Sum across rows (axis=1) and skip NaN values
df_row_sum = df_numeric.sum(axis=1, skipna=True)
print("\nRow-wise Summation:")
print(df_row_sum)
Explanation: Instead of adding up the values in each column, this example adds up the values in each row. It first selects only the numeric columns with select_dtypes, so that text columns are not mixed into the arithmetic. Missing values are ignored during the summation.
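The same steps can be sketched on a small made-up DataFrame (the column names are assumptions, not the columns of nba.csv), which makes the per-row totals easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and two numeric columns
df = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Age": [25, 31], "Salary": [1000.0, np.nan]}
)

# Keep only the numeric columns ('Age' and 'Salary')
numeric = df.select_dtypes(include="number")

# One total per row; the missing Salary in the second row is skipped
row_totals = numeric.sum(axis=1)

print(row_totals)
```

The first row totals 1025.0 and the second totals 31.0, since only its Age value is present.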
Example 3. Summing with min_count Parameter
The min_count parameter ensures that the sum operation is performed only if at least a certain number of non-NaN values are present. Otherwise, the result will be NaN.
import pandas as pd
import numpy as np
data = {'A': [1, np.nan, 3, np.nan], 'B': [4, np.nan, np.nan, np.nan], 'C': [7, 8, 9, np.nan]}
df = pd.DataFrame(data)
# Sum columns but require at least 2 valid values
df_sum_min_count = df.sum(axis=0, min_count=2)
print("\nSum with min_count=2:")
print(df_sum_min_count)
Output
Sum with min_count=2:
A     4.0
B     NaN
C    24.0
dtype: float64
Explanation: This example sets a rule: a column is summed only if it contains at least two non-missing values. Column B has only one valid value, so its result is NaN instead of a number, while columns A and C meet the threshold and are summed normally.
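A common use of min_count is telling "no data at all" apart from a genuine total of zero. The following minimal sketch, on made-up data, compares the default with min_count=1 for a column that is entirely NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has no valid values, column B has one
df = pd.DataFrame({"A": [np.nan, np.nan], "B": [1.0, np.nan]})

default_sum = df.sum()             # the all-NaN column sums to 0.0
guarded_sum = df.sum(min_count=1)  # the all-NaN column yields NaN instead

print(default_sum)
print(guarded_sum)
```

With the default min_count=0, column A reports 0.0 even though it holds no data. With min_count=1, it reports NaN, and column B is 1.0 in both cases.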