Salary Statistics Fields - Complete Explanation
27 Jan, 2025

HR Drone Analytics Team

Data & Analytics, HR Drone

Overview

This document provides a comprehensive explanation of all statistical fields returned by the Salary Statistics by Level query. Understanding these metrics will help you interpret salary data and make informed decisions about compensation, market analysis, and hiring strategies.

Statistics Fields Reference

Extremes and Central Tendency Metrics

min_salary

Type: Integer

Description: The lowest salary value among all applicants at this level.

What it means:

  • The absolute minimum salary found in the dataset
  • Represents the floor of the salary range
  • May be an outlier if significantly lower than other values

When to use:

  • Understand the lower bound of salaries
  • Identify minimum acceptable compensation
  • Set salary floors for job postings

Example:

min_salary: $500

→ At least one applicant at this level has a salary of $500

Interpretation tips:

  • Compare with Q1 to identify if it's an outlier
  • If min is much lower than Q1, it may be an entry-level or part-time position
  • Use with max_salary to understand the full range

max_salary

Type: Integer

Description: The highest salary value among all applicants at this level.

What it means:

  • The absolute maximum salary found in the dataset
  • Represents the ceiling of the salary range
  • May be an outlier if significantly higher than other values

When to use:

  • Understand the upper bound of salaries
  • Identify maximum potential compensation
  • Set salary caps for budget planning

Example:

max_salary: $5000

→ At least one applicant at this level has a salary of $5000

Interpretation tips:

  • Compare with Q3 to identify if it's an outlier
  • If max is much higher than Q3, it may be a senior specialist or leadership role
  • Use with min_salary to understand the full range

avg_salary (Average Salary)

Type: Decimal (rounded to 2 places)

Description: The arithmetic mean of all salaries at this level. Calculated as the sum of all salaries divided by the number of applicants.

Formula: SUM(salary) / COUNT(*)

What it means:

  • The salary each applicant would receive if the total were split equally
  • Sensitive to outliers (very high or very low values)
  • Most commonly used measure of central tendency

When to use:

  • Quick overview of typical compensation
  • Budget planning and salary benchmarking
  • Comparing levels or groups

Example:

avg_salary: $1750.25

→ The average salary across all applicants at this level is $1750.25

Interpretation tips:

  • Compare with median: If avg > median, high earners pull the average up
  • Compare with median: If avg < median, low earners pull the average down
  • Outlier sensitivity: A few very high salaries can significantly increase the average

Limitations:

  • Can be misleading if there are extreme outliers
  • Doesn't show the distribution of salaries
  • May not represent the "typical" salary if distribution is skewed

median_salary (50th Percentile)

Type: Integer

Description: The middle value when all salaries are sorted from lowest to highest. About half of the applicants earn less than this value, and about half earn more (ties at the median can shift the split slightly).

What it means:

  • The "middle" salary in the dataset
  • Largely unaffected by outliers (robust statistic)
  • Better representation of "typical" salary than average when data is skewed

When to use:

  • Primary measure for understanding typical compensation
  • Better than average when outliers are present
  • Setting competitive salary offers

Example:

median_salary: $1700

→ Half of applicants earn less than $1700, half earn more

Interpretation tips:

  • More reliable than average when data has outliers
  • Represents the 50th percentile - middle of the distribution
  • Compare with average: Large difference indicates skewed distribution
  • Best single number to represent "typical" salary

Why it's important:

  • Resistant to outliers (one very high salary barely moves it)
  • Represents the true middle of the data
  • Industry standard for salary benchmarking
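The difference in outlier sensitivity between the average and the median can be checked with a few lines of Python. This is a minimal sketch using made-up salaries (the list values are illustrative assumptions, not data from the query):

```python
from statistics import mean, median

# Hypothetical salaries for one level (illustrative values only).
salaries = [1200, 1500, 1700, 1800, 2200]

# The same level after one very high salary is added.
with_outlier = salaries + [9000]

avg_before = round(mean(salaries), 2)      # 1680
med_before = median(salaries)              # 1700
avg_after = round(mean(with_outlier), 2)   # 2900
med_after = median(with_outlier)           # 1750.0

print(f"average: {avg_before} -> {avg_after}")
print(f"median:  {med_before} -> {med_after}")
```

A single extra value moves the average by more than $1200 while the median shifts by only $50, which is why the median is usually the safer "typical salary" figure.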

Distribution Metrics

q1_salary (First Quartile / 25th Percentile)

Type: Integer

Description: The salary value below which 25% of applicants fall. Also known as the lower quartile.

What it means:

  • 25% of applicants earn less than this value
  • 75% of applicants earn more than this value
  • Marks the boundary between the bottom quarter and the rest

When to use:

  • Understand the lower end of the salary distribution
  • Identify entry-level or junior compensation
  • Set minimum salary expectations

Example:

q1_salary: $1200

→ 25% of applicants earn less than $1200, 75% earn more

Interpretation tips:

  • Lower boundary of the middle 50% (interquartile range)
  • Compare with median: A large gap means the lower-middle quarter of salaries is spread well below the median
  • Use with Q3 to understand the spread of the middle 50%

q3_salary (Third Quartile / 75th Percentile)

Type: Integer

Description: The salary value below which 75% of applicants fall. Also known as the upper quartile.

What it means:

  • 75% of applicants earn less than this value
  • 25% of applicants earn more than this value
  • Marks the boundary between the top quarter and the rest

When to use:

  • Understand the higher end of the salary distribution
  • Identify senior or experienced compensation
  • Set competitive salary targets

Example:

q3_salary: $2200

→ 75% of applicants earn less than $2200, 25% earn more

Interpretation tips:

  • Upper boundary of the middle 50% (interquartile range)
  • Compare with median: A large gap means the upper-middle quarter of salaries is spread well above the median
  • Use with Q1 to understand the spread of the middle 50%
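As an illustration of how the quartiles relate to the raw data, the sketch below computes Q1, the median, and Q3 with Python's standard library. The salary list is hypothetical, and the "inclusive" interpolation method is an assumption: this document does not state which percentile method the query uses, and different methods can give slightly different values on small samples.

```python
from statistics import quantiles

# Hypothetical salaries for one level (illustrative values only).
salaries = [800, 1000, 1200, 1500, 1700, 1900, 2200, 2600, 3000]

# n=4 splits the data into quarters and returns the three cut points.
# method="inclusive" is an assumption about the percentile method.
q1, med, q3 = quantiles(salaries, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50%

print(q1, med, q3)  # 1200.0 1700.0 2200.0
print(iqr)          # 1000.0
```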

salary_range

Type: Integer

Description: The difference between the maximum and minimum salaries. Calculated as max_salary - min_salary.

What it means:

  • The total spread of salaries from lowest to highest
  • Shows the variability in compensation
  • Larger ranges indicate more diversity in salary levels

When to use:

  • Understand salary variability within a level
  • Identify levels with wide compensation ranges
  • Assess market diversity

Example:

salary_range: $2000

→ The difference between highest and lowest salary is $2000

Interpretation tips:

  • Large range: High variability, many factors affect salary
  • Small range: More standardized compensation
  • Compare across levels: Some levels may have wider ranges than others
  • Use with quartiles: Range can be misleading if outliers exist

Limitations:

  • Sensitive to outliers (one extreme value affects the entire range)
  • Doesn't show where most salaries fall
  • Use interquartile range (Q3 - Q1) for a more robust measure
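To see why the interquartile range is the more robust measure, the following sketch compares both spreads before and after one extreme salary is added. The data and the helper name `spread` are hypothetical, and the "inclusive" quartile method is an assumption rather than the query's documented behavior.

```python
from statistics import quantiles

def spread(salaries):
    """Return (salary_range, iqr) for a list of salaries."""
    q1, _, q3 = quantiles(salaries, n=4, method="inclusive")
    return max(salaries) - min(salaries), q3 - q1

# Hypothetical salaries for one level (illustrative values only).
base = [800, 1000, 1200, 1500, 1700, 1900, 2200, 2600, 3000]
with_outlier = base + [10000]

print(spread(base))          # (2200, 1000.0)
print(spread(with_outlier))  # (9200, 1225.0)
```

One extreme value roughly quadruples the range, while the IQR grows by less than a quarter.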

stddev_salary (Standard Deviation)

Type: Decimal (rounded to 2 places)

Description: A measure of how spread out the salaries are from the average. Indicates the typical distance of salaries from the mean.

Formula: Square root of the variance

What it means:

  • Low standard deviation: Salaries are clustered close to the average
  • High standard deviation: Salaries are spread out widely
  • Zero standard deviation: All salaries are identical (rare)

When to use:

  • Understand salary consistency within a level
  • Identify levels with high or low variability
  • Assess market standardization

Example:

stddev_salary: $450.25

→ Salaries typically vary by about $450 from the average

Interpretation tips:

  • Compare with average: If stddev ≈ 0.2 × average, moderate spread
  • Compare with average: If stddev > 0.5 × average, high variability
  • Compare with average: If stddev < 0.1 × average, low variability
  • NULL value: Occurs when there's only one data point (can't calculate spread)

Rule of thumb (holds only when salaries are approximately normally distributed):

  • About 68% of salaries fall within 1 standard deviation of the average
  • About 95% of salaries fall within 2 standard deviations of the average
  • About 99.7% of salaries fall within 3 standard deviations of the average

Skewed salary data, which is common, will deviate from these percentages.

Example calculation:

If avg_salary = $2000 and stddev_salary = $400:

- About 68% of salaries are between $1600 and $2400

- About 95% of salaries are between $1200 and $2800

- About 99.7% of salaries are between $800 and $3200
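The band arithmetic above, and the NULL case for a single data point, can be sketched as follows. The function names are hypothetical. The sketch assumes a sample standard deviation, which is consistent with the NULL result for one data point but is not confirmed by this document, and the bands are only meaningful for roughly normal data.

```python
from statistics import StatisticsError, stdev

def stddev_or_none(salaries):
    """Sample standard deviation, or None when fewer than two values exist."""
    try:
        return round(stdev(salaries), 2)
    except StatisticsError:
        # Mirrors the NULL returned when spread cannot be calculated.
        return None

def sigma_bands(avg, sd):
    """Salary bands 1, 2, and 3 standard deviations around the average."""
    return {k: (avg - k * sd, avg + k * sd) for k in (1, 2, 3)}

bands = sigma_bands(2000, 400)
print(bands[1])  # (1600, 2400)
print(bands[2])  # (1200, 2800)
print(bands[3])  # (800, 3200)

print(stddev_or_none([1600, 2000, 2400]))  # 400.0
print(stddev_or_none([1500]))              # None
```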

Understanding Relationships Between Statistics

Average vs Median

When Average > Median:

  • High earners pull the average up
  • Right-skewed distribution (tail on the right)
  • Indicates some very high salaries

When Average < Median:

  • Low earners pull the average down
  • Left-skewed distribution (tail on the left)
  • Indicates some very low salaries

When Average ≈ Median:

  • Symmetrical distribution
  • Salaries are balanced around the center
  • Average and median both represent typical salary well

Quartiles and Distribution Shape

Interquartile Range (IQR) = Q3 - Q1

Small IQR:

  • Most salaries are clustered in a narrow range
  • More standardized compensation
  • Less variability

Large IQR:

  • Salaries are spread across a wide range
  • More diverse compensation
  • Higher variability

Symmetric Distribution:

  • Q1 and Q3 are equidistant from median
  • (Median - Q1) ≈ (Q3 - Median)

Skewed Distribution:

  • Q1 and Q3 are not equidistant from median
  • (Median - Q1) ≠ (Q3 - Median)
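The comparisons in this section can be turned into a small check. This is a sketch only: the function names are hypothetical, and the 5% tolerance for treating average and median as "approximately equal" is an assumption, not a threshold defined by the query.

```python
def skew_direction(avg, median, tolerance=0.05):
    """Classify skew from the relative gap between average and median."""
    gap = (avg - median) / median
    if gap > tolerance:
        return "right-skewed"
    if gap < -tolerance:
        return "left-skewed"
    return "approximately symmetric"

def quartile_distances(q1, median, q3):
    """Return (median - Q1, Q3 - median); similar values suggest symmetry."""
    return median - q1, q3 - median

print(skew_direction(1750.25, 1700))         # approximately symmetric
print(skew_direction(2900, 1750))            # right-skewed
print(skew_direction(1500, 1700))            # left-skewed
print(quartile_distances(1200, 1700, 2200))  # (500, 500)
```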

Range vs Standard Deviation

Both measure spread, but differently:

  • Range: Total spread (max - min), sensitive to outliers
  • Standard Deviation: Typical spread from average, less sensitive to outliers than range but still affected by them

Use Range when:

  • You need to know the absolute limits
  • Outliers are important to understand

Use Standard Deviation when:

  • You want to understand typical variability
  • You want a measure less dominated by a single extreme value

Practical Use Cases

Use Case 1: Setting Competitive Salary Ranges

Goal: Determine appropriate salary range for a job posting

Recommended approach:

  1. Look at median_salary as the target midpoint
  2. Use q1_salary as the minimum (25th percentile)
  3. Use q3_salary as the maximum (75th percentile)
  4. Consider avg_salary if distribution is symmetric

Example:

Level: MIDDLE

median_salary: $1700

q1_salary: $1200

q3_salary: $2200

Recommended range: $1200 - $2200

Target offer: ~$1700 (median)

Use Case 2: Identifying Market Anomalies

Goal: Find levels with unusual salary patterns

Recommended approach:

  1. Compare avg_salary vs median_salary
  2. Check stddev_salary for variability
  3. Examine salary_range for outliers

Red flags:

  • Large difference between average and median (> 20%)
  • Very high standard deviation (> 50% of average)
  • Extremely wide salary range
  • Small sample size (< 10)
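The red flags above could be automated roughly as follows. This is a sketch under stated assumptions: `red_flags` and its `sample_size` parameter are hypothetical (sample size is not one of the fields documented here), the 20% gap is measured relative to the median, and "extremely wide salary range" is left out because the text gives no threshold for it.

```python
def red_flags(stats, sample_size):
    """Return a list of warning strings for one level's statistics."""
    flags = []
    avg = stats["avg_salary"]
    med = stats["median_salary"]
    sd = stats.get("stddev_salary")  # may be None (NULL) for a single row

    # Assumption: the 20% gap is measured relative to the median.
    if abs(avg - med) / med > 0.20:
        flags.append("average and median differ by more than 20%")
    if sd is not None and sd > 0.5 * avg:
        flags.append("standard deviation exceeds 50% of average")
    if sample_size < 10:
        flags.append("sample size below 10")
    return flags

typical = {"avg_salary": 1750.25, "median_salary": 1700, "stddev_salary": 450.25}
unusual = {"avg_salary": 2900, "median_salary": 1750, "stddev_salary": 3000}

print(red_flags(typical, sample_size=42))  # []
print(red_flags(unusual, sample_size=6))   # three warnings
```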

Use Case 3: Budget Planning

Goal: Estimate total compensation costs

Recommended approach:

  1. Use avg_salary for total cost estimation
  2. Use median_salary for typical cost per hire
  3. Use q3_salary for worst-case scenario planning
  4. Multiply by expected number of hires

Example:

Planning to hire 5 MIDDLE level developers:

avg_salary: $1750.25

Expected total: 5 × $1750.25 = $8,751.25

Worst case (using Q3):

q3_salary: $2200

Worst case total: 5 × $2200 = $11,000
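The same arithmetic as a reusable sketch. The function name `budget_scenarios` is hypothetical; the optimistic scenario uses q1_salary, as in the Quick Reference Guide. Totals are in whatever pay period the underlying salaries use, which this document does not specify.

```python
def budget_scenarios(stats, hires):
    """Total cost estimates for a number of hires at one level."""
    return {
        "optimistic": stats["q1_salary"] * hires,
        "expected": round(stats["avg_salary"] * hires, 2),
        "conservative": stats["q3_salary"] * hires,
    }

middle = {"q1_salary": 1200, "avg_salary": 1750.25, "q3_salary": 2200}
plan = budget_scenarios(middle, hires=5)

print(plan["optimistic"])    # 6000
print(plan["expected"])      # 8751.25
print(plan["conservative"])  # 11000
```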

Use Case 4: Salary Benchmarking

Goal: Compare compensation across levels or groups

Recommended approach:

  1. Compare median_salary across levels (most reliable)
  2. Compare avg_salary as secondary check
  3. Compare q1_salary and q3_salary for distribution
  4. Consider stddev_salary for consistency

Example comparison:

JUNIOR:

  median: $800

  avg: $850

  q1: $600, q3: $1000

MIDDLE:

  median: $1700

  avg: $1750

  q1: $1200, q3: $2200

Analysis:

- MIDDLE earns ~2x JUNIOR (median comparison)

- Both have similar distribution shape (symmetric)

- MIDDLE has a wider interquartile range ($1000 vs $400), so more variability
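The comparison can be reproduced from the example figures. The dictionary layout and the function name `compare_levels` are assumptions made for illustration; only the numbers come from the example above.

```python
# Figures copied from the example comparison above.
levels = {
    "JUNIOR": {"median": 800, "avg": 850, "q1": 600, "q3": 1000},
    "MIDDLE": {"median": 1700, "avg": 1750, "q1": 1200, "q3": 2200},
}

def compare_levels(lower, higher):
    """Median ratio and interquartile ranges for two levels."""
    return {
        "median_ratio": higher["median"] / lower["median"],
        "iqr_lower": lower["q3"] - lower["q1"],
        "iqr_higher": higher["q3"] - higher["q1"],
    }

result = compare_levels(levels["JUNIOR"], levels["MIDDLE"])
print(result["median_ratio"])  # 2.125
print(result["iqr_lower"])     # 400
print(result["iqr_higher"])    # 1000
```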

Common Misconceptions

❌ "Average is always the best measure"

Reality: Median is often more representative, especially with outliers.

When to use average:

  • Symmetric distribution
  • No outliers
  • Need for mathematical operations (sum, etc.)

When to use median:

  • Skewed distribution
  • Presence of outliers
  • Need for typical value

❌ "Range shows where most salaries fall"

Reality: Range shows extremes, not typical values. Use quartiles for typical range.

Better approach:

  • Use Q1 to Q3 (interquartile range) for where most salaries fall
  • Use min/max only to understand absolute limits

❌ "High standard deviation is always bad"

Reality: High variability can indicate:

  • Diverse skill levels within a level
  • Different specializations
  • Market flexibility
  • Growth opportunities

Context matters: High variability might be expected and acceptable.

❌ "More data always means better statistics"

Reality: More data helps, but data quality and representativeness matter more.

Consider:

  • Is the sample representative?
  • Are there selection biases?
  • Is the data current and relevant?

Statistical Significance

Sample Size Guidelines

Sample Size | Reliability | Recommendation
< 10        | Low         | Use with extreme caution, may not be representative
10-30       | Moderate    | Use with caution, consider confidence intervals
30-100      | Good        | Reliable for most purposes
> 100       | Excellent   | Highly reliable, can perform detailed analysis
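The guideline table can be expressed as a lookup. Because the published bands overlap at 30 and 100, this sketch assumes that a sample of exactly 30 or exactly 100 counts as "Good". The function name is hypothetical.

```python
def reliability(sample_size):
    """Map a sample size to the reliability label from the guideline table."""
    if sample_size < 10:
        return "Low"
    if sample_size < 30:
        return "Moderate"
    if sample_size <= 100:  # assumption: 30 and 100 both count as "Good"
        return "Good"
    return "Excellent"

for n in (5, 12, 30, 100, 250):
    print(n, reliability(n))
```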

Confidence Intervals (Conceptual)

While not calculated in the query, understanding confidence helps:

  • Larger samples: Narrower confidence intervals, more precise estimates
  • Smaller samples: Wider confidence intervals, less precise estimates
  • Rule of thumb: Treat ±10% as a rough margin of error for samples of 30-100; the actual margin depends on how widely salaries vary (see stddev_salary)
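For readers who want a number rather than a rule of thumb, a common approximation for the margin of error of the average is z × stddev / √n. The sketch below assumes a normal approximation with z = 1.96 (about 95% confidence) and applies only to avg_salary, not to the median or quartiles. The function name is hypothetical and the query itself does not compute this.

```python
import math

def mean_margin_of_error(stddev, sample_size, z=1.96):
    """Approximate margin of error for the average (normal approximation)."""
    if sample_size < 2:
        raise ValueError("need at least two data points")
    return z * stddev / math.sqrt(sample_size)

margin_100 = mean_margin_of_error(400, 100)
margin_400 = mean_margin_of_error(400, 400)

print(round(margin_100, 2))  # 78.4
print(round(margin_400, 2))  # 39.2
```

Quadrupling the sample size halves the margin, which is why small samples produce much wider intervals.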

Visualizing the Statistics

Box Plot Representation

The statistics can be visualized as a box plot:


max_salary     ───┬───
                  │
q3_salary      ┌──┴──┐
               │     │
median_salary  ├─────┤
               │     │
q1_salary      └──┬──┘
                  │
min_salary     ───┴───

Box plot elements:

  • Whiskers: min to max, or, when outliers are drawn as separate points, the most extreme values within Q1 - 1.5×IQR and Q3 + 1.5×IQR
  • Box: Q1 to Q3 (interquartile range)
  • Line in box: Median
  • Average: Often shown as a dot or X
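The 1.5×IQR rule used for box plot whiskers also gives a concrete way to test whether min_salary or max_salary is an outlier. This sketch uses the example values from earlier sections; the function name is hypothetical, and the multiplier 1.5 is the conventional choice rather than something the query defines.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Example values from the field descriptions above.
q1_salary, q3_salary = 1200, 2200
min_salary, max_salary = 500, 5000

lower, upper = outlier_fences(q1_salary, q3_salary)
min_is_outlier = min_salary < lower
max_is_outlier = max_salary > upper

print(lower, upper)    # -300.0 3700.0
print(min_is_outlier)  # False
print(max_is_outlier)  # True
```

With these figures the $5000 maximum lies above the upper fence, while the $500 minimum does not fall below the lower one.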

Distribution Shape Indicators

Symmetric (Normal-like):

  • Average ≈ Median
  • Q1 and Q3 equidistant from median
  • Standard deviation moderate

Right-skewed (High outliers):

  • Average > Median
  • Q3 further from median than Q1
  • Standard deviation high

Left-skewed (Low outliers):

  • Average < Median
  • Q1 further from median than Q3
  • Standard deviation high

Summary Table

Field         | Type    | Best For       | Limitations
min_salary    | Integer | Lower bound    | May be outlier
max_salary    | Integer | Upper bound    | May be outlier
avg_salary    | Decimal | Quick overview | Sensitive to outliers
median_salary | Integer | Typical value  | Less intuitive
q1_salary     | Integer | Lower quartile | None
q3_salary     | Integer | Upper quartile | None
salary_range  | Integer | Total spread   | Sensitive to outliers
stddev_salary | Decimal | Variability    | Requires > 1 data point

Quick Reference Guide

For Setting Salary Ranges

  1. Target: Use median_salary
  2. Minimum: Use q1_salary (25th percentile)
  3. Maximum: Use q3_salary (75th percentile)

For Understanding Typical Salary

  1. Primary: Use median_salary (most reliable)
  2. Secondary: Use avg_salary (if symmetric distribution)
  3. Check: Compare both to assess distribution shape

For Budget Planning

  1. Expected: Use avg_salary × number of hires
  2. Conservative: Use q3_salary × number of hires
  3. Optimistic: Use q1_salary × number of hires

For Market Analysis

  1. Central tendency: Compare median_salary across groups
  2. Variability: Compare stddev_salary across groups
  3. Distribution: Compare quartiles (Q1, median, Q3) across groups

Additional Resources

  • Query Documentation: See salary-statistics-by-level-guide.md
  • Query File: See salary-statistics-by-level.sql
  • Statistical Concepts:
    • Percentiles and quartiles
    • Measures of central tendency
    • Measures of variability
    • Distribution shapes

Questions to Ask When Interpreting Results

  1. Distribution: Is the distribution symmetric or skewed? (Compare avg vs median)
  2. Variability: How much do salaries vary? (Check stddev and range)
  3. Outliers: Are min/max values outliers? (Compare with quartiles)
  4. Representativeness: Is this sample representative of the population?
  5. Context: How do these compare to industry standards?
  6. Trends: How do these compare across different levels or groups?

By understanding these statistics and asking the right questions, you can make informed decisions about compensation, market analysis, and hiring strategies.


This article was published by the HR Drone platform to contribute to the development of data-driven HR practices, salary analytics culture, and informed compensation decision-making.
