`ineqpy.inequality`

Analysis of inequality.

This package provides an easy way to carry out a quantitative analysis of inequality and makes it easy to work with stratified data. This module contains the statistics and indicators for this task.

Todo

Rethink this module as a class.
https://en.wikipedia.org/wiki/Income_inequality_metrics

Module Contents

Functions

`concentration`(income[, weights, data, sort])	Calculate concentration's index.
`lorenz`(income[, weights, data])	Calculate Lorenz curve.
`gini`(income[, weights, data, sort])	Calculate Gini's index.
`atkinson`(income[, weights, data, e]) → float	Calculate Atkinson index.
`kakwani`(tax, income_pre_tax[, weights, data])	Calculate Kakwani's index.
`reynolds_smolensky`(income_pre_tax, income_post_tax[, ...])	Calculate Reynolds-Smolensky's index.
`theil`(income[, weights, data])	Calculate Theil's index.
`avg_tax_rate`(total_tax, total_base[, weights, data])	Calculate average tax rate.
`top_rest`(income[, weights, data, top_percentage])	Calculate the 10:90 Ratio.
`hoover`(income[, weights, data])	Calculate Hoover index.

ineqpy.inequality.concentration(income, weights=None, data=None, sort=True)

Calculate concentration’s index.

This function calculates the concentration index. Following the notation used in [Jenkins1988]:

C_x = (2 / x̄) · cov(x, F_x)

where x̄ is the mean of x and F_x is its cumulative distribution. Applying the same formula to y = g(x), C_x becomes C_y. When there are taxes:

y = g(x) = x - t(x)

Parameters:

income (array-like)
weights (array-like)
data (pandas.DataFrame)
sort (bool): If True, the values are sorted.

Returns:

concentration (array-like)

References

Jenkins, S. (1988). Calculating income distribution indices from micro-data. National Tax Journal. http://doi.org/10.2307/41788716
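As a rough illustration of the covariance formula above, the following pure-Python sketch computes a concentration index for one variable when units are ranked by another. It is not the ineqpy implementation: the name `concentration_sketch`, its `ranking` argument, and the mid-point definition of the fractional rank F are assumptions made for this example.

```python
def concentration_sketch(values, ranking, weights=None):
    """Concentration index of `values`, with units ordered by `ranking`."""
    n = len(values)
    w = [1.0] * n if weights is None else list(weights)
    total_w = sum(w)
    mean = sum(values[i] * w[i] for i in range(n)) / total_w

    cov = 0.0
    cum_w = 0.0
    # Visit units from the lowest to the highest value of the ranking variable.
    for i in sorted(range(n), key=lambda k: ranking[k]):
        # Mid-point fractional rank F (an assumption of this sketch).
        f = (cum_w + w[i] / 2) / total_w
        cum_w += w[i]
        # The weighted mean of F is 0.5, so cov(v, F) = E[v * (F - 0.5)].
        cov += w[i] * values[i] * (f - 0.5)
    cov /= total_w
    return 2 * cov / mean


income = [1, 2, 3, 4]
tax = [0, 0, 1, 3]
print(concentration_sketch(income, income))  # ranked by itself: 0.25
print(concentration_sketch(tax, income))     # tax ranked by income: 0.625
```

When `values` and `ranking` are the same variable, the result coincides with the Gini index of that variable.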

ineqpy.inequality.lorenz(income, weights=None, data=None)

Calculate Lorenz curve.

In economics, the Lorenz curve is a graphical representation of the distribution of income or of wealth. It was developed by Max O. Lorenz in 1905 for representing inequality of the wealth distribution. This function computes the Lorenz curve and returns a DataFrame with two columns, x and y, one per axis.

Parameters:

data (pandas.DataFrame): DataFrame that contains the data.
income (str or 1d-array): Income, the monetary variable. If a DataFrame is passed, income should be the name of a column of that DataFrame; otherwise pass a pd.Series or np.array.
weights (str or 1d-array, optional): Population or weights. If a DataFrame is passed, weights should be the name of a column of that DataFrame; otherwise pass a pd.Series or np.array.

Returns:

lorenz (pandas.DataFrame): Lorenz distribution in a DataFrame with two columns, labeled x and y, that correspond to the plot axes.

References

Lorenz curve. (2017, February 11). In Wikipedia, The Free Encyclopedia. Retrieved 14:34, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Lorenz_curve&oldid=764853675
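To make the construction of the curve concrete, here is a minimal pure-Python sketch that builds the cumulative population share (x) and cumulative income share (y). The name `lorenz_sketch` is hypothetical, the result is a plain list of (x, y) tuples rather than a DataFrame, and starting the curve at the origin is an assumption of this example, not a statement about what `lorenz` returns.

```python
def lorenz_sketch(income, weights=None):
    """Return Lorenz curve points as (population share, income share)."""
    n = len(income)
    w = [1.0] * n if weights is None else list(weights)
    # Order units from the poorest to the richest.
    pairs = sorted(zip(income, w))
    total_w = sum(w)
    total_income = sum(x * wi for x, wi in pairs)

    points = [(0.0, 0.0)]  # assumed starting point of the curve
    cum_w = 0.0
    cum_income = 0.0
    for x, wi in pairs:
        cum_w += wi
        cum_income += x * wi
        points.append((cum_w / total_w, cum_income / total_income))
    return points


for x, y in lorenz_sketch([4, 1, 3, 2]):
    print(x, y)
```

With perfectly equal incomes every point lies on the diagonal y = x; the further the points fall below it, the more unequal the distribution.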

ineqpy.inequality.gini(income, weights=None, data=None, sort=True)

Calculate Gini’s index.

The Gini coefficient (sometimes expressed as a Gini ratio or a normalized Gini index) is a measure of statistical dispersion intended to represent the income or wealth distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by Corrado Gini. The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).

Parameters:

data (pandas.DataFrame): DataFrame that contains the data.
income (str or np.array): Name of the monetary variable x in `data`.
weights (str or np.array, optional): Name of the series containing the weights in `data`.
sort (bool, optional): If True (the default), the values are sorted by x before the calculation; pass False if the data are already ordered by x.

Returns:

gini (float): Gini index value.

Notes

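The calculation follows the formula for a discrete probability distribution:

G = 1 - [∑_{i=1}^n f(y_i) · (S_{i-1} + S_i)] / S_n

where:
- y_i = income, indexed in non-decreasing order
- f(y_i) = share of the population with income y_i
- S_i = ∑_{j=1}^i f(y_j) · y_j, with S_0 = 0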
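The discrete formula in the Notes can be traced step by step with a short pure-Python sketch. It assumes incomes are sorted in non-decreasing order, that f(y_i) is the population share of unit i, and that the sum is divided by S_n so incomes need not be normalized beforehand. The name `gini_sketch` is hypothetical and this is not the code used by `gini`.

```python
def gini_sketch(income, weights=None):
    """Gini index from the cumulative sums S_i of a discrete distribution."""
    n = len(income)
    w = [1.0] * n if weights is None else list(weights)
    total_w = sum(w)

    acc = 0.0     # running sum of f(y_i) * (S_{i-1} + S_i)
    s_prev = 0.0  # S_{i-1}, starting from S_0 = 0
    for y, wi in sorted(zip(income, w)):
        f = wi / total_w
        s_i = s_prev + f * y
        acc += f * (s_prev + s_i)
        s_prev = s_i
    # After the loop s_prev holds S_n (assumed positive).
    return 1 - acc / s_prev


print(gini_sketch([1, 2, 3, 4]))    # 0.25
print(gini_sketch([5, 5, 5, 5]))    # 0.0, perfect equality
print(gini_sketch([0, 0, 0, 10]))   # 0.75, one unit holds everything
```

With n units and one unit holding all the income, this formula gives 1 - 1/n, which approaches 1 as n grows.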

ineqpy.inequality.atkinson(income, weights=None, data=None, e=0.5) → float

Calculate Atkinson index.

More precisely a family of income inequality measures, the Atkinson index has a theoretical range of 0 to 1, with 0 being a state of equal distribution.

An intuitive interpretation of this index is possible: Atkinson values can be used to calculate the proportion of total income that would be required to achieve an equal level of social welfare as at present if incomes were perfectly distributed.

For example, an Atkinson index value of 0.20 suggests that we could achieve the same level of social welfare with only 1 - 0.20 = 80% of income.

Parameters:

income (array or str): If data is None, income must be a 1D array; when data is a pd.DataFrame, pass the name of the income variable as a string.
weights (array or str, optional): If data is None, weights must be a 1D array; when data is a pd.DataFrame, pass the name of the weights variable as a string.
e (float, optional): Epsilon parameter, interpreted by the Atkinson index as inequality aversion. Must be between 0 and 1; 0.5 by default.
data (pd.DataFrame, optional): DataFrame that contains the variables.

Returns:

atkinson (float)
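The interpretation above can be checked numerically. The sketch below uses the textbook definition of the Atkinson index, one minus the ratio of the equally distributed equivalent income to the mean income, and assumes strictly positive incomes. The name `atkinson_sketch` is hypothetical, and the formula is stated here as an assumption about the measure, not as a description of the ineqpy code.

```python
import math


def atkinson_sketch(income, weights=None, e=0.5):
    """Atkinson index: 1 - (equally distributed equivalent income / mean)."""
    n = len(income)
    w = [1.0] * n if weights is None else list(weights)
    total_w = sum(w)
    mean = sum(x * wi for x, wi in zip(income, w)) / total_w

    if e == 1:
        # Limit case: the equivalent income is the weighted geometric mean.
        ede = math.exp(sum(wi * math.log(x) for x, wi in zip(income, w)) / total_w)
    else:
        power_mean = sum(wi * x ** (1 - e) for x, wi in zip(income, w)) / total_w
        ede = power_mean ** (1 / (1 - e))
    return 1 - ede / mean


# Two people earning 1 and 9: the result is approximately 0.2, so about 80%
# of total income, equally shared, would yield the same social welfare.
print(atkinson_sketch([1, 9], e=0.5))
```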

ineqpy.inequality.kakwani(tax, income_pre_tax, weights=None, data=None)

Calculate Kakwani’s index.

The Kakwani (1977) index of tax progressivity is defined as twice the area between the concentration curves for taxes and pre-tax income, or equivalently, the concentration index for t(x) minus the Gini index for x, i.e.

K = C(t) - G(x) = (2/t̄) · cov[t(x), F(x)] - (2/x̄) · cov[x, F(x)]

where t̄ and x̄ are the mean tax and the mean pre-tax income.

Parameters:

data (pandas.DataFrame): DataFrame that contains all the required data in its columns.
tax (array-like or str): Tax payment of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
income_pre_tax (array-like or str): Pre-tax income of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
weights (array-like or str): Weight of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.

Returns:

kakwani (float)

References

Jenkins, S. (1988). Calculating income distribution indices from micro-data. National Tax Journal. http://doi.org/10.2307/41788716
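A small worked example may help to read the definition K = C(t) - G(x). The pure-Python sketch below evaluates both covariance terms with the same helper. The names `_cov_index` and `kakwani_sketch` are hypothetical, the mid-point fractional rank is an assumption of this example, and none of it is taken from the ineqpy source.

```python
def _cov_index(values, ranking, weights):
    """(2 / mean) * cov(values, F), with F the fractional rank by `ranking`."""
    n = len(values)
    total_w = sum(weights)
    mean = sum(values[i] * weights[i] for i in range(n)) / total_w
    cov = 0.0
    cum_w = 0.0
    for i in sorted(range(n), key=lambda k: ranking[k]):
        f = (cum_w + weights[i] / 2) / total_w  # mid-point rank (assumption)
        cum_w += weights[i]
        cov += weights[i] * values[i] * (f - 0.5)  # the mean of F is 0.5
    return 2 * (cov / total_w) / mean


def kakwani_sketch(tax, income_pre_tax, weights=None):
    """Concentration index of taxes minus Gini index of pre-tax income."""
    w = [1.0] * len(tax) if weights is None else list(weights)
    c_t = _cov_index(tax, income_pre_tax, w)             # C(t)
    g_x = _cov_index(income_pre_tax, income_pre_tax, w)  # G(x)
    return c_t - g_x


# Taxes concentrated on the two richest people: K = 0.625 - 0.25 = 0.375.
print(kakwani_sketch([0, 0, 1, 3], [1, 2, 3, 4]))
```

A positive value indicates a progressive tax, while a tax proportional to income gives K = 0.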

ineqpy.inequality.reynolds_smolensky(income_pre_tax, income_post_tax, weights=None, data=None)

Calculate Reynolds-Smolensky’s index.

The Reynolds-Smolensky (1977) index of the redistributive effect of taxes, which can also be interpreted as an index of progressivity (Lambert 1985), is defined as:

L = G_x - G_y = (2/x̄) · cov[x, F(x)] - (2/ȳ) · cov[y, F(y)]

where x is pre-tax income, y is post-tax income, and x̄ and ȳ are their means.

Parameters:

data (pandas.DataFrame): DataFrame that contains all the required data in its columns.
income_pre_tax (array-like or str): Pre-tax income of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
income_post_tax (array-like or str): Post-tax income of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
weights (array-like or str): Weight of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.

Returns:

reynolds_smolensky (float)

References

Jenkins, S. (1988). Calculating income distribution indices from micro-data. National Tax Journal. http://doi.org/10.2307/41788716
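The definition above is simply the difference between two Gini indices, each computed with the covariance formula. The following pure-Python sketch illustrates this under the assumption of a mid-point fractional rank; `_gini_cov` and `reynolds_smolensky_sketch` are hypothetical names used only for this example.

```python
def _gini_cov(values, weights):
    """Gini index as (2 / mean) * cov(values, F(values))."""
    total_w = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_w
    cov = 0.0
    cum_w = 0.0
    for v, w in sorted(zip(values, weights)):
        f = (cum_w + w / 2) / total_w  # mid-point fractional rank (assumption)
        cum_w += w
        cov += w * v * (f - 0.5)       # the weighted mean of F is 0.5
    return 2 * (cov / total_w) / mean


def reynolds_smolensky_sketch(income_pre_tax, income_post_tax, weights=None):
    """Gini of pre-tax income minus Gini of post-tax income."""
    w = [1.0] * len(income_pre_tax) if weights is None else list(weights)
    return _gini_cov(income_pre_tax, w) - _gini_cov(income_post_tax, w)


pre = [10, 20, 30, 40]
post = [10, 18, 24, 28]  # after taxes of 0, 2, 6 and 12
# G_x = 0.25 and G_y = 0.1875, so the index is 0.0625.
print(reynolds_smolensky_sketch(pre, post))
```

A positive value means the tax reduced inequality; a proportional tax leaves both Gini indices equal and gives 0.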

ineqpy.inequality.theil(income, weights=None, data=None)

Calculate Theil’s index.

The Theil index is a statistic primarily used to measure economic inequality and other economic phenomena. It is a special case of the generalized entropy index. It can be viewed as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility. It was proposed by econometrician Henri Theil.

Parameters:

data (pandas.DataFrame): DataFrame that contains all the required data in its columns.
income (array-like or str): Income of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
weights (array-like or str): Weight of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.

Returns:

theil (float)

References

Theil index. (2016, December 17). In Wikipedia, The Free Encyclopedia. Retrieved 14:17, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Theil_index&oldid=755407818
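For readers who want to see the entropy form written out, here is a minimal pure-Python sketch of the Theil T index, the weighted mean of (x/μ) · ln(x/μ). It assumes strictly positive incomes, and assumes the T variant rather than Theil L; the name `theil_sketch` is hypothetical and the code is not taken from ineqpy.

```python
import math


def theil_sketch(income, weights=None):
    """Theil T index: weighted mean of (x / mean) * ln(x / mean)."""
    n = len(income)
    w = [1.0] * n if weights is None else list(weights)
    total_w = sum(w)
    mean = sum(x * wi for x, wi in zip(income, w)) / total_w
    return sum(
        wi * (x / mean) * math.log(x / mean) for x, wi in zip(income, w)
    ) / total_w


print(theil_sketch([2, 2, 2]))  # 0.0, everyone has the mean income
print(theil_sketch([1, 3]))     # positive, incomes differ
```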

ineqpy.inequality.avg_tax_rate(total_tax, total_base, weights=None, data=None)

Calculate average tax rate.

This function computes the average tax rate given a base income and a total tax.

Parameters:

total_base (str or numpy.array)
total_tax (str or numpy.array)
data (pd.DataFrame)

Returns:

avg_tax_rate (float or pd.Series): Ratio between the mean tax and the mean income base.
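Read as a ratio of means, the average tax rate can be sketched in a few lines of pure Python. The name `avg_tax_rate_sketch` is hypothetical, and treating the result as weighted mean tax over weighted mean base is an assumption drawn from the description above rather than from the ineqpy source.

```python
def avg_tax_rate_sketch(total_tax, total_base, weights=None):
    """Weighted mean tax divided by weighted mean tax base."""
    n = len(total_tax)
    w = [1.0] * n if weights is None else list(weights)
    total_w = sum(w)
    mean_tax = sum(t * wi for t, wi in zip(total_tax, w)) / total_w
    mean_base = sum(b * wi for b, wi in zip(total_base, w)) / total_w
    return mean_tax / mean_base


# Mean tax 5 over mean base 25 gives an average rate of 0.2.
print(avg_tax_rate_sketch([0, 2, 6, 12], [10, 20, 30, 40]))
```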

ineqpy.inequality.top_rest(income, weights=None, data=None, top_percentage=10.0)

Calculate the 10:90 Ratio.

Calculates the quotient of the contributions made by the top 10% of contributors divided by the contributions made by the other 90%. The ratio is 1 if the total contributions by the top contributors are equal to the contributions made by the rest; less than 1 if the top 10% contributes less than the rest; and greater than 1 if the top 10% contributes more than the other 90%.

Parameters:

income (array-like or str): Income of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
weights (array-like or str): Weight of each person. Pass an array-like if data is None; otherwise pass the name of the column in data. All ones by default.
data (pandas.DataFrame): DataFrame that contains all the required data in its columns.
top_percentage (float): The richest x percent to consider. Must be a number between 0 and 100; 10 by default.

Returns:

ratio (float)

References

Serrano, A., Arroyo, J., & Hassan, S. (2018). Participation Inequality in Wikis: A Temporal Analysis Using WikiChron. DOI: 10.1145/3233391.3233536.
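The ratio described above can be illustrated with a pure-Python sketch that sums income from the richest units until the chosen share of the population is reached. The name `top_rest_sketch` is hypothetical, and splitting a unit that straddles the cutoff in proportion to its weight is an assumption of this example; `top_rest` may handle that boundary differently.

```python
def top_rest_sketch(income, weights=None, top_percentage=10.0):
    """Total income of the top share divided by total income of the rest."""
    n = len(income)
    w = [1.0] * n if weights is None else list(weights)
    # Population weight that belongs to the top group.
    remaining = sum(w) * top_percentage / 100.0

    top = 0.0
    rest = 0.0
    # Walk from the richest unit to the poorest.
    for x, wi in sorted(zip(income, w), reverse=True):
        take = min(wi, max(remaining, 0.0))  # weight assigned to the top group
        top += x * take
        rest += x * (wi - take)
        remaining -= take
    return top / rest


incomes = list(range(1, 11))  # ten people earning 1, 2, ..., 10
print(top_rest_sketch(incomes))                     # 10 / 45
print(top_rest_sketch(incomes, top_percentage=50))  # 40 / 15
```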

ineqpy.inequality.hoover(income, weights=None, data=None)

Calculate Hoover index.

The Hoover index, also known as the Robin Hood index or the Schutz index, is a measure of income inequality. It is equal to the portion of the total community income that would have to be redistributed (taken from the richer half of the population and given to the poorer half) for there to be income uniformity.

Formula:

H = (1/2) · ∑_i |x_i - μ| / ∑_i x_i

where μ is the mean income.

Parameters:

income (array-like or str): Income of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
weights (array-like or str): Weight of each person. Pass an array-like if data is None; otherwise pass the name of the column in data.
data (pandas.DataFrame): DataFrame that contains all the required data in its columns.

Returns:

hoover (float)

References

Hoover index : https://en.wikipedia.org/wiki/Hoover_index
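The formula above translates almost directly into code. This pure-Python sketch extends it with optional weights, which is an assumption of the example; `hoover_sketch` is a hypothetical name and the code is not the ineqpy implementation.

```python
def hoover_sketch(income, weights=None):
    """Hoover index: half the total absolute deviation from the mean,
    relative to total income."""
    n = len(income)
    w = [1.0] * n if weights is None else list(weights)
    total_income = sum(x * wi for x, wi in zip(income, w))
    mean = total_income / sum(w)
    deviation = sum(wi * abs(x - mean) for x, wi in zip(income, w))
    return 0.5 * deviation / total_income


# Deviations from the mean 2.5 add up to 4, so H = 0.5 * 4 / 10 = 0.2:
# 20% of total income would have to move for everyone to be equal.
print(hoover_sketch([1, 2, 3, 4]))
```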
