ineqpy.api
API’s module.
Extend pandas.DataFrames with the main functions from statistics and inequality modules.
Module Contents
Classes
- Convey: Convey.
- Survey: A data structure that handles survey data.
- class ineqpy.api.Convey(data=None, index=None, columns=None, weights=None, group=None, **kw)[source]
Convey.
- class ineqpy.api.Survey(data=None, index=None, columns=None, weights=None, group=None, **kw)[source]
Survey is a data structure that handles survey data.
- Attributes:
- df : pandas.DataFrame
- weights : str
- group : str
Methods
atkinson(income=None, weights=None, e=0.5)
Calculate Atkinson’s index.
avg_tax_rate(total_tax=None, total_base=None, weights=None)
Calculate average tax rate.
c_moment(variable=None, weights=None, order=2, param=None, ddof=0)
Calculate central moment.
coef_variation(variable=None, weights=None)
Calculate coefficient of variation.
concentration(income=None, weights=None, sort=True)
Calculate concentration’s index.
density(variable=None, weights=None, groups=None)
Calculate density.
gini(income=None, weights=None, sort=True)
Calculate Gini’s index.
kakwani(tax=None, income_pre_tax=None, weights=None)
Calculate Kakwani’s index.
kurt(variable=None, weights=None)
Calculate Kurtosis.
lorenz(income=None, weights=None)
Calculate Lorenz curve.
mean(variable=None, weights=None)
Calculate mean.
percentile(variable=None, weights=None, p=50, interpolate='lower')
Calculate percentile.
reynolds_smolensky(income_pre_tax=None, income_post_tax=None, weights=None)
Calculate Reynolds-Smolensky’s index.
skew(variable=None, weights=None)
Calculate Skew.
std_moment(variable=None, weights=None, param=None, order=3, ddof=0)
Calculate standardized moment.
theil(income=None, weights=None)
Calculate Theil’s index.
var(variable=None, weights=None, ddof=0)
Calculate variance.
- c_moment(variable, weights=None, order=2, param=None, ddof=0)[source]
Calculate central moment.
Calculate the central moment of `variable` of order `order` with respect to `param`, given `weights`.
- Parameters:
- variable : 1d-array
Variable
- weights : 1d-array
Weights
- order : int, optional
Moment order, 2 by default (variance).
- param : int or array, optional
Parameter about which the moment is calculated; the default, None, means the mean is used.
- ddof : int, optional
Degrees of freedom, zero by default.
- Returns:
- central_moment : float
Notes
The central moment of order 1 is 0.
The central moment of order 2 is the variance.
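The two notes above can be checked with a minimal sketch of a weighted central moment. This is an illustration in plain Python, not the library's implementation: the helper name `central_moment` is hypothetical, and `ddof` is left out (equivalent to `ddof=0`).

```python
def central_moment(values, weights=None, order=2, param=None):
    """Weighted central moment of `values` about `param` (weighted mean by default)."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    if param is None:
        # Default reference point: the weighted mean.
        param = sum(w * x for x, w in zip(values, weights)) / total
    return sum(w * (x - param) ** order for x, w in zip(values, weights)) / total


incomes = [10.0, 20.0, 30.0, 40.0]
first = central_moment(incomes, order=1)   # zero about the mean
second = central_moment(incomes, order=2)  # population variance
```

With these four incomes the mean is 25, so the second-order moment is (225 + 25 + 25 + 225) / 4 = 125.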
- percentile(variable, weights=None, p=50, interpolate='lower')[source]
Calculate the value of a percentile given a variable and its weights.
- Parameters:
- data : pd.DataFrame, optional
pd.DataFrame that contains all variables needed.
- variable : str or array
- weights : str or array
- p : float
Percentile level; the default, 50, corresponds to the median.
- interpolate : str
- Returns:
- percentile : float or pd.Series
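A weighted percentile can be defined in several ways. The sketch below uses one common convention, returning the smallest value whose cumulative weight share reaches `p` percent. It is an assumption for illustration only: the helper name `weighted_percentile` is hypothetical, and the result is not guaranteed to match what `interpolate='lower'` produces in the library.

```python
def weighted_percentile(values, weights, p=50):
    """Smallest value whose cumulative weight reaches p percent of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    threshold = total * p / 100.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


values = [1.0, 2.0, 3.0, 4.0]
plain_median = weighted_percentile(values, [1, 1, 1, 1])   # equal weights
heavy_median = weighted_percentile(values, [1, 1, 1, 5])   # last unit dominates
```

Giving the largest observation a weight of 5 out of 8 moves the median from 2.0 to 4.0, which is the effect the weights are meant to capture.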
- std_moment(variable, weights=None, param=None, order=3, ddof=0)[source]
Calculate the standardized moment.
Calculate the standardized moment of order `order` for `variable` with respect to `param`.
- Parameters:
- data : pd.DataFrame, optional
pd.DataFrame that contains all variables needed.
- variable : 1d-array
Random variable
- weights : 1d-array, optional
Weights or probability
- order : int, optional
Order of the moment, three by default.
- param : int or float or array, optional
Central tendency, the mean by default.
- ddof : int, optional
Degrees of freedom.
- Returns:
- std_moment : float
The standardized moment of order `order`.
References
https://en.wikipedia.org/wiki/Moment_(mathematics)#Significance_of_the_moments
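As a rough illustration of the definition, a standardized moment is the central moment divided by the standard deviation raised to the same order. The sketch below is plain Python with a hypothetical helper name (`standardized_moment`); it assumes the mean as reference point and `ddof=0`, and is not taken from the library.

```python
def standardized_moment(values, weights=None, order=3):
    """Central moment of `order` divided by sigma ** order (population form)."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    central = sum(w * (x - mean) ** order for x, w in zip(values, weights)) / total
    return central / variance ** (order / 2.0)


symmetric_skew = standardized_moment([1.0, 2.0, 3.0], order=3)   # symmetric data
right_skew = standardized_moment([0.0, 0.0, 3.0], order=3)       # long right tail
two_point_kurt = standardized_moment([-1.0, 1.0], order=4)
```

Order 3 gives the skewness and order 4 the kurtosis, which is why `skew` and `kurt` are described as aliases of the third-order and fourth-order standardized moments.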
- mean(variable, weights=None)[source]
Calculate the mean of variable given weights.
- Parameters:
- variable : array-like or str
Variable on which the mean is estimated.
- weights : array-like or str
Weights of the variable.
- data : pandas.DataFrame
A DataFrame containing the variable and the weights can be passed; in that case, variable and weights must be the names of the corresponding columns in data.
- Returns:
- mean : array-like or float
- density(variable, weights=None, groups=None)[source]
Calculate density in percentage.
This divides the variable into groups, inferring each group's width as max - min.
- Parameters:
- data : pd.DataFrame, optional
pandas.DataFrame that contains all variables needed.
- variable : array-like, optional
- weights : array-like, optional
- groups : array-like, optional
- Returns:
- density : array-like
References
Histogram. (2017, May 9). In Wikipedia, The Free Encyclopedia. Retrieved: https://en.wikipedia.org/w/index.php?title=Histogram
- var(variable, weights=None, ddof=0)[source]
Calculate the population variance of variable given weights.
- Parameters:
- data : pd.DataFrame, optional
pd.DataFrame that contains all variables needed.
- variable : 1d-array or pd.Series or pd.DataFrame
Variable on which the variance is estimated.
- weights : 1d-array or pd.Series or pd.DataFrame
Weights of the variable.
- Returns:
- variance : 1d-array or pd.Series or float
Estimated variance of the variable.
Notes
For a stratified sample, compute the variance for each stratum with groupby.
References
Moment (mathematics). (2017, May 6). In Wikipedia, The Free Encyclopedia. Retrieved 14:40, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Moment_(mathematics)
- coef_variation(variable, weights=None)[source]
Calculate the coefficient of variation.
The coefficient of variation is the square root of the variance of the incomes divided by the mean income. It has the advantages of being mathematically tractable and subgroup decomposable, but it is not bounded from above.
- Parameters:
- data : pandas.DataFrame
- variable : array-like or str
- weights : array-like or str
- Returns:
- coefficient_variation : float
References
Coefficient of variation. (2017, May 5). In Wikipedia, The Free Encyclopedia. Retrieved 15:03, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Coefficient_of_variation
- kurt(variable, weights=None)[source]
Calculate the kurtosis coefficient.
- Parameters:
- variable : 1d-array
- weights : 1d-array
- Returns:
- kurt : float
Kurtosis coefficient.
Notes
It is an alias of the standardized fourth-order moment.
References
Moment (mathematics). (2017, May 6). In Wikipedia, The Free Encyclopedia. Retrieved 14:40, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Moment_(mathematics)
- skew(variable, weights=None)[source]
Return the asymmetry coefficient of a sample.
- Parameters:
- data : pandas.DataFrame
- variable : array-like or str
- weights : array-like or str
- Returns:
- skew : float
Notes
It is an alias of the standardized third-order moment.
References
Moment (mathematics). (2017, May 6). In Wikipedia, The Free Encyclopedia. Retrieved 14:40, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Moment_(mathematics)&oldid=778996402
- concentration(income, weights=None, sort=True)[source]
Calculate concentration index.
This function calculates the concentration index. Following the notation used in [Jenkins1988]:
$C_x = \frac{2}{\bar{x}} \operatorname{cov}(x, F_x)$
When there are taxes, so that $y = g(x) = x - t(x)$, $C_x$ becomes $C_y$.
- Parameters:
- income : array-like
- weights : array-like
- data : pandas.DataFrame
- sort : bool
- Returns:
- concentration : array-like
References
Jenkins, S. (1988). Calculating income distribution indices from micro-data. National Tax Journal. http://doi.org/10.2307/41788716
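The covariance form of the concentration index can be sketched as follows. This is a plain-Python illustration, not the library's code: the helper name `concentration_index`, its `ranking` argument, and the use of midpoint fractional ranks for F are assumptions made for the example.

```python
def concentration_index(values, ranking, weights=None):
    """2 / mean(values) * cov(values, F), where F is the fractional rank by `ranking`."""
    n = len(values)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    order = sorted(range(n), key=lambda i: ranking[i])
    # Midpoint fractional rank: weight share below each unit plus half its own share.
    rank = [0.0] * n
    cumulative = 0.0
    for i in order:
        rank[i] = (cumulative + weights[i] / 2.0) / total
        cumulative += weights[i]
    mean_v = sum(w * v for v, w in zip(values, weights)) / total
    mean_f = sum(w * f for f, w in zip(rank, weights)) / total
    cov = sum(
        w * (v - mean_v) * (f - mean_f) for v, f, w in zip(values, rank, weights)
    ) / total
    return 2.0 * cov / mean_v


income = [10.0, 20.0, 30.0, 40.0]
tax = [0.0, 2.0, 6.0, 12.0]
c_income = concentration_index(income, ranking=income)  # equals the Gini index of income
c_tax = concentration_index(tax, ranking=income)        # taxes ranked by income
```

Ranking a variable by itself gives its Gini index (0.25 here), while ranking taxes by income gives the concentration index of taxes (0.5 here).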
- lorenz(income, weights=None)[source]
Calculate Lorenz curve.
In economics, the Lorenz curve is a graphical representation of the distribution of income or of wealth. It was developed by Max O. Lorenz in 1905 for representing inequality of the wealth distribution. This function computes the Lorenz curve and returns a DataFrame with two columns, one for the x axis and one for the y axis.
- Parameters:
- data : pandas.DataFrame
A pandas.DataFrame that contains the data.
- income : str or 1d-array, optional
Income, the monetary variable. If a DataFrame is passed, income should be the name of a column of that DataFrame; otherwise pass a pandas.Series or array.
- weights : str or 1d-array
Population or weights. If a DataFrame is passed, weights is the name of a series in that DataFrame; otherwise pass a pd.Series or np.array.
- Returns:
- lorenz : pandas.DataFrame
Lorenz distribution in a DataFrame with two columns, labeled x and y, that correspond to the plot axes.
References
Lorenz curve. (2017, February 11). In Wikipedia, The Free Encyclopedia. Retrieved 14:34, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Lorenz_curve&oldid=764853675
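To make the x and y columns concrete, here is a minimal plain-Python sketch of the points of a Lorenz curve: cumulative population share against cumulative income share. The helper name `lorenz_points` is hypothetical, and whether the library includes the origin (0, 0) in its output is not stated in this page; the sketch adds it as an assumption.

```python
def lorenz_points(incomes, weights=None):
    """Return (x, y): cumulative population share and cumulative income share."""
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))  # poorest first
    total_w = sum(w for _, w in pairs)
    total_y = sum(y * w for y, w in pairs)
    xs, ys = [0.0], [0.0]  # assumed starting point at the origin
    cum_w = 0.0
    cum_y = 0.0
    for y, w in pairs:
        cum_w += w
        cum_y += y * w
        xs.append(cum_w / total_w)
        ys.append(cum_y / total_y)
    return xs, ys


x, y = lorenz_points([40.0, 10.0, 30.0, 20.0])
```

For these incomes the poorest quarter holds 10% of total income, the poorest half 30%, and the poorest three quarters 60%.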
- gini(income, weights=None, sort=True)[source]
Calculate Gini’s index.
The Gini coefficient (sometimes expressed as a Gini ratio or a normalized Gini index) is a measure of statistical dispersion intended to represent the income or wealth distribution of a nation’s residents, and is the most commonly used measure of inequality. It was developed by Corrado Gini.
The Gini coefficient measures the inequality among values of a frequency distribution (for example, levels of income). A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).
- Parameters:
- data : pandas.DataFrame
DataFrame that contains the data.
- income : str or np.array, optional
Name of the monetary variable x in `df`.
- weights : str or np.array, optional
Name of the series containing the weights in `df`.
- sort : bool, optional
Whether to sort the observations by the variable x before the calculation; True by default.
- Returns:
- gini : float
Gini index value.
Notes
The calculation follows the formula for a discrete probability distribution:
$G = 1 - \sum_{i=1}^{n} f(y_i) (S_{i-1} + S_i)$
where $y_i$ is income and $S_i = \sum_{j=1}^{i} y_j f(y_j)$.
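The discrete formula in the Notes can be sketched in plain Python as below. Two details are assumptions of the sketch rather than statements from this page: S_0 is taken as 0, and the sum is divided by S_n so that incomes do not need to be normalized beforehand. The helper name `gini_discrete` is hypothetical.

```python
def gini_discrete(incomes, weights=None):
    """Gini index as 1 - sum_i f_i * (S_{i-1} + S_i) / S_n for sorted incomes."""
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))  # the formula needs ascending incomes
    total_w = sum(w for _, w in pairs)
    s_prev = 0.0       # S_0 = 0
    accumulated = 0.0
    for y, w in pairs:
        f = w / total_w              # f(y_i): population share
        s_curr = s_prev + y * f      # S_i
        accumulated += f * (s_prev + s_curr)
        s_prev = s_curr
    return 1.0 - accumulated / s_prev  # s_prev now holds S_n


unequal = gini_discrete([10.0, 20.0, 30.0, 40.0])
equal = gini_discrete([5.0, 5.0, 5.0, 5.0])
```

The first call returns 0.25 and the second 0, matching the interpretation of zero as perfect equality.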
- atkinson(income, weights=None, e=0.5)[source]
Calculate Atkinson index.
The Atkinson index is more precisely a family of income inequality measures. The theoretical range of Atkinson values is 0 to 1, with 0 being a state of equal distribution. An intuitive interpretation of this index is possible: Atkinson values can be used to calculate the proportion of total income that would be required to achieve the same level of social welfare as at present if incomes were equally distributed.
For example, an Atkinson index value of 0.20 suggests that we could achieve the same level of social welfare with only 1 - 0.20 = 80% of income.
- Parameters:
- income : array or str
If data is None, income must be a 1D array; when data is a pd.DataFrame, pass the name of the income variable as a string.
- weights : array or str, optional
If data is None, weights must be a 1D array; when data is a pd.DataFrame, pass the name of the weights variable as a string.
- e : float, optional
Epsilon parameter, interpreted by the Atkinson index as inequality aversion; must be a number between 0 and 1.
- data : pd.DataFrame, optional
data is a pd.DataFrame that contains the variables.
- Returns:
- atkinson : float
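The "proportion of income" interpretation can be illustrated with the textbook form of the index, 1 minus the ratio of the equally distributed equivalent income to the mean. The sketch below is plain Python under that assumption; the helper name `atkinson_index` is hypothetical and the code is not taken from the library.

```python
import math


def atkinson_index(incomes, weights=None, e=0.5):
    """Atkinson index: 1 - (equally distributed equivalent income) / mean."""
    if weights is None:
        weights = [1.0] * len(incomes)
    total = sum(weights)
    shares = [w / total for w in weights]
    mean = sum(f * y for f, y in zip(shares, incomes))
    if e == 1:
        # Limit case: the equivalent income is the weighted geometric mean.
        ede = math.exp(sum(f * math.log(y) for f, y in zip(shares, incomes)))
    else:
        ede = sum(f * y ** (1 - e) for f, y in zip(shares, incomes)) ** (1 / (1 - e))
    return 1 - ede / mean


two_person = atkinson_index([4.0, 16.0], e=0.5)
equal = atkinson_index([7.0, 7.0, 7.0], e=0.5)
```

For incomes 4 and 16 the mean is 10 and the equivalent income is 9, so the index is 0.1: the same welfare could be reached with 90% of total income if it were equally distributed.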
- kakwani(tax, income_pre_tax, weights=None)[source]
Calculate Kakwani’s index.
The Kakwani (1977) index of tax progressivity is defined as twice the area between the concentration curves for taxes and pre-tax income, or equivalently, the concentration index for t(x) minus the Gini index for x, i.e.
$K = C_t - G_x = \frac{2}{\bar{t}} \operatorname{cov}[t(x), F(x)] - \frac{2}{\bar{x}} \operatorname{cov}[x, F(x)]$
- Parameters:
- data : pandas.DataFrame
A DataFrame that contains all the required data in its columns.
- tax : array-like or str
Tax payment of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- income_pre_tax : array-like or str
Pre-tax income of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- weights : array-like or str
Weight of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- Returns:
- kakwani : float
References
Jenkins, S. (1988). Calculating income distribution indices from micro-data. National Tax Journal. http://doi.org/10.2307/41788716
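A small worked example of K = C(t) - G(x), written in plain Python with unit weights for brevity. The helper names (`concentration`, `kakwani_index`) and the midpoint fractional rank used for F(x) are assumptions of this sketch, not the library's implementation.

```python
def concentration(values, ranking):
    """2 / mean(values) * cov(values, F), F being the fractional rank by `ranking`."""
    n = len(values)
    order = sorted(range(n), key=lambda i: ranking[i])
    rank = [0.0] * n
    for position, i in enumerate(order):
        rank[i] = (position + 0.5) / n  # midpoint rank, mean 0.5
    mean_v = sum(values) / n
    cov = sum((v - mean_v) * (f - 0.5) for v, f in zip(values, rank)) / n
    return 2.0 * cov / mean_v


def kakwani_index(tax, income_pre_tax):
    """Concentration index of taxes minus the Gini index of pre-tax income."""
    c_tax = concentration(tax, ranking=income_pre_tax)
    g_income = concentration(income_pre_tax, ranking=income_pre_tax)
    return c_tax - g_income


income = [10.0, 20.0, 30.0, 40.0]
progressive = kakwani_index([0.0, 2.0, 6.0, 12.0], income)
proportional = kakwani_index([1.0, 2.0, 3.0, 4.0], income)
```

A proportional tax gives K = 0, while the progressive schedule here gives K = 0.5 - 0.25 = 0.25.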
- reynolds_smolensky(income_pre_tax, income_post_tax, weights=None)[source]
Calculate Reynolds-Smolensky’s index.
The Reynolds-Smolensky (1977) index of the redistributive effect of taxes, which can also be interpreted as an index of progressivity (Lambert 1985), is defined as:
$L = G_x - G_y = \frac{2}{\bar{x}} \operatorname{cov}[x, F(x)] - \frac{2}{\bar{y}} \operatorname{cov}[y, F(y)]$
- Parameters:
- data : pandas.DataFrame
A DataFrame that contains all the required data in its columns.
- income_pre_tax : array-like or str
Pre-tax income of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- income_post_tax : array-like or str
Post-tax income of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- weights : array-like or str
Weight of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- Returns:
- reynolds_smolensky : float
References
Jenkins, S. (1988). Calculating income distribution indices from micro-data. National Tax Journal. http://doi.org/10.2307/41788716
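The definition L = G_x - G_y can be tried on a toy example. The sketch below uses plain Python with unit weights and the covariance form of the Gini index with midpoint fractional ranks; both helper names (`gini_cov`, `reynolds_smolensky_index`) are hypothetical and the code is an illustration, not the library's implementation.

```python
def gini_cov(values):
    """Gini index as 2 / mean * cov(values, F), F being the midpoint fractional rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    rank = [0.0] * n
    for position, i in enumerate(order):
        rank[i] = (position + 0.5) / n
    mean_v = sum(values) / n
    cov = sum((v - mean_v) * (f - 0.5) for v, f in zip(values, rank)) / n
    return 2.0 * cov / mean_v


def reynolds_smolensky_index(income_pre_tax, income_post_tax):
    """Gini of pre-tax income minus Gini of post-tax income."""
    return gini_cov(income_pre_tax) - gini_cov(income_post_tax)


pre_tax = [10.0, 20.0, 30.0, 40.0]
post_tax = [10.0, 18.0, 24.0, 28.0]  # after taxes of 0, 2, 6 and 12
effect = reynolds_smolensky_index(pre_tax, post_tax)
```

Here the Gini index falls from 0.25 to 0.1875, so the redistributive effect is 0.0625; a positive value indicates that the tax reduces inequality.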
- theil(income, weights=None)[source]
Calculate Theil index.
The Theil index is a statistic primarily used to measure economic inequality and other economic phenomena. It is a special case of the generalized entropy index. It can be viewed as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility. It was proposed by econometrician Henri Theil.
- Parameters:
- data : pandas.DataFrame
A DataFrame that contains all the required data in its columns.
- income : array-like or str
Income of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- weights : array-like or str
Weight of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- Returns:
- theil : float
References
Theil index. (2016, December 17). In Wikipedia, The Free Encyclopedia. Retrieved 14:17, May 15, 2017, from https://en.wikipedia.org/w/index.php?title=Theil_index&oldid=755407818
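This page does not say which variant of the Theil index is computed. As an illustration only, the sketch below implements the Theil T form, the weighted mean of (y / mean) * ln(y / mean), in plain Python; the helper name `theil_t` is hypothetical and zero incomes are skipped by convention.

```python
import math


def theil_t(incomes, weights=None):
    """Theil T index: sum of f_i * (y_i / mean) * ln(y_i / mean)."""
    if weights is None:
        weights = [1.0] * len(incomes)
    total = sum(weights)
    shares = [w / total for w in weights]
    mean = sum(f * y for f, y in zip(shares, incomes))
    # Terms with y = 0 contribute nothing (limit of x * ln x as x -> 0).
    return sum(
        f * (y / mean) * math.log(y / mean)
        for f, y in zip(shares, incomes)
        if y > 0
    )


equal = theil_t([5.0, 5.0, 5.0])
unequal = theil_t([1.0, 3.0])
```

Equal incomes give exactly 0, and the index grows as income concentrates in fewer hands.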
- avg_tax_rate(total_tax, total_base, weights=None)[source]
Compute the average tax rate given a base income and a total tax.
- Parameters:
- total_base : str or numpy.array
- total_tax : str or numpy.array
- data : pd.DataFrame
- Returns:
- avg_tax_rate : float or pd.Series
The ratio of the mean tax to the mean tax base.
- top_rest(income, weights=None, data=None, top_percentage=10)[source]
Calculate the 10:90 ratio.
Calculates the ratio of the contributions made by the top 10% of contributors to the contributions made by the other 90%. The ratio is 1 if the total contributions by the top contributors equal the contributions made by the rest; less than 1 if the top 10% contribute less than the rest; and greater than 1 if the top 10% contribute more than the other 90%.
- Parameters:
- income : array-like or str
Income (or contribution) of each person. If array-like, data must be None; otherwise pass the name of the column in data.
- weights : array-like or str
Weight of each person. If array-like, data must be None; otherwise pass the name of the column in data. All ones by default.
- data : pandas.DataFrame
A DataFrame that contains all the required data in its columns.
- top_percentage : float
The richest x percent to consider (10 percent by default). It must be a number between 0 and 100.
- Returns:
- ratio : float
References
Participation Inequality in Wikis: A Temporal Analysis Using WikiChron. Serrano, Abel & Arroyo, Javier & Hassan, Samer. (2018). DOI: 10.1145/3233391.3233536.
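The interpretation of the ratio around 1 can be checked with a small plain-Python sketch. It assumes unit weights and picks the top group by rounding the number of units, which may differ from how the library handles weights and group boundaries; the helper name `top_rest_ratio` is hypothetical.

```python
def top_rest_ratio(incomes, top_percentage=10):
    """Total held by the top `top_percentage` percent divided by the total of the rest."""
    ordered = sorted(incomes, reverse=True)
    # Assumed rule: at least one unit, otherwise the rounded share of units.
    n_top = max(1, round(len(ordered) * top_percentage / 100))
    top = sum(ordered[:n_top])
    rest = sum(ordered[n_top:])
    return top / rest


balanced = top_rest_ratio([10.0] * 9 + [90.0])   # top unit holds as much as the rest
modest = top_rest_ratio([10.0] * 9 + [30.0])     # top unit holds less than the rest
```

The first case returns 1 because the single top contributor matches the other nine combined; the second returns a value below 1.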