MetidaStats
Metida descriptive statistics: provides tables with categorized descriptive statistics from tabular data.
This program comes with absolutely no warranty. No liability is accepted for any loss and risk to public health resulting from use of this software.
Example
using MetidaStats, CSV, DataFrames;
ds = CSV.File(joinpath(dirname(pathof(MetidaStats)), "..", "test", "csv", "ds.csv")) |> DataFrame
For DataFrame ds:
ds[1:5, :]
5×7 DataFrame
Row | col | row | s1 | Variable | var1 | var2 | var3 |
---|---|---|---|---|---|---|---|
| | String1 | String1 | String1 | String1 | Float64 | Float64? | Float64? |
1 | a | c | e | g | 65.0591 | -41.3106 | -41.3106 |
2 | a | c | e | g | 68.825 | missing | missing |
3 | a | c | e | g | 21.3784 | NaN | 16.2079 |
4 | a | c | e | g | 52.0018 | -0.193488 | -0.193488 |
5 | a | c | e | g | 68.6295 | 5.44529 | missing |
Import:
di = MetidaStats.dataimport(ds, vars = [:var1, :var2], sort = [:col, :row])
DataSet: observations
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var1, :row => String1("d"), :col => String1("a")); Length: 44
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var2, :row => String1("d"), :col => String1("a")); Length: 44
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var1, :row => String1("c"), :col => String1("b")); Length: 20
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var2, :row => String1("c"), :col => String1("b")); Length: 20
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var1, :row => String1("c"), :col => String1("a")); Length: 24
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var2, :row => String1("c"), :col => String1("a")); Length: 24
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var1, :row => String1("d"), :col => String1("b")); Length: 83
Var: Variable; ID: Dict{Symbol, Any}(:Variable => :var2, :row => String1("d"), :col => String1("b")); Length: 83
Statistics:
des = MetidaStats.descriptives(di; skipmissing = true, skipnonpositive = true, stats = MetidaStats.STATLIST)
---------- --------- --------- --------- --------- ---------- --------- -------
Variable row col n posn mean var b ⋯
Symbol String1 String1 Float64 Float64 Float64 Float64 Floa ⋯
---------- --------- --------- --------- --------- ---------- --------- -------
var1 d a 44.0 44.0 58.0626 726.402 709. ⋯
var2 d a 44.0 24.0 1.88004 838.634 819. ⋯
var1 c b 20.0 20.0 51.8411 640.079 608. ⋯
var2 c b 20.0 9.0 0.435363 758.656 720. ⋯
var1 c a 24.0 24.0 51.8434 941.195 901. ⋯
var2 c a 22.0 10.0 -3.24275 714.676 682. ⋯
var1 d b 83.0 83.0 47.2578 737.991 72 ⋯
var2 d b 83.0 39.0 -3.2516 830.511 820. ⋯
---------- --------- --------- --------- --------- ---------- --------- -------
25 columns omitted
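In the output above, `n` for `var2` in group `c`/`a` is 22 although the imported series has length 24, and `posn` is lower than `n` wherever negative values occur. This is consistent with `skipmissing = true` dropping both `missing` and `NaN` values, and with `skipnonpositive = true` restricting log-based statistics (such as `logmean` and `geom`) to strictly positive values. The following is a minimal sketch of that filtering using only Base Julia and `Statistics`. It is an illustration under those assumptions, not the package's actual implementation, and the vector `x` is a hypothetical sample resembling the first rows of `var2`.

```julia
using Statistics

# Hypothetical sample resembling the first rows of var2 (not read from ds.csv)
x = [-41.3106, missing, NaN, -0.193488, 5.44529]

# Drop missing and NaN values (assumed effect of skipmissing = true)
valid = [v for v in x if !ismissing(v) && !isnan(v)]

# Keep strictly positive values for log-based statistics
# (assumed effect of skipnonpositive = true)
positive = filter(>(0), valid)

n       = length(valid)            # observations used for mean, var, ...
posn    = length(positive)         # observations used for logmean, geom, ...
logmean = mean(log.(positive))     # mean of log-transformed positive values
geom    = exp(logmean)             # geometric mean
```

With this split, a group can have a well-defined `mean` over all valid observations while its geometric statistics rest on a smaller positive subset, which is why `n` and `posn` are reported separately.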
Convert the result to a DataFrame:
df = DataFrame(des)
8×32 DataFrame
Row | Variable | row | col | n | posn | mean | var | bvar | logmean | geom | logvar | sd | se | cv | geocv | lci_0.95 | uci_0.95 | lmeanci_0.95 | umeanci_0.95 | median | min | max | range | q1 | q3 | iqr | kurt | skew | harmmean | ses | sek | sum |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Symbol | String1 | String1 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | |
1 | var1 | d | a | 44.0 | 44.0 | 58.0626 | 726.402 | 709.893 | 3.8803 | 48.4385 | 0.541359 | 26.9518 | 4.06314 | 46.4186 | 84.7549 | 3.70907 | 112.416 | 49.8685 | 66.2567 | 55.7519 | 3.43629 | 99.1849 | 95.7486 | 40.6652 | 82.3774 | 41.7122 | -0.83033 | -0.246545 | 32.177 | 0.357484 | 0.701676 | 2554.76 |
2 | var2 | d | a | 44.0 | 24.0 | 1.88004 | 838.634 | 819.574 | 2.75783 | 15.7656 | 1.4262 | 28.9592 | 4.36576 | 1540.35 | 177.845 | -56.5217 | 60.2818 | -6.92436 | 10.6844 | 2.4562 | -49.8684 | 46.3191 | 96.1875 | -25.086 | 28.2933 | 53.3793 | -1.20617 | -0.0349994 | 36.1985 | 0.357484 | 0.701676 | 82.7216 |
3 | var1 | c | b | 20.0 | 20.0 | 51.8411 | 640.079 | 608.075 | 3.81381 | 45.3228 | 0.313529 | 25.2998 | 5.65721 | 48.8026 | 60.6832 | -1.11199 | 104.794 | 40.0004 | 63.6817 | 52.7755 | 18.2958 | 99.6525 | 81.3567 | 33.1435 | 67.2546 | 34.1111 | -1.02057 | 0.210728 | 38.7909 | 0.512103 | 0.992384 | 1036.82 |
4 | var2 | c | b | 20.0 | 9.0 | 0.435363 | 758.656 | 720.723 | 3.11501 | 22.5336 | 0.383102 | 27.5437 | 6.15896 | 6326.61 | 68.3248 | -57.2143 | 58.085 | -12.4555 | 13.3262 | -2.65636 | -48.1143 | 42.838 | 90.9522 | -18.9878 | 26.9292 | 45.917 | -1.04576 | -0.0518575 | -22.2214 | 0.512103 | 0.992384 | 8.70725 |
5 | var1 | c | a | 24.0 | 24.0 | 51.8434 | 941.195 | 901.979 | 3.68274 | 39.7551 | 0.705449 | 30.6789 | 6.26231 | 59.1761 | 101.23 | -11.6208 | 115.308 | 38.8888 | 64.798 | 64.3451 | 8.37484 | 94.7447 | 86.3698 | 19.9405 | 71.4649 | 51.5244 | -1.45164 | -0.165553 | 27.4161 | 0.472261 | 0.917777 | 1244.24 |
6 | var2 | c | a | 22.0 | 10.0 | -3.24275 | 714.676 | 682.191 | 2.6311 | 13.889 | 0.757478 | 26.7334 | 5.69959 | 824.405 | 106.437 | -58.838 | 52.3525 | -15.0957 | 8.61019 | -2.15984 | -48.7115 | 48.8994 | 97.611 | -16.5256 | 13.299 | 29.8246 | -0.444083 | -0.0339916 | -4.37031 | 0.490962 | 0.95278 | -71.3406 |
7 | var1 | d | b | 83.0 | 83.0 | 47.2578 | 737.991 | 729.1 | 3.5817 | 35.9346 | 0.853743 | 27.166 | 2.98185 | 57.4847 | 116.121 | -6.78402 | 101.3 | 41.3259 | 53.1896 | 46.2615 | 0.482996 | 99.8001 | 99.3171 | 24.7421 | 66.9678 | 42.2257 | -1.02983 | 0.131148 | 15.3943 | 0.264174 | 0.522613 | 3922.4 |
8 | var2 | d | b | 83.0 | 39.0 | -3.2516 | 830.511 | 820.505 | 2.90466 | 18.259 | 0.79552 | 28.8186 | 3.16325 | 886.289 | 110.254 | -60.581 | 54.0778 | -9.54431 | 3.04111 | -4.06996 | -47.4386 | 48.0944 | 95.533 | -30.2718 | 19.076 | 49.3478 | -1.30068 | 0.0524108 | 228.731 | 0.264174 | 0.522613 | -269.883 |
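Several columns in this table can be cross-checked against each other. In row 1, for example, `bvar` (709.893) equals `var` × (n − 1)/n, `se` (4.06314) equals `sd`/√n, `cv` (46.4186) equals `sd`/`mean` × 100, `geom` equals exp(`logmean`), `geocv` equals √(exp(`logvar`) − 1) × 100, `range` equals `max` − `min`, and `iqr` equals `q3` − `q1`. The sketch below reproduces these relationships with the `Statistics` standard library only. The formulas are inferred from the values shown here rather than taken from the package source, and the data vector is hypothetical.

```julia
using Statistics

# Hypothetical positive data, not taken from ds.csv
x = [3.4, 40.7, 55.8, 82.4, 99.2]
n = length(x)

m     = mean(x)
v     = var(x)                            # unbiased sample variance (divides by n - 1)
bvar  = v * (n - 1) / n                   # biased variance (divides by n)
sd    = sqrt(v)                           # standard deviation
se    = sd / sqrt(n)                      # standard error of the mean
cv    = sd / m * 100                      # coefficient of variation, %
logx  = log.(x)                           # log-transformed values
geom  = exp(mean(logx))                   # geometric mean
geocv = sqrt(exp(var(logx)) - 1) * 100    # geometric coefficient of variation, %
rng   = maximum(x) - minimum(x)           # range
```

Checks of this kind are a quick way to confirm how a given column is defined before relying on it in a report.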
Reference
Textbooks:
https://towardsdatascience.com/5-free-books-to-learn-statistics-for-data-science-768d27b8215
Statistics for Julia:
https://statisticswithjulia.org/