Data Statistics in Calc

From LibreOffice Help
Jump to: navigation, search

Use the data statistics in Calc to perform complex data analysis

To work on a complex statistical or engineering analysis, you can save steps and time by using Calc Data Statistics. You provide the data and parameters for each analysis, and the set of tools uses the appropriate statistical or engineering functions to calculate and display the results in an output table.

Sampling

Create a table with data sampled from another table.

Choose Data - Statistics - Sampling

Sampling allows you to pick data from a source table to fill a target table. The sampling can be random or in a periodic basis.

Note.png Sampling is done row-wise. That means, the sampled data will pick the whole line of the source table and copy into a line of the target table.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Sampling Method

Random: Picks exactly Sample Size lines of the source table in a random way.

Sample size: Number of lines sampled from the source table.

Periodic: Picks lines in a pace defined by Period.

Period: the number of lines to skip periodically when sampling.

Example

The following data will be used as example of source data table for sampling:

A B C
1 11 21 31
2 12 22 32
3 13 23 33
4 14 24 34
5 15 25 35
6 16 26 36
7 17 27 37
8 18 28 38
9 19 29 39

Sampling with a period of 2 will result in the following table:

12 22 32
14 24 34
16 26 36
18 28 38

Descriptive Statistics

Fill a table in the spreadsheet with the main statistical properties of the data set.

Choose Data - Statistics - Descriptive Statistics

The Descriptive Statistics analysis tool generates a report of univariate statistics for data in the input range, providing information about the central tendency and variability of your data.

Note.png For more information on descriptive statistics, refer to the corresponding Wikipedia article.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following data will be used as example

A B C
1 Maths Physics Biology
2 47 67 33
3 36 68 42
4 40 65 44
5 39 64 60
6 38 43
7 47 84 62
8 29 80 51
9 27 49 40
10 57 49 12
11 56 33 60
12 57
13 26

The following table displays the results of the descriptive statistics of the sample data above.

Column 1 Column 2 Column 3
Mean 41.9090909091 59.7 44.7
Standard Error 3.5610380138 5.3583786934 4.7680650629
Mode 47 49 60
Median 40 64.5 43.5
Variance 139.4909090909 287.1222222222 227.3444444444
Standard Deviation 11.8106269559 16.944681237 15.0779456308
Kurtosis -1.4621677981 -0.9415988746 1.418052719
Skewness 0.0152409533 -0.2226426904 -0.9766803373
Range 31 51 50
Minimum 26 33 12
Maximum 57 84 62
Sum 461 597 447
Count 11 10 10

Analysis of Variance (ANOVA)

Produces the analysis of variance (ANOVA) of a given data set

Choose Data - Statistics - Analysis of Variance (ANOVA)

ANOVA is the acronym for ANalysis Of VAriance. This tool produces the analysis of variance of a given data set

Note.png For more information on ANOVA, refer to the corresponding Wikipedia article.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Type

Select if the analysis is for a single factor or for two factor ANOVA.

Parameters

Alpha: the level of significance of the test.

Rows per sample: Define how many rows a sample has.

Example

The following data will be used as example

A B C
1 Maths Physics Biology
2 47 67 33
3 36 68 42
4 40 65 44
5 39 64 60
6 38 43
7 47 84 62
8 29 80 51
9 27 49 40
10 57 49 12
11 56 33 60
12 57
13 26

The following table displays the results of the analysis of variance (ANOVA) of the sample data above.

ANOVA - Single Factor
Alpha 0.05
Groups Count Sum Mean Variance
Column 1 11 461 41.9090909091 139.4909090909
Column 2 10 597 59.7 287.1222222222
Column 3 10 447 44.7 227.3444444444
Source of Variation SS df MS F P-value
Between Groups 1876.5683284457 2 938.2841642229 4.3604117704 0.0224614952
Within Groups 6025.1090909091 28 215.1824675325
Total 7901.6774193548 30

Correlation

Calculates the correlation of two sets of numeric data.

Choose Data - Statistics - Correlation

The correlation coefficient (a value between -1 and +1) means how strongly two variables are related to each other. You can use the CORREL function or the Data Statistics to find the correlation coefficient between two variables.

A correlation coefficient of +1 indicates a perfect positive correlation.

A correlation coefficient of -1 indicates a perfect negative correlation

Note.png For more information on statistical correlation, refer to the corresponding Wikipedia article.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following data will be used as example

A B C
1 Maths Physics Biology
2 47 67 33
3 36 68 42
4 40 65 44
5 39 64 60
6 38 43
7 47 84 62
8 29 80 51
9 27 49 40
10 57 49 12
11 56 33 60
12 57
13 26

The following table displays the results of the correlation of the sample data above.

Correlations Column 1 Column 2 Column 3
Column 1 1
Column 2 -0.4029254917 1
Column 3 -0.2107642836 0.2309714048 1

Covariance

Calculates the covariance of two sets of numeric data.

Choose Data - Statistics - Covariance

The covariance is a measure of how much two random variables change together.

Note.png For more information on statistical covariance, refer to the corresponding Wikipedia article.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following data will be used as example

A B C
1 Maths Physics Biology
2 47 67 33
3 36 68 42
4 40 65 44
5 39 64 60
6 38 43
7 47 84 62
8 29 80 51
9 27 49 40
10 57 49 12
11 56 33 60
12 57
13 26

The following table displays the results of the covariance of the sample data above.

Covariances Column 1 Column 2 Column 3
Column 1 126.8099173554
Column 2 -61.4444444444 258.41
Column 3 -32 53.11 204.61

Exponential Smoothing

Results in a smoothed data series

Choose Data - Statistics - Exponential Smoothing

Exponential smoothing is a filtering technique that when applied to a data set, produces smoothed results. It is employed in many domains such as stock market, economics and in sampled measurements.

Note.png For more information on exponential smoothing, refer to the corresponding Wikipedia article.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Parameters

Smoothing Factor: A parameter between 0 and 1 that represents the damping factor Alpha in the smoothing equation.

Example

The following table has two time series, one representing an impulse function at time t=0 and the other an impulse function at time t=2.

A B
1 1 0
2 0 0
3 0 1
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0

The resulting smoothing is below with smoothing factor as 0.5:

Alpha
0.5
Column 1 Column 2
1 0
1 0
0.5 0
0.25 0.5
0.125 0.25
0.0625 0.125
0.03125 0.0625
0.015625 0.03125
0.0078125 0.015625
0.00390625 0.0078125
0.001953125 0.00390625
0.0009765625 0.001953125
0.0004882813 0.0009765625
0.0002441406 0.0004882813

Moving Average

Calculates the moving average of a time series

Choose Data - Statistics - Moving Average

Note.png For more information on the moving average, refer to the corresponding Wikipedia article.

Data

Input Range: The reference of the range of the data to analyze.

Results to: The reference of the top left cell of the range where the results will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Parameters

Interval: The number of samples used in the moving average calculation.

Example

The following table has two time series, one representing an impulse function at time t=0 and the other an impulse function at time t=2.

A B
1 1 0
2 0 0
3 0 1
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0

Results of the moving average:

Column 1 Column 2
#N/A #N/A
0.3333333333 0.3333333333
0 0.3333333333
0 0.3333333333
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
#N/A #N/A

t-test

Calculates the t-Test of two data samples.

Choose Data - Statistics - t-test

A t-test is any statistical hypothesis test that follows a Student's t distribution.

Note.png For more information on t-tests, refer to the corresponding Wikipedia article.

Data

Variable 1 range: The reference of the range of the first data series to analyze.

Variable 2 range: The reference of the range of the second data series to analyze.

Results to: The reference of the top left cell of the range where the test will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following table has two data sets.

A B
1 28 19
2 26 13
3 31 12
4 23 5
5 20 34
6 27 31
7 28 31
8 14 12
9 4 24
10 0 23
11 2 19
12 8 10
13 9 33

Results for t-Test:

The following table shows the t-Test for the data series above:

t-test
Alpha 0.05
Hypothesized Mean Difference 0
Variable 1 Variable 2
Mean 16.9230769231 20.4615384615
Variance 125.0769230769 94.4358974359
Observations 13 13
Pearson Correlation -0.0617539772
Observed Mean Difference -3.5384615385
Variance of the Differences 232.9358974359
df 12
t Stat -0.8359262137
P (T<=t) one-tail 0.2097651442
t Critical one-tail 1.7822875556
P (T<=t) two-tail 0.4195302884
t Critical two-tail 2.1788128297

F-test

Calculates the F-Test of two data samples.

Choose Data - Statistics - F-test

A F-test is any statistical test based on the F-distribution under the null hypothesis.

Note.png For more information on F-tests, refer to the corresponding Wikipedia article.

Data

Variable 1 range: The reference of the range of the first data series to analyze.

Variable 2 range: The reference of the range of the second data series to analyze.

Results to: The reference of the top left cell of the range where the test will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following table has two data sets.

A B
1 28 19
2 26 13
3 31 12
4 23 5
5 20 34
6 27 31
7 28 31
8 14 12
9 4 24
10 0 23
11 2 19
12 8 10
13 9 33

Results for F-Test:

The following table shows the F-Test for the data series above:

Ftest
Alpha 0.05
Variable 1 Variable 2
Mean 16.9230769231 20.4615384615
Variance 125.0769230769 94.4358974359
Observations 13 13
df 12 12
F 1.3244637524
P (F<=f) right-tail 0.3170614146
F Critical right-tail 2.6866371125
P (F<=f) left-tail 0.6829385854
F Critical left-tail 0.3722125312
P two-tail 0.6341228293
F Critical two-tail 0.3051313549 3.277277094

Z-test

Calculates the z-Test of two data samples.

Choose Data - Statistics - Z-test

Note.png For more information on Z-tests, refer to the corresponding Wikipedia article.

Data

Variable 1 range: The reference of the range of the first data series to analyze.

Variable 2 range: The reference of the range of the second data series to analyze.

Results to: The reference of the top left cell of the range where the test will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following table has two data sets.

A B
1 28 19
2 26 13
3 31 12
4 23 5
5 20 34
6 27 31
7 28 31
8 14 12
9 4 24
10 0 23
11 2 19
12 8 10
13 9 33

Results for z-Test:

The following table shows the z-Test for the data series above:

z-test
Alpha 0.05
Hypothesized Mean Difference 0
Variable 1 Variable 2
Known Variance 0 0
Mean 16.9230769231 20.4615384615
Observations 13 13
Observed Mean Difference -3.5384615385
z #DIV/0!
P (Z<=z) one-tail #DIV/0!
z Critical one-tail 1.644853627
P (Z<=z) two-tail #DIV/0!
z Critical two-tail 1.9599639845

Chi-square test

Calculates the Chi-square test of a data sample.

Choose Data - Statistics - Chi-square Test

Note.png For more information on chi-square tests, refer to the corresponding Wikipedia article.

Data

Input range: The reference of the range of the data series to analyze.

Results to: The reference of the top left cell of the range where the test will be displayed.

Grouped By

Select whether the input data has columns or rows layout.

Example

The following table has two data sets.

A B
1 28 19
2 26 13
3 31 12
4 23 5
5 20 34
6 27 31
7 28 31
8 14 12
9 4 24
10 0 23
11 2 19
12 8 10
13 9 33

Results for Chi-square Test:

Test of Independence (Chi-Square)
Alpha 0.05
df 12
P-value 2.32567054678584E-014
Test Statistic 91.6870055842
Critical Value 21.0260698175

Related Topics

Regression Analysis