GSoC/2019/StatusReports/DevanshuAgarwal: Difference between revisions

From KDE Community Wiki
Line 1: Line 1:
== Project Overview ==
'''Project Name:''' Statistical Analysis in Labplot<br />
 
'''Purpose:''' Adding statistically relevant features in labplot.
'''Project Name:'''
Statistical Analysis in Labplot
 
'''Abstract:'''
We aimed to add statistically relevant features to Labplot. These features should report the correlation between data points and perform various hypothesis tests along with assumption checking. Our target audience includes both scientists and engineers, so we aimed to present results in a form that is detailed enough for someone without a statistics background to use, yet not distracting for someone who is only interested in the numbers.
 
== Proposal ==
You can find my GSoC proposal here:
[https://docs.google.com/document/d/1aoibrQXcpJwP8tGdaNrDwoP2LiTqkj9HwJ3gAqA361U/edit https://docs.google.com/document/d/1aoibrQXcpJwP8tGdaNrDwoP2LiTqkj9HwJ3gAqA361U/edit]


== List of Added Features ==
Line 12: Line 20:
* '''ZTest'''
**Two-Sample Independent
**Two Sample Paired
**One Sample


* '''ANOVA'''
Line 25: Line 31:
**Chi-Square Test for Independence  


== Status Reports ==
'''First Evaluation:'''<br>
[https://docs.google.com/document/d/1JxA569fFTcrDUTHdInvKJPz9rXmVYM7DuYT54f7C38U/edit?usp=sharing https://docs.google.com/document/d/1JxA569fFTcrDUTHdInvKJPz9rXmVYM7DuYT54f7C38U/edit?usp=sharing]
<br><br>
'''Second Evaluation:'''<br>
[https://docs.google.com/document/d/1qgss0AssIb3HJIDeAYIos2ig37tk_8UWqDsn4OwDPrQ/edit?usp=sharing https://docs.google.com/document/d/1qgss0AssIb3HJIDeAYIos2ig37tk_8UWqDsn4OwDPrQ/edit?usp=sharing]
<br><br>
'''Final Report:''' <br>
I have included all my work with screenshots and demos in the final post of my blog.
Here is the link to that post: [https://agdeva8labplot.blogspot.com/2019/08/final-days-of-gsoc-2019.html https://agdeva8labplot.blogspot.com/2019/08/final-days-of-gsoc-2019.html]

== Frontend ==
Since a major part of the project aims to make the user comfortable while finding statistics, special attention was given to the frontend. The two major components of the frontend are Dock Widgets (Hypothesis Test Dock and Correlation Coefficient Dock) and Test Views (Hypothesis Test View and Correlation Coefficient Test View).

== Dock Widgets ==
Dock widgets provide an interface for the user to select the options required to perform a test. Two such dock widgets were created during the project (Hypothesis Test Dock and Correlation Coefficient Dock); they appear on the right side of the window according to the test chosen by the user. Some common and important elements of both dock widgets are listed below with a brief description:
* '''Name:''' The name of the dock widget. It can be edited by the user.
* '''Comment:''' Any comment which the user wants to add for future reference.
* '''Data'''
** '''Source:''' Data source type: Spreadsheet or Database. Currently, only Spreadsheet is supported.
** '''Spreadsheet:''' This option appears when "Spreadsheet" is chosen in "Source". It shows the name of the spreadsheet on which the chosen test will be performed. The user can change the chosen spreadsheet.
* '''Test'''
** '''Type:''' Type of Test the user wants to perform
** '''Sub-type:''' Subtype of the test chosen from "Type" option. It will be shown only when there are subtypes for the test-type chosen.
** '''Calculate Statistics from Spreadsheet:''' This checkbox (default: checked) is useful when the user doesn't have access to the whole data set but has a summary of it, such as the number of elements, mean, standard deviation, or a contingency table. Uncheck it to perform tests on summary statistics or on a contingency table. On unchecking the box, a [[GSoC/2019/StatusReports/DevanshuAgarwal#Statistic_Table|Statistic Table]] will appear. For now, this feature is supported for Two-Sample Independent Z-Test and Chi-Square Test for Independence.
** '''Number of Rows:''' This option is shown when the "Calculate Statistics from Spreadsheet" checkbox is unchecked and the user has data in the form of a contingency table. When the user changes the default value, the number of rows in the statistic table updates dynamically.
** '''Number of Columns:''' Similar to the "Number of Rows" option, but for changing the number of columns.
* '''Variable:''' It is visible when "Calculate Statistics from Spreadsheet" is checked.
** '''Independent Var. 1, Independent Var. 2:''' From here, the user can select the columns of a spreadsheet on which the test is to be performed. These combo boxes only show columns that are valid for the selected test. The labels and the number of such combo boxes change automatically according to the columns/options chosen. For example, whenever "Independent Var. 1" is intended to contain categorical labels, the label "Independent Var. 2" changes to "Dependent Var. 1".
** '''Independent Var. 1 Categorical:''' This checkbox appears when the column selected in "Independent Var. 1" may be categorical, so that "Independent Var. 2" can act as the dependent variable. Check this checkbox in such a case.
** '''Recalculate:''' The user should press this push button after selecting all the preferred options in the dock. After it is clicked, the Test View widget is populated with results and statistics. This push button is disabled when no column is selected in at least one of the "Independent Var." combo boxes.
<br>
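As a rough illustration of what a test on summary statistics involves, the following Python sketch computes a Two-Sample Independent Z-Test from the number of elements, mean and standard deviation of each sample. It is not taken from the Labplot source; the names SampleSummary, two_sample_z and two_tailed_p are hypothetical, the input numbers are made up, and the standard deviations are assumed to be the known population values.

```python
import math
from dataclasses import dataclass


@dataclass
class SampleSummary:
    """Summary of one sample, as it might be typed into a statistic table (hypothetical)."""
    n: int        # number of elements
    mean: float   # sample mean
    std: float    # standard deviation, assumed to equal the known population value


def two_sample_z(a: SampleSummary, b: SampleSummary) -> float:
    """Z statistic for the difference of two independent sample means."""
    standard_error = math.sqrt(a.std ** 2 / a.n + b.std ** 2 / b.n)
    return (a.mean - b.mean) / standard_error


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value under the standard normal distribution."""
    # P(|Z| >= |z|) equals erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only.
    first = SampleSummary(n=50, mean=10.0, std=2.0)
    second = SampleSummary(n=50, mean=9.0, std=2.0)
    z = two_sample_z(first, second)
    print("z =", z, "p =", two_tailed_p(z))
```

Only three numbers per sample are needed, which is why the raw spreadsheet columns can be skipped in this mode.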
There are a few more options which are specific to the Test chosen. These are listed below with a very brief description:
* '''Levene's Test:''' The user can perform Levene's test by clicking this button. It is visible for Two-Sample Independent T-Test and One-Way ANOVA and behaves like "Recalculate".
* '''Equal Variance:''' This checkbox should be checked when the user wants the program to assume homogeneity of variance between populations. This assumption can be checked by performing Levene's test on the data, hence the checkbox is visible whenever the "Levene's Test" button is visible.
* '''Hypothesis'''
** '''Null:''' Shows the null hypothesis for T-Test and Z-Test. The user can't change it directly but can do so through the "Alternate" hypothesis option.
** '''Alternate:''' The user can select the alternate hypothesis for T-Test and Z-Test. The change is reflected in the "Null" hypothesis dynamically.
** '''μ₀:''' The user can set the population mean for One-Sample T-Test and Z-Test. The default value is 0.
** '''α:''' The user can set the significance level for all hypothesis tests. The default value is 0.05.
<br>
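To show how the hypothesis options interact, here is a minimal Python sketch of a One-Sample Z-Test in which the alternate hypothesis determines both the matching null hypothesis and the tail used for the p-value. It is an illustration under stated assumptions, not Labplot's implementation: the function one_sample_z_test, the option strings "two-sided", "greater" and "less", and the NULL_FOR_ALTERNATE table are hypothetical, and the population standard deviation is assumed to be known. The defaults for the population mean (0) and the significance level (0.05) follow the dock defaults described above.

```python
import math

# Assumed pairing of each alternate hypothesis with its null hypothesis.
NULL_FOR_ALTERNATE = {
    "two-sided": "μ = μ₀",   # alternate: μ ≠ μ₀
    "greater": "μ ≤ μ₀",     # alternate: μ > μ₀
    "less": "μ ≥ μ₀",        # alternate: μ < μ₀
}


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def one_sample_z_test(sample_mean, sigma, n, mu0=0.0, alpha=0.05, alternate="two-sided"):
    """Return (z, p_value, reject_null) for a one-sample z-test with known sigma."""
    if alternate not in NULL_FOR_ALTERNATE:
        raise ValueError("unknown alternate hypothesis: " + repr(alternate))
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if alternate == "two-sided":
        p_value = math.erfc(abs(z) / math.sqrt(2))  # both tails
    elif alternate == "greater":
        p_value = normal_cdf(-z)                    # upper tail
    else:
        p_value = normal_cdf(z)                     # lower tail
    return z, p_value, p_value < alpha


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    for alternate, null in NULL_FOR_ALTERNATE.items():
        z, p, reject = one_sample_z_test(0.5, 2.0, 64, alternate=alternate)
        print("null:", null, "| z =", z, "| p =", p, "| reject null:", reject)
```

The null hypothesis is rejected when the p-value falls below α, so changing either the alternate hypothesis or α can change the conclusion for the same data.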
Screenshots for dock widgets:
<gallery>
TwoSampleIndependentTTest dock.png| Hypothesis Test Dock (Test: Two-Sample Independent T-Test)
TwoWayAnova dock.png| Hypothesis Test Dock (Test: Two Way ANOVA)
ChiSquareIndependeceTest dock.png| Correlation Coefficient Dock (Test: Chi-Square Test for Independence)
PearsonR dock.png| Correlation Coefficient Dock (Test: Pearson's R)
</gallery>
 
== Test View ==
''' Content under this section is yet to be added '''
 
<translate><span id="Statistic Table"></span> </translate>
 
== Statistic Table ==
''' Content under this section is yet to be added '''
 
== Backend ==
''' Content under this section is yet to be added '''
 
== Demonstrations ==
''' Content under this section is yet to be added '''


== TODO ==  
Line 83: Line 48:
* Integrate various tests in one workbook to show a summary to the user in a few clicks.
* All other minor TODOs are already written as comments in the source code itself.
 
== Future Goals ==
We aim to generate a single self-contained report for the data currently being analysed by the user. This report will show the statistical analysis summary and graphs in one place, at a single click, without requiring the user to explicitly select or specify anything unless they want to. The idea is to make data analysis easy for the user and give them the freedom to play around with the data while keeping track of the changes in the various statistical parameters.
 
== Commits ==  
My Commits: [https://cgit.kde.org/labplot.git/log/?h=gsoc2019_stats&qt=author&q=Devanshu+Agarwal https://cgit.kde.org/labplot.git/log/?h=gsoc2019_stats&qt=author&q=Devanshu+Agarwal] <br>
Line 89: Line 57:
<br>
Review Request: [https://phabricator.kde.org/p/devanshuagarwal/ https://phabricator.kde.org/p/devanshuagarwal/].
== My Blog ==
[https://agdeva8labplot.blogspot.com/ https://agdeva8labplot.blogspot.com/]


== About Me ==  
* '''Name:''' Devanshu Agarwal
* '''Mentors:''' Stefan Gerlach, Alexander Semke
* '''Email:''' ​[email protected], ​ [email protected]
* '''Github Id:​''' ​https://github.com/agdeva8
* '''IRC nickname:''' agdeva8

Revision as of 14:44, 24 August 2019

Project Overview

Project Name: Statistical Analysis in Labplot

Abstract: We aimed to add statistically relevant features to Labplot. These features should report the correlation between data points and perform various hypothesis tests along with assumption checking. Our target audience includes both scientists and engineers, so we aimed to present results in a form that is detailed enough for someone without a statistics background to use, yet not distracting for someone who is only interested in the numbers.

Proposal

You can find my GSoC proposal here: https://docs.google.com/document/d/1aoibrQXcpJwP8tGdaNrDwoP2LiTqkj9HwJ3gAqA361U/edit

List of Added Features

I have added the following features for the first evaluation:

  • TTest
    • Two-Sample Independent
    • Two Sample Paired
    • One Sample
  • ZTest
    • Two-Sample Independent
  • ANOVA
    • One Way ANOVA
    • Two Way ANOVA
  • Levene Test: To check for the assumption of homogeneity of variance between populations
  • Correlation Coefficient
    • Pearson's R
    • Kendall's Tau
    • Spearman Rank
    • Chi-Square Test for Independence

Status Reports

First Evaluation:
https://docs.google.com/document/d/1JxA569fFTcrDUTHdInvKJPz9rXmVYM7DuYT54f7C38U/edit?usp=sharing

Second Evaluation:
https://docs.google.com/document/d/1qgss0AssIb3HJIDeAYIos2ig37tk_8UWqDsn4OwDPrQ/edit?usp=sharing

Final Report:
I have included all my work with screenshots and demos in the final post of my blog. Here is the link to that post: https://agdeva8labplot.blogspot.com/2019/08/final-days-of-gsoc-2019.html

TODO

  • Add more tooltips to Result View
  • Check for assumptions using various tests (like Levene's Test).
  • Reimplement above features when data source type is Database.
  • Integrate various tests in one workbook to show a summary to the user in a few clicks.
  • All other minor TODOs are already written as comments in the source code itself.

Future Goals

We aim to generate a single self-contained report for the data currently being analysed by the user. This report will show the statistical analysis summary and graphs in one place, at a single click, without requiring the user to explicitly select or specify anything unless they want to. The idea is to make data analysis easy for the user and give them the freedom to play around with the data while keeping track of the changes in the various statistical parameters.

Commits

My Commits: https://cgit.kde.org/labplot.git/log/?h=gsoc2019_stats&qt=author&q=Devanshu+Agarwal
These commits are reviewed on phabricator by my mentors Stefan Gerlach and Alexander Semke.

Review Request: https://phabricator.kde.org/p/devanshuagarwal/.

My Blog

https://agdeva8labplot.blogspot.com/

About Me