Tutorial

Welcome to opynfield’s tutorial. Follow along using the Drosophila test data and then try it out on your own!

1. Import the package and needed functions / classes

[ ]:
from opynfield.config.user_input import UserInput
from opynfield.config.defaults_settings import Defaults
from opynfield.config.cov_asymptote import CoverageAsymptote
from opynfield.config.model_settings import set_up_fits
from opynfield.config.plot_settings import PlotSettings
from opynfield.readin.run_all import run_all_track_types
from opynfield.calculate_measures.calculate_measures import tracks_to_measures
from opynfield.summarize_measures.summarize_individuals import individual_measures_to_dfs
from opynfield.summarize_measures.summarize_groups import all_group_averages
from opynfield.fit_models.fit_individual_models import fit_all, find_fit_bounds, re_fit_all
from opynfield.fit_models.fit_group_models import group_fit_all
from opynfield.stat_test.stat_test import format_params, format_group_params, run_tests
from opynfield.plotting.plot_individuals import plot_all_individuals, plot_traces
from opynfield.plotting.plot_solo_groups import plot_all_solo_groups
from opynfield.plotting.plot_solo_groups_with_individuals import plot_components_of_solo_groups
from opynfield.plotting.plot_group_comparisons import plot_all_group_comparisons
from opynfield.readin.summary_file import summary_file

from copy import deepcopy

2. Define the settings to use

2a. Set up your user inputs

In the test data we have two groups, both recorded using Buridan’s tracker, so our groups_and_types attribute of this UserInput dataclass will be {'GroupA': ['Buridian Tracker'], 'GroupB': ['Buridian Tracker']}.

These group names have no special characters, so we could provide their names as-is to the groups_to_paths attribute (as {'GroupA': 'GroupA', 'GroupB':'GroupB'}), but we could also provide an abbreviation, like {'GroupA': 'A', 'GroupB': 'B'}.

We also need to provide arena_radius_cm with the radius of the arena that the tracks were recorded in. In this case, 4.2cm.

Next, the sample_freq, or sampling frequency with which the tracks were recorded is 30Hz.

Then, we define the edge_dist_cm, which is how far in from the boundary of the arena we should consider the ‘edge region’. In this case, we will use 1cm. If your subjects are a larger animal, recorded in a larger arena, it may make sense to increase this value.

Next, the time_bin_size, or how many seconds pass between analysis point, we will set to 1s. Since our sample_freq was 30Hz, this means 30 samples will be aggregated together for each analysis point.

Then we define the inactivity_threshold. Any movements smaller than this value will be ignored and attributed to body wobble. We will set this to 0.001cm. Larger animals will require larger inactivity thresholds.

Next we set the verbose setting. This indicates whether we want to display progress updates as the analysis progresses. We will set this to True.

The final required UserInput setting is result_path, which indicates where we would like the results to be saved. I will save them to my Desktop for now. You will need to change this argument to a path on your computer.

There are a few other settings that can be changed from their default values. See the UserInput documentation page or the user input information page for more information.

[ ]:
user_settings = UserInput(groups_and_types = {"GroupA": ["Buridian Tracker"], "GroupB": ["Buridian Tracker"]},
                          groups_to_paths = {"GroupA": "A", "GroupB": "B"},
                          arena_radius_cm = 4.2,
                          sample_freq = 30,
                          edge_dist_cm = 1,
                          time_bin_size = 1,
                          inactivity_threshold = 0.001,
                          verbose = True,
                          result_path = '/Users/ellenmcmullen/Desktop/TutorialResults')

2b. Prep the results directory

Once you run the prep_directory() method, you should see the ‘TutorialResults’ folder appear on your desktop.

[ ]:
user_settings.prep_directory()

2c. Set up the default settings

You should be able to use Defaults() without providing any other information.

If you are using a larger animal in a smaller arena, you may want to change the node_size setting to something larger. You can see the Defaults documentation page or the user input information page for more information.

[ ]:
default_settings = Defaults()

2d. Set up the coverage asymptote settings

You should be able to use CoverageAsymptote() without providing any other information.

If you are using an animal that does not spend a significant amount of time in the arena’s edge region, or one that does not habituate during the course of the experiment, you may need to change the time vs coverage model to a linear one. Of course, in this case there will be no asymptote, and you should ignore any PICA or PGCA related results. You can see the CoverageAsymptote documentation page or the user input information page for more information.

[ ]:
asymptote_settings = CoverageAsymptote()

2e. Set up the model settings

The easiest way to provide model settings (ModelSpecification instances) for each valid pair of measures is to run the set_up_fits function.

It automatically pairs each possible x-axis (time, coverage, pica, pgca, and percent_coverage) with each possible y-axis for that measure (activity, p_plus_plus, p_plus_minus, p_plus_zero, p_zero_plus, p_zero_zero, p_plus_plus_given_plus, p_plus_minus_given_plus, p_plus_zero_given_plus, p_zero_plus_given_zero, p_zero_zero_given_zero, p_plus_plus_given_any, p_plus_minus_given_any, p_plus_zero_given_any, p_zero_plus_given_any, and p_zero_zero_given_any for all x-axes, with coverage, pica, pgca, and percent_coverage as additional y-axis options for time). For each pair of measures, it uses the appropriate model and model settings (exponential and fixed exponential models).

The set_up_fits function returns a dictionary keyed by x-axis which in turn is a dictionary keyed by y-axis and gives the ModelSpecification for that x vs y relationship. If you need to use other models (like a linear model), you will need to manually create this nested dictionary. See ModelSpecification documentation page or the user input information page for more information.

[ ]:
model_settings = set_up_fits()

2f. Set up the plotting settings

The only mandatory input to PlotSettings is group_colors. We need to provide a dictionary of group names to colors we want to use to display them in group comparison plots. Let’s use blue and red for our two groups.

For the sake of the tutorial, however, we would also like to display the plots we generate, rather than solely saving them out. To do this we can set some of the optional inputs to True. Typically, it is better to keep these inputs set to False, as rendering the plots takes significantly more time than just saving them.

If you would like to change any other plot settings (for example, what file format to save the images in) you can check out the PlotSettings documentation page or the user input information page for more information.

[ ]:
plot_settings = PlotSettings(group_colors={'GroupA': 'b', 'GroupB': 'r'},
                             display_individual_figures=True,
                             display_solo_group_figures=True)

3. Read in your data

The run_all_track_types function will automatically figure out which track types we need to select files for, in this case, just Buridan’s tracker. All we need to do is provide it with some of the settings defined in the user_settings.

A file-selection window will pop up asking for specific filetypes. For Buridan’s tracker, it will ask for .dat files of GroupA, .xml files for GroupA, .dat files for GroupB, and .xml files for Group B in turn. For some other tracker types where the datafiles include the group name, it will ask for all files (from all groups) at once.

Since we have verbose set to True, we will see progress updates displayed as the files are read in.

This function will return a list of Track objects. Each Track object contains attributes that indicate the group to which the track belongs, the x, y, and time coordinates, as well as some other information. See the Track documentation page for more information.

A final note: if you are on a newer Mac, you may see a message about “+[CATransaction synchronize]”. This is a Mac issue, but will not impact your results.

[ ]:
track_list = run_all_track_types(groups_and_types=user_settings.groups_and_types,
                                 verbose=user_settings.verbose,
                                 arena_radius_cm=user_settings.arena_radius_cm,
                                 running_window_length=user_settings.running_window_length,
                                 window_step_size=user_settings.window_step_size,
                                 sample_freq=user_settings.sample_freq,
                                 time_bin_size=user_settings.time_bin_size,
                                 trim=user_settings.trim)

4. Calculate behavioral measures

4a. Generate the measures

Now that we have the tracks read in, we need to calculate the measures that we care about, such as activity, coverage, and the motion probabilities. To do this we will use the tracks_to_measures function. It takes in our list of Track objects and returns a list of StandardTrack objects, as well as a dictionary of lists of StandardTrack objects keyed by the group to which the tracks belong.

We need to provide the Track objects generated with the run_all_track_types function, as well as some of the setting objects.

The StandardTrack objects that are generated contain the same group, x, y, and y, information as the Track objects, as well as the measures we calculated such as the radial position, coverage, or P++Given+ (p_plus_plus_given_plus). You can see the full list of StandardTrack attributes at the StandardTrack documentation page.

[ ]:
standard_track_list, tracks_by_groups = tracks_to_measures(all_tracks=track_list,
                                                           user_config=user_settings,
                                                           default_settings=default_settings,
                                                           coverage_settings=asymptote_settings)

4b. Save the measures

Now that we have StandardTrack objects with their associated behavioral measures, we want to export that information to .csv files. With the default settings, this will generate a .csv file for each measure under ‘TutorialResults/Individuals/A/Measures’ which contains the data for all individuals in GroupA. Likewise, it will create files under ‘TutorialResults/Individuals/B/Measures’ for GroupB data. Finally, it will create files under ‘TutorialResults/Individuals/CombinedGroups/Measures’ which contains data from all the individuals you read in, along with a column for group name.

If you are interested in conducting your own statistical analyses, the files created under combined group measures would be the easiest to use.

The function also returns a dictionary of group name to a dictionary of measure name to the dataframe that was saved in the .csv files. If you do not want to save the .csv files you can make sure the save_group_csvs and save_all_group_csvs attributes of your Defaults instance are set to False in step 2c. You will still need to run this function to generate the return dictionary which is an input to other functions.

[ ]:
individual_dfs = individual_measures_to_dfs(tracks_by_groups=tracks_by_groups,
                                            defaults=default_settings,
                                            user_inputs=user_settings)

4c. Calculate group averages of the measures

Once we have the individual measures, we also will want to know the group averages.

If you want to know a group’s average time vs activity relationship, you can average the activity values for each individual in that group at each time point.

If instead you want to know a group’s average coverage vs activity relationship, it is not as simple, since plotting average coverage vs average activity is really giving us time domain information in disguise. Each point would really be saying what is the average coverage and average activity of an individual at this time point, rather than the desired result of what is the average activity of an individual who has reached this coverage value.

To get this desired result, we must average a set of points across both measures (coverage is a near-continuous measure, and there is no guarantee that multiple individuals will be recorded at the exact same coverage value).

The all_group_averages function takes care of all this logic and returns a dictionary of x-axis to a dictionary of y-axis to group average data. If you would like to change the density of the average points for any of the non-time averages, you can increase the value of n_points_coverage, n_points_pica, n_points_pgca, or n_bins_percent_coverage to get less dense results (lumping more points together for an average), or decrease those values to get more dense results (lumping fewer points together for an average).

This function will also save out the .csv files of these average measures. They will be found under ‘TutorialResults/Groups/A/Measures’ for GroupA, ‘TutorialResults/Groups/B/Measures’ for GroupB, and ‘TutorialResults/Groups/CombinedGroups/Measures’ for the both groups’ data combined. If you do not want to save the .csv files you can make sure the save_group_csvs and save_all_group_csvs attributes of your Defaults instance are set to False in step 2c.

[ ]:
group_averages = all_group_averages(individual_measures_dfs=individual_dfs,
                                    test_defaults=default_settings,
                                    user_config=user_settings)

5. Fit models to individuals’ measures

5a. Naive bounds

We will first fit models with naive bounds. This means the bounds are set only so that they ensure the parameters have the correct sign. Some individuals may result in parameters that are very unlike the majority of individuals’ parameters in some models.

The fit_all function will fit time vs y-measure models for all y-measures defined in the time_averaged_measures attribute of your Defaults instance. It will also fit coverage-measure vs y-measure models for all y-measures defined in the coverage_averaged_measures attribute of your Defaults instance, for each coverage-measure (coverage, pica, pgca, and percent_coverage). You must have a ModelSpecification instance for each of these models. If you used the set_up_fits function, you should be set. If not, you will need to make sure that you have either created a ModelSpecification for each model listed, or that you have provided a custom argument to time_averaged_measures and/or coverage_averaged_measures.

The fit_all function will return nested dictionaries, the first keyed by group name, then by x-axis, then by y-axis which will ultimately return a dataframe containing the parameters fit on each individual in that group for that x vs y model.

[ ]:
naive_fits = fit_all(individual_measures_dfs=individual_dfs,
                     defaults=default_settings,
                     model_params=model_settings)

5b. Find new bounds

Now that we have fit the models with naive bounds, we want to take care of any outliers. Because there are so many models, we want to preserve the use of all of our data. Additionally, if an individual is an outlier in terms of one parameter, it would be difficult to decide whether we need to throw out that just that parameter, all the parameters of that model (which of course, are not independent) for that individual, or all the parameters of all the models for that individual.

In order to preserve our data, we will instead calculate new bounds and ultimately re-fit all the models with these bounds that ensure we do not get outliers.

The find_fit_bounds function does this by looking at the distribution of the naive fit parameters for each group separately. It finds the mean and standard deviation of each distribution, and then sets the upper and lower bounds for the re-fit to 2 standard deviations above and below the mean (95% of data should fall here, thus we are unlikely to influence many non-outlier individuals). The number of standard deviations to use for the cutoff can be set in the UserInput instance, using the optional bound_level attribute. It will also set the new initial parameter guess to the distribution mean, to ensure that our p0 is within the bounds.

The three returns are all nested dictionaries, first keyed by group, then x-axis, then y-axis. This results in a dataframe of the upper bounds, lower_bounds, or initial parameter values for all each of the parameters in the model.

[ ]:
upper_bounds, lower_bounds, p0s = find_fit_bounds(fits=naive_fits,
                                                  user_inputs=user_settings)

5c. Re-fit models

Now that we have more precise bounds, we need to re-fit the models. The re_fit_all function will fit all the same models that fit_all did, but using the new bounds we calculated in find_fit_bounds.

It will return nested dictionaries, the first keyed by group name, then by x-axis, then by y-axis which will ultimately return a dataframe containing the (outlier-free) parameters fit on each individual in that group for that x vs y model.

[ ]:
bounded_fits = re_fit_all(individual_measures_dfs=individual_dfs,
                          defaults=default_settings,
                          model_params=model_settings,
                          upper=upper_bounds,
                          lower=lower_bounds,
                          initial=p0s )

5d. Format and save the bounded individual fits

Next we can use format_params to create dataframes from the nested dictionaries created in re_fit_all. This will also save out a .csv file for each group that contains all parameter fits for all individuals in that group. They will be found under ‘TutorialResults/Individuals/A/Models’ for GroupA, under ‘TutorialResults/Individuals/B/Models’ for GroupB, and under ‘TutorialResults/Individuals/CombinedGroups/Models’ for both GroupA and GroupB combined.

If you do not want to save the .csv files you can make sure the save_group_model_csvs and save_all_group_model_csvs attributes of your Defaults instance are set to False in step 2c.

[ ]:
formatted_bounded_fits = format_params(bounded_fits=deepcopy(bounded_fits),
                                       defaults=default_settings,
                                       user_inputs=user_settings)

6. Fit models to group measures

6a. Fit the group models with bounds

We also want to fit a model to the entire group’s data. The group_fit_all function will fit all the same models as the fit_all and re_fit_all functions did, but using the data from all individuals in a group. We will use the bounds and p0s that we calculated using the find_fit_bounds.

It returns a nested dictionary, first keyed by group, then by x-measure, then by y-measure. This will yield a dataframe with the all the fit parameters for that model.

[ ]:
group_fits = group_fit_all(individual_measures_dfs=individual_dfs,
                           defaults=default_settings,
                           model_params=model_settings,
                           upper=upper_bounds,
                           lower=lower_bounds,
                           initial=p0s)

6b. Format and save the bounded group fits

Next we can use format_group_params to create dataframes from the nested dictionaries created in group_fit_all. This will save out a .csv file for each group that contains the parameter fits for that group. They will be found under ‘TutorialResults/Groups/A/Models’ for GroupA, under ‘TutorialResults/Groups/B/Models’ for GroupB, and under ‘TutorialResults/Groups/CombinedGroups/Models’ for both GroupA and GroupB combined.

If you do not want to save the .csv files you can make sure the save_group_model_csvs and save_all_group_model_csvs attributes of your Defaults instance are set to False in step 2c. Or, you could skip running this cell.

[ ]:
format_group_params(group_fits=deepcopy(group_fits),
                    defaults=default_settings,
                    user_inputs=user_settings)

7. Run statistical tests

Now, using the formatted parameter fits from the individuals, we can perform MANOVA, ANOVA, and T-tests on the parameters to identify differences between groups. The run_tests function will orchestrate all of these tests. The results can be found in a text file under ‘TutorialResults/Stats’. For information on how to interpret this file, see the Statistics documentation page.

[ ]:
run_tests(formatted_bounded_fits=formatted_bounded_fits,
          defaults=default_settings,
          user_inputs=user_settings)

8. Plot the results

The top-level plotting functions coordinate the creation of many plots at a time, but there are lower-level functions that can create individual plots. If you are interested in these, check out the plotting subpackage documentation for more information.

8a. Plot individuals

The plot_all_individuals function will coordinate plotting every relationship modeled in steps 5-6, for every individual in every group.

It can display the figures and/or save them. If they are saved, you can find them under ‘TutorialResults/Individuals/A/Plots/’ for GroupA, and under ‘TutorialResults/Individuals/G/Plots/’ for GroupB. They are further divided by which measure is used as the x-axis (time, coverage, pica, pgca, or percent_coverage). You can customize the plot colors and the model display by changing the appropriate settings in your PlotSettings instance.

A performance note: this function can be slow, especially if we have display_individual_figures set to True in our PlotSettings instance.

[ ]:
plot_all_individuals(measures=individual_dfs,
                     model_params=bounded_fits,
                     model_info=model_settings,
                     defaults=default_settings,
                     plot_settings=plot_settings,
                     user_config=user_settings)

8b. Plot traces

The plot_traces function will coordinate plotting the track trajectory, for every individual in every group.

It can display the figures and/or save them. If they are saved, you can find them under ‘TutorialResults/Individuals/A/Plots/traces’ for GroupA, and under ‘TutorialResults/Individuals/G/Plots/traces’ for GroupB. You can customize color bar by changing the appropriate setting in your PlotSettings instance.

[ ]:
plot_traces(tracks_by_groups=tracks_by_groups,
            plot_settings=plot_settings,
            user_input=user_settings)

8c. Plot group averages

The plot_all_solo_groups function will coordinate plotting every relationship modeled in steps 5-6, for each group’s average values.

It can display the figures and/or save them. If they are saved, you can find them under ‘TutorialResults/Groups/A/Plots/’ for GroupA, and under ‘TutorialResults/Groups/G/Plots/’ for GroupB. They are further divided by which measure is used as the x-axis (time, coverage, pica, pgca, or percent_coverage). You can customize the plot colors and the model and error display by changing the appropriate settings in your PlotSettings instance.

[ ]:
plot_all_solo_groups(group_averages=group_averages,
                     group_fits=group_fits,
                     model_params=model_settings,
                     test_defaults=default_settings,
                     plot_settings=plot_settings,
                     user_config=user_settings)

8d. Plot group averages with component individuals

The plot_components_of_solo_groups function will coordinate plotting every relationship modeled in steps 5-6, for each group’s average values. On the same plot as these average values, the data from the individuals that make up that group are also displayed, so that you can see how the averages reflect the individuals. This format may be useful in deciding what an appropriate value for the n_points_coverage, n_points_pica, n_points_pgca, or n_bins_percent_coverage attribute of your Defaults instance is.

It can display the figures and/or save them. If they are saved, you can find them under ‘TutorialResults/Groups/A/Plots/’ for GroupA, and under ‘TutorialResults/Groups/G/Plots/’ for GroupB. They are further divided by which measure is used as the x-axis (time, coverage, pica, pgca, or percent_coverage) and include ‘individual_view’ in the folder name. You can customize the plot colors and the model and error display by changing the appropriate settings in your PlotSettings instance.

[ ]:
plot_components_of_solo_groups(individuals=individual_dfs,
                               individual_fits=bounded_fits,
                               groups=group_averages,
                               group_fits=group_fits,
                               model_specs=model_settings,
                               defaults=default_settings,
                               plot_settings=plot_settings,
                               user_inputs=user_settings)

8e. Plot group average comparisons

The plot_all_group_comparisons function will coordinate plotting every relationship modeled in steps 5-6, with all group averages displayed on the same plot. This is where you can see the differences between groups that were identified in the statistical tests, reflected visually.

It can display the figures and/or save them. If they are saved, you can find them under ‘TutorialResults/GroupsComparisonPlots’. They are further divided by which measure is used as the x-axis (time, coverage, pica, pgca, or percent_coverage). You can customize the plot colors and the model and error display by changing the appropriate settings in your PlotSettings instance.

[ ]:
plot_all_group_comparisons(group_averages=group_averages,
                           group_fits=group_fits,
                           model_params=model_settings,
                           test_defaults=default_settings,
                           plot_settings=plot_settings,
                           user_config=user_settings)

9. Summarize the run

Now that we’ve finished, we want to save all the settings we used so that we can replicate the results later, or compare results from using different settings. Running the summary_file function will create a text file under the main ‘TutorialResults’ folder that lists the number of individuals in each group, and the settings from each of the dataclasses we set up throughout step 2.

[ ]:
summary_file(tracks_by_group=tracks_by_groups,
             test_cov_asymptote=asymptote_settings,
             user_defaults=default_settings,
             model_settings=model_settings,
             plot_settings=plot_settings,
             user_config=user_settings)

You’re done!

Now that you have a hang of the analysis, try it out with own data! If you run into any issues, check out the rest of the documentation, or feel free to contact me with any bug reports or feature requests.

Ellen McMullen

[ ]: