At Home Turning Angle Estimation for Parkinson’s Disease Severity Assessment

Qiushuo Cheng (a), Catherine Morgan (b,c), Arindam Sikdar (a), Alessandro Masullo (a), Alan Whone (b,c), Majid Mirmehdi (a)
(a) Faculty of Engineering, University of Bristol, UK
(b) Translational Health Sciences, University of Bristol, UK
(c) North Bristol NHS Trust, Southmead Hospital, Bristol, UK
Corresponding Author: wl22741@bristol.ac.uk

Abstract

People with Parkinson’s Disease (PD) often experience progressively worsening gait, including changes in how they turn around, as the disease progresses. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings, leaving gait performance outside these controlled environments unaccounted for. Measuring turning angles continuously and passively is a component step towards using gait characteristics as sensitive indicators of disease progression in PD. This paper presents a deep learning-based approach to automatically quantify turning angles by extracting 3D skeletons from videos and calculating the rotation of hip and knee joints. We utilise state-of-the-art human pose estimation models, FastPose and Strided Transformer, on a total of 1386 turning video clips from 24 subjects (12 people with PD and 12 healthy control volunteers), trimmed from a PD dataset of unscripted free-living videos in a home-like setting (Turn-REMAP). We also curate a turning video dataset, Turn-H3.6M, from the public Human3.6M human pose benchmark with 3D ground truth, to further validate our method. Previous gait research has primarily taken place in clinics or laboratories evaluating scripted gait outcomes, but this work focuses on free-living home settings where complexities exist, such as baggy clothing and poor lighting. Due to difficulties in obtaining accurate ground truth data in a free-living setting, we quantise each angle into the nearest 45° bin based on the manual labelling of expert clinicians. Our method achieves a turning calculation accuracy of 41.6%, a Mean Absolute Error (MAE) of 34.7°, and a weighted precision (WPrec) of 68.3% for Turn-REMAP. On Turn-H3.6M, it achieves an accuracy of 73.5%, an MAE of 18.5°, and a WPrec of 86.2%. This is the first work to explore the use of single monocular camera data to quantify turns by PD patients in a home setting. All data and models are publicly available, providing a baseline for turning parameter measurement to promote future PD gait research.

keywords:

Turning Angle, Human Pose Estimation, Gait Analysis, Parkinson’s Disease, Digital Biomarker

journal: Computers in Biology and Medicine


1 Introduction

Parkinson’s disease (PD) is a progressive neurodegenerative movement disorder, characterised by symptoms such as slowness of movement and gait dysfunction [Jankovic2008], which fluctuate across the day but progress slowly over the years [Holden2018]. Currently, treatment of PD relies on therapies which improve symptoms. There are no treatments available which modify the course of the underlying disease (so-called disease-modifying treatments, or DMTs), despite there being multiple putative DMTs showing promise in laboratory studies [Lang2013]. One reason for the slow development of DMTs is the dearth of sensitive, frequent, objective biomarkers to enhance the current gold-standard clinical rating scale [Goetz2008] to measure the progression of PD. This gold-standard clinical rating scale, the Movement Disorders Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [Goetz2008], includes subjective questionnaires concerning gait and mobility experiences, along with clinicians’ ratings of scripted activities performed by the participants. The assessments typically occur within clinical settings over short durations, offering only a "snapshot" of symptoms which vary on an hourly basis. It also has limitations, including its non-linear and discontinuous scoring system, inter-rater variability [Post2005] and the Hawthorne effect [Paradis2017] of being observed while mobilising [Robles-Garcia2015, Morberg2018].

Gait and turning abnormalities are common features of PD, with over half of patients reporting difficulties with turning [Bloem2001] – when someone moves round on their axis while upright, changing the direction they face. Turning changes associated with PD include the 'en bloc' phenomenon where upper and lower body segments turn simultaneously [Spildooren2013], a longer duration of turn, less accurate turn completion, a narrower base of support [Mellone2016] and the use of 'step turns' rather than 'pivot turns' [Hulbert2015]. More than 40% of daily steps are taken during turning [Glaister2007], and turning abnormalities can predispose to falls; thus turning parameters could potentially be used as measures predicting the time to falls in a patient with PD [Bloem2001]. Furthermore, if a fall happens during turning, it is up to 8 times more likely to result in a hip fracture [Cummings1994]. In unmedicated early-stage PD, gait parameters from turning are more sensitive to change compared to straight-ahead gait outcomes [Zampieri2010], making measuring aspects of turns potentially of specific use in clinical trials of disease-modifying interventions, which typically recruit recently diagnosed patients [Stephenson2021]. People with PD turn differently when being watched by a clinician [Morgan2022PRD], so measuring turning passively in uncontrolled home settings (Figure 1) could give information about mobility not captured by face-to-face assessments in the clinic.

[Figure 1]

Being able to measure the angle of turn therefore could be very helpful in PD assessment, for use in clinical trials and clinical practice. Turning angle alone provides useful insight into the progression of the disease: people with PD take larger angles of turn when they are taking medications, compared to when they withhold their symptom-improving therapies [Conradsson2017]. Turning in gait also comprises other potential measurable elements including foot strike angle, arm swing and turn speed. Calculating the changes of angle over time could help to analyse and interpret these metrics of turning. Previous work shows that the number of turning steps and the turn duration, from unplanned and pre-planned turns, can distinguish between PD medication states (whether someone takes or withholds medication) [Morgan2022PRD]. Turning speed can be used to differentiate between healthy control and PD participants [Mancini2015]. These turning parameters correlate strongly with the MDS-UPDRS scores, showing their potential to evaluate disease severity and progression [Morgan2022PRD]. Therefore, an accurate and robust method to measure the magnitude of the turning serves as the cornerstone for building more sensitive markers of the disease.

In this paper, we present a deep learning-based pipeline to estimate turning angles. We adopt state-of-the-art human pose estimation models to extract 3D human body joint coordinates from monocular RGB videos. The angle of the turn can be calculated by leveraging the orientation of the paired (left and right) hip and knee joints, which lie on the frontal plane of the human body and serve as reliable indicators of the direction in which the body is facing. We apply the proposed pipeline on Turn-REMAP, a dataset of turning video clips trimmed from the unique REMAP dataset [Morgan2023_REMAPOpen, morgan2023multimodal], which includes unscripted spontaneous turning activities from passively collected home monitoring videos. To evaluate our proposed method, a retrospective analysis of the trimmed video clips by clinicians serves as the ground truth reference. As it is hard to acquire the precise degree of turning using the naked eye, we adopted a special quantisation method: different from the reference technique used by previous studies [Pham2017, shin2021quantitative], we classify turning angles into the nearest discrete 45° bin.
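To make this binning concrete, the minimal sketch below shows one way a continuous angle estimate could be snapped to the nearest 45° bin; the function name and the example values are illustrative only and are not taken from the study's code or data.

```python
import numpy as np

def quantise_to_45(angle_deg: float) -> int:
    """Snap a continuous turning angle (degrees) to the nearest 45-degree bin."""
    return int(np.round(angle_deg / 45.0) * 45)

# Example: a predicted turn of 158.3 degrees lies in the 157.5-202.5 range,
# so it is assigned to the 180-degree bin.
print(quantise_to_45(158.3))   # -> 180
print(quantise_to_45(97.0))    # -> 90
```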

To the best of our knowledge, this study is the first to use computer vision technology to measure turning angles in free-living videos for people with PD without relying on the traditional gold standard of motion capture reference typically used in laboratory settings. We also curate Turn-H3.6M comprising 619 turning clips trimmed from the public benchmark Human3.6M [h36m_pami], obtained under controlled settings, and we apply the same turning angle calculation pipeline for comparative evaluation. Due to the availability of 3D data in Turn-H3.6M, we can also compute the turning speed.

In summary our contributions are as follows:

1. We introduce the Turn-REMAP dataset, which provides the first collection of free-living turning videos recorded in a home environment, with both PD patients and healthy controls. The dataset includes discretised ground truth turning angles generated by expert clinicians.

2. We curate turning videos from the large-scale laboratory-based Human3.6M [h36m_pami] benchmark dataset, which includes motion-capture ground truth.

3. Utilising human pose estimation models, we propose a pipeline to estimate turning angles from single-view RGB videos and validate this pipeline on our proposed datasets. This is the first work to estimate turning angles from natural free-living video data captured from people living with PD.

Next, in Section 2, we review the literature which identifies the gap in current PD gait research and provides the context for our contributions in using free-living video-based settings. Section 3 provides a detailed introduction to our datasets, Turn-REMAP and Turn-H3.6M. Then, Section 4 introduces our methodology for the turning angle estimation pipeline, followed by Section 5, which provides implementation details and the evaluation results of the proposed method. Section 6 includes ablation studies to examine the effect of different design choices within the pipeline. In Section 7, we provide a detailed discussion of the experimental results and highlight the novelty and contribution of our work. Finally, we present our conclusion and outline potential future work that can be built upon these datasets and the proposed baseline methods in Section 8. Our turning measurement algorithms and the skeletons extracted from our datasets are available at a link that will be provided if published.

2 Literature review

In this section, we consider related literature in the two most pertinent aspects of our work, i.e., sensor-based turning angle estimation and human pose estimation in gait analysis.

Turning angle estimation – To acquire objective quantitative turning parameters for human motion analysis, inertial sensors consisting of gyroscopes and accelerometers [mariani2012shoe, El-Gohary2013, Mancini2015, DelDin2016, Pham2017, mancini2018turn, Hillel2019, Shah2020, Rehman2020] and floor pressure sensors [muro2014gait, haji2019turning, shah2020digital, shin2021quantitative] have been well explored over the years. Many algorithms using inertial sensors placed on shoes or belts have been validated against gold-standard motion capture systems or human raters, with reported accuracy at a sub-degree level [mariani2012shoe, El-Gohary2013, Pham2017], but these validations are limited to laboratory environments on scripted turning courses with a few predefined turns. Additionally, even though sensors give nearly ground truth readings under these restricted conditions, they require digital devices to be worn on the body of the participant, which raises issues of acceptability [AlMahadin2020] and usability [kubota2016machine]. Portable wearable sensors for gait evaluation are power-hungry and have limited memory storage, so there are significant burdens on both participants and professionals to replace, recharge, re-configure devices and transfer data manually. This hinders the generalisation of the proposed methods to different patient cohorts, especially in free-living environments where it is hard to control every relevant factor, such as the imperfect use and configuration of wearables. It has been shown that sensor-based algorithms evaluating gait translate poorly from laboratory to home [Toosizadeh2015]. Furthermore, several papers have demonstrated that people mobilise differently in the laboratory compared to home settings [Robles-Garcia2015, DelDin2016, Morberg2018, Hillel2019].

Another inherent limitation of wearable-based methods is that they can only provide kinematic parameters for a few body parts, rather than a holistic view of the position and orientation of the entire body. Furthermore, it is shown in [Rehman2020] that placing wearables on different locations of the body (head, neck, lower back and ankle) causes inconsistency in estimated turning angles. Video-based markerless approaches [kidzinski2020deep, shin2021quantitative, stenum2021two] present a passive and less obtrusive solution to these innate problems of wearable-based approaches. However, compared to marker-based approaches, the accuracy [wade2022applications] of joint angle estimation is not yet good enough for clinical application. The work in [shah2020inertial, wade2022applications] has shown that reported performances are inconsistent and hard to reproduce outside laboratory environments and across different patient cohorts, as studies often use off-the-shelf human pose analysis software and hardware in experiments set up in restricted laboratories with scripted activities. To develop and validate robust video-based gait analysis algorithms, the challenge lies in acquiring videos that are representative enough across different patients in different scenarios, including clinics, hospitals and homes. We gather and annotate a dedicated, free-living video dataset to estimate turning angles, complementing existing research on gait analysis for PD.

Gait Analysis for Parkinson’s Disease – Gait analysis plays an instrumental role in many clinical applications and is studied closely in PD [di2020gait, zanardi2021gait]. With widely available open-source pose estimation models applied to movement videos collected during clinical assessments, most state-of-the-art works analyse such patient videos using deep learning models and compare their outcomes against clinicians’ annotations to establish the clinical meaning of the measured gait features.

Sato et al. [sato2019quantifying] used OpenPose [openpose] to extract skeleton keypoints and designed handcrafted features to measure step cadence. They do not perform any quantitative evaluation of the accuracy of their measured parameters and only provide a correlation analysis of their measured gait features with MDS-UPDRS gait scores. Rupprechter et al. [rupprechter2021clinically] also applied OpenPose [openpose] and hand-crafted features from extracted skeletons, but on a large-scale video dataset of hundreds of PD patients, topped with a machine learning classifier to output MDS-UPDRS scores. Similarly, Sabo et al. [sabo2022estimating] used Kinect-generated 3D data with clinical annotations to fine-tune ST-GCN [stgcn] to regress the MDS-UPDRS gait scores, although the model was originally designed for the task of action recognition. Lu et al. [lu2020vision] developed and trained their own deep learning model using self-recorded gait examination videos along with similar gait videos from the CASIA Gait Database [wang2003silhouette] to extract 3D skeletons and predict MDS-UPDRS gait scores in an end-to-end manner. Guo et al. [guo2021multi] developed a graph convolutional neural network to predict MDS-UPDRS gait scores for 157 PD participants. Mehta et al. [mehta2021towards] deployed existing pose estimation models [osokin2018real, asif2020sshfd] to extract 3D skeletons from sit-to-stand videos, and then tested several deep learning models [resnet, stgcn, li2018co] to infer MDS-UPDRS sub-scores of gait disorder and bradykinesia.

Inferring subscores of MDS-UPDRS from videos could potentially contribute to early diagnosis and remote screening of PD, and enable more frequent self-assessment. However, current studies are still confined within the limitations of traditional clinical rating scales. As shown in [liu2022monitoring], changes in MDS-UPDRS at baseline, month 6 and month 12 could not reflect the decline in gait speed caused by PD relative to healthy controls. This demonstrates that the traditional rating scale cannot sensitively detect the progression of a disease which evolves slowly over the years. Therefore, to complement current clinical tools, it is necessary to quantitatively validate the accuracy of measured gait parameters, and then translate the output to the associated clinical annotations.

We argue that for PD it is important to accurately measure absolute gait parameters, like turning angle, in a natural, non-hospital setting, to build sufficiently sensitive markers that reliably track changes in gait throughout the day and over the years. However, most of the previous pose-based research on gait analysis with angular measurement focuses on sagittal [kidzinski2020deep, abe2021openpose] or coronal [shin2021quantitative, tang20222d] joint angles. Only [cao2019human, kondragunta2019estimation, shin2021quantitative] use pose estimation algorithms for turning analysis, but all three address turn detection, or other gait metrics like step length, rather than turning angle. Also, other sensors like 3D depth cameras [clark2019three] have been used for human pose estimation using depth maps or point clouds. While such an approach works well in some applications like virtual reality, there are many unresolved challenges, such as handling self-occlusions or multi-person detection [xu2021review]. In the complex, unscripted scenario of monitoring everyday activities, it may not be suitable to use depth sensors alone; however, combining depth with RGB data could potentially lead to better results.

3 Datasets

Our proposed turning angle measurement approach is evaluated on the turning scenes of the recently released free-living dataset, REMAP [Morgan2023_REMAPOpen], and a curated dataset extracted from the public pose estimation benchmark Human3.6M [h36m_pami]. In this section, we discuss the details of the video data and how our annotations enable quantitative evaluation of our method.

Turn-REMAP – REMAP [Morgan2023_REMAPOpen] includes PD and healthy participants engaging in actions, such as sit-to-stand transitions or walking turns, within a home environment. These specific actions were recorded during free-living, undirected situations, as well as formal clinical evaluations. We present Turn-REMAP, a subset of this data comprising all its turning actions, loosely-scripted and spontaneous (see Figure 1). The video data was collected using wall-mounted Microsoft Kinect cameras installed on the ground floor (communal areas) of a test-bed house [sphere2015], which captured red-green-blue (RGB) and depth data 2-3 hours daily (during daylight hours at times when participants were at home). The acceptability of using such high-resolution video recordings for validation purposes in home settings in PD has been studied in [McNaney2022, Morgan2022JMIR]. Table 1 summarises the details of Turn-REMAP. The dataset contains 12 spousal/parent-child/friend-friend pairs (24 participants in total) living freely in this sensor-embedded smart home for five days at a time. Each pair consists of one person with PD and one healthy control volunteer (HC). This pairing was chosen to enable PD vs HC comparison, for safety reasons, and also to increase naturalistic social behaviour (particularly amongst the spousal pairs who already lived together). Of the 24 participants, five females and seven males have PD. The average age of the participants is 60.25 (PD 61.25, Control 59.25) and the average time since PD diagnosis for the person with PD is 11.3 years (range 0.5-19).

Table 1: Summary of Turn-REMAP.
# Videos | # Frames | # PD | # HC | Avg. PD Age | Avg. HC Age | Avg. Age | Avg. Time Since Diag.
1386     | 96984    | 12   | 12   | 61.25       | 59.25       | 60.25    | 11.3 years
[Figure 2]

The RGB videos were watched post-hoc by medical doctors who had undertaken training in the MDS-UPDRS rating score, including gait parameter evaluation. Two clinicians watched up to 4 simultaneously captured video files at a time using ELAN software [aguera2011elan] to manually annotate the videos to the nearest millisecond, to the extent possible by a human rater. A pre-prepared annotation template with controlled vocabularies in drop-down menus was used to reduce the variability in the annotations created [Morgan2021_labels]. The parameters annotated included: turning angle estimation (90°-360° in 45° increments, shown in Figure 2) and duration of turn (seconds:milliseconds). Our definition of a turning episode is characterised by the initiation of pelvis rotation, continuing until the completion of the movement, which differs from a turn made within a walking arc, like walking around a table. The duration of labelled data recorded by the cameras for PD and HC is 72.84 and 75.31 hours, respectively.

Two clinicians annotated 50% of the turns each. Around 50% of the total number of annotations were cross-checked (randomly selecting 6 pairs from 12) by both clinician annotators, blinding the cross-checking clinician to the turning annotations produced by the other. Cohen’s Kappa [Cohen1960] statistic was calculated to evaluate inter-rater reliability. Any discrepancies were recorded, discussed, and resolved by the clinician raters, with a final review by a movement disorders specialist. The two clinician raters had an almost perfect [McHugh2012] inter-rater agreement for turning angle annotations (Cohen’s kappa = 0.96).
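As an illustration of how this agreement statistic can be computed, the hedged sketch below applies scikit-learn's cohen_kappa_score to two hypothetical lists of annotator labels; the values shown are made up for demonstration and are not the study's annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical cross-checked turning-angle bins (degrees) from two raters;
# these numbers are illustrative only, not taken from the REMAP annotations.
rater_a = [90, 180, 135, 180, 90, 225, 180, 90]
rater_b = [90, 180, 135, 180, 90, 180, 180, 90]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```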

In addition to free-living movements, the turning clips in Turn-REMAP also include videos where the participants take part in clinical assessments and loosely-scripted activities (see Table 2). In the clinical assessments, participants underwent a series of predefined motor tasks that included completing walking and turning courses that are integral to the MDS-UPDRS (III) motor subscore [Goetz2008]. Additionally, they were required to perform the timed-up-and-go (TUG) test [Podsiadlo1991TUG] twice. Another task involved a 10-metre walk that incorporated three 180° turns, which participants carried out at their normal, fast, and slow paces. Naturally, the turning clips for these predefined 180° turns are labelled as 180°. Compared to free-living activities, the loosely-scripted activities consisted of food preparation tasks undertaken with only broad instructions and no one observing the participants.

Table 2: Number of Turn-REMAP turning clips per angle label and scenario.
Scenario            | 90° | 135° | 180° | 225° | Total
Loosely-Scripted    | 316 | 36   | 32   | 2    | 386
Clinical Assessment | 7   | 1    | 41   | 0    | 49
Free-living         | 580 | 179  | 188  | 4    | 951
Total               | 903 | 216  | 261  | 6    | 1386

Turn-H3.6M – To further validate our proposed approach, we curated Turn-H3.6M, a specific turning action video subset of the Human3.6M benchmark [h36m_pami] which consists of 3.6 million frames of RGB and 3D data of 11 professional actors performing various activities in a customised lab environment, such as walking a dog, smoking, taking a photo or talking on the phone. The dataset includes 3D human pose ground truth data.

Previously, IMU-based turning estimation [Rehman2020] has shown that, compared to the head, neck and ankle, sensory information from the lower back provides a more accurate estimation of turning angle. Following this, we used the 3D ground truth to locate frame sequences in the Human3.6M dataset with a consecutive hip rotation equal to or larger than 45° (see example in Figure 3). The 45° quantity corresponds to the increment between the angle labels within our bins and represents the minimum rotation required to classify a motion as a turning motion. The orientation of the ground truth hip joints serves as the ground truth turning angle, and further enables the calculation of actual turning speeds, allowing for comparison with speeds derived from predicted angles.
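A minimal sketch of this curation step is given below, assuming ground-truth left/right hip positions are available per frame in a world coordinate system. The function names and the frame-wise orientation definition (the angle of the left-to-right hip vector in the ground plane) are our own illustrative choices, not the released curation code.

```python
import numpy as np

def hip_orientation_deg(left_hip_xy: np.ndarray, right_hip_xy: np.ndarray) -> np.ndarray:
    """Per-frame orientation (degrees) of the left->right hip vector in the ground plane."""
    v = right_hip_xy - left_hip_xy                     # shape (T, 2)
    return np.degrees(np.arctan2(v[:, 1], v[:, 0]))    # shape (T,)

def cumulative_rotation_deg(orientation_deg: np.ndarray) -> float:
    """Total rotation over a clip: sum of absolute frame-to-frame changes, wrapped to [-180, 180]."""
    diffs = np.diff(orientation_deg)
    diffs = (diffs + 180.0) % 360.0 - 180.0            # wrap each step
    return float(np.sum(np.abs(diffs)))

def is_turning_clip(left_hip_xy, right_hip_xy, threshold_deg: float = 45.0) -> bool:
    """Keep a candidate clip only if the hips rotate by at least the threshold."""
    return cumulative_rotation_deg(hip_orientation_deg(left_hip_xy, right_hip_xy)) >= threshold_deg
```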

We manually searched through the entire Human3.6M dataset and extracted 619 legitimate turning video clips at 50 fps, comprising a total of 45,199 frames. The clips have an average duration of 1.5 seconds, and the turning angle ranges from 45.2° to 234.7° (see Table 3).

[Figure 3]
Table 3: Turn-H3.6M turning clips by angle bin.
Bin   | # Videos | Min Angle | Max Angle | Avg. Angle | Avg. Duration
45°   | 372      | 45.2°     | 67.5°     | 55.3°      | 1.1 s
90°   | 146      | 67.7°     | 112.0°    | 89.1°      | 1.5 s
135°  | 59       | 115.0°    | 156.4°    | 134.9°     | 2.4 s
180°  | 36       | 163.8°    | 199.6°    | 178.5°     | 2.9 s
225°  | 6        | 213.2°    | 234.7°    | 222.2°     | 3.2 s
Total | 619      | 45.2°     | 234.7°    | 79.6°      | 1.5 s

4 Methodology

In this section, we provide a detailed description of our proposed framework. Our overall pipeline has two major processes (Figure 4): 3D human joints estimation and turning angle calculation.

[Figure 4]

3D human joints estimation – Our approach comprises a two-stage framework where we first detect 2D human joint locations in each frame of the video sequence and then reconstruct them in 3D space, based on the spatio-temporal knowledge extracted from the temporal series of 2D skeletons using a deep learning model. Another way of estimating 3D human pose from videos is to use a single deep learning model to infer the 3D coordinates from the RGB pixels directly, in an end-to-end manner [pavlakos2017coarse, zhao2019semantic]. However, the more loosely coupled pipeline is chosen over end-to-end frameworks as it has been shown to achieve higher accuracy with significantly lower computational cost on almost all human pose estimation benchmarks [martinez2017simple, stridedTransformer, zhang2022mixste].

To detect the 2D body joints in each video frame, we apply FastPose [fang2022alphapose] as the 2D keypoint detector. The keypoint detector maps input video frames $\mathbf{V} \in \mathbb{R}^{T \times W \times H \times 3}$ into frames of 2D keypoint coordinates $\mathbf{K} \in \mathbb{R}^{T \times J \times 2}$, where $T$ is the number of frames in the video, $W$ and $H$ are the width and height of each frame, and $J = 17$ is the number of joints (keypoints) in our skeleton, following the skeleton model from Human3.6M [h36m_pami].

FastPose uses a top-down framework, which detects the human subject in each frame and estimates the joint coordinates in the form of a heatmap within a bounding box. The model utilises the classical ResNet [resnet] as the image feature extraction backbone, and then uses upsampling modules [wang2018understanding] and 1D convolution to generate heatmaps representing the probability of each pixel being a human joint. FastPose outputs a heatmap for each joint, selecting the pixel with the maximum value as the joint’s coordinate. Before feeding the video frames into FastPose, we apply standard preprocessing techniques [sun2019deep, fang2022alphapose]: rescaling, normalisation, and flip augmentation. The detected human bounding boxes are first rescaled to a uniform size of 256×196 resolution, as required by the model. Subsequently, the input is normalised by subtracting the mean pixel values for each RGB channel, which helps account for differences in brightness and contrast between frames. Additionally, we employ standard flip augmentation for both training and inference. In this process, we flip the input of FastPose to obtain a flipped output. By flipping the output back and averaging it with the original output, we derive the final prediction.
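The flip-test averaging described above can be sketched as follows. This is a generic illustration with hypothetical array names and a simplified left/right joint swap, not the exact FastPose implementation; it assumes heatmaps shaped (J, H, W) and a caller-supplied list of left/right joint index pairs.

```python
import numpy as np

def flip_test_average(heatmaps: np.ndarray,
                      heatmaps_flipped: np.ndarray,
                      flip_pairs: list) -> np.ndarray:
    """Average original heatmaps with heatmaps predicted from the mirrored image.

    heatmaps, heatmaps_flipped: arrays of shape (J, H, W); heatmaps_flipped comes
    from running the model on the horizontally flipped input. flip_pairs lists the
    (left, right) joint index pairs to swap back (depends on the joint format used).
    """
    restored = heatmaps_flipped[:, :, ::-1].copy()   # undo the horizontal mirror
    for left, right in flip_pairs:                   # swap left/right joint channels
        restored[[left, right]] = restored[[right, left]]
    return 0.5 * (heatmaps + restored)               # final averaged prediction
```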

Having obtained 2D coordinates of human joints, we reconstruct the missing depth information to lift skeletons from 2D to 3D. This is inherently an ill-posed problem, as a single 2D skeleton could have been projected by an infinite number of different 3D poses. However, adding temporal knowledge on how the 2D skeleton changes over time could potentially lead to a more accurate 3D reconstruction.

Numerous architectures have been suggested to address this ill-posed problem [martinez2017simple, zhang2022mixste]. We adopt the state-of-the-art model, Strided Transformer [stridedTransformer], to map the 2D keypoint series $\mathbf{K} \in \mathbb{R}^{T \times J \times 2}$ into a 3D skeleton sequence $\mathbf{S} \in \mathbb{R}^{T \times J \times 3}$. The Strided Transformer is a transformer-based architecture that lifts the 2D keypoints towards the 3D ground truth using the original transformer encoder [vaswani2017attention]. The output is then processed by another transformer encoder with strided convolutions, aggregating the sequence to reconstruct the 3D joints of the centre frame. Notably, the Strided Transformer introduces extra constraints to ensure temporal smoothness while simultaneously aggregating long-range information across the skeleton sequence.

Partial occlusions that are not severe are handled by the 2D keypoint detector FastPose by generating a plausible prediction of missing joint locations. As a result, a complete 2D skeleton is provided as a legitimate input to the Strided Transformer, which then reconstructs the 3D skeleton. Additionally, the Strided Transformer uses the context from surrounding frames to predict 3D joint locations in a central frame of a 27-frame sequence. When a partial occlusion occurs, this temporal smoothness constraint prevents drastic pose changes and helps estimate the joint’s 3D position using information from adjacent frames.

In summary, given an RGB video, the pipeline detects the location of joints on each frame and projects a time series of 2D human skeletons as input into a reconstruction model trained with 3D motion capture ground truth. The final output is then a time series of 3D human skeletons for each turning video clip.
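How the per-clip glue might look is sketched below under stated assumptions: `detect_2d_keypoints` stands in for FastPose and `lift_center_frame` for the Strided Transformer's centre-frame prediction over a 27-frame window. Both are hypothetical wrappers, and the edge-padding strategy is an illustrative choice rather than the authors' exact implementation.

```python
import numpy as np

WINDOW = 27                 # temporal receptive field assumed for the 2D-to-3D lifter
HALF = WINDOW // 2

def video_to_3d_skeletons(frames, detect_2d_keypoints, lift_center_frame):
    """frames: iterable of RGB images -> (T, 17, 3) array of 3D joints."""
    # Stage 1: per-frame 2D keypoints, shape (T, 17, 2).
    kpts_2d = np.stack([detect_2d_keypoints(f) for f in frames])

    # Pad the sequence at both ends so every frame can be the centre of a window.
    padded = np.concatenate([np.repeat(kpts_2d[:1], HALF, axis=0),
                             kpts_2d,
                             np.repeat(kpts_2d[-1:], HALF, axis=0)])

    # Stage 2: slide a 27-frame window and keep the lifted centre frame each time.
    skeletons_3d = [lift_center_frame(padded[t:t + WINDOW])
                    for t in range(len(kpts_2d))]
    return np.stack(skeletons_3d)               # (T, 17, 3)
```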

Turning angle estimation – The availability of 3D coordinates for skeleton joints, spanning from the head to the feet, offers the flexibility to conduct precise quantitative assessments of various movements. However, in the context of turning analysis, it is important to note that while the concept of turning has been previously defined, the specific definition of its magnitude or angle has not been explored in prior research focused on skeleton-based gait analysis. In our methodology, the frontal plane is selected over the sagittal or transverse planes to calculate the angle of turning, as it is the anatomical reference for the direction the body faces. The hip and knee joints on the frontal plane, when the human body is in an upright position, are used to estimate the turning angle in a plane parallel to the assumed flat ground plane, denoted as the XY plane.

The hip and knee vectors $\mathcal{H}_t$ and $\mathcal{K}_t$, respectively, at frame $t$ are defined as

$$\mathcal{H}_t = (x_t, y_t)_{left\_hip} - (x_t, y_t)_{right\_hip}, \qquad (1)$$
$$\mathcal{K}_t = (x_t, y_t)_{left\_knee} - (x_t, y_t)_{right\_knee}. \qquad (2)$$

For a turning video with $T$ frames, we calculate the angle $\theta$ between the corresponding vectors of two consecutive frames $t$ and $t+1$ for the knee and hip joints, and then sum and average the two angles as

$$\theta = \frac{1}{2}\sum_{t=0}^{T-2}\left(\sin^{-1}\!\left(\frac{\|\mathcal{H}_t \times \mathcal{H}_{t+1}\|}{\|\mathcal{H}_t\|\,\|\mathcal{H}_{t+1}\|}\right) + \sin^{-1}\!\left(\frac{\|\mathcal{K}_t \times \mathcal{K}_{t+1}\|}{\|\mathcal{K}_t\|\,\|\mathcal{K}_{t+1}\|}\right)\right). \qquad (3)$$

For our trimmed videos with duration $d$, the angular speed $\omega$ is subsequently computed as

$$\omega = \theta / d. \qquad (4)$$
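A minimal NumPy sketch of Equations (1)-(4) is given below; it assumes 3D skeletons are provided as an array indexed by frame and joint, with the ground plane spanned by the first two coordinates, and the joint indices are illustrative placeholders rather than a fixed skeleton convention.

```python
import numpy as np

# Illustrative joint indices; the actual indices depend on the skeleton format used.
L_HIP, R_HIP, L_KNEE, R_KNEE = 4, 1, 5, 2

def frame_pair_angle(v0: np.ndarray, v1: np.ndarray) -> float:
    """Unsigned angle (radians) between two 2D vectors, as in Eq. (3): asin(|cross| / (|v0||v1|))."""
    cross = abs(v0[0] * v1[1] - v0[1] * v1[0])
    denom = np.linalg.norm(v0) * np.linalg.norm(v1)
    return float(np.arcsin(np.clip(cross / (denom + 1e-8), 0.0, 1.0)))

def turning_angle_and_speed(skeletons: np.ndarray, duration_s: float):
    """skeletons: (T, J, 3) 3D joints; returns (theta in degrees, omega in degrees per second)."""
    hips = skeletons[:, L_HIP, :2] - skeletons[:, R_HIP, :2]     # H_t, Eq. (1)
    knees = skeletons[:, L_KNEE, :2] - skeletons[:, R_KNEE, :2]  # K_t, Eq. (2)

    theta = 0.0
    for t in range(len(skeletons) - 1):                          # Eq. (3)
        theta += 0.5 * (frame_pair_angle(hips[t], hips[t + 1]) +
                        frame_pair_angle(knees[t], knees[t + 1]))
    theta_deg = np.degrees(theta)
    return theta_deg, theta_deg / duration_s                     # Eq. (4)
```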

In ablations, we consider the shoulder, hip and knee joints, together and separately, and show that the combination of hip and knee vectors performs best.

The proposed turning angle estimation algorithm acts as a plug-in-and-play component for any 3D pose estimation model that produces 3D skeletons, providing a compatible method for future comparison.For each video clip, in terms of calculating the overall angle, it is mathematically the same as using only the first and last frame vectors, but the frame-by-frame manner could also inform us of how the velocity changes within one turning motion.

5 Experiments

Implementation and Evaluation – The experiments are conducted in PyTorch on a single NVIDIA 4060Ti GPU and a 12-core AMD Ryzen 5 5500 CPU. In our pipeline, the FastPose model is trained on the MSCOCO pose estimation dataset [lin2014microsoft] and the Strided Transformer [stridedTransformer] is trained on Human3.6M [h36m_pami], following the standard set-up of the related literature [martinez2017simple, zhang2022mixste]. These models are not optimised or fine-tuned on our free-living videos.

We evaluate our proposed method via three key metrics: accuracy, Mean Absolute Error (MAE) in degrees, and weighted precision (WPrec). Accuracy assesses the proportion of predicted angles that correctly fall into their respective bins, showing the categorical correctness of our predictions. MAE is calculated as the average absolute difference between the predicted values (angles, and additionally speed for Turn-H3.6M) and the ground truth. WPrec measures the percentage of true positive predictions among all positive predictions across angle bins, weighted by each bin's sample size [rabby2023multi]. For example, if a turn is predicted as 90°, WPrec indicates the probability that the actual turn is 90°.
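The sketch below shows one way these three metrics could be computed from binned predictions, assuming angles are given in degrees and quantised to 45° bins as described earlier; the weighted precision uses scikit-learn's "weighted" average, which weights per-class precision by class support, matching our reading of the metric.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

def evaluate_turns(pred_deg: np.ndarray, gt_deg: np.ndarray):
    """Bin accuracy, MAE (degrees) and support-weighted precision for turning angles."""
    pred_bins = np.round(pred_deg / 45.0).astype(int) * 45
    gt_bins = np.round(gt_deg / 45.0).astype(int) * 45

    acc = accuracy_score(gt_bins, pred_bins)
    mae = float(np.mean(np.abs(pred_deg - gt_deg)))
    wprec = precision_score(gt_bins, pred_bins, average="weighted", zero_division=0)
    return acc, mae, wprec

# Toy usage with made-up numbers (not results from the paper):
acc, mae, wprec = evaluate_turns(np.array([176.0, 95.0, 120.0]),
                                 np.array([180.0, 90.0, 180.0]))
```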

Results on Turn-REMAP – We compared the predicted turning angle against the clinicians' annotations for Turn-REMAP. Based on the rotation of hip and knee joints, our method correctly estimates the angle for 41.6% of all the turns on average, with an overall MAE of 34.7° and a WPrec of 68.3% across 1386 videos.

We investigated turning in Turn-REMAP by the turning scenario, the location of the turn and the subject's condition. Table 4(a) reports the accuracy under the three scenarios of loosely scripted, clinical, and free-living. Across these scenarios, our model yields an accuracy ranging from 26.5% to 44.0%, an MAE ranging from 33.4° to 59.2° and a WPrec ranging from 66.2% to 79.1%, with overall averages of 36.0%, 42.5° and 71.7%, respectively. The performance on turns that happened during clinical assessments is significantly worse than the other two scenarios, marking it an outlier. This is largely due to the heightened occurrence of self-occlusion, which hampers the quality of the reconstructed skeleton. Notably, 40 out of 49 turns under clinical assessment are participants performing the predefined 180° turns of the TUG test in the narrow hallway.

Table 4(b) shows that the performance of our model for turns across different locations remains fairly consistent, with the accuracy ranging from 35.9% to 42.9% and an average accuracy of 38.6%. There is a wide range of variation for MAE, spanning from 21.7° to 41.3° with an average of 33.9°; a contributing factor to these results is how certain spaces are defined within Turn-REMAP. The kitchen, living room, and stairs are captured as open spaces with no occlusion from furniture, resulting in lower MAE and higher accuracy for turns in these areas. In contrast, the dining room and hallway show increased MAE and reduced accuracy due to frequent occlusions from a centrally located table in the dining room and self-occlusion in the hallway. WPrec ranges from 59.9% to 80.0%, with an average of 71.4%. Finally, in Table 4(c), we observe only a marginal difference between subjects with PD, who had an accuracy of 42.0%, an MAE of 34.4° and a WPrec of 68.2%, and control subjects, who had an accuracy of 41.0%, an MAE of 35.1° and a WPrec of 68.7%.

Table 4(a): Results on Turn-REMAP by turning scenario.
Metrics        | Scripted | Clinical | Free-living | Avg.
Accuracy_θ (%) | 37.6     | 26.5     | 44.0        | 36.0
MAE_θ (°)      | 34.8     | 59.2     | 33.4        | 42.5
WPrec_θ (%)    | 79.1     | 79.7     | 66.2        | 71.7
# Turns        | 386      | 49       | 951         | 462

Table 4(b): Results on Turn-REMAP by location.
Metrics        | Din. | Hall | Kit. | Liv. | Stairs | Avg.
Accuracy_θ (%) | 35.9 | 36.0 | 42.9 | 38.4 | 40.0   | 38.6
MAE_θ (°)      | 41.3 | 38.4 | 33.0 | 35.2 | 21.7   | 33.9
WPrec_θ (%)    | 71.4 | 75.2 | 70.3 | 59.9 | 80.0   | 71.4
# Turns        | 92   | 89   | 1062 | 138  | 5      | 277

Table 4(c): Results on Turn-REMAP by subject condition.
Metrics        | PD   | C    | Avg.
Accuracy_θ (%) | 42.0 | 41.0 | 41.5
MAE_θ (°)      | 34.4 | 35.1 | 34.8
WPrec_θ (%)    | 68.2 | 68.7 | 68.5
# Turns        | 747  | 639  | 693

Results on Turn-H3.6M – The availability of 3D ground truth in our curated dataset allows us to calculate the actual turning angle and turning speed, facilitating a direct comparison against the predictions of our model. Our proposed approach on the entire Turn-H3.6M dataset yields an average accuracy of 73.5% and an MAE of 18.5° for angle prediction, with a turning speed MAE of 15.5°/s and a WPrec of 86.2%.

As shown in Table 5(a), the proposed method yields an accuracy ranging from 50.0% to 80.6% and an MAE ranging from 13.4° to 20.7°, with averages of 71.6% and 16.1°, respectively, across the different turning angle bins. The MAE for turning speed ranges from 5.3°/s to 16.9°/s and improves for larger turning angles, possibly because larger turns may exhibit more pronounced changes in speed. We investigate the performance of our pipeline on different subjects in Table 5(b). Following previous works, such as [martinez2017simple, zhang2022mixste], our model is trained on subjects S1, S5, S6, S7, and S8, while S9 and S11 are held out for testing. For turning angle prediction, the accuracy spans from 63.2% to 80.0%, while the MAE varies between 13.3° and 24.7°, with respective averages across the different subjects of 74.1% and 17.8°. The MAE for turning speed spans from 6.9°/s to 25.3°/s, with an average of 13.8°/s across the different subjects. The WPrec ranges from 75.2% to 93.9%, with an average of 85.9%. Although not included in the training phase, the performance of our model on test subjects S9 and S11, in terms of turning angle and speed calculation, falls within the consistent range observed for the other subjects used in training, suggesting the potential for generalisation to previously unseen data. The performance on the turns of S7 stands out as an outlier, showing the poorest results for both speed and angle. A possible explanation could be that the turns of S7 have the lowest average turning angle compared to those of all other subjects. Specifically, 113 out of 144 turns are at 45°, an angle at which our model tends to underperform (Table 5(a)).

Table 5(a): Results on Turn-H3.6M by turning angle bin.
Metrics        | 45°  | 90°  | 135° | 180° | 225° | Avg.
Accuracy_θ (%) | 70.2 | 79.5 | 78.0 | 80.6 | 50.0 | 71.6
MAE_θ (°)      | 20.7 | 15.4 | 15.3 | 13.4 | 15.8 | 16.1
MAE_ω (°/s)    | 19.6 | 11.3 | 7.0  | 5.3  | 5.4  | 9.7
# Turns        | 372  | 146  | 59   | 36   | 6    | 124

Table 5(b): Results on Turn-H3.6M by subject.
Metrics        | S1   | S5   | S6   | S7   | S8   | S9   | S11  | Avg.
Accuracy_θ (%) | 64.1 | 75.9 | 80.0 | 63.2 | 78.6 | 76.7 | 80.0 | 74.1
MAE_θ (°)      | 21.6 | 16.2 | 16.9 | 24.7 | 13.3 | 17.3 | 14.8 | 17.8
MAE_ω (°/s)    | 15.8 | 12.3 | 14.3 | 25.3 | 6.9  | 14.3 | 7.8  | 13.8
WPrec_θ (%)    | 75.2 | 84.6 | 93.9 | 93.7 | 84.8 | 84.9 | 84.3 | 85.9
# Turns        | 39   | 116  | 95   | 144  | 56   | 129  | 40   | 88

Table 5(c): Results on Turn-H3.6M by activity.
Metrics        | Direc. | Eat. | Greet. | Phon. | Pos. | Disc. | Smok. | Walk. | Wait. | Photo | Avg.
Accuracy_θ (%) | 76.2   | 63.3 | 72.7   | 74.6  | 82.4 | 73.9  | 72.4  | 72.9  | 84.8  | 77.3  | 75.1
MAE_θ (°)      | 15.9   | 26.1 | 15.6   | 19.1  | 19.8 | 15.8  | 18.4  | 19.0  | 12.5  | 14.4  | 17.7
MAE_ω (°/s)    | 13.0   | 21.0 | 12.6   | 14.9  | 18.0 | 12.2  | 16.2  | 16.6  | 9.3   | 11.1  | 14.5
WPrec_θ (%)    | 86.2   | 90.0 | 87.0   | 86.3  | 96.1 | 79.6  | 91.6  | 86.2  | 90.3  | 85.5  | 87.9
# Turns        | 21     | 49   | 33     | 71    | 17   | 46    | 76    | 251   | 33    | 22    | 62

In Table 5(c), we see the results of turning angle prediction for turns performed while the subject carries out different actions. Accuracy fluctuates between 63.3% and 84.8%, and MAE spans a range of 12.5° to 26.1°, yielding average values of 75.1% for accuracy and 17.7° for MAE. Our predicted turning speed shows an MAE ranging from 9.3°/s to 21.0°/s, with an average of 14.5°/s. WPrec ranges from 86.2% to 96.1%, with an average of 87.9%. The original purpose of these pre-defined activities is to elicit varied and diverse human body poses. Although the numbers of turns across activities are imbalanced, the differences in performance can be attributed to the dynamics of movement, including speed and motion pattern.

6 Ablations

The accurate detection of 2D skeleton keypoints in each frame of our input clips is an important contributor to the overall accuracy of our method. Another fundamental concern is which single or combination of ‘body parts’ should be engaged for the computation of the turning angle. We investigate these two issues in our ablation study.

The effect of different 2D keypoints – We investigate how various 2D keypoint detectors impact the performance of turning angle estimation on Turn-H3.6M. We applied SimplePose [xiao2018simple], HRNet [sun2019deep] and FastPose [fang2022alphapose] as prospective 2D keypoint detectors and evaluated their performance in estimating turning angles. All three models were trained on the MS-COCO dataset [lin2014microsoft] following the same settings. HRNet and FastPose were chosen because they are state-of-the-art 2D keypoint detection models, while SimplePose, with its minimal yet effective design, was chosen to determine whether more complex models are merely overfitting the training dataset.

The MAE of these three models does not vary significantly, with values between 18.4° and 18.5°. Among them, FastPose offers the highest accuracy and significantly reduces the computational cost of detecting 2D keypoints (see Table 6).

Table 6: Effect of different 2D keypoint detectors on Turn-H3.6M.
2D Keypoint Input | Accuracy_θ (%) | MAE_θ (°) | Params | GFLOPs
with SimplePose   | 71.6           | 18.5      | 34.0M  | 406.9
with HRNet        | 71.4           | 18.4      | 63.6M  | 674.0
with FastPose     | 73.5           | 18.5      | 40.5M  | 246.7

The effect of using different joints – We also calculated the turning angle using different combinations of the knee, shoulder and hip joints to determine which body parts provide the best turning angle estimation. We chose to perform this ablation on Turn-REMAP instead of Turn-H3.6M because the ground truth for turning angles in Turn-REMAP is derived from the clinical expertise of movement disorder specialists. In contrast, the joints used to determine the turning angle ground truth in Turn-H3.6M have already been discussed and defined.

On the human frontal plane, similar to the knee and hip joints, the shoulder joints are also potentially good indicators of the orientation of the body [lee1985determination]. However, PD patients have difficulty maintaining lateral balance during weight shifts from one foot to the other while turning, demonstrating a greater inclination angle in the frontal plane than healthy controls [yang2016motion]. This suggests that, in PD, the shoulder joints may become less reliable for initiating turns, whereas combining the hip and knee joints shows less variability and may remain more stable in an upright stance. As shown in Table 7, the average predicted angle using both the hip and knee joints yields the best accuracy, while averaging all three sets of joints gives the lowest MAE for turning angle.

Table 7: Effect of different joint combinations on Turn-REMAP.
Selected Joints   | Accuracy_θ (%) | MAE_θ (°)
hip               | 39.7           | 36.3
knee              | 36.7           | 37.4
shoulder          | 38.5           | 36.4
hip+knee          | 41.6           | 34.7
hip+shoulder      | 40.3           | 35.7
knee+shoulder     | 41.1           | 34.4
hip+knee+shoulder | 41.5           | 34.3

7 Discussion

Previous methods for turning analysis have been developed primarily for laboratory or clinical settings to evaluate scripted activities [shin2021quantitative, ribeiro2022public, lee2023gait, zeng2023video]. In Turn-REMAP, we record gait videos in a home-like, unobtrusive environment with PD and control subjects, and provide quantitative evaluations of the accuracy and estimation errors of turning angles during free-living activities. Pham et al. [Pham2017] also measured turning in a free-living environment; however, their method measured turning angles from IMUs alone, while our method is video-based. Pham et al. [Pham2017] recorded videos to manually validate their estimated results and report an overall error of 0.06°, but we contend that estimating turning angles at a resolution accurate enough to achieve such low error measurements by examining videos with the naked eye is unreliable. Some other IMU-based studies [El-Gohary2013, mancini2018turn, nouredanesh2021fall] have also extended their methodology to home environments, but none of these studies validated the measurement accuracy in the free-living setting.

Although the overall measurement accuracy on our Turn-REMAP dataset is not yet robust enough for clinical diagnosis, it establishes a baseline for future, passive, video-based analysis of turning movements in indoor, free-living environments. Our manual examination of incorrectly classified video clips and their corresponding 3D skeletons revealed that depth reconstruction ambiguity [martinez2017simple, zhang2022mixste] is a significant factor affecting the accurate calculation of turning angles. Recovering the missing depth from a 2D image is inherently an ill-posed problem, as infinitely many 3D poses can project to the same 2D skeleton. Despite our utilised models being pretrained on large-scale laboratory 3D motion data, generalising this performance to reconstruct unseen poses in our in-the-wild PD turning dataset remains a substantial challenge.

In Turn-REMAP, we find that the performance of our method on turns in free-living and loosely scripted activities is better than in clinical assessments (Table 4(a)). The reason for the performance degradation on these turns during the clinical assessment is the heightened occurrence of self-occlusion, where 40 of the 49 turns are scripted 180° turns in a narrow hallway. This is confirmed by the findings in Table 4(b), which show that locations like the dining room and hall, which have more occlusions, tend to have lower accuracy and higher MAE. Additionally, we find there is no significant difference in the performance of predicting turning angles for PD and Control subjects (Table 4(c)), suggesting that PD-specific gait characteristics do not affect the performance of our method. In contrast, IMU-based turning measurement methods [salarian2009analyzing, El-Gohary2013, mancini2018turn, haji2019turning] rely heavily on setting thresholds on the angular velocity and relative orientation of a sensor attached to a single body part. Compared to skeleton-based models, these isolated sensory kinematic parameters are more easily affected by common PD symptoms, such as freezing of gait and slow turning speed [salarian2009analyzing].

[Figure 5(a)]
[Figure 5(b)]

We further illustrate a comparison of the distribution of turns across different angle labels in the Turn-REMAP dataset against the distribution of turns at the same angles in the Turn-H3.6M dataset in Figure 5(a). The values in Turn-H3.6M are significantly closer to the expected bin angles, while in Turn-REMAP the predicted angles tend to be underestimated. For the bins examined in Figure 5(a), the standard deviations for the Turn-REMAP dataset are 35.3°, 37.6°, 46.4° and 70.0°, compared to those for the Turn-H3.6M dataset at 22.2°, 20.5°, 16.8° and 21.5°, respectively. This shows that, compared to Turn-REMAP, there is less variability and uncertainty within each bin for predictions in Turn-H3.6M. The difference in performance is further shown in Figure 5(b), where we find that the distribution of MAE for Turn-REMAP has a wider spread, while in Turn-H3.6M, 89.5% of the errors are smaller than 40°.

These statistics reveal the challenges of generalising our pretrained human pose estimation models from the lab-based Turn-H3.6M dataset to the diverse, in-the-wild Turn-REMAP dataset. Different global position distributions [chai2023global], camera parameters [zhan2022ray3d], and diverse human body sizes, shapes and articulated movements [gong2021poseaug, gholami2022adaptpose] highlight the need to enhance model robustness and adaptability to better handle real-world variability. To bridge this gap and enhance the ability to generalise to new, unseen data, it is crucial to implement domain adaptation strategies in deep learning models and to conduct cross-dataset validation.

Our model achieves 73.5% accuracy on Turn-H3.6M. This performance is limited by the inherent design of existing pose estimation algorithms, which are not specifically engineered to tackle biomechanical challenges such as the analysis of turning characteristics. The training of these 2D-3D lifting models is usually guided by the Mean Per Joint Position Error (MPJPE) [h36m_pami] loss, which focuses on minimising the absolute distance between the locations of the ground truth joints and the predictions. However, this criterion does not sufficiently address the requirements for temporal smoothness or accurate angular estimation. Therefore, further work on turning analysis involves building a downstream turning analysis algorithm based on the extracted deep learning features.
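For reference, MPJPE can be computed as in the hedged sketch below: the mean Euclidean distance between predicted and ground-truth joints, shown here without the root-relative or rigid-alignment variants that some evaluation protocols apply.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per Joint Position Error: mean Euclidean distance over frames and joints.

    pred, gt: arrays of shape (T, J, 3) in the same units (e.g. millimetres).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```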

8 Conclusion and Future Work

Continuously and automatically measuring turning characteristics in a free-living environment could enhance the current clinical rating scale by capturing the true motor symptoms which fluctuate hour by hour. This study is the first effort to detect the fine-grained angle of turn in gait using video data where people are unscripted and in a home setting. In this paper, we introduced the Turn-REMAP and Turn-H3.6M datasets. Turn-REMAP is the first dataset of free-living turning movements that includes clinician-annotated, quantised turning angle ground truth for both PD patients and control subjects across various scenarios and locations. Turn-H3.6M is derived from the lab-based, large-scale 3D pose benchmark known as Human3.6M, curated specifically for turning data analysis. To estimate the turning angle of a subject in raw RGB videos, we utilised a deep learning framework to reconstruct human joints in 3D space. We then proposed a turning angle calculation approach based on joint rotation. Our framework was applied to the unique Turn-REMAP dataset and further validated on Turn-H3.6M.

While the accuracy of our models may not yet allow their application in the real world, they nevertheless establish a previously non-existent baseline and offer valuable insights for future video-based research in challenging free-living scenarios. Our sample size of 24 people, including 12 people with PD, demonstrates that our approach to detecting turning angles is promising and provides a proof of concept.

Automatically computing turning angles in a free-living environment is foundational for future longitudinal, in-home monitoring of PD. There are many potential avenues to build upon our work for more accurate turning angle estimation. Although Turn-REMAP and Turn-H3.6M only consist of trimmed turning clips, our methods can be extended to untrimmed videos. We could also infer additional turning metrics, such as turning speed, from the estimated turning angle. These metrics can be used to classify PD and control subjects, infer clinical rating scores of disease severity, or assess on/off medication status in free-living video recordings. Another extension for more accurate turning angle computation could be to replace our skeleton model with other models, such as via Human Mesh Recovery [goel2023humans], which could offer additional parameters for turning angle estimation.

Acknowledgments

The authors gratefully thank the study participants for their time and effort in participating in this research. We also acknowledge the local Parkinson’s and other Movement Disorders Health Integration Team (Patient and Public Involvement Group) for their assistance at each step of the study design. This work was supported by the SPHERE Next Steps Project funded by the EPSRC (grant EP/R005273/1), the Elizabeth Blackwell Institute for Health Research at the University of Bristol, the Wellcome Trust Institutional Strategic Support Fund (grant 204813/Z/16/Z), Cure Parkinson’s Trust (grant AW021), and by IXICO (grant R101507-101). Dr Jonathan de Pass and Mrs Georgina de Pass also made a charitable donation to the University of Bristol through the Development and Alumni Relations Office to support research into Parkinson’s Disease. The first author was supported by the Engineering and Physical Sciences Research Council Digital Health and Care Centre for Doctoral Training at the University of Bristol (UKRI Grant No. EP/S023704/1).

Conflict of Interest Statement

The authors have no conflict of interest in this work.

Author contributions

QC: Conceptualisation; Data curation; Formal analysis; Investigation; Methodology; Validation; Visualisation; Writing - original draft; Writing - review & editing. CM: Resources; Data curation; Formal analysis; Investigation; Supervision; Methodology; Writing - original draft; Writing - review & editing. AS: Conceptualisation; Data curation; Methodology. AM and AW: Supervision; Project administration. MM: Supervision; Project administration; Methodology; Writing - review & editing.

\printbibliography
