At Home Turning Angle Estimation for Parkinson’s Disease Severity Assessment

Qiushuo Cheng (a), Catherine Morgan (b,c), Arindam Sikdar (a), Alessandro Masullo (a), Alan Whone (b,c), Majid Mirmehdi (a)
(a) Faculty of Engineering, University of Bristol, UK
(b) Translational Health Sciences, University of Bristol, UK
(c) North Bristol NHS Trust, Southmead Hospital, Bristol, UK
Corresponding Author: wl22741@bristol.ac.uk

Abstract

People with Parkinson’s Disease (PD) often experience progressively worsening gait, including changes in how they turn around, as the disease progresses. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings, leaving gait performance outside these controlled environments unaccounted for. Measuring turning angles continuously and passively is a component step towards using gait characteristics as sensitive indicators of disease progression in PD. This paper presents a deep learning-based approach to automatically quantify turning angles by extracting 3D skeletons from videos and calculating the rotation of hip and knee joints. We utilise state-of-the-art human pose estimation models, FastPose and Strided Transformer, on a total of 1386 turning video clips from 24 subjects (12 people with PD and 12 healthy control volunteers), trimmed from a PD dataset of unscripted free-living videos in a home-like setting (Turn-REMAP). We also curate a turning video dataset, Turn-H3.6M, from the public Human3.6M human pose benchmark with 3D ground truth, to further validate our method. Previous gait research has primarily taken place in clinics or laboratories evaluating scripted gait outcomes, but this work focuses on free-living home settings where complexities exist, such as baggy clothing and poor lighting. Due to difficulties in obtaining accurate ground truth data in a free-living setting, we quantise each angle into the nearest 45° bin based on the manual labelling of expert clinicians. Our method achieves a turning calculation accuracy of 41.6%, a Mean Absolute Error (MAE) of 34.7°, and a weighted precision (WPrec) of 68.3% for Turn-REMAP. On Turn-H3.6M, it achieves an accuracy of 73.5%, an MAE of 18.5°, and a WPrec of 86.2%. This is the first work to explore the use of single monocular camera data to quantify turns by PD patients in a home setting. All data and models are publicly available, providing a baseline for turning parameter measurement to promote future PD gait research.

keywords:

Turning Angle, Human Pose Estimation, Gait Analysis, Parkinson’s Disease, Digital Biomarker

journal: Computers in Biology and Medicine


1 Introduction

Parkinson’s disease (PD) is a progressive neurodegenerative movement disorder, characterised by symptoms such as slowness of movement and gait dysfunction [Jankovic2008], which fluctuate across the day but progress slowly over the years [Holden2018]. Currently, treatment of PD relies on therapies which improve symptoms. There are no treatments available which modify the course of the underlying disease (so-called disease-modifying treatments, or DMTs), despite there being multiple putative DMTs showing promise in laboratory studies [Lang2013]. One reason for the slow development of DMTs is the dearth of sensitive, frequent, objective biomarkers to enhance the current gold-standard clinical rating scale [Goetz2008] to measure the progression of PD. This gold-standard clinical rating scale, the Movement Disorders Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [Goetz2008], includes subjective questionnaires concerning gait and mobility experiences, along with clinicians’ ratings of scripted activities performed by the participants. The assessments typically occur within clinical settings over short durations, offering only a "snapshot" of symptoms which vary on an hourly basis. It also has limitations, including its non-linear and discontinuous scoring system, inter-rater variability [Post2005] and the Hawthorne effect [Paradis2017] of being observed while mobilising [Robles-Garcia2015, Morberg2018].

Gait and turning abnormalities are common features of PD, with over half of patients reporting difficulties with turning [Bloem2001] – when someone moves round on their axis while upright, changing the direction they face. Turning changes associated with PD include the 'en bloc' phenomenon where upper and lower body segments turn simultaneously [Spildooren2013], a longer duration of turn, less accurate turn completion, a narrower base of support [Mellone2016] and the use of 'step turns' rather than 'pivot turns' [Hulbert2015]. More than 40% of daily steps are taken during turning [Glaister2007], and turning abnormalities can predispose to falls; thus turning parameters could potentially be used as measures predicting the time to falls in a patient with PD [Bloem2001]. Furthermore, if a fall happens during turning, it is up to 8 times more likely to result in a hip fracture [Cummings1994]. In unmedicated early-stage PD, gait parameters from turning are more sensitive to change compared to straight-ahead gait outcomes [Zampieri2010], making measuring aspects of turns potentially of specific use in clinical trials of disease-modifying interventions, which typically recruit recently diagnosed patients [Stephenson2021]. People with PD turn differently when being watched by a clinician [Morgan2022PRD], so measuring turning passively in uncontrolled home settings (Figure 1) could give information about mobility not captured by face-to-face assessments in the clinic.

[Figure 1]

Being able to measure the angle of turn therefore could be very helpful in PD assessment, for use in clinical trials and clinical practice. Turning angle alone provides useful insight into the progression of the disease: people with PD take larger angles of turn when they are taking medications, compared to when they withhold their symptom-improving therapies [Conradsson2017]. Turning in gait also comprises other potential measurable elements including foot strike angle, arm swing and turn speed. Calculating the changes of angle over time could help to analyse and interpret these metrics of turning. Previous work shows that the number of turning steps and the turn duration, from unplanned and pre-planned turns, can distinguish between PD medication states (whether someone takes or withholds medication) [Morgan2022PRD]. Turning speed can be used to differentiate between healthy control and PD participants [Mancini2015]. These turning parameters correlate strongly with the MDS-UPDRS scores, showing their potential to evaluate disease severity and progression [Morgan2022PRD]. Therefore, an accurate and robust method to measure the magnitude of the turning serves as the cornerstone for building more sensitive markers of the disease.

In this paper, we present a deep learning-based pipeline to estimate turning angles. We adopt state-of-the-art human pose estimation models to extract 3D human body joint coordinates from monocular RGB videos. The angle of the turn can be calculated by leveraging the orientation of the paired (left and right) hip and knee joints, which lie on the frontal plane of the human body and serve as reliable indicators of the direction in which the body is facing. We apply the proposed pipeline on Turn-REMAP, a dataset of turning video clips trimmed from the unique REMAP dataset [Morgan2023_REMAPOpen, morgan2023multimodal], which includes unscripted spontaneous turning activities from passively collected home monitoring videos. To evaluate our proposed method, a retrospective analysis of the trimmed video clips by clinicians serves as the ground truth reference. As it is hard to acquire the precise degree of turning using the naked eye, we adopted a special quantisation method: different from the reference technique used by previous studies [Pham2017, shin2021quantitative], we classify turning angles into the nearest discrete 45° bin.
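To make this binning concrete, the minimal sketch below shows one way a continuous angle estimate could be snapped to the nearest 45° bin; the function name and the example values are illustrative only and are not taken from the study's code or data.

```python
import numpy as np

def quantise_to_45(angle_deg: float) -> int:
    """Snap a continuous turning angle (degrees) to the nearest 45-degree bin."""
    return int(np.round(angle_deg / 45.0) * 45)

# Example: a predicted turn of 158.3 degrees lies in the 157.5-202.5 range,
# so it is assigned to the 180-degree bin.
print(quantise_to_45(158.3))   # -> 180
print(quantise_to_45(97.0))    # -> 90
```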

To the best of our knowledge, this study is the first to use computer vision technology to measure turning angles in free-living videos for people with PD without relying on the traditional gold standard of motion capture reference typically used in laboratory settings. We also curate Turn-H3.6M comprising 619 turning clips trimmed from the public benchmark Human3.6M [h36m_pami], obtained under controlled settings, and we apply the same turning angle calculation pipeline for comparative evaluation. Due to the availability of 3D data in Turn-H3.6M, we can also compute the turning speed.

In summary our contributions are as follows:

1. We introduce the Turn-REMAP dataset, which provides the first collection of free-living turning videos recorded in a home environment, with both PD patients and healthy controls. The dataset includes discretised ground truth turning angles generated by expert clinicians.

2. We curate turning videos from the large-scale laboratory-based Human3.6M [h36m_pami] benchmark dataset, which includes motion-capture ground truth.

3. Utilising human pose estimation models, we propose a pipeline to estimate turning angles from single-view RGB videos and validate this pipeline on our proposed datasets. This is the first work to estimate turning angles from natural free-living video data captured from people living with PD.

Next, in Section 2, we review the literature which identifies the gap in current PD gait research and provides the context for our contributions in using free-living video-based settings. Section 3 provides a detailed introduction to our datasets, Turn-REMAP and Turn-H3.6M. Then, Section 4 introduces our methodology for the turning angle estimation pipeline, followed by Section 5, which provides implementation details and the evaluation results of the proposed method. Section 6 includes ablation studies to examine the effect of different design choices within the pipeline. In Section 7, we provide a detailed discussion of the experimental results and highlight the novelty and contribution of our work. Finally, we present our conclusion and outline potential future work that can be built upon these datasets and the proposed baseline methods in Section 8. Our turning measurement algorithms and the skeletons extracted from our datasets are available at a link that will be provided if published.

2 Literature review

In this section, we consider related literature in the two most pertinent aspects of our work, i.e., sensor-based turning angle estimation and human pose estimation in gait analysis.

Turning angle estimation – To acquire objective quantitative turning parameters for human motion analysis, inertial sensors consisting of gyroscopes and accelerometers [mariani2012shoe, El-Gohary2013, Mancini2015, DelDin2016, Pham2017, mancini2018turn, Hillel2019, Shah2020, Rehman2020] and floor pressure sensors [muro2014gait, haji2019turning, shah2020digital, shin2021quantitative] have been well explored over the years. Many algorithms using inertial sensors placed on shoes or belts have been validated against gold-standard motion capture systems or human raters, with reported accuracy at a sub-degree level [mariani2012shoe, El-Gohary2013, Pham2017], but these validations are limited to laboratory environments on scripted turning courses with a few predefined turns. Additionally, even though sensors give nearly ground truth readings under these restricted conditions, they require digital devices to be worn on the body of the participant, which raises issues of acceptability [AlMahadin2020] and usability [kubota2016machine]. Portable wearable sensors for gait evaluation are power-hungry and have limited memory storage, so there are significant burdens on both participants and professionals to replace, recharge, re-configure devices and transfer data manually. This hinders the generalisation of the proposed methods to different patient cohorts, especially in free-living environments where it is hard to control every relevant factor, such as the imperfect use and configuration of wearables. It has been shown that sensor-based algorithms evaluating gait translate poorly from laboratory to home [Toosizadeh2015]. Furthermore, several papers have demonstrated that people mobilise differently in the laboratory compared to home settings [Robles-Garcia2015, DelDin2016, Morberg2018, Hillel2019].

Another inherent limitation of wearable-based methods is that they can only provide kinematic parameters for a few body parts, rather than a holistic view of the position and orientation of the entire body. Furthermore, it is shown in [Rehman2020] that placing wearables on different locations of the body (head, neck, lower back and ankle) causes inconsistency in estimated turning angles. Video-based markerless approaches [kidzinski2020deep, shin2021quantitative, stenum2021two] present a passive and less obtrusive solution to these innate problems of wearable-based approaches. However, compared to marker-based approaches, the accuracy [wade2022applications] of joint angle estimation is not yet good enough for clinical application. The work in [shah2020inertial, wade2022applications] has shown that reported performances are inconsistent and hard to reproduce outside laboratory environments and across different patient cohorts, as studies often use off-the-shelf human pose analysis software and hardware in experiments set up in restricted laboratories with scripted activities. To develop and validate robust video-based gait analysis algorithms, the challenge lies in acquiring videos that are representative enough across different patients in different scenarios, including clinics, hospitals and homes. We gather and annotate a dedicated, free-living video dataset to estimate turning angles, complementing existing research on gait analysis for PD.

Gait Analysis for Parkinson’s Disease – Gait analysis plays an instrumental role in many clinical applications and is studied closely in PD [di2020gait, zanardi2021gait]. With widely available open-source pose estimation models applied to movement videos collected during clinical assessments, most state-of-the-art works analyse such patient videos using deep learning models and compare their outcomes against clinicians’ annotations to establish the clinical meaning of the measured gait features.

Sato et al. [sato2019quantifying] used OpenPose [openpose] to extract skeleton keypoints and designed handcrafted features to measure step cadence. They do not perform any quantitative evaluation of the accuracy of their measured parameters and only provide a correlation analysis of their measured gait features with MDS-UPDRS gait scores. Rupprechter et al. [rupprechter2021clinically] also applied OpenPose [openpose] and hand-crafted features from extracted skeletons, but on a large-scale video dataset of hundreds of PD patients, topped with a machine learning classifier to output MDS-UPDRS scores. Similarly, Sabo et al. [sabo2022estimating] used Kinect-generated 3D data with clinical annotations to fine-tune ST-GCN [stgcn] to regress the MDS-UPDRS gait scores, although the model was originally designed for the task of action recognition. Lu et al. [lu2020vision] developed and trained their own deep learning model using self-recorded gait examination videos along with similar gait videos from the CASIA Gait Database [wang2003silhouette] to extract 3D skeletons and predict MDS-UPDRS gait scores in an end-to-end manner. Guo et al. [guo2021multi] developed a graph convolutional neural network to predict MDS-UPDRS gait scores for 157 PD participants. Mehta et al. [mehta2021towards] deployed existing pose estimation models [osokin2018real, asif2020sshfd] to extract 3D skeletons from sit-to-stand videos, and then tested several deep learning models [resnet, stgcn, li2018co] to infer MDS-UPDRS sub-scores of gait disorder and bradykinesia.

Inferring subscores of MDS-UPDRS from videos could potentially contribute to early diagnosis and remote screening of PD, and enable more frequent self-assessment. However, current studies are still confined within the limitations of traditional clinical rating scales. As shown in [liu2022monitoring], changes in MDS-UPDRS at baseline, month 6 and month 12 could not reflect the decline in gait speed caused by PD relative to healthy controls. This demonstrates that the traditional rating scale cannot sensitively detect the progression of a disease which evolves slowly over the years. Therefore, to complement current clinical tools, it is necessary to quantitatively validate the accuracy of measured gait parameters, and then translate the output to the associated clinical annotations.

We argue that for PD it is important to accurately measure absolute gait parameters, like turning angle, in a natural, non-hospital setting, to build sufficiently sensitive markers that reliably track changes in gait throughout the day and over the years. However, most of the previous pose-based research on gait analysis with angular measurement focuses on sagittal [kidzinski2020deep, abe2021openpose] or coronal [shin2021quantitative, tang20222d] joint angles. Only [cao2019human, kondragunta2019estimation, shin2021quantitative] use pose estimation algorithms for turning analysis, but all three address turn detection, or other gait metrics like step length, rather than turning angle. Also, other sensors like 3D depth cameras [clark2019three] have been used for human pose estimation using depth maps or point clouds. While such an approach works well in some applications like virtual reality, there are many unresolved challenges, such as handling self-occlusions or multi-person detection [xu2021review]. In the complex, unscripted scenario of monitoring everyday activities, it may not be suitable to use depth sensors alone; however, combining depth with RGB data could potentially lead to better results.

3 Datasets

Our proposed turning angle measurement approach is evaluated on the turning scenes of the recently released free-living dataset, REMAP [Morgan2023_REMAPOpen], and a curated dataset extracted from the public pose estimation benchmark Human3.6M [h36m_pami]. In this section, we discuss the details of the video data and how our annotations enable quantitative evaluation of our method.

Turn-REMAP – REMAP [Morgan2023_REMAPOpen] includes PD and healthy participants engaging in actions, such as sit-to-stand transitions or walking turns, within a home environment. These specific actions were recorded during free-living, undirected situations, as well as formal clinical evaluations. We present Turn-REMAP, a subset of this data comprising all its turning actions, loosely-scripted and spontaneous (see Figure 1). The video data was collected using wall-mounted Microsoft Kinect cameras installed on the ground floor (communal areas) of a test-bed house [sphere2015], which captured red-green-blue (RGB) and depth data 2-3 hours daily (during daylight hours at times when participants were at home). The acceptability of using such high-resolution video recordings for validation purposes in home settings in PD has been studied in [McNaney2022, Morgan2022JMIR]. Table 1 summarises the details of Turn-REMAP. The dataset contains 12 spousal/parent-child/friend-friend pairs (24 participants in total) living freely in this sensor-embedded smart home for five days at a time. Each pair consists of one person with PD and one healthy control volunteer (HC). This pairing was chosen to enable PD vs HC comparison, for safety reasons, and also to increase naturalistic social behaviour (particularly amongst the spousal pairs who already lived together). Of the 24 participants, five females and seven males have PD. The average age of the participants is 60.25 (PD 61.25, Control 59.25) and the average time since PD diagnosis for the person with PD is 11.3 years (range 0.5-19).

Table 1: Summary of Turn-REMAP.
# Videos | # Frames | # PD | # HC | Avg. PD Age | Avg. HC Age | Avg. Age | Avg. Time Since Diag.
1386     | 96984    | 12   | 12   | 61.25       | 59.25       | 60.25    | 11.3 years
[Figure 2]

The RGB videos were watched post-hoc by medical doctors who had undertaken training in the MDS-UPDRS rating score, including gait parameter evaluation. Two clinicians watched up to 4 simultaneously captured video files at a time using ELAN software [aguera2011elan] to manually annotate the videos to the nearest millisecond, to the extent possible by a human rater. A pre-prepared annotation template with controlled vocabularies in drop-down menus was used to reduce the variability in the annotations created [Morgan2021_labels]. The parameters annotated included: turning angle estimation (90°-360° in 45° increments, shown in Figure 2) and duration of turn (seconds:milliseconds). Our definition of a turning episode is characterised by the initiation of pelvis rotation, continuing until the completion of the movement, which differs from a turn made within a walking arc, like walking around a table. The duration of labelled data recorded by the cameras for PD and HC is 72.84 and 75.31 hours, respectively.

Two clinicians annotated 50% of the turns each. Around 50% of the total number of annotations were cross-checked (randomly selecting 6 pairs from 12) by both clinician annotators, blinding the cross-checking clinician to the turning annotations produced by the other. Cohen’s Kappa [Cohen1960] statistic was calculated to evaluate inter-rater reliability. Any discrepancies were recorded, discussed, and resolved by the clinician raters, with a final review by a movement disorders specialist. The two clinician raters had an almost perfect [McHugh2012] inter-rater agreement for turning angle annotations (Cohen’s kappa = 0.96).
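As an illustration of how this agreement statistic can be computed, the hedged sketch below applies scikit-learn's cohen_kappa_score to two hypothetical lists of annotator labels; the values shown are made up for demonstration and are not the study's annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical cross-checked turning-angle bins (degrees) from two raters;
# these numbers are illustrative only, not taken from the REMAP annotations.
rater_a = [90, 180, 135, 180, 90, 225, 180, 90]
rater_b = [90, 180, 135, 180, 90, 180, 180, 90]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```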

In addition to free-living movements, the turning clips in Turn-REMAP also include videos where the participants take part in clinical assessments and loosely-scripted activities (see Table 2). In the clinical assessments, participants underwent a series of predefined motor tasks that included completing walking and turning courses that are integral to the MDS-UPDRS (III) motor subscore [Goetz2008]. Additionally, they were required to perform the timed-up-and-go (TUG) test [Podsiadlo1991TUG] twice. Another task involved a 10-metre walk that incorporated three 180° turns, which participants carried out at their normal, fast, and slow paces. Naturally, the turning clips for these predefined 180° turns are labelled as 180°. Compared to free-living activities, the loosely-scripted activities consisted of food preparation tasks undertaken with only broad instructions and no one observing the participants.

Table 2: Number of Turn-REMAP turning clips per angle label and scenario.
Scenario            | 90° | 135° | 180° | 225° | Total
Loosely-Scripted    | 316 | 36   | 32   | 2    | 386
Clinical Assessment | 7   | 1    | 41   | 0    | 49
Free-living         | 580 | 179  | 188  | 4    | 951
Total               | 903 | 216  | 261  | 6    | 1386

Turn-H3.6M – To further validate our proposed approach, we curated Turn-H3.6M, a specific turning action video subset of the Human3.6M benchmark [h36m_pami] which consists of 3.6 million frames of RGB and 3D data of 11 professional actors performing various activities in a customised lab environment, such as walking a dog, smoking, taking a photo or talking on the phone. The dataset includes 3D human pose ground truth data.

Previously, IMU-based turning estimation [Rehman2020] has shown that, compared to the head, neck and ankle, sensory information from the lower back provides a more accurate estimation of turning angle. Following this, we used the 3D ground truth to locate frame sequences in the Human3.6M dataset with a consecutive hip rotation equal to or larger than 45° (see example in Figure 3). The 45° quantity corresponds to the increment between the angle labels within our bins and represents the minimum rotation required to classify a motion as a turning motion. The orientation of the ground truth hip joints serves as the ground truth turning angle, and further enables the calculation of actual turning speeds, allowing for comparison with speeds derived from predicted angles.
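A minimal sketch of this curation step is given below, assuming ground-truth left/right hip positions are available per frame in a world coordinate system. The function names and the frame-wise orientation definition (the angle of the left-to-right hip vector in the ground plane) are our own illustrative choices, not the released curation code.

```python
import numpy as np

def hip_orientation_deg(left_hip_xy: np.ndarray, right_hip_xy: np.ndarray) -> np.ndarray:
    """Per-frame orientation (degrees) of the left->right hip vector in the ground plane."""
    v = right_hip_xy - left_hip_xy                     # shape (T, 2)
    return np.degrees(np.arctan2(v[:, 1], v[:, 0]))    # shape (T,)

def cumulative_rotation_deg(orientation_deg: np.ndarray) -> float:
    """Total rotation over a clip: sum of absolute frame-to-frame changes, wrapped to [-180, 180]."""
    diffs = np.diff(orientation_deg)
    diffs = (diffs + 180.0) % 360.0 - 180.0            # wrap each step
    return float(np.sum(np.abs(diffs)))

def is_turning_clip(left_hip_xy, right_hip_xy, threshold_deg: float = 45.0) -> bool:
    """Keep a candidate clip only if the hips rotate by at least the threshold."""
    return cumulative_rotation_deg(hip_orientation_deg(left_hip_xy, right_hip_xy)) >= threshold_deg
```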

We manually searched through the entire Human3.6M dataset and extracted 619 legitimate turning video clips at 50 fps, comprising a total of 45,199 frames. The clips have an average duration of 1.5 seconds, and the turning angle ranges from 45.2° to 234.7° (see Table 3).

[Figure 3]
Table 3: Turn-H3.6M turning clips by angle bin.
Bin   | # Videos | Min Angle | Max Angle | Avg. Angle | Avg. Duration
45°   | 372      | 45.2°     | 67.5°     | 55.3°      | 1.1 s
90°   | 146      | 67.7°     | 112.0°    | 89.1°      | 1.5 s
135°  | 59       | 115.0°    | 156.4°    | 134.9°     | 2.4 s
180°  | 36       | 163.8°    | 199.6°    | 178.5°     | 2.9 s
225°  | 6        | 213.2°    | 234.7°    | 222.2°     | 3.2 s
Total | 619      | 45.2°     | 234.7°    | 79.6°      | 1.5 s

4 Methodology

In this section, we provide a detailed description of our proposed framework. Our overall pipeline has two major processes (Figure 4): 3D human joints estimation and turning angle calculation.

[Figure 4]

3D human joints estimation – Our approach comprises a two-stage framework where we first detect 2D human joint locations in each frame of the video sequence and then reconstruct them in 3D space, based on the spatio-temporal knowledge extracted from the temporal series of 2D skeletons using a deep learning model. Another way of estimating 3D human pose from videos is to use a single deep learning model to infer the 3D coordinates from the RGB pixels directly, in an end-to-end manner [pavlakos2017coarse, zhao2019semantic]. However, the more loosely coupled pipeline is chosen over end-to-end frameworks as it has been shown to achieve higher accuracy with significantly lower computational cost on almost all human pose estimation benchmarks [martinez2017simple, stridedTransformer, zhang2022mixste].

To detect the 2D body joints in each video frame, we apply FastPose [fang2022alphapose] as the 2D keypoint detector. The keypoint detector maps input video frames $\mathbf{V} \in \mathbb{R}^{T \times W \times H \times 3}$ into frames of 2D keypoint coordinates $\mathbf{K} \in \mathbb{R}^{T \times J \times 2}$, where $T$ is the number of frames in the video, $W$ and $H$ are the width and height of each frame, and $J = 17$ is the number of joints (keypoints) in our skeleton, following the skeleton model from Human3.6M [h36m_pami].

FastPose uses a top-down framework, which detects the human subject in each frame and estimates the joint coordinates in the form of a heatmap within a bounding box. The model utilises the classical ResNet [resnet] as the image feature extraction backbone, and then uses upsampling modules [wang2018understanding] and 1D convolution to generate heatmaps representing the probability of each pixel being a human joint. FastPose outputs a heatmap for each joint, selecting the pixel with the maximum value as the joint’s coordinate. Before feeding the video frames into FastPose, we apply standard preprocessing techniques [sun2019deep, fang2022alphapose]: rescaling, normalisation, and flip augmentation. The detected human bounding boxes are first rescaled to a uniform size of 256×196 resolution, as required by the model. Subsequently, the input is normalised by subtracting the mean pixel values for each RGB channel, which helps account for differences in brightness and contrast between frames. Additionally, we employ standard flip augmentation for both training and inference. In this process, we flip the input of FastPose to obtain a flipped output. By flipping the output back and averaging it with the original output, we derive the final prediction.
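The flip-test averaging described above can be sketched as follows. This is a generic illustration with hypothetical array names and a simplified left/right joint swap, not the exact FastPose implementation; it assumes heatmaps shaped (J, H, W) and a caller-supplied list of left/right joint index pairs.

```python
import numpy as np

def flip_test_average(heatmaps: np.ndarray,
                      heatmaps_flipped: np.ndarray,
                      flip_pairs: list) -> np.ndarray:
    """Average original heatmaps with heatmaps predicted from the mirrored image.

    heatmaps, heatmaps_flipped: arrays of shape (J, H, W); heatmaps_flipped comes
    from running the model on the horizontally flipped input. flip_pairs lists the
    (left, right) joint index pairs to swap back (depends on the joint format used).
    """
    restored = heatmaps_flipped[:, :, ::-1].copy()   # undo the horizontal mirror
    for left, right in flip_pairs:                   # swap left/right joint channels
        restored[[left, right]] = restored[[right, left]]
    return 0.5 * (heatmaps + restored)               # final averaged prediction
```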

Having obtained 2D coordinates of human joints, we reconstruct the missing depth information to lift skeletons from 2D to 3D. This is inherently an ill-posed problem, as a single 2D skeleton could have been projected by an infinite number of different 3D poses. However, adding temporal knowledge on how the 2D skeleton changes over time could potentially lead to a more accurate 3D reconstruction.

Numerous architectures have been suggested to address this ill-posed problem [martinez2017simple, zhang2022mixste]. We adopt the state-of-the-art model, Strided Transformer [stridedTransformer], to map the 2D keypoint series $\mathbf{K} \in \mathbb{R}^{T \times J \times 2}$ into a 3D skeleton sequence $\mathbf{S} \in \mathbb{R}^{T \times J \times 3}$. The Strided Transformer is a transformer-based architecture that lifts the 2D keypoints towards the 3D ground truth using the original transformer encoder [vaswani2017attention]. The output is then processed by another transformer encoder with strided convolutions, aggregating the sequence to reconstruct the 3D joints of the centre frame. Notably, the Strided Transformer introduces extra constraints to ensure temporal smoothness while simultaneously aggregating long-range information across the skeleton sequence.

Partial occlusions that are not severe are handled by the 2D keypoint detector FastPose by generating a plausible prediction of missing joint locations. As a result, a complete 2D skeleton is provided as a legitimate input to the Strided Transformer, which then reconstructs the 3D skeleton. Additionally, the Strided Transformer uses the context from surrounding frames to predict 3D joint locations in a central frame of a 27-frame sequence. When a partial occlusion occurs, this temporal smoothness constraint prevents drastic pose changes and helps estimate the joint’s 3D position using information from adjacent frames.

In summary, given an RGB video, the pipeline detects the location of joints on each frame and projects a time series of 2D human skeletons as input into a reconstruction model trained with 3D motion capture ground truth. The final output is then a time series of 3D human skeletons for each turning video clip.
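How the per-clip glue might look is sketched below under stated assumptions: `detect_2d_keypoints` stands in for FastPose and `lift_center_frame` for the Strided Transformer's centre-frame prediction over a 27-frame window. Both are hypothetical wrappers, and the edge-padding strategy is an illustrative choice rather than the authors' exact implementation.

```python
import numpy as np

WINDOW = 27                 # temporal receptive field assumed for the 2D-to-3D lifter
HALF = WINDOW // 2

def video_to_3d_skeletons(frames, detect_2d_keypoints, lift_center_frame):
    """frames: iterable of RGB images -> (T, 17, 3) array of 3D joints."""
    # Stage 1: per-frame 2D keypoints, shape (T, 17, 2).
    kpts_2d = np.stack([detect_2d_keypoints(f) for f in frames])

    # Pad the sequence at both ends so every frame can be the centre of a window.
    padded = np.concatenate([np.repeat(kpts_2d[:1], HALF, axis=0),
                             kpts_2d,
                             np.repeat(kpts_2d[-1:], HALF, axis=0)])

    # Stage 2: slide a 27-frame window and keep the lifted centre frame each time.
    skeletons_3d = [lift_center_frame(padded[t:t + WINDOW])
                    for t in range(len(kpts_2d))]
    return np.stack(skeletons_3d)               # (T, 17, 3)
```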

Turning angle estimation – The availability of 3D coordinates for skeleton joints, spanning from the head to the feet, offers the flexibility to conduct precise quantitative assessments of various movements. However, in the context of turning analysis, it is important to note that while the concept of turning has been previously defined, the specific definition of its magnitude or angle has not been explored in prior research focused on skeleton-based gait analysis. In our methodology, the frontal plane is selected over the sagittal or transverse planes to calculate the angle of turning, as it is the anatomical reference for the direction the body faces. The hip and knee joints on the frontal plane, when the human body is in an upright position, are used to estimate the turning angle in a plane parallel to the assumed flat ground plane, denoted as the XY plane.

The hip and knee vectors $\mathcal{H}_t$ and $\mathcal{K}_t$, respectively, at frame $t$ are defined as

$$\mathcal{H}_t = (x_t, y_t)_{left\_hip} - (x_t, y_t)_{right\_hip}, \qquad (1)$$
$$\mathcal{K}_t = (x_t, y_t)_{left\_knee} - (x_t, y_t)_{right\_knee}. \qquad (2)$$

For a turning video with $T$ frames, we calculate the angle $\theta$ between the corresponding vectors of two consecutive frames $t$ and $t+1$ for the knee and hip joints, and then sum and average the two angles as

$$\theta = \frac{1}{2}\sum_{t=0}^{T-2}\left(\sin^{-1}\!\left(\frac{\|\mathcal{H}_t \times \mathcal{H}_{t+1}\|}{\|\mathcal{H}_t\|\,\|\mathcal{H}_{t+1}\|}\right) + \sin^{-1}\!\left(\frac{\|\mathcal{K}_t \times \mathcal{K}_{t+1}\|}{\|\mathcal{K}_t\|\,\|\mathcal{K}_{t+1}\|}\right)\right). \qquad (3)$$

For our trimmed videos with duration $d$, the angular speed $\omega$ is subsequently computed as

$$\omega = \theta / d. \qquad (4)$$
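A minimal NumPy sketch of Equations (1)-(4) is given below; it assumes 3D skeletons are provided as an array indexed by frame and joint, with the ground plane spanned by the first two coordinates, and the joint indices are illustrative placeholders rather than a fixed skeleton convention.

```python
import numpy as np

# Illustrative joint indices; the actual indices depend on the skeleton format used.
L_HIP, R_HIP, L_KNEE, R_KNEE = 4, 1, 5, 2

def frame_pair_angle(v0: np.ndarray, v1: np.ndarray) -> float:
    """Unsigned angle (radians) between two 2D vectors, as in Eq. (3): asin(|cross| / (|v0||v1|))."""
    cross = abs(v0[0] * v1[1] - v0[1] * v1[0])
    denom = np.linalg.norm(v0) * np.linalg.norm(v1)
    return float(np.arcsin(np.clip(cross / (denom + 1e-8), 0.0, 1.0)))

def turning_angle_and_speed(skeletons: np.ndarray, duration_s: float):
    """skeletons: (T, J, 3) 3D joints; returns (theta in degrees, omega in degrees per second)."""
    hips = skeletons[:, L_HIP, :2] - skeletons[:, R_HIP, :2]     # H_t, Eq. (1)
    knees = skeletons[:, L_KNEE, :2] - skeletons[:, R_KNEE, :2]  # K_t, Eq. (2)

    theta = 0.0
    for t in range(len(skeletons) - 1):                          # Eq. (3)
        theta += 0.5 * (frame_pair_angle(hips[t], hips[t + 1]) +
                        frame_pair_angle(knees[t], knees[t + 1]))
    theta_deg = np.degrees(theta)
    return theta_deg, theta_deg / duration_s                     # Eq. (4)
```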

In ablations, we consider the shoulder, hip and knee joints, together and separately, and show that the combination of hip and knee vectors performs best.

The proposed turning angle estimation algorithm acts as a plug-in-and-play component for any 3D pose estimation model that produces 3D skeletons, providing a compatible method for future comparison.For each video clip, in terms of calculating the overall angle, it is mathematically the same as using only the first and last frame vectors, but the frame-by-frame manner could also inform us of how the velocity changes within one turning motion.

5 Experiments

Implementation and Evaluation – The experiments are conducted in PyTorch on a single NVIDIA 4060Ti GPU and a 12-core AMD Ryzen 5 5500 CPU. In our pipeline, the FastPose model is trained on the MSCOCO pose estimation dataset [lin2014microsoft] and the Strided Transformer [stridedTransformer] is trained on Human3.6M [h36m_pami], following the standard set-up of the related literature [martinez2017simple, zhang2022mixste]. These models are not optimised or fine-tuned on our free-living videos.

We evaluate our proposed method via three key metrics: accuracy, Mean Absolute Error (MAE) in degrees, and weighted precision (WPrec). Accuracy assesses the proportion of predicted angles that correctly fall into their respective bins, showing the categorical correctness of our predictions. MAE is calculated as the average absolute difference between the predicted values (angles, and additionally speed for Turn-H3.6M) and the ground truth. WPrec measures the percentage of true positive predictions among all positive predictions across angle bins, weighted by each bin's sample size [rabby2023multi]. For example, if a turn is predicted as 90°, WPrec indicates the probability that the actual turn is 90°.
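The sketch below shows one way these three metrics could be computed from binned predictions, assuming angles are given in degrees and quantised to 45° bins as described earlier; the weighted precision uses scikit-learn's "weighted" average, which weights per-class precision by class support, matching our reading of the metric.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

def evaluate_turns(pred_deg: np.ndarray, gt_deg: np.ndarray):
    """Bin accuracy, MAE (degrees) and support-weighted precision for turning angles."""
    pred_bins = np.round(pred_deg / 45.0).astype(int) * 45
    gt_bins = np.round(gt_deg / 45.0).astype(int) * 45

    acc = accuracy_score(gt_bins, pred_bins)
    mae = float(np.mean(np.abs(pred_deg - gt_deg)))
    wprec = precision_score(gt_bins, pred_bins, average="weighted", zero_division=0)
    return acc, mae, wprec

# Toy usage with made-up numbers (not results from the paper):
acc, mae, wprec = evaluate_turns(np.array([176.0, 95.0, 120.0]),
                                 np.array([180.0, 90.0, 180.0]))
```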

Results on Turn-REMAP – We compared the predicted turning angle against the clinicians' annotations for Turn-REMAP. Based on the rotation of hip and knee joints, our method correctly estimates the angle for 41.6% of all the turns on average, with an overall MAE of 34.7° and a WPrec of 68.3% across 1386 videos.

We investigated turning in Turn-REMAP by the turning scenario, the location of the turn and the subject's condition. Table 4(a) reports the accuracy under the three scenarios of loosely scripted, clinical, and free-living. Across these scenarios, our model yields an accuracy ranging from 26.5% to 44.0%, an MAE ranging from 33.4° to 59.2° and a WPrec ranging from 66.2% to 79.1%, with overall averages of 36.0%, 42.5° and 71.7%, respectively. The performance on turns that happened during clinical assessments is significantly worse than the other two scenarios, marking it an outlier. This is largely due to the heightened occurrence of self-occlusion, which hampers the quality of the reconstructed skeleton. Notably, 40 out of 49 turns under clinical assessment are participants performing the predefined 180° turns of the TUG test in the narrow hallway.

Table 4(b) shows that the performance of our model for turns across different locations remains fairly consistent, with the accuracy ranging from 35.9% to 42.9% and an average accuracy of 38.6%. There is a wide range of variation for MAE, spanning from 21.7° to 41.3° with an average of 33.9°; a contributing factor to these results is how certain spaces are defined within Turn-REMAP. The kitchen, living room, and stairs are captured as open spaces with no occlusion from furniture, resulting in lower MAE and higher accuracy for turns in these areas. In contrast, the dining room and hallway show increased MAE and reduced accuracy due to frequent occlusions from a centrally located table in the dining room and self-occlusion in the hallway. WPrec ranges from 59.9% to 80.0%, with an average of 71.4%. Finally, in Table 4(c), we observe only a marginal difference between subjects with PD, who had an accuracy of 42.0%, an MAE of 34.4° and a WPrec of 68.2%, and control subjects, who had an accuracy of 41.0%, an MAE of 35.1° and a WPrec of 68.7%.

Table 4(a): Results on Turn-REMAP by turning scenario.
Metrics        | Scripted | Clinical | Free-living | Avg.
Accuracy_θ (%) | 37.6     | 26.5     | 44.0        | 36.0
MAE_θ (°)      | 34.8     | 59.2     | 33.4        | 42.5
WPrec_θ (%)    | 79.1     | 79.7     | 66.2        | 71.7
# Turns        | 386      | 49       | 951         | 462

Table 4(b): Results on Turn-REMAP by location.
Metrics        | Din. | Hall | Kit. | Liv. | Stairs | Avg.
Accuracy_θ (%) | 35.9 | 36.0 | 42.9 | 38.4 | 40.0   | 38.6
MAE_θ (°)      | 41.3 | 38.4 | 33.0 | 35.2 | 21.7   | 33.9
WPrec_θ (%)    | 71.4 | 75.2 | 70.3 | 59.9 | 80.0   | 71.4
# Turns        | 92   | 89   | 1062 | 138  | 5      | 277

Table 4(c): Results on Turn-REMAP by subject condition.
Metrics        | PD   | C    | Avg.
Accuracy_θ (%) | 42.0 | 41.0 | 41.5
MAE_θ (°)      | 34.4 | 35.1 | 34.8
WPrec_θ (%)    | 68.2 | 68.7 | 68.5
# Turns        | 747  | 639  | 693

Results on Turn-H3.6M – The availability of 3D ground truth in our curated dataset allows us to calculate the actual turning angle and turning speed, facilitating a direct comparison against the predictions of our model. Our proposed approach on the entire Turn-H3.6M dataset yields an average accuracy of 73.5% and an MAE of 18.5° for angle prediction, with a turning speed MAE of 15.5°/s and a WPrec of 86.2%.

As shown in Table 5(a), the proposed method yields an accuracy ranging from 50.0% to 80.6% and an MAE ranging from 13.4° to 20.7°, with averages of 71.6% and 16.1°, respectively, across the different turning angle bins. The MAE for turning speed ranges from 5.3°/s to 16.9°/s and improves for larger turning angles, possibly because larger turns may exhibit more pronounced changes in speed. We investigate the performance of our pipeline on different subjects in Table 5(b). Following previous works, such as [martinez2017simple, zhang2022mixste], our model is trained on subjects S1, S5, S6, S7, and S8, while S9 and S11 are held out for testing. For turning angle prediction, the accuracy spans from 63.2% to 80.0%, while the MAE varies between 13.3° and 24.7°, with respective averages across the different subjects of 74.1% and 17.8°. The MAE for turning speed spans from 6.9°/s to 25.3°/s, with an average of 13.8°/s across the different subjects. The WPrec ranges from 75.2% to 93.9%, with an average of 85.9%. Although not included in the training phase, the performance of our model on test subjects S9 and S11, in terms of turning angle and speed calculation, falls within the consistent range observed for the other subjects used in training, suggesting the potential for generalisation to previously unseen data. The performance on the turns of S7 stands out as an outlier, showing the poorest results for both speed and angle. A possible explanation could be that the turns of S7 have the lowest average turning angle compared to those of all other subjects. Specifically, 113 out of 144 turns are at 45°, an angle at which our model tends to underperform (Table 5(a)).

Table 5(a): Results on Turn-H3.6M by turning angle bin.
Metrics        | 45°  | 90°  | 135° | 180° | 225° | Avg.
Accuracy_θ (%) | 70.2 | 79.5 | 78.0 | 80.6 | 50.0 | 71.6
MAE_θ (°)      | 20.7 | 15.4 | 15.3 | 13.4 | 15.8 | 16.1
MAE_ω (°/s)    | 19.6 | 11.3 | 7.0  | 5.3  | 5.4  | 9.7
# Turns        | 372  | 146  | 59   | 36   | 6    | 124

Table 5(b): Results on Turn-H3.6M by subject.
Metrics        | S1   | S5   | S6   | S7   | S8   | S9   | S11  | Avg.
Accuracy_θ (%) | 64.1 | 75.9 | 80.0 | 63.2 | 78.6 | 76.7 | 80.0 | 74.1
MAE_θ (°)      | 21.6 | 16.2 | 16.9 | 24.7 | 13.3 | 17.3 | 14.8 | 17.8
MAE_ω (°/s)    | 15.8 | 12.3 | 14.3 | 25.3 | 6.9  | 14.3 | 7.8  | 13.8
WPrec_θ (%)    | 75.2 | 84.6 | 93.9 | 93.7 | 84.8 | 84.9 | 84.3 | 85.9
# Turns        | 39   | 116  | 95   | 144  | 56   | 129  | 40   | 88

Table 5(c): Results on Turn-H3.6M by activity.
Metrics        | Direc. | Eat. | Greet. | Phon. | Pos. | Disc. | Smok. | Walk. | Wait. | Photo | Avg.
Accuracy_θ (%) | 76.2   | 63.3 | 72.7   | 74.6  | 82.4 | 73.9  | 72.4  | 72.9  | 84.8  | 77.3  | 75.1
MAE_θ (°)      | 15.9   | 26.1 | 15.6   | 19.1  | 19.8 | 15.8  | 18.4  | 19.0  | 12.5  | 14.4  | 17.7
MAE_ω (°/s)    | 13.0   | 21.0 | 12.6   | 14.9  | 18.0 | 12.2  | 16.2  | 16.6  | 9.3   | 11.1  | 14.5
WPrec_θ (%)    | 86.2   | 90.0 | 87.0   | 86.3  | 96.1 | 79.6  | 91.6  | 86.2  | 90.3  | 85.5  | 87.9
# Turns        | 21     | 49   | 33     | 71    | 17   | 46    | 76    | 251   | 33    | 22    | 62

In Table 5(c), we see the results of turning angle prediction for turns performed while the subject carries out different actions. Accuracy fluctuates between 63.3% and 84.8%, and MAE spans a range of 12.5° to 26.1°, yielding average values of 75.1% for accuracy and 17.7° for MAE. Our predicted turning speed shows an MAE ranging from 9.3°/s to 21.0°/s, with an average of 14.5°/s. WPrec ranges from 86.2% to 96.1%, with an average of 87.9%. The original purpose of these pre-defined activities is to elicit varied and diverse human body poses. Although the numbers of turns across activities are imbalanced, the differences in performance can be attributed to the dynamics of movement, including speed and motion pattern.

6 Ablations

The accurate detection of 2D skeleton keypoints in each frame of our input clips is an important contributor to the overall accuracy of our method. Another fundamental concern is which single or combination of ‘body parts’ should be engaged for the computation of the turning angle. We investigate these two issues in our ablation study.

The effect of different 2D keypoints – We investigate how various 2D keypoint detectors impact the performance of turning angle estimation on Turn-H3.6M. We applied SimplePose [xiao2018simple], HRNet [sun2019deep] and FastPose [fang2022alphapose] as prospective 2D keypoint detectors and evaluated their performance in estimating turning angles. All three models were trained on the MS-COCO dataset [lin2014microsoft] following the same settings. HRNet and FastPose were chosen because they are state-of-the-art 2D keypoint detection models, while SimplePose, with its minimal yet effective design, was chosen to determine whether more complex models are merely overfitting the training dataset.

The MAE of these three models does not vary significantly, with values between 18.4° and 18.5°. Among them, FastPose offers the highest accuracy and significantly reduces the computational cost of detecting 2D keypoints (see Table 6).

Table 6: Effect of different 2D keypoint detectors on Turn-H3.6M.
2D Keypoint Input | Accuracy_θ (%) | MAE_θ (°) | Params | GFLOPs
with SimplePose   | 71.6           | 18.5      | 34.0M  | 406.9
with HRNet        | 71.4           | 18.4      | 63.6M  | 674.0
with FastPose     | 73.5           | 18.5      | 40.5M  | 246.7

The effect of using different joints – We also calculated the turning angle using different combinations of the knee, shoulder and hip joints to determine which body parts provide the best turning angle estimation. We chose to perform this ablation on Turn-REMAP instead of Turn-H3.6M because the ground truth for turning angles in Turn-REMAP is derived from the clinical expertise of movement disorder specialists. In contrast, the joints used to determine the turning angle ground truth in Turn-H3.6M have already been discussed and defined.

On the human frontal plane, similar to the knee and hip joints, the shoulder joints are also potentially good indicators of the orientation of the body [lee1985determination]. However, PD patients have difficulty maintaining lateral balance during weight shifts from one foot to the other while turning, demonstrating a greater inclination angle in the frontal plane than healthy controls [yang2016motion]. This suggests that, in PD, the shoulder joints may become less reliable for initiating turns, whereas combining the hip and knee joints shows less variability and may remain more stable in an upright stance. As shown in Table 7, the average predicted angle using both the hip and knee joints yields the best accuracy, while averaging all three sets of joints gives the lowest MAE for turning angle.

Table 7: Effect of different joint combinations on Turn-REMAP.
Selected Joints   | Accuracy_θ (%) | MAE_θ (°)
hip               | 39.7           | 36.3
knee              | 36.7           | 37.4
shoulder          | 38.5           | 36.4
hip+knee          | 41.6           | 34.7
hip+shoulder      | 40.3           | 35.7
knee+shoulder     | 41.1           | 34.4
hip+knee+shoulder | 41.5           | 34.3

7 Discussion

Previous methods for turning analysis have been developed primarily for laboratory or clinical settings to evaluate scripted activities [shin2021quantitative, ribeiro2022public, lee2023gait, zeng2023video]. In Turn-REMAP, we record gait videos in a home-like, unobtrusive environment with PD and control subjects, and provide quantitative evaluations of the accuracy and estimation errors of turning angles during free-living activities. Pham et al. [Pham2017] also measured turning in a free-living environment; however, their method measured turning angles from IMUs alone, while our method is video-based. Pham et al. [Pham2017] recorded videos to manually validate their estimated results and report an overall error of 0.06°, but we contend that estimating turning angles at a resolution accurate enough to achieve such low error measurements by examining videos with the naked eye is unreliable. Some other IMU-based studies [El-Gohary2013, mancini2018turn, nouredanesh2021fall] have also extended their methodology to home environments, but none of these studies validated the measurement accuracy in the free-living setting.

Although the overall measurement accuracy on our Turn-REMAP dataset is not yet robust enough for clinical diagnosis, it establishes a baseline for future, passive, video-based analysis of turning movements in indoor, free-living environments. Our manual examination of incorrectly classified video clips and their corresponding 3D skeletons revealed that depth reconstruction ambiguity [martinez2017simple, zhang2022mixste] is a significant factor affecting the accurate calculation of turning angles. Recovering the missing depth from a 2D image is inherently an ill-posed problem, as infinitely many 3D poses can project to the same 2D skeleton. Despite our utilised models being pretrained on large-scale laboratory 3D motion data, generalising this performance to reconstruct unseen poses in our in-the-wild PD turning dataset remains a substantial challenge.

In Turn-REMAP, we find that the performance of our method on turns in free-living and loosely scripted activities is better than in clinical assessments (Table 4(a)). The reason for the performance degradation on these turns during the clinical assessment is the heightened occurrence of self-occlusion, where 40 of the 49 turns are scripted 180° turns in a narrow hallway. This is confirmed by the findings in Table 4(b), which show that locations like the dining room and hall, which have more occlusions, tend to have lower accuracy and higher MAE. Additionally, we find there is no significant difference in the performance of predicting turning angles for PD and Control subjects (Table 4(c)), suggesting that PD-specific gait characteristics do not affect the performance of our method. In contrast, IMU-based turning measurement methods [salarian2009analyzing, El-Gohary2013, mancini2018turn, haji2019turning] rely heavily on setting thresholds on the angular velocity and relative orientation of a sensor attached to a single body part. Compared to skeleton-based models, these isolated sensory kinematic parameters are more easily affected by common PD symptoms, such as freezing of gait and slow turning speed [salarian2009analyzing].

[Figure 5(a)]
[Figure 5(b)]

We further illustrate a comparison of the distribution of turns across different angle labels in the Turn-REMAP dataset against the distribution of turns at the same angles in the Turn-H3.6M dataset in Figure 5(a). The values in Turn-H3.6M are significantly closer to the expected bin angles, while in Turn-REMAP the predicted angles tend to be underestimated. For the bins examined in Figure 5(a), the standard deviations for the Turn-REMAP dataset are 35.3°, 37.6°, 46.4° and 70.0°, compared to those for the Turn-H3.6M dataset at 22.2°, 20.5°, 16.8° and 21.5°, respectively. This shows that, compared to Turn-REMAP, there is less variability and uncertainty within each bin for predictions in Turn-H3.6M. The difference in performance is further shown in Figure 5(b), where we find that the distribution of MAE for Turn-REMAP has a wider spread, while in Turn-H3.6M, 89.5% of the errors are smaller than 40°.

These statistics reveal the challenges of generalising our pretrained human pose estimation models from the lab-based Turn-H3.6M dataset to the diverse, in-the-wild Turn-REMAP dataset. Different global position distributions [chai2023global], camera parameters [zhan2022ray3d], and diverse human body sizes, shapes and articulated movements [gong2021poseaug, gholami2022adaptpose] highlight the need to enhance model robustness and adaptability to better handle real-world variability. To bridge this gap and enhance the ability to generalise to new, unseen data, it is crucial to implement domain adaptation strategies in deep learning models and to conduct cross-dataset validation.

Our model achieves 73.5% accuracy on Turn-H3.6M. This performance is limited by the inherent design of existing pose estimation algorithms, which are not specifically engineered to tackle biomechanical challenges such as the analysis of turning characteristics. The training of these 2D-3D lifting models is usually guided by the Mean Per Joint Position Error (MPJPE) [h36m_pami] loss, which focuses on minimising the absolute distance between the locations of the ground truth joints and the predictions. However, this criterion does not sufficiently address the requirements for temporal smoothness or accurate angular estimation. Therefore, further work on turning analysis involves building a downstream turning analysis algorithm based on the extracted deep learning features.
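For reference, MPJPE can be computed as in the hedged sketch below: the mean Euclidean distance between predicted and ground-truth joints, shown here without the root-relative or rigid-alignment variants that some evaluation protocols apply.

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per Joint Position Error: mean Euclidean distance over frames and joints.

    pred, gt: arrays of shape (T, J, 3) in the same units (e.g. millimetres).
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))
```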

8 Conclusion and Future Work

Continuously and automatically measuring turning characteristics in a free-living environment could enhance the current clinical rating scale by capturing the true motor symptoms which fluctuate hour by hour. This study is the first effort to detect the fine-grained angle of turn in gait using video data where people are unscripted and in a home setting. In this paper, we introduced the Turn-REMAP and Turn-H3.6M datasets. Turn-REMAP is the first dataset of free-living turning movements that includes clinician-annotated, quantised turning angle ground truth for both PD patients and control subjects across various scenarios and locations. Turn-H3.6M is derived from the lab-based, large-scale 3D pose benchmark known as Human3.6M, curated specifically for turning data analysis. To estimate the turning angle of a subject in raw RGB videos, we utilised a deep learning framework to reconstruct human joints in 3D space. We then proposed a turning angle calculation approach based on joint rotation. Our framework was applied to the unique Turn-REMAP dataset and further validated on Turn-H3.6M.

While the accuracy of our models may not yet allow their application in the real world, they nevertheless establish a previously non-existent baseline and offer valuable insights for future video-based research in challenging free-living scenarios. Our sample size of 24 people, including 12 people with PD, demonstrates that our approach to detecting turning angles is promising and provides a proof of concept.

Automatically computing turning angles in a free-living environment is foundational for future longitudinal, in-home monitoring of PD. There are many potential avenues to build upon our work for more accurate turning angle estimation. Although Turn-REMAP and Turn-H3.6M only consist of trimmed turning clips, our methods can be extended to untrimmed videos. We could also infer additional turning metrics, such as turning speed, from the estimated turning angle. These metrics can be used to classify PD and control subjects, infer clinical rating scores of disease severity, or assess on/off medication status in free-living video recordings. Another extension for more accurate turning angle computation could be to replace our skeleton model with other models, such as via Human Mesh Recovery [goel2023humans], which could offer additional parameters for turning angle estimation.

Acknowledgments

The authors gratefully thank the study participants for their time and effort in participating in this research. We also acknowledge the local Parkinson’s and other Movement Disorders Health Integration Team (Patient and Public Involvement Group) for their assistance at each step of the study design. This work was supported by the SPHERE Next Steps Project funded by the EPSRC (grant EP/R005273/1), the Elizabeth Blackwell Institute for Health Research at the University of Bristol, the Wellcome Trust Institutional Strategic Support Fund (grant 204813/Z/16/Z), Cure Parkinson’s Trust (grant AW021), and by IXICO (grant R101507-101). Dr Jonathan de Pass and Mrs Georgina de Pass also made a charitable donation to the University of Bristol through the Development and Alumni Relations Office to support research into Parkinson’s Disease. The first author was supported by the Engineering and Physical Sciences Research Council Digital Health and Care Centre for Doctoral Training at the University of Bristol (UKRI Grant No. EP/S023704/1).

Conflict of Interest Statement

The authors have no conflict of interest in this work.

Author contributions

QC: Conceptualisation; Data curation; Formal analysis; Investigation; Methodology; Validation; Visualisation; Writing - original draft; Writing - review & editing. CM: Resources; Data curation; Formal analysis; Investigation; Supervision; Methodology; Writing - original draft; Writing - review & editing. AS: Conceptualisation; Data curation; Methodology. AM and AW: Supervision; Project administration. MM: Supervision; Project administration; Methodology; Writing - review & editing.

\printbibliography
