Background: Current research emphasizes the high prevalence and costs of low back pain (LBP). The STarT Back Tool was designed to support treatment decision making in primary care by helping to determine the prognosis of patients with non-specific LBP. The German version is the STarT-G. The cross-cultural translation of the tool followed a structured and widely accepted process, but to date it has been only partially validated, in a small sample. The aim of this study was to test the psychometric properties of the STarT-G (construct validity, discriminative ability, internal consistency and test-retest reliability) and to compare them with the values reported for the original English version.
Methods: A consecutive cohort study with a two-week retest was conducted among patients with non-specific LBP, aged 18 to 60 years, recruited from primary care practices. Questionnaires were collected before the first consultation and again two weeks later by post. The following reference standards were used: the Roland and Morris Disability Questionnaire, the Tampa Scale of Kinesiophobia, the Pain Catastrophizing Scale and the Hospital Anxiety and Depression Scale. The psychometric properties examined were the tool's discriminative ability, whether the psychosocial subscale formed a single factor, internal consistency, item redundancy, test-retest reliability, and floor and ceiling effects.
Results: A total of 228 patients were recruited, with a mean age of 42.2 (SD 11.0) years; 53% were female. The areas under the curve (AUC) for discriminative ability ranged from 0.70 (STarT-G subscale against the Pain Catastrophizing Scale; 95% CI 0.63 to 0.78) to 0.77 (STarT-G total against the composite reference standard; 95% CI 0.60 to 0.94). Factor loadings ranged from 0.49 to 0.74. Cronbach's alpha, used to assess internal consistency and item redundancy, was α = 0.52 for the total score and 0.55 for the subscale score. The test-retest reliability kappa values were 0.67 for the total score and 0.68 for the subscale score. No floor or ceiling effects were present.
Conclusions: The STarT-G shows acceptable psychometric properties, although they are not in exact agreement with those of the original English version. The items previously regarded as a psychosocial subscale may be better seen as an index of different individual psychosocial constructs. The relevance of using the tool at the point of consultation should be examined further.