Sample Size Design - Researchmap


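Study Group on Statistics for Improving the Quality of Reporting in Clinical Epidemiological Research (REQUIRE Study Group)
15th Research Meeting (2014/5/24)
Sample Size Design for Bivariate Analysis in Randomized Controlled Trials and Observational Studies
How to Report Sample Size Design in Randomized Controlled Trials
Kanako Ichikura, Doctoral Program, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University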
Introduction
Guideline on reporting quality for randomized controlled trials
(CONSORT 2010 Explanation and Elaboration: updated guidelines for
reporting parallel group randomised trials)
2
Introduction
“For scientific and ethical reasons, the
sample size for a trial needs to be planned
carefully, with a balance between medical
and statistical considerations.”
(CONSORT 2010 Explanation and Elaboration: updated guidelines for
reporting parallel group randomised trials)
3
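Introduction
Why sample size design matters
If the sample is too small...
・Significant results are unstable and more likely to be false positives (type I error)
・Power is low, so real effects are more likely to be missed (type II error, false negative)
→ Scientifically, it is important not to make the sample too small!
If the sample is too large...
・More data are collected than are needed
→ Ethically, it is important not to make the sample too large!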
4
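The trade-off on this slide can be made concrete with a small sketch. It assumes a two-sided two-sample z-test with equal group sizes and a standardized difference of 0.5; both are illustrative assumptions, not values from the slides. Approximate power rises with the per-group sample size and then flattens, which is why enrolling far beyond the required size adds little.

```r
# Approximate power of a two-sided two-sample z-test with n per group.
# d is the standardized mean difference (difference / SD); the value 0.5
# used below is an illustrative assumption, not a figure from the slides.
approx_power <- function(n, d, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  # Probability of rejecting in the direction of the true effect
  pnorm(d * sqrt(n / 2) - z_crit)
}

n_per_group <- c(10, 20, 50, 100, 200)
power <- approx_power(n_per_group, d = 0.5)
print(data.frame(n_per_group, power = round(power, 2)))
```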
Introduction
Background to the revision of the CONSORT statement
Scope of the initial CONSORT statement
・Parallel-group trials
・Superiority trials
→ Revise the statement so that it also covers other designs!
→ CONSORT 2008
→ CONSORT 2010
(Reporting of noninferiority and equivalence randomized trials: extension of
the CONSORT 2010 statement, JAMA, 2012, 308, pp2594-2604)
5
Introduction
Revision of the CONSORT statement with respect to sample size design
Item 7a: Sample Size
How sample size was determined.
+ (added for noninferiority and equivalence trials)
Whether the sample size was calculated using a noninferiority
criterion and, if so, what the noninferiority margin was.
(Reporting of noninferiority and equivalence randomized trials: extension of
the CONSORT 2010 statement, JAMA, 2012, 308, pp2594-2604)
6
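Today's contents
Prerequisites for sample size design in RCTs
1. Objective, hypothesis, design, and endpoint
2. Clinically meaningful difference
3. Parameters needed for the calculation
Procedure for, and reporting of, sample size design in RCTs
1. Reporting sample size design in superiority trials
2. Reporting sample size design in non-inferiority trials
3. Reporting sample size design in equivalence trials
4. Other issues
Summary and references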
7
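Prerequisites for sample size design
① The objective and hypothesis are clearly stated
② The design is valid and allows appropriate statistical methods to be used
③ The sample size design is based on a hypothesis test
④ The sample size design is based on the primary endpoint
⑤ The difference to be detected in the primary endpoint is clinically meaningful
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)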
8
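Objective
The main objective of a randomized controlled trial is to verify
efficacy and safety
Efficacy: whether a drug or treatment works
e.g. mortality, 5-year survival, readmission rate, symptoms, QOL ...
Safety: whether a drug or treatment causes harm
e.g. adverse events, side effects ...
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)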
10
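Hypothesis
Structuring the research question: PICO
Patients (Who is being studied?)
Intervention (What is the intervention?)
Comparison (Compared with what?)
Outcome (What is the result?)
Shunichi Fukuhara, 「リサーチ・クエスチョンの作り方 –診療上の疑問を研究可能な形に-」 (How to formulate a research question: turning clinical questions into a researchable form), NPO Institute for Health Outcomes and Process Evaluation Research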
11
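Hypothesis
Structuring the research question
P (patients): patients within 3 hours of acute myocardial infarction
I (intervention): dopamine hydrochloride ● mg
C (comparison): placebo ● mg
O (outcome): death at 24 hours
P (patients): patients with depression at their first visit
I (intervention): usual care + cognitive behavioral therapy (once a week × ■)
C (comparison): usual care
O (outcome): BDI (depressive symptom) score at 2 months
Shunichi Fukuhara, 「リサーチ・クエスチョンの作り方 –診療上の疑問を研究可能な形に-」 (How to formulate a research question: turning clinical questions into a researchable form), NPO Institute for Health Outcomes and Process Evaluation Research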
12
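Design
“Intervention (I)” and “comparison (C)” according to the objective
Standard treatment + new treatment vs standard treatment
To show the efficacy gained by adding a new treatment to the standard treatment
New treatment vs standard treatment
To show the efficacy of a treatment that replaces the standard treatment
To develop a new treatment that is not harmful compared with the standard treatment
New treatment vs standard treatment vs standard treatment + new treatment
Note: more complex designs that combine these also exist, but the general sample size procedures cannot be applied to them directly.
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)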
13
Design
Three designs according to the objective
① Superiority trial (test for superiority)
② Non-inferiority trial (test for non-inferiority)
③ Equivalence trial (test for equivalence)
Note: the CONSORT statement recommends stating the design in the title!
e.g. ~~~~~~: a randomised controlled superiority trial
e.g. Randomized noninferiority trial of ~~~~~~
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)
14
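Design
Three designs according to the objective
① Superiority trial (test for superiority)
[Figure: scale marked with δ and 0, with the labels “new drug is effective” and “standard drug is effective”]
Lesaffre E: Bull NYU Hosp Jt Dis. 2008;66(2):150-4.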
15
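Design
Three designs according to the objective
① Superiority trial (test for superiority)
Null hypothesis: mean of new drug - mean of standard drug ≤ δ
Alternative hypothesis: mean of new drug - mean of standard drug > δ
→ The difference between the new and standard drug is larger than the clinically meaningful difference
→ The new drug is more effective than the standard drug
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)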
16
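Design
Three designs according to the objective
② Non-inferiority trial (test for non-inferiority)
[Figure: scale marked with 0 and δ, with the labels “new drug is effective” and “standard drug is effective”]
Lesaffre E: Bull NYU Hosp Jt Dis. 2008;66(2):150-4.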
17
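Design
Three designs according to the objective
② Non-inferiority trial (test for non-inferiority)
Null hypothesis: mean of standard drug - mean of new drug ≥ δ
Alternative hypothesis: mean of standard drug - mean of new drug < δ
→ The difference between the new and standard drug is smaller than the clinically meaningful difference
→ The new drug is no less effective than the standard drug
Note: unlike equivalence trials, these are used mainly in treatment and prevention research
Note: often used when, compared with the standard drug, the new drug is less toxic, easier to administer, or less expensive
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)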
18
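Design
Three designs according to the objective
③ Equivalence trial (test for equivalence)
[Figure: scale marked with -δ, 0, and +δ, with the labels “new drug is effective” and “standard drug is effective”]
Lesaffre E: Bull NYU Hosp Jt Dis. 2008;66(2):150-4.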
19
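Design
Three designs according to the objective
③ Equivalence trial (test for equivalence)
Null hypothesis: |mean of new drug - mean of standard drug| ≥ δ
Alternative hypothesis: |mean of new drug - mean of standard drug| < δ
→ There is no clinically meaningful difference between the new and standard drug
→ The new drug is as effective as the standard drug
Note: unlike non-inferiority trials, these are used mainly in pharmacokinetics
Note: an equivalence hypothesis is considered when the effect of the new drug relative to the standard drug is unknown (this is the starting point of many clinical studies!)
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)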
20
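Design
Designs matched to the hypothesis for each objective
Verification of efficacy
Verification of safety
Axes of the table: safety and efficacy
                    | Equivalence (E) | Non-inferiority (N) | Superiority (S)
Equivalence (E)     | E/E             | E/N                 | E/S
Non-inferiority (N) | N/E             | N/N                 | N/S
Superiority (S)     | S/E             | S/N                 | S/S
Note: superiority / non-inferiority trial → one-sided test
Note: equivalence trial → two-sided test
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)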
21
Design
Efficacy and safety of zotarolimus-eluting and
sirolimus-eluting coronary stents in routine clinical
care (SORT OUT III): a randomised controlled
superiority trial
(Lancet, 2010; 375: 1090-1099)
We therefore aimed to compare the efficacy and safety
(defined by cardiac death, myocardial infarction, and stent
thrombosis) of the zotarolimus-eluting versus the extensively
used and validated sirolimus-eluting stent in a routine clinical
setting with no direct follow-up.
P: coronary artery disease
I: zotarolimus-eluting stent
C: sirolimus-eluting stent
O: efficacy and safety (cardiac death, myocardial infarction, stent thrombosis)
22
Design
Ranibizumab versus Bevacizumab for neovascular
age-related macular degeneration: Results from the
GEFAL noninferiority randomized trial
(Ophthalmology, 2013; 120: 2300-2309)
・・・, a noninferiority, double-masked, randomized trial
designed to assess the relative efficacy and safety profile of
ranibizumab and bevacizumab in neovascular AMD
administered with an as-needed regimen over a 1-year
period.
P: neovascular age-related macular degeneration (AMD)
I: ranibizumab
C: bevacizumab
O: efficacy and safety (the Methods specify best-corrected visual acuity)
23
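Endpoint
[Criteria for choosing an endpoint]
① A biologically and/or clinically important measure
② A measure appropriate to the study objective and field
e.g. coronary artery disease → mortality
e.g. heart failure → mortality, number of hospitalizations, exercise tolerance
e.g. hypertension → blood pressure, cardiovascular mortality
e.g. asthma → forced expiratory volume in 1 second (FEV1)
e.g. Alzheimer's disease → cognitive function scale
e.g. osteoarthritis → joint tenderness, pain, function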
24
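Endpoint
[Criteria for choosing an endpoint]
③ The endpoints are not strongly correlated with each other
④ There is sufficient power for the statistical hypothesis
⑤ Keep the number as small as possible (four at most)
→ However, the sample size is calculated from a single primary endpoint!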
25
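Clinically meaningful difference
[How to set the clinically meaningful difference δ]
What is the clinically meaningful difference (margin) δ?
Superiority: mean of new drug - mean of standard drug > δ
Non-inferiority: mean of standard drug - mean of new drug < δ
Equivalence: |mean of new drug - mean of standard drug| < δ
→ It should be based on both statistical reasoning and clinical judgment
→ There is no standard rule!!
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)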
27
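Clinically meaningful difference
[How to set the clinically meaningful difference δ]
Example 1) Non-inferiority trials
Note: δ for a non-inferiority trial should be chosen with reference to the results of placebo-controlled studies designed under conditions similar to the planned trial
Note: δ for a non-inferiority trial should be smaller than the smallest effect size expected in a comparison with placebo (so it may be smaller than clinical judgment alone would suggest)
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)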
28
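Clinically meaningful difference
[How to set the clinically meaningful difference δ]
Example 2) When δ is defined in guidelines or similar documents
Note: for anti-infective drugs, the recommended δ (the difference from the new drug) depends on the cure rate of the active control
Response rate for the active control (%) | δ (%)
50 – 80                                  | 20
80 – 90                                  | 15
90 – 95                                  | 10
> 95                                     | 5
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)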
29
Clinically meaningful difference
[How to set the clinically meaningful difference δ]
Example 3) Bioequivalence trials
Note: when showing in healthy volunteers that a generic product has the same biological effect as an approved drug, δ for the difference in means of log-transformed values such as the area under the blood or plasma concentration-time curve (AUC) or the maximum concentration (Cmax) is set as follows.
δ = log(1.25)
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)
30
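As a quick check of what this margin means on the original scale, the sketch below back-transforms ±log(1.25). It assumes natural logarithms, which is the usual convention for AUC and Cmax but is not stated on the slide.

```r
# Equivalence margin on the log scale (natural log assumed)
delta <- log(1.25)

# Back-transform the limits -delta and +delta to ratios of geometric means
ratio_limits <- exp(c(-delta, delta))

print(round(delta, 4))  # margin on the log scale
print(ratio_limits)     # limits for the test/reference ratio
```

Under that assumption the limits correspond to a test/reference ratio between 0.80 and 1.25.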
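Clinically meaningful difference
[How to set the clinically meaningful difference δ]
Example 4) When no clinical data on the study drug are available
Note: a recommended approach rests on the observation that the clinically important effect sizes seen in most clinical studies fall between 0.25 and 0.5
δ = 0.25 to δ = 0.5
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)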
31
Parameters needed for the calculation
① Power: 1-β
② Significance level: α
③ Effect size or margin
④ [For continuous data] Standard deviation (SD) of the variable
Note: studies reporting all required parameters: 53%
(among New England Journal of Medicine, JAMA, The Lancet, BMJ, and PLoS Medicine)
Charles P. et al., Reporting of sample size calculation in randomised controlled trials: review,
BMJ, 2009
33
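For orientation, these four parameters enter the textbook normal-approximation formula for comparing two means with equal allocation. This is a standard result given here as a reference point, not a formula shown on the slides; σ is the common standard deviation and δ the difference to be detected.

```latex
n_{\text{per group}} = \frac{2\,\sigma^{2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\delta^{2}}
```

Raising the power 1-β or lowering α increases n, and halving δ quadruples it.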
Parameters needed for the calculation
① Power
Power, (1-β)×100 | n (%)
80%              | 107 (54%)
85%              | 9 (5%)
90%              | 66 (33%)
95%              | 4 (2%)
Other            | 14 (7%)
Note: among New England Journal of Medicine, JAMA, The Lancet, BMJ, and PLoS Medicine
→ To minimize type II (β) error (to avoid false negatives), set the power as high as possible!!
Charles P. et al., Reporting of sample size calculation in randomised controlled trials: review,
BMJ, 2009
34
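Parameters needed for the calculation
② Significance level (α risk)
α                              | n (%)
0.05                           | 183 (96%)
  two-sided test               | 119 (65%)
  one-sided test               | 7 (4%)
  not stated                   | 57 (31%)
0.025 (one-sided)              | 2 (1%)
Adjusted for interim analyses  | 6 (3%)
Note: among New England Journal of Medicine, JAMA, The Lancet, BMJ, and PLoS Medicine
→ To minimize type I (α) error (to avoid false positives), set the significance level as low as possible!!
Charles P. et al., Reporting of sample size calculation in randomised controlled trials: review,
BMJ, 2009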
35
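Parameters needed for the calculation
③ Basis for the assumed effect size (control group)
Basis                          | n (%)
Results of previous studies    | 54 (67%)
Results of a pilot study       | 15 (19%)
Data from observational studies | 6 (7%)
Results of a systematic review | 2 (3%)
Other                          | 4 (5%)
Note: among New England Journal of Medicine, JAMA, The Lancet, BMJ, and PLoS Medicine
Charles P. et al., Reporting of sample size calculation in randomised controlled trials: review,
BMJ, 2009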
36
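Parameters needed for the calculation
③ Basis for the assumed effect size (treatment group)
Basis                              | n (%)
Other similar trials or treatments | 41 (82%)
Clinical relevance                 | 7 (14%)
Data from observational studies    | 1 (2%)
Results of a meta-analysis         | 1 (2%)
Note: among New England Journal of Medicine, JAMA, The Lancet, BMJ, and PLoS Medicine
Charles P. et al., Reporting of sample size calculation in randomised controlled trials: review,
BMJ, 2009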
37
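Procedure for sample size design
Sample size estimation
Estimate the statistically required sample size
Sample size justification
Show that the size is reasonable given the budget and clinical constraints
Sample size adjustment
Adjust for dropouts and covariates
Sample size re-estimation
Re-estimate using the information available at an interim point
Chow SC, Shao J & Wang H, “Sample size calculations in clinical research, Second edition”
(Chapman & Hall/CRC)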
39
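From the first half...
So what exactly has to be written?!
(after stating the hypothesis, design, and endpoint in the Methods)
1. On what basis were the effect size and margin (δ) chosen?
2. What value was set for the clinically meaningful difference (δ)?
3. What values were set for the parameters (including the effect size in each group)?
4. What was the calculated sample size?
5. How were dropouts and missing data estimated?
6. What was the final sample size?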
40
1. Effect of home-based hand exercises in women with
hand osteoarthritis: a randomised controlled trial.
(Ann Rheum Dis., 2014, Online)
Superiority trial
Patients: women with hand osteoarthritis
Interventions: information + home-based hand exercises
Controls: information only
Outcome: activity performance (PSFS)
42
1.Effect of home-based hand exercises in women with
hand osteoarthritis: a randomised controlled trial.
(Ann Rheum Dis., 2014, Online)
[Sample size]
We used the baseline scores of PSFS in a study of patients with knee
dysfunction for sample calculation. Based on this study, the mean (SD)
PSFS-total score was set as 3.1 (1.8). The minimal clinical important
difference in PSFS has been estimated as 2.2 points. We expected a 20%
loss to follow-up each group was required to detect a difference of 2.2
points between groups with a significance level of 0.05 and a power of
80%. Owing to uncertainty related to the estimates of PSFS mean and SD,
which might be different in HOA, we decided to include 40 participants in
each group.
43
1.Effect of home-based hand exercises in women with
hand osteoarthritis: a randomised controlled trial.
(Ann Rheum Dis., 2014, Online)
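1. Set on the basis of a previous study in patients with knee dysfunction
2. Margin δ = 2.2 points
3. α = 0.05, 1-β = 0.80 (other information: ×)
4. ×
5. Assumed dropout rate of 20%
6. Final sample size: 40 per group
The mean difference between the groups and the standard deviation are not reported. Insufficient information.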
44
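To see what the quoted inputs would imply, the sketch below applies the usual normal-approximation formula for a two-sided comparison of two means. The formula, equal allocation, and the use of the quoted SD of 1.8 as the common SD of the outcome are all assumptions made for illustration; the paper's own method is not shown in the excerpt.

```r
# Inputs quoted in the excerpt
sd_psfs <- 1.8   # SD of the PSFS total score (treated as the common SD: assumption)
mcid    <- 2.2   # minimal clinically important difference
alpha   <- 0.05  # significance level (two-sided assumed)
power   <- 0.80

# Normal-approximation sample size per group for two means
z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
n_raw <- 2 * sd_psfs^2 * z_sum^2 / mcid^2
n_per_group <- ceiling(n_raw)

# Inflate for 20% loss to follow-up (dividing by 1 - rate is one common
# convention; the paper may have used another)
n_enrolled <- ceiling(n_per_group / (1 - 0.20))

print(c(n_raw = n_raw, n_per_group = n_per_group, n_enrolled = n_enrolled))
```

Under these assumptions the calculation calls for far fewer than 40 per group, which fits the authors' remark that they enlarged the sample because the PSFS estimates were uncertain.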
2. Use of corticosteroids after hepatoportoenterostomy
for bile drainage in infants with biliary atresia: the
START randomized clinical trial (JAMA, 2014; 311(17), 1750-1759)
Superiority trial
Patients: infants with biliary atresia after hepatoportoenterostomy
Interventions: corticosteroids
Controls: placebo
Outcome: successful bile drainage
45
2. Use of corticosteroids after hepatoportoenterostomy
for bile drainage in infants with biliary atresia: the
START randomized clinical trial (JAMA, 2014; 311(17), 1750-1759)
Seventy participants per group were calculated to provide 80% power to
detect 25% absolute treatment difference in the primary end point on the
basis of a 2-sample test of proportions, with a 2-sided significance level
of .05 and allowing for 20% attrition and 2 interim analyses based on the
O’Brien-Fleming spending function. A retrospective study of the level of
serum total bilirubin and survival with the native liver in children with biliary
atresia treated with hepatoportoenterostomy at the participating centers
provided our estimate of 50% for the primary end point in the placebo
group. The expectation for steroids to improve the primary outcome to
75% was based on 2 studies published before the initiation of START
reporting that the use of corticosteroids after hepatoportoenterostomy
was associated with resolution of jaundice in 76% to 79% of patients.
46
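2. Use of corticosteroids after hepatoportoenterostomy
for bile drainage in infants with biliary atresia: the
START randomized clinical trial (JAMA, 2014; 311(17), 1750-1759)
1. Set on the basis of two previous studies
2. ×
3. α = 0.05, 1-β = 0.80, success rate in the intervention group = 75%, success rate in the control group = 50%
4. ×
5. Assumed dropout rate of 20%
6. Final sample size: 70 per group
The clinically meaningful difference (δ) is not specified. Insufficient information. (If δ is assumed to be 0.03, the result roughly matches the sample size design of this study, but it is unclear whether that value is clinically reasonable.)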
47
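The reported inputs can be run through a plain two-proportion calculation as a rough cross-check. The sketch assumes the unpooled normal-approximation formula with equal allocation and no margin; the excerpt does not say which formula the authors used.

```r
# Inputs quoted in the excerpt
p_placebo <- 0.50
p_steroid <- 0.75
alpha     <- 0.05  # two-sided
power     <- 0.80

# Unpooled normal-approximation sample size per group for two proportions
z_sum   <- qnorm(1 - alpha / 2) + qnorm(power)
var_sum <- p_placebo * (1 - p_placebo) + p_steroid * (1 - p_steroid)
n_raw   <- z_sum^2 * var_sum / (p_steroid - p_placebo)^2
n_per_group <- ceiling(n_raw)

# Inflate for 20% attrition (n / (1 - rate) convention assumed)
n_enrolled <- ceiling(n_per_group / (1 - 0.20))

print(c(n_raw = n_raw, n_per_group = n_per_group, n_enrolled = n_enrolled))
```

This gives about 55 per group before attrition and 69 after, close to the reported 70. The small remaining gap may reflect the allowance for the two interim analyses, although the excerpt does not show that step.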
3. Ranibizumab versus Bevacizumab for neovascular
age-related macular degeneration: Results from the
GEFAL noninferiority randomized trial
(Ophthalmology, 2013; 120, pp2300-2309)
Non-inferiority trial
Patients: neovascular age-related macular degeneration
Interventions: ranibizumab (brand name: Lucentis)
Controls: bevacizumab (brand name: Avastin)
Outcome: best-corrected visual acuity (ETDRS)
49
3. Ranibizumab versus Bevacizumab for neovascular
age-related macular Degeneration: Results from the
GEFAL noninferiority randomized trial
(Ophthalmology, 2013; 120, pp2300-2309)
The margin of clinical noninferiority was fixed at 5 letters (the difference
between groups in mean BCVA change from baseline to final evaluation).
By assuming a standard deviation (SD) for changes in BCVA of 15 letters,
the sample size would be 200 patients per group to provide a power of at
least 90%. The total number of patients to be included in the study was
500 to allow for potential dropouts.
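1. ×
2. Margin δ = 5 letters (standard deviation 15 letters)
3. α = ×, 1-β = 0.90 (other information: ×)
4. Calculated sample size: 200 per group
5. Dropouts were anticipated (dropout rate: ×)
6. Final sample size: 500
The effect size in each group and how it was determined are not reported, so the calculation cannot be reproduced. Insufficient information.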
50
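The missing α matters in practice, as a small sketch shows. It assumes a non-inferiority comparison of means under the normal approximation, a true difference of 0, and equal allocation, then tries two common one-sided significance levels; none of these choices is confirmed by the excerpt.

```r
margin  <- 5    # non-inferiority margin, letters
sd_bcva <- 15   # SD of change in BCVA, letters
power   <- 0.90

# Per-group n for a non-inferiority comparison of means, assuming a true
# difference of 0 and equal allocation
n_noninf <- function(alpha_one_sided) {
  z_sum <- qnorm(1 - alpha_one_sided) + qnorm(power)
  ceiling(2 * sd_bcva^2 * z_sum^2 / margin^2)
}

# alpha is not reported, so try two common one-sided levels
n_025 <- n_noninf(0.025)
n_050 <- n_noninf(0.05)
print(c(alpha_0.025 = n_025, alpha_0.05 = n_050))
```

Neither value equals the reported 200 exactly. A one-sided α of 0.025 comes closest and would be compatible with "a power of at least 90%", but without the stated α and assumed true difference a reader cannot confirm which calculation was used.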
4. Non-inferiority of short-term urethral catheterization
following fistula repair surgery: study protocol for a
randomized controlled trial
(BMC Women’s Health, 2012; 12, Online)
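Non-inferiority trial
Patients: women with obstetric fistula
Interventions: short-term (7-day) urethral catheterization
Controls: 14-day urethral catheterization
Outcome: fistula repair (closure) within 7 days after catheter removal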
51
4. Non-inferiority of short-term urethral catheterization
following fistula repair surgery: study protocol for a
randomized controlled trial
(BMC Women’s Health, 2012; 12, Online)
[Rationale for the non-inferiority hypothesis and for sample size estimation]
The research question of interest is whether short-term catheterization is not
worse by more than a minimal relevant difference than longer-term catheterization
in terms of achieving fistula closure. This question lends itself to a non-inferiority
design.
The choice of a non-inferiority margin, i.e. the smallest clinical difference that is
acceptable between the two treatments, is based on a combination of clinical
judgment and statistical reasoning. Because there are no data from previous trials
to help define the clinical difference between treatments, we have relied on our own
and outside experts’ clinical judgment to determine that a margin of inferiority of
10% is an irrelevant small difference. In other words, if the two-sided 95%
confidence interval for the difference in fistula repair breakdown rates (“7-day”
minus “14-day”) lies fully to the left of the 10% non-inferiority margin, we will have
proved non-inferiority of the “7-day” procedure at the level of significance α =
0.025; superiority (as a bonus) will be demonstrated at the level of significance α =
0.05 if the two-sided 95%CI lies fully to the left of 0.
52
Analyses were conducted using preliminary data from the Fistula Care/USAID
prospective cohort study examining fistula repair outcomes in order to determine the
probability of successful closure in women with simple fistula catheterized for longer
periods of time (i.e. the equivalent of the “standard” treatment group in the study
outlined here). Among the women with simple repairs in the prospective study for
whom follow-up data were available (n = 145), 87% had a closed fistula at 3 months
follow up. Thus, we believe that it is reasonable to expect the failure rate (e.g.
proportion of fistula that are not closed) to be between 10 to 15%.
Assuming 13% failure rate in the control group, non-inferiority will be demonstrated
within the margin of 10% at a one-sided significance level of 0.025 and a power of
80% (calculated when failure rates in both arms are the same), with a sample size of
177 per arm (354 women in total). Adjusting by 20% for loss to follow-up and 19%
for protocol violations and withdrawals, this would result in a sample size of 507
women. Each site will randomize 64 women, for a total sample size of 512.
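1. Based on the authors' own and outside experts' clinical judgment
2. Margin δ = 10%
3. α = 0.025, 1-β = 0.80, failure rate with 14-day catheterization = 13% (the same rate is assumed in both groups)
4. Calculated sample size: 177 per group (354 in total)
5. Assumed 20% loss to follow-up and 19% protocol violations and withdrawals; the total was rounded so that it divides evenly across sites
6. Final sample size: 512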
53
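Calculation example in R
## Load the package
library(TrialSize)
#### Sample size for a test of the difference in proportions (superiority or non-inferiority trial)
## Usage: TwoSampleProportion.NIS(alpha, beta, p1, p2, k, delta, margin)
##   alpha:  significance level
##   beta:   1 - power
##   p1:     mean response rate of the new drug
##   p2:     response rate of the placebo (control)
##   k:      allocation ratio between the two groups
##   delta:  p1 - p2
##   margin: superiority or non-inferiority margin
## Sample size design <α = 0.025, β = 0.20, p1 = 0.13, p2 = 0.13, delta = 0, margin = 0.10>
x <- TwoSampleProportion.NIS(0.025, 0.20, 0.13, 0.13, 1, 0, 0.10)
x
## → 177.5413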
54
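The call above can be cross-checked in base R. The sketch assumes the standard normal-approximation formula for a non-inferiority comparison of two proportions, which is what TrialSize is understood to implement; treat that as an assumption rather than a statement about the package's internals.

```r
# Same inputs as the TrialSize call above
alpha  <- 0.025  # one-sided significance level
beta   <- 0.20   # 1 - power
p1     <- 0.13   # failure rate, 7-day arm
p2     <- 0.13   # failure rate, 14-day arm
k      <- 1      # allocation ratio n1 / n2
delta  <- p1 - p2
margin <- 0.10   # non-inferiority margin

# Normal-approximation sample size for the non-inferiority comparison
z_sum <- qnorm(1 - alpha) + qnorm(1 - beta)
n2 <- z_sum^2 * (p1 * (1 - p1) / k + p2 * (1 - p2)) / (delta - margin)^2
n1 <- k * n2
print(n1)
```

The result is about 177.5 per arm, in line with the value printed above and with the 177 per arm stated in the protocol.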
5. Treatment of fast breathing in neonates and young
infants with oral Amoxicillin compared with penicillin –
Gentamicin combination (Study protocol for a
randomized, open-label equivalence trial)
(The Pediatric Infectious Disease Journal, 2013; 32(9), ppS33-38)
Equivalence trial
Patients: infants with fast breathing
Interventions: oral amoxicillin
Controls: penicillin
Outcome: treatment failure within 8 days of admission
56
5. Treatment of fast breathing in neonates and young
infants with oral Amoxicillin compared with penicillin –
Gentamicin combination (Study protocol for a
randomized, open-label equivalence trial)
(The Pediatric Infectious Disease Journal, 2013; 32(9), ppS33-38)
The sample size calculations assumed that the statistical analysis will be
based on a comparison between the failure rate observed with the
reference treatment regimen of injection penicillin and gentamicin for
7 days (assumed treatment failure rate of 8%) and the experimental
regimen of oral amoxicillin for 7 days. A point estimate of the failure rate
difference (experimental – reference treatment) between the 2 treatment
regimens will be calculated together with a 2-sided 95% confidence
interval. The alternative treatment will be judged to be “of similar
effectiveness” to the reference treatment if the upper bound of the 95%
confidence interval lies below the allowed “similarity margin” of +4%. A
power of 90% to demonstrate the similarity of 2 treatments over the 7-day period following randomization was required, assuming that the true
failure rates with the reference treatment and the experimental treatment
57
regimens will be identical (assumed to be 8%).
Using the above assumptions, the required sample size was determined to
be 1150 infants for each treatment group, which is likely to yield 970
“analyzable” infants per treatment arm.
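1. ×
2. Margin δ = 4% (?)
3. α = ×, 1-β = 0.90, failure rate with penicillin = 8% (the same rate is assumed in both groups)
4. Calculated sample size: 970 per group (?)
5. ×
6. Final sample size: 1150 per group
The proportion in each group is reported, but how it was determined is not.
The value of α is also not stated, so the information is insufficient.
In addition, the statement “The alternative treatment will be judged to be ‘of similar effectiveness’ to the
reference treatment if the upper bound of the 95% confidence interval lies below the allowed
‘similarity margin’ of +4%” reflects the logic of a non-inferiority trial and may be
inappropriate as a description of an equivalence trial.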
58
6. Benefits and costs of home-based pulmonary
rehabilitation in chronic obstructive pulmonary disease
– a multi-centre randomised controlled equivalence
trial
(BMC Pulmonary Medicine, 2013; 13, Online)
Equivalence trial
Patients: chronic obstructive pulmonary disease (COPD)
Interventions: home-based pulmonary rehabilitation
Controls: hospital-based pulmonary rehabilitation
Outcome: 6-minute walk distance (6MWD)
59
6. Benefits and costs of home-based pulmonary
rehabilitation in chronic obstructive pulmonary disease
– a multi-centre randomised controlled equivalence
trial
(BMC Pulmonary Medicine, 2013; 13, Online)
・・・. A per-protocol analysis will also be conducted to reduce the risk of
Type 1 error, as recommended in the CONSORT Extension for reporting
of non-inferiority and equivalence trials. Alpha will be set at 0.05.
Sample size calculation are based on the primary outcome of change in
6MWD. If there is truly no difference in the change in 6MWD between
home-based and hospital-based groups, then 144 participants are
required to be 80% sure that the 95% confidence interval will exclude a
difference in means of more than 25 metres. We have recently
demonstrated that this is the minimal important difference for 6MWD in
our population of patients with COPD[*]. This assumes a standard
deviation of the change in 6MWD of 51 metres. On the basis of our
previous trials in pulmonary rehabilitation[*], we expect 15% attrition from
the study; we will therefore randomize a total of 166 participants.
60
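1. Set on the basis of the authors' own previous study
2. Margin δ = 25 m
3. α = 0.05, 1-β = 0.80, mean difference = 0 (?), standard deviation 51 m
4. Calculated sample size: 144
5. Assumed dropout rate of 15%, based on the authors' previous studies
6. Final sample size: 166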
61
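Calculation example in R
## Load the package
library(TrialSize)
#### Sample size for a test of the difference in means (equivalence trial)
## Usage: TwoSampleMean.Equivalence(alpha, beta, sigma, k, delta, margin)
##   alpha:  significance level
##   beta:   1 - power
##   sigma:  standard deviation
##   k:      allocation ratio between the two groups
##   delta:  equivalence margin
##   margin: true difference in means between the two groups
## Sample size design <α = 0.05, β = 0.20, sigma = 51, delta = 25, margin = 0>
x <- TwoSampleMean.Equivalence(0.05, 0.20, 51, 1, 25, 0)
x
## → 71.27861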
62
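The same figure can be rebuilt in base R and carried through to the protocol's totals. The sketch assumes the normal-approximation formula for an equivalence test of two means and a 15% inflation applied by multiplication; both are inferred from the numbers, not stated in the excerpt.

```r
alpha  <- 0.05
beta   <- 0.20
sigma  <- 51   # SD of change in 6MWD, metres
k      <- 1    # allocation ratio
delta  <- 25   # equivalence margin, metres
margin <- 0    # assumed true difference in means

# Equivalence uses beta / 2 because both one-sided tests must succeed
z_sum <- qnorm(1 - alpha) + qnorm(1 - beta / 2)
n2 <- z_sum^2 * sigma^2 * (1 + 1 / k) / (delta - abs(margin))^2

n_per_group  <- ceiling(n2)
n_total      <- 2 * n_per_group
n_randomized <- ceiling(n_total * 1.15)  # 15% added for attrition

print(c(n2 = n2, n_per_group = n_per_group,
        n_total = n_total, n_randomized = n_randomized))
```

Rounding 71.3 up to 72 per group gives the 144 participants quoted in the protocol, and adding 15% reproduces the 166 to be randomized.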
7. Clinical trial protocol number: 5-78 (brand name: オルメス)
1. ×
2. Mean difference 3 mmHg (standard deviation 6 mmHg)
3. α = 0.05, β = 0.80
4. ×
5. ×
6. 100
64
8. Comparing the feasibility, acceptability, clinical-, and
cost-effectiveness of mental health e-screening to
paper-based screening on the detection of depression,
anxiety, and psychosocial risk in pregnant women: a
study protocol of a randomized, parallel-group,
superiority trial.
(Trials, 2014; 15, Online)
・・・. The sample size calculation (Table 2) indicates that 261
women per group (n = 522) is required.
65
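How to report sample size design in RCTs
Note: state the objective and hypothesis clearly, and specify whether the trial is a superiority, non-inferiority, or equivalence trial [Introduction/Methods]
① State the assumed effect size or margin and the basis for it (previous studies / pilot study / clinical judgment ...)
② State the assumed clinically meaningful difference (δ)
③ State the assumed values of α, β, the proportion in each group (categorical variables), and the standard deviation (continuous variables)
④ State the calculated sample size
⑤ State the assumed rate of dropouts and missing data and the basis for it (previous studies / pilot study / clinical judgment ...)
⑥ State the final sample size
(⑦ If an interim analysis was done, what did the sample size become after re-estimation?)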
67
Closing remarks
“For scientific and ethical reasons, the
sample size for a trial needs to be planned
carefully, with a balance between medical
and statistical considerations.”
(CONSORT 2010 Explanation and Elaboration: updated guidelines for
reporting parallel group randomised trials)
68
References
Charles P. et al., Reporting of sample size calculation in
randomised controlled trials: review, BMJ, 2009, 338, Online
Moher D. et al., CONSORT 2010 explanation and elaboration:
updated guidelines for reporting parallel group randomised
trials, BMJ, 2010, 340, Online
Piaggio G. et al., Reporting of noninferiority and equivalence
randomized trials: extension of the CONSORT 2010
statement, JAMA, 2012, 308(24), pp2594-2604
Schiller P. et al., Quality of reporting of clinical non-inferiority
and equivalence randomised trials: update and extension,
Trials, 2012, 13, Online
70
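References
「医学的研究のデザイン―研究の質を高める疫学的アプローチ― 第3版」 (Designing medical research: an epidemiologic approach to improving research quality, 3rd edition), Medical Sciences International (translated by Masako Kihara and Masahiro Kihara)
「医学的介入の研究デザインと統計―ランダム化／非ランダム化研究から傾向スコア，操作変数法まで―」 (Study design and statistics for medical interventions: from randomized and non-randomized studies to propensity scores and instrumental variables), Medical Sciences International (translated by Masako Kihara and Masahiro Kihara)
「リサーチ・クエスチョンの作り方（臨床家のための臨床研究デザイン塾テキスト）」 (How to formulate a research question; textbook of the clinical research design school for clinicians), Shunichi Fukuhara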
71