The $10 billion unicorn in INmune Bio: cognition matters - on EMACC, CDR-SB, and why others may have failed

Summary

Alzheimer’s is a disease of slowly fading cognition. If you had asked me how much my grandmother’s Alzheimer’s progressed over the years, I would not have been able to give you an answer. Truly, we only clearly saw it in the last years of her life. Only at the end, when she just sat there without saying much any more, did it become clear how much of her mind she had already lost.

Testing cognition matters, but cognition is a tricky thing. Trial failures in Alzheimer’s may be attributable in large part to testing with rating scales that are just not up for the job.

INmune tackles that. Of all the particularities of the trial, such as the fact that the trial is actually powered on CDR-SB, this one has probably taken me the longest to understand. How the company tackles it, and how essential the sensitivity of cognitive testing is to preventing trial failure, is the subject of this blog. People in the Discord group asked me to write this blog first, so here it is. This is a long but necessary post, so please bear with me.

Cassava Sciences, Athira Pharma and Cortexyme: some ADAS-Cog failures

Introduction

For the past two decades, the 11-item Alzheimer’s Disease Assessment Scale–Cognitive subscale (ADAS-Cog 11) has been a nearly ubiquitous measure of cognition in clinical trials of putative new therapies for Alzheimer’s disease. The rating scale was originally designed to assess the severity of cognitive dysfunction from mild to severe AD. This is what that looks like; note the subjectivity of assessment in some cases and the ease of the questions. For example, the subject is asked the date, month, year, day of the week, season, time of day, place and person, or is given different tasks relying on open-ended conversation.

Inaptitude for earlier stages of disease

However, use of this scale has revealed a number of limitations, particularly driven by a shift toward assessing and treating patients in earlier stages of the disease. ADAS-Cog is not sensitive enough to detect change over time at pre-dementia stages of the disease such as MCI. It seems most sensitive when used in patients in the moderate stage of AD.

Cognitive domains known to be impaired at the early stages, such as executive function, are also not captured by ADAS-Cog 11.

Ceiling effects

As a result, subjects with mild cognitive impairment (MCI) tend to score at ceiling (i.e., a score of 0 for ADAS-Cog) on as many as eight of the 11 ADAS-Cog subtest items.

A ceiling effect occurs when a test cannot detect differences at the less/least impaired end of the scale, meaning it may not adequately distinguish between normal cognition and very mild impairment.

Floor effects occur where a test cannot distinguish differences at the severe end of impairment.

So ceiling effects are most important to avoid in early stages of disease.

Variance and subjectivity

Additionally, significant variance in administration procedures and materials used for the ADAS-Cog 11 across clinical trials has been found, which threatens inter-observer, intra-observer, and test-retest reliability.

Learning and placebo effects

Learning effects may also be a concern, with possible placebo effects.

ADAS-Cog failures: Cassava, Athira and Cortexyme

On November 25, 2024, Cassava Sciences announced that the Phase 3 trial ReThink-ALZ did not meet its co-primary endpoints. The co-primary endpoints were the change in cognition and function from baseline to the end of the double-blind treatment period at week 52, assessed by the ADAS-Cog12 and ADCS-ADL scales, comparing simufilam to placebo. The press release mentioned: “Despite that, the loss of cognition in the placebo group was less pronounced than was previously reported in other placebo-controlled studies in AD. We are working to understand this better.” Cassava Sciences never understood this better.

On September 3, 2024, Athira Pharma announced topline results from its Phase 2/3 trial in mild to moderate Alzheimer’s disease. The trial did not meet its primary endpoint of GST or its key secondary endpoints of cognition (ADAS-Cog11) and function (ADCS-ADL23), but showed a numerically greater treatment effect in patients with moderate Alzheimer’s. All biomarkers, such as Aβ42/40, p-Tau181, p-Tau217, GFAP and NfL, showed directional improvements with fosgonimeton treatment.

On October 26, 2021, Cortexyme announced that its Phase 2/3 trial in mild to moderate Alzheimer’s disease failed to separate from placebo. The trial’s co-primary endpoints were ADAS-Cog11 and ADCS-ADL. In a subgroup with P. gingivalis infection, the drug’s target, the drug showed 57% slowing of cognitive decline. I plan to address that more in my upcoming blog on viruses and toxins as drivers of neurodegenerative diseases, which logically come with inflammation.

I quote the aforementioned examples because each of these companies has had a high valuation, of $4, $1 and $2 billion respectively, based on expected efficacy, and at least some of them actually detected an efficacy signal in their trials. Each of these companies used an ADAS-Cog rating scale, and each of them noticed that something went wrong in the trial: whether it was placebo performing better than usual (Cassava Sciences), better efficacy in patients with moderate Alzheimer’s (Athira Pharma), or better performance in patients who match the drug’s MoA (Cortexyme).

That is not to say trials with ADAS-Cog will always fail; Biogen/Eisai and Eli Lilly have shown differently, in large, long trials, in patients with MCI and mild AD. But many others did fail.

Trial design is crucial. I cannot stress this enough. You are testing the function of a slowly decaying brain, and should do so as objectively and adequately as possible. The use of ADAS-Cog, probably even for drugs which may have some efficacy, may lead to trial failure. Irrespective of XPro’s potential, I would not be an investor if INmune had used that scale as its primary endpoint in MCI and early AD.

Inadequate adaptations

A number of adaptations have been created to deal with the aforementioned problems, such as the iADRS and the ADCOMS, but none of those has been found to be fully adequate.

Rating scales for MCI and mild AD

The FDA’s request to find tools to assess cognition in early AD

FDA’s draft guidance on early AD relates to the stages of AD that occur before the onset of overt dementia. Stages 1 through 3 are collectively referred to as “early AD”. Stage 3 concerns patients with characteristic pathophysiological changes of AD, and mild but detectable functional impairment, and roughly corresponds with the syndrome of “mild cognitive impairment”. Stages 4, 5, and 6 concern patients with overt dementia, progressing through mild, moderate, and severe stages.

Specifically for these patients (with MCI), the FDA considers that many of the assessment tools typically used to measure functional impairment in patients with later dementia stages of AD (Stages 4 through 6) may not be suitable for use in early AD patients. FDA encourages the development of novel approaches to the integrated evaluation of subtle functional impairment that arises from early cognitive impairment (e.g., facility with financial transactions, adequacy of social conversation). In patients in the earliest clinical stages of AD, FDA will consider strong justifications that a persuasive effect on cognition alone, as assessed by sensitive neuropsychological tests, may provide adequate support for a traditional approval. Surrogate endpoints or intermediate clinical endpoints that do not directly measure clinical benefit, but that are considered reasonably likely to predict clinical benefit, may support an accelerated approval.

CDR-SB

Introduction

Trials these days focus on treating patients earlier in the disease, because the later the intervention occurs, the lower the chances of full recovery. Biogen has used the Clinical Dementia Rating – Sum of Boxes, or CDR-SB, as primary endpoint, and this was also a secondary endpoint in Eli Lilly’s trial.

The CDR-SB is derived from the Clinical Dementia Rating (CDR) scale, which involves a semi-structured interview with the patient and a reliable informant (e.g., a family member). The clinician rates six domains: memory, orientation, judgment/problem-solving, community affairs, home/hobbies, and personal care. Based on this input, the clinician assigns a score.

The trials for Leqembi and Kisunla were well overpowered

I explained in my blog post ‘How only INMB uses the recipe behind big pharma’s recent successes with anti-amyloid antibodies’ that, contrary to many trials before, the Leqembi and Kisunla trials enrolled patients whose biology matches the drug’s mechanism of action.

This is a necessary recipe for success that is also implemented by INmune, contrary to pretty much all other companies.

Even though these drugs have low efficacy, if one looks at the Phase 3 data from Leqembi and Kisunla, Biogen and Eli Lilly could have stopped those trials at the six-month timepoint or sooner with statistically positive results.

CDR-SB flaws: difficult to assess and potential subjectivity

The CDR-SB remains a scale which is not very precise and comes with flaws based on potential subjectivity.

Several articles have discussed that subjectivity arises from reliance on qualitative interviews, with informant-based assessments like CDR-SB being less objective than neuropsychological tests, carrying the risk of inter-rater inconsistencies especially in milder stages. The sources of subjectivity are related to informant reliability and clinical interpretation. The accuracy of informant reports can vary based on their relationship to the patient, observation skills, or emotional state. Scoring furthermore depends on the clinician’s experience and how they weigh patient and informant input, especially in borderline cases (e.g., CDR 0 vs. 0.5).

For example, the clinician may ask about a patient’s skills, e.g. ‘do you experience difficulties with your memory’, or ask the patient about events that the informant has shared. The clinician uses the information gathered from the informant and from the patient to decide on the right score for this patient, and will assign a staging score from zero to three. There is a lot of noise and variability, and sometimes the informant changes in the course of the trial.

Several factors, such as the relationship with the informant and the frequency of contacts with the informant, may have effects that are as large as the treatment differences over 18 months seen in the successful trials of Biogen and Eli Lilly.

There is, hence, a need for more objective measures. That’s where EMACC comes in.

The Early/Mild Alzheimer’s Cognitive Composite: EMACC

Introduction

The EMACC was designed by Lundbeck Pharma specifically to serve as an outcome measure with the sensitivity required for detection of cognitive changes in short-term clinical trials in early AD. It was empirically derived from a battery of neuropsychological tests, and is a composite of six validated clinical and neuropsychological tests commonly used for assessment of cognitive function: International Shopping List Task, Digit Span Forward and Backward, Category Fluency, Letter Fluency, Digit Symbol Coding, and the average of Trail-Making Test Parts A and B. This is how that looks.

EMACC has been used or is in current use by three different pharmaceutical companies in five trials in early AD. Biogen’s Tango study was one of them.

EMACC solves the issues the other rating scales had

Over a 2-year timeframe, EMACC is much more precise than ADAS-Cog13, and slightly more precise than CDR-SB. The risk of learning or placebo effects is minimal, even with frequent testing, because the test has alternate forms: the words on the word list, for example, are different every single time, and there are different versions of many of the other tests as well. There is no risk of inter-rater subjectivity either.

EMACC seems to be exactly what the FDA is suggesting in its most recent guidance on early AD.

The MINDful Phase 2 trial

Introduction

INmune announced full patient enrollment in its placebo-controlled randomized MINDful trial in November 2024. The trial enrolled 208 patients with mild cognitive impairment (MCI) or mild Alzheimer’s disease (mild AD) for a duration of six months, and has the following efficacy endpoints: EMACC, CDR-SB, E-cog, and ADCS-MCI-ADL.

Baseline characteristics: CDR-SB similar to Leqembi’s study

These are the trial’s baseline characteristics on the MMSE (screening tool), CDR-SB and EMACC rating scales. CDR-SB score on study day 1 was 3.03 on average. A lower CDR-SB score indicates better cognitive and functional performance. A score of 0 means no cognitive impairment, 0.5–1 means very mild impairment, 1–2 means mild impairment, 2.5–4 means mild dementia, 4.5–9 means moderate dementia, and >9 means severe dementia.

That would mean the patients enrolled in INmune’s trial have cognition similar to those enrolled in Biogen’s trial for lecanemab. On CDR-SB, baseline data for Biogen’s Leqembi was 3.22 for placebo and 3.17 for treated patients.

It also means cognitive efficacy should be detectable. As a reminder, the adjusted mean difference versus placebo in lecanemab’s trial at 18 months was -0.45 points on CDR-SB, on an absolute mean change of 1.21 for lecanemab and 1.66 for placebo.
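To put those numbers in perspective, here is a minimal sketch of the arithmetic, using only the figures quoted above. The function name is my own, and the calculation uses the raw mean changes, which is a simplification of the adjusted model the trial reported.

```python
# Figures quoted above for lecanemab's Phase 3 trial (CDR-SB at 18 months).
placebo_change = 1.66   # mean worsening on placebo
treated_change = 1.21   # mean worsening on lecanemab


def relative_slowing(placebo: float, treated: float) -> float:
    """Fraction of the placebo decline that was avoided on treatment."""
    return (placebo - treated) / placebo


absolute_difference = treated_change - placebo_change
slowing = relative_slowing(placebo_change, treated_change)

print(f"absolute difference: {absolute_difference:+.2f} points")
print(f"relative slowing of decline: {slowing:.0%}")
```

A 0.45-point difference on an 18-point scale is small in absolute terms, which is why the precision of the rating scale matters so much.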

Baseline characteristics: APOE4 similar to Leqembi study

We know from my blog of October 2022 that APOE4 is the biggest genetic risk factor for AD and a large driver of inflammation. APOE4 carriers are inflamed, and the APOE4-positive patients in INmune’s Phase 1 trial were inflamed, judging by their inflammatory cytokines.

For MINDful, 69% of patients were APOE4 carriers.

This is similar to the lecanemab trial, in which 68% of patients were APOE4 carriers.

Correlation between CDR-SB and EMACC scores at baseline and large effect size

In September 2024, INmune announced a highly significant correlation (p<0.001) between baseline scores on EMACC and CDR-SB, the secondary endpoint in the MINDful trial.

When EMACC was measured during the screening process and again at the first study visit before treatment, the correlation was found to be 0.93.
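For readers who want to see what such a test-retest correlation means in practice, here is a minimal sketch of a Pearson correlation between two administrations of the same test. The scores below are made up for illustration; they are not INmune's data, and I am assuming a plain Pearson correlation was the statistic reported.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical composite scores for seven patients (illustration only).
screening = [-1.2, -0.8, -0.5, -0.1, 0.3, 0.6, 1.0]
first_visit = [-1.0, -0.9, -0.3, -0.2, 0.4, 0.5, 1.1]

r = pearson_r(screening, first_visit)
print(f"test-retest correlation: {r:.2f}")
```

The closer the value is to 1, the more a patient's second score is predicted by the first one, which leaves less room for noise from raters, mood or practice.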

The difference in EMACC performance between patients with CDR global ratings of 0.5 (prodromal AD) and those rated 1.0 (mild dementia) was very large, with an effect size (Cohen’s d) of 0.87 (p<.0001).

No ceiling effects

The EMACC does not come with floor or ceiling effects, which means it is performing as it should for the right patient population. The test can pick up worsening and improvement in any patient in this study. That will make the trial sensitive to change.

No subjectivity issues

A correlation of 0.93 on cognitive tests is close to the maximum possible value of 1, which is exceptional and means the test is very reliable, without rater bias and learning effects.

Sensitivity

Patients with a CDR global rating of 0.5 perform markedly better on EMACC than those with a rating of 1, meaning EMACC clearly captures these subtle differences.

Consequence for the ongoing trial

The above are ‘just’ baseline data, but what they show is that EMACC is detecting differences in the trial’s patient population, without ceiling effects, inter-rater subjectivity or learning effects, and should be sensitive enough to detect even small changes in cognition.

INmune’s trial design based on ADNI

INmune’s trial design is very particular. INmune enrolled patients with biomarkers of inflammation, who typically progress faster (cf. my blog posts ‘On fast-progressors in AD, APOE4, TREM2 and EMACC’ and ‘inflammation drives cognitive decline’).

The ADNI database, a long-standing and often used American initiative, is what informed the trial design: in patients with inflammation, it showed a much larger effect size for change on EMACC than on CDR-SB.

Patients with neuroinflammation have more rapidly progressive cognitive decline than patients without neuroinflammation, and that progression occurs with less variance.

Effect size is a statistical concept that measures the strength or magnitude of a relationship or difference between two groups. Unlike p-values, which tell you whether a result is statistically significant, effect size tells you how big the effect actually is.
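As an illustration, here is a minimal sketch of one common effect-size measure, Cohen's d, which is also the measure quoted for the baseline EMACC data. The example scores are hypothetical, and I am assuming the standard pooled-standard-deviation version of the formula.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical test scores for two small groups (illustration only).
less_impaired = [2, 4, 6]
more_impaired = [1, 3, 5]

d = cohens_d(less_impaired, more_impaired)
print(f"Cohen's d: {d:.2f}")
```

By convention, a d of around 0.2 is called small, 0.5 medium and 0.8 large. The same difference in means gives a larger d when the scores within each group are less spread out, which is why a less noisy scale shows larger effect sizes.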

Using the EMACC, the effect size difference in 12-month change between biomarker groups was substantially larger than for ADAS-Cog or CDR-SB.

The EMACC was almost twice as good at capturing change in patients with biomarkers of inflammation as in those without inflammation.

Trial assumptions

On the basis of that, INmune made assumptions as to how placebo and XPro would perform, and how many patients would be needed in a six-month trial to detect efficacy on EMACC, CDR-SB and ADAS-Cog13. The differences in sample size required to detect a treatment effect were substantial when comparing EMACC, CDR-SB and ADAS-Cog13. For the upper and lower assumptions of meaningful difference between groups, sample size requirements were 56 for patients with inflammation on EMACC (86 for non-inflamed patients) and 126 for CDR-SB (224 for non-inflamed patients).
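Why does a more sensitive scale need fewer patients? Here is a minimal sketch using the textbook normal approximation for comparing two group means. This is not INmune's actual power calculation, whose inputs I do not know; the effect sizes passed in are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate patients needed per arm for a two-sided, two-group test.

    Textbook normal approximation: n = 2 * (z_alpha + z_power)^2 / d^2,
    where d is the standardized effect size (Cohen's d).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Hypothetical effect sizes: a noisier scale shrinks d, a more precise one raises it.
for d in (0.25, 0.5, 1.0):
    print(f"effect size {d}: {patients_per_group(d)} patients per group")
```

The required sample size scales with one over the effect size squared, so doubling the effect size a scale can pick up cuts the number of patients needed to roughly a quarter.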

Powered on CDR-SB’s one-year change: 38.7% over-enrolled

Importantly, the trial is powered to the change of the CDR-SB in patients with inflammation over one year. Based on the ADNI cohort’s decline on CDR-SB, the trial would need to enroll 150 patients to be 80% powered to detect efficacy.

The trial enrolled 208 patients, which is about 38.7% more than necessary (58 additional patients on top of 150).

The assumption for CDR-SB is that patients on placebo would worsen by only 0.2 points on CDR-SB on average. I believe that assumption is very much on the safe side. In fact, in a paper dedicated to modeling the natural cognitive decline of early to moderate AD using the ADNI database as well as placebo arms of several drug trials, a delta of 1 to 1.5 from baseline at 12 months seems to be common.

This also matches the placebo decline of roughly 1.2 from baseline for the placebo arm of the Leqembi trial, and the same goes for donanemab.

Actually, placebo dropped by about 0.9 points at 6 months in the donanemab trial.

Basically, both other research and the anti-amyloid trials indicate a placebo decline much steeper than the trial’s assumptions, meaning the trial’s assumptions may be on the extremely safe side of things.

Needless to say, with only 66 patients assumed to be required to detect statistical significance on EMACC, the trial is overpowered to detect efficacy on that rating scale.

XPro’s power potential

What can we now expect from XPro?

XPro is the best selective anti-inflammatory drug fit for the central nervous system, and that may remain so for the foreseeable future.

As a reminder:

XPro’s MoA is fundamentally different from the old amyloid-based approach the field has been so reluctant to let go of, perhaps even with some fraud involved;
XPro’s biomarkers largely outperform those of Leqembi and Kisunla, and the FDA may push for accelerated approval based on NfL, ptau-217 or GFAP once these come out;
XPro is the only drug that leads to remyelination, which may be able to explain the fast recovery seen in patients;
The preclinical work shows that selective inhibition of soluble TNF is the way forward to correctly adapt the behavior of the immune cells of the brain.

It is logical, then, that XPro would largely outperform anti-amyloid antibodies.

The trial’s duration and size are the result of a combination of the expected power of XPro, in the right patient population with biomarkers of inflammation, with the most precise cognitive assessment tool.

How fast could statistical significance be achieved?

Sanofi’s SAR441566

Sanofi’s SAR441566 is a promising TNFR1 antagonist which inhibits solTNF signaling without affecting TNFR2 signaling. That comes close to XPro’s mechanism of action, although XPro is being tested for CNS indications. Sanofi is developing this to treat peripheral inflammatory conditions. Its results in psoriasis came back particularly interesting: in only 37 patients, SAR441566 was able to show statistical significance after only 2 weeks.

It’s unclear whether the primary endpoint also reached statistical significance at 2 weeks, but it certainly did at four weeks.

SAR441566’s results could support the idea of a fast onset of efficacy, as seen with XPro’s cognitive benefit, although psoriasis is of course not a cognitive indication.

CervoMed’s neflamapimod

CervoMed recently also announced data from its anti-inflammatory compound neflamapimod as tested in patients with Dementia with Lewy Bodies on CDR-SB over 16 weeks, comparing 55 vs. 94 people. Now, this is not placebo-controlled, strictly speaking. The trial failed at first, allegedly due to an old batch of neflamapimod with low bioavailability. In the open-label extension trial, CervoMed compared patients on the new batch with those on the old batch, alleging that each group serves as the other’s comparator and hence there could not be placebo effects.

The difference on CDR-SB was -0.73 over 16 weeks, which is much larger than the 0.45 and 0.67 delta seen on CDR-SB with Leqembi and Kisunla respectively, and the p-value was an impressive p<0.001.

Neflamapimod is a brain-penetrant, anti-inflammatory p38α MAPK inhibitor that was originally developed by Vertex Pharmaceuticals for rheumatoid arthritis.

If CervoMed’s results could be confirmed in a placebo-controlled trial, then it would be yet another confirmation of the potential of XPro’s fast-onset and strong efficacy on cognition. The above may be an indication of fast-onset efficacy, and dosing dependence, from an anti-inflammatory treatment. But we’ll know about XPro’s efficacy long before that.

Conclusion

This was a long post. Of all the particularities that come with INmune Bio, the company’s enthusiasm for EMACC took me the longest to understand, but it is integral to their chances of success.

ADAS-Cog is a recipe for failure in early AD

ADAS-Cog is, in my eyes, a recipe for a high likelihood of failure in any trial. Particularly for earlier stages of disease, that rating scale is simply not appropriate, and it is prone to placebo effects. The better I understood that, the clearer it became to me that companies relying on that rating scale had a high chance of failure.

The FDA has seen it this way for a while now, for early AD. It promotes the development of more sensitive rating scales for earlier stages of the disease, where intervention may yield more benefit than later.

CDR-SB is better, but imperfect

CDR-SB appears, by all means, more appropriate than ADAS-Cog. Even though it still comes with subjectivity and rather basic scoring, Leqembi and Kisunla have shown that CDR-SB can produce statistically significant results in MCI and early AD.

Trial design is essential - Biogen and Lilly lead the way

Both companies have followed a trial design similar to that of INmune Bio, enrolling only patients whose biology is in line with the drug’s MoA. That is part of INmune’s recipe for success, and the fact that many others did not use it could explain many previous failures in the AD space. Cortexyme did not restrict enrollment to patients with a P. gingivalis infection, but saw a 57% slowing of cognitive decline in that subpopulation. That is actually in strong support of XPro's MoA, which I will cover in my blog on toxins and viruses as drivers of neurodegenerative diseases.

Similarities at baseline bode well

INmune hopes to show the world that Alzheimer’s is an immunological disease when announcing its results in June 2025. The similarities of patients enrolled in MINDful and those enrolled in Leqembi's trial, from the perspective of CDR-SB rating and APOE4, already bode well.

The entirety of the data indicate this is the right approach

XPro’s mechanism of action is selectively anti-inflammatory, only blocking soluble TNF and thereby pro-inflammatory signaling through TNF receptor 1. The understanding of the deleterious actions of immune cells in an inflamed brain together with the entirety of the data, whether preclinical or clinical, biomarker-related, anecdotal or otherwise, including remyelination, indicates that this is the right approach.

MINDful's assumptions are on the safe side

EMACC is the trial’s primary endpoint, but the trial’s assumptions are based on the expectation of how CDR-SB will evolve over the course of six months. I believe these assumptions are on the safe side, as other trials have shown that placebo declines much more rapidly than the trial assumes. A faster-declining placebo group would make it all the easier to show an effect.

EMACC works as it should

The baseline data showed that EMACC performs as it should, without ceiling effects, with no inter-rater variability, with a correlation at baseline, and with good correspondence between EMACC and CDR-SB.

Potential indications of statistically significant fast-onset efficacy: Sanofi and CervoMed

Will XPro work fast? It has been shown to do so in the past. If INmune’s data were not sufficient, Sanofi’s small placebo-controlled trial of SAR441566 in psoriasis may give an indication, with statistical significance established as soon as 2 weeks after the start of treatment. Otherwise, CervoMed claims to have shown efficacy stronger than anything anti-amyloid antibodies have ever shown, after 16 weeks and as compared to a dose with lower bioavailability, with a brain-penetrant anti-inflammatory compound that was originally developed by Vertex Pharmaceuticals. But that result should be repeated in a blinded study.

Looking ahead

The results of XPro will come in June 2025. Shorts keep piling in, but if I were them, I would not be so sure this time. I’m hopeful, and happy that that moment is finally around the corner.

---

INmune Bio Investors Discord Server: https://discord.gg/JEA8r7wCGY