2017 Public Health Grand Rounds 02/17

PUBLIC HEALTH GRAND ROUNDS Linking Research to Community Health Improvement Jointly sponsored by the Department of Public Health Sciences & URMC Center for Community Health

- Good morning, how are you, Ann?
So I'll start now, then.
Okay, welcome everyone to the Public Health Grand Rounds.
So today our invited speaker is Dr. Laurent Glance
from anesthesiology.
And the topic is the use of performance measurement
for improving quality and reducing cost of care.
So Dr. Laurent Glance is a tenured professor
and a Vice-Chair for Research
in the Department of Anesthesiology
and Professor of Public Health Sciences at URMC here.
And also senior scientist at RAND Health.
Dr. Glance received his undergraduate degree in physics
from Dartmouth and his doctor of medicine from Cornell.
Dr. Glance served his residency in anesthesiology
at the New York Hospital,
followed by a fellowship in critical care medicine
at Memorial Sloan-Kettering.
He is a past recipient of a Research Career
Development Grant from AHRQ,
which focused on the optimization
of risk-adjustment methodologies for measuring
intensive care unit quality.
Dr. Glance was the PI on an AHRQ-funded R01 grant,
which used the National Trauma Databank as a platform
to determine whether providing hospitals
with trauma report cards would lead
to improved population outcomes in trauma.
Dr. Glance was also the PI on an AHRQ-funded R01
which focused on the implications of using
the present-on-admission, or POA, indicator
in administrative data to differentiate
pre-existing conditions from complications.
Dr. Glance is helping to spearhead the development
of the ACOG-ASA Maternal Quality Improvement Program,
a partnership between the ASA and the American Congress
of Obstetricians and Gynecologists to collect outcomes data
on the clinical course of childbirth in the United States.
This data will be used to improve the quality
of maternal care at the local level,
establish national performance benchmarks for obstetrics
and obstetrical anesthesia,
and facilitate comparative effectiveness research.
So welcome, Larry.
(audience applauding)
- Thank you very much for that really nice introduction.
It's fun to be back here.
Just a little bit of a disclaimer,
my day job is I'm a cardiac anesthesiologist,
so I'll try not to put all of you folks to sleep
over the course of the next hour or so.
Sometimes it helps to think
a little bit outside the box,
in terms of some of the issues,
some of the challenges that we have in the US healthcare system.
And this is from the IOM.
And they talk about some problems with communication,
coordination, transparency, accountability,
and standardization in the US healthcare system.
So if building a house was like the US healthcare system,
the carpenter, the contractor, the plumbers might not
always speak to each other,
and they might work from a different set of blueprints.
If shopping for groceries was like the US healthcare system,
the prices might not be posted, transparency.
Accountability, if shopping for a car
was like the US healthcare system,
if you had a problem with your car within the first 30 days,
you bring it back where you bought it,
and you would have to pay for the repairs.
And then finally, if flying an airplane was like
the US healthcare system, the pilot and co-pilot,
before taking off, wouldn't have to go through a checklist.
Okay, so what do I want to talk about?
So we're all familiar with the Affordable Care Act.
And it may not be with us for very long,
but the goal of the Affordable Care Act was
to expand access.
And doing that is actually pretty expensive.
So you've gotta find a way of controlling the healthcare costs,
or the increases in healthcare costs,
that you're going to encounter when you increase access.
And the thinking was that if we improve quality,
if we prevent complications, if we reduce length of stays,
that we would be able to cut costs
and thereby fund the increases in healthcare access.
And if you're gonna do that,
if you're gonna change the way you reimburse healthcare,
going from paying for volume to paying for quality,
you need to have a way of measuring the quality
of the healthcare that you deliver.
So this reminds me that I need to give you
a set of objectives for this talk,
and this is my daughter back when she was very small.
In my family, my wife always talks about going
from Point A to Point B, so I have to tell you
what Point B is going to be.
And Katie, when she was about this age,
we were all kayaking on Blue Mountain Lake at The Hedges
over in the Adirondacks.
And this was like one of her first times in a kayak
all by herself.
She kayaked up to this elderly couple
and she said to them, "Are you going to Point A
"or are you going to Point B?"
So I'll show you some of the Point Bs
that I'd like to cover.
First I'd like to talk a little bit about quality and cost.
And then I'd like to talk about how we can actually use
quality measurement to improve quality.
So first, is there a quality problem?
This is from a blog that Ashish Jha wrote.
And Ashish Jha, many of you probably know who he is.
He's a healthcare economist over at Harvard
who's very well-published
in some of the leading medical journals.
And he talks about what happened when his dad
was hospitalized for a stroke.
And he got this frantic phone call,
and he drives all the way back from New Jersey,
he was on the beach, back to Boston to be with his dad.
And he talks about hospitals as being
one of the most dangerous places where you can be.
And he says he was at the bedside,
and then he all of a sudden realized that his dad
was about to get an infusion of a medication
that really wasn't intended for him.
He asked the nurse to double-check and make sure
that he was getting the right medication,
in fact he was getting the wrong medication.
They stopped the infusion.
And then the nurse sort of said to him,
"Oh, don't worry, this kind of thing happens
"all the time."
This is something that came out recently in BMJ.
It's incredibly provocative.
The methods are somewhat controversial.
But it does make us think a little bit about medical errors
and about healthcare quality.
This is an older article.
And this is sort of similar to what we hear from the IOM,
and to err is human.
Basically what this article points out
is that there are a lot of medical errors,
and that there are issues with quality
in the US healthcare system.
So how do you quantify quality?
How do you look at whether or not
there's variability in quality?
One of the ways to do it is to basically take
some condition, in this case obstetrical care,
and look at whether or not there's a gap in performance
between high-performing and low-performing hospitals.
And if you quantify that gap,
and if there is in fact a gap, then that would suggest
that there are ways that we could improve quality
by taking some of the low-performing institutions
and the average performers
and making them higher performing.
So this is a study that we did
using the Nationwide Inpatient Sample.
And what we did was we looked at the variability
in obstetrical complications for two different cohorts,
for women undergoing vaginal deliveries
and for women undergoing cesarean deliveries.
And what we saw was that there was
about a two-fold difference in complication rates
for vaginal deliveries, and about a five-fold difference
in complication rates for cesarean deliveries
in US hospitals.
Pretty striking.
And it turns out that we see this same thing
outside of obstetrics.
We see it for cardiac surgery,
where there's about a two-fold difference in mortality
for patients undergoing cardiac surgery.
This is from the STS,
from the Society of Thoracic Surgeons database.
We see the same thing for non-cardiac surgery.
This is from the American College of Surgeons
National Surgical Quality Improvement Program.
This is a caterpillar graph.
Each of the points is a point estimate
of a hospital's adjusted odds ratio
for mortality and major morbidity.
On the y-axis you have essentially the O to E ratio.
At the left-hand side of this,
the left-hand side here,
you have the high-performing hospitals.
Their O to E ratios are around .5 or so.
At the other end, you have the low-performing hospitals.
Their O to E ratios are about 1.5.
So there's nearly about a three-fold difference
in the risk of mortality for patients undergoing
non-cardiac surgery, depending on whether they go
to a high-performing or a low-performing hospital.
Not only do we have problems with variability in quality,
we also have problems and issues
with disparities in healthcare.
We've known for a very long time that racial
and ethnic minorities tend to have worse outcomes
than white patients.
And in part, we believe that the mechanism for that
is that racial and ethnic minorities are treated
in hospitals that care for a disproportionately large
number of those patients.
And what we see is that those minority-serving hospitals
tend to have worse outcomes
than the non-minority-serving hospitals.
So in this case, for patients
who have had an acute myocardial infarction,
we see that the risk-adjusted mortality is about 24%
at minority-serving hospitals
versus 20% at white-serving hospitals.
We see the same thing in trauma.
This is a study that we did looking at trauma care
in trauma centers in Pennsylvania.
We found again that racial and ethnic minorities
are treated disproportionately at minority-serving hospitals.
And when you look at mortality, major complications,
and failure to rescue, those minority-serving hospitals
consistently have much worse outcomes.
In this case, what you see is that the risk of mortality
at minority-serving hospitals
is about 45% higher.
The risk of death or major complications
is about 70% higher.
And failure to rescue is about 40% higher.
So there does seem to be a quality problem
with the US healthcare system.
What about cost?
Cost is a major issue.
In the United States, the federal government spends
right now a little over a trillion dollars a year
for Medicare, Medicaid, and other healthcare costs.
And that's projected to increase
to about two trillion dollars,
or to double in the next 10 years or so.
Federal healthcare spending as a percent of GDP
is about 4.5% now.
That's projected to go up to 8% in the next 25 years.
All that contributes to our national debt.
And we see that our national debt currently
is almost as high as it's ever been in our history.
It's almost as high as it was during World War II.
So why do we spend so much money on healthcare?
Well, part of it is demographics.
Our patient population is getting older.
Part of it is that we're a lot better
at taking care of our patients.
Our technology that we're currently using
is a lot more expensive.
This is an example of treating people with heart failure.
Heart failure is a major problem in the US.
With the old way of treating it, people don't do very well.
They usually died within about two years;
two-year survival is only about 10%
if you're diagnosed with heart failure.
The alternative is to put ventricular assist devices in.
These things work very well.
Two-year survival is about 60%, 60% versus 10%.
The problem is the cost.
So the first year or so of having
a ventricular assist device, the cost is about $230,000.
Heart failure therapy is about $100,000 a year.
So when you put a ventricular assist device in,
people live longer and it costs a lot more money.
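To make that trade-off concrete, here is a back-of-envelope sketch using the approximate figures cited in the talk; the numbers and the calculation are illustrative only, not a formal cost-effectiveness analysis.

```python
# Back-of-envelope look at the VAD vs. medical-therapy trade-off,
# using the rough figures cited in the talk (illustrative only).

medical_two_year_survival_pct = 10  # ~10% alive at two years on medical therapy
vad_two_year_survival_pct = 60      # ~60% alive at two years with a VAD

medical_annual_cost = 100_000       # ~$100k per year of heart failure therapy
vad_first_year_cost = 230_000       # ~$230k for the first year with a VAD

# Per 100 patients, the VAD strategy keeps ~50 more people alive at two years...
extra_survivors_per_100 = vad_two_year_survival_pct - medical_two_year_survival_pct

# ...but costs roughly $130k more per patient in the first year alone.
extra_cost_first_year = vad_first_year_cost - medical_annual_cost

print(extra_survivors_per_100)  # 50
print(extra_cost_first_year)    # 130000
```

That is the policy dilemma in miniature: better outcomes and higher spending at the same time.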
So we do a lot better job taking care of people,
and that increases our healthcare spending.
So CMS needs to have a way
of constraining healthcare spending.
They can't just keep spending more and more
and more every year.
And it's not politically viable in this country
to ration healthcare.
You can't tell people,
no you can't have X and Y procedures.
We're gonna pay for heart transplants.
We're gonna pay for VADs for anybody on Medicare or Medicaid.
So how do you control the rising cost of healthcare?
Well, the way you do it is you constrain
what you're going to pay hospitals
and what you're going to pay physicians and other providers.
And what CMS is doing, essentially,
is it's put into place pay-for-performance.
So we're redesigning the healthcare system,
so that instead of paying for volume, how much we do,
we're paying for outcomes.
And if you think about it, that makes sense.
It's sort of like motherhood and apple pie.
You pay for quality instead of just
for what you do to patients.
So CMS has got a bunch of programs that are in place.
The ones that you see up here are for hospital payments:
the Hospital-Acquired Condition Reduction Program,
the Hospital Readmissions Reduction Program,
and value-based purchasing.
For physicians, we have MIPS.
It's all the same, it's pay-for-performance.
If you're going to do pay-for-performance,
you need to have some way of measuring performance
or measuring quality.
And you can do this in two different ways.
You can use process measures,
or you can use outcome measures.
Process measures look at what we do to our patients.
Outcome measures look at what actually
happens to our patients.
So the rationale for process measures
is that oftentimes physicians and other clinicians
are very slow to adopt best practices.
So it's out there for a long time,
but yet we don't change what we do.
And the idea is that if you take those best practices
and you measure whether or not people actually do
what they're supposed to do,
and I put supposed to in quotation marks,
then you can get more rapid adoption of best practices.
And if those best practices result in better outcomes,
you can improve overall healthcare quality.
So the rationale for standardization,
well, let me back-track a little bit.
So Einstein once said that the definition of insanity
is doing the same thing over and over and over again,
exactly the same thing,
and expecting to get better outcomes.
And I would say that the corollary to that in medicine
is you have a lot of different folks who do things
very differently, and they all expect to get
the best possible outcomes.
So the idea behind standardization is that, again,
if you can uncover best practices,
and you can get everybody to follow those best practices,
that you'll have better outcomes.
We have evidence of a tremendous amount of variability
in how we do things in medicine.
So for example, transfusion, blood transfusion.
This is a study based on the STS data,
looking at patients undergoing CABG surgery.
And what they found was that you had some hospitals
that transfused
virtually everybody that was undergoing CABG surgery,
and other hospitals that transfused virtually no one.
So lots of variability.
So when you're going to come up with a process measure,
in the old days, a lot of it would be a bunch of people
sitting around a table and saying this is how we think
you should do it.
These were expert-based rules.
Now, we have more and more really good evidence
that we can use to base those rules on.
The Achilles heel of process measures
is that the evidence linking the best practices
to better outcomes is frequently not as good
as we would want it to be.
So let's take a look at one clinical example
where we think we have really good evidence,
and some people believe the time is actually right
to put a process measure in place.
This relates to blood transfusion
and the trigger for blood transfusion.
That is, what hemoglobin threshold you should use to decide
when to transfuse people.
So these are a set of guidelines that came out recently
from the AABB, the American Association of Blood Banks.
And what they tell us is that we should be
very conservative about when we transfuse people.
We should only transfuse people if they have
a hemoglobin of either seven or eight.
These followed a set of guidelines that were published
a couple years ago in the Annals of Internal Medicine.
And the evidence for those guidelines of using
a very conservative, or a restrictive approach
to transfusing people is based on lots and lots
of observational studies showing that patients
who get more blood tend to have worse outcomes.
And this is for surgery,
and this is for medical patients, as well.
The problem with these studies is that they don't always do
a very good job of controlling for confounding.
In the case of surgery, the patients who are getting
transfused more are patients who are bleeding more,
and we're not really controlling
for the amount of surgical bleeding.
So yes, if you get lots of blood, you have worse outcomes,
you're more likely to die,
you're more likely to have major complications.
But again, the reason those patients are being transfused
more is that they're bleeding more.
So there are other types of evidence that are a lot better
for transfusion medicine.
We have some randomized control trials.
This is one of the seminal studies.
This is the TRICC trial,
the Transfusion Requirements in Critical Care trial.
This was done in critically ill patients.
And in this study, they compared a restrictive
to a liberal transfusion strategy
in critically ill patients.
These are the types of patients that you would think
would benefit the most from having a higher hematocrit
from being transfused more.
And what they found in this study is in fact
that there was no difference in outcome
between patients who were randomized to the restrictive
versus the liberal transfusion strategy.
And there's a number of other randomized control trials
that have also come out that have shown
essentially the same thing.
So based on this, there is a very strong recommendation
that we should use a restrictive transfusion strategy.
When you look under the hood a little bit
on some of these studies, though, things are not quite as clear
as they might appear.
So for example, in the TRICC trial it turned out
that there were about 2,000 patients
who were deemed eligible for that study,
but only 800 were randomized.
And what people think is that of those 1,200
who did not participate in the TRICC trial,
there might have been a lot of patients
whose physicians thought, look, they were too sick
to be randomized to the restrictive transfusion strategy.
They thought that they actually would need more blood
to have better outcomes.
And then you have at the other end of the spectrum,
you have some physicians who are caring for maybe younger,
healthier critically ill patients.
It's a little bit of an oxymoron.
But those physicians might have thought,
well I don't want my patients to be randomized
to the liberal transfusion strategy,
because I don't want them getting too much blood,
because then they may get complications
from having too much in the way of blood transfusion.
So what you're seeing is that even though randomized
control trials are thought to be the gold standard
for evidence, you can actually have selection bias
when you do these studies.
And very recently, there was a study that came out
looking at cardiac surgical patients, again comparing
a restrictive versus a liberal transfusion strategy.
And in that study, they found that for their primary outcome,
which was the risk of a serious infection
or an ischemic event, there was no difference
between the liberal and restrictive transfusion strategies.
What they did also find was that in a secondary analysis,
patients who were randomized to the restrictive
transfusion strategy were more likely to die.
And then based on that, the authors of that study said
patients with cardiovascular disease may actually represent
a specific high-risk group for which more liberal
transfusion thresholds are to be recommended.
So other process measures that a lot of us
are familiar with: SCIP measures,
from the Surgical Care Improvement Project.
SCIP measures basically look at whether or not
we're giving antibiotics within a certain one-hour timeframe
for patients undergoing surgery.
And the idea here is that if you get the antibiotics,
you're less likely to have wound infections.
And these are being used as process measures.
And what we find over time is that the adherence
to those SCIP measures gets better.
But what we don't find over time
is that the infection rate gets better.
Other best practices that could be candidates
for process measures, surgical safety checklists.
Again, this is like motherhood and apple pie.
Initial studies showed dramatic reductions
in mortality and morbidity when you had
safe surgery checklists in place.
Later studies were not able to replicate that.
So what's the point here?
The point that I'm trying to make is that one of the ways,
one of the most common ways that we have of measuring
performance is using process measures.
Again, process measures look at what we do to our patients.
And even in cases where we think we have really strong
evidence, for example, with blood transfusion,
where people are really pretty convinced
that transfusing people less will probably lead
to better outcomes,
and certainly not to worse outcomes,
it turns out that the evidence is not nearly as strong,
maybe not nearly as strong, as we think it is.
So it's very hard to improve quality, improve outcomes,
by just getting people to follow best practices,
because oftentimes, those best practices may not be
actually linked to better outcomes.
So at the end of the day, if you're going to try to improve
quality, probably the way to do it is
instead of measuring what we do,
you ought to be measuring the actual outcomes,
the outcomes of interest.
So quality is one of those things where most of us,
whether we're physicians, nurses, clinicians,
or health policy folks, sort of have a sense
for what high quality work in our area of expertise is.
We sort of know it when we see it.
And whenever I give this talk, I always ask people,
I always ask the same question.
I always ask people to raise their hands.
I'm going to ask you guys.
Raise your hands if you think,
whatever your line of work is,
if you think that you're either below average or average.
So anybody who thinks they're average or below average,
please raise your hands.
Okay, this is actually an unusual audience,
because absolutely nobody's raising their hands.
So this is sort of like Lake Wobegon,
where everybody is above average.
So if you wanna look at hospital or physician quality,
it's kind of challenging.
If you want to go out and buy a dishwasher or TV,
you go to the web,
and there's a ton of information out there,
all sorts of really detailed information,
and you can usually figure out
what dishwasher you're gonna buy.
If you want to look at physician quality,
well CMS has set up something called Physician Compare.
And I would encourage you to go to that website
and look at their physician report cards.
And basically, what you will find is nothing,
just information on physician demographics.
So how do we actually quantify quality?
We think we know it when we see it,
but is there a way to actually measure quality?
Well, there is.
The typical approach that we use
is we basically collect data on clinical outcomes.
So say we want to compare our outcomes for cardiac surgery,
for CABG surgery at Strong,
to the outcomes over at Rochester General Hospital.
Well one of the things that we can do is we could just look
at our crude unadjusted mortality rates.
So maybe our mortality rate's 4%.
Maybe their mortality rate's 2%.
So does that mean that we're doing a worse job
than Rochester General?
Probably not, because maybe our patients
are a lot sicker than their patients.
It's not really an apples to apples comparison.
So we have to have a way of risk-adjusting,
of adjusting for severity of disease, for co-morbidities,
that takes into account the fact that our case mix
may be different from their case mix.
So how do we do that?
Well, what we do is we collect information on outcomes,
things like mortality, that's what we're interested in.
And then we collect information on clinical risk factors,
the things that are more likely to lead to a bad outcome
in patients undergoing cardiac surgery.
The things like information on age, gender,
measures of heart function,
whether or not someone has a history of heart failure
in the past,
whether or not they've had previous cardiac surgery.
And then what you do is you take all that data
and you fit it to a regression model,
usually a logistic regression model.
And that model allows you,
for each patient who underwent cardiac surgery,
to calculate their predicted probability of death,
conditional on their risk factors.
So now you can take all of those patients
who underwent cardiac surgery at Strong, say in 2015.
And for each one of those patients,
you calculate their predicted probability of death
and you average them together
and you get the expected mortality rate.
And then you calculate their actual mortality rate,
which is the observed mortality rate.
And now you have a way of looking at quality,
because you can compare observed to expected.
So if the observed mortality rate is significantly greater
than the expected mortality rate,
then you would be a low-quality outlier.
Conversely, if your observed is significantly less
than your expected, then you're a high-quality outlier.
So you have a way of actually quantifying quality.
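The observed-to-expected workflow just described can be sketched in a few lines of Python. This is a toy simulation: the logistic model coefficients and risk factors are invented for illustration, whereas a real benchmarking program would fit a logistic regression to registry data with many more clinical variables.

```python
import math
import random

def predicted_mortality(age, prior_cardiac_surgery):
    """Predicted probability of death from a hypothetical logistic model.

    The coefficients are made up for illustration; real models are fit
    to clinical data (age, ejection fraction, heart failure history, etc.).
    """
    logit = -6.0 + 0.05 * age + 1.0 * prior_cardiac_surgery
    return 1.0 / (1.0 + math.exp(-logit))

random.seed(0)
n = 10_000
deaths = 0
expected = 0.0
for _ in range(n):
    age = random.randint(40, 90)
    prior = random.random() < 0.20          # ~20% had prior cardiac surgery
    p = predicted_mortality(age, prior)
    expected += p                           # accumulate expected deaths
    deaths += random.random() < p           # simulate the observed outcome

observed_rate = deaths / n
expected_rate = expected / n
oe_ratio = observed_rate / expected_rate

# An O/E ratio near 1.0 means outcomes match the case mix; well below 1.0
# suggests a high performer, well above 1.0 a low performer.
print(round(oe_ratio, 2))
```

Because the outcomes here are simulated from the same model that computes the expected rate, the O/E ratio comes out close to 1.0; in real benchmarking, deviations from 1.0 beyond sampling noise are the quality signal.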
So there are some challenges with doing this.
One of the biggest challenges that we have is data quality.
That includes the data that we use to quantify outcomes,
and the information that we use to identify
risk factors for severity of disease and for co-morbidities.
So data comes in really two different versions,
two different flavors.
The first flavor is clinical data.
This is the data that folks
at the American College of Surgeons use,
the folks at the Society of Thoracic Surgeons use,
when they create their report cards.
And this is data that basically is collected
by clinicians, usually nurse extractors.
They go to the medical record.
They have a standardized data dictionary.
It's very high quality clinical data.
And we think it's probably the gold standard
in terms of data quality.
The alternative is to use administrative data.
Administrative data are ICD codes,
previously ICD-9 codes and now ICD-10 codes.
These are data that are coded by non-clinicians.
They're very good at what they do,
but they're not clinicians.
The problem with administrative data is
it's not as reliable as clinical data.
So when people have gone and compared the results
of clinical data collection to the actual ICD codes
that are coded in the administrative data,
what they find is that the administrative data
are frequently relatively insensitive
for picking up important diagnoses.
So for example, the sensitivity of the administrative data
in this particular study for picking up a history
of an old myocardial infarction was about 35%.
Why is that a problem?
Well, if you undercode your patient's comorbidities,
what will happen is your patients
will actually look healthier than they actually are,
so that when you compare your observed to your expected,
you're not gonna look as if you were doing as good of a job
as if you had fully coded all of your comorbidities.
So if you're using administrative data,
some of the variability that we see
when we look at quality across different hospitals,
where some hospitals look better than other hospitals,
may actually reflect differences in coding practices
across different hospitals,
as opposed to true differences in quality.
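Here is a tiny worked example of why undercoding matters for O/E benchmarking. The 4% and 3% figures are hypothetical; the point is just that an artificially low expected rate inflates the O/E ratio and makes an average hospital look like a poor performer.

```python
# Hypothetical hospital: observed mortality 4%, and with fully coded
# comorbidities the risk model also expects 4% (O/E = 1.0, average).
observed = 0.04
expected_fully_coded = 0.04

# If administrative data capture comorbidities with low sensitivity
# (e.g., ~35% for old MI, as cited above), patients look healthier
# than they are, and the expected rate drops, say to 3%.
expected_undercoded = 0.03

print(observed / expected_fully_coded)            # 1.0 -> looks average
print(round(observed / expected_undercoded, 2))   # 1.33 -> looks low-performing
```

So the same hospital, with the same patients and the same outcomes, can shift from average to apparent low performer purely through coding.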
So why are we not just using clinical data?
Clinical data collection is really expensive.
So for the ACS NSQIP,
it costs about $100,000-$150,000 per year
per hospital to collect data
on a sample of patients undergoing non-cardiac surgery.
It's really expensive.
Administrative data, there's really no additional cost
to using administrative data for report cards
for benchmarking, because it's already being collected
by the hospitals, because hospitals can't bill
if they don't collect administrative data.
So administrative data is almost free,
from the standpoint of creating report cards.
CMS, when it does value-based purchasing,
when it does pay-for-performance, when it does MIPS,
it is using administrative data.
The cardiac surgeons and non-cardiac surgeons,
who are doing primarily non-public reporting,
are using clinical data.
Now there's a third pathway to collecting data.
And this is what people are looking at in the future,
and that's extracting clinical data from the EMR.
And the idea here is that if you can get
high quality clinical data from the EMR,
it's gonna be pretty cheap.
If you can get your clinicians, your physicians,
your nurses to code the clinical information in the EMR
and you can take that information
and use it for benchmarking,
now you've got the best of all possible worlds.
You've got clinical data, so it's more reliable.
And it's virtually free,
because it's already being collected, anyway.
The problem with the clinical data that you get
from the EMR, however, is that in fact it may not be
even as reliable as the information that you're getting
from the data coders, when they're coding ICD-9
and ICD-10 codes.
It's really hard to get clinicians to modify
their workflow to collect the data elements
that you and I might want to collect
in order to be able to do benchmarking.
And any of us who have worked with EMRs
and have looked at the problem lists that are generated
will see that frequently those problem lists
are used to generate the ICD-9 and now the ICD-10 codes
that are gonna be used for billing.
If you look at a problem list,
oftentimes it's fairly inaccurate.
So we still don't really have a great solution
for the data quality problems that we have
when we're trying to look at performance reporting.
Now let's move beyond data quality.
Let's say that you didn't have a problem with data quality.
Let's say that you had very high quality data.
Does that mean that now we can do accurate benchmarking?
We know from a seminal paper that came
out of Harvard, published back in the late '90s,
that if you take a data set
and you create different risk-adjustment models
to do the risk-adjustment,
to do the performance benchmarking,
that oftentimes what will happen is you will end up
identifying different hospitals as being
high-performing and low-performing.
If you think about it,
that doesn't really make a lot of sense.
So if you went out and bought three different yardsticks
and measured the perimeter of this auditorium
with three different yardsticks, you would expect
to get more or less the same measurement, right?
Well it turns out that if you use three different quality
yardsticks, those yardsticks will frequently not agree
on who are high quality and low quality.
So this is a study that Iezzoni published more recently,
using data from Massachusetts.
And they used some of the leading risk-adjustment models
that were available.
And they found that in Massachusetts,
there were about 28 low-quality outliers
that were identified by one or more
of these risk-adjustment models that were then identified
as a high-performing outlier by one or more
of the other risk-adjustment models.
So again, the concept being that depending
on what you are using,
what risk factors you are using
in your risk-adjustment models,
you may end up getting different answers
about who's high-quality and who's low-quality.
Okay, does that mean that when we're measuring quality
that there's no quality signal?
No, there actually is a quality signal.
So there have been different studies that have been done,
looking at whether or not past performance
predicts future performance.
This is a study that we did looking at trauma centers.
And what we did was we basically identified
high quality or high performance and low performance
trauma centers using older data or historical data,
and then looked at whether or not those particular
trauma centers would have better or worse outcomes
using more recent data.
And what we found is that if you had two-year-old data
or three-year-old data,
and then you looked at contemporary data,
that the high-quality hospitals were still high quality,
the low-quality hospitals were still low quality.
Another way to look at whether or not there is
a quality signal in quality measurement
is to use a different way of measuring performance.
This is a study that came out of Michigan.
It's really a very ingenious study.
And what they did was they basically took a bunch
of surgeons who were performing bariatric surgery,
and they taped them.
And then they had a bunch of surgeons, blinded reviewers,
who graded the technical quality of the surgery.
And then they looked at the correlation
between the technical quality of the surgery
and risk-adjusted complication rates.
And what they found was that the risk-adjusted
complication rates were about three times higher
in the surgeons who were graded as having
low technical quality compared to the surgeons
who were graded as having high technical quality.
Okay, so we're back to Blue Mountain Lake.
This is telling you that we're going to shift gears
in just a little bit.
Okay, so there are some challenges
with measuring quality.
Those challenges revolve around data quality issues.
They revolve around the choice of risk-adjustment.
But nonetheless, there does seem to be some kind
of quality signal that we're measuring.
So the next question that I'd like to answer is,
okay, so we have a way of quantifying hospital performance.
Can we use that information to improve outcomes,
to improve quality?
So the early studies were incredibly promising.
This is from VA NSQIP,
the National Surgical Quality Improvement Program.
This is very old data.
Back in the late '90s, the federal government looked
at the quality of medical care in the VA hospital system
and said, "It's really bad.
"You guys have to find a way of fixing this."
So the VA created NSQIP.
And what they did was they basically started collecting
clinical data on outcomes and on risk factors
and created these report cards and took these report cards,
and these were non-public reporting,
no one saw it, except for the surgical chairs.
And they fed that information back to surgical chairs
and they coupled it with a quality improvement effort.
And what they found is that over a fairly short time period,
that mortality and morbidity improved dramatically.
Mortality went down by about 25%.
Morbidity went down by about 45%.
Really, an incredible improvement in quality.
And then the American College of Surgeons saw this
and they said, "Wow, that's amazingly impressive.
"Let's see if we can do that in non-VA hospitals."
So they took NSQIP and they basically made it
available to non-VA hospitals.
We now have over 500 hospitals in the US
that are part of the NSQIP program.
And what they claim to have found,
using again, a pre- and post-study design,
was that over time, the O to E ratios,
the ratios of the observed to the expected mortality
or morbidity rates for hospitals that were participating
in NSQIP became smaller.
Suggesting again, that if you participate
in non-public reporting and use that information
to drive your quality improvement efforts,
that things will get better.
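As a minimal sketch of how an O-to-E ratio is typically built (the patient risks below are hypothetical, not from NSQIP): the expected count E is the sum of each patient's model-predicted risk, and O is the observed count.

```python
# Minimal sketch (made-up numbers): the expected (E) count is the sum of each
# patient's predicted risk from a risk-adjustment model; O/E compares the
# observed count to that expectation.

def expected_events(predicted_risks):
    """E = sum of model-predicted probabilities across a hospital's patients."""
    return sum(predicted_risks)

def oe_ratio_from_risks(observed, predicted_risks):
    """O/E ratio: 1.0 means outcomes match the model's expectation."""
    return observed / expected_events(predicted_risks)

# Five patients with model-predicted mortality risks:
risks = [0.01, 0.05, 0.10, 0.20, 0.04]            # E = 0.40 expected deaths
print(round(oe_ratio_from_risks(1, risks), 2))    # one observed death -> 2.5
```

An O/E drifting down over time, as in the NSQIP claim, would mean observed outcomes improving relative to what the model expects.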
But that was a pre- and post-study.
And so then the group from Michigan said,
"Wait, hold on a second.
"You know, when you're doing these pre-imposed studies,
"it's really easy to get kind of a little bit confused.
"Maybe things were getting better, anyway, over time.
"So we need to look at this a little bit more carefully."
And what they did is they used an econometric technique,
difference-in-differences methodology.
What they did is they looked at outcomes in hospitals
before and after they joined NSQIP.
But then they compared them to control group hospitals
over the same time period.
And what they found in their study is if you just looked
at the pre- and post-comparison, the NSQIP hospitals
did get better over time.
But when you compared them to the control hospitals
that were not part of NSQIP, there was no improvement.
So this study suggested that
NSQIP in fact was not associated with improvements
in surgical outcomes.
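The difference-in-differences logic described here can be written out in a few lines (the rates below are hypothetical, not from the Michigan study): if the control hospitals improved just as much as the participating hospitals, the estimated program effect is zero.

```python
# Toy difference-in-differences calculation (hypothetical rates): compare the
# change in participating hospitals to the change in controls over the same
# period; secular improvement common to both groups cancels out.

def did(treat_pre, treat_post, control_pre, control_post):
    """DiD estimate: (treated group's change) minus (control group's change)."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Complication rates (%) before and after joining the program:
nsqip_pre, nsqip_post = 12.0, 10.0        # participants improved by 2 points
control_pre, control_post = 13.0, 11.0    # controls also improved by 2 points

print(did(nsqip_pre, nsqip_post, control_pre, control_post))  # 0.0 -> no effect
```

A simple pre/post comparison would have credited the program with the full 2-point improvement; the controls reveal it was happening anyway.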
But hold on, maybe the problem is
that it's not about non-public reporting.
If NSQIP is non-public reporting,
maybe that's why you're not seeing better outcomes.
Maybe you need to have more transparency,
you need to have public reporting.
So CMS has Hospital Compare in which it reports
risk-adjusted outcomes for AMIs, pneumonias,
and heart failure.
And what this group did is it said,
"Let's look at what happened to risk-adjustment
"mortality rates before and after public reporting started."
And what they found was that yes, mortality rates
were getting better before public reporting started.
And they kept getting better
after public reporting was instituted.
And if you look at the slope in the rate of improvement,
it's essentially the same.
So it seems that non-public reporting
doesn't really work very well.
Public reporting may not work all that well.
What about pay-for-performance?
Well, CMS has created some demonstration projects.
And in those demonstration projects,
one of them was called Premier.
And that had both an incentive and a penalty.
And what this group did is they compared the performance
of the hospitals that were part of the pay-for-performance
versus a group of control hospitals.
And again, what they found, using the same methodology
was that yes, outcomes got better over time,
but there was no difference in the improvement,
the rate of improvement, in hospitals that participated
in the pay-for-performance versus the hospitals
that did not.
And then they did a sensitivity analysis
in which they looked at hospitals
that were in the lower-performing groups.
These were the hospitals that one would think
would have the biggest incentive to improve their outcomes.
And even in those hospitals, there was no difference
between the hospitals that participated
and the ones that did not participate
in pay-for-performance.
Okay, so this is one of my other kids,
and this is also at Blue Mountain Lake, and he was fishing.
So, where does that leave us?
So CMS has got to have a way
of controlling healthcare spending.
We can't go from one trillion dollars of expenses
to two trillion dollars over the next 10 years.
And again, we can't control cost by rationing healthcare.
So we need to have a way of controlling costs.
How do we do that?
Well, we pay providers, we pay hospitals, pay physicians,
we pay people less for what they do.
And again the idea here was we were going to do this
with pay-for-performance.
I'm showing you some of the challenges
with measuring performance.
But what CMS did is, and this is kind of interesting,
this is the Hospital Readmission Reduction Program.
And in that program, there is only a penalty.
There is no incentive.
And what they've done is they've said basically,
"Look, if your O to E ratio for hospital readmissions
"is 1.01, you'll lose up to 1% of your CMS payments.
"If it's 1.02, you'll lose up to 2%.
"If it's 1.03, you'll lose up to 3%."
But if you think about it, an O to E ratio of 1,
that means you're average.
1.01, 1.02, 1.03, you're still average.
But despite that, the way that this system
has been set up, if you're just a little bit worse
than that 1.00, for an O to E ratio,
you're gonna be penalized.
So this is a way, and this is essentially a mechanism
for controlling healthcare spending,
and ostensibly linking it to quality,
but not really.
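The penalty schedule as the speaker sketches it amounts to a small step function on the O-to-E ratio. This is a simplified, illustrative version of that description, not the exact CMS formula.

```python
# Simplified sketch of the readmission penalty as described above (illustrative,
# not the actual CMS calculation): anything even slightly above an O/E of 1.00
# loses a slice of CMS payments, capped at 3%.

def penalty_pct(oe_ratio):
    """Toy penalty: 0 at or below O/E of 1.00, then up to 1% per hundredth
    above it, capped at 3%."""
    if oe_ratio <= 1.00:
        return 0.0
    steps = round((oe_ratio - 1.00) * 100)  # hundredths above 1.00
    return float(min(3, steps))

print(penalty_pct(1.00))  # 0.0 -> exactly average, no penalty
print(penalty_pct(1.02))  # 2.0 -> still statistically "average", but penalized
print(penalty_pct(1.10))  # 3.0 -> capped
```

The cliff at 1.00 is the speaker's point: a hospital that is indistinguishable from average can still lose money.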
Okay, so where do we go from here?
I actually think that report cards can be used
to drive quality improvement.
And the reason I think that is because I think
that hospitals, physicians, clinicians
will respond to the incentives to the grades.
I think all of us are here, sitting in this auditorium,
because we've always responded to grades.
So when you were in high school, if you got a bad grade,
you responded to that bad grade by working harder
to try to improve your grades.
Even if you didn't think those grades were very fair,
you still took those grades and tried to improve them.
You did the same thing in college,
you did the same thing in graduate school.
So if you're a hospital and if your hospital performance
when you're being benchmarked is not as good
as you want it to be, you may not necessarily believe
that that grade that you're receiving
accurately measures your performance.
But despite that, you're still going to respond to that
and you're going to work to try to improve
the reported grades that you're receiving.
And the way you're going to do that is you're going to go,
in part you're going to go to the folks
who are at the tip of the spear,
the clinicians who are working in the operating rooms,
on the floors, and you're gonna say to them,
"Okay, what is it we can do to improve quality?"
And then you're going to provide those clinicians
with the resources that they need to improve performance.
Again, even if you don't really necessarily think
that the report cards that you get capture
in a 100% accurate way the performance of your institution,
you're still going to work to improve your performance.
And I think that's why ultimately
performance reporting will deliver higher quality care.
Okay, I'm going to stop here and answer any questions
that people might have.
No questions?
Okay Bob?
- [Bob] How many years do you think it will take
before we get a good flow of electronically
extractable measures?
- So it's a real challenge.
What we're trying to do with MQIP,
which is the Maternal Quality Improvement Program,
is we've gotten a group of people together,
major stakeholders, from obstetrics, from anesthesiology,
to create a data dictionary,
a standardized data dictionary with outcomes
and with risk factors.
And then we've gotten some of the EMR vendors,
the biggest EMR vendors, so Epic and Cerner,
to agree to take that data dictionary
and to embed it in their EMR.
So the idea here is that essentially when clinicians
in the course of their normal workflow
are taking care of patients,
they're gonna enter the clinical data elements
that we need to use to create report cards for obstetrics.
The American College of Surgeons,
the Society of Thoracic Surgeons,
they're all trying to do the same things.
So in theory it sounds like a really, really great idea.
In practice, trying to get clinicians
to reliably code those structured data elements
is gonna be really, really hard.
Because, you know, when we're taking care of patients,
our goal is not to document.
Our goal is to take care of patients.
And anything that gets in the way of that,
that worsens our workflow,
is something that we don't really want to do.
So I don't know how many years it's gonna take.
I don't know how doable that's going to be,
to be honest with you.
I'm not sure that we're going to be able
to really ever migrate away from the current data sources
that we have, i.e., either administrative data,
or clinical data that's being collected by people
who are very expensive.
I mean, you as well as anybody knows
how accurate the problem lists are, right?
People just kind of, you know,
they don't really get modified.
And after a while, you have a problem that shows up
on your problem list in your EMR
that's been there for a very long time,
and it's no longer a problem, but it's still there.
Other problems with EMRs,
if you think, well let's just go to the medical record
and use natural language processing
to extract the clinical data from the actual note.
Because the note should be accurate, right?
I mean, the problem lists may not be,
but what you write every day in your medical record
should be accurate.
But it turns out that what we're seeing
is a lot of copy and pasting.
So that the medical notes, which used to be pretty accurate,
because people would have to write them down by hand,
now it's really easy to just copy and paste.
So I don't really know what the answer to your question is.
I'm not as optimistic now as I might have been a year ago.
- [Man] Maybe this goes along with that.
I deal with a nine-town region of different hospitals,
different EMR vendors, you know workgroups,
how far do you think we are behind the eight-ball?
Maybe that's the mistake in questioning the terms of,
you know, I'm finding that people who want to extract
different data and it's different all across the board,
and so every time you switch to a new vendor,
it just wreaks havoc with everything.
And I think that's probably part of this, too.
I don't know (door slams) it just seems so far
behind the ball that we're EMR,
the stuff right now.
Maybe some of this should have been
thought of 10, 12 years ago (mumbles)
I'm not sure.
- I agree with you.
I think there's some very real challenges
in terms of working with EMR if you use it
as a source for clinical data.
- [Man] And it seems like in all counties,
everybody seems to (mumbles).
- Thank you for that comment.
- [Man] I think another problem is
I think we have in this country is
we want to (mumbles) everything.
So Center of Health in northern California
has a program where (mumbles off microphone)
make it easier for the (mumbles off microphone).
That's why creating a sense of what practice (mumbles)
creating the data that is important
that is integrated (mumbles off microphone).
- I think that's a very interesting point.
I think clinicians who,
again, I hate to use the expression,
but at the tip of the spear,
people know what the problems are.
And I think clinicians are incredibly good
at coming up with workarounds,
with relatively inexpensive approaches
to fixing those problems.
To me, the promise of, if you want to call it that,
of the redesign of the payment system,
one which works to incentivize higher quality,
is that all of a sudden, the hospital bottom line
is at risk when you get a performance reporting
that says that you're not doing as well as you should be.
Now again, that performance rating may not be accurate.
But at the end of the line, there's going to be
a lot of hospitals that are going to get penalized,
or at risk for getting penalized.
And when that happens, or even the threat of that happens,
of that happening, rather,
it incentivizes those hospitals to pour resources
into improving quality.
And so then, when you have, as you're mentioning,
when you have groups of clinicians who identify issues,
things that they want to work on,
now they actually get the resources that they need
to make those outcomes better,
whether it's pay-for-performance,
whether it's an alternative payment model,
where all of a sudden you're really liable.
If you spend more than what you're supposed to spend,
you're going to end up losing money.
So you have to find a way of making care more efficient
and higher quality and having fewer complications.
So yeah, I agree, changes at the local level.
But helping that along by giving more resources
to make that happen is really going to be,
I think, pretty effective.
- [Man] And now, from my previous question,
is there an all claims, all payer database (mumbles)?
(man speaking off microphone)
- [Man] If you wanted to compare different cities across--
- So what you can do is you can go to the HCUP
State Inpatient Databases,
which are all-payer and population-based
administrative data.
And they're readily available to researchers.
(man speaking off microphone)
What's that?
So it's HCUP, H-C-U-P,
the State Inpatient Database.
It's a federal-state partnership
to make all-payer administrative data
available to researchers.
(man speaking off microphone)
Um, so the State Inpatient Database is inpatient.
There's one for emergency rooms.
There's one for ambulatory surgery.
I don't know that there's one for ambulatory care, per se.
- [Man] CMS does have the out-patient data
for Medicare (mumbles off microphone).
- I have a question, then. - Sure.
- [Man] So in the beginning of (mumbling off microphone).
- So that's a great topic.
So up until recently, CMS had directed NQF,
the National Quality Forum, to not allow for risk-adjustment
for socioeconomic status, for SES.
So that if you were caring for
a disproportionately large number of low-SES patients,
you couldn't adjust for that.
And there are a lot of hospitals that felt
that was really unfair, because they felt that SES
was a proxy for co-morbidities,
for severity of disease that might not be
otherwise picked up by the ICD-9 and now the ICD-10 data.
So those hospitals felt like they were taking care
of sicker patients, and they felt like they were being
penalized for that, because the risk-adjustment
wasn't taking that into account.
And what people have talked about is the fact,
well hold on, so you've got these hospitals,
you've got these safety net hospitals
that are taking care of these really sick people.
And under pay-for-performance, they're gonna be penalized.
So if you're trying to reduce racial and ethnic disparities,
and you're trying to raise the level of care
for those patients at minority-serving hospitals,
the last thing in the world that you want to do
is penalize them by not taking into account
the fact that they take care of sicker patients.
So with that now, NQF is now asking all measure
developers to look at whether or not
they should, in fact, be adjusting for SES.
And there was a technical expert panel
that actually was convened by NQF and said yes.
This is something that you definitely need to look at.
I'm on one of the steering committees
at the National Quality Forum for the readmission measures.
And every time we evaluate now a readmission measure,
a hospital readmission measure, we consider SES.
But yeah, there's a whole lot of unintended consequences
when you don't adjust for SES.
But the flipside, and this is what CMS maintains,
is that if you adjust for SES,
then you end up adjusting,
you potentially end up adjusting away the fact
that maybe some of those minority-serving hospitals
are actually doing a worse job
taking care of those patients.
And then you don't recognize the fact
that they're doing a worse job and you're excusing it.
It's not a simple matter.
Does that answer your...?
- [Man] We ran a blood pressure collaborative
(mumbling off microphone).
We struggled with the claims adjustor
(mumbling off microphone).
So when you adjust for it that generally the SES
(mumbling off microphone),
the variation within the low SES benefits.
- When you adjust for it?
- [Man] When you don't adjust for it.
- When you don't adjust for it.
- [Man] You look at performance for all SES
(mumbling off microphone)
variations in what we're finding
(mumbling off microphone).
- Interesting.
- [Man] There's lots to learn from
(mumbling off microphone).
- Right, that's a really good point.
Some hospitals, people who comment to the NQF,
have said, look, what we really ought to do is
we ought to stratify the report cards,
based on whether or not you're a low-SES hospital
versus maybe not a low-SES hospital.
It's challenging.
What a lot of the measure developers have responded to CMS
is that if you incorporate SES in the models,
oftentimes it really doesn't make
that much of a difference.
And that is,
that is a little bit problematic as an assertion.
And the reason that they're able to make that assertion
is because most of these risk-adjustment models
are based on hierarchical modeling
with shrinkage.
And it turns out, and this maybe gets a little bit
beyond what we want to be talking about here,
but it turns out that the effect of SES
is just not that great and it gets overwhelmed
by some of the other considerations,
other methodological considerations.
So at the end of the day, oftentimes,
at least for readmission measures,
and for some of the other measures,
it doesn't make as much of a difference
as one would think it would.
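The shrinkage point the speaker raises can be illustrated with a toy empirical-Bayes-style calculation (all numbers, including the prior strength, are made up): hierarchical models pull each hospital's estimate toward the overall mean, with small hospitals pulled hardest, so a modest SES effect can get swamped by the shrinkage itself.

```python
# Toy empirical-Bayes-style shrinkage (hypothetical numbers): each hospital's
# rate is a weighted average of its own raw rate and the overall rate, weighted
# by case volume. Small hospitals get pulled hard toward the mean.

def shrunken_rate(hospital_rate, n_cases, overall_rate, prior_strength=200):
    """Weighted average of the hospital's own rate and the overall rate;
    prior_strength is an arbitrary illustrative pseudo-count."""
    w = n_cases / (n_cases + prior_strength)
    return w * hospital_rate + (1 - w) * overall_rate

overall = 0.15  # overall readmission rate

# A small hospital with a high raw rate gets pulled most of the way back:
small = shrunken_rate(0.25, n_cases=50, overall_rate=overall)
# A large hospital with the same raw rate keeps most of it:
large = shrunken_rate(0.25, n_cases=2000, overall_rate=overall)

print(round(small, 3), round(large, 3))  # small lands much closer to 0.15
```

With small hospitals' estimates already sitting near the overall mean, adding an SES covariate often moves the final profiling results very little, which is the assertion being discussed.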
- [Man] Any other questions?
Thank you all for coming.
(audience applauding)
- Thank you.