
Data Conversion Validation



We are currently converting 20 plus years of Hemocare data into HCLL and the question I have is how many records should I check before I can say with confidence that all the data was converted. I originally thought that I would be OK if I checked 10 patients for each and every antibody and special instruction. But I have concerns about only checking 10 patients that have anti-K and assuming that the other 3,000+ patients with anti-K converted OK. I then thought that we should check 10% but to check 20 years of data could take me 20 more years, so logistically that's not going to happen. Has anyone out there had a similar problem?

Thanks

Tim


Welcome to BloodBankTalk.

Excellent first question. Unfortunately, there is no one answer. You mentioned a key word in your post: "confidence". This will determine your final number.

Take a look around on the internet and you'll find even more information to confuse you.

Here is some information I have found, along with a recommendation to my facility (which is also converting data).

The confidence interval is the plus-or-minus figure usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer, you can be "sure" that if you had asked the question of the entire relevant population, between 43% (47-4) and 51% (47+4) would have picked that answer.

The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.

Sample Size

The larger your sample, the more sure you can be that their answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the smaller your confidence interval. However, the relationship is not linear (i.e., doubling the sample size does not halve the confidence interval). The sample size doesn't change much for populations larger than 20,000.

Many sites recommend a 95% (more typical) or 99% confidence level with 5% error. The more critical the information, the higher the confidence level selected. We can adjust these any way we want; 100% confidence with 0% error would be the best, but it would require full validation of every record.

My recommendation (to my institution) is to select 660 random records based on:

Confidence Level 99%

Confidence Interval 5%

Population 117,416 combined donors (my institution)

Alternatively, select 383 random records based on:

Confidence Level 95%

Confidence Interval 5%

Population 117,416 combined donors (my institution)
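For anyone who wants to reproduce these figures without an online calculator, here is a minimal Python sketch. It assumes the calculators use the standard survey sample-size formula with a finite population correction and a worst-case proportion of 50%; the thread does not confirm that, and the function and parameter names are purely illustrative.

```python
from statistics import NormalDist


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.05, proportion: float = 0.5) -> int:
    """Survey-style sample size with a finite population correction."""
    # Two-sided z-score for the confidence level
    # (about 1.96 at 95%, about 2.576 at 99%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population.
    n0 = z * z * proportion * (1 - proportion) / (margin * margin)
    # Shrink it to account for the finite number of records.
    n = n0 / (1 + (n0 - 1) / population)
    # Rounding to nearest is an assumption; it matches the figures above.
    return round(n)


if __name__ == "__main__":
    print(sample_size(117_416, confidence=0.99))  # 660
    print(sample_size(117_416, confidence=0.95))  # 383
```

Rounding up instead of to the nearest whole record would be slightly more conservative and would change some results by one.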


Cliff, don't you think that for this type of project 666 would have been a more appropriate number than 660?

:P

I can't wait to go live. This has been a tough couple of years. We are converting data from two different systems that already contain data converted from other prior systems.


Thanks for the response. I am going to check and see how many records we have. I am afraid it's probably close to 300,000. It's amazing how much info can accumulate in 25 years.

Thanks

Tim


You're welcome.

Don't get too fearful of the total number of records; it's fairly irrelevant past 20,000 records.

Let's say you select a confidence level of 95% and a confidence interval of 5%. Here is the difference in number of records you'd need to audit based on total records.

Total records = 1 | Number to audit = 1

Total records = 10 | Number to audit = 10

Total records = 100 | Number to audit = 80

Total records = 1,000 | Number to audit = 278

Total records = 10,000 | Number to audit = 370

Total records = 300,000 | Number to audit = 384

Total records = 300,000,000,000,000 | Number to audit = 384
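The plateau at 384 can be seen from the formula that sample-size calculators commonly use. This is an assumption about how the numbers were produced, not something stated in the thread. Here N is the total number of records, z is the z-score for the confidence level (about 1.96 at 95%), e is the confidence interval expressed as a fraction (0.05), and p is the assumed proportion (0.5 as the worst case):

```latex
n_0 = \frac{z^2 \, p \, (1 - p)}{e^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}},
\qquad
\lim_{N \to \infty} n = n_0 = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384
```

Once N is much larger than n_0, the correction term (n_0 - 1)/N is close to zero, so adding more records barely changes the number to audit.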


  • 4 weeks later...

In HEMOCARE, the Track File Report (menu path below) gives you the last internal numbers assigned, which approximates your record counts.

Example: in record 14, "Last donor ID assigned---<$> 0021204" is the number of donors registered.

It will not be exact because of merges.

SU Setup / Utilities / Mail Box

CR Control/Track File Setup

TP Track File Print

Track File Report

record 1 - patient and inv numbers

Last pat num assigned----<!> 166820

Last inv loc num assigned<!> 190053

record 4 - last donor number assigned

Last donor num assigned--<@> 20284

Last mobile donor num----<@> 0

record 14 - donor ID number (7 char)

Last donor ID assigned---<$> 0021204

Greatest donor ID--------<$> 9999999


  • 2 years later...

Hello,

It seems as though I am bringing up a very confusing subject but I have been researching this data conversion confidence and I can honestly say, I just don't get it. Cliff, how did you arrive at 660 records for that number of donors? Is there an actual "formula" for this?

We are converting over 350,000 patient records.

Thanks for the help (I think) :)


As Cliff mentioned, for more precise validation, the sample size is based on your margin of error, confidence level and population size. Luckily, there are online calculators for sample size, for example at http://www.surveysystem.com/sscalc.htm.

According to the above calculators, and as Cliff mentioned, for a 95% confidence level with a 5% margin of error and a population of 350,000 records you need a sample size of 384.
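Once the sample size is settled, the records themselves still have to be picked at random rather than by hand. Below is a rough Python sketch of one way to do that. The ID range is hypothetical and `pick_audit_sample` is just an illustrative helper name; in practice the list of IDs would come from an export of the legacy system, since internal numbers can have gaps (for example from merges).

```python
import random


def pick_audit_sample(record_ids, sample_size, seed=None):
    """Return `sample_size` distinct record IDs chosen uniformly at random."""
    # A fixed seed makes the draw repeatable, so the same list can be
    # regenerated later for the validation file.
    rng = random.Random(seed)
    return sorted(rng.sample(list(record_ids), sample_size))


if __name__ == "__main__":
    # Hypothetical ID range standing in for an export of real record IDs.
    all_ids = range(1, 350_001)
    audit = pick_audit_sample(all_ids, 384, seed=2024)
    print(len(audit), audit[:5])
```

Sampling without replacement (as `random.sample` does) guarantees no record is chosen twice.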

Just FYI, at the BB where I work, we used a third-party vendor to validate HCLL and the Hemocare-to-HCLL data conversion. Mediware only checks 10 patients per site when they do the conversion. The third-party vendor did a thorough job and checked a couple of hundred records. They arrived at the number of records to check using the FDA rule of thumb for electronic spreadsheets, "the square root of the actual number of clients plus one." The FDA formula can be found in http://pharmtech.findpharma.com/pharmtech/Technical-Considerations-for-the-Validation-of-Ele/ArticleStandard/Article/detail/42756. This rule of thumb yields a relatively low sample size for smaller numbers of patients. So for something like 20,000 patient records, they checked 142 patients.
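As a quick check of the arithmetic, here is a small Python sketch of the "square root of N plus one" rule. The rounding convention (nearest whole record) is an assumption, since the rule as quoted does not say how to round, and the function name is illustrative only.

```python
import math


def sqrt_n_plus_one(population: int) -> int:
    """Sample size from the 'square root of N plus one' rule of thumb."""
    # Rounding to the nearest whole record is an assumption.
    return round(math.sqrt(population) + 1)


if __name__ == "__main__":
    for total in (20_000, 350_000):
        print(total, sqrt_n_plus_one(total))
    # 20000 142
    # 350000 593
```

For 20,000 records this reproduces the 142 mentioned above.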

However, this rule of thumb has been criticized, for example here http://pharmtech.findpharma.com/pharmtech/article/articleDetail.jsp?id=56537

"The accuracy of the 95% confidence probability statement for mean was compared for three distributions for sample size obtained from the square root of N plus one rule with the Edgeworth approximation derived sample size. Results showed that the sample size obtained from this rule is not even enough to declare less than 20% of defectives in a moderate size population with a high degree of confidence. Therefore, the author concludes this rule should not be used to select a sampling plan to infer a population defective rate.

...

A simple method for choosing a sample size from a population is through what quality engineers refer to as the square root of N plus one sampling rule. This rule is apparently not statistically motivated nor is it mentioned by sampling theorists, practitioners, or reviewers of the field"

Edited by angonzalez

Thank you angonzalez! :)

Finally the links to the articles that are discussing these topics! I finally (after many hours of searching) found the article that Cliff was referring to also. In addition I found the on-line calculators.

I love these additional articles that you are citing and I will check them out as well. We have determined (using the on-line calculator) that we will validate 663 (or so) records based on a population of 348,000. We are doing our own validation for a data conversion from a home-grown system to SoftBank.

Thank you so much again! This forum is truly a treasure!

Laura


Do you have donor records or just patient records? If only patient records, can your IS system choose not to convert records of patients it has flagged as deceased? Would this drop your numbers significantly? An institution I worked for went through a couple of such conversions, and for one of them we did, in fact, check every patient record that had an antibody, but we had culled the deceased and we were a Blood Bank, not a donor center. It was still very tedious and we designated one tech to the project.
