In our current digital world, organizations are consuming more and more test data across the software testing life cycle. As testers maintain data from existing sources and generate massive volumes of new data to ensure the quality of delivery, it is vital that they apply good test data management.
Hence, we have talked with experts in the industry to shed light on the role of Test Data in Software Testing.
What is Test Data?
Test data is any form of data (documents, pictures, videos or any other media) used in testing applications, according to Glenn Nino Martinez, Senior Test Lead at Flexisource IT. This data is used to check whether the application under test works as expected, and also to test the application's limits or breaking points through boundary value testing.
Test data is the input required to run your software program (website or application) and complete the test execution, continues Mansoor Ahmed, Test Lead at NTT Data.
In simple terms, once a product is developed, it needs to be tested with a wide range of data to ensure that the program works as expected. Test data helps ensure that complete test coverage is achieved. A set of test data must ensure that positive, alternative and negative flows/scenarios are covered during the validation of the software program.
Test data creation and management is often the biggest challenge when creating both manual and automated tests and is one of the most important aspects of testing. Therefore, testers must continuously explore, learn, and apply the most efficient and effective approaches for data creation, maintenance, automation, and comprehensive data management for any type of functional and non-functional testing.
Data that testers use for testing purposes is called test data, adds Lavanya Vijayakumaran, Offshore QA Lead at ALTEN Calsoft Labs. Test data is a commonly used term in a tester’s day-to-day life. Even developers create test data for their unit testing or analysis purposes.
While executing test cases, the tester needs some data to input to get the desired output. Test data needs to be precise and exhaustive to uncover the defects. It can be used once or can be used in different iterations as per the Software/Product requirements.
Hence, test data is the data needed to execute test cases properly and verify the expected output of any software.
The different types of test data
Test data comes in different forms depending on the application under test. Indeed, as Glenn points out, there are many types of test data, and they are still evolving because the applications they are used on keep evolving.
Test data can be in the form of a Word or Excel document with tables holding essential information, such as usernames, passwords and other fields needed to build a database. These documents carry different values that can test both the positive results and the negative scenarios of the application.
There is also test data in the form of pictures or images, used in applications that verify different objects or human facial expressions, as well as videos used as test data in applications for data analysis and for automating the guidance systems of future vehicles.
For example, there are applications that monitor human behaviour through security cameras and use videos as test data. Audio test data is also used for applications like Google Home or Alexa that recognize different voice commands.
According to Mansoor, a set of test data for a given functionality should ideally include the types below to make sure there is adequate test coverage.
Consider an example where only numbers from 1 to 5 are valid for the program to function and anything else is rejected:
Normal Test Data: Typical or usual data that is accepted by the system and well within the testing scope/limits.
Example: Since only numbers from 1 to 5 are allowed as input, ‘3’ is considered Normal Test Data.
Extreme Test Data: Data at the upper or lower limits of expectation that is accepted by the system.
Example: Since only numbers from 1 to 5 are allowed as input, ‘1’ and ‘5’ are considered Extreme Test Data.
Boundary Value Test Data: A pair of data at each end of the range. It contains data at the upper and lower limits of expectation, which should be accepted, and data immediately beyond those limits, which should be rejected.
Example: Here ‘1’ and ‘5’ are the accepted boundary values, and ‘0’ and ‘6’ are the rejected boundary values.
Abnormal/Erroneous Test Data: Data that falls outside the required/accepted values and should be rejected by the system.
Example: Here ‘7.5’ and ‘100’ are considered Abnormal/Erroneous Test Data.
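To make these categories concrete, here is a minimal Python sketch of the 1-to-5 example. The `accepts` function stands in for the program under test; it, `TEST_DATA` and `run_test_data` are hypothetical names chosen for illustration, and the sketch assumes only whole numbers count as valid.

```python
# Hypothetical program under test: accepts only whole numbers from 1 to 5.
def accepts(value) -> bool:
    """Return True if value is a whole number between 1 and 5 inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 1 <= value <= 5


# Test data grouped by category, each value paired with the expected result.
TEST_DATA = {
    "normal": [(3, True)],
    "extreme": [(1, True), (5, True)],
    "boundary": [(0, False), (1, True), (5, True), (6, False)],
    "abnormal": [(7.5, False), (100, False)],
}


def run_test_data() -> list[str]:
    """Run every test value and return a description of each mismatch."""
    failures = []
    for category, cases in TEST_DATA.items():
        for value, expected in cases:
            actual = accepts(value)
            if actual != expected:
                failures.append(f"{category}: {value!r} -> {actual}, expected {expected}")
    return failures


if __name__ == "__main__":
    problems = run_test_data()
    print("all test data passed" if not problems else "\n".join(problems))
```

Keeping each value next to its expected result means a failure points straight at the category of data the program mishandles.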
Moreover, Lavanya showcases four different types of test data:
- Blank Test Data: This provides no input to the application; it simply runs the process to check whether the appropriate messages are displayed.
- Valid or Positive Test Data: This is the data that is expected to be entered into the software product to get the desired output. The program needs to run without any errors.
- Invalid or Negative Test Data: This is data that the program rejects as invalid, for example because it is of the wrong data type, contains characters that are not allowed, or has a value outside the accepted parameters of the program. Invalid test data is mainly used to see whether the system breaks when the wrong input is given. If invalid data is presented, the application needs to handle it; usually, the user is told that the data provided has been rejected.
- Boundary Value Test Data: This tests the very boundary of acceptable data. Borderline data is still acceptable, and it is processed in the same way as normal data. It is excellent for testing the hard limits written into the software and for checking that the application still runs properly when handling them.
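Lavanya's four categories map naturally onto a table of inputs and expected messages. The sketch below assumes a hypothetical `quantity` field that accepts whole numbers from 1 to 100; the `validate_quantity` function, the limits and the messages are illustrative, not taken from any real application.

```python
import re

# Hypothetical field rules: a whole number from 1 to 100.
MIN_QTY, MAX_QTY = 1, 100


def validate_quantity(raw: str) -> str:
    """Return the message the application would show for a raw input string."""
    text = raw.strip()
    if not text:
        return "Quantity is required."  # blank input
    if not re.fullmatch(r"[0-9]+", text):
        return "Quantity must be a whole number."  # wrong type or characters
    if not MIN_QTY <= int(text) <= MAX_QTY:
        return f"Quantity must be between {MIN_QTY} and {MAX_QTY}."  # out of range
    return "OK"


# (category, input, expected message)
CASES = [
    ("blank", "", "Quantity is required."),
    ("valid", "42", "OK"),
    ("invalid", "abc", "Quantity must be a whole number."),
    ("invalid", "250", "Quantity must be between 1 and 100."),
    ("boundary", "1", "OK"),
    ("boundary", "100", "OK"),
]

if __name__ == "__main__":
    for category, raw, expected in CASES:
        actual = validate_quantity(raw)
        status = "pass" if actual == expected else "FAIL"
        print(f"{status}: {category} input {raw!r} -> {actual}")
```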
Creating test data
The most common way to create or prepare test data, when it is in the form of a document, is to enter it manually, Glenn underlines.
He adds that collecting images, audio or video is also done manually, with the help of people the companies outsource the work to, and all this data is then collated by designated individuals who categorize the files. Automation is also used to create large volumes of test data, which is time-consuming to do manually.
Mansoor gives different ways to create test data:
Manual Test Data Creation:
This method is a simple way of generating data, and it relies on the knowledge and skills of the testing team. Some of the types of test data included in this method are valid, invalid, null, standard production data, and data sets for performance testing. The challenge with this technique is that it is a slow process and lowers productivity.
Import from the production environment:
This is the most accurate method, as this kind of data is an actual backup of the data used in real time in production. However, production data cannot simply be copied as-is. Among the rights introduced by the GDPR is the right to restriction of processing of personal data. If production data is sourced for testing, data managers need to apply anonymization techniques to all personally identifiable information, and this process must be irreversible. This emphasizes the need for good documentation of data flows and data models, and for adequate test data profiling.
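As a rough illustration of irreversible anonymization, the sketch below replaces direct identifiers with generated placeholders instead of hashing or encrypting them, so no key or mapping exists that could restore the originals. The field names (`name`, `email`, `phone`, `birth_date`) and the helper functions are hypothetical. It is deliberately simplified: real anonymization also has to consider indirect identifiers that could re-identify a person when combined.

```python
# Hypothetical production schema: name, email, phone, birth_date, plus business fields.
def anonymize_row(row: dict, n: int) -> dict:
    """Return a copy of one production row with personal data replaced.

    Placeholders depend only on the row's position `n`, never on the
    original values, and no lookup table is kept.
    """
    clean = dict(row)  # shallow copy; fields not handled below are kept as-is
    clean["name"] = f"Customer {n}"
    clean["email"] = f"user{n}@example.com"  # example.com is reserved for examples
    clean.pop("phone", None)  # drop a field the tests do not need
    clean["birth_date"] = row["birth_date"][:4]  # "1985-04-12" -> "1985"
    return clean


def anonymize(rows: list[dict]) -> list[dict]:
    """Anonymize every row, numbering placeholders from 1."""
    return [anonymize_row(row, n) for n, row in enumerate(rows, start=1)]


if __name__ == "__main__":
    sample = [{"name": "Jane Doe", "email": "jane.doe@corp.example",
               "phone": "555-0100", "birth_date": "1985-04-12", "plan": "gold"}]
    print(anonymize(sample))
```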
Back-end Data Injection:
In this method, data is injected directly into back-end servers with large databases. This eliminates the need for front-end data entry and lets data be injected quickly. It also allows backdated entries to be created without expert help. However, there are drawbacks: if the technique is not implemented correctly, it can put the database and the application at risk.
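A minimal sketch of back-end injection, using Python's built-in `sqlite3` module against an in-memory database. The `orders` table and the `inject_orders` helper are hypothetical. Wrapping the inserts in a transaction is one way to limit the risk mentioned above: if any row fails, the whole batch is rolled back.

```python
import sqlite3
from datetime import date, timedelta


def inject_orders(conn: sqlite3.Connection, count: int, start: date) -> None:
    """Insert `count` backdated orders straight into the hypothetical `orders` table."""
    rows = [
        (f"ORD-{i:05d}", (start + timedelta(days=i)).isoformat(), 10.0 * (i + 1))
        for i in range(count)
    ]
    # `with conn` commits on success and rolls back if any insert fails,
    # so a bad batch does not leave partial data behind.
    with conn:
        conn.executemany(
            "INSERT INTO orders (order_id, order_date, amount) VALUES (?, ?, ?)",
            rows,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id TEXT PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    inject_orders(conn, 3, date(2020, 1, 1))  # one order per day from 1 Jan 2020
    print(conn.execute("SELECT * FROM orders").fetchall())
```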
Automated Test Data Generation:
In this method, automation scripts are used to create data; web services APIs and Selenium are popular tools for this. The advantage is that the generated data is high-quality and accurate, and because no human intervention is needed, output is delivered faster. However, there are disadvantages too, such as cost and the need for skilled resources.
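The sketch below shows only the generation step, in plain Python with the standard `random` module rather than through Selenium or a web service API. The record fields and the `generate_users` helper are hypothetical; seeding a private generator makes every run produce the same data, which keeps automated tests repeatable.

```python
import random
import string


def generate_users(count: int, seed: int = 42) -> list[dict]:
    """Generate `count` synthetic user records (hypothetical schema)."""
    rng = random.Random(seed)  # private, seeded generator: same seed, same data
    users = []
    for i in range(1, count + 1):
        username = "".join(rng.choices(string.ascii_lowercase, k=8))
        users.append({
            "id": i,
            "username": username,
            "email": f"{username}@example.com",
            "age": rng.randint(18, 90),  # inclusive on both ends
        })
    return users


if __name__ == "__main__":
    for user in generate_users(3):
        print(user)
```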
Third-party Tools:
Third-party tools make it easier to create data in the system. Because these tools have thorough knowledge of the back-end applications, they help produce data that closely resembles real-time production data. Their benefits are data accuracy and the ability to execute the required tests on historical data. The disadvantage of this method is that it is too expensive and the tools can be limited in what they support.
As Lavanya points out, creating test data can be very time-consuming. It can be created through:
- Manual Test Data Generation
- Automated Data Generation
- Manipulated Test Data
- Production Data in the Test Environment
- Data Created in the Back End
- Existing Data from Legacy Systems
Test Data might then be used to verify the expected results or to challenge the ability of the program to respond to unexpected input.
Test data & software testing
According to Glenn, test data is very important in testing because it determines whether an application works as expected, and it also helps catch bugs. For example, if the requirements say a field should accept only numbers, entering alphanumeric data exposes a bug when the application fails to throw an error. Negative scenarios can also be run with test data to capture the limits of the application.
Creating test data is as important as development or testing activities, Lavanya points out. Indeed, poorly designed test data will not provide accurate test results. It is therefore important to create test data properly, as it is the input feed for testing the application and for checking that the outputs are derived correctly.
Today, business owners see the credibility and reliability of test data as non-negotiable. Given its significance, the vast majority of software owners do not accept applications tested with fake data or with weak security measures.
With data being more sensitive than ever, it is vital that software be tested properly with an appropriate set of test data. Nowadays, accurate, relevant, high-quality data is essential for Continuous Delivery, Test Coverage, Automation and Continuous Testing.
With quality data, you can find defects earlier in the development lifecycle, when they are cheaper to fix, and reduce the risk of bugs in production. Conversely, if testing and QA fail because of poor data quality, the end product will fail as well.
For Mansoor, test data is a very important aspect of software testing: without proper test data, it becomes difficult for the testing team to make sure complete coverage is achieved. Test data helps ensure the quality of the product.
When test coverage is incomplete because of a lack of test data, the consequences can be disastrous. One such example is the opening of Heathrow Terminal 5 in the UK in 2008. Due to improper testing, the baggage handling system could not cope when it faced some real-life scenarios, which resulted in a complete shutdown of the system. Over the following 10 days, some 42,000 bags failed to travel with their owners, and over 500 flights were cancelled. All this was due to the engineers' failure to test the possible real-life scenarios.
Test data management helps organizations create better quality software that performs reliably on deployment. It reduces the need for bug fixes and rollbacks, and overall creates a more cost-efficient software deployment process. It also lowers the organization’s compliance and security risks.
Benefits & Drawbacks
Test data, as Glenn notes, helps testers conclude whether the system under test is releasable.
The correct test data for different scenarios will determine whether testing covered all essential functions according to the application’s requirements. Test data can also check negative scenarios, based on the boundary test data injected into the application.
Performance/stress testing can be done by injecting large volumes of test data into the application and seeing whether the system breaks at a certain limit.
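One simple way to look for the breaking point is to keep doubling the volume of injected data until the system fails. In this sketch, `Inventory` is a hypothetical system under test with an artificial capacity of 50,000 records that raises `OverflowError` when exceeded; a real system would fail in its own ways (timeouts, errors, resource exhaustion), so the failure check would need to match it.

```python
class Inventory:
    """Hypothetical system under test that fails above a fixed capacity."""

    CAPACITY = 50_000

    def __init__(self):
        self.items = []

    def load(self, records):
        if len(self.items) + len(records) > self.CAPACITY:
            raise OverflowError("capacity exceeded")
        self.items.extend(records)


def find_breaking_point(start=1_000, limit=10_000_000):
    """Double the volume of injected test data until the system fails.

    Returns (largest volume that succeeded, first volume that failed),
    or (largest volume that succeeded, None) if nothing failed up to `limit`.
    """
    size, last_ok = start, 0
    while size <= limit:
        system = Inventory()  # fresh instance for every run
        try:
            system.load([{"sku": n} for n in range(size)])
        except OverflowError:
            return last_ok, size
        last_ok = size
        size *= 2
    return last_ok, None


if __name__ == "__main__":
    print(find_breaking_point())
```

With the default arguments, `find_breaking_point()` returns `(32000, 64000)`: 32,000 records load and 64,000 do not. A binary search between those two values could then narrow down the exact limit.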
According to Lavanya, using test data has significant benefits for the organization and the customer. Tracing test data to test cases and requirements increases test data coverage and gives a clearer picture, while finding bugs earlier reduces costs. Good test data also ensures data quality and data coverage, which builds customers’ trust in the organization.
Moreover, testers can archive the data in a central repository so that whenever reusable data is required, they can use the archived data.
If a testing team has a good set of test data before commencing testing activities, we can rest assured that the quality of the delivered product will be up to the mark and that all scenarios are covered, Mansoor highlights. Here are a few of the advantages of having such sets of test data:
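The traceability Lavanya describes can be modelled as two mappings: from test data sets to test cases, and from test cases to requirements. The sketch below uses hypothetical file names and IDs to compute which requirements the test data actually reaches, and which it leaves uncovered.

```python
# Hypothetical traceability links: test data set -> test cases -> requirements.
DATA_TO_TESTS = {
    "users_valid.csv": ["TC-01", "TC-02"],
    "users_invalid.csv": ["TC-03"],
}
TEST_TO_REQS = {
    "TC-01": ["REQ-LOGIN"],
    "TC-02": ["REQ-LOGIN", "REQ-PROFILE"],
    "TC-03": ["REQ-VALIDATION"],
}
REQUIREMENTS = {"REQ-LOGIN", "REQ-PROFILE", "REQ-VALIDATION", "REQ-EXPORT"}


def requirement_coverage(data_to_tests, test_to_reqs, requirements):
    """Return (covered, uncovered) sets of requirements reachable from the test data."""
    covered = {
        req
        for tests in data_to_tests.values()
        for test in tests
        for req in test_to_reqs.get(test, [])
    }
    covered &= requirements  # ignore links to unknown requirements
    return covered, requirements - covered


if __name__ == "__main__":
    covered, uncovered = requirement_coverage(DATA_TO_TESTS, TEST_TO_REQS, REQUIREMENTS)
    print("covered:", sorted(covered))
    print("uncovered:", sorted(uncovered))
```

Here `REQ-EXPORT` comes out as uncovered because no test data set reaches it, which is exactly the kind of gap traceability makes visible.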
Eliminates defects at early stages
You can identify gaps in requirements and test cases, as well as defects, at the early stages of your product development life cycle. This saves you a lot of trouble later.
Better coverage
Good test data results in better test coverage. This means fewer defects and less work at later stages. Moreover, a refined product increases customer satisfaction.
Higher ROI
If a set of test data is reused and maintained efficiently, we will end up with few or no defects in production, and the same set of data can be used for regression testing as well as for similar projects in the future. All the resources that would have been spent on creating duplicate test data for every project and on addressing defects then translate into profits.
Smoother testing cycles
Well-maintained test data can be used for functional as well as regression testing, resulting in better test case prioritization, test suite augmentation and test suite minimization. All this leads to smoother and more efficient testing cycles.
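Test suite minimization is often approached as a set-cover problem. The sketch below uses the common greedy heuristic of repeatedly picking the test case that covers the most still-uncovered requirements. The test names and requirement IDs are hypothetical, and the greedy result is small but not guaranteed to be the smallest possible suite.

```python
# Hypothetical suite: test case -> requirements it exercises.
SUITE = {
    "TC-A": {"R1", "R2"},
    "TC-B": {"R2", "R3"},
    "TC-C": {"R3"},
    "TC-D": {"R1"},
}


def minimize_suite(coverage: dict[str, set[str]]) -> list[str]:
    """Greedily select test cases until everything the full suite covers is covered."""
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # max() returns the first test with the largest gain, so ties follow dict order.
        best = max(coverage, key=lambda test: len(coverage[test] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


if __name__ == "__main__":
    print(minimize_suite(SUITE))  # ['TC-A', 'TC-B']
```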
On the other hand, the main challenge of test data is the time it takes to create, Glenn states.
Indeed, test data creation eats up a lot of resources when it is not carefully planned, and when it is not done correctly, the data ends up useless. Maintaining test data is also a challenge, especially when not all testers are knowledgeable enough about the system under test.
It can also be challenging when the test data sources are not accessible, which causes testing to fail. It is therefore better to always plan test data creation and make the data easy to maintain, in order to save time and effort in the long run.
As Lavanya stated, creating test data can be very time-consuming. A 2016 IBM study revealed that testers spend 30% to 60% of their time searching for, maintaining and generating data for testing and development.
Mansoor also points out that obtaining and maintaining test data can be challenging for various reasons. Some of the most common challenges faced by testing teams are:
- The testing team may not have access rights to create test data, which at times creates a time-consuming and unavoidable dependency
- The same data may be used by multiple teams in the same environment, resulting in data corruption
- Test data is rarely reviewed, reused or leveraged
- Large volumes of data may be required in a short period, which is hard to achieve without the right tools
- Defects in the test data that are not identified early can seriously affect the software
- Test data management requires the testing team to have in-depth knowledge of alternative data creation solutions, which not every tester has
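For the shared-environment problem in the second challenge above, one common mitigation is to tag every record a test run creates with a unique prefix, so runs cannot overwrite each other's data and each run can clean up only what it created. The sketch below simulates the shared environment with a plain dictionary; `make_run_prefix`, `isolated_records` and `cleanup` are hypothetical helpers.

```python
import uuid


def make_run_prefix(team: str) -> str:
    """Build a unique prefix for one team's test run, e.g. 'payments-1a2b3c4d'."""
    return f"{team}-{uuid.uuid4().hex[:8]}"


def isolated_records(prefix: str, names: list[str]) -> list[dict]:
    """Create records whose keys carry the run prefix so runs never collide."""
    return [{"key": f"{prefix}-{name}", "name": name} for name in names]


def cleanup(store: dict, prefix: str) -> None:
    """Delete only the records this run created, leaving other teams' data alone."""
    for key in [k for k in store if k.startswith(prefix + "-")]:
        del store[key]


if __name__ == "__main__":
    shared_store = {}  # stands in for a shared test environment
    run_a = make_run_prefix("payments")
    run_b = make_run_prefix("orders")
    for record in isolated_records(run_a, ["alice", "bob"]) + isolated_records(run_b, ["alice"]):
        shared_store[record["key"]] = record
    cleanup(shared_store, run_a)
    print(sorted(shared_store))
```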
The future of test data in software testing
Glenn believes that the role of test data in software testing will evolve in the next few years as more applications that require unique test data are developed; as applications evolve, so does the data required to make them work.
Lavanya also thinks that test data management is key to successful product delivery. With new emerging technologies, using test data in testing will become a necessity for any high-performing large-scale software service.
Mansoor believes that as we move towards more automation and a shift-left testing approach, effective test data management improves the quality of testing results, and better results lead to an improved product and a higher return on investment.
He underlines that in the future, more test management tools will be developed and adopted to help testers easily create and maintain data. We will move to a position where test data creation and management lie in the hands of the testing team, which would not only save time but also allow the team to reuse and leverage test data to the fullest.
Once test data management is optimized, increases in productivity, results and profitability should quickly follow, freeing up resources and focus for continuing to deliver quality products and services.
Special thanks to Glenn Nino Martinez, Mansoor Ahmed, and Lavanya Vijayakumaran