This article will cover SQL bulk insert operations deterministic outcomes and responses covering not allowing any bad data to allowing all data to be inserted, regardless of errors.Read more »
He has spent a decade working in FinTech, along with a few years in BioTech and Energy Tech.He hosts the West Texas SQL Server Users' Group, as well as teaches courses and writes articles on SQL Server, ETL, and PowerShell.
In his free time, he is a contributor to the decentralized financial industry.
View all posts by Timothy Smith
Latest posts by Timothy Smith (see all)
- Situations When We May Want Higher Azure Costs - June 17, 2019
- Finding Unused Resources Impacting Azure Costs - June 10, 2019
- Handling Unused and Unnecessary Resources Impacting Azure Costs - June 5, 2019
One of the challenges we face when using SQL bulk insert from files flat can be concurrency and performance challenges, especially if the load involves a multi-step data flow, where we can’t execute a latter step until we finish with an early step. We also see these optimization challenges with constraints as well, as fewer steps to complete a data flow results in saved time, but possibly less accurate data.Read more »
In this article, we’ll discuss security implications of using SQL Bulk Insert and how to mitigate those risks.Read more »
In the first part of reviewing the basics of bulk insert, we looked at importing entire files, specifying delimiters for rows and columns, and bypassing error messages. Sometimes we’ll want to skip first and ending lines, log errors and bad records for review after inserting data, and work with data types directly without first importing using a varchar and converting to the data type later. In this part, we look at these techniques using T-SQL’s native bulk insert.
We’re facing a challenge with several of our data flows that use more time than they have in the past and we’re not sure when this trend started. We know in the past month, our reports have been delayed by over a day from the start to the finish. For some of our data flows we use SQL Server Agent that calls SSIS packages or procedures, while some of them use a custom data import and reporting application we’ve created. How can we track the length of time for these data flows, since we’re using a combination of tools for importing data?
From troubleshooting many data flow applications designed by others, I’ve seen a common pattern of over complexity with many designs. Putting aside possible risks by introducing too much complexity, troubleshooting these designs often involves opening many different applications – from a notepad file, to SSIS, to SQL Server Management Studio, to a script tool, etc. It may sound like many of these are doing a hundred steps, yet many times, they’re simply importing data from a file, or calling five stored procedures and then a file task of moving a file. This complexity is often unnecessary, as is opening many different tools when we can use a few tools and solve issues faster.Read more »
In this article, we are going to talk about discovering and archiving SQL references to invalid procedures.
We’ve recently had production failures because our developers changed an important reference. In this case, we had a view which several procedures and views referenced. A developer made a change to the referenced view by removing columns, which caused several procedures and a view that referenced it to fail. We’re considering whether we should stop this practice, or if there are other ways we can prevent changes to an object that’s being referenced by other objects (in our case, a view).Read more »
When we’re developing solutions, we can sometimes forget useful commands we can use in T-SQL that make it convenient to remove data, eliminate objects, or carefully remove data. We look at three of these commands with a few examples of where we might consider using them in development, or in rare production cases. While they may offer us speed and convenience in some cases, we also look at some situations where they may not be the best tool to use.Read more »
A client recently discovered a discrepancy on one of our reports that showed an improvement in performance metrics but was inaccurate. Our reports came from a software tool, which showed the average performance throughout the day. It derived this number from periodic checks and the frequency changed, which affected our report. When we showed an improvement in the metrics on a report, the client showed us that the frequency change may have impacted this, not necessarily any improvement in performance. In situations where we’re measuring values and comparing them to other values, how can we prevent a change in measurement from impacting our reports? Read more »
As our company has grown, we’ve recently added developers to our team who want to use open source tools (open source languages and libraries). In the past, we built and used our own custom libraries, but our new developers to prefer to use open source libraries or add new languages that require new libraries. We’re concerned that the use of open source libraries may not be secure and may introduce new inputs and outputs in our system that we don’t fully understand. What should we consider when we think about allowing open source software, tools or languages in our environment from the standpoint of security? Read more »
Our organization must restrict permissions and prove to an independent party that we investigate access and restrict permissions. We restrict permissions to objects using least permissions and give full access to an exceptional few based on an organizational design that follows best practices. What can we do additionally that will help us prevent unauthorized access or catch when someone who shouldn’t have access is able to infiltrate our systems, especially in the context of showing a third party that we track this.Read more »
This article explains data security for accessing sensitive data and restricts access in application using SQL Views and stored procedures.
We recently faced a leak of information for one of our employees that involved data which caused a conflict within our company, even if it was not personally identifiable information (PII data). When we investigated the issue, we uncovered that we need to organize data access for our teams and review who has access to what information. As of right now, all our users either have access to all tables directly or a subset of all our tables. What are some practices we can use in SQL Server to avoid giving direct table access to our users? Read more »
We receive many database alerts with many of the alerts logging some of these same alerts or information to files or tables. What we’ve found over time is that the logging is now costing us quite a bit of resource. Our logging server (where both files and table logging are stored) has had a few outages related to conflicts from messages for other servers. We’ve considered scaling the alerting by environment type, but we’ve also considered that we may be logging too much information about our databases. In addition, since we receive so many alerts each day, it’s impossible for us to resolve them and assist with other issues that arise. What are some techniques that we can use to help us with the issue of too much logging information and too many alerts? Read more »
We have some custom files that we receive from different providers and for these situations we are unable to use standard ETL programs without any customization. Since we’re expanding our ability to read these custom files with .NET, we’re looking for efficient ways to read files with PowerShell that we can use in SQL Server Job Agents, Windows Task Schedulers, or with our custom program, which can execute PowerShell scripts. We have many tools for parsing data and wanted to know efficient ways of reading the data for parsing, along with getting specific lines of data from files by number, or by the first or last line of the file. For reading files efficiently, what are some functions or libraries we can use?Read more »
We manage data in a growing environment where our clients query some of our data, and on occasion will query past data. We do not have an environment that scales and we know that we need to archive some of our data in a way that allows clients to access it, but also doesn’t interfere with current data clients are more interested in querying. With the current data in our environment and new data sets will be using in the future, what are some ways we can archive and scale our environment?Read more »
We recently inherited a database environment where we’re facing significant data growth with limits on the sizes we can allow our databases to grow. Since we maintain multiple development, QA and production environments and these environments must be sized appropriately. We set a few standards about tables that exceed certain sizes – rows, data or both, or have certain growth patterns and our standards force compression at thresholds we set for our different environments (including using page, row, or clustered columnstore index compression) We’re looking for how we can get information about data and compression in tables and options we have so that we can quickly determine candidates that don’t currently match our best practices design we’ve set for our environments.Read more »
This year, digital security has risen as a top concern for many people, especially after the recent security compromises with Cloudflare in February and ransomware in May. Throughout the last two years, we’ve also seen the rise of sim-swapping where a hacker is able to extract a user’s cell phone data and compromise all emails and two-factor authentication. Unfortunately, I expect these security issues to continue and cause compromises for user’s data while negatively impacting companies involved.Read more »
As a database administrator, sometimes you need to identify details about a problem as quickly as possible and being able to build and analyze data for analysis will help you solve the problem. From getting information about the latest backups, to saving information about waits or indexes and comparing that to other captured metrics, we will run into issues where being able to get, store and analyze data are important for decisions to solve urgent problems.Read more »
When we’re architecting or troubleshooting issues in an environment, understanding how the log is set up and its history helps us identify if improving it will carry significant impacts overall. We have some built in tools that we can use, such as DBCC LOGINFO and DBCC SQLPERF and in this tip we look at how we can use them when we’re debugging issues, designing our logs for scale, or monitoring our current setup. We also look at some ways we can retain the information these provide so that we can use this for historic measurements and benchmarking, as well as using them for potentially alerting.Read more »
In multi-user environments, changes may occur frequently to the architecture, data, or overall structure that creates work for other users. In this series, we look at some ways that we can track changes on the data and architecture layer for pin-pointing times, changes, and using the information for alerting, if changes should be kept to a minimum. SQL Server comes with some built-in tools that allow us to monitor changes, and depending on the architecture, we can create tools that allow us also to monitor and identify changes near the time that they occur.Read more »
From time to time, I’ve run into replication issues in inherited environments that I did not architect and some of these environments experienced errors in replication because of how it was constructed from the beginning. In this tip, we look at some of the basics in replication architecture and then at solving some of these problems. Some of the replication issues I’ve seen are caused by misunderstanding what is impossible and possible with replication.Read more »
On occasion, I’ll see waits that exceed what I expect well above normal and a few of them have some architecture and standards to consider following when troubleshooting, though like most waits’ issues, there can be other underlying factors that are happening as well. In this article, I investigate the three waits ASYNC_NETWORK_IO and WRITELOG. In general, waits vary by environment and server, so before reading this article an immediate question to ask is, “Do you know what’s normal for yours?” When a wait suddenly spikes, or if the architecture is designed in a manner that should prevent a specific wait from consuming time, and yet you see that the wait does, I would be concerned. In addition, because applications and environments differ by architecture, you may want to consider other troubleshooting steps, as these may not apply to your situations.Read more »
One of the most popular complaints from developers to DBAs involves alerting, whether from third party tools or alerting built by other developers or DBAs in the environment. Building or using alerts for important applications, data layers, or processes within a SQL Server environment offer everyone benefits, but can become noisy if they’re architected poorly, or the purpose isn’t considered. In this article, we look at considerations for building effective alerts that tell us when something is wrong without creating situations where we learn to disregard them. We want to make sure that we respond when we need to, and not always be on high alert when there is no issue.Read more »
If an environment chooses to use snapshot or transactional replication, one useful exercise is to ask the technical end user (or client) what they think replication does. If you have access to a white board, you can even ask them to demonstrate what they think replication will do for their data. Generally, these technical end users will plot something similar to the below image, where we see a table with data being copied to another table with data.Read more »