In Python, using mutable default arguments in function definitions is considered an anti-pattern because it can lead to unintended behaviour.
To showcase this, let’s write a simple function that appends a number to a list. Notice that my_list defaults to an empty list.
def append_to_list(number, my_list=[]):
    my_list.append(number)
    return my_list

list1 = append_to_list(1)
print(list1)
list2 = append_to_list(2)
print(list2)
One might expect that a new empty list is created on each function call, with list1 and list2 being separate lists, and the output to be:
[1]
[2]
What actually happens is that Python creates the default list only once, when the function is defined. It then reuses that same list on every call that omits my_list, so list1 and list2 are, in fact, the same list object. The actual output is:
[1]
[1, 2]
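One way to see this for yourself is to inspect the function's __defaults__ attribute, which is where Python keeps the default values it evaluated at definition time. The sketch below repeats append_to_list from above so that it runs on its own; it is only an illustration of the sharing, not something you would put in production code.

```python
def append_to_list(number, my_list=[]):
    my_list.append(number)
    return my_list

# The default list is created once and stored on the function object.
print(append_to_list.__defaults__)  # ([],)

list1 = append_to_list(1)
list2 = append_to_list(2)

# Both names point at the same object: the stored default.
print(list1 is list2)                           # True
print(list1 is append_to_list.__defaults__[0])  # True
print(append_to_list.__defaults__)              # ([1, 2],)
```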
The recommended approach is to default my_list to None and create the empty list inside the function body when no list was passed. This way, Python creates a new list object on every call that omits my_list.
def append_to_list(number, my_list=None):
    if my_list is None:
        my_list = []
    my_list.append(number)
    return my_list

list1 = append_to_list(1)
print(list1)
list2 = append_to_list(2)
print(list2)
Now list1 and list2 are separate list objects and the output is:
[1]
[2]
My own experience
I had to deal with a similar situation when writing an AWS Lambda function that would read an S3 file in chunks, validate the data, and then perform an S3 multipart upload. The code looked something like this (simplified for illustration purposes):
import boto3

class S3MultipartUploader:
    def __init__(self, parts=[]):
        self.parts = parts
        self.s3_client = boto3.client('s3')

    def upload_part(self, part):
        part_number = len(self.parts) + 1
        # Bucket, Key and UploadId are omitted here for brevity
        response = self.s3_client.upload_part(
            Body=part, PartNumber=part_number
        )
        self.parts.append({'PartNumber': part_number, 'ETag': response['ETag']})

    def complete_upload(self):
        # Bucket, Key and UploadId are omitted here for brevity
        return self.s3_client.complete_multipart_upload(
            MultipartUpload={'Parts': self.parts}
        )

def lambda_handler(event, context):
    multipart_uploader = S3MultipartUploader()
    for chunk in read_s3_file():  # helper not shown in this simplified version
        multipart_uploader.upload_part(chunk)
    multipart_uploader.complete_upload()
The code defines an S3MultipartUploader class that uploads data in chunks and then completes the upload. And it was working great, until it didn’t. At random times the Lambda would throw the following error:
"An error occurred (InvalidPart) when calling the CompleteMultipartUpload operation: One or more of the specified parts could not be found."
After some debugging, I discovered that the issue originated from a combination of mutable default arguments and Lambda container reuse.
The __init__ method was using a mutable default argument by setting parts to an empty list. When the Lambda had a cold start in a new container, multipart_uploader was instantiated with parts as an empty list and there were no issues. However, when the Lambda reused a warm container, the class definition, and with it the default list, was still in memory, so parts retained the entries appended during the previous invocation. As a result, multipart_uploader was instantiated with leftover data from a previous execution context, similar to our first example where list2 = append_to_list(2) was reusing a non-empty my_list that was mutated in a previous function call. This led to multipart_uploader attempting to complete the upload with parts that didn’t exist in the current execution context, causing the Lambda to fail.
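The effect can be reproduced without AWS. The sketch below is a stand-in, not the original Lambda: FakeUploader and handler are hypothetical names, the upload is replaced by a plain list append, and calling handler twice in the same process mimics two invocations in a warm container.

```python
class FakeUploader:
    # Hypothetical stand-in for S3MultipartUploader; no AWS calls are made.
    def __init__(self, parts=[]):  # default list is created once, when the class body runs
        self.parts = parts

    def upload_part(self, part):
        part_number = len(self.parts) + 1
        self.parts.append({'PartNumber': part_number})


def handler(chunks):
    # A new uploader per invocation, as in lambda_handler above.
    uploader = FakeUploader()
    for chunk in chunks:
        uploader.upload_part(chunk)
    return uploader.parts


first = handler(['a', 'b'])
print(len(first))       # 2

second = handler(['c'])
print(len(second))      # 3: the two parts from the first call are still there
print(first is second)  # True: both invocations shared the default list
```

The second invocation numbers its only chunk as part 3 and carries two stale entries, which is the shape of the problem described above.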
The fix was easy:
    def __init__(self, parts=None):
        if parts is None:
            parts = []
        self.parts = parts
        self.s3_client = boto3.client('s3')
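A quick check of the fix, again as a sketch: FakeUploader is a hypothetical stand-in class that makes no AWS calls and only records part numbers, so it shows the list handling rather than the real upload.

```python
class FakeUploader:
    # Hypothetical stand-in for S3MultipartUploader; no AWS calls are made.
    def __init__(self, parts=None):
        if parts is None:
            parts = []  # a fresh list for every instance
        self.parts = parts

    def upload_part(self, part):
        self.parts.append({'PartNumber': len(self.parts) + 1})


a = FakeUploader()
a.upload_part('x')
b = FakeUploader()
b.upload_part('y')

print(a.parts)             # [{'PartNumber': 1}]
print(b.parts)             # [{'PartNumber': 1}]
print(a.parts is b.parts)  # False
```

Each instance now starts from its own empty list, so nothing carries over between instances created in the same process.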
Conclusion
Mutable default arguments are rarely needed, if ever. If you want your code to behave as you intend and want to spare yourself hours of debugging unexpected behaviour, it’s best to avoid them. Defaulting to None and creating the object inside the function, as presented above, is a reliable, battle-tested approach.